ALGORITHMS FOR DE NOVO GENOME ASSEMBLY FROM THIRD GENERATION SEQUENCING DATA

Sović, Ivan

Data source: CROSBI

ALGORITHMS FOR DE NOVO GENOME ASSEMBLY FROM THIRD GENERATION SEQUENCING DATA (CROSBI ID 408651)

Thesis | doctoral dissertation

Sović, Ivan ALGORITHMS FOR DE NOVO GENOME ASSEMBLY FROM THIRD GENERATION SEQUENCING DATA / Šikić, Mile (mentor); Zagreb, Fakultet elektrotehnike i računarstva, 2016

Responsibility details

Authors

Sović, Ivan

Mentors

Šikić, Mile

Basic data in the original language
Basic data in other languages

Language

English

Title

ALGORITHMS FOR DE NOVO GENOME ASSEMBLY FROM THIRD GENERATION SEQUENCING DATA

Abstract

During the past ten years, genome sequencing has been an extremely active research topic, and it is gaining particular momentum right now. New, more affordable technologies have been released, requiring the rapid development of new algorithmic methods to cope with the data. Affordable commercial availability of sequencing technology, together with algorithmic methods that can leverage the data, could open the door to a vast number of important applications, such as diagnosis and treatment of chronic diseases through personalized medicine, or identification of pathogenic microorganisms from soil, water, food or tissue samples. Sequencing the entire genome of an organism is a difficult problem, because all sequencing technologies to date are limited in the length of the molecule they can read, and that length is much smaller than the genomes of the vast majority of organisms. To obtain the sequence of an entire genome, reads need to be either stitched together (assembled) de novo when the genome of the organism is not known in advance, or mapped and aligned to a reference genome if one exists (reference assembly or mapping). The main problem in both approaches stems from repetitive regions in genomes which, if longer than the reads, prevent complete assembly of the genome. The need for technologies that produce longer reads capable of resolving repetitive regions led to the advent of new sequencing approaches, the so-called third generation sequencing technologies, which currently include two representatives: Pacific Biosciences (PacBio) and Oxford Nanopore. Aside from long reads, both technologies are characterized by high error rates, which the assembly algorithms existing at the time could not handle. This led to the development of time-consuming read error correction methods, applied as a pre-processing step prior to assembly.
Instead, the work conducted in the scope of this thesis focuses on developing novel methods for de novo DNA assembly from third generation sequencing data that provide enough sensitivity and precision to omit the error-correction phase completely. Strong focus is put on nanopore data. Four new methods were developed: (I) NanoMark, an evaluation framework for comparing assembly methods on nanopore sequencing data; (II) GraphMap, a fast and sensitive mapper for long error-prone reads; (III) Owler, a sensitive overlapper for third generation sequencing; and (IV) Racon, a rapid consensus module for correcting raw assemblies. Owler and Racon were used as modules in the development of a novel de novo genome assembler, Aracon. The results show that Aracon reduces overall assembly time by at least 3x, and by up to an order of magnitude, compared to state-of-the-art methods, while retaining comparable or better assembly quality.

Keywords

de novo; assembly; PacBio; nanopore; NanoMark; GraphMap; Racon; Aracon

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication details

Number of pages

185

Defense date

4 October 2016

Publication status

defended

Details of the institution that awarded the academic degree

Institution / Organization

Fakultet elektrotehnike i računarstva

Location

Zagreb

Related records

Related persons

Ivan Sović (author)

Mile Šikić (mentor)

Related institutions

Fakultet elektrotehnike i računarstva (036) (author's institution)

Field

Computer science