The seminar will be held over Microsoft Teams. To simplify matters, I would like to ask everyone interested in the seminar series to register here: https://forms.gle/C7S1Vcvw8gAVexiP6
|Tizian Schulz, Bielefeld University|
|Searching for Local Similarity in Pangenome Graphs|
|December 7, 2020, 4:00pm CET|
Increasing numbers of individual genomes sequenced per species motivate the use of pangenomic approaches. Pangenomes may be represented as graphical structures, e.g. compacted colored de Bruijn graphs, which offer low memory usage and facilitate reference-free sequence comparisons. While sequence-to-graph mapping to graphical pangenomes has been studied for quite some time, no local alignment search tool in the vein of BLAST has been proposed yet. Here we present a new heuristic method to find all maximum-scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph. Beyond a mere comparison of query and pangenome sequences, it also allows the similarity among sequences within the pangenome to be compared. We show that local alignment scores follow an exponential-tail distribution similar to BLAST scores, and we discuss how to estimate its parameters to separate local alignments representing sequence homology from spurious findings.
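The significance assessment mentioned in the abstract follows the Karlin-Altschul framework used by BLAST, where the expected number of chance hits with score at least S is E = K·m·n·e^(-λS). The following is a minimal sketch, assuming that scores of alignments against unrelated (random) queries are available. It fits the tail with a Gumbel distribution by the method of moments and converts a score into an E-value. It is not the speaker's estimation procedure: the function names and the moment-based fit are illustrative choices, and `m` and `n` simply stand for the query length and the size of the searched sequence space.

```python
import math
import statistics

# Euler-Mascheroni constant, which appears in the mean of a Gumbel distribution.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Method-of-moments fit of a Gumbel (extreme-value) distribution.

    Returns (lam, mu) such that P(S >= x) ~ 1 - exp(-exp(-lam * (x - mu))).
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))  # Gumbel sd = pi / (lam * sqrt(6))
    mu = mean - EULER_GAMMA / lam          # Gumbel mean = mu + gamma / lam
    return lam, mu


def karlin_altschul_k(lam, mu, m, n):
    """Convert a fitted location mu into the Karlin-Altschul constant K
    for a query of length m and a search space of size n."""
    return math.exp(lam * mu) / (m * n)


def e_value(score, lam, k, m, n):
    """Expected number of chance alignments scoring >= score: E = K m n e^(-lam S)."""
    return k * m * n * math.exp(-lam * score)


def p_value(e):
    """Probability of at least one chance alignment, treating hits as Poisson."""
    return 1.0 - math.exp(-e)
```

With `lam` and `K` in hand, an alignment would be reported as homologous only if its E-value falls below a user-chosen cutoff; everything above the cutoff is treated as a likely spurious finding.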
Furthermore, we introduce the notions of quorum and search color set, which allow searches to be concentrated on specific parts of the pangenome without reconstructing the graph. An implementation of our method is presented, and its performance and usability are demonstrated. Our approach scales sublinearly in running time and memory usage with respect to the number of genomes under consideration. Thus, it is superior to classical methods that do not make use of sequence similarity within the pangenome.
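To illustrate the idea of a quorum and a search color set, here is a toy sketch that uses a plain dictionary of k-mers in place of a compacted colored de Bruijn graph. It assumes that a query k-mer is used as a seed only if it occurs in at least `quorum` genomes from the chosen search color set; this interpretation, the function names and the toy genomes are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def build_colored_kmers(genomes, k):
    """Map each k-mer to the set of genome colors that contain it.

    A plain (uncompacted) k-mer index, standing in for a colored de Bruijn graph.
    """
    colors = defaultdict(set)
    for color, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(color)
    return colors


def seed_kmers(query, colored, k, search_colors, quorum):
    """Return (position, k-mer) pairs of query k-mers that occur in at least
    `quorum` genomes belonging to `search_colors`."""
    hits = []
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        # Restrict the k-mer's colors to the search color set before counting.
        present = colored.get(kmer, set()) & search_colors
        if len(present) >= quorum:
            hits.append((i, kmer))
    return hits


genomes = {"g1": "ACGTACGT", "g2": "ACGTTTTT", "g3": "CCCCACGT"}
index = build_colored_kmers(genomes, k=4)
print(seed_kmers("ACGTT", index, 4, {"g1", "g2"}, quorum=2))  # [(0, 'ACGT')]
```

Changing `search_colors` or `quorum` only changes the filter applied at query time while the index is built once, which mirrors the point that searches can be focused on parts of the pangenome without rebuilding the graph.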
Tizian Schulz is a PhD student in the Genome Informatics group headed by Jens Stoye at Bielefeld University. He is the student representative of the graduate school DILS (Digital Infrastructure for the Life Sciences) at the Bielefeld Institute for Bioinformatics Infrastructure. His research interests lie in sequence analysis and comparative genomics, with a major focus on computational pangenomics. Prior to joining DILS, he was an affiliated member of the international research training group Computational Methods for the Diversity and Dynamics of Genomes and completed a half-year internship at the Vancouver Prostate Centre, hosted by Faraz Hach. Large parts of the results presented in this talk were achieved during this time. He received his Bachelor's and Master's degrees in Bioinformatics and Genome Research from Bielefeld University in 2014 and 2016, respectively.