Projects for the course "Algorithms for Computational Biology" (I semester of the academic year 2015/2016)
Masters in Molecular and Medical Biotechnology
Here you can get back to the main page of the course.
Organisation:
If you want the credits of this course to replace 6 credits of the future Fundamental Algorithms in Bioinformatics course, then you need to do an additional assignment. This consists of preparing a presentation on one of the topics below.
Topics:
The chapter or page numbers refer to the Setubal and Meidanis book (S&M), the Bielefeld Lecture Notes on Sequence Analysis (BielSeqAn), the Cormen et al. (Cormen), the Gusfield (Gusfield), or the Jones and Pevzner book (J&P). See the main page of the course for full details on books. I can give you copies of all this material. You can also use other material you find, but please check with me. (The material could either be not sufficiently deep, or too advanced.)
When you have decided on a project, please come and see me, or write me
an email. If you have questions about the topics, or want to know more before deciding, please come and see me, or write me an email.
List of topics:
- pairwise alignment in linear space: S&M 3.3.1, BielSeqAn 12.2
A technique which allows to compute in linear space not only the score of an optimal alignment but also the alignment itself. (Emanuele)
- general gap functions: S&M 3.3.2
In the lecture, we only looked at alignment scoring schemes where k gaps are always scored the same, whether they occurred one by one, or they were together in a block. Here we look at how to deal with gap functions that allow for differently scoring gaps that occur together in a block. The alignment algorithm has to be adjusted, and it runs now in O(n^3) time.
- affine gap functions: S&M 3.3.3
This is a special group of functions, where the first gap has a fixed cost x (i.e. high negative score), but then every following gap adds the same cost y to the previous gap. (I.e. 4 gaps have score x+3y.) These can be dealt with much better than general gap functions, namely in quadratic time O(n^2). (Franco)
- Carillo-Lipman space reduction for multiple sequence alignment: S&M p. 73-76 (ch. 3.4.1 second part), BielSeqAn 14.2
The huge space requirement of the DP algorithm for MSA reduced. (Matteo)
- q-gram distance: BielSeqAn 3.7
A different way of computing the distance between two strings: For every q-gram (string of length q), we count how many times it occurs in s and how many times in t, and build the absolute difference. The sum of these, for all q-grams, is the q-gram distance of s and t.
(Valentina)
- progressive alignment heuristic for MSA: Biel 16.1
Another heuristic for multiple sequence alignment (like star alignment). Given a guide tree (usually a phylogenetic tree) which tells us how the sequences are related, the MSA is computed along the tree, by progressively aligning more and more sequences. (Samuele)
- Divide-and-conquer for multiple sequence alignment: Biel 14.4
A heuristic for sum-of-pairs MSA. (Giuseppe)
- Karp-Rabin algorithm for exact string matching: Cormen 34.2.
A very nice exact matching algorithm which uses modular arithmetics for exact matching of strings. (Luca)
- Burrows-Wheeler-Transform: BielSeqAn ch. 11
The BWT allows storing a long string in such a way that it can be compressed well (i.e. can be stored in a way that it takes less space than the length of the text), but at the same time it allows fast searching (i.e. exact matching). (Andrei)
- RNA secondary structure prediction: BielSeqAn ch. 8
Nussinov algorithm: a DP algorithm for predicting how an RNA sequence will fold. (Ana)
Here you can get back to the main page of the course.