Fundamental Algorithms for Bioinformatics: Bioinformatics Algorithms
(academic year 2017/2018, second term)

Masters in Medical Bioinformatics

Title of course:Bioinformatics Algorithms
(Module 2 of Fundamental Algorithms for Bioinformatics)
Lecturer:Zsuzsanna Lipták
email: zsuzsanna DOT liptak AT univr DOT it  
Course times: Thu 11.30 - 13.30 (aula I) (starts at 11:45)
Fri 11.30 - 14.30 (aula C) (ends at 14:15)
Student hours:
Wed 10.00-12.00 and by appointment
Office:Ca' Vignal, 2, 1st floor, right corridor, office 1.79



News:


GOALS of the course: To learn about some of the basic problems and algorithms behind common bioinformatics applications (sequence alignment, sequence similarity, sequence assembly, phylogenetics).

CREDITS: 12 CFU, together with module 1 of this course (Algorithm Design). This module contributes half of the grade. It is gained via a written exam, followed by oral exam. You are admitted to the oral exam only if you passed the written exam.

If you are taking this course as a student of Molecular and Medical Biotechnology (LM9) ("mutuazione Algorithms for Computational Biology"), then you will get 6 CFU (credits) for this course, and will have a different exam to take. However, it still holds that you have to take a written and an oral exam.

SYLLABUS:
Here is an overview of the topics that will be covered. The topics may vary slightly.


SUGGESTED BOOKS:


Lectures

Disclaimer: Here I list the topics which we covered in the lectures, and put up slides if there are some, as well as pointers to the respective chapters in books, and possibly other material of interest. However, you will always need the class notes, too! (even if there are slides)

When I include corrections of slides, this refers to the handed out version. The slides which you can download here have already been corrected. If you find any mistakes, please let me know!

No. Date Topic Slides (if any) Supporting material or chapters in books, comments
1 Thu, 1/3/2018 Organisation of course. Introduction: Strings and sequences in Biology and in Computer Science. Organisation (printable),
Strings and Sequences in Biology (printable),
Strings and Sequences in Computer Science (printable)
 
2 Thu, 8/3/2018 Pairwise alignment I: Number of alignments: recursion, explicit exponential lower bound, exhaustive search. Needleman-Wunsch DP-algorithm for global alignment, analysis. Pairwise Alignment (printable, part 1) This and the following lectures follow pretty closely chapter 3 of the Setubal & Meidanis book. (The Ohlebusch book and the Bielefeld Lecture Notes on Sequence Analysis both use distance functions instead of similariy functions for their presentation of alignment algorithms.)
Number of alignments: Bielefeld Lecture Notes on Sequence Analysis (see above), appendix B.
3 Fri, 9/3/2018 Pairwise alignment II: Backtracing without and with traceback pointers. Space-saving variant of NW-algorithm. Local alignment: Smith-Waterman algorithm. same as before; (printable, part 2) global al.: Setubal & Meidanis, ch. 3.2.1,
local al.: ch. 3.2.2
4 Thu, 15/3/2018 Pairwise Alignment III: Semiglobal alignment. Affine gap functions in quadratic time and space. Pairwise Alignment 2 (printable, part 1, printable, part 2) semiglobal al.: Setubal & Meidanis, ch. 3.2.3
Correction of handout slides: on p.3 the score of the right alignment is -1 (not -3)
5 Fri, 16/3/2018 Pairwise Alignment IV: Affine gap functions, cont'ed (example). Computing an optimal pairwise alignment in linear space via Divide-and-Conquer. Pairwise Alignment 3 (printable) affine gap functions: Setubal & Meidanis, ch. 3.3.3
6 Thu, 22/3/2018 Pairw. alignment IV cont'ed: Computing an optimal pairwise alignment in linear space, cont'ed.
String distances I: edit distance.
same as before D&C algorithm for optimal al. in linear space: Bielefeld Lecture Notes on Sequence Analysis, ch. 5.4.;
(a different algorithm for optimal al. in linear space: Setubal & Meidanis, ch. 3.3.1, and Ohlebusch ch. 8.1.2)
String distances: Bielefeld Lecture Notes Seq. Analysis, ch. 3
7 Fri, 23/3/2018 String distances II: metric, DP algorithm for computing the edit distance, Connection between edit distance and alignment score, LCS-distance, Hamming distance.  String distances (printable) distance vs. similarity: Setubal & Meidanis ch. 3.6.1, Ohlebusch ch. 8.1.4 and 8.1.5
8 Thu, 5/4/2018 String distances III: The q-gram distance. The q-gram distance (printable) q-gram distance: Bielefeld Lecture Notes on Sequence Analysis, ch. 3.7
9 Fri, 6/4/2018 De Bruijn sequences and de Bruijn graphs I: de Bruijn sequences over different alphabets, existence of de Bruijn sequences.   see class notes and slides for Lecture 22 (2) and the Compeau et al. article there
10 Thu, 12/4/2018 De Bruijn sequences and de Bruijn graphs II: (full) de Bruijn graphs, construction of de Bruijn sequences via Euler tours.   see class notes and slides for Lecture 22 (2) and the Compeau et al. article there
10a Thu, 12/4/2018 Extra lecture: Computational Efficiency I. Computational Efficiency 1 (printable)  
11 Fri, 13/4/2018 De Bruijn sequences and de Bruijn graphs III: de Bruijn subgraphs and sequences with q-gram distance 0.   Bielefeld Lecture Notes on Sequence Analysis ch. 3.7
12 Fri, 20/4/2018 Scoring matrices: PAM, BLOSUM. Scoring Matrices (printable) Setubal & Meidanis, ch. 3.5.1
Here are two PAM mutation matrices, and two PAM scoring matrices (the ones handed out in the lecture). You can read the original article on PAM matrices (more exactly, book chapter), by Margret Dayhoff et al, 1978. It is very understandably written! Here is the BLOSUM62 matrix. See also this nice little article by Sean Eddy about the BLOSUM matrices (S. Eddy, Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22, pp. 1035-1036, 2004).
12a Fri, 20/4/2018 Extra lecture: Computational Efficiency II. Computational Efficiency 2 (printable)  
13 Thu, 3/5/2018 Multiple Sequence Alignment (MSA) I: definitions, projections, scoring, sum-of-pairs score, theorem about optimal projections and optimal MSAs.   Setubal & Meidanis, ch. 3.4.1, Ohlebusch, ch. 8.2
14 Fri, 4/5/2018 MSA II: DP-algorithm, Star Alignment Algorithm   DP-algorithm: Setubal & Meidanis, ch. 3.4.1, Bielefeld Lecture Notes on Sequence Analysis: ch. 11.1, Ohlebusch 8.2
star alignment for similarity score: Setubal & Meidanis, ch. 3.4.2
15 Thu, 10/5/2018 MSA III: star alignment as a 2-approximation algorithm (for edit distance, and in general, for distance functions with triangle inequality)   star alignment as a 2-approximation: Ohlebusch, ch. 8.2.2, Bielefeld Lecture Notes on Sequence Analysis, ch. 11.3
16 Fri, 11/5/2018 MSA IV: progressive alignment   Bielefeld Lecture Notes on Sequence Analysis: Appendix H.1, Ohlebusch ch. 8.2.3
17 Thu, 17/5/2018 Jens Stoye (Univ. Bielefeld): Phylogenetics I: Introduction. Character-based methods. Definitions, perfect phylogeny (PP), Gusfield's algorithm for binary characters. Small Parsimony: Fitch' algorithm.   Bielefeld Phylogenetics Lecture Notes, ch. 2 (Introduction), ch. 3 (characters and states), ch. 4 (Perfect Phylogeny, Gusfield's algorithm), ch. 5.2 (Fitch' algorithm)
18 Fri, 18/5/2018 Jens Stoye (Univ. Bielefeld): Phylogenetics II: distance-based methods. UPGMA, Neighbor Joining (NJ).   Bielefeld Phylogenetics Lecture Notes, ch. 7 (7.1, 7.3.1 and 7.3.4)
19 Thu, 24/5/2018 Phylogenetics III: Number of rooted and unrooted trees. More on distance based methods: ultrametric matrices, 3-point condition, and rooted trees; additive metrics, 4-point conditon, and unrooted trees. Phylogenetics 1 (printable) same as previous lecture
20 Fri, 25/5/2018 Phylogenetics IV: More on character-based methods: homoplasies and the definitions of compatibility and perfect phylogeny (PP); parsimony problem: Large Parsimony and Small Parsimony; characterization of PP via number of state-changes. Phylogenetics 2 (printable) same as lecture 17
21 Thu, 31/5/2018 Phylogenetics IV cont'ed: Algorithms for Large Parsimony: Greedy Sequential Addition Algorithm, Branch-and-Bound Algorithm, Nearest Neighbor Interchange. same as before Large Parsimony (=Maximum Parsimony): Bielefeld Phylogenetics Lecture Notes, ch. 6.3 (Greedy Sequential Addition, NNI); for the branch-and-bound algorithm that we did, see here an excerpt from the Felsenstein book (pages 61 to 64).
Correction of handout slides: Note that we did three algorithms for Large Parsimony (and not two, as it said on the handout slides!)
22 Fri, 1/6/2018 Two more applications of q-grams:
(1) Database search with BLAST.
(2) Sequence Assembly with de Bruijn graphs.
BLAST (summary only!) (printable)

Fragment Assembly with de Bruijn graphs (printable)
(1) Here is the orginal article on BLAST: S.F. Altschul et al. Basic Local Alignment Search Tool, J. Mol. Biol. 215: 403-410 (1990); and the one on BLAST2: S.F. Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, Vol. 25, No. 17: 3389-3402 (1997). There is a very detailed section on BLAST in the book by Mount (full ref. see above), within chapter 7, p. 300 onwards. BLAST is also described in Chapter 6.3 of the BielSeqAn Lecture Notes (see above).
There are excellent help pages on the NCBI Blast page, among them on how to submit a request and how to read the results (version of Sept. 2016, always get the latest one!).

(2) See this article: How to apply de Bruijn graphs to genome assembly by Phillip E.C. Compeau, Pavel A. Pevzner, and Glenn Tesler, Nature Biotechnology, vol. 29 no. 11 (Nov. 2011); there is also supplementary material for this article.
23 Fri, 8/6/2018 Summary of course.   Study list for students of Medical Bioinformatics and for students of Molecular and Medical Biotechnology.