|Title of course:||Bioinformatics Algorithms
(Module 2 of Fundamental Algorithms for Bioinformatics)
zsuzsanna DOT liptak AT univr DOT it
(Please put "Course Bioinformatics Algorithms" in the subject line.)
|Course times:||Mon 11.30 - 13.30 (aula G)
Mon 14.30 - 16.30 (aula C) - from 8/5/17
Thu 8.30 - 10.30 (aula G)
||Thu 11.00-13.00 and by appointment|
|Office:||Ca' Vignal, 2, 1st floor, right corridor, office 1.79|
GOALS of the course: To learn about some of the basic problems and algorithms behind common bioinformatics applications (sequence alignment, sequence similarity, sequence assembly, phylogenetics).
CREDITS: 12 CFU, together with module 1 of this course (Algorithm Design). This module contributes half of the grade. It is gained via a written exam, followed by oral exam. You are admitted to the oral exam only if you passed the written exam.
If you are taking this course as a student of Molecular and Medical Biotechnology (LM9) ("mutuazione Algorithms for Computational Biology"), then you will get 6 CFU (credits) for this course, and will have a different exam to take. However, it still holds that you have to take a written and an oral exam.
ATTENDANCE: As in most university courses, attendance of classes is not mandatory. All of what I teach in this course is completely standard and is contained in most algorithmically oriented bioinformatics books (see e.g. list on this page). In the exam, you will be asked to do things like compute an alignment of two strings, given a scoring function; or prove the running time of Fitch's algorithm. You can learn this from a book or good online course. However, I believe that one gains much more from attending a course than from only studying by oneself. University classes give the student the opportunity of following a live course given by a live lecturer. If nothing else, attending the course forces you to spend a certain amount of time each week with this course. No handouts or slide presentations can completely substitute a lecturer. So if you can make it, I would advise you to attend. If you can't, you have to rely on the notes of your colleagues and/or on books.
MATERIAL: If there are slides, I hand them out and put them up on the webpage. However, even if there are slides, we always do more in class than what is on the slides: examples, missing proofs, additional material, illustrations, applications, etc. So you will always also need your class notes (or someone else's).
HOMEWORK: I regularly give you homework, and I expect that you do it. If you do not, you will not gain the experience in solving exercises that you need for the exam.
Here is an overview of the topics that will be covered. The topics may vary slightly.
Disclaimer: Here I list the topics which we treated in the lectures, and put up slides if we have any, as well as pointers to the respective chapters in books, and possibly other material of interest. However, you will always need the class notes, too! (even if there were slides, as well)
When I include corrections of slides, this refers to the handed out version. The slides which you can download here have already been corrected. If you find any mistakes, please let me know!
|No.||Date||Topic||Slides (if any)||Homework (if any)||Supporting material or chapters in books,
|1||Thu, 9/3/2017||Organisation of course. Introduction: Strings and sequences in Biology and in Computer Science. Pairwise alignment, exhaustive search for optimal alignment.||Organisation (printable),
Strings and Sequences in Biology (printable),
Strings and Sequences in Computer Science (printable)
||Correction of handout slides: "Strings and Sequences in Biology": p. 2: the characters in protein sequences are amino acids|
|2||Mo, 13/3/2017||Number of alignments: recursion, explicit exponential lower bound. Needleman-Wunsch DP-algorithm for global alignment, analysis.||Pairwise Alignment (printable)||This and the following lectures follow pretty closely chapter 3 of the Setubal & Meidanis book.
Corrections of handout slides: p. 3, 4: the score of the third alignment is -4, not -2. On p. 28, it should read "Re 1" (not "Re 2")
|3||Thu, 15/3/2017||Backtracing without and with traceback pointers. Space-saving variant of NW-algorithm. Local alignment: Smith-Waterman algorithm.||same as before||Setubal & Meidanis, ch. 3.2.1|
|4||Mo, 20/3/2017||Local alignment: backtracing. Solution to second homework. Semiglobal alignment. Affine gap functions.||Pairwise Alignment 2 (printable)||Setubal & Meidanis, ch. 3.2.2 and 3.2.3|
|5||Thu, 23/3/2017||Affine gap functions, cont'ed. Computing an optimal pairwise alignment in linear space.||same as before||Setubal & Meidanis, ch. 3.3.3, Ohlebusch ch. 3.1.6|
|6||Thu, 30/3/2017||Computing an optimal pairwise alignment in linear space, cont'ed. The forward-backward technique.||Pairwise Alignment 3 (printable)||In the lecture, we computed an optimal al. for s=GAAGA and t=CACA, where for the matrix M, the priority left-diag-top was used. Redo the computation for priority top-diag-left.||Setubal & Meidanis, ch. 3.3.1; Bielefeld Lecture Notes on Sequence Analysis (see above), ch. 5.3 and 5.4
Correction of handout slides: p. 9: ... they all need linear space (not time), and total space is O(n+m) (not: O(m)) because we need the space to store the alignment itself, too.
|7||Mo, 3/4/2017||String distance measures: metric, edit distance, computing the edit distance, Connection between edit distance and alignment score, LCS-distance, Hamming distance.||String Distance (printable)||
||Setubal & Meidanis ch. 3.6.1, Ohlebusch ch. 8.1.5|
|8||Thu, 6/4/2017||String distance 2: The q-gram distance.||The q-gram distance (printable)||Bielefeld Lecture Notes on Sequence Analysis (see above), ch. 3.7
Correction of first version of slides: p. 5: |s| >= q (not <=q); p. 9: dist_q(s,t) (not: dist_1(s,t)); p. 10: proof slightly updated; p. 11: added examples; p. 19: total time O(n + m + sigma^q) (not: O(n + m sigma^q)
|9||Mo, 10/4/2017||De Bruijn graphs.||see slides for lecture 21|
|10||Thu, 20/4/2017||Database search: BLAST.||BLAST (summary only!) (printable)||Here is the orginal article on BLAST: S.F. Altschul et al. Basic Local Alignment Search Tool, J. Mol. Biol. 215: 403-410 (1990); and the one on BLAST2: S.F. Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, Vol. 25, No. 17: 3389-3402 (1997).
There is a very detailed section on BLAST in the book by Mount (full ref. see above), within chapter 7, p. 300 onwards. BLAST is also described in Chapter 6.3 of the BielSeqAn Lecture Notes (see above).
There are excellent help pages on the NCBI Blast page, among them on how to submit a request and how to read the results (version of Sept. 2016, always get the latest one!).
|11||Thu, 27/4/2017||BLAST cont'ed. Scoring matrices 1.||same as before|
|12||Thu, 4/5/2017||Scoring matrices: PAM, BLOSUM.||Scoring Matrices (printable)||Setubal & Meidanis, ch. 3.5.1
Here are two PAM mutation matrices, and two PAM scoring matrices (the ones handed out in the lecture). You can read the original article on PAM matrices (more exactly, book chapter), by Margret Dayhoff et al, 1978. It is very understandably written! Here is the BLOSUM62 matrix. See also this nice little article by Sean Eddy about the BLOSUM matrices (S. Eddy, Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22, pp. 1035-1036, 2004).
Correction of handout slides: "Scoring Matrices": p. 7, bullet 3: Prob(b changes into a in 2 steps) and not: in k steps
|13 & 14||Mo, 8/5/2017 (a.m. & p.m.)||Multiple Sequence Alignment (MSA) 1 & 2: definitions, DP solution, Carillo-Lipman search space reduction, star alignment as a heuristic (similarity scores)||Setubal & Meidanis, ch. 3.4: DP-algorithm (3.4.1), Carillo-Lipman (p. 73-76), star alignment for similarity score (ch. 3.4.2); Ohlebusch, ch. 8.2 (Carillo-Lipman: 8.2.1)|
|15||Thu, 11/5/2017||MSA 3: progressive alignment||Bielefeld Lecture Notes on Sequence Analysis: Appendix H.1, Ohlebusch ch. 8.2.3|
|16||Mo, 15/5/2017 (a.m.)||MSA 4: star alignment as an approximation (for distance functions with triangle inequality)||Exercises on MSA.||Ohlebusch, ch. 8.2.2: star alignment|
|17||Mo, 15/5/2017 (p.m.)||Sequence Assembly I: Overlap graphs, Shortest Common Superstrings, Greedy Algorithm||Fragment Assembly 1 (printable)||Setubal & Meidanis, ch. 4.3|
|18||Mo, 22/5/2017 (a.m.)||Stéphane Vialette (CNRS, Univ. Paris-Est): Algorithms for RNA-folding 1||slides||The Nussinov algorithm for RNA folding is also described in the Setubal & Meidanis book (ch. 8.1). And in the Bielefeld Lecture Notes on Sequence Analysis, Appendix E.|
|19||Mo, 22/5/2017 (p.m.)||Stéphane Vialette (CNRS, Univ. Paris-Est): Flexible RNA design under structure and sequence constraints using formal languages||Seminario del Dipartimento di Informatica, abstract see here. The slides for the talk can be found here. This lecture is not part of the exam.|
|20||Thu, 25/5/2017||Stéphane Vialette (CNRS, Univ. Paris-Est): Algorithms for RNA-folding 2||same as for lecture 18|
|21||Mo, 29/5/2017||Sequence Assembly II: de Bruijn graphs, Euler tours, SBH, NGS reads.||Fragment Assembly 2 (printable)||See this article: How to apply de Bruijn graphs to genome assembly by Phillip E.C. Compeau, Pavel A. Pevzner, and Glenn Tesler, Nature Biotechnology, vol. 29 no. 11 (Nov. 2011); there is also supplementary material for this article.|
|22||Thu, 1/6/2017||More details and exercises on Sequence Assembly and RNA folding.||See here for an example for the Nussinov-algorithm (by Ana Damatar).|
|23||Mo, 5/6/2017 a.m.||Travis Gagie (Diego Portales Univ., Santiago de Chile): Quick introduction to BWT||slides were distributed in class|
|24||Mo, 5/6/2017 p.m.||Travis Gagie (Diego Portales Univ., Santiago de Chile): Fast locating with the runlength-encoded BWT||See here for an abstract.|
|25||Tue, 6/6/2017||Travis Gagie (Diego Portales Univ., Santiago de Chile): Compressed coloured de Bruijn graphs||Compressed colored de Bruijn graphs|
|26||Thu, 8/6/2017||Summary of course: pairwise sequence alignment incl. special variants, string distance measures, multiple sequence alignment.|
|27||Fri, 9/6/2017||Summary of course (cont.): MSA (cont.), scoring matrices, BLAST, RNA folding, sequence assembly (overlap graphs, de Bruijn graphs)||See here for a study list (and here for the version for MMBiotech students).|