Title of course:  Bioinformatics Algorithms (Module 2 of Fundamental Algorithms for Bioinformatics) 
Lecturer:  Zsuzsanna Lipták 
email: 
zsuzsanna DOT liptak AT univr DOT it (Please put "Course Bioinformatics Algorithms" in the subject line.) 
Course times:  Mon 11.30  13.30 (aula G) Mon 14.30  16.30 (aula C)  from 8/5/17 Thu 8.30  10.30 (aula G) 
Student hours: 
Thu 11.0013.00 and by appointment 
Office:  Ca' Vignal, 2, 1st floor, right corridor, office 1.79 
News:
GOALS of the course: To learn about some of the basic problems and algorithms behind common bioinformatics applications (sequence alignment, sequence similarity, sequence assembly, phylogenetics).
CREDITS: 12 CFU, together with module 1 of this course (Algorithm Design). This module contributes half of the grade. It is gained via a written exam, followed by oral exam. You are admitted to the oral exam only if you passed the written exam.
If you are taking this course as a student of Molecular and Medical Biotechnology (LM9) ("mutuazione Algorithms for Computational Biology"), then you will get 6 CFU (credits) for this course, and will have a different exam to take. However, it still holds that you have to take a written and an oral exam.
ATTENDANCE: As in most university courses, attendance of classes is not mandatory. All of what I teach in this course is completely standard and is contained in most algorithmically oriented bioinformatics books (see e.g. list on this page). In the exam, you will be asked to do things like compute an alignment of two strings, given a scoring function; or prove the running time of Fitch's algorithm. You can learn this from a book or good online course. However, I believe that one gains much more from attending a course than from only studying by oneself. University classes give the student the opportunity of following a live course given by a live lecturer. If nothing else, attending the course forces you to spend a certain amount of time each week with this course. No handouts or slide presentations can completely substitute a lecturer. So if you can make it, I would advise you to attend. If you can't, you have to rely on the notes of your colleagues and/or on books.
MATERIAL: If there are slides, I hand them out and put them up on the webpage. However, even if there are slides, we always do more in class than what is on the slides: examples, missing proofs, additional material, illustrations, applications, etc. So you will always also need your class notes (or someone else's).
HOMEWORK: I regularly give you homework, and I expect that you do it. If you do not, you will not gain the experience in solving exercises that you need for the exam.
SYLLABUS:
Here is an overview of the topics that will be covered. The topics may vary slightly.
SUGGESTED BOOKS:
Disclaimer: Here I list the topics which we treated in the lectures, and put up slides if we have any, as well as pointers to the respective chapters in books, and possibly other material of interest. However, you will always need the class notes, too! (even if there were slides, as well)
When I include corrections of slides, this refers to the handed out version. The slides which you can download here have already been corrected. If you find any mistakes, please let me know!
No.  Date  Topic  Slides (if any)  Homework (if any)  Supporting material or chapters in books, comments 

1  Thu, 9/3/2017  Organisation of course. Introduction: Strings and sequences in Biology and in Computer Science. Pairwise alignment, exhaustive search for optimal alignment.  Organisation (printable), Strings and Sequences in Biology (printable), Strings and Sequences in Computer Science (printable) 

Correction of handout slides: "Strings and Sequences in Biology": p. 2: the characters in protein sequences are amino acids 
2  Mo, 13/3/2017  Number of alignments: recursion, explicit exponential lower bound. NeedlemanWunsch DPalgorithm for global alignment, analysis.  Pairwise Alignment (printable)  This and the following lectures follow pretty closely chapter 3 of the Setubal & Meidanis book. Corrections of handout slides: p. 3, 4: the score of the third alignment is 4, not 2. On p. 28, it should read "Re 1" (not "Re 2") 

3  Thu, 16/3/2017  Backtracing without and with traceback pointers. Spacesaving variant of NWalgorithm. Local alignment: SmithWaterman algorithm.  same as before  Setubal & Meidanis, ch. 3.2.1  
4  Mo, 20/3/2017  Local alignment: backtracing. Solution to second homework. Semiglobal alignment. Affine gap functions.  Pairwise Alignment 2 (printable)  Setubal & Meidanis, ch. 3.2.2 and 3.2.3  
5  Thu, 23/3/2017  Affine gap functions, cont'ed. Computing an optimal pairwise alignment in linear space.  same as before  Setubal & Meidanis, ch. 3.3.3, Ohlebusch ch. 3.1.6  
6  Thu, 30/3/2017  Computing an optimal pairwise alignment in linear space, cont'ed. The forwardbackward technique.  Pairwise Alignment 3 (printable)  In the lecture, we computed an optimal al. for s=GAAGA and t=CACA, where for the matrix M, the priority leftdiagtop was used. Redo the computation for priority topdiagleft.  Setubal & Meidanis, ch. 3.3.1; Bielefeld Lecture Notes on Sequence Analysis (see above), ch. 5.3 and 5.4
Correction of handout slides: p. 9: ... they all need linear space (not time), and total space is O(n+m) (not: O(m)) because we need the space to store the alignment itself, too. 
7  Mo, 3/4/2017  String distance measures: metric, edit distance, computing the edit distance, Connection between edit distance and alignment score, LCSdistance, Hamming distance.  String Distance (printable) 

Setubal & Meidanis ch. 3.6.1, Ohlebusch ch. 8.1.5 
8  Thu, 6/4/2017  String distance 2: The qgram distance.  The qgram distance (printable)  Bielefeld Lecture Notes on Sequence Analysis (see above), ch. 3.7
Correction of first version of slides: p. 5: s >= q (not <=q); p. 9: dist_q(s,t) (not: dist_1(s,t)); p. 10: proof slightly updated; p. 11: added examples; p. 19: total time O(n + m + sigma^q) (not: O(n + m sigma^q) 

9  Mo, 10/4/2017  De Bruijn graphs.  see slides for lecture 21  
10  Thu, 20/4/2017  Database search: BLAST.  BLAST (summary only!) (printable)  Here is the orginal article on BLAST: S.F. Altschul et al. Basic Local Alignment Search Tool, J. Mol. Biol. 215: 403410 (1990); and the one on BLAST2: S.F. Altschul et al. Gapped BLAST and PSIBLAST: a new generation of protein database search programs, Nucleic Acids Research, Vol. 25, No. 17: 33893402 (1997). There is a very detailed section on BLAST in the book by Mount (full ref. see above), within chapter 7, p. 300 onwards. BLAST is also described in Chapter 6.3 of the BielSeqAn Lecture Notes (see above). There are excellent help pages on the NCBI Blast page, among them on how to submit a request and how to read the results (version of Sept. 2016, always get the latest one!). 

11  Thu, 27/4/2017  BLAST cont'ed. Scoring matrices 1.  same as before  
12  Thu, 4/5/2017  Scoring matrices: PAM, BLOSUM.  Scoring Matrices (printable)  Setubal & Meidanis, ch. 3.5.1 Here are two PAM mutation matrices, and two PAM scoring matrices (the ones handed out in the lecture). You can read the original article on PAM matrices (more exactly, book chapter), by Margret Dayhoff et al, 1978. It is very understandably written! Here is the BLOSUM62 matrix. See also this nice little article by Sean Eddy about the BLOSUM matrices (S. Eddy, Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22, pp. 10351036, 2004). Correction of handout slides: "Scoring Matrices": p. 7, bullet 3: Prob(b changes into a in 2 steps) and not: in k steps 

13 & 14  Mo, 8/5/2017 (a.m. & p.m.)  Multiple Sequence Alignment (MSA) 1 & 2: definitions, DP solution, CarilloLipman search space reduction, star alignment as a heuristic (similarity scores)  Setubal & Meidanis, ch. 3.4: DPalgorithm (3.4.1), CarilloLipman (p. 7376), star alignment for similarity score (ch. 3.4.2); Ohlebusch, ch. 8.2 (CarilloLipman: 8.2.1)  
15  Thu, 11/5/2017  MSA 3: progressive alignment  Bielefeld Lecture Notes on Sequence Analysis: Appendix H.1, Ohlebusch ch. 8.2.3  
16  Mo, 15/5/2017 (a.m.)  MSA 4: star alignment as an approximation (for distance functions with triangle inequality)  Exercises on MSA.  Ohlebusch, ch. 8.2.2: star alignment  
17  Mo, 15/5/2017 (p.m.)  Sequence Assembly I: Overlap graphs, Shortest Common Superstrings, Greedy Algorithm  Fragment Assembly 1 (printable)  Setubal & Meidanis, ch. 4.3  
18  Mo, 22/5/2017 (a.m.)  Stéphane Vialette (CNRS, Univ. ParisEst): Algorithms for RNAfolding 1  slides  The Nussinov algorithm for RNA folding is also described in the Setubal & Meidanis book (ch. 8.1). And in the Bielefeld Lecture Notes on Sequence Analysis, Appendix E.  
19  Mo, 22/5/2017 (p.m.)  Stéphane Vialette (CNRS, Univ. ParisEst): Flexible RNA design under structure and sequence constraints using formal languages  Seminario del Dipartimento di Informatica, abstract see here. The slides for the talk can be found here. This lecture is not part of the exam.  
20  Thu, 25/5/2017  Stéphane Vialette (CNRS, Univ. ParisEst): Algorithms for RNAfolding 2  same as for lecture 18  
21  Mo, 29/5/2017  Sequence Assembly II: de Bruijn graphs, Euler tours, SBH, NGS reads.  Fragment Assembly 2 (printable)  See this article: How to apply de Bruijn graphs to genome assembly by Phillip E.C. Compeau, Pavel A. Pevzner, and Glenn Tesler, Nature Biotechnology, vol. 29 no. 11 (Nov. 2011); there is also supplementary material for this article.  
22  Thu, 1/6/2017  More details and exercises on Sequence Assembly and RNA folding.  See here for an example for the Nussinovalgorithm (by Ana Damatar).  
23  Mo, 5/6/2017 a.m.  Travis Gagie (Diego Portales Univ., Santiago de Chile): Quick introduction to BWT  slides were distributed in class  
24  Mo, 5/6/2017 p.m.  Travis Gagie (Diego Portales Univ., Santiago de Chile): Fast locating with the runlengthencoded BWT  See here for an abstract.  
25  Tue, 6/6/2017  Travis Gagie (Diego Portales Univ., Santiago de Chile): Compressed coloured de Bruijn graphs  Compressed colored de Bruijn graphs  
26  Thu, 8/6/2017  Summary of course (part 1): pairwise sequence alignment incl. special variants, string distance measures, multiple sequence alignment.  
27  Fri, 9/6/2017  Summary of course (part 2): MSA (cont.), scoring matrices, BLAST, RNA folding, sequence assembly (overlap graphs, de Bruijn graphs)  See here for a study list (and here for the version for MMBiotech students). 