Note: This page will be updated every week as the course progresses.



Algoritmi e Linguaggi per Bioinformatica: Algoritmi

Here you can get back to the main page of the course.


Projects:

Final exam (either 2-hour version or 3-hour version) on Tuesday, 5 June 2012, in class.
The exam will take place in Aula D. Be there at 13.00.
Here is a list of things you should know.

New: Final exam results here.


There is now a projects webpage with descriptions of the projects.


Note: This schedule is supposed to give information about what we covered. You still need the notes from the lecture; if you cannot attend, please get in touch with someone who does. Here I try to give the corresponding chapters in the two books Setubal & Meidanis (S&M) and Jones & Pevzner (J&P). For more books, see the main page of the course.

There are very nice lecture notes on Sequence Analysis by Jens Stoye and others from Bielefeld University. They are much too detailed for our purposes, but I do use some chapters. I refer to them below by BielefeldLN.

I once (or twice) gave a one-week intensive course on algorithms for biology masters students at the University of the Western Cape, Cape Town, South Africa. The CT course notes are short and easy to understand. All of it is useful for you except the part on exact matching (sec. 3.3 to 3.6). (But if you are interested, read that part, too. It is explained in a very simple and straightforward manner.)

For the phylogenetics part, I use mostly the Bielefeld Phylogenetics Lecture Notes: BielefeldPhylo.

Schedule

No. Date Topic Chapters Other materials (supporting material, suggested reading)
1+2 Tue+Wed, 6+7/3/2012 Overview; Pairwise sequence alignment 1: Def. of alignment, exhaustive search, DP algorithm of Needleman-Wunsch; Formalism on strings S&M 3.1, 3.2.1; J&P 6.4, 6.6 Section 3.1 of the CT course notes (formalism on strings). Section 2.4 of the BielefeldLN.
3 Tue, 13/3/2012 Pairwise sequence alignment 2: Analysis of the DP algorithm of Needleman-Wunsch;
Introduction to algorithm analysis, Big-Oh notation
S&M 2.3, J&P 2.8 Please read section 4 (except 4.3) of these CT course notes. Section 2 could also be useful for non-computer scientists (a very basic explanation of algorithms). Here is also the "dirty trick version" of how to identify O-classes. Section 2.3 of the BielefeldLN.
4 Wed, 14/3/2012 Pairwise sequence alignment 3: Local alignment (Smith-Waterman); other variants (sketch);
PAM scoring matrices 1: motivation
S&M 3.2.2, J&P 6.8
5 Tue, 20/3/2012
PAM scoring matrices 2: how to compute PAM1, how to compute PAMk, a bit of probability, examples S&M 3.5.1, J&P 6.7 Here is a nice set of slides by David Gilbert, Univ. of Glasgow, on scoring matrices.
You can also read the original article on PAM matrices (more exactly, book chapter), by Margret Dayhoff and others. It is very understandable!
See also this nice little article by Sean Eddy about the BLOSUM matrices (S. Eddy, Where did the BLOSUM62 alignment score matrix come from?, Nature Biotechnology 22, 1035 - 1036, 2004).
6 Wed, 21/3/2012
Heuristics for sequence alignment 1: Dotplots, q-grams, FAST/FASTA
S&M 3.5.3, J&P 9.6 Sec. 6.1 and 6.4.1 of the BielefeldLN.
See also these nice slides on heuristic similarity search by Stuart M. Brown, NYU School of Medicine, New York.
7 Tue, 27/3/2012 Heuristics for sequence alignment 2: FAST/FASTA, BLAST; S&M 3.5.2, J&P 9.6 Sec. 6.4.2 of the BielefeldLN.
Here is the orginal article on BLAST: S.F. Altschul et al. Basic Local Alignment Search Tool, J. Mol. Biol. 215: 403-410 (1990).
8 Wed, 28/3/2012 String similarity and distance 1: string similarity measures: formal definition of alignment, sim(s,t); percent similarity; longest common subsequence (LCS) S&M 3.6.1, J&P 6.5
9 Tue, 3/4/2012 String similarity and distance 2: string distance measures: metrics; Hamming distance, edit distance, q-gram distance; relationship between edit distance and optimal alignment score S&M 3.6.1, J&P 6.4 Here is the original article on (unit-cost) edit distance (Levenshtein distance): Vladimir Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics-Doklady, vol. 10, no. 8, 1966. And here is the one on the q-gram distance: Esko Ukkonen, Approximate string matching with q-grams and maximal matches, Theor. Comp. Sc. 92 (1992), pp. 191-211 (beginning of Section 2).
10 Wed, 4/4/2012 Summary of first half: Pairwise sequence alignment (global, local; brute-force algorithm, DP-algorithms), Algorithm analysis, Scoring schemes: PAM matrices, Heuristics for sequence alignment (FASTA, BLAST, dotplots), String similarity and distance
11 Wed, 11/4/2012 Introduction to phylogenetics: History of systematics, what is a phylogenetic tree/phylogeny, different types of phylogenetic trees, different types of input BielefeldPhylo, ch. 1, 2.3
12 Tue, 24/4/2012 Midterm exam (about lectures 1-10) Midterm exam results here.
13+14 Tue+Wed, 8+9/5/2012 Introduction to graphs and trees;
Discussion of midterm exam; discussion of projects
BielefeldPhylo, ch. 2;
List of projects here.
15+16 Tue+Wed, 15+16/5/2012 More on trees: number of phylogenetic trees; distance-based data: ultrametrics, algorithm UPGMA S&M 6.5, J&P 10.6,10.8 BielefeldPhylo, ch. 6.1 and 6.2, Durbin-book ch. 7.3, Felsenstein-book ch. 11 (pp. 161-166).
17 Tue, 22/5/2012 Character-based data: Perfect Phylogeny; Small Parsimony: Fitch' Algorithm S&M 6.1,6.4, J&P 10.9, 10.10 BielefeldPhylo, ch. 3.1, 3.2., 4.1, 4.2, Durbin-book ch. 7.4.

(If you want to read about NP-completeness, here is a chapter I once wrote: Chapter on NP-completeness. It is not part of this course! NP-completeness is out of scope of this course. This is only for whoever is interested.)
18 Wed, 23/5/2012 Large Parsimony: exhaustive search, greedy sequential algorithm, Nearest Neighbour Interchange J&P 10.11 Felsenstein-book, ch. 4
18+19 Tue + Wed, 30+31/5/2012 Some very basic statistics: descriptive statistics vs. inferential statistics, sample vs. population; mean, median, percentiles, quartiles, variance, standard deviation; what is hypothesis testing, errors type I and type II, p-value A. Aczel and J. Sounderpandian: Complete business statistics, 2008, M. Bramati: Calcolo delle probabilityà e statistica - Teoria ed esercizi, 1997.
20 Tue, 5/6/2012 Second written exam (about lectures 11 and 13-19)
or: Final written exam (about the whole course)
Final exam results here.
21 Wed, 6/6/2012 Discussion of projects
22-24 Wed, Thu, Fri, 11-12-13/7/2012 Project presentations Here is the complete program of the project presentations.