Algoritmi e Linguaggi per Bioinformatica: Algoritmi (2013/2014)

Exam preparation (midterm exam).

The exam will take place on Wed 30 April, 2014, in Aula F, from 8.15 to 10.15. Please bring along a document of identification. Please arrive a bit ahead of time and make sure you have identification (with photo) with you.

Part I
Formal alphabet, strings, prefixes, suffixes, substrings, subsequences; matrix, factorial, binomial coefficient, logarithms, sums
Pairwise sequence alignment Practical: Compute the score of a given alignment, for given scoring scheme; compute sim(s,t) using the DP algorithm of Needleman-Wunsch; compute an optimal alignment/all optimal alignments using the DP-table; the connection between the DP-table and alignments: be able to give the alignment represented by a path in the table and vice versa; find all optimal local alignments using the DP-algorithm of Smith-Waterman; semiglobal alignment (algorithm)
Theoretical: Give the definition of the cell D(i,j) for global and for local alignment. Give the running times and space requirements of these algorithms. Why are these preferable to the brute-force solution? What are affine gap penalties? For which type of problem is which version of the algorithm appropriate?
Multiple sequence alignment Theoretical: Explain and analyse the DP algorithm. Is it feasible? Which heuristics do you know? What is the SP-score?
Practical: Score a multiple alignment with the SP-score. Heuristics: star alignment, progressive alignment, apply these to a small example.
Algorithm analysis Theoretical: What are the two parameters of algorithms we analyse? What do O-classes measure? With respect to what? Which classes are feasible (manageable) and which aren't? Why? What are heuristics, why are they used?
Practical: Put in order of O-classes a given set of functions (slowest growing first, fastest growing last). Compare two functions, which grows faster? Which is preferable for an algorithm's running time/space consumption? Say of certain functions whether they are polynomial, linear, quadratic, cubic, exponential, superexponential.
Scoring matrices Theoretical: Explain how the PAM scoring matrices are computed. Explain the biological motivation. What is the underlying idea? What data are used? What does the number k mean in PAMk? What do the entries represent? Interpret their values. Why do we use a "log-odds" matrix? What is the main difference between PAM and BLOSUM matrices?
Practical: Use a given PAMk or BLUSUM-k matrix (to be supplied in the exam) to score an alignment.