Algoritmi e Linguaggi per Bioinformatica: Algoritmi

Here you can get back to the main page of the course.

Algoritmi e Linguaggi per Bioinformatica: Algoritmi

Exam preparation (second exam/complete exam).

New: The exam will take place on Mon 3 June, 2013, in Aula B, from 9.00 to 11.30/12.30. Please bring along a document of identification.

Part I
Formal	alphabet, strings, prefixes, suffixes, substrings, subsequences; matrix, factorial, binomial coefficient
Pairwise sequence alignment	Practical: Compute the score of a given alignment, for given scoring scheme; compute sim(s,t) using the DP algorithm of Needleman-Wunsch; compute an optimal alignment/all optimal alignments using the DP-table; the connection between the DP-table and alignments: be able to give the alignment represented by a path in the table and vice versa; find all optimal local alignments using the DP-algorithm of Smith-Waterman; semiglobal alignment (algorithm) Theoretical: Give the definition of the cell D(i,j) for global and for local alignment. Give the running times and space requirements of these algorithms. Why are these preferable to the brute-force solution? What are affine gap penalties?
Multiple sequence alignment	Theoretical: Explain and analyse the DP algorithm. Is it feasible? Which heuristics do you know? Practical: Score a multiple alignment with the SP-score. Heuristics: star alignment, progressive alignment, apply these to a small example.
Algorithm analysis	Theoretical: What are the two parameters of algorithms we analyse? What do O-classes measure? With respect to what? Which classes are feasible (manageable) and which aren't? Why? Practical: Put in order of O-classes a given set of functions (slowest growing first, fastest growing last). Compare two functions, which grows faster? Which is preferable for an algorithm's running time/space consumption?
Scoring matrices	Theoretical: Explain how the PAM scoring matrices are computed. Explain the biological motivation. What is the underlying idea? What data are used? What does the number k mean in PAMk? What do the entries represent? Interpret their values. Why do we use a "log-odds" matrix? What is the main difference between PAM and BLOSUM matrices? Practical: Use a given PAMk or BLUSUM-k matrix (to be supplied in the exam) to score an alignment.
Heuristics for sequence alignment (included in second exam)	Theoretical: What is a heuristic? Explain the underlying ideas of FASTA and BLAST. What is the advantage over the DP-algorithms? What is the primary application? Why are heuristics used and not the DP-algorithms? Practical: Compute a dotplot for two sequences (simple and filtered). Interpret a given dotplot. Compute common q-grams (hotspots) as does FASTA. Find high-scoring q-grams (seeds) as does BLAST (for given scoring scheme).
Part II
Graphs, trees, phylogenetic trees	Theoretical: How many ways are there to root a tree? How many edges does a tree on n nodes have? What is a phylogenetic tree? How many phylogenetic trees are there on a given set of taxa? What are the types of input data? Practical: Check whether a given graph is a tree. Draw a graph. List a given graph's edges. Use terms like edge, simple path, cycle, labelled, degree, connected. In trees: leaf, root, rooted/unrooted, depth, parent, sibling, child, ancestor, descendant. Check whether two trees are isomorphic, i.e. "the same" (very small examples). Root unrooted trees.
Distance based data	Theoretical: Explain the aim of distance based phylogenetic reconstruction. What is given (input), what are we looking for (output)? Explain the running time of UPGMA. Def. of ultrametric. What does "molecular clock" mean? Practical: Check whether a given distance matrix is ultrametric. Apply the algorithm UPGMA to a distance matrix.
Character based data	Theoretical: Def. of compatibility (of a character with a tree). Def. of Perfect Phylogeny. Does is always exist? Define/explain Small Parsimony and Large Parsimony. How can these problems be solved? What algorithms have we seen? How efficient are they? Practical: Check whether a phylogenetic tree is a PP for a matrix M. Compute the parsimony score of a given labelled tree. Apply Fitch's algorithm. Apply Nearest Neighbor Interchange.

New: The exam will take place on Mon 3 June, 2013, in Aula B, from 9.00 to 11.30/12.30. Please bring along a document of identification.