Here you can get back to the main page of the course.

Algorithms for Computational Biology (fall term 2015/2016)

Exam preparation. (No guarantee of completeness! Of course you need to know all that we did in class!)

Please arrive a bit ahead of time and make sure you have identification (with photo) with you. You are not allowed to use any material or notes, and can only use the paper provided in the exam.

  Sequence analysis
Formal alphabet, strings, prefixes, suffixes, substrings, subsequences; matrix, factorial, binomial coefficient, logarithms, sums; number of substrings, subsequences, of strings of length n; examples of different types of alphabet; reverse complement of DNA strings; genetic code.
Pairwise sequence alignment Theoretical: Give the definition of the cell D(i,j) for global and of the cell L(i,j) for local alignment (as opposed to how they are computed). Give the running times and space requirements of these algorithms. Why are these preferable to the exhaustive search algorithm? Explain the exhaustive search algorithm. Recursive formula for number of alignments. Explain space-saving variant of N-W algorithm, semiglobal alignment.
Practical: Compute the score of a given alignment, for given scoring function; compute sim(s,t) using the DP algorithm of Needleman-Wunsch; compute an optimal alignment/all optimal alignments using the DP-table. Connection between the DP-table and alignments: be able to give the alignment represented by a path in the table and vice versa; find all optimal local alignments using the DP-algorithm of Smith-Waterman.
Multiple sequence alignment Theoretical: What are induced alignments? What is the SP-score? Explain and analyse the DP algorithm. Is it feasible? Which heuristics do you know?
Practical: Score a multiple alignment with the SP-score. Give the induced alignments of an MSA. Heuristics: apply star alignment algorithm to a small example.
String distance measures Theoretical: What is a metric? Define unit cost edit distance. How do we have to choose the scoring function in order to have a parallel between edit distance and alignment? Explain and analyse the DP algorithm for edit distance. Define LCS distance and Hamming distance.
Practical: Compute the edit distance between two strings, using the DP-algorithm. Given the series of edit operations corresponding to an alignment and vice versa. Compute the Hamming distance and the LCS distance for two strings.
  Algorithms
Algorithm analysis Theoretical: What are the two parameters of algorithms we analyse? What do O-classes measure? With respect to what? Which classes are feasible (manageable) and which aren't? Why? What are heuristics, why are they used?
Practical: Put in order of O-classes a given set of functions (slowest growing first, fastest growing last). Compare two functions, which grows faster? Which is preferable for an algorithm's running time/space consumption? Say of certain functions whether they are polynomial, linear, quadratic, cubic, exponential, superexponential.
  Sequence analysis in practice
Scoring matrices Theoretical: Explain how the PAM scoring matrices are computed. Explain the biological motivation. What is the underlying idea? What data are used? What does the number k mean in PAMk? What do the entries represent? Interpret their values. Why do we use a "log-odds" matrix? What is the main difference between PAM and BLOSUM matrices?
Practical: Use a given PAMk or BLOSUM-k matrix (to be supplied in the exam) to score an alignment.
BLAST Theoretical: What is a heuristic? Explain the underlying ideas of BLAST. What is the advantage over the DP-algorithms? What is the primary application? Why are heuristics used and not the DP-algorithms?
Practical: Given a small example (a query and a db sequence), explain what BLAST does on the example. Explain how to run a BLAST query on the NCBI webpage; explain the output from the NCBI BLAST webpage.
  Phylogenetics
Trees, phylogenetic trees Theoretical: What is a phylogenetic tree? Kinds of phylogenetic trees. How many ways are there to root a tree? How many edges does a tree on n nodes have? How many phylogenetic trees are there on a given set of taxa? What are the types of input data?
Practical: Check if a given graph is a tree. Identify in a rooted tree leaves, root, and for a given node, its parent, siblings, children, ancestors, descendants. Check whether two drawings depict the same phylogenetic tree (very small examples). Root unrooted trees. Identify whether a phylogenetic tree is rooted/unrooted, binary/multifurcating, whether the branch lengths matter.
Distance based data Theoretical: Explain the aim of distance based phylogenetic reconstruction. What is given (input), what are we looking for (output)? Explain the running time of UPGMA. Def. of ultrametric. What does "molecular clock" mean?
Practical: Check whether a given distance matrix is ultrametric. Apply the algorithm UPGMA to a distance matrix.
Character based data Theoretical: What are convergence and reversal? Def. of compatibility (of a character with a tree). Def. of Perfect Phylogeny. Does it always exist? Define parsimony (of a phylogenetic tree). Define the problems of Small Parsimony and Maximum Parsimony. What algorithms did we see for these? What are their advantages and disadvantages?
Practical: Check whether a phylogenetic tree is a PP for a character-state matrix M. Compute the parsimony score of a given labelled tree. Identify cases of reversal and convergence in a phylogenetic tree. Apply Fitch' algorithm to compute a most parsimonious labeling.