Here you can get back to the main page of the course.
Final exam preparation.
There will be two versions of the exam: short version (5 or 6 exercises - 2 hours), covering the second half of the course; and long version, covering the whole course (8 exercises - 3 hours). The exam questions will be in Italian and in English. You can answer either in Italian or in English. Here is a list of things you should know for the final exam:
New: The exam will take place on Tuesday 5 June, 2012, in Aula D (like last time). Please be there at 13.00, so we have time to explain the exercises.
Part I | |
Formal | alphabet, prefixes, suffixes, substrings, subsequences, matrix |
Pairwise sequence alignment | Practical: Compute the score of a given alignment, for given scoring scheme; compute sim(s,t) using the DP algorithm of Needleman-Wunsch; compute an optimal alignment using the DP-table; find all optimal local alignments using the DP-algorithm of Smith-Waterman; understand the connection between the DP-table and alignments: being able to give the alignment represented by a path in the table and vice versa. Theoretical: Give the running times and space requirements of these algorithms. Why are these preferable to the brute-force solution? |
Algorithm analysis | Theoretical: What are the two parameters of algorithms we analyse? What do O-classes measure? With respect to what? Which classes are feasible (manageable) and which aren't? Why? Practical: Put in order of O-classes a given set of functions (slowest first, fastest last). Compare two functions, which grows faster? Which is preferable for an algorithm's running time/space consumption? |
Scoring matrices | Theoretical: Explain how the PAM scoring matrices are computed. Explain the biological motivation. What is the underlying idea? What data are used? What does the number k mean in PAMk? What do the entries represent? Interpret their values. Practical: Use a given PAMk matrix (to be supplied in the exam) to score an alignment. Given a small probability matrix M (supplied), compute certain individual entries of the PAM2-matrix (all necessary data supplied, no need to be able to do complex logarithms in your head). |
Heuristics for sequence alignment |
Theoretical: What is a heuristic? Explain the underlying ideas of FASTA and BLAST. What is the advantage over the DP-algorithms? What is the primary application? Why are heuristics used and not the DP-algorithms? Practical: Compute a dotplot for two sequences (simple and filtered). Compute common q-grams (hotspots) as does FASTA. Find high-scoring q-grams (seeds) as does BLAST (for given scoring scheme). |
String similarity and distance | Practical: Give a series of edit operations (not necessarily optimal) which transform one string into the other. Compute the cost, given a cost measure. Compute the Hamming distance and q-gram distance of two strings. Give an upper bound on the edit distance. (extra: Compute the edit distance, using the theorem.) Theoretical: What similarity measures do you know? How does the q-gram distance relate to the unit-cost edit distance? Why is it useful? How do similarity score and edit distance relate? What is the mathematical abstraction for the notion of distance? |
Part II | |
Graphs, trees, phylogenetic trees | Practical: Check whether a given graph is a tree. Draw a graph. List a given graph's edges. Use terms like edge, simple path, cycle, labelled, degree, connected. In trees: leaf, root, rooted/unrooted, depth, parent, sibling, child, ancestor, descendant. Check whether two trees are isomorphic, i.e. "the same" (very small examples). Theoretical: How many ways are there to root a tree? How many edges does a tree on n nodes have? What is a phylogenetic tree? How many phylogenetic trees are there on a given set of taxa? What are the types of input data? |
Distance based data | Practical: Check whether a given distance matrix is ultrametric. Apply the algorithm UPGMA to a distance matrix. Theoretical: What is the running time of UPGMA? Connection between metric and ultrametric? Def. of ultrametric. What does "molecular clock" mean? |
Character based data |
Theoretical: Def. of compatibility (of a character with a tree). Def. of Perfect Phylogeny. Does is always exist? Define/explain Small Parsimony and Large Parsimony. How can these problems be solved? What algorithms have we seen? How efficient are they? Practical: Compute the parsimony score of a given labelled tree. Apply Fitch's algorithm. Apply greedy sequential algorithm. Apply Nearest Neighbor Interchange. |
Statistics |
Practical: Compute mean, median, variance, standard deviation, percentiles, quartiles of a small data set.
Define the null hypothesis for a given problem. Decide whether or not to reject the null hypothesis (given p-value and alpha). Theoretical: What is the difference between descriptive statistics and inferential statistics? What is a sample? What is a population? Def. of sample and population mean, variance, standard deviation. What does null hypothesis mean? What is the p-value? What are type I and type II errors? |
New: The exam will take place on Tuesday 5 June, 2012, in Aula D (like last time). Please be there at 13.00, so we have time to explain the exercises.