
Algoritmi e Linguaggi per Bioinformatica: Algoritmi (2013/2014)

Exam preparation.

Please arrive a little ahead of time and bring a photo ID with you. You are not allowed to use any materials or notes, and you must write on the paper provided in the exam.

Part I
Formal alphabet, strings, prefixes, suffixes, substrings, subsequences; matrix, factorial, binomial coefficient, logarithms, sums
Pairwise sequence alignment Practical: Compute the score of a given alignment under a given scoring scheme; compute sim(s,t) using the DP algorithm of Needleman-Wunsch; compute an optimal alignment/all optimal alignments using the DP-table; understand the connection between the DP-table and alignments: be able to give the alignment represented by a path in the table and vice versa; find all optimal local alignments using the DP-algorithm of Smith-Waterman; semiglobal alignment (algorithm)
Theoretical: Give the definition of the cell D(i,j) for global and for local alignment. Give the running times and space requirements of these algorithms. Why are these preferable to the brute-force solution? What are affine gap penalties? For which type of problem is which version of the algorithm appropriate?
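For self-study, the global alignment DP can be sketched as follows. This is only a minimal illustration, assuming a simple scoring scheme (match +1, mismatch -1, linear gap cost -2) that is not taken from the course; the function names `needleman_wunsch` and `traceback` are hypothetical. Here D(i,j) holds the score of an optimal global alignment of the prefixes s[1..i] and t[1..j].

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Return (sim(s, t), DP table D) for global alignment with linear gap cost.

    The scoring values are illustrative assumptions, not the course's scheme.
    """
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    # First column / row: a prefix aligned against gaps only.
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + sub,  # (mis)match
                          D[i - 1][j] + gap,      # gap in t
                          D[i][j - 1] + gap)      # gap in s
    return D[n][m], D


def traceback(s, t, D, match=1, mismatch=-1, gap=-2):
    """Recover ONE optimal alignment by walking back from D[n][m] to D[0][0]."""
    i, j = len(s), len(t)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s[i - 1] == t[j - 1] else mismatch
            if D[i][j] == D[i - 1][j - 1] + sub:
                top.append(s[i - 1]); bottom.append(t[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


score, table = needleman_wunsch("ACGT", "AGT")
print(score, traceback("ACGT", "AGT", table))
```

Filling the table takes O(nm) time and space. The traceback above prefers the diagonal move, so it returns only one of possibly several optimal alignments; listing all of them requires following every predecessor cell that attains the maximum.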
Multiple sequence alignment Theoretical: Explain and analyse the DP algorithm. Is it feasible? Which heuristics do you know? What is the SP-score?
Practical: Score a multiple alignment with the SP-score. Heuristics: star alignment, progressive alignment, apply these to a small example.
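As a practice aid, SP-scoring can be sketched in a few lines. The sketch assumes the same illustrative pairwise scheme (match +1, mismatch -1, gap -2) and the common convention that a gap aligned with a gap scores 0; both are assumptions, and the function name `sp_score` is hypothetical.

```python
from itertools import combinations


def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment given as equal-length rows.

    Assumes gap-gap pairs score 0 (a convention, check the course's definition).
    """
    total = 0
    for column in zip(*alignment):
        # Score every unordered pair of rows within this column.
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


print(sp_score(["ACGT", "ACGT", "A-GT"]))
```

In this example the three full-match columns contribute 3 each, and the column (C, C, -) contributes 1 - 2 - 2 = -3, for a total of 6.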
Algorithm analysis Theoretical: What are the two parameters of algorithms we analyse? What do O-classes measure? With respect to what? Which classes are feasible (manageable) and which aren't? Why? What are heuristics, why are they used?
Practical: Order a given set of functions by O-class (slowest growing first, fastest growing last). Compare two functions: which grows faster? Which is preferable for an algorithm's running time/space consumption? Say of certain functions whether they are polynomial, linear, quadratic, cubic, exponential, superexponential.
Scoring matrices Theoretical: Explain how the PAM scoring matrices are computed. Explain the biological motivation. What is the underlying idea? What data are used? What does the number k mean in PAMk? What do the entries represent? Interpret their values. Why do we use a "log-odds" matrix? What is the main difference between PAM and BLOSUM matrices?
Practical: Use a given PAMk or BLOSUM-k matrix (to be supplied in the exam) to score an alignment.
Heuristics for sequence alignment Theoretical: What is a heuristic? Explain the underlying ideas of BLAST. What is the advantage over the DP-algorithms? What is the primary application? Why are heuristics used and not the DP-algorithms?
Practical: Compute a dotplot for two sequences. Interpret a given dotplot. Find high-scoring w-words (= q-grams, seeds) as BLAST does (for a given scoring scheme).
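Both practical tasks can be tried out with a small sketch. It assumes a DNA alphabet and a match +1 / mismatch -1 scheme for illustration; the names `dotplot` and `high_scoring_words` and the threshold parameter are hypothetical, and real BLAST uses a substitution matrix and further steps (seed extension) that are not shown here.

```python
from itertools import product


def dotplot(s, t):
    """Matrix with 1 at (i, j) if s[i] == t[j], else 0 (no window, no filtering)."""
    return [[1 if a == b else 0 for b in t] for a in s]


def high_scoring_words(query, w, threshold, alphabet="ACGT",
                       match=1, mismatch=-1):
    """For each w-word of the query, list all words scoring >= threshold against it.

    Enumerates all |alphabet|^w candidates, so this is only for tiny examples.
    """
    seeds = {}
    for i in range(len(query) - w + 1):
        word = query[i:i + w]
        neighbours = []
        for cand in product(alphabet, repeat=w):
            score = sum(match if a == b else mismatch
                        for a, b in zip(word, cand))
            if score >= threshold:
                neighbours.append("".join(cand))
        seeds[(i, word)] = neighbours
    return seeds


for row in dotplot("ACGT", "AGGT"):
    print(row)
print(high_scoring_words("ACG", w=3, threshold=1))
```

With threshold 1 and w = 3, the word ACG keeps itself (score 3) and the nine words that differ from it in exactly one position (score 1).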
Part II
Descriptive Statistics Theoretical: What is the difference between descriptive statistics and inferential statistics? What is a sample? What is a population? Def. of sample and population mean, variance, standard deviation.
Practical: Compute mean, median, mode, variance, standard deviation, percentiles, quartiles, IQR of a small data set (only simple examples that can be done without a calculator). Interpret example histograms, pie charts, bar charts, box plots.
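Hand computations on a small data set can be checked with Python's standard `statistics` module, as in the sketch below. The helper name `summarize` and the data set are made up for illustration. Note that quartile conventions differ between textbooks; `statistics.quantiles` uses its default "exclusive" method here, which may not match the convention used in the course.

```python
import statistics


def summarize(data):
    """Descriptive statistics of a small data set (hypothetical helper)."""
    # Default method="exclusive"; other quartile conventions give other values.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "sample_variance": statistics.variance(data),       # divides by n - 1
        "population_variance": statistics.pvariance(data),  # divides by n
        "sample_sd": statistics.stdev(data),
        "quartiles": (q1, q2, q3),
        "iqr": q3 - q1,
    }


for name, value in summarize([2, 4, 4, 5, 7, 9, 11]).items():
    print(name, value)
```

For this data set the mean is 6, the median 5, the mode 4, and the squared deviations from the mean sum to 60, so the sample variance is 60/6 = 10 and the population variance is 60/7.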
Hypothesis testing Theoretical: What does null hypothesis mean? What is the p-value? What are type I and type II errors? What are the significance and power of a test?
Practical: Define the null hypothesis for a given problem. Decide whether or not to reject the null hypothesis (given p-value and alpha). Compute the p-value using Z-statistic and the tables of the normal distribution (either one).
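The Z-statistic computation can be sketched as follows, with `statistics.NormalDist` standing in for the printed table of the normal distribution. The sketch assumes a test for a mean with known population standard deviation; the function name `z_test` and the example numbers are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p-value) for H0: mu = mu0, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # H1: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":     # H1: mu < mu0
        p = std_normal.cdf(z)
    else:                           # H1: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p


alpha = 0.05
z, p = z_test(sample_mean=52, mu0=50, sigma=6, n=36)
print(z, round(p, 4), "reject H0" if p < alpha else "do not reject H0")
```

Here z = (52 - 50) / (6 / 6) = 2, and the two-sided p-value is about 0.0455, which is below alpha = 0.05, so the null hypothesis is rejected at that level.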
Part III
Graphs, trees, phylogenetic trees Theoretical: How many ways are there to root a tree? How many edges does a tree on n nodes have? What is a phylogenetic tree? How many phylogenetic trees are there on a given set of taxa? What are the types of input data?
Practical: Identify in a tree leaves, root, parent, sibling, child, ancestor, descendant of a node. Check whether two drawings depict the same phylogenetic tree (very small examples). Root unrooted trees. Identify whether a phylogenetic tree is rooted/unrooted, whether the branch lengths matter.
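The counting questions can be checked numerically with the sketch below. It assumes binary (fully resolved) trees with labelled leaves, for which the standard counts are (2n-5)!! unrooted trees (n >= 3) and (2n-3)!! rooted trees (n >= 2); if the course counts a different class of trees, the formulas differ. All function names are hypothetical.

```python
def odd_double_factorial(k):
    """k!! for odd k >= 1, i.e. 1 * 3 * 5 * ... * k; returns 1 for k < 1."""
    result = 1
    for factor in range(3, k + 1, 2):
        result *= factor
    return result


def num_unrooted_binary_trees(n):
    """Unrooted binary leaf-labelled trees on n >= 3 taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n - 5)


def num_rooted_binary_trees(n):
    """Rooted binary leaf-labelled trees on n >= 2 taxa: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n - 3)


for n in range(3, 8):
    print(n, num_unrooted_binary_trees(n), num_rooted_binary_trees(n))
```

The two counts are linked by rooting: an unrooted binary tree with n leaves has 2n - 3 edges, and placing the root on each edge in turn yields 2n - 3 distinct rooted trees, so the rooted count is (2n - 3) times the unrooted count.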
Character based data Theoretical: Def. of compatibility (of a character with a tree). Def. of Perfect Phylogeny. Does it always exist? Define/explain Small Parsimony and Large Parsimony. Which of these problems can be solved efficiently?
Practical: Check whether a phylogenetic tree is a PP for a character-state matrix M. Compute the parsimony score of a given labelled tree. Apply Fitch's algorithm.
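The bottom-up phase of Fitch's algorithm for a single character can be sketched as below. The tree representation is an assumption made for the example: a rooted binary tree written as nested pairs, with each leaf given by its state as a string. The function name `fitch` is hypothetical, and the top-down phase that assigns concrete states to inner nodes is omitted.

```python
def fitch(tree):
    """Return (candidate state set at the root, parsimony score) for one character.

    `tree` is either a leaf state (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # No shared state: take the union and count one state change.
    return left_states | right_states, left_cost + right_cost + 1


states, score = fitch((("A", "C"), ("A", "G")))
print(states, score)
```

In the example, each cherry needs one change (sets {A, C} and {A, G}), and the two sets share A at the root, so the parsimony score is 2 with root state A. For a character-state matrix with several characters, the scores of the individual characters are added.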