This course will cover both the theoretical and practical aspects of performing phylogenetic analyses, with particular emphasis on understanding the differences between, and applications of, the most commonly used methods. Topics to be covered will include: finding the appropriate data and getting it into the right format; using distance-based and probability-based methods to construct phylogenetic trees; performing ancestral state reconstruction to track the evolution of characters; understanding and calibrating the molecular clock to calculate divergence times; incorporating big data into phylogenetics; and estimating population parameters such as population size, growth, and migration rates.
Recommended Pre/Co-requisites: BINF 6201: Molecular Sequence Analysis; BINF 6200: Statistics for Bioinformatics
Learning Objectives:
There is NO required textbook for this course, and any required reading (in the form of journal articles or other papers) will be provided to students via Canvas. However, there is a book that some students may find helpful, particularly those interested in learning more of the mathematical details underlying genetic distance and maximum likelihood methods: The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing, edited by Philippe Lemey, Marco Salemi, and Anne-Mieke Vandamme, 2nd Edition, Cambridge University Press, 2009.
Each class period will consist of a combination of lecture and in-class computer exercises. Each week, students will be expected to complete all parts of the assigned exercises; anything not finished in class must be completed at home and submitted by 6:00 PM on the due date. Students are permitted to work together on the exercises, but each student must turn in their own completed assignment. Some weeks may also include assigned reading to supplement the lecture material. The final project will require students to analyze a real data set and submit a written report.
Component | Weight |
---|---|
Class/Homework Exercises | 30% |
Quizzes | 20% |
Midterm Exam | 20% |
Final Project | 25% |
Attendance | 5% |
Letter Grade | Percentage |
---|---|
A | 90-100% |
B | 80-89% |
C | 70-79% |
D | 60-69% |
F | below 60% |
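As an illustration of how the component weights and letter-grade ranges above combine, here is a minimal Python sketch. It is not an official grade calculator: the dictionary keys, function names, and example scores are hypothetical, and the treatment of fractional percentages (for example, 89.5% counted as a B, with no rounding) is an assumption, since the grade ranges above do not specify it.

```python
# Component weights taken from the grading table (as fractions of the final grade).
WEIGHTS = {
    "exercises": 0.30,
    "quizzes": 0.20,
    "midterm": 0.20,
    "final_project": 0.25,
    "attendance": 0.05,
}

# Lower bound of each letter band, highest first.
# Assumption: no rounding, so a fractional score such as 89.5 falls in the lower band.
LETTER_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def final_percentage(scores):
    """Return the weighted average of component scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a final percentage to a letter using the cutoffs above."""
    for cutoff, letter in LETTER_CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical scores for one student, purely for illustration.
example_scores = {
    "exercises": 95,
    "quizzes": 80,
    "midterm": 70,
    "final_project": 90,
    "attendance": 100,
}

total = final_percentage(example_scores)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

With these example scores, the weighted contributions are 28.5 + 16 + 14 + 22.5 + 5, which corresponds to a final percentage of 86 and a letter grade of B.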
The course schedule is organized into four sections. In the first section, we will review how sequences are selected and processed for use in a phylogenetic analysis. Next, we will discuss different methods for constructing phylogenetic trees from small (single-gene) data sets, along with how to compare and evaluate these methods. After the midterm, we will turn to newer methods for incorporating larger, multi-gene and next-generation sequencing data into a phylogeny. Finally, we will examine how phylogenies and phylogenetic inference can be used to go beyond species relationships and answer other evolutionary questions.
Section 1 | Sequence Data for Phylogenetics |
---|---|
1.1 | Introduction and History of Phylogenetics |
1.2 | Multiple Sequence Alignment Methods |
1.3 | Identifying Homologous Genes |
1.4 | Gene Family Evolution: Gene Duplication and Gene Loss |
Section 2 | Constructing Trees with Single Genes |
---|---|
2.1 | Genetic Distance Methods: UPGMA and Neighbor-Joining |
2.2 | Maximum Parsimony |
2.3 | Bootstrapping to Find Branch Support |
2.4 | Maximum Likelihood Methods (RAxML) |
2.5 | Combining, Summarizing, and Comparing Trees |
2.6 | Ancestral State Reconstruction |
Section 3 | Advanced Methods for Larger Data Sets |
---|---|
3.1 | Bayesian Inference Methods |
3.2 | Concatenating and Partitioning Multiple Genes |
3.3 | Coalescent Theory and the Multispecies Coalescent |
3.4 | Methods for SNP Data: Quartets and SNAPP |
Section 4 | Using Trees to Test Evolutionary Models |
---|---|
4.1 | Estimating Divergence Times with the Molecular Clock |
4.2 | Gene Flow and Phylogenetic Networks |
4.3 | Population Size Changes |