BINF 6205/8205: Computational Molecular Evolution

Course Description

This course will cover both the theoretical and practical aspects of performing phylogenetic analyses, with particular emphasis on understanding the most commonly used methods, how they differ, and when each is appropriate. Topics to be covered will include: finding the appropriate data and getting it into the right format; using distance-based and probability-based methods to construct phylogenetic trees; performing ancestral state reconstruction to track the evolution of characters; understanding and calibrating the molecular clock to calculate divergence times; incorporating big data into phylogenetics; and estimating population parameters such as population size, growth, and migration rates.

Recommended Pre/Co-requisites: BINF 6201: Molecular Sequence Analysis; BINF 6200: Statistics for Bioinformatics

Learning Objectives:

  1. Understand the basic concept of a phylogeny: what it represents and how to interpret it.
  2. Know where to find and how to generate data sets that are appropriate for phylogenetic analyses, and understand how to construct a data set for answering a particular evolutionary question.
  3. Gain a solid understanding of the algorithms underlying different methods and know the strengths and weaknesses of each.
  4. Be able to use several of the most commonly used phylogenetic software packages, including BEAST, RAxML, and MrBayes, as well as several R packages.
  5. Know how to interpret trees and how to use them to test various evolutionary models.

Instructor

Liz Cooper, Ph.D. | Asst. Professor
UNC Charlotte | Dept. of Bioinformatics and Genomics
Office: Bioinformatics 271
P: (704) 687-2402
E: lizcooper@uncc.edu

Optional Reading

There is NO required textbook for this course, and any required reading (in the form of journal articles or other papers) will be provided to students via Canvas. However, there is a book that some students may find helpful, particularly those interested in learning more of the mathematical details underlying genetic distance and maximum likelihood methods: The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing, edited by Philippe Lemey, Marco Salemi, and Anne-Mieke Vandamme, 2nd Edition, Cambridge University Press, 2009.

Homework and Grading

Each class period will consist of a combination of lecture and in-class computer exercises. Each week, students will be expected to complete all parts of the assigned exercises; anything not finished in class must be completed at home and submitted by 6:00 PM on the due date. Students are permitted to work together on the exercises, but each student must turn in a completed assignment individually. Some weeks may also include assigned reading to supplement the lecture material. The final project will require students to analyze a real data set and submit a written report.

Class/Homework Exercises 30%
Quizzes 20%
Midterm Exam 20%
Final Project 25%
Attendance 5%

A 90-100%
B 80-89%
C 70-79%
D 60-69%
F below 60%
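
As an illustration of how the weights and letter-grade ranges above combine, here is a minimal Python sketch. The function names, the component keys, and the sample scores are hypothetical. It also assumes that each component is scored on a 0-100 scale and that the cutoffs are applied without rounding; the syllabus does not specify how scores between ranges (for example, 89.5%) are handled.

```python
# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {
    "exercises": 30,
    "quizzes": 20,
    "midterm": 20,
    "final_project": 25,
    "attendance": 5,
}


def course_percentage(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a percentage to a letter grade.

    Assumption: cutoffs are applied without rounding.
    """
    if percentage >= 90:
        return "A"
    if percentage >= 80:
        return "B"
    if percentage >= 70:
        return "C"
    if percentage >= 60:
        return "D"
    return "F"


# Hypothetical scores for one student.
example_scores = {
    "exercises": 95,
    "quizzes": 80,
    "midterm": 85,
    "final_project": 90,
    "attendance": 100,
}

total = course_percentage(example_scores)
print(total, letter_grade(total))
```

With these sample scores, the weighted total is 28.5 + 16 + 17 + 22.5 + 5 = 89, which falls in the B range.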

List of Course Topics

The course schedule is organized into four sections. In the first section, we will review how to select sequences and process them for use in a phylogenetic analysis. Next, we will discuss different methods for constructing phylogenetic trees with small (single-gene) data sets, along with how to compare and evaluate these methods. After the midterm, we will turn to newer methods for incorporating larger, multi-gene and next-generation sequencing data sets into a phylogeny. Finally, we will examine how phylogenies and phylogenetic inference can be used to go beyond species relationships and answer other evolutionary questions.

Section 1 Sequence Data for Phylogenetics
1.1 Introduction and History of Phylogenetics
1.2 Multiple Sequence Alignment Methods
1.3 Identifying Homologous Genes
1.4 Gene Family Evolution: Gene Duplication and Gene Loss
Section 2 Constructing Trees with Single Genes
2.1 Genetic Distance Methods: UPGMA and Neighbor-Joining
2.2 Maximum Parsimony
2.3 Bootstrapping to Find Branch Support
2.4 Maximum Likelihood Methods (RAxML)
2.5 Combining, Summarizing, and Comparing Trees
2.6 Ancestral State Reconstruction
Section 3 Advanced Methods for Larger Datasets
3.1 Bayesian Inference Methods
3.2 Concatenating and Partitioning Multiple Genes
3.3 Coalescent Theory and the Multispecies Coalescent
3.4 Methods for SNP data: Quartets and SNAPP
Section 4 Using Trees to Test Evolutionary Models
4.1 Estimating Divergence Times with the Molecular Clock
4.2 Gene Flow and Phylogenetic Networks
4.3 Population Size Changes