1. Software
As of May 2015, all supplementary data and source code are available at github.com/sarisbro. This page will no longer be updated...
MCMCcodonsite: a program implementing some codon models allowing \omega = d_N/d_S to vary among sites (M0, M2a, M3, M7, M8) in a Bayesian framework
A test version is now available for M2a here and for the other models here (also as an untested DOS executable here). The only difference between these two scripts is a flag for M2a in the first one. Simulations can be performed with this script. You will need codeml (Ziheng Yang's PAML 3.14 prior to September 2004, available from this page) in your path. To run the MCMC sampler, format the codeml control file as you would for a maximum likelihood analysis; the sampler settings can be changed in the first lines of the scripts.
Reference: Aris-Brosou, S. 2006. Identifying sites under positive selection with uncertain parameter estimates. Genome. 49:767-776.
AHLC (ad hoc local clocks): adhockeries for estimating divergence times between species
Available: source code (updated Apr 16, 2012; now includes OS X_64 Intel binaries) and binaries for OS X (PPC binaries), Linux (x86_64), Solaris (UltraSPARC only). Compile the source code with the command 'make mybaseml' or 'make mycodeml'. Note that you need to have R set up in your path with the stats, cluster and hopach packages installed. In the baseml or codeml control file, the ClusMethod option can be set to -1 (original AHRS algorithm), 0 (k-means), 1 (k-medoids), 2 (HOPACH) or 3 (MSS); use the SGE option if you run such a grid daemon. Two additional scripts make it possible to submit replicates (codeml version here) of the ML analyses, and then to extract the results. Please refer to the reference below and to the PAML manual for additional information. Available from source code only: you can use a custom 'rates.in' file with your own branch-specific rate estimates.
Reference: Aris-Brosou, S. 2007. Dating phylogenies with hybrid local molecular clocks. PLoS ONE. 2(9): e879.
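The ClusMethod codes listed above for AHLC can be sketched as a small lookup table with a helper that sets the option in the text of a control file. This is only an illustration, not part of the AHLC distribution: the helper name set_clus_method is hypothetical, and it assumes the usual PAML control-file layout of "name = value" lines with "*" starting a comment.

```python
import re

# ClusMethod codes as listed in the AHLC description above.
CLUS_METHODS = {
    -1: "original AHRS algorithm",
    0: "k-means",
    1: "k-medoids",
    2: "HOPACH",
    3: "MSS",
}

# Matches an existing "ClusMethod = ..." line (assumed PAML-style syntax).
_CLUS_LINE = re.compile(r"^[ \t]*ClusMethod[ \t]*=.*$", re.MULTILINE)


def set_clus_method(ctl_text: str, code: int) -> str:
    """Return ctl_text with ClusMethod set to code (hypothetical helper).

    An existing ClusMethod line is replaced; otherwise one is appended.
    """
    if code not in CLUS_METHODS:
        raise ValueError(f"unknown ClusMethod code: {code}")
    new_line = f"ClusMethod = {code}  * {CLUS_METHODS[code]}"
    if _CLUS_LINE.search(ctl_text):
        # Use a function as replacement so the text is inserted literally.
        return _CLUS_LINE.sub(lambda m: new_line, ctl_text, count=1)
    sep = "" if (not ctl_text or ctl_text.endswith("\n")) else "\n"
    return ctl_text + sep + new_line + "\n"
```

For example, set_clus_method(open("baseml.ctl").read(), 1) would return the control-file text with k-medoids clustering selected; the file name here is a placeholder.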
PhyBayes: a program for Bayesian inference in phylogenetics
- estimation of divergence dates / rates of molecular evolution
- Bayes factor and test of evolutionary hypotheses
- tree space search and estimation of the posterior probabilities
- estimation of heterogeneities in a data set (still a project...)
Platforms: A single archive now contains all executable versions for Linux, DOS and OS X -- a crude and somewhat out-of-date "manual" is included in the archive.
References: (the first two deal with estimation of divergence dates from molecular data; the third deals with comparisons of molecular phylogenies, also possible with MrBayes ver. 2.01 and above.)
Aris-Brosou, S. and Z. Yang. 2002. The effects of models of rate evolution on estimation of divergence dates with a special reference to the metazoan 18S rRNA phylogeny. Syst. Biol. 51:703-714.
Aris-Brosou, S. and Z. Yang. 2003. Bayesian models of episodic evolution support a late Precambrian explosive diversification of the Metazoa. Mol. Biol. Evol. 20:1947-1954.
Aris-Brosou, S. 2003. How Bayes tests of molecular phylogenies compare with frequentist approaches. Bioinformatics. 19:618-624. (here is the corresponding perl script, to extract from the output file the log-likelihoods sampled from the posterior, and the Mathematica script to estimate the Bayes factors).
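As a rough illustration of how a Bayes factor can be obtained from log-likelihoods sampled from the posterior, the sketch below uses the harmonic mean estimator of the marginal likelihood. This is an assumption made for the example: it is one common estimator, known to be unstable in practice, and it is not claimed to reproduce the Mathematica script mentioned above. The function names are hypothetical.

```python
import math


def log_marginal_harmonic_mean(log_liks):
    """Harmonic mean estimate of the log marginal likelihood.

    log_liks: log-likelihoods sampled from the posterior of one model.
    Computes log(n) - logsumexp(-log_liks) in a numerically stable way.
    """
    n = len(log_liks)
    if n == 0:
        raise ValueError("need at least one sampled log-likelihood")
    neg = [-x for x in log_liks]
    m = max(neg)  # shift by the maximum to avoid overflow in exp()
    log_sum = m + math.log(sum(math.exp(v - m) for v in neg))
    return math.log(n) - log_sum


def log_bayes_factor(log_liks_1, log_liks_0):
    """Log Bayes factor of model 1 over model 0 from posterior samples."""
    return (log_marginal_harmonic_mean(log_liks_1)
            - log_marginal_harmonic_mean(log_liks_0))
```

A positive log Bayes factor favours the first model. Because the harmonic mean is dominated by the lowest sampled likelihoods, estimates can vary considerably between runs, so long chains and replicate runs are advisable.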
TreeTest: a program for testing phylogenetic tree topologies in a frequentist framework
This program makes use of the baseml code (PAML), and implements tests of phylogenetic hypotheses (is the maximum likelihood tree the correct tree?) and significance tests. This test version of the program is only available for nucleotide data for the moment. I will post executables for amino-acid and codon data (along with the source code) later on.
Platforms: DOS / MS-Win / Linux
NB. the version uploaded before Sep. 17, 2002 had an error, now fixed (thanks to Nick Goldman).
Reference: Aris-Brosou, S. 2003. Least and most powerful phylogenetic tests to elucidate the origin of the seed plants in presence of conflicting signals under misspecified models. Syst. Biol. 52:781-793.
2. Data sets and other scripts
Evolution of drug resistance in influenza viruses
The data analyzed in this paper (Garcia and Aris-Brosou, 2014) are available here.
2009 H1N1 pandemic sequences
The data analyzed in this paper (#18 on the publications page) are available here.
A simple measure of the dynamics of segmented genomes
The data analyzed in this paper (#17 on the publications page) are available here.
Mining for positive selection
Perl scripts used in Mol. Biol. Evol. 22:200:
- database deflation and analysis
- parsing analyzed databases to extract results
- others: filter "pwise" (ML) & get PF names
- splus: the bootstrap
New tests of molecular phylogenetic trees
HIV data set (the original can be found on Nick Goldman's web server)
Timescale of the evolution of the Metazoa
Mitochondrial & nuclear genes: accession numbers (with tree topologies used).