
Analysis of microbial population genetic data

Niklaus J. Grünwald



Links

Below I provide links to a selection of the most commonly used freeware for the analysis of population genetic data.

Great places to check for links on genetic analysis software are:

http://courses.washington.edu/fish543/Software.htm
http://mep.bio.psu.edu/databases.html
http://www.bio.psu.edu/People/Faculty/Nei/Lab/software.htm
http://uwadmnweb.uwyo.edu/zoology/mcdonald/molmark/Data/WebSoft.html
http://www.csit.fsu.edu/~beerli/popgensoftware.html
http://evolution.genetics.washington.edu/phylip/software.html

Links on population genetics:

http://www.geocities.com/CapeCanaveral/Lab/4709/popgen.htm
McDonald, B.A. 2004. Population Genetics of Plant Pathogens. The Plant Health Instructor. DOI:10.1094/PHI-A-2004-0524-01.

Link
 Description
Arlequin

Arlequin is an exploratory population genetics software environment able to handle large samples of molecular data (RFLPs, DNA sequences, microsatellites) while retaining the capacity to analyze conventional genetic data (standard multi-locus data or plain allele frequency data). A variety of population genetics methods have been implemented at either the intra-population or the inter-population level, and they can be conveniently selected and parameterized through a graphical interface. Arlequin has no equivalent in its field and will be extremely useful for analyzing the large data sets now made available by the latest molecular techniques.

Arlequin is designed to handle different types of molecular or conventional (non-molecular or frequency-type) data. Data can be presented as either genotype frequencies or haplotype frequencies, and can be treated as codominant or recessive (with a single recessive allele defined per locus). Molecular data can be analyzed by entering their definition (as DNA sequences, RFLP haplotypes, microsatellite profiles, or multilocus haplotypes) or by entering a distance matrix defining the relationships among the haplotypes (as was done with our previous AMOVA software). The data format is specified in an input file.

GenoType/GenoDive

GenoType and GenoDive are two programs for analysing genotypic diversity in clonal or asexual organisms. GenoType assigns genotypes to individuals based on molecular marker data. Possible input data come from microsatellites, allozymes, AFLPs or RAPDs. GenoType can also handle haploid and polyploid data and has some features to take scoring errors or mutations into account. GenoDive reads genotypic data (e.g. from GenoType) and calculates a number of indices of genotypic diversity. It can also perform a bootstrap test to see whether these indices differ between pairs of populations. Finally, it can test for differentiation between populations in genotypic composition (note that this is not the same as the test for differences in genotypic diversity).
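
As an illustration of what an index of genotypic diversity looks like, here is a minimal Python sketch. It is not GenoDive's code, and the two indices chosen (Nei's sample-size-corrected diversity and the Shannon index) are assumptions made for the example, not a statement of which indices GenoDive reports. The function name `genotypic_diversity` and the sample data are hypothetical.

```python
from collections import Counter
import math


def genotypic_diversity(genotypes):
    """Return (number of distinct genotypes, Nei's corrected diversity, Shannon index).

    `genotypes` is a list with one genotype label per sampled individual.
    Illustrative only; not taken from GenoDive.
    """
    counts = Counter(genotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two individuals")
    freqs = [c / n for c in counts.values()]
    # Nei's diversity with sample-size correction: n/(n-1) * (1 - sum p_i^2)
    nei = n / (n - 1) * (1.0 - sum(p * p for p in freqs))
    # Shannon index using the natural logarithm
    shannon = -sum(p * math.log(p) for p in freqs)
    return len(counts), nei, shannon


# Hypothetical sample: six individuals carrying three clonal genotypes
sample = ["A", "A", "A", "B", "B", "C"]
g, nei, shannon = genotypic_diversity(sample)
print(g, round(nei, 4), round(shannon, 4))
```

A population made up of a single clone gives a diversity of 0, while a sample in which every individual has a different genotype gives a corrected diversity of 1.
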
MATLAB Molecular Biology & Evolution Toolbox

MATLAB, a high-performance language for technical computing, integrates computation, visualization, and programming in an easy-to-use environment. It has been widely used in many areas, such as mathematics and computation, algorithm development, data acquisition, modeling, simulation, and scientific and engineering graphics. One of the most attractive features of MATLAB is that the basic data element of the system is an array that does not require dimensioning. This allows users to solve many technical computing problems, especially those with matrix and vector formulations, very effectively. The MATLAB environment itself offers a comprehensive set of built-in functions; moreover, many toolboxes have been developed and are often freely available for more specialized needs. However, to our knowledge, these advantages of the MATLAB environment have not been fully exploited in the area of molecular biology and evolution. Only a few MATLAB toolboxes or functions are freely available for data analysis, exploration, and visualization of nucleotide and protein sequences.

The toolbox presented here, MBEToolbox, is intended to fulfill the most obvious needs in sequence manipulation under the MATLAB environment. Moreover, it is an extensible functional framework for formulating and solving problems in evolutionary data analysis; it facilitates the rapid construction of both general applications and special-purpose tools for molecular evolutionary analysis, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or FORTRAN.

The main tools implemented in the MBEToolbox are: sequence manipulation functions, including sequence file input/output and sequence viewing; a set of nucleotide-based or codon-based substitution models and the corresponding implementations of evolutionary distances; phylogenetic tree construction by neighbor-joining, maximum parsimony or maximum likelihood algorithms; and several statistical tests and graphics functions for visualizing the results of analyses.
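
To make "a substitution model and its corresponding evolutionary distance" concrete, here is a minimal sketch of the Jukes-Cantor (JC69) distance. It is written in Python rather than MATLAB and is not MBEToolbox code; the function names `p_distance` and `jukes_cantor` and the two example sequences are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(seq1, seq2):
    """JC69 distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: the JC69 distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned sequences differing at 2 of 10 sites
d = jukes_cantor("ACGTACGTAC", "ACGTACGTTT")
print(round(d, 4))
```

The correction matters because the observed proportion of differences underestimates the number of substitutions once multiple changes have hit the same site; here the corrected distance is larger than the raw proportion of 0.2.
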

MIGRATE

Migrate estimates effective population sizes and past migration rates among n populations, assuming a migration matrix model with asymmetric migration rates and different subpopulation sizes. Migrate uses maximum likelihood or Bayesian inference to jointly estimate all parameters. It can use the following data types: sequence data using Felsenstein's 1984 (F84) model with or without site rate variation, single nucleotide polymorphism data, microsatellite data using a stepwise mutation model or a Brownian motion mutation model, and electrophoretic data using an "infinite" allele model. The output can contain estimates of all migration rates and all population sizes, assuming either constant mutation rates among loci or a gamma-distributed mutation rate among loci, as well as profile likelihood tables, percentiles, likelihood-ratio tests, and simple plots of the log-likelihood surfaces for all populations and all loci.

POPGENE

POPGENE is user-friendly freeware for the analysis of genetic variation among and within populations using co-dominant and dominant markers. The package provides a Windows graphical user interface that makes population genetic analysis more accessible for the casual computer user and more convenient for the experienced one. Simple menus and dialog box selections enable you to perform complex analyses and produce scientifically sound statistics, helping you to analyze population genetic structure adequately using the target markers.

PC users can run POPGENE under Microsoft® Windows 3.11, 95, 98, 2000, ME and NT. Apple users can run POPGENE on PowerPC, G3 and G4 machines, but they must first install emulation software such as "Virtual PC" or "Soft Windows". Both the 16-bit and 32-bit versions of POPGENE should then run without difficulty.

The current version (Version 1.32) is designed specifically for the analysis of co-dominant and dominant markers using haploid and diploid data. POPGENE computes both comprehensive genetic statistics (e.g., allele frequency, gene diversity, genetic distance, G-statistics, F-statistics) and complex genetic statistics (e.g., gene flow, neutrality tests, linkage disequilibria, multi-locus structure). A new version that will include the analysis of quantitative genetic variation and covariation, and the comparison of marker genes with quantitative genetic variation, is under development. The current version has a new graphics interface for producing publication-quality dendrograms. The computing modules are limited to a maximum of 1400 populations, 150 groups, 1000 loci and 52 alleles per locus for allelic data, and by available random-access memory (RAM) for quantitative data.
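
For readers unfamiliar with the basic statistics named above, the following Python sketch computes allele frequencies, gene diversity (expected heterozygosity), observed heterozygosity and the inbreeding coefficient F for one diploid co-dominant locus. It is an illustration under stated assumptions and is not POPGENE's implementation: the function name `locus_summary` and the sample genotypes are hypothetical, and the gene diversity here is the simple uncorrected estimate 1 - sum(p_i^2).

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarize one diploid co-dominant locus.

    `genotypes` is a list of (allele, allele) tuples, one per individual.
    Returns (allele frequencies, expected heterozygosity He,
    observed heterozygosity Ho, inbreeding coefficient F = 1 - Ho/He).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    allele_counts = Counter(a for genotype in genotypes for a in genotype)
    total = 2 * n  # two allele copies per diploid individual
    freqs = {allele: count / total for allele, count in allele_counts.items()}
    # Gene diversity without sample-size correction
    he = 1.0 - sum(p * p for p in freqs.values())
    # Fraction of individuals that are heterozygous
    ho = sum(1 for a, b in genotypes if a != b) / n
    # F is undefined for a monomorphic locus
    f = 1.0 - ho / he if he > 0 else float("nan")
    return freqs, he, ho, f


# Hypothetical sample: 4 AA, 2 AB and 4 BB individuals
genos = [("A", "A")] * 4 + [("A", "B")] * 2 + [("B", "B")] * 4
freqs, he, ho, f = locus_summary(genos)
print(freqs, he, ho, round(f, 3))
```

In this made-up sample both alleles have frequency 0.5, so half the individuals are expected to be heterozygous under random mating, but only 20% are; the positive F reflects that heterozygote deficit.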

SNAP Workbench version 1.0

SNAP Workbench is a Java program that manages and coordinates a series of analysis programs for making inferences about population processes. SNAP Workbench allows the user to customize the implementation of complex console programs and functions in order to automate and enhance data exploration. In our implementation, the workbench facilitates population parameter estimation by ensuring that the assumptions and program limitations of each analysis method are met, and by providing a step-by-step methodology for effectively integrating summary-statistic methods and coalescent-based population genetic models.

The workbench is programmed in Java to preserve platform independence across multiple operating systems. The program modules integrated in the workbench are written in C or Java and are available on a variety of computing platforms. The latest versions of the workbench for Mac OS X, Windows and Linux can be downloaded here.

TFPGA

Tools for Population Genetic Analyses: A Windows program for the analysis of allozyme and molecular population genetic data. This program calculates descriptive statistics, genetic distances, and F-statistics. It also performs tests for Hardy-Weinberg equilibrium, exact tests for genetic differentiation, Mantel tests, and UPGMA cluster analyses. Additional features include the ability to analyze hierarchical data sets as well as data from either codominant markers such as allozymes or dominant markers such as AFLPs or RAPDs.
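
As a rough illustration of a test for Hardy-Weinberg equilibrium, the sketch below runs a chi-square goodness-of-fit test on genotype counts at a biallelic codominant locus. This is a simple textbook test chosen for the example, not necessarily the procedure TFPGA uses; the function name `hwe_chi_square` and the counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg proportions.

    Takes observed counts of the three genotypes at a biallelic locus and
    returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no individuals supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts with a strong heterozygote deficit
chi2, p_value = hwe_chi_square(40, 20, 40)
print(chi2, p_value)
```

The chi-square approximation becomes unreliable when expected counts are small, which is one reason exact tests are often preferred for small samples or rare alleles.
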
Virulence analysis tools in plant pathology
A package for calculating diversity within and between populations.
TCS: Phylogenetic network estimation using statistical parsimony

TCS is a Java computer program to estimate gene genealogies, including multifurcations and/or reticulations (i.e. networks). The network estimation implemented in TCS is also known as statistical parsimony, which is described in Templeton, A. R., K. A. Crandall and C. F. Sing. 1992. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132:619-633. For a review of networks and intraspecific genealogies, see Posada, D. and K. A. Crandall. 2001. Trends in Ecology and Evolution 16(1):37-45.

Operating systems
A jar file is provided that should run on any operating system (Macintosh, Windows, Unix-like) with a Java virtual machine installed.

Transformer-3:
Transformer-3 streamlines the generation, storage, interpretation, processing and application of any kind of molecular population genetic data.

Because of its ease of use and versatility, Transformer-3 speeds up data transformations and analyses that are otherwise burdensome, complex and prone to error, thereby allowing the researcher to concentrate on what really matters: the critical discussion of quantitative results and hypotheses.

By saving the researcher time, Transformer-3 supports the growing number of practical applications of molecular population genetic information in which results are needed urgently.

Populations

Population genetic software (distances between individuals or populations, phylogenetic trees).

http://www.rhizobia.co.nz/phylogenetics/modeltest.html

A step-by-step guide to Modeltest.

BioNumerics

A software platform offering integrated analysis of the major applications in bioinformatics: 1D electrophoresis gels, all kinds of chromatographic and spectrometric profiles, 2D protein gels, phenotype characters, microarrays, and sequences. The unique power of BioNumerics lies in its ability to combine information from various genomic and phenotypic sources into one global database and conduct conclusive analyses. BioNumerics runs on industry-leading database engines such as Oracle® and Microsoft® SQL Server™. With its integrated networking and client-server features, the software is the perfect backbone for universal data management and analysis within and between laboratories of any size.

 


Horticultural Crops Research Laboratory, USDA ARS
3420 NW Orchard Ave., Corvallis, OR 97330
Voice: (541) 738-4049 • Fax: (541) 738-4025
E-mail: grunwaln@onid.orst.edu



Last Updated: March 24, 2006