Algorithms in Bioinformatics

MEGAN 4 - MEtaGenome ANalyzer

Software for analyzing metagenomes.

(Download here.)

Over 7000 registered users.


Please use MEGAN6




MEGAN 4 was written by D. H. Huson, based on an original design by D. H. Huson and S. C. Schuster, with contributions from S. Mitra, D. C. Richter, P. Rupek, H.-J. Ruscheweyh and N. Weber.



In metagenomics, the aim is to understand the composition and operation of complex microbial consortia in environmental samples through sequencing and analysis of their DNA. Similarly, metatranscriptomics and metaproteomics target the RNA and proteins obtained from such samples. Technological advances in next-generation sequencing methods are fueling a rapid increase in the number and scope of environmental sequencing projects. In consequence, there is a dramatic increase in the volume of sequence data to be analyzed.


Basic computational questions


The three basic computational tasks for such data are taxonomic analysis, functional analysis and comparative analysis. These are also known as the “who is out there?”, “what are they doing?” and “how do they compare?” questions. They pose an immense conceptual and computational challenge, and there is a great need for new bioinformatics tools and methods to address them.


History of MEGAN


In 2007, we published the first stand-alone tool for the analysis of short-read metagenomic data, called MEGAN (MEta Genome ANalyzer, paper). Initially, the aim was to provide a tool for studying the taxonomic content of a single dataset. A subsequent version of the program (MEGAN 2) allowed the comparative taxonomic analysis of multiple datasets. In version 3 of the program, we also aimed to provide a functional analysis of metagenome data, based on the Gene Ontology (GO). Unfortunately, in our hands GO proved poorly suited to this purpose. In version 4 of MEGAN, the GO analyzer has been replaced by two new functional analysis methods, one based on the SEED classification and the other based on KEGG (Kyoto Encyclopedia of Genes and Genomes). MEGAN 4 was released at the beginning of 2011 (paper accepted for publication in Genome Research).


Getting started


To prepare a dataset for use with MEGAN, one must first compare the given reads against a database of reference sequences, for example by performing a BLASTX search against the NCBI-NR database. The file of reads and the resulting BLAST file can then be directly imported into MEGAN and the program will automatically calculate a taxonomic classification of the reads and also, if desired, a functional classification, using either the SEED or KEGG classification, or both. The results can be interactively viewed and inspected. Multiple datasets can be opened simultaneously in a single comparative document that provides comparative views of the different classifications.
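As a rough illustration of what the comparison step produces, the following Python sketch groups BLAST hits by read. It assumes the BLAST results were saved in the common 12-column tab-separated layout (query id first, subject id second, bit score last); the text above does not say which BLAST output format is used, and the names `Hit` and `group_hits_by_read` are hypothetical and not part of MEGAN.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    read_id: str
    subject_id: str
    bit_score: float


def group_hits_by_read(lines: List[str]) -> Dict[str, List[Hit]]:
    """Group tab-separated BLAST hits by query (read) identifier."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Assumed layout: column 1 = read id, column 2 = reference id,
        # column 12 = bit score.
        hits[fields[0]].append(Hit(fields[0], fields[1], float(fields[11])))
    return dict(hits)


# Made-up example rows, for illustration only.
example = [
    "read1\tprotA\t95.0\t50\t2\t0\t1\t150\t10\t59\t1e-20\t98.2",
    "read1\tprotB\t80.0\t50\t10\t0\t1\t150\t5\t54\t1e-10\t61.0",
    "read2\tprotC\t99.0\t48\t0\t0\t4\t147\t1\t48\t1e-25\t110.0",
]
grouped = group_hits_by_read(example)
for read_id, read_hits in grouped.items():
    print(read_id, [h.subject_id for h in read_hits])
```

Each read then carries its own list of matches, which is the information a per-read taxonomic or functional classification starts from.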


Metagenomics pipeline


Taxonomic analysis


MEGAN can be used to interactively explore the dataset. In the following figure, we show the assignment of reads to the NCBI taxonomy. Each node is labeled by a taxon and the number of reads assigned to the taxon. The size of a node is scaled logarithmically to represent the number of assigned reads. Optionally, the program can also display the number of reads summarized by a node, that is, the number of reads that are assigned to the node or to any of its descendants in the taxonomy. The program allows one to interactively inspect the assignment of reads to a specific node, to drill down to the individual BLAST hits that support the assignment of a read to a node, and to export all reads (and their matches, if desired) that were assigned to a specific part of the NCBI taxonomy. Additionally, one can select a set of taxa and then use MEGAN to generate different types of charts for them.
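The difference between "assigned" and "summarized" counts, and the idea of logarithmic node scaling, can be sketched in a few lines of Python. The toy taxonomy, the read counts and the scaling formula below are assumptions made for illustration; the exact formula MEGAN uses to size nodes is not given here.

```python
import math
from typing import Dict, List

# Hypothetical toy taxonomy (parent -> children) and made-up read counts.
children: Dict[str, List[str]] = {
    "root": ["Bacteria", "Archaea"],
    "Bacteria": ["Proteobacteria", "Firmicutes"],
    "Archaea": [],
    "Proteobacteria": [],
    "Firmicutes": [],
}
assigned: Dict[str, int] = {
    "root": 5, "Bacteria": 10, "Archaea": 3,
    "Proteobacteria": 40, "Firmicutes": 2,
}


def summarized_count(node: str) -> int:
    """Reads assigned to the node itself or to any of its descendants."""
    total = assigned.get(node, 0)
    for child in children.get(node, []):
        total += summarized_count(child)
    return total


def node_radius(count: int, max_count: int, max_radius: float = 20.0) -> float:
    """One possible logarithmic scaling (an assumption, not MEGAN's formula)."""
    if count <= 0 or max_count <= 0:
        return 0.0
    return max_radius * (math.log1p(count) / math.log1p(max_count))


summary = {node: summarized_count(node) for node in children}
for node, count in summary.items():
    print(node, assigned[node], count, round(node_radius(count, summary["root"]), 2))
```

With logarithmic scaling, a node holding ten times as many reads is drawn only moderately larger, so rare taxa remain visible next to dominant ones.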





Functional analysis using the SEED classification


To perform a functional analysis using the SEED classification, MEGAN attempts to map each read to a SEED functional role, using the highest scoring BLAST match to a protein sequence for which the functional role is known. The SEED classification is depicted as a rooted tree whose internal nodes represent the different subsystems and whose leaves represent the functional roles. Note that the tree is “multi-labeled” in the sense that different leaves may represent the same functional role, if it occurs in different types of subsystems. The current tree has about 13 000 nodes. The following figure shows a part of the SEED analysis of a marine metagenome sample.

Figures: SEED comparison; SEED analysis
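The mapping rule described above (take the highest-scoring match among those reference proteins whose functional role is known) can be sketched as follows. This is a simplified illustration and not MEGAN's implementation; the function name, the reference identifiers and the role table are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def assign_role(
    hits: List[Tuple[str, float]],  # (reference protein id, bit score)
    role_of: Dict[str, str],        # reference protein id -> functional role
) -> Optional[str]:
    """Return the role of the best-scoring hit that has a known role."""
    known = [(ref, score) for ref, score in hits if ref in role_of]
    if not known:
        return None  # the read stays unassigned
    best_ref, _ = max(known, key=lambda hit: hit[1])
    return role_of[best_ref]


# Made-up example: the top hit (protX) has no known role, so the next
# best hit with a known role decides the assignment.
role_of = {"protA": "role_1", "protB": "role_2"}
hits = [("protX", 120.0), ("protA", 98.0), ("protB", 61.0)]
print(assign_role(hits, role_of))
print(assign_role([("protX", 120.0)], role_of))
```

Because the tree is multi-labeled, a role found this way may correspond to several leaves, one for each subsystem in which the role occurs.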

Functional analysis using the KEGG classification


To perform a KEGG analysis, MEGAN attempts to match each read to a KEGG orthology (KO) accession number, using the best hit to a reference sequence for which a KO accession number is known. This information is then used to assign reads to enzymes and pathways. The KEGG classification is represented by a rooted tree (with approximately 13 000 nodes) whose leaves represent different pathways. Each pathway can also be inspected visually, to see which reads were assigned to which enzymes. As an example, consider the citric acid cycle, which is of central importance for cells that use oxygen as part of cellular respiration. In the following figure we show the citric acid cycle pathway. In such a drawing of a pathway, as provided by the KEGG database, the participating enzymes are represented by numbered rectangles. MEGAN shades each such rectangle to indicate the number of reads assigned to the corresponding enzyme.


MEGAN KEGG pathway analysis
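To make the step from KO assignments to pathway counts concrete, here is a minimal Python sketch. It assumes each read has already been matched to at most one KO identifier and that a KO may take part in several pathways; the identifiers `KO_A`, `pathway_1` and so on are placeholders, not real KEGG accessions, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def count_reads(
    read_to_ko: Dict[str, Optional[str]],
    ko_to_pathways: Dict[str, List[str]],
) -> Tuple[Counter, Counter]:
    """Count reads per KO and per pathway."""
    ko_counts: Counter = Counter()
    pathway_counts: Counter = Counter()
    for ko in read_to_ko.values():
        if ko is None:
            continue  # read had no hit with a known KO
        ko_counts[ko] += 1
        for pathway in ko_to_pathways.get(ko, []):
            pathway_counts[pathway] += 1
    return ko_counts, pathway_counts


# Made-up example data.
read_to_ko = {"r1": "KO_A", "r2": "KO_A", "r3": "KO_B", "r4": None}
ko_to_pathways = {"KO_A": ["pathway_1", "pathway_2"], "KO_B": ["pathway_1"]}

ko_counts, pathway_counts = count_reads(read_to_ko, ko_to_pathways)
print(dict(ko_counts))
print(dict(pathway_counts))
```

The per-KO counts are the kind of value that could drive the shading of the enzyme rectangles in a pathway drawing, while the per-pathway counts correspond to the leaves of the classification tree.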

Comparative visualization


To compare a collection of different datasets visually, MEGAN provides a comparison view that is based on a tree in which each node shows the number of reads assigned to it for each of the datasets. The counts can be displayed as a pie chart, a bar chart or a heat map. To construct such a view using MEGAN, the datasets must first all be individually opened in the program. Using the provided “compare” dialog, one can then set up a new comparison document containing the datasets of interest. The following figure shows the taxonomic comparison of all eight marine datasets. Here, each node in the NCBI taxonomy is shown as a bar chart indicating the number of reads (normalized, if desired) from each dataset that have been assigned to the node.

Taxonomy comparison  

In a similar fashion, MEGAN supports the simultaneous analysis and comparison of the SEED functional content of multiple metagenomes (see the next figure). Moreover, a comparative view of assignments to a KEGG pathway is also possible.
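Normalization matters because datasets usually differ in size. How MEGAN normalizes read counts is not specified here; the sketch below assumes one simple scheme, scaling every dataset down to the total of the smallest one, purely to illustrate the idea. The function name and the example counts are hypothetical.

```python
from typing import Dict


def normalize(
    counts_by_dataset: Dict[str, Dict[str, int]],
) -> Dict[str, Dict[str, float]]:
    """Scale each dataset so that its total equals the smallest total."""
    totals = {name: sum(c.values()) for name, c in counts_by_dataset.items()}
    target = min(totals.values())
    normalized: Dict[str, Dict[str, float]] = {}
    for name, counts in counts_by_dataset.items():
        # Guard against empty datasets to avoid dividing by zero.
        factor = target / totals[name] if totals[name] > 0 else 0.0
        normalized[name] = {node: n * factor for node, n in counts.items()}
    return normalized


# Made-up read counts per taxon for two datasets of different size.
counts = {
    "sample_A": {"Bacteria": 80, "Archaea": 20},
    "sample_B": {"Bacteria": 150, "Archaea": 50},
}
norm = normalize(counts)
print(norm)
```

After scaling, the bars for a node reflect relative abundance in each dataset and not merely how deeply each sample was sequenced.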


Computational comparison of metagenomes


MEGAN provides an analysis window for comparing multiple datasets, which allows one to compute a distance matrix for a collection of datasets using a number of different ecological indices. The calculation can be based on the results of a taxonomic, SEED or KEGG analysis. If no nodes are selected, then the distances will be based on the number of reads assigned to the current leaves of the tree representation of the analysis. If some nodes are selected, then only the values for the selected nodes are used in the calculation.


MEGAN supports a number of different methods for calculating a distance matrix, such as Goodall’s ecological index, a simple version of UniFrac and Euclidean distances. Such a distance matrix can be visualized either using a split network calculated using the neighbor-net algorithm, or using a multi-dimensional scaling plot. In the next figure we show the result of a comparison of the eight marine datasets based on the taxonomic content of the datasets and computed using Goodall’s index.


Taxonomic network analysis
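Of the distance measures mentioned above, the Euclidean distance is the simplest to write down. The following Python sketch computes a Euclidean distance matrix from per-node read counts, where the nodes stand for either the current leaves or the selected nodes. The function name and the example profiles are hypothetical, and details such as normalization before the calculation are left out.

```python
import math
from typing import Dict, List, Tuple


def euclidean_distance_matrix(
    profiles: Dict[str, Dict[str, float]],
) -> Tuple[List[str], List[List[float]]]:
    """Return dataset names and the matrix of pairwise Euclidean distances."""
    names = sorted(profiles)
    # Use the union of all nodes; a node missing from a dataset counts as 0.
    nodes = sorted({node for p in profiles.values() for node in p})
    matrix: List[List[float]] = []
    for a in names:
        row = []
        for b in names:
            squared = sum(
                (profiles[a].get(n, 0.0) - profiles[b].get(n, 0.0)) ** 2
                for n in nodes
            )
            row.append(math.sqrt(squared))
        matrix.append(row)
    return names, matrix


# Made-up profiles: read counts per node for three datasets.
profiles = {
    "A": {"Bacteria": 3.0, "Archaea": 0.0},
    "B": {"Bacteria": 0.0, "Archaea": 4.0},
    "C": {"Bacteria": 3.0},
}
names, matrix = euclidean_distance_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, row)
```

The resulting symmetric matrix is the kind of input that a split network or a multi-dimensional scaling plot would then visualize.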


Analysis of other types of data


MEGAN was originally designed to analyze metagenomic and metatranscriptomic data. However, it can easily be used to analyze metaproteomic data as well.
Please note that MEGAN can now be used to analyze sequencing reads obtained in an approach targeted at 16S rRNA sequences, as shown here:





Use of MEGAN requires a license key. Commercial users can obtain a single user license or site license here.

Academic users can obtain a free license here, on the condition that any use of MEGAN is cited. (Please use ASCII characters only: no accents, umlauts, etc.)

License keys for commercial users are issued by the University of Tübingen for a fee. Please contact Daniel Huson for details.


MEGAN5 can be downloaded here.
The license certificate server for academic users is available here.

Non-academic users please contact Daniel Huson for a commercial or trial license.

(Download old MEGAN version 4 here.)

(License key server for academic users of MEGAN5 here.) 

(Download old MEGAN version 3 here.) 

Example datasets

Example datasets discussed in the MEGAN4 paper are available here.

