About the database

The database is a comprehensive, freely accessible, standardized online database that follows the FAIR principles for whole-genome sequencing metagenomic data from human fecal specimens. Unlike current databases, which deposit only the raw sequencing files generated in independent studies, the database also provides details on the samples and the data-generation process, multi-level processed data (abundance tables for genes, microbiota, and molecular functions from the KEGG Orthology (KO) database) produced with a unified bioinformatic processing pipeline, and statistics on the data processing. All processed data in the database are readily usable and comparable because they are based on a unified non-redundant gene list that covers genes from all sequencing files. Built-in tools for searching by descriptive attributes, gene sequences, microbiota, and KO functions make the database user-friendly. As the database continues to grow in data volume, data types, and sample types, it will be of great value to microbial scientists, pharmacists, students, and the general public for reusing data and unleashing the power of microbiomes to overcome critical challenges in disease control, treatment, and even precision medicine.

The current release of the database (released 2020-01-02) contains raw whole-genome sequencing metagenomic reads in gzip format (more than 12 TB), comprising 4470 sequencing files for human fecal specimens that cover more than 10 types of diseases. The files were collected from 17 assays of 16 studies conducted in over 14 different countries. The release also includes details on the samples and the data-generation process, and multi-level processed data produced with a unified bioinformatic processing pipeline. The sequences of the 14,823,828 non-redundant genes and the corresponding phylogenetic and functional annotation results are also provided.

The database is offered to the public as a freely available resource. Use and redistribution of the data, in whole or in part, for commercial purposes (including internal use) require a license. We ask that users who download significant portions of the database cite the MetaGeneBank paper in any resulting publications. For feedback, suggestions, or error/bug reports, please contact us.

The content of MetaGeneBank is intended for educational and scientific research purposes only. It is not intended as a substitute for professional medical advice, diagnosis, or treatment.