ICBO_2018_73: Taxa, metacoder, poppr and vcfR: Four packages for parsing, visualization, and manipulation of genetic, genomic and metagenomic data in R


Contemporary population genomic and microbiome research is producing large, complex datasets that are difficult to manipulate and visualize. High-throughput DNA sequencing projects typically result in files containing genetic variants in the Variant Call Format (VCF). These files are large and not strictly tabular, which presents a problem for investigators who wish to work with these data in the R environment. We created the package vcfR to facilitate exploration of VCF data in R. vcfR uses Rcpp to implement C++ functions, giving R users the performance of compiled code without needing to know about that code. This makes working with VCF data in R efficient. New users who are unfamiliar with VCF data can explore their data to learn about it. Quality control steps can be conducted, and manipulations such as subsetting samples or variants can be performed. We provide conversion functions so that analyses can be performed in the popular R genetics packages adegenet, poppr, ape and pegas. Analysis of population differentiation and copy number variation can be performed directly in vcfR. Our package vcfR facilitates efficient use, manipulation, and analysis of VCF data and integrates these data into the existing ecosystem of R genetic analysis and graphics packages. The taxa package provides a set of classes for the storage and manipulation of taxonomic data. Classes range from simple building blocks to project-level objects storing multiple user-defined datasets mapped to a taxonomy. It includes parsers that can read taxonomic information in nearly any form. We have also developed the metacoder package for visualizing hierarchical data. Metacoder implements a novel visualization called heat trees, which use the color and size of nodes and edges on a taxonomic tree to quantitatively depict up to four statistics. This allows for rapid exploration of data and produces information-dense, publication-quality graphics. Heat trees are an alternative to the stacked bar charts typically used in microbiome research. These complementary tools provide a new resource for analyzing hierarchical population genomic and microbiome data in R.
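To illustrate what "not strictly tabular" means for VCF data, here is a minimal sketch in Python (not part of vcfR, and not how vcfR is implemented). The header and record are hypothetical examples, and `parse_record` is an illustrative helper name. The point is that a single tab-separated VCF row still nests key=value pairs in its INFO column and colon-separated subfields in each sample column, so a plain table reader does not recover the individual values.

```python
# Hypothetical two-sample VCF header and data line; fields are tab-separated.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2"
RECORD = "chr1\t1042\t.\tA\tG\t50\tPASS\tDP=31;AF=0.5\tGT:DP\t0/1:14\t1/1:17"


def parse_record(header: str, record: str) -> dict:
    """Split one VCF data line into fixed fields, INFO pairs, and per-sample values."""
    columns = header.lstrip("#").split("\t")
    values = record.split("\t")
    fixed = dict(zip(columns[:8], values[:8]))

    # INFO packs semicolon-separated key=value pairs (or bare flags) into one column.
    info = {}
    for item in fixed["INFO"].split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True

    # FORMAT names the colon-separated subfields carried by every sample column.
    keys = values[8].split(":")
    samples = {
        name: dict(zip(keys, cell.split(":")))
        for name, cell in zip(columns[9:], values[9:])
    }
    return {"fixed": fixed, "info": info, "samples": samples}


parsed = parse_record(HEADER, RECORD)
print(parsed["info"])
print(parsed["samples"]["sample1"])
```

All values are kept as strings here. Real VCF files also carry meta-information lines that declare the type of each INFO and FORMAT key, which this sketch ignores.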

Year of Publication
Conference Name
International Conference on Biomedical Ontology (ICBO 2018)
Date Published
International Conference on Biological Ontology