Synthetic data - Genome in a Bottle
In May, the National Institute of Standards and Technology (NIST) released its first "genome in a bottle", a reference sample of DNA for validating human genome sequences. This so-called truth sequence comes from a decades-old sample donated by a Utah woman for other research purposes (the NA12878 cell line). Over the years, NA12878 has become one of the most studied, and hence best-characterized, human samples. Seeing genomic medicine move toward mainstream healthcare, NIST researchers recognized the need for a reference human genome and in 2012 assembled a public-private consortium to create one. As detailed in a 2014 Nature Biotechnology paper (Nat. Biotechnol. 32, 246–251, 2014), the group integrated and arbitrated among sequences from 14 data sets, five sequencing technologies, seven read mappers and three variant callers.
- Type: Other
- Archiver: European Genome-Phenome Archive (EGA)
| Dataset ID | Description | Technology | Samples |
|---|---|---|---|
| EGAD00001008095 | | | 3 |
| EGAD00001008096 | | Illumina HiSeq 2500 | 3 |
| EGAD00001008097 | | | 3 |
| Publication | Citations |
|---|---|
| Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 32:246–251 (2014) | 428 |
| Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data 3:160025 (2016) | 348 |
| Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol 37:555–560 (2019) | 154 |
| An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 37:561–566 (2019) | 159 |
