CONP Portal | Dataset


1000 Genomes Project
Creators: 1000 Genomes Project
Contact: info@1000genomes.org
Licenses: CC BY-NC-SA
Version: 1.0
Modalities: genomics
Formats: VCF FASTA
Size: 16.2 GB
No. of Files: 25
No. of Subjects: 2504
Primary Publication: A global reference for human genetic variation. https://doi.org/10.1038/nature15393
Project Landing Page: https://www.internationalgenome.org/
Metadata file: DATS.json
Is About: Homo sapiens
Other Dates: CONP DATS JSON fileset creation date: 2019-07-31 11:21:58
Description:
The 1000 Genomes Project provides a comprehensive description of common human variation by applying a combination of whole-genome sequencing, deep exome sequencing and dense microarray genotyping to a diverse set of 2504 individuals from 26 populations. Over 88 million variants are characterised, including >99% of SNP variants with a frequency of >1% for a variety of ancestries.

Dataset README information

README.md

This directory contains 24 files named in the format '1KGP_chrnn.vcf.gz', each of which links to the gzipped VCF file of 1000 Genomes Project sequence variation for the corresponding chromosome, hosted at the European Bioinformatics Institute. It also contains 9 supplementary data files.

Download Using DataLad


The following instructions require a basic understanding of the UNIX/Linux command line. A subset of open datasets on the Portal is also available through a browser-based download button. The instructions below cover downloading the dataset with DataLad. To install DataLad on your system, please refer to the install section of the DataLad Handbook.

Note: For maximum compatibility with conp-dataset, the CONP recommends versions 3.12+ of Python, 10.20241202+ of git-annex, and 1.1.4+ of datalad.
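
As a quick check before proceeding, the installed versions can be printed and compared against the recommendations in the note. This is a minimal sketch; check_version is a hypothetical helper defined here for illustration and is not part of Python, git-annex, or DataLad.

```shell
# check_version is a hypothetical helper: it prints the first line of a
# tool's version output, or a short notice if the executable is not on PATH.
check_version() {
    name="$1"
    shift
    if command -v "$name" >/dev/null 2>&1; then
        "$@" 2>&1 | head -n 1
    else
        echo "$name: not installed"
    fi
}

check_version python3 python3 --version      # recommended: 3.12+
check_version git-annex git annex version    # recommended: 10.20241202+
check_version datalad datalad --version      # recommended: 1.1.4+
```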

1) Initiate the CONP dataset

Run the following command in the directory where you want the CONP dataset (conp-dataset) to be installed:
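
The command itself does not appear in this copy of the page. A minimal sketch of what this step likely looks like, assuming the dataset is cloned from the CONP-PCNO repository on GitHub (the URL below is an assumption and is not stated in the text):

```shell
# Assumed clone URL for conp-dataset; verify it against the CONP Portal.
CONP_URL="https://github.com/CONP-PCNO/conp-dataset.git"

# Creates a "conp-dataset" directory inside the current working directory.
datalad install "$CONP_URL"
```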

2) Install the 1000GenomesProject dataset

To install the 1000GenomesProject dataset, run the following commands to move into the "projects" subdirectory under the "conp-dataset" directory (created in the previous step) and run datalad install:
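
The commands for this step are not shown in this copy of the page. A sketch reconstructed from the description, using only the directory and dataset names given above:

```shell
# Run from the directory that contains conp-dataset (created in step 1).
cd conp-dataset/projects

# Dataset name as given in this section.
DATASET="1000GenomesProject"
datalad install "$DATASET"
```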

3) Download data from the 1000GenomesProject dataset

Now that the dataset has been installed, go into the 1000GenomesProject dataset directory.
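
A sketch of this step, assuming the current directory is still conp-dataset/projects from step 2. The listing is an optional addition to show which entries are present before any data is fetched:

```shell
# Run from conp-dataset/projects, where the dataset was installed in step 2.
DATASET="1000GenomesProject"
cd "$DATASET"

# List the contents; files not yet downloaded appear as symbolic links.
ls -l
```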

The files visible after installing the dataset are symbolic links; their content has not been downloaded yet and must be fetched explicitly with the datalad get command:
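
For example, a single chromosome file could be fetched as sketched below. The file name is hypothetical: it follows the '1KGP_chrnn.vcf.gz' pattern described in the README, but the exact names should be checked in the directory listing.

```shell
# Hypothetical file name following the '1KGP_chrnn.vcf.gz' pattern;
# replace it with a name that actually appears in the dataset directory.
FILE="1KGP_chr22.vcf.gz"

# Download the content behind the symbolic link.
datalad get "$FILE"
```

datalad get also accepts several paths at once, so a handful of chromosomes can be requested in one call.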

If you run the datalad get * command, all the files available in the dataset directory will be downloaded.
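
A sketch of the full download, assuming it is run from inside the 1000GenomesProject directory. The free-space check is an optional addition, suggested because the listed dataset size is 16.2 GB:

```shell
# Optional: check free space on the current filesystem before downloading.
df -h .

# The shell expands * to every non-hidden entry in the current directory,
# so this fetches the content of all files in the dataset.
datalad get *
```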


For more information on how DataLad works, please visit the DataLad Handbook documentation.