CONP Portal | Dataset

Keywords: 1000 Genomes Project

1000 Genomes Project

Creators: 1000 Genomes Project

Contact: info@1000genomes.org

Licenses: CC BY-NC-SA

Version: 1.0

Modalities: genomics

Formats: VCF FASTA

Size: 16.2 GB

No of Files: 25

No of Subjects: 2504

Primary Publication: A global reference for human genetic variation. https://doi.org/10.1038/nature15393

Browse on GitHub: https://github.com/conpdatasets/1000GenomesProject

Project Landing Page: https://www.internationalgenome.org/

Metadata file: DATS.json

Is About: Homo sapiens

Other Dates: CONP DATS JSON fileset creation date: 2019-07-31 11:21:58

Description:

The 1000 Genomes Project provides a comprehensive description of common human variation by applying a combination of whole-genome sequencing, deep exome sequencing and dense microarray genotyping to a diverse set of 2504 individuals from 26 populations. Over 88 million variants are characterised, including >99% of SNP variants with a frequency of >1% for a variety of ancestries.

Dataset README information

README.md

This directory contains 24 files with names of the format '1KGP_chrnn.vcf.gz', each of which links to the gzipped VCF file of 1000 Genomes Project sequence variation for the corresponding chromosome, hosted at the European Bioinformatics Institute. It also contains 9 supplementary data files.

Download Using DataLad

The following instructions require a basic understanding of the UNIX/Linux command line. A subset of the open datasets on the Portal is also available through a browser-based download button. The instructions below cover downloading the dataset with DataLad. To install DataLad on your system, please refer to the install section of the DataLad Handbook.

Note: For maximum compatibility with conp-dataset, the CONP recommends versions 3.12+ of Python, 10.20241202+ of git-annex, and 1.1.4+ of datalad.
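One way to compare an installed version against these minimums is sketched below. The helper names `version_ge` and `check_min` are hypothetical, and the comparison relies on `sort -V`, which is assumed to be available (GNU coreutils provides it).

```shell
# Succeed if version $1 is at least version $2.
# Assumption: `sort -V` (version sort) is available on this system.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether an installed version meets the recommended minimum.
# Usage: check_min NAME INSTALLED MINIMUM
check_min() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: OK (>= $3)"
    else
        echo "$1 $2: older than recommended $3"
    fi
}
```

For example, `check_min datalad "$(datalad --version | awk '{print $NF}')" 1.1.4` would report on the installed DataLad, assuming `datalad --version` prints the version number as its last field.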

1) Initiate the CONP dataset

Run the following command in the directory where you want the CONP dataset (conp-dataset) to be installed:
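A minimal sketch of this step, assuming the CONP dataset is published at the GitHub URL shown below. That URL is not stated in this README and should be confirmed on the portal. The variable `CONP_DATASET_URL` and the function `install_conp_dataset` are hypothetical names used only for illustration.

```shell
# Assumed location of the CONP dataset; not stated in this README.
CONP_DATASET_URL="https://github.com/CONP-PCNO/conp-dataset.git"

# Hypothetical wrapper: installs conp-dataset into the current directory.
install_conp_dataset() {
    datalad install "$CONP_DATASET_URL"
}
```

Calling `install_conp_dataset` from the target directory should leave a `conp-dataset` subdirectory there, which the next step relies on.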

2) Install the 1000GenomesProject dataset

To install the 1000GenomesProject dataset, run the following commands to move into the "projects" subdirectory under the "conp-dataset" directory (created in the previous step) and run datalad install:
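These commands might look like the sketch below. It assumes the dataset is registered under the name `1000GenomesProject` inside `conp-dataset/projects`, and the function name `install_1000genomes` is hypothetical.

```shell
# Hypothetical wrapper for step 2; run it from the directory that contains conp-dataset.
install_1000genomes() {
    # The subshell keeps the caller's working directory unchanged.
    (
        cd conp-dataset/projects || exit 1
        datalad install 1000GenomesProject
    )
}
```

Because the `cd` happens in a subshell, the caller stays in the starting directory after `install_1000genomes` returns.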

3) Download data from the 1000GenomesProject dataset

Now that the dataset has been installed, go into the 1000GenomesProject dataset directory.

The files visible after installing the dataset, but before downloading, are symbolic links; their content must be fetched explicitly using the datalad get command:

If you run the datalad get * command, every file available in the dataset directory will be downloaded.
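Both ways of calling `datalad get` can be sketched as a single helper. The function name `fetch_1000genomes` is hypothetical, and the path assumes the directory layout produced by the earlier steps.

```shell
# Hypothetical wrapper for step 3; run it from the directory that contains conp-dataset.
# With file names as arguments it fetches only those files;
# with no arguments it fetches everything in the dataset directory.
fetch_1000genomes() {
    (
        cd conp-dataset/projects/1000GenomesProject || exit 1
        if [ "$#" -eq 0 ]; then
            datalad get *
        else
            datalad get "$@"
        fi
    )
}
```

For instance, `fetch_1000genomes 1KGP_chr22.vcf.gz` would fetch a single chromosome, assuming the 'nn' in the naming pattern is the plain chromosome number.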