Amazon: Complete 1000 Genomes Project on the Cloud


Amazon has announced that the complete 1000 Genomes Project is available on Amazon Web Services, giving scientists free and instant access to the world's largest collection of human genetics data. The availability of this wealth of information will accelerate disease research all over the world.

The 1000 Genomes Project is an international research effort, coordinated by a consortium of 75 companies, that is establishing the most detailed catalogue of human genetic variation available. The project currently comprises 200 terabytes of data, including the full DNA sequences of 1,700 individuals. When complete, it aims to include 2,600 individuals from 26 different population segments around the world. The United States National Institutes of Health (NIH) will continue to add the remaining samples this year, but the data collected so far is being made public now. Go to to access the 1000 Genomes Project on the cloud.

“Previously, researchers wanting access to public data sets such as the 1000 Genomes Project had to download them from government data centers to their own systems, or have the data physically shipped to them on discs. This process took a long time, and that’s assuming a lab had the bandwidth to download the data and sufficient storage and compute infrastructure to hold and analyze the data once they had it,” said Lisa D. Brooks, Ph.D., Program Director for the Genetic Variation Program, National Human Genome Research Institute, a part of NIH. “We are happy that the 1000 Genomes Project data are on AWS to give researchers anywhere in the world a simple way to access the data so they can put the data to work in their research.”

Publishing the 1000 Genomes Project on the cloud makes the data available to small and large research facilities alike, eliminating the need for expensive hardware and data facilities to process it. With more research facilities able to concentrate on advancing science, the hope is that research into cures for human disease will move faster.

“It took more than 10 years, and billions of dollars to sequence and publish the very first human genome. Recent advances in genome sequencing technology have enabled researchers to tackle projects like the 1000 Genomes by collecting far more data, faster. This has created a growing need for powerful and instantly available technology infrastructure to analyze that data,” said Deepak Singh, Ph.D., Principal Product Manager, Amazon Web Services. “We’re excited to help scientists gain access to this important data set by making it available to anyone with access to the Internet. This means researchers and labs of all sizes and budgets have access to the complete 1000 Genomes Project data and can immediately start analyzing and crunching the data without the investment it would normally require in hardware, facilities and personnel. Researchers can focus on advancing science, not provisioning the resources required for their research.”