The Human Genome Project public consortium today announced
that it has assembled a working draft of the sequence of the
human genome -- the genetic blueprint for a human being.
This major milestone involved two tasks: placing large
fragments of DNA in the proper order to cover all of the
human chromosomes, and determining the DNA sequence of these
fragments.
The assembly reported today consists of overlapping
fragments covering 97 percent of the human genome. Sequence
has already been assembled for approximately 85 percent of
the genome. The sequence has been threaded
together into a string of As, Ts, Cs, and Gs arrayed along
the length of the human chromosomes.
Production of genome sequence has skyrocketed over the past
year, with more than 60 percent of the sequence having been
produced in the past six months alone. During this time,
the consortium has been producing 1,000 bases a second of raw
sequence, 7 days a week, 24 hours a day.
The average quality of the "working draft" sequence far
exceeds the consortium's original expectations for this
stage.
Consortium centers have produced far more sequence data than
expected (over 22.1 billion bases of raw sequence data,
comprising overlapping fragments totaling 3.9 billion bases
and providing 7-fold sequence coverage of the human genome).
As a result, the "working draft" is substantially closer to
the ultimate "finished" form than the consortium expected at
this stage. Approximately 50 percent of the genome sequence
is in near-"finished" form or better, and 24 percent of it
is in completely "finished" form. Across the genome, the
average DNA segment resides in a continuous gapless sequence
"contig" of 200,000 bases. The average accuracy of all of
the DNA sequence in this assembly is 99.9 percent.
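
As a rough check on the figures above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the error estimate assumes that miscalled bases are spread evenly along the sequence, which the announcement does not claim.

```python
# Figures quoted in the announcement.
RAW_BASES = 22.1e9      # raw sequence produced by the consortium
FOLD_COVERAGE = 7       # stated sequence coverage of the genome


def implied_genome_size(raw_bases: float, fold: float) -> float:
    """Genome length implied by: coverage = raw bases / genome length."""
    return raw_bases / fold


def expected_errors(accuracy_percent: float, bases: int) -> float:
    """Expected miscalled bases in a stretch of sequence.

    Assumes errors are uniform and independent (an illustrative assumption).
    """
    return bases * (100.0 - accuracy_percent) / 100.0


if __name__ == "__main__":
    size = implied_genome_size(RAW_BASES, FOLD_COVERAGE)
    print(f"implied genome size: {size / 1e9:.2f} billion bases")
    for accuracy in (99.9, 99.99):
        errors = expected_errors(accuracy, 10_000)
        print(f"{accuracy}% accuracy: about {errors:.0f} errors per 10,000 bases")
```

Under these assumptions, 7-fold coverage from 22.1 billion raw bases implies a genome of a little over 3 billion bases, and moving from 99.9 to 99.99 percent accuracy cuts the expected errors tenfold.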
The sequence information from the public project has been
continuously, immediately and freely released to the world,
with no restrictions on its use or redistribution. The
information is scanned daily by scientists in academia and
industry, as well as by commercial database companies
providing information services to biotechnologists.
Already, many tens of thousands of genes have been
identified from the genome sequence. Analysis of the
current sequence shows 38,000 predicted genes confirmed by
experimental evidence. There are many thousands of
additional gene predictions to be tested experimentally.
Dozens of disease genes have been pinpointed through access to
the working draft.
The consortium's goal for the spring of
2000 was to produce a "working draft" version of the human
sequence, an assembly containing overlapping fragments that
cover approximately 90 percent of the genome and that are
sequenced in "working draft" form, i.e., with some gaps and
ambiguities. The consortium's ultimate goal is to produce
a completely "finished" sequence, i.e., one with no gaps and
99.99 percent accuracy. The target date for this ultimate
goal had been 2003, but today's results mean that the final,
stand-the-test-of-time sequence will likely be produced
considerably ahead of that schedule.
In a related announcement, Celera
Genomics announced today that it has completed its own first
assembly of the human genome DNA sequence.
The public and private projects use similar automation and
sequencing technology, but different approaches to
sequencing the human genome. The public project uses a
"hierarchical shotgun" approach in which individual large
DNA fragments of known position are subjected to shotgun
sequencing (i.e., shredded into small fragments that are
sequenced, and then reassembled on the basis of sequence
overlaps).
The Celera project uses a "whole genome shotgun" approach,
in which the entire genome is shredded into small fragments
that are sequenced and put back together on the basis of
sequence overlaps.
The hierarchical shotgun method has the advantage that the
global location of each individual sequence is known with
certainty, but it requires constructing a map of large
fragments covering the genome. The whole genome shotgun method
does not require this step, but presents other challenges in
the assembly phase.
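
The shred-and-reassemble step that both approaches share can be illustrated with a toy sketch in Python. This is not the software used by either project: the sequence, the function names, and the greedy merging rule are all assumptions chosen for illustration, and real assemblers must also cope with sequencing errors, repeated DNA, and reads from both strands.

```python
import random

# A made-up 24-base sequence in which every 4-base window is unique,
# so overlaps of 4 or more bases are unambiguous (illustrative only).
GENOME = "ATGCCGTAAGCTTGACCATGGATC"


def shred(sequence, read_len, step, seed=0):
    """Cut fixed-length overlapping reads and shuffle them (stand-in for shearing)."""
    reads = [sequence[i:i + read_len]
             for i in range(0, len(sequence) - read_len + 1, step)]
    random.Random(seed).shuffle(reads)
    return reads


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: the result stays in several contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    reads = shred(GENOME, read_len=8, step=4)
    print(greedy_assemble(reads) == [GENOME])
```

In this sketch the hierarchical method would correspond to running the procedure separately on each large fragment of known position, while the whole genome shotgun method would run it once on reads from the entire genome. Repeated sequence is what makes the second case harder, since identical stretches from different locations produce misleading overlaps.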
Both approaches align the sequence along the human
chromosomes by using landmarks contained in the physical map
produced by the Human Genome Project.
"The two approaches are quite complementary. The public
project and Celera plan to discuss the relative scientific
merits of the methods employed by the two projects. In the
end, the best approach may well be to use a combination of
the methods for sequencing future genomes," said Francis
Collins, M.D., Ph.D., director of the National Human Genome
Research Institute of the National Institutes of Health. In
fact, current plans by the public project to sequence the
genome of the laboratory mouse involve this hybrid strategy.
The Human Genome Project will now focus on
converting the "working draft" and near-"finished" sequences
to a "finished" form. This will be done by filling the gaps
in the "working draft" sequence and by increasing the
overall sequence accuracy to 99.99 percent. Although the
"working draft" version is useful for most biomedical
research, a highly accurate sequence that is as close to
perfect as possible is critical for obtaining all the
information there is to get from human sequence data. This
has already been achieved for chromosomes 21 and 22, as well
as for 24% of the entire genome.
Human DNA Variation
The greater-than-expected sequence
production has also yielded a bumper crop of human genetic
variations - called single nucleotide polymorphisms or SNPs.
The Human Genome Project had set a goal of discovering
100,000 SNPs by 2003. Already, with today's assembled
sequences and other data accumulated by The SNP Consortium,
scientists have now found more than 300,000 SNPs and will
likely have 1 million SNPs by year-end. These SNPs provide
a powerful tool for studies of human disease and human
history.
Sequencing, which is determining the exact order of DNA's
four chemical bases, commonly abbreviated A, T, C and G, has
been expedited in the Human Genome Project by technological
advances in deciphering DNA and the collaborative nature of
the effort, which includes about 1,000 scientists worldwide
working together effectively.
The Human Genome Sequencing Project aims to determine the
sequence of the "euchromatic" portion of the human genome.
The "euchromatic" portion excludes certain regions
consisting of long stretches of highly repetitive DNA that
encode little genetic information, and that are not
recovered in the vector systems used by the genome project.
Such regions account for about 10% of the genome, and are
said to be "heterochromatic". (For example, the centers of
chromosomes, called centromeres, consist of heterochromatic
DNA.)
The international Human Genome Sequencing consortium
includes scientists at 16 institutions in France, Germany,
Japan, China, Great Britain and the United States. The five
largest centers are located at: Baylor College of Medicine,
Houston, Texas; Joint Genome Institute in Walnut Creek, CA;
Sanger Centre near Cambridge, England; Washington University
School of Medicine, St. Louis; and Whitehead Institute,
Cambridge, Massachusetts. Together, these five centers have
generated about 82% of the sequence. The following list
provides more detail about the 16 centers and their
individual contributions to the Human Genome Project.
The project has been tightly coordinated so that no region
of the genome is left unattended, and duplication is
minimized. Participants in the international consortium
have all adhered to the project's quality standards and to
the daily data release policy. The project is funded by
grants from government agencies and public charities in the
various countries. These include the National Human Genome
Research Institute at the National Institutes of Health, the
Wellcome Trust in England, and the US Department of Energy.
The total cost for the working draft is approximately $300
million worldwide, with roughly half ($150 million) being
funded by the US National Institutes of Health. The cost of
sequencing the human genome is sometimes reported as $3
billion. However, this figure refers to the original
estimate of total funding for the Human Genome Project over
a 15-year period (1990-2005) for a wide range of scientific
activities related to genomics. These include studies of
human diseases, experimental organisms (such as bacteria,
yeast, worms, flies and mice), development of new
technologies for biological and medical research,
computational methods to analyze genomes, and ethical, legal
and social issues related to genetics.
The sixteen institutions that form the Human Genome
Sequencing Consortium include:
1. Baylor College of Medicine, Houston, Texas, USA
2. Beijing Human Genome Center, Institute of Genetics,
Chinese Academy of Sciences, Beijing, China
3. Gesellschaft für Biotechnologische Forschung mbH,
Braunschweig, Germany
4. Genoscope, Evry, France
5. Genome Therapeutics Corporation, Waltham, MA, USA
6. Institute for Molecular Biotechnology, Jena, Germany
7. Joint Genome Institute, U.S. Department of Energy, Walnut
Creek, CA, USA
8. Keio University, Tokyo, Japan
9. Max Planck Institute for Molecular Genetics, Berlin,
Germany
10. RIKEN Genomic Sciences Center, Saitama, Japan
11. The Sanger Centre, Hinxton, U.K.
12. Stanford DNA Sequencing and Technology Development
Center, Palo Alto, CA, USA
13. University of Washington Genome Center, Seattle, WA, USA
14. University of Washington Multimegabase Sequencing
Center, Seattle, WA, USA
15. Whitehead Institute for Biomedical Research, MIT,
Cambridge, MA, USA
16. Washington University Genome Sequencing Center, St.
Louis, MO, USA
In addition, two institutions played a key role in providing
computational support and analysis for the Human Genome
Project over the course of the past eighteen months. These
are:
- The National Center for Biotechnology Information at NIH
- The European Bioinformatics Institute in Cambridge, UK
Scientists at the University of California, Santa Cruz, and
Neomorphic, Inc. also assisted in the assembly of the genome
sequence across chromosomes.