Using a combination of long-read, single-molecule sequencing and highly accurate short-read technologies, scientists have finally assembled a complete genome sequence of Triticum aestivum, the species of wheat commonly used to make bread. Previously published versions of the bread wheat genome contained large gaps in its highly repetitive DNA sequence.

According to the Johns Hopkins scientists, bread wheat has one of the most complex genomes, containing an estimated 16 billion base pairs of DNA and six copies of seven chromosomes. By comparison, the human genome is about one-fifth that size, with about three billion base pairs and two copies of 23 chromosomes.

"The repetitive nature of this genome makes it difficult to fully sequence," says Steven Salzberg, Ph.D., Bloomberg Distinguished Professor of Biomedical Engineering at the Johns Hopkins University Whiting School of Engineering and the McKusick-Nathans Institute of Genetic Medicine at the Johns Hopkins University School of Medicine. "It's like trying to put together a jigsaw puzzle of a landscape scene with a huge blue sky. There are lots of very similar, small pieces to assemble."


The sequencing alone for the newly assembled bread wheat genome cost $300,000, and it took the Johns Hopkins researchers a year to assemble 1.5 trillion bases of raw data into a final assembly of 15.34 billion base pairs. Study details were published last month in GigaScience.
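To put those two figures side by side, the short Python sketch below estimates how many times, on average, each base in the final assembly was covered by raw data. It uses only the numbers quoted above; the function and variable names are illustrative, and the result is an aggregate over all raw data rather than a coverage figure reported by the researchers.

```python
def fold_coverage(raw_bases: float, assembly_bases: float) -> float:
    """Average number of raw bases collected per base of the final assembly."""
    return raw_bases / assembly_bases


# Figures quoted in the article.
RAW_BASES = 1.5e12        # 1.5 trillion bases of raw data
ASSEMBLY_BASES = 15.34e9  # 15.34 billion base pairs in the final assembly

coverage = fold_coverage(RAW_BASES, ASSEMBLY_BASES)
print(f"Approximate aggregate coverage: {coverage:.0f}x")
```

By this rough measure, each position in the genome was read on the order of a hundred times before the assembly was finalized.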

Salzberg and his team used high-throughput and nanopore sequencing to complete the project. High-throughput sequencing reads massive amounts of DNA very quickly and cheaply, but only in very short fragments, just 150 base pairs long for this project. To help assemble the repetitive areas, the Johns Hopkins team used nanopore sequencing, which forces DNA through tiny pores with an electric current running through them. The technology enables scientists to read up to 20,000 base pairs at a time by measuring changes in the flow of the current as a strand of DNA passes through the pore.
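The toy Python sketch below illustrates why longer reads help with repetitive regions. The sequences are invented and far shorter than real reads, and the helper `count_placements` is hypothetical, not part of any assembly software used in the study. It counts how many places a read could fit in a small made-up genome that contains the same repeat twice.

```python
def count_placements(genome: str, read: str) -> int:
    """Count positions (overlaps allowed) where the read matches the genome exactly."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


# Invented sequences for illustration only.
REPEAT = "ACGTACGTAC"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

short_read = REPEAT[2:8]            # lies entirely inside the repeat
long_read = "GCA" + REPEAT + "GGA"  # spans the repeat and its unique flanks

print(count_placements(genome, short_read))  # more than one possible location
print(count_placements(genome, long_read))   # a single possible location
```

The short read fits equally well in either copy of the repeat, so an assembler cannot tell where it belongs. The long read reaches into the unique sequence on both sides and can only be placed in one spot.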


Salzberg says that sequencing a genome of this size requires not only genetic expertise, but also very large computing resources available at relatively few research institutions around the world. The team relied heavily on the Maryland Advanced Research Computing Center, a computing center shared by Hopkins and the University of Maryland, which has over 20,000 computer cores (CPUs) and over 20 petabytes of data storage. The team used approximately 100 CPU years to put this genome together. 

Image: Triticum aestivum just before harvesting. Image courtesy of Wikimedia Commons.