Clients in the News – Search Technique Helps CMU Researchers Find DNA Sequences in Minutes

Associate Professor of Computational Biology Carl Kingsford has developed a new method that will allow researchers to search databases of DNA sequences in a matter of minutes, not days.

Searching databases for DNA sequences, a task that can take biologists and medical researchers days, can now be completed in a matter of minutes, thanks to a new search method developed by computer scientists at Carnegie Mellon University.

The method developed by Carl Kingsford, associate professor of computational biology, and Brad Solomon, a Ph.D. student in the Computational Biology Department, is designed for searching so-called “short reads” — DNA and RNA sequences generated by high-throughput sequencing techniques. It relies on a new indexing data structure, called Sequence Bloom Trees (SBTs), that the researchers describe in a report published online today by the journal Nature Biotechnology.
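The article does not describe how SBTs work internally. As a rough illustration only, the Python sketch below assumes the general idea of a tree of Bloom filters: each leaf holds a Bloom filter of the k-mers (length-k substrings) seen in one sequencing experiment, each internal node holds the bitwise union of its children, and a query descends only into subtrees where enough of its k-mers appear to be present. Every name here (`build`, `search`, `theta`), along with the k-mer length, filter size, and hash scheme, is a hypothetical choice for illustration, not the researchers' implementation.

```python
import hashlib

K = 4            # toy k-mer length (assumption; real tools use larger k)
M = 1 << 12      # bits per Bloom filter (assumption)
NUM_HASHES = 3   # hash functions per k-mer (assumption)

def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def positions(kmer):
    """Derive NUM_HASHES bit positions for a k-mer from salted SHA-256."""
    out = []
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % M)
    return out

def bloom(reads):
    """Build a Bloom filter (stored as an int bitmask) over all k-mers in reads."""
    bits = 0
    for read in reads:
        for km in kmers(read):
            for p in positions(km):
                bits |= 1 << p
    return bits

def maybe_contains(bits, kmer):
    """True if the k-mer may be present; False means definitely absent."""
    return all((bits >> p) & 1 for p in positions(kmer))

class Node:
    def __init__(self, bits, name=None, left=None, right=None):
        self.bits, self.name, self.left, self.right = bits, name, left, right

def build(experiments):
    """Pair nodes bottom-up; each parent filter is the OR of its children.

    Assumes `experiments` is a non-empty dict of name -> list of reads.
    """
    nodes = [Node(bloom(reads), name=name) for name, reads in experiments.items()]
    while len(nodes) > 1:
        paired = []
        for i in range(0, len(nodes) - 1, 2):
            a, b = nodes[i], nodes[i + 1]
            paired.append(Node(a.bits | b.bits, left=a, right=b))
        if len(nodes) % 2:            # carry an unpaired node up a level
            paired.append(nodes[-1])
        nodes = paired
    return nodes[0]

def search(node, query, theta=0.8):
    """Return names of experiments where at least theta of the query k-mers may occur."""
    qk = kmers(query)
    hits = sum(maybe_contains(node.bits, km) for km in qk)
    if hits < theta * len(qk):
        return []                     # prune: no leaf below can pass either
    if node.name is not None:
        return [node.name]
    return search(node.left, query, theta) + search(node.right, query, theta)

experiments = {
    "exp_a": ["ACGTACGTGG"],
    "exp_b": ["TTTTCCCCAA"],
    "exp_c": ["GGGGACGTAC"],
}
root = build(experiments)
print(search(root, "ACGTACGT"))
```

Because a Bloom filter can report false positives but never false negatives, a search like this may occasionally return an extra experiment, but it will not miss one that truly contains the required fraction of query k-mers. Pruning is what would let such a tree skip most of a large archive instead of scanning every experiment.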

The National Institutes of Health maintains a huge database, called the Sequence Read Archive, that contains about three petabases, or sequences totaling three quadrillion base pairs. The information is useful to a wide range of researchers, from those asking questions about basic biological processes to those studying potential cancer cures.

“The database contains untold numbers of as-yet undiscovered insights and is heavily used,” Kingsford said. “Its main problem is that it’s very difficult to search.”
