As we are confronted with more and more data, organizing and keeping track of samples and the next-generation sequencing (NGS) data associated with them can become a challenge. Even for the research lab that doesn’t have to deal with GMP, ISO, CLIA, FDA, or a myriad of other letters, the wealth of information can quickly overwhelm a paper notebook. How was the sample derived, and where and under what conditions was it stored? How pure and at what concentration was the DNA derived from it, and what protocols and reagents were used to create the library? And that is before asking who the technicians were at each step, what instruments were used, and how the data were analyzed.

While smaller labs with smaller projects can still handle their information the old-fashioned way, scientists are increasingly moving toward electronic solutions. These may include spreadsheets, information management software like electronic lab notebooks (ELNs), or even whole laboratory information management systems (LIMSs) integrated with automation.

Paper or pixel?

It used to be easy enough to write “Rat 3A” on a tube and note the details of the blood draw in a paper notebook, or better yet, in an app like Microsoft OneNote. Take out some blood, extract the DNA, copy down numbers from the spectrophotometer, staple in traces from a Bioanalyzer, and jot down a few calculations to determine how much is needed for sequencing. The NGS results are stapled to the page. The rest of the blood and extracted DNA go into a box in the freezer with today’s date in case they’re needed later.

But chances are that multiple loci are being queried. That same blood may also be needed to interrogate proteins or metabolites, and the same subject may be tracked for imaging or histology. It may be part of a longitudinal study. And there is likely more than one subject, not to mention controls.

“Now when you run one of these experiments there are reams and reams of valuable data, and gigs and gigs of information,” points out Stacey Willard, key account manager with Bio-ITech products, a subsidiary of Eppendorf. To track that subject, link it to its cohort, know how it was treated, and associate all of the assays would require thumbing back through the notes.

“Wouldn’t it be so much easier if you had that as a sample in your electronic system, and while you’re taking notes right in your ELN you could literally click a button and add that rat right to your experiment?” Willard asks. “In a year when you’re writing it up and you have to recover all that information, you actually have a searchable database.”

Say other information needs to be garnered from those samples later, or you need to accumulate samples until you’re ready to perform your experiments. Placing the samples in indexed positions and noting those in an inventory management or location management system—which itself can be integrated with the ELN—allows samples to be found at the click of a mouse.
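As a rough illustration of what “indexed positions” in a searchable inventory might look like, here is a minimal sketch using Python’s built-in sqlite3 module. The table layout, sample IDs, and freezer and box names are assumptions made up for this example; they do not describe any particular ELN or inventory product.

```python
import sqlite3

# In-memory database for illustration only; a real inventory would persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE samples (
           sample_id TEXT PRIMARY KEY,
           subject   TEXT NOT NULL,
           material  TEXT NOT NULL,
           freezer   TEXT NOT NULL,
           box       TEXT NOT NULL,
           position  TEXT NOT NULL,
           UNIQUE (freezer, box, position)  -- only one tube per slot
       )"""
)

# Hypothetical records: sample ID, subject, material, freezer, box, position.
rows = [
    ("S-0001", "Rat 3A", "blood", "F1", "Box-07", "A1"),
    ("S-0002", "Rat 3A", "DNA", "F1", "Box-07", "A2"),
    ("S-0003", "Rat 3B", "blood", "F1", "Box-07", "A3"),
]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)", rows)


def locate(subject, material):
    """Return (sample_id, freezer, box, position) for every matching sample."""
    cur = conn.execute(
        "SELECT sample_id, freezer, box, position FROM samples "
        "WHERE subject = ? AND material = ? ORDER BY sample_id",
        (subject, material),
    )
    return cur.fetchall()


print(locate("Rat 3A", "DNA"))
```

The uniqueness constraint on freezer, box, and position is one simple way to keep two tubes from being logged into the same slot.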

Barcoding

Genetic (DNA) barcodes are added to samples during NGS library preparation, allowing the libraries to be pooled and the sequencing reads traced back to the samples of origin.
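The tracing-back step, often called demultiplexing, can be sketched in a few lines of Python. This is a simplified illustration, not how any sequencer vendor’s software works: the six-base barcodes, the sample sheet, and the one-mismatch tolerance are all assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical sample sheet: index (barcode) sequence -> sample name.
SAMPLE_SHEET = {
    "ACGTAC": "Rat 3A",
    "TGCATG": "Rat 3B",
}


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sheet, max_mismatches=1):
    """Return the sample whose barcode uniquely matches index_read, else None."""
    hits = [
        sample
        for barcode, sample in sheet.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    # Ambiguous or missing matches are left unassigned rather than guessed.
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, sheet, max_mismatches=1):
    """Group (index_read, insert_sequence) pairs by sample of origin."""
    bins = defaultdict(list)
    for index_read, sequence in reads:
        sample = assign_sample(index_read, sheet, max_mismatches)
        bins[sample if sample is not None else "undetermined"].append(sequence)
    return dict(bins)


reads = [
    ("ACGTAC", "GATTACA"),  # exact match to Rat 3A
    ("TGCATC", "CCGGTTA"),  # one mismatch from the Rat 3B barcode
    ("GGGGGG", "TTTTAAA"),  # matches neither barcode
]
print(demultiplex(reads, SAMPLE_SHEET))
```

Reads whose index matches no barcode, or more than one, land in an “undetermined” bin, so a sequencing error in the barcode cannot silently assign data to the wrong sample.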

Physical barcodes are also used in tracking samples. Here a one- or two-dimensional code denoting a sample or set of samples is affixed to a tube or a plate. The barcode can be read by a handheld scanner connected by Bluetooth to a laptop, for example, or by a reader built into instrumentation like plate readers. This automatically logs the identity of the sample along with any other information pre-programmed to be associated with the barcode.

The scheme helps track in other ways as well. “You can put a barcode on the shelves of your freezers, and when you open the door you read the sample ID and then you read the barcode from the freezer,” says Willem van Loon, director of genomics automation at Tecan. Reagents, too, can be barcoded, allowing the system to record the lot number and expiration date of the master mix used for the PCR step, for instance, and associate it with the sample.
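One way to picture the bookkeeping behind those scans is a running log that ties each sample ID to the shelf or reagent barcode read alongside it. The sketch below is an assumption-laden illustration: the class names, barcode strings, lot number, and expiration date are invented for the example and do not reflect any vendor’s software.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class ReagentLot:
    barcode: str
    name: str
    lot_number: str
    expires: date


@dataclass(frozen=True)
class ScanEvent:
    sample_id: str
    scanned_barcode: str
    kind: str  # "location" or "reagent"
    timestamp: datetime


# Hypothetical reagent registry keyed by the barcode on the reagent tube.
REAGENTS = {
    "RG-1001": ReagentLot("RG-1001", "PCR master mix", "LOT-42", date(2030, 1, 31)),
}


def record_location(log, sample_id, shelf_barcode, when):
    """Log that a sample was scanned together with a freezer-shelf barcode."""
    log.append(ScanEvent(sample_id, shelf_barcode, "location", when))


def record_reagent_use(log, sample_id, reagent_barcode, when):
    """Log reagent use for a sample, refusing lots that have expired."""
    lot = REAGENTS[reagent_barcode]
    if when.date() > lot.expires:
        raise ValueError(f"{lot.name} lot {lot.lot_number} expired on {lot.expires}")
    log.append(ScanEvent(sample_id, reagent_barcode, "reagent", when))
    return lot


def history(log, sample_id):
    """Return every scan event recorded for one sample, in order."""
    return [event for event in log if event.sample_id == sample_id]


log = []
record_location(log, "S-0001", "FRZ1-SHELF2", datetime(2025, 3, 1, 9, 0))
record_reagent_use(log, "S-0001", "RG-1001", datetime(2025, 3, 1, 9, 30))
for event in history(log, "S-0001"):
    print(event.kind, event.scanned_barcode, event.timestamp.isoformat())
```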

Automation

Of course, “tracking is easy when you automate,” says Kevin Miller, senior market segment leader at Hamilton Company. With fully automated and integrated mechanical and informatics systems, robots can find samples and reagents that have already been entered into the system and perform the operations prescribed by a worklist the system has created. Lasers and cameras tell the system where each sample is at all times, and log the time each step is performed. Data from previous steps determine the next steps: for example, the amount of each reagent to be added to normalize a reaction. All the data associated with the samples are linked in a log file detailed enough to stand up as evidence in court.
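To make the normalization example concrete, here is a minimal sketch of how measured concentrations from one step could be turned into pipetting volumes for the next, using the standard dilution relationship C1 × V1 = C2 × V2. The function name, sample IDs, target concentration, and final volume are assumptions for illustration; an actual liquid handler’s worklist format will differ.

```python
def normalization_worklist(concentrations, target_ng_per_ul, final_volume_ul):
    """Compute sample and diluent volumes that bring each sample to the target.

    concentrations maps sample ID -> measured concentration in ng/uL.
    Samples already below the target are transferred undiluted and flagged.
    """
    worklist = []
    for sample_id, conc in concentrations.items():
        if conc < target_ng_per_ul:
            worklist.append(
                {
                    "sample_id": sample_id,
                    "sample_ul": float(final_volume_ul),
                    "diluent_ul": 0.0,
                    "flag": "below target",
                }
            )
            continue
        # C1 * V1 = C2 * V2, solved for V1 (the sample volume to transfer).
        sample_ul = round(target_ng_per_ul * final_volume_ul / conc, 2)
        worklist.append(
            {
                "sample_id": sample_id,
                "sample_ul": sample_ul,
                "diluent_ul": round(final_volume_ul - sample_ul, 2),
                "flag": "",
            }
        )
    return worklist


# Hypothetical readings from an upstream quantification step.
measured = {"S-0001": 50.0, "S-0002": 20.0, "S-0003": 4.0}
for step in normalization_worklist(measured, target_ng_per_ul=10.0, final_volume_ul=20.0):
    print(step)
```

Flagging a too-dilute sample rather than dropping it keeps the exception visible in the record, which is the point of tracking in the first place.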

Such levels of automation and tracking, which may even include documenting the personnel and equipment involved, may be a godsend to clinical, forensics, and other highly regulated labs. But they are likely overkill for others. An average academic research facility just wants to know that it “didn’t mix up any samples,” notes van Loon. For that, “it’s probably just a simple barcode that you read from the beginning and then at the end of the process.”

At best.

Barcodes aren’t even in the picture for the University of California, Davis Genome Center’s DNA Technologies and Expression Analysis Cores. Researchers submit an Excel spreadsheet detailing which wells contain which samples. That information is copied into a Google document, which is then printed out “so that we can make notes on what’s happening to the samples,” such as QC, explains director Lutz Froenicke. The entire project is assigned a unique ID, which lets the researchers access their sequence data online.

As for what happens at other academic cores, Froenicke says that “everybody does it a little different, but with regard to sophistication it’s about the same.”