Functional genomics has received a recent boost from the development of CRISPR-Cas9, a powerful gene-editing technology that can be used to systematically knock out gene function on a genome-wide scale. A high-throughput pooled library screen with CRISPR involves delivering an sgRNA library by lentiviral vector so that every gene in the genome is perturbed, applying the required screening condition (for example, treatment with a drug or toxin), and then using next-generation sequencing to identify the sgRNAs that have been enriched or depleted. Pooled library screening with CRISPR-Cas9 is now a popular way to perform genome-wide loss-of-function studies, linking genotype with phenotype to characterize gene function, elucidate biological pathways, and identify potential targets for drug discovery.

This article discusses the steps to take once the wet lab portion of the screen is complete: the analysis required to generate a gene hit list, and the validation experiments needed to take the findings of your CRISPR screen forward.

Prepare for sequencing

Because random integration of the sgRNA cassette by the lentiviral vector provides each cell with a barcode, most screens involve PCR amplification of the sgRNA-containing region of the lentiviral backbone in preparation for sequencing. After the screen is complete, genomic DNA from the selected cell population is collected and purified, and the amplified sgRNA region is then sequenced by massively parallel sequencing. At minimum, cell pellets should be collected at baseline and from the experimental and control populations, each with enough cells to maintain library diversity (typically 300–1000-fold representation, involving 100–200 million cells).
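The representation arithmetic can be sketched as follows. This is a minimal illustration, not a protocol: the library size of 100,000 sgRNAs is a hypothetical value, and the figure of roughly 6.6 pg of genomic DNA per cell is an assumption for a diploid human genome that will not hold for every cell line.

```python
# Assumption: about 6.6 pg of genomic DNA per diploid human cell.
# Aneuploid cell lines will differ, so treat the result as a rough guide.
PG_DNA_PER_CELL = 6.6


def cells_required(n_guides, coverage):
    """Cells needed so that each sgRNA is represented `coverage` times on average."""
    return n_guides * coverage


def gdna_micrograms(n_cells, pg_per_cell=PG_DNA_PER_CELL):
    """Approximate genomic DNA yield in micrograms (1 microgram = 1e6 picograms)."""
    return n_cells * pg_per_cell / 1e6


# Hypothetical library of 100,000 sgRNAs, used only for illustration.
n_guides = 100_000
for coverage in (300, 1000):
    n_cells = cells_required(n_guides, coverage)
    print(f"{coverage}x: {n_cells:,} cells, about {gdna_micrograms(n_cells):.0f} ug gDNA")
```

Estimating the DNA mass up front makes it easier to judge whether a given purification method can handle the whole pellet without losing representation.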

Maintaining library representation is critical to a successful screen, and is especially important when attempting to identify small changes in sgRNA representation against a high background. Sequencing primers should be designed with a staggered approach, to minimize bias and ensure sgRNA complexity is maintained. An often-overlooked step is to confirm that the chosen DNA purification method has been optimized and can be run at scale, so that the purification is not overloaded and sgRNAs are not lost non-specifically.

Each CRISPR library and screen type requires a different sequencing depth for sufficient interrogation. A positive selection screen, which aims to identify sgRNAs enriched under strong selection pressure, requires only a few million reads, whereas a negative (dropout) screen needs deeper sequencing, up to 1 × 10^8 reads, to detect subtle changes in sgRNA representation. Tools such as Preseq can help you predict library complexity and therefore optimize the sequencing depth for your chosen library.1
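A back-of-the-envelope depth estimate can be sketched as below. The target reads per sgRNA and the fraction of reads that map to the library are illustrative assumptions, not recommendations from this article; the function simply scales library size by the desired per-guide depth and corrects for unmapped reads.

```python
import math


def reads_required(n_guides, reads_per_guide, mapped_fraction=1.0):
    """Total reads per sample so each sgRNA receives `reads_per_guide` mapped reads on average."""
    if not 0 < mapped_fraction <= 1:
        raise ValueError("mapped_fraction must be in (0, 1]")
    return math.ceil(n_guides * reads_per_guide / mapped_fraction)


# Hypothetical values for illustration only.
n_guides = 100_000
positive_screen = reads_required(n_guides, reads_per_guide=30)
dropout_screen = reads_required(n_guides, reads_per_guide=500, mapped_fraction=0.5)
print(f"positive selection: {positive_screen:,} reads")
print(f"dropout: {dropout_screen:,} reads")
```

Dropout screens need the higher per-guide depth because a depleted sgRNA has to be distinguished from one that was simply undersampled.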

QC the data

After sequencing, you will receive FASTQ files containing your sequencing reads. Downstream analysis requires high-quality sequencing data, so a crucial first step is initial quality control of the data: checking replicate correlation and technical variability ensures that any outliers or sampling errors are omitted from subsequent analysis. A high Phred score is one measure of good sequencing quality. You may also wish to visually compare sgRNA representation between libraries, for example between baseline and the end of the screen. A waterfall plot displaying sgRNA representation relative to median abundance in the library is an easy way to see how the distribution of sgRNAs has altered over the screen. Alternatively, plotting the cumulative frequency of reads can highlight any problems that may have occurred during the screening process.
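These checks can be sketched in a few lines once reads have been counted per sgRNA. The example below is a minimal illustration using made-up counts: it computes replicate correlation on log-scaled normalized counts, the values behind a waterfall plot (log2 abundance relative to the median), and a Gini index as one possible summary of how evenly reads are spread across sgRNAs. The Gini index and the pseudocount of 1 are assumptions chosen for the sketch, not requirements of any particular pipeline.

```python
import math
from statistics import median


def log2_rpm(counts, pseudocount=1.0):
    """Scale counts to reads per million, then log2-transform."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def waterfall_values(counts, pseudocount=1.0):
    """log2 abundance of each sgRNA relative to the library median, sorted high to low."""
    med = median(counts)
    ratios = (math.log2((c + pseudocount) / (med + pseudocount)) for c in counts)
    return sorted(ratios, reverse=True)


def gini(counts):
    """Gini index of read distribution: 0 is perfectly even, values near 1 are highly skewed."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up sgRNA counts for two replicates of the same sample.
rep1 = [120, 95, 210, 5, 160, 130]
rep2 = [110, 100, 190, 9, 150, 140]
print("replicate correlation:", round(pearson(log2_rpm(rep1), log2_rpm(rep2)), 3))
print("gini (rep1):", round(gini(rep1), 3))
print("waterfall (rep1):", [round(v, 2) for v in waterfall_values(rep1)])
```

A replicate that correlates poorly with its siblings, or a sample whose distribution is far more skewed than baseline, is a candidate for exclusion before hit calling.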

Quality control and subsequent downstream analysis can be aided with the addition of a positive control—however, in some cases the identification of positive genes is the aim of the screen and so this won’t be possible. Curated gold standard sets of essential and non-essential genes can be a useful resource for benchmarking your screen, including confirmation of Cas9 activity, sufficient sampling, and ensuring quality of sample processing.2
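One simple way to benchmark a dropout screen against such reference sets is to ask how often an essential gene shows a lower log fold change than a non-essential gene. The sketch below computes that probability (equivalent to an area under the ROC curve) from gene-level log fold changes. It is a stand-in chosen for illustration, not the method used in reference 2, and the values are invented.

```python
def separation_auc(essential_lfc, nonessential_lfc):
    """Probability that a random essential gene has a lower log fold change
    than a random non-essential gene (ties count as half)."""
    wins = 0.0
    for e in essential_lfc:
        for n in nonessential_lfc:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(essential_lfc) * len(nonessential_lfc))


# Invented gene-level log2 fold changes for illustration.
essential = [-3.1, -2.4, -1.8, -0.2]
nonessential = [-0.4, 0.1, 0.3, -0.1]
print("separation AUC:", separation_auc(essential, nonessential))
```

A value near 1 suggests that Cas9 activity and sampling were sufficient to deplete known essential genes, while a value near 0.5 indicates the screen did not separate the two sets.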

Bioinformatics and hit identification

Once you are confident in the quality of your screening data, you can move on to analysis and determine the statistical significance of each sgRNA change. An easy starting point is to look at the raw log fold change of each sgRNA, comparing guide abundance between timepoints or treatment conditions to identify hits. Hit identification leverages the inherent redundancy of library design, which includes multiple sgRNAs per gene: if all sgRNAs targeting a single gene are enriched (for a positive screen) or depleted (for a negative screen), the gene is very likely to be a genuine hit.
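The raw log fold change approach can be sketched as follows. This is a minimal illustration with invented counts: each sample is scaled to reads per million, a pseudocount of 1 (an assumption) avoids division by zero, and the median across a gene's sgRNAs is used as one simple gene-level summary. The guide and gene names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def guide_lfc(control, treated, pseudocount=1.0):
    """log2 fold change per sgRNA after scaling each sample to reads per million."""
    c_total = sum(control.values())
    t_total = sum(treated.values())
    lfc = {}
    for guide, c in control.items():
        c_rpm = c / c_total * 1e6
        t_rpm = treated.get(guide, 0) / t_total * 1e6
        lfc[guide] = math.log2((t_rpm + pseudocount) / (c_rpm + pseudocount))
    return lfc


def gene_scores(lfc, guide_to_gene):
    """Median log2 fold change across all sgRNAs targeting each gene."""
    per_gene = defaultdict(list)
    for guide, value in lfc.items():
        per_gene[guide_to_gene[guide]].append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


# Invented counts; guide and gene names are hypothetical.
control = {"A_1": 100, "A_2": 100, "B_1": 100, "B_2": 100}
treated = {"A_1": 300, "A_2": 300, "B_1": 10, "B_2": 10}
guide_to_gene = {"A_1": "GENE_A", "A_2": "GENE_A", "B_1": "GENE_B", "B_2": "GENE_B"}

scores = gene_scores(guide_lfc(control, treated), guide_to_gene)
for gene, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(gene, round(score, 2))
```

Using the median rather than the mean keeps a single off-target or inactive sgRNA from dominating the gene-level score.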

For a more complete analysis, including guide- and gene-level rankings of hits, and where greater sensitivity is needed to detect subtle changes in sgRNA representation against high noise, several software packages are available, including MAGeCK, drugZ, and BAGEL. Each uses a different algorithm, and the choice of analysis package depends on the library and experimental conditions. Alternatively, you could outsource this element of the CRISPR screening workflow by employing a contract research organization (CRO) to perform the necessary bioinformatics and data analysis.

Moving forward with your hits

So you now have a hit list from your successful screen. What next? A good subsequent step is to conduct a more focused follow-up study using all the genes that scored highly in your primary screen. This significantly reduces library size and complexity, allowing more sgRNAs to be included per gene and therefore producing a cleaner dataset. A smaller library can also be used in multiple cell lines, increasing confidence in the biological relevance of hits. You could also think about tiling sgRNAs across the genes in question, enriching for areas within functional domains as shown by Shi et al.3 This not only provides potential drug target validation information but is also useful for poorly annotated genes where little functional information is available. Another option is to screen with a different technology, such as CRISPRi/a: although gene knockout is a powerful means to identify hits, it may not accurately mimic biological function or allow changes to essential genes under the screening condition to be evaluated.

Once the hits have been identified, the process of validation begins. The first step is to technically validate your findings, for example by confirming protein loss by western blot or by PCR across the cut site to confirm Cas9 activity. For CRISPRi/a screens, quantitative PCR can be used to confirm changes in transcript levels. In both cases it is important to include appropriate controls. For CRISPR knockout, include multiple sgRNAs that target each gene in your validation steps to confirm that the effect is due to specific on-target activity. For CRISPRi/a, ensure that you include non-targeting sgRNA in your RT-qPCR validation.
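For the RT-qPCR comparison against a non-targeting sgRNA, one widely used calculation is the 2^-ΔΔCt method, sketched below. It assumes roughly 100% amplification efficiency for both primer pairs and a stable reference gene; the Ct values shown are invented, and this is offered as an illustration rather than the specific analysis intended by this article.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change in target transcript versus the non-targeting control (2^-ddCt).

    Assumes about 100% primer efficiency and a stable reference gene.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # targeting sgRNA
    d_ct_control = ct_target_control - ct_ref_control   # non-targeting sgRNA
    return 2 ** -(d_ct_sample - d_ct_control)


# Invented Ct values: a CRISPRi guide delays target amplification by 3 cycles.
fold = relative_expression(
    ct_target_sample=27.0, ct_ref_sample=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(f"relative expression: {fold:.3f}")
```

Values below 1 indicate knockdown relative to the non-targeting control (as expected for CRISPRi), and values above 1 indicate activation (as expected for CRISPRa).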

Once you are happy with the technical validation, you can move on to exploring and characterizing the hits from your primary and follow-up screens. Experiments could include target essentiality assays, use of commercially available inhibitors, cDNA or functional hypomorph rescue of phenotype, and functional evaluation of biomarkers for on-target activity. Ideally, your validation would involve identifying each hit in an orthogonal secondary screen dependent on the screening condition, to separate out any false positives. For example, in a primary screen where sgRNAs have been enriched in the presence of an inhibitor, the gold standard would be to engineer cell lines in which each hit is knocked out and then test them with the appropriate biochemical assay to confirm involvement.

Analyze, identify, and validate

Large-scale CRISPR screens are now within the technical capability of most laboratories, but the analysis portion of the workflow does require bioinformatic know-how to crunch the numbers of a large dataset, apply an appropriate statistical test, and generate an accurate hit list. A lot can be achieved with minimal computational and mathematical expertise, and there are also options available for those needing help, including easy-to-use analysis packages or the assistance of a service provider, so that you can take your CRISPR screening results to the next level.

References

1. Deng, C., Daley, T. & Smith, A. Applications of species accumulation curves in large-scale biological data analysis. Quantitative Biology 3, 135–144 (2015).

2. Hart, T. et al. Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. G3: Genes, Genomes, Genetics 7, 2719–2727 (2017).

3. Shi, J. et al. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nature Biotechnology 33, 661–667 (2015).