CRISPR-Cas9 has given scientists a powerful tool for making targeted, specific changes to the genome, with applications in functional genomic screening, creation of transgenic animal models, crop breeding, and precision medicine. The system is simple: a single guide RNA (sgRNA) and the Cas9 protein, delivered to cells, are enough to make specific changes at a chosen site in the genome. It is also flexible, enabling a range of modifications, from gene knockout (CRISPR-KO) and modulation of gene expression (CRISPRi and CRISPRa) to specific edits to the genetic sequence.

But embarking on a CRISPR experiment is not a trivial undertaking, and the key to success is choosing the optimal target site and then designing the right sgRNA. A good sgRNA should efficiently recruit the Cas9 endonuclease to the target site to introduce a double-strand break, while at the same time displaying minimal off-target activity. Choosing the best sgRNA for your CRISPR experiment will make all downstream processes, especially the interpretation of your results, much easier. This article discusses what to consider when designing your sgRNA, what makes a good CRISPR guide, and the tools available to help you choose the right sgRNA and ensure success in your CRISPR experiments.

Not all sgRNAs are created equal

During gene editing, the Cas protein scans for and binds protospacer adjacent motif (PAM) sites. If the sequence next to a PAM site shares significant sequence homology with the sgRNA, the Cas endonuclease activity is triggered and CRISPR gene editing ensues. Cas9 from S. pyogenes, the most commonly used Cas protein, recognizes the 5’-NGG-3’ PAM, which occurs on average once every 8 bp throughout the human genome, so there are many potential sgRNAs to choose from. It is important to remember, however, that cleavage efficiency varies significantly between sgRNAs, as does the potential for off-target activity. Fortunately, good sgRNA design can circumvent most issues arising from efficiency and/or specificity.
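The PAM scan described above can be sketched in a few lines of Python. This is a minimal illustration, not a design tool: the function names are made up for this example, a 20-nt protospacer is assumed, and only the NGG PAM of S. pyogenes Cas9 is considered.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_spcas9_sites(seq, protospacer_len=20):
    """List (strand, start, protospacer, pam) for every NGG PAM in seq.

    start is the 0-based index of the protospacer's first base on the
    strand being scanned; for '-' it is relative to the reverse complement.
    """
    sites = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    example = "GATTACAGATTACAGATTACAGG"  # 20-nt protospacer followed by AGG
    for site in find_spcas9_sites(example):
        print(site)
```

Every hit is only a candidate; the sections that follow cover how candidates are scored for efficiency and specificity.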

Predicting on-target activity

To understand the inherent variability in CRISPR-mediated gene editing across different target sites, researchers have turned to large-scale CRISPR screening. Early work from the Doench lab screened 1,841 sgRNAs to determine which sequence features led to efficient protein knockdown.1 These data were then used to generate scoring rules and to develop a computational model (Rule Set 1, and later Rule Set 2)2 to predict sgRNA efficacy. Subsequent studies have identified characteristics that contribute to high cleavage efficiency, such as nucleotide composition, GC content, chromatin accessibility, and the energetics of sgRNA interactions. More recently, Michlits et al. developed a new prediction model, the Vienna Bio-activity CRISPR (VBC) score,3 based on the observation that superior editing is mainly determined by the impact of in-frame mutations on protein function, an impact that can be predicted from amino acid composition and conservation. The VBC score integrates these and other protein features with improved predictors of sgRNA activity and indel formation, giving a single score that captures all relevant steps of CRISPR/Cas9 mutagenesis.
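To make the idea of sequence features concrete, here is a minimal sketch of how a few of them (GC content, nucleotide composition, and the longest single-base run) could be computed for a candidate protospacer. The function names and the choice of features are assumptions made for illustration; this is not Rule Set 1, Rule Set 2, or the VBC score, and it produces no efficiency prediction on its own.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    longest = run = 0
    previous = None
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def guide_features(protospacer):
    """Collect simple, illustrative sequence features for one protospacer."""
    seq = protospacer.upper()
    return {
        "gc_fraction": gc_fraction(seq),
        "base_counts": dict(Counter(seq)),
        "longest_homopolymer": longest_homopolymer(seq),
    }


if __name__ == "__main__":
    print(guide_features("GATTACAGATTACAGATTAC"))
```

Published models combine many more features, and weight them using experimental training data, which is why the choice of model matters.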

There are now many computational methods for designing highly efficient sgRNAs, and they provide a powerful resource for the CRISPR researcher.4 But it is important to remember that each prediction model is unique, having been trained on different experimental data and employing different machine learning techniques or scoring methods. For example, CHOPCHOP uses empirical experimental data from publications, whereas CasFinder uses mismatch data to determine efficiencies based on the number and position of mismatches. The method by which CRISPR/Cas9 activity was measured could also affect the predictive model, with potential differences between sequencing-based data and phenotypic screens. A 2016 study by Haeussler et al. compared different models and showed that they were only consistently accurate when used on their original training dataset.5 It is therefore critical to consider the experimental conditions used to create the model and to ensure that they are similar to those in the planned experiment.

Minimizing off-target effects

A remaining challenge for CRISPR/Cas9 technology is the possibility of off-target activity: unintended cleavage at sites outside the target region that share sequence homology with the sgRNA. Simple alignment-based methods, such as Bowtie and BWA, have been used to align the sgRNA to a reference genome to identify potential off-target sites, but these repurposed tools may not always be up to the task. Studies using pre-validated CRISPR off-targets showed that these tools did not always identify them, even with only a single mismatch.6
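The underlying search problem can be illustrated with a deliberately naive scan that compares the guide against every NGG-adjacent window of a reference sequence and counts mismatches. This is a sketch under stated assumptions: the function names are hypothetical, only the forward strand is scanned, gaps and bulges are ignored, and the approach is far too slow for a real genome.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_offtarget_candidates(guide, reference, max_mismatches=3):
    """Return (start, site, pam, mismatches) for NGG-adjacent sites that
    differ from the guide at no more than max_mismatches positions.

    Forward strand only; run it again on the reverse complement of the
    reference to cover the other strand.
    """
    guide = guide.upper()
    reference = reference.upper()
    n = len(guide)
    hits = []
    # The last usable window must leave room for a 3-nt PAM after the site.
    for start in range(len(reference) - n - 2):
        pam = reference[start + n:start + n + 3]
        if pam[1:] != "GG":
            continue
        site = reference[start:start + n]
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatches:
            hits.append((start, site, pam, mismatches))
    return hits


if __name__ == "__main__":
    guide = "GATTACAGATTACAGATTAC"
    # On-target site, a spacer, then a copy with one mismatch next to a PAM.
    reference = guide + "AGG" + "TTTT" + "GATTACAGATTACAGATTAG" + "TGG"
    for hit in find_offtarget_candidates(guide, reference):
        print(hit)
```

Because the scan is exhaustive, it cannot miss a single-mismatch site the way a heuristic aligner can, which is the trade-off dedicated off-target tools are built around.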

Once again, computational scoring methods are available to help score sgRNAs on their off-target activity. Popular methods include the MIT-Broad score and the CFD score, which were developed from work that used a series of sgRNAs carrying one, two, or three mismatched bases to construct a linear regression algorithm for scoring off-target sites.2,7 Further models have since built on this by including additional features that may affect off-target activity, such as sgRNA secondary structure and the location of the target site in the genome.8,9
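As a rough illustration of position-aware mismatch scoring, the sketch below penalizes each mismatch by a weight that grows toward the PAM-proximal end of the guide. The weights, the function names, and the aggregation formula are all invented for this example; they are not the published MIT-Broad or CFD parameters, which are derived from experimental mismatch data.

```python
def toy_offtarget_score(guide, site, max_penalty=0.9):
    """Score in (0, 1]: 1.0 for a perfect match, lower as mismatches
    accumulate, with PAM-proximal mismatches penalized most.

    The linear weights are hypothetical placeholders, not fitted values.
    Sequences are written 5' to 3', so the last base sits next to the PAM.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    n = len(guide)
    score = 1.0
    for i, (g, s) in enumerate(zip(guide.upper(), site.upper())):
        if g != s:
            penalty = max_penalty * (i + 1) / n
            score *= 1.0 - penalty
    return score


def toy_specificity(guide, offtarget_sites):
    """Aggregate per-site scores into one number in (0, 1].

    1.0 means no scored off-target sites; the value falls as more
    (or more similar) off-target sites are added. Illustrative only.
    """
    total = sum(toy_offtarget_score(guide, site) for site in offtarget_sites)
    return 1.0 / (1.0 + total)


if __name__ == "__main__":
    guide = "GATTACAGATTACAGATTAC"
    distal = "C" + guide[1:]      # mismatch far from the PAM
    proximal = guide[:-1] + "A"   # mismatch next to the PAM
    print(toy_offtarget_score(guide, distal))
    print(toy_offtarget_score(guide, proximal))
    print(toy_specificity(guide, [distal, proximal]))
```

A site with a PAM-distal mismatch keeps a high score here, meaning it is treated as a likelier off-target than a site with a PAM-proximal mismatch.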

It’s all a balance

Selecting the best sgRNA for your experiment can sometimes require a compromise: the most efficient guide will be of no use if its position is not optimal for your experiment, or you may have to sacrifice efficiency for high specificity. But web-based design tools have taken much of the hard work away, crunching the numbers from genome-wide screens to create powerful computational prediction tools that provide a list of optimal sgRNAs for your experiment. Care must be taken when selecting your tool, ensuring that the basis of the algorithm is in line with your experimental requirements and with the current understanding of best practices. By selecting the right design tool and combining this with a careful assessment of off-target risk and optimization of your chosen sgRNA, a successful gene-editing experiment is now within the capabilities of almost every biological lab.
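One simple way to express this compromise in code is to discard guides that fail hard requirements (position and a minimum specificity) and then rank the rest by a weighted combination of predicted efficiency and specificity. The class, field names, threshold, and weighting below are hypothetical; the scores are assumed to be scaled from 0 to 1 and to come from whichever design tool you have chosen.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    on_target: float        # predicted efficiency, assumed scaled 0..1
    specificity: float      # predicted specificity, assumed scaled 0..1
    in_target_window: bool  # does the cut site suit the planned experiment?


def rank_candidates(candidates, min_specificity=0.5, efficiency_weight=0.5):
    """Filter out unusable guides, then sort the rest best-first."""
    usable = [
        c for c in candidates
        if c.in_target_window and c.specificity >= min_specificity
    ]

    def combined(c):
        return (efficiency_weight * c.on_target
                + (1.0 - efficiency_weight) * c.specificity)

    return sorted(usable, key=combined, reverse=True)


if __name__ == "__main__":
    guides = [
        Candidate("A", 0.90, 0.20, True),   # efficient but unspecific
        Candidate("B", 0.70, 0.80, True),
        Candidate("C", 0.95, 0.90, False),  # excellent scores, wrong position
        Candidate("D", 0.60, 0.95, True),
    ]
    for guide in rank_candidates(guides):
        print(guide.name)
```

Raising efficiency_weight favors cutting efficiency over specificity, so the same candidate list can be re-ranked to suit a different experimental priority.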

The CRISPR Toolkit

Since its debut in 2013, the CRISPR toolkit has expanded, and a range of edits is now available to researchers. A key factor that will influence your sgRNA design is therefore the type of CRISPR experiment you are planning to do:

  • CRISPR-knockout (CRISPR-KO): the most common use of CRISPR is to knock out the target gene. Cas9 introduces a double-strand break at the target site, which activates the non-homologous end joining (NHEJ) cellular DNA repair pathway and results in insertions or deletions (INDELs) at the target site. These INDELs often lead to a frameshift mutation and subsequent knockout of the target gene, so when designing sgRNAs for gene knockout it’s a good idea to target protein-coding regions to have a better chance of producing a non-functional protein. You should also try to avoid sites close to the N-terminus, to prevent the use of alternative start codons, but remember that you can target either the coding or the non-coding strand for successful CRISPR-KO.
  • Gene knock-in: CRISPR-mediated gene knock-in uses a similar process to CRISPR-KO, but you also deliver an exogenous DNA template along with the CRISPR machinery. The aim here is to activate the homology-directed repair (HDR) pathway to fix the double-strand break, so that the cell uses your chosen template to replace the broken section. In this way, you can make changes to the genetic sequence, such as incorporating an exogenous tag or altering specific bases. For these targeted gene-editing applications, sgRNA design is constrained by the location of your desired edit, but for the best efficiency you should aim to design your sgRNA within 30 nucleotides of the proximal ends of the repair template.
  • Modulation of gene expression with CRISPRi and CRISPRa: The development of a catalytically inactive version of Cas9 (dCas9) has given researchers a finer tool for altering gene expression than the all-or-nothing gene knockout. Using dCas9 alone, or tethered to a transcriptional inhibitor or activator, allows for knockdown or activation of gene expression. When designing sgRNAs for CRISPRi or CRISPRa, aim for the promoter region of your target gene: for CRISPRa, the sgRNA should target sequences upstream of the transcriptional start site (TSS), whereas for CRISPRi, the sgRNA should target sequences downstream of the TSS within the promoter.
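The positional rules in the list above can be turned into simple pre-filters. The sketch below is illustrative only: the function names are hypothetical, and the numeric thresholds (a 300-bp window around the TSS and the first 5% of the coding sequence) are placeholder assumptions rather than values from this article or from any published guideline. Coordinates are assumed to be on the reference forward strand.

```python
def offset_from_tss(guide_pos, tss_pos, gene_strand):
    """Signed distance from the TSS in the direction of transcription.

    Negative values are upstream of the TSS, positive values downstream.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    offset = guide_pos - tss_pos
    return offset if gene_strand == "+" else -offset


def suits_modulation(guide_pos, tss_pos, gene_strand, mode, max_distance=300):
    """CRISPRa: upstream of the TSS. CRISPRi: at or downstream of the TSS.

    max_distance is a hypothetical window size, not a recommended value.
    """
    rel = offset_from_tss(guide_pos, tss_pos, gene_strand)
    if mode == "CRISPRa":
        return -max_distance <= rel < 0
    if mode == "CRISPRi":
        return 0 <= rel <= max_distance
    raise ValueError("mode must be 'CRISPRa' or 'CRISPRi'")


def suits_knockout(cut_pos_in_cds, cds_length, min_fraction=0.05):
    """Reject cut sites too close to the start of the coding sequence.

    min_fraction is a hypothetical cut-off for 'close to the N-terminus'.
    """
    return cut_pos_in_cds / cds_length >= min_fraction


if __name__ == "__main__":
    # Gene on the forward strand with its TSS at position 1000.
    print(suits_modulation(900, 1000, "+", "CRISPRa"))
    print(suits_modulation(900, 1000, "+", "CRISPRi"))
    print(suits_knockout(10, 1000))
```

For a gene on the reverse strand the sign of the offset flips, so a guide at a lower genomic coordinate than the TSS counts as downstream.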

References

1. Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nature Biotechnology 32, 1262–1267 (2014).

2. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nature Biotechnology 34, 184–191 (2016).

3. Michlits, G. et al. Multilayered VBC score predicts sgRNAs that efficiently generate loss-of-function alleles. Nature Methods 17, 708–716 (2020). doi:10.1038/s41592-020-0850-8

4. Liu, G., Zhang, Y. & Zhang, T. Computational approaches for effective CRISPR guide RNA design and evaluation. Computational and Structural Biotechnology Journal 18, 35–44 (2020).

5. Haeussler, M. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biology 17, (2016).

6. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nature Biotechnology 33, 187–198 (2015).

7. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nature Biotechnology 31, 827–832 (2013).

8. Listgarten, J. et al. Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs. Nature Biomedical Engineering 2, 38–47 (2018).

9. Abadi, S., Yan, W. X., Amar, D. & Mayrose, I. A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action. PLoS Computational Biology 13, (2017).