Methods
Plasmid construction
All gRNAs used for targeting HEK293T cells and mESCs were cloned into the two BbsI sites of the pX330 vector (Addgene ID 42230). The plasmid used for CH12F3 cell targeting was an optimized pX330 vector in which we removed the AAV2 ITR sequence and introduced an mCherry gene driven by a CMV promoter by Gibson assembly.
Cell culture and plasmid transfection
The mESCs were cultured in ES-DMEM medium (Millipore) with 15% fetal bovine serum (FBS, ExCell Bio), Penicillin/Streptomycin (Corning), Nucleotides (Millipore), L-Glutamine (Corning), Nonessential Amino Acids (Corning), PD0325901 (Selleck), CHIR99021 (Selleck) and LIF (Millipore) at 37°C with 5% CO2. mESCs in 6-cm dishes were transfected with 7.2 μg pX330-Cas9 plus 1.8 μg GFP expression vector by 4D-nucleofector X (Lonza, solution Cytomix, program GC104), then harvested for genomic DNA 3 days after transfection.
The wild-type, Ku80-/-, Lig4-/-, Parp1-/-, and AID-/- CH12F3 cells were cultured in RPMI 1640 medium (Corning) with 15% Fetal Bovine Serum (FBS, ExCell Bio), HEPES (Corning), Penicillin-Streptomycin (Corning), L-Glutamine (Corning), Nonessential Amino Acids (Corning), Sodium Pyruvate (Corning) and β-Mercaptoethanol (Sigma-Aldrich) at 37°C with 5% CO2. Growing CH12F3 cells were transfected with 1.5 μg pX330-Cas9 or pX330-Cas9-mCherry expression vector per million cells by 4D-nucleofector X (Lonza, solution M1, program DN100) and seeded at 0.5 million cells/mL in fresh medium with 1 μg/mL anti-CD40, 5 ng/mL IL-4, and 0.5 ng/mL TGF-β. After 72 hr of stimulation, the cells were harvested and genomic DNA was extracted for PEM-seq library construction.
PEM-seq and 3C-HTGTS
The primers and gRNAs used for library construction are listed in table S2 and table S3, respectively. The PEM-seq libraries were constructed according to the standard procedure described previously (Yin et al. 2019). About 20 μg of genomic DNA from edited cells was used for each library. Primer control libraries were prepared from Cas9-infected cells without gRNA.
The 3C-HTGTS libraries were constructed following the previously described procedures (Jain et al. 2018). Briefly, 5-6 million cells were incubated with 1% formaldehyde for 10 min at room temperature, and glycine was added to a final concentration of 125 mM to stop the cross-linking reaction. Cells were then lysed in cell lysis buffer containing 10 mM Tris-HCl (pH 8.0), 10 mM NaCl, 0.2% NP-40, and 10 mM EDTA to prepare nuclei. The nuclei were digested with 700 units of DpnII restriction enzyme overnight at 37°C, and the digestion efficiency was checked by DNA gel electrophoresis. The digested DNA was re-ligated at 16°C for 4 hr to overnight under dilute conditions. Cross-links were reversed by incubation with Proteinase K at 56°C overnight with rotation. Finally, the DNA purified after RNase A treatment served as the “3C template”, from which the 3C-HTGTS library was prepared in the same way as the PEM-seq libraries.
All libraries were sequenced on the HiSeq platform.
PEM-Q analysis
Before PEM-Q analysis, raw reads were pre-processed as in our previous method (Yin et al. 2019). We used cutadapt (http://cutadapt.readthedocs.io/en/stable/) to remove the universal adapters. Read ends with quality scores below 30 were trimmed; the remaining reads longer than 25 bp were kept and demultiplexed with fastq-multx (https://github.com/brwnj/fastq-multx). Demultiplexed reads were then analyzed by PEM-Q in 5 steps.
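The trimming and length filter described above can be illustrated with a minimal Python sketch. It assumes Phred+33 quality encoding and a simple rule that strips trailing bases scoring below 30; the function names are hypothetical, and the trimming algorithm actually used in the pipeline may differ.

```python
def trim_low_quality_tail(seq, qual, min_q=30, offset=33):
    """Strip 3' bases whose Phred score is below min_q (assumes Phred+33)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_length_filter(seq, min_len=25):
    """Keep only reads longer than min_len bases after trimming."""
    return len(seq) > min_len


# Toy read: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2
trimmed_seq, trimmed_qual = trim_low_quality_tail("ACGTACGT", "IIII?5##")
```

In this toy read the last three bases fall below Q30 and are removed, so the read would then fail the 25-bp length filter.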
reads alignment
First, R1 and R2 of the paired-end reads generated by HiSeq were stitched using FLASH 1.2.11 (https://ccb.jhu.edu/software/FLASH/) with default parameters. The stitched reads, along with the unstitched R1 reads, were then aligned to the reference genome (hg38 for human, mm10 for mouse) by bwa-mem. Reads were kept if their alignment start sites deviated from the primer start by less than 4 bp. Meanwhile, R2 reads were aligned to the blue adapter, which was used to locate the random molecular barcode (RMB, equivalent to a unique molecular index) in step 2. Mapped reads with the wrong primer location were discarded in this step.
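The primer-location filter in this step can be sketched as follows. This is an illustration only, assuming that each alignment has already been reduced to a chromosome, 1-based reference start and end, and strand; the `Alignment` type and the function names are hypothetical and are not taken from PEM-Q.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    qname: str
    chrom: str
    ref_start: int  # 1-based leftmost reference position
    ref_end: int    # 1-based rightmost reference position
    strand: str     # "+" or "-"


def read_start(aln):
    """Reference position of the first read base (depends on strand)."""
    return aln.ref_start if aln.strand == "+" else aln.ref_end


def near_primer(aln, primer_chrom, primer_start, primer_strand, max_error=4):
    """Keep alignments that start within max_error bp of the primer start."""
    if aln.chrom != primer_chrom or aln.strand != primer_strand:
        return False
    return abs(read_start(aln) - primer_start) < max_error


alignments = [
    Alignment("r1", "chr1", 1002, 1100, "+"),  # 2 bp from primer: kept
    Alignment("r2", "chr1", 1004, 1100, "+"),  # 4 bp from primer: dropped
    Alignment("r3", "chr1", 900, 1000, "-"),   # wrong strand: dropped
    Alignment("r4", "chr2", 1000, 1100, "+"),  # wrong chromosome: dropped
]
kept = [a.qname for a in alignments if near_primer(a, "chr1", 1000, "+")]
```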
RMB extract
We kept reads with the correct blue adapter, allowing at most 2 bp of truncation. RMBs that had lost no more than 2 bp in length were then extracted according to the blue adapter location. Each RMB was recorded in a separate file together with its sequence name (QNAME). Reads with multiple tandem adapters were filtered out in this step.
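A minimal sketch of this extraction logic is given below. The read layout is an assumption made for illustration: the adapter is taken to sit at the start of R2 with the RMB immediately after it. The adapter sequence, RMB length, and function name in the example are hypothetical, and PEM-Q locates the adapter by alignment rather than by exact string matching.

```python
from typing import Optional


def extract_rmb(r2: str, adapter: str, rmb_len: int,
                max_truncation: int = 2, max_rmb_loss: int = 2) -> Optional[str]:
    """Return the RMB following the adapter, or None if the read is rejected.

    Assumed layout (not from the source): [adapter][RMB][rest of read].
    """
    core = adapter[max_truncation:]        # part that must always be present
    if r2.count(core) != 1:                # adapter missing or in tandem copies
        return None
    core_start = r2.find(core)
    if core_start > max_truncation:        # adapter is not at the read start
        return None
    # Bases before the core must be the matching 5' adapter bases.
    if r2[:core_start] != adapter[max_truncation - core_start:max_truncation]:
        return None
    rmb_start = core_start + len(core)
    rmb = r2[rmb_start:rmb_start + rmb_len]
    if len(rmb) < rmb_len - max_rmb_loss:  # RMB lost more than allowed
        return None
    return rmb


ADAPTER = "ACGTTGCA"  # hypothetical adapter sequence, for illustration only
full = extract_rmb(ADAPTER + "AAACCC" + "GGGG", ADAPTER, rmb_len=6)
truncated = extract_rmb("GTTGCA" + "AAACCC", ADAPTER, rmb_len=6)
tandem = extract_rmb(ADAPTER + ADAPTER + "AAACCC", ADAPTER, rmb_len=6)
```

Here `full` and `truncated` both yield the same RMB, while the tandem-adapter read is rejected.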
find chimeric alignment
Chimeric alignments are reported by bwa-mem in the SA tag. The segment aligned to the primer was defined as the bait, and the other segment as the prey. We kept only reads that reported a single chimeric junction and recorded the prey information in a tab-delimited file. Reads whose bait alignment extended no more than 10 bp beyond the primer binding site were discarded. Extra bases between bait and prey were extracted and recorded as insertions. For reads without insertions, we identified bases shared by the end of the bait and the start of the prey as microhomology. Reads without chimeric alignments were further analyzed in step 4.
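The junction logic of this step can be sketched in Python as below. The SA tag layout (rname, pos, strand, CIGAR, mapQ, NM, separated by commas and terminated by semicolons) follows the SAM specification; the helper names are hypothetical, and the sketch assumes that the bait precedes the prey on the read and that the read is given in its original orientation.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sa(sa_value):
    """Parse the value of a SAM SA tag into one dict per supplementary hit."""
    hits = []
    for entry in sa_value.strip(";").split(";"):
        if not entry:
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        hits.append({"rname": rname, "pos": int(pos), "strand": strand,
                     "cigar": cigar, "mapq": int(mapq), "nm": int(nm)})
    return hits


def query_span(cigar, strand):
    """0-based half-open interval of aligned bases in read orientation."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    lead = ops[0][0] if ops[0][1] in "SH" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] in "SH" else 0
    aligned = sum(n for n, op in ops if op in "MI=X")
    start = lead if strand == "+" else trail  # clips swap on the minus strand
    return start, start + aligned


def junction_bases(read, bait_span, prey_span):
    """Classify the bases between (or shared by) bait end and prey start."""
    bait_end, prey_start = bait_span[1], prey_span[0]
    if prey_start > bait_end:
        return "insertion", read[bait_end:prey_start]
    if prey_start < bait_end:
        return "microhomology", read[prey_start:bait_end]
    return "blunt", ""


read = "AAAAACCGGGGG"
hits = parse_sa("chr2,100,+,5S7M,60,0;")
single_junction = len(hits) == 1  # keep only reads with one chimeric junction
overlap = junction_bases(read, query_span("7M5S", "+"),
                         query_span(hits[0]["cigar"], hits[0]["strand"]))
```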
find indels
Reads without chimeric alignments were treated as linear alignments. Linear alignments extending no more than 10 bp beyond the CRISPR cut site were discarded. The remaining reads were processed to find indels. Insertions and deletions were identified from the “I” and “D” operations in the CIGAR strings reported by bwa-mem. Identical bases at the ends of deletions were identified as microhomology. PEM-Q also detects substitutions, which we identified from the MD tags reported by bwa-mem. The remaining reads without chimeric alignments or indels were recorded as germline.
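As an illustration of this step, the sketch below walks a CIGAR string to collect insertions and deletions and measures microhomology at a deletion as the number of bases by which the deletion can be shifted left or right without changing the resulting sequence. This is one common definition and is an assumption here; the function names are hypothetical and do not come from PEM-Q.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def find_indels(cigar, ref_start, read):
    """Return ("I", ref_pos, inserted_seq) and ("D", ref_pos, length) events.

    ref_start is the 1-based leftmost reference position of the alignment.
    """
    events = []
    ref_pos, q = ref_start, 0
    for n, op in ((int(n), op) for n, op in CIGAR_RE.findall(cigar)):
        if op in "M=X":            # consumes reference and read
            ref_pos += n
            q += n
        elif op == "I":            # consumes read only
            events.append(("I", ref_pos, read[q:q + n]))
            q += n
        elif op == "D":            # consumes reference only
            events.append(("D", ref_pos, n))
            ref_pos += n
        elif op == "S":            # soft clip consumes read only
            q += n
        elif op == "N":            # skipped region consumes reference only
            ref_pos += n
    return events


def deletion_microhomology(ref, del_start, del_len):
    """Microhomology length at a deletion (0-based del_start on ref)."""
    del_end = del_start + del_len
    right = 0
    while del_end + right < len(ref) and ref[del_start + right] == ref[del_end + right]:
        right += 1
    left = 0
    while del_start - 1 - left >= 0 and ref[del_start - 1 - left] == ref[del_end - 1 - left]:
        left += 1
    return left + right


events = find_indels("5M2I3M4D6M", 100, "AAAAACCGGGTTTTTT")
mh = deletion_microhomology("ACGTAGGCTAGGTTT", 4, 5)  # junction shares "TAGG"
```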
Classify and deduplicate
Reads with both bait and prey aligned to the target chromosome and with inserted sequences were classified as insertions. Those without inserted sequences and with a distance between bait and prey of no more than 500 kb were classified as deletions in this study. Reads with a distance between bait and prey exceeding 500 kb were classified as intra-chromosomal translocations, while those with prey from other chromosomes were classified as inter-chromosomal translocations. The RMBs extracted in step 2 were reassigned to reads according to their sequence names. Within each class of variants, duplicates were removed according to the prey alignment information, including chromosome, strand, junction, and bait end, together with the RMB.
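The classification and deduplication rules can be sketched as follows. The record layout and names are hypothetical, and the order in which the rules are tested (other chromosome, then inserted sequence, then distance) is one reading of the description above rather than a confirmed detail of PEM-Q.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    qname: str
    bait_chrom: str
    bait_end: int
    prey_chrom: str
    prey_strand: str
    prey_junction: int
    insertion: str   # inserted bases, "" if none
    rmb: str


def classify(j, max_deletion=500_000):
    """Assign a junction to one of the four classes described in the text."""
    if j.prey_chrom != j.bait_chrom:
        return "inter-chromosomal translocation"
    if j.insertion:
        return "insertion"
    if abs(j.prey_junction - j.bait_end) <= max_deletion:
        return "deletion"
    return "intra-chromosomal translocation"


def deduplicate(junctions):
    """Keep the first read per (class, prey coordinates, bait end, RMB)."""
    seen, kept = set(), []
    for j in junctions:
        kind = classify(j)
        key = (kind, j.prey_chrom, j.prey_strand, j.prey_junction, j.bait_end, j.rmb)
        if key not in seen:
            seen.add(key)
            kept.append((kind, j.qname))
    return kept


reads = [
    Junction("a", "chr12", 1000, "chr12", "+", 1200, "", "AAA"),
    Junction("b", "chr12", 1000, "chr12", "+", 1200, "", "AAA"),  # duplicate of a
    Junction("c", "chr12", 1000, "chr12", "+", 1200, "", "CCC"),  # different RMB
    Junction("d", "chr12", 1000, "chr12", "+", 2_000_000, "", "AAA"),
    Junction("e", "chr12", 1000, "chr5", "-", 300, "", "AAA"),
    Junction("f", "chr12", 1000, "chr12", "+", 1005, "GT", "AAA"),
]
unique = deduplicate(reads)
```

Reads `a` and `c` share the same junction but carry different RMBs, so both are retained as independent events, whereas `b` is removed as a duplicate of `a`.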
Additional program: Vector (plasmid) analysis
There are two main types of vector integrations, as described in the text. The first type comprises short vector insertions in which the entire inserted fragment can be aligned to the vector backbone. In the second type, the inserted fragment is too long, and such reads are discarded by PEM-Q. However, these discarded reads may still contain large vector integrations. Therefore, we remapped the discarded reads to the genome and then to the vector backbone to recover the missed vector integrations. We used bwa-mem for the alignment with a default seed length of 20 bp.
Off-target and TSS analysis
Off-target identification was performed as described previously (Yin et al. 2019), using MACS2 callpeak and commonly used criteria. For TSS analysis, we used computeMatrix (deepTools 3.1.3) to calculate the signals in RNA Pol II ChIP-seq data, with the parameters “-a 50000 -b 50000 -bs 1000”. For PEM-seq data, we used the algorithm described before (Zhang et al. 2012) to assign junctions to the nearest TSSs.
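For illustration, a generic nearest-TSS lookup with 1-kb binning over a ±50-kb window (mirroring the computeMatrix window and bin size quoted above) might look like the sketch below. It is not the algorithm of Zhang et al. 2012; it ignores gene strand, works on a single chromosome with a non-empty sorted TSS list, and uses hypothetical function names.

```python
from bisect import bisect_left


def nearest_tss(tss_sorted, pos):
    """Return (nearest TSS, junction position minus TSS) for one chromosome.

    tss_sorted must be a non-empty, ascending list of TSS coordinates.
    Ties are resolved in favor of the lower coordinate.
    """
    i = bisect_left(tss_sorted, pos)
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda t: abs(pos - t))
    return best, pos - best


def bin_index(distance, flank=50_000, bin_size=1_000):
    """Bin a distance within [-flank, flank) into bins of bin_size bp."""
    if -flank <= distance < flank:
        return (distance + flank) // bin_size
    return None


tss = [1_000, 5_000, 9_000]
assigned = [nearest_tss(tss, p) for p in (100, 4_000, 10_000)]
bins = [bin_index(d) for _, d in assigned]
```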