# Allowlisting the CellTag plasmid library to obtain a list of high-confidence barcodes

CellTag constructs are made available as lentiviral plasmid libraries. While the theoretical diversity of CellTags is very high (4^8 = 65,536 for the original 8N tags and 4^18, or ~68 billion, for the 18N libraries), the real barcode diversity in the plasmid library is limited by bottlenecks during synthesis, cloning, and related steps. To identify the CellTag barcodes actually present in the plasmid library, we perform an allowlisting step.

The detailed methodology for generating sequencing libraries for allowlisting is described in [Jindal et al. Nat. Biotech. (2023)](https://www.nature.com/articles/s41587-023-01931-4). The following text outlines the computational workflow for processing the sequencing data. Scripts and notebooks are available at our [GitHub repo](https://github.com/morris-lab/newCloneCalling/tree/main):

- Obtain the Read 1 (R1) fastq files for the 2 replicates and parse CellTag reads from each by running the following script: `allowlisting_scripts/parse_fq_allowlisting.sh`
- Error-correct the identified CellTag barcodes using starcode (set the distance threshold to 4): `allowlisting_scripts/starcode_collapse.sh`
- Use `allowlisting_scripts/create_allowlist.ipynb` to identify the barcodes present in both replicates; these make up the final allowlist.

The allowlist for the multi-v1 library used in our paper is provided in our [GitHub repo](https://github.com/morris-lab/newCloneCalling/tree/main) at `misc_files/18N-multi-v1-allowlist.csv`. The allowlists for the 8N-v1, v2, and v3 libraries are available on [Addgene](https://www.addgene.org/pooled-library/morris-lab-celltag/).
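
The final step of the workflow above, keeping only barcodes seen in both replicates, can be illustrated with a minimal Python sketch. This is not the code in `create_allowlist.ipynb`. It assumes each replicate's collapsed barcodes are available as tab-separated `barcode<TAB>count` lines, and the function names, the `min_count` parameter, and the toy barcodes are all hypothetical, introduced only for illustration. The actual notebook may apply additional filtering.

```python
import csv
import io


def read_collapsed_counts(handle):
    """Read 'barcode<TAB>count' lines into a dict (assumed input format)."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        counts[row[0]] = int(row[1])
    return counts


def build_allowlist(rep1, rep2, min_count=1):
    """Return sorted barcodes with at least min_count reads in BOTH replicates.

    min_count is a hypothetical knob for this sketch, not a documented
    parameter of the CellTag workflow.
    """
    keep1 = {bc for bc, n in rep1.items() if n >= min_count}
    keep2 = {bc for bc, n in rep2.items() if n >= min_count}
    return sorted(keep1 & keep2)


# Toy data standing in for the two replicates' collapsed barcode files.
rep1_text = "ACGTACGT\t120\nTTTTAAAA\t3\nGGGGCCCC\t45\n"
rep2_text = "ACGTACGT\t98\nGGGGCCCC\t51\nCCCCAAAA\t7\n"

rep1 = read_collapsed_counts(io.StringIO(rep1_text))
rep2 = read_collapsed_counts(io.StringIO(rep2_text))

allowlist = build_allowlist(rep1, rep2)
print(allowlist)  # ['ACGTACGT', 'GGGGCCCC']
```

Requiring a barcode to appear in both replicates removes sequences that show up in only one sequencing run, which are more likely to be sequencing or PCR errors than genuine members of the plasmid library.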