Tools Module (tl)
- celltag_tools.tools.read_celltag(celltag_path, sample_prefix=None, assay='RNA', triplet_th=1, starcode_th=2, starcode_path=None, allowlist_path=None, inplace=True)
Reads and processes CellTag read data from file paths, applying filtering, error correction, and allowlisting. Depending on inplace, returns either a new CellTagData object or the processed data as a tuple.
- Args:
- celltag_path (str | list[str]):
Path or list of paths to the CellTag read files (TSV format).
- sample_prefix (str | list[str], optional):
Prefix or list of prefixes to be added to CellTag barcodes. If not provided and multiple paths are specified, prefixes are autogenerated.
- assay (str, optional):
Single-cell assay type. Must be either “RNA” or “ATAC”. Defaults to “RNA”.
- triplet_th (int, optional):
Threshold for filtering out read triplets (UMI or read occurrences) below this count. Defaults to 1.
- starcode_th (int, optional):
Edit distance threshold for collapsing barcodes via Starcode. Defaults to 2.
- starcode_path (str, optional):
Path to the Starcode installation directory. Must contain the executable ‘starcode’.
- allowlist_path (str, optional):
Path to the allowlist file (TSV) containing valid CellTags.
- inplace (bool, optional):
If True, returns a CellTagData object with the processed data set inside it. If False, returns a tuple containing (processed reads, thresholds, sequencing saturation). Defaults to True.
- Returns:
- CellTagData:
If inplace=True, returns a CellTagData object with ct_reads, thresholds, and seq_sat set.
- tuple:
- If inplace=False, returns a tuple of:
pd.DataFrame: Processed CellTag read data.
dict: Dictionary containing thresholds {‘starcode’: starcode_th, ‘triplet’: triplet_th}.
float: Sequencing saturation percentage.
- Raises:
- ValueError: If any of the provided file paths do not exist, if Starcode is not found,
if the allowlist is missing, or if assay is invalid (must be “RNA” or “ATAC”).
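The triplet_th filter described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the function name filter_triplets, the tuple layout (cell barcode, UMI, CellTag), and the toy reads are assumptions made for illustration, and the sketch keeps triplets whose count is at least triplet_th, following the wording "below this count".

```python
from collections import Counter

def filter_triplets(reads, triplet_th=1):
    """Keep (cell, umi, celltag) triplets observed at least `triplet_th` times.

    Hypothetical helper: the real read_celltag works on TSV files and also
    runs Starcode collapsing and allowlisting, which are not shown here.
    """
    counts = Counter(reads)
    return {trip: n for trip, n in counts.items() if n >= triplet_th}

# Toy reads (made-up barcodes): one triplet is seen twice, two are seen once.
reads = [
    ("cellA", "umi1", "ACGTACGT"),
    ("cellA", "umi1", "ACGTACGT"),
    ("cellA", "umi2", "ACGTACGT"),
    ("cellB", "umi3", "TTGGCCAA"),
]

kept_default = filter_triplets(reads)               # threshold 1 keeps everything
kept_strict = filter_triplets(reads, triplet_th=2)  # only the repeated triplet
print(len(kept_default), len(kept_strict))
```

With the default threshold of 1 nothing is removed, which is consistent with the documented default acting as a no-op filter.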
- celltag_tools.tools.create_allow_mtx(ct_obj, overwrite=False, inplace=True)
Creates a sparse allow matrix (cell x CellTag) in the provided CellTagData object, using either UMI counts (for RNA) or read counts (for ATAC).
- Args:
- ct_obj (CellTagData):
A valid CellTagData object containing the ‘ct_reads’ attribute.
- overwrite (bool, optional):
If False (default), raises an error if an allow matrix already exists. If True, overwrites the existing allow matrix.
- inplace (bool, optional):
If True (default), updates the allow_mtx attribute within the ct_obj. If False, returns the created allow matrix and associated row/column labels.
- Returns:
- tuple:
- If inplace=False, returns (allow_mtx, allow_rows, allow_cols), where:
allow_mtx (scipy.sparse.csr_matrix): The constructed allow matrix.
allow_rows (list): List of cell barcodes (rows).
allow_cols (list): List of allowed CellTags (columns).
- Raises:
- ValueError: If ct_obj is not a CellTagData object or if the allow matrix already exists
and overwrite=False.
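As a rough illustration of what a cell x CellTag count matrix holds for the RNA case, the sketch below counts distinct UMIs per (cell, CellTag) pair. It is an assumption-laden stand-in: build_count_matrix is a hypothetical name, and a plain dictionary keyed by (row, column) is used instead of scipy.sparse.csr_matrix so the example runs without third-party packages.

```python
from collections import defaultdict

def build_count_matrix(triplets):
    """Count distinct UMIs per (cell, CellTag) pair.

    Returns (entries, rows, cols), where entries maps (row_idx, col_idx)
    to a UMI count. Hypothetical helper, not the library's function.
    """
    umis = defaultdict(set)
    for cell, umi, tag in triplets:
        umis[(cell, tag)].add(umi)  # duplicate UMIs are counted once

    rows = sorted({cell for cell, _ in umis})
    cols = sorted({tag for _, tag in umis})
    row_idx = {cell: i for i, cell in enumerate(rows)}
    col_idx = {tag: j for j, tag in enumerate(cols)}

    entries = {
        (row_idx[cell], col_idx[tag]): len(umi_set)
        for (cell, tag), umi_set in umis.items()
    }
    return entries, rows, cols

# Toy triplets (made-up identifiers).
triplets = [
    ("cellA", "umi1", "T1"),
    ("cellA", "umi2", "T1"),
    ("cellA", "umi1", "T1"),  # repeated UMI, does not add to the count
    ("cellB", "umi3", "T2"),
]
entries, rows, cols = build_count_matrix(triplets)
print(rows, cols, entries)
```

The returned triple mirrors the documented (allow_mtx, allow_rows, allow_cols) shape only loosely; for ATAC data the documentation states that read counts are used instead of UMI counts.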
- celltag_tools.tools.create_bin_mtx(ct_obj, bin_th=1, overwrite=False, inplace=True)
Binarizes the allow matrix from a CellTagData object. The resulting matrix is stored in bin_mtx if ‘inplace=True’, or returned. Values above bin_th are set to True (1), else False (0).
- Args:
- ct_obj (CellTagData):
A valid CellTagData object containing an allow matrix in allow_mtx.
- bin_th (int, optional):
Threshold for binarization. Defaults to 1.
- overwrite (bool, optional):
If False (default), raises an error if a binarized matrix already exists. If True, overwrites the existing bin_mtx.
- inplace (bool, optional):
If True (default), updates bin_mtx attribute within ct_obj. If False, returns the binarized matrix and associated row/column labels.
- Returns:
- tuple:
- If inplace=False, returns (ct_bin_mtx, cells, celltags), where:
ct_bin_mtx (scipy.sparse.csr_matrix): The binarized matrix.
cells (list): List of cell barcodes (rows).
celltags (list): List of CellTags (columns).
- Raises:
- ValueError: If ct_obj is not a CellTagData object or if the allow_mtx is missing/invalid,
or if a binarized matrix already exists and overwrite=False.
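Binarization can be sketched on the same dictionary-style sparse layout. The comparison below follows the literal wording "values above bin_th", that is, a strict greater-than; whether the library uses a strict or inclusive comparison is not stated here, so treat that choice, the helper name binarize, and the toy counts as assumptions.

```python
def binarize(entries, bin_th=1):
    """Return the set of (row, col) positions whose count exceeds `bin_th`.

    Assumption: strict comparison (count > bin_th), read literally from
    the description. Positions absent from the result are implicitly False.
    """
    return {pos for pos, count in entries.items() if count > bin_th}

# Toy counts keyed by (cell index, CellTag index); values are made up.
counts = {(0, 0): 2, (0, 1): 1, (1, 1): 3}
binary = binarize(counts, bin_th=1)
print(sorted(binary))
```

Under this reading, an entry with a count equal to bin_th is set to False, so it is worth checking the boundary behaviour on real data before relying on the default.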
- celltag_tools.tools.create_metric_mtx(ct_obj, met_lower=1, met_upper=25, overwrite=False, inplace=True)
Performs metric-based filtering on the binarized cell x CellTag matrix to remove cells with too few or too many CellTags (defined by met_lower and met_upper). Produces a filtered matrix stored in metric_mtx if ‘inplace=True’, or returns it.
- Args:
- ct_obj (CellTagData):
A valid CellTagData object containing a binarized matrix in bin_mtx.
- met_lower (int, optional):
Minimum number of CellTags required for a cell to be retained. Defaults to 1.
- met_upper (int, optional):
Maximum number of CellTags allowed for a cell to be retained. Defaults to 25.
- overwrite (bool, optional):
If False (default), raises an error if a metric matrix already exists. If True, overwrites the existing metric_mtx.
- inplace (bool, optional):
If True (default), updates metric_mtx attribute within ct_obj. If False, returns the filtered matrix and associated row/column labels.
- Returns:
- tuple:
- If inplace=False, returns (celltag_mat_met, cells_met, celltags_met), where:
celltag_mat_met (scipy.sparse.csr_matrix): The filtered (metric) matrix.
cells_met (ndarray): Array of filtered cell barcodes (rows).
celltags_met (ndarray): Array of filtered CellTags (columns).
- Raises:
- ValueError: If ct_obj is not a CellTagData object, if bin_mtx is missing/invalid,
or if a metric matrix already exists and overwrite=False.
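The metric filter amounts to counting CellTags per cell and keeping cells inside a range. The sketch below assumes both bounds are inclusive, which matches "minimum required" and "maximum allowed" but is not confirmed by the text; metric_filter is a hypothetical name and the matrix is represented as a set of (cell index, CellTag index) positions.

```python
from collections import Counter

def metric_filter(bin_entries, met_lower=1, met_upper=25):
    """Keep entries of cells whose CellTag count lies in [met_lower, met_upper].

    Assumption: inclusive bounds. Unlike the documented return value, this
    sketch does not re-index rows or prune empty columns.
    """
    tags_per_cell = Counter(row for row, _ in bin_entries)
    keep = {
        row for row, n in tags_per_cell.items()
        if met_lower <= n <= met_upper
    }
    return {(row, col) for row, col in bin_entries if row in keep}

# Cell 0 has three CellTags, cell 1 has one, cell 2 has two.
bin_entries = {(0, 0), (0, 1), (0, 2), (1, 0), (2, 1), (2, 2)}
filtered = metric_filter(bin_entries, met_lower=2, met_upper=2)
print(sorted(filtered))
```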
- celltag_tools.tools.call_clones(ct_obj, jaccard_th=0.7, return_graph=False, overwrite=False, inplace=True)
Identifies clonal relationships among cells based on the Jaccard similarity of their CellTag profiles. Optionally returns the Jaccard matrix, a graph representation, and a clone table, or stores them within the given CellTagData object.
- Args:
- ct_obj (CellTagData):
A valid CellTagData object containing a filtered matrix in metric_mtx.
- jaccard_th (float, optional):
Threshold for Jaccard similarity to consider cells part of the same clone. Defaults to 0.7.
- return_graph (bool, optional):
If True, additionally returns the graph representation of cell clones. Defaults to False.
- overwrite (bool, optional):
If False (default), raises an error if the Jaccard matrix or clone table already exist. If True, overwrites existing data.
- inplace (bool, optional):
If True (default), updates the ct_obj with jaccard_mtx, clone_table, and optionally clone_graph if return_graph=True. If False, returns the requested data.
- Returns:
- tuple | None:
- Depending on inplace and return_graph:
If inplace=False and return_graph=False: (jac_mat, clones).
If inplace=False and return_graph=True: (jac_mat, clone_graph, clones).
- If inplace=True, the function sets the following attributes on the CellTagData object:
ct_obj.jaccard_mtx
ct_obj.clone_table
ct_obj.thresholds[“jaccard”]
ct_obj.clone_graph (only if return_graph=True)
- Where:
jac_mat (scipy.sparse.csr_matrix): Jaccard similarity matrix.
clone_graph (networkx.Graph): Graph where nodes represent cells, and edges represent similarity > jaccard_th.
clones (pd.DataFrame): Table mapping cells to their assigned clones.
- Raises:
- ValueError: If ct_obj is not a CellTagData object, if metric_mtx is missing/invalid,
or if jaccard_mtx or clone_table already exist and overwrite=False.
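To make the thresholding concrete, the sketch below computes Jaccard similarity between per-cell CellTag sets, links pairs whose similarity exceeds jaccard_th (as in the clone_graph description), and groups linked cells. Treating each connected component with at least two cells as a clone is an assumption of this sketch, as are the names jaccard and call_clones_sketch; the library itself works on sparse matrices rather than Python sets.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def call_clones_sketch(cell_tags, jaccard_th=0.7):
    """Group cells linked by similarity > jaccard_th (union-find components)."""
    parent = {cell: cell for cell in cell_tags}

    def find(x):
        # Walk to the component root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(cell_tags, 2):
        if jaccard(cell_tags[a], cell_tags[b]) > jaccard_th:
            parent[find(a)] = find(b)

    groups = {}
    for cell in cell_tags:
        groups.setdefault(find(cell), []).append(cell)
    # Assumption: singletons are not reported as clones.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

# Toy CellTag profiles (made-up identifiers).
cell_tags = {
    "c1": {"A", "B", "C", "D"},
    "c2": {"A", "B", "C", "D"},
    "c3": {"A", "B", "C"},
    "c4": {"X"},
}
clones_found = call_clones_sketch(cell_tags)
print(clones_found)
```

Here c1 and c3 share three of four CellTags (similarity 0.75), which is just above the default threshold, so c3 joins the clone; raising jaccard_th to 0.8 would leave only c1 and c2 together.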
- celltag_tools.tools.assign_fate(ct_obj, fate_col='day', fate_key='d5', cell_type_key='cell_type2', inplace=False)
Assigns a “fate” to each clone in a CellTagData object’s clone table, based on the most frequent cell type (cell_type_key) present at a specified time point (fate_key in column fate_col).
For each clone (identified by clone.id), the function finds rows in the clone table where fate_col == fate_key and determines the most common cell_type_key among those rows. This value is assigned as the clone’s “fate,” along with the percentage of cells (fate_pct) that match this fate within that clone at fate_key. If no cells meet the fate criteria (e.g., time point is missing), the clone is labeled with fate=’no_fate_cells’ and fate_pct=0.
- Args:
- ct_obj (CellTagData):
A valid CellTagData object containing clone_table.
- fate_col (str, optional):
Column name in clone_table that defines the time point or condition used to assign fate. Defaults to ‘day’.
- fate_key (str, optional):
A value in fate_col specifying which rows represent the “fate” condition. Defaults to ‘d5’.
- cell_type_key (str, optional):
Column name in clone_table that specifies the cell type. Defaults to ‘cell_type2’.
- inplace (bool, optional):
If True, updates ct_obj.clone_table directly. If False (default), returns a modified DataFrame without changing ct_obj.
- Returns:
- pandas.DataFrame | None:
If inplace=False, returns the updated clone table with new columns fate and fate_pct.
If inplace=True, the function returns None and updates ct_obj.clone_table in place.
- Raises:
- ValueError: If ct_obj is not a CellTagData object, if clone_table is missing
or invalid, or if the specified columns (fate_col, cell_type_key) are not found in the table.
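The fate rule described above can be sketched with plain dictionaries standing in for clone_table rows. The helper name assign_fate_sketch and the toy rows are assumptions, and so is the scale of fate_pct: the sketch reports a 0 to 100 percentage computed over the clone's cells at fate_key, while the library's exact scale and tie-breaking are not specified here.

```python
from collections import Counter

def assign_fate_sketch(rows, fate_col="day", fate_key="d5",
                       cell_type_key="cell_type2"):
    """Map each clone.id to (fate, fate_pct) using the most common cell type
    among the clone's rows where row[fate_col] == fate_key."""
    by_clone = {}
    for row in rows:
        by_clone.setdefault(row["clone.id"], []).append(row)

    fates = {}
    for clone_id, members in by_clone.items():
        types = [m[cell_type_key] for m in members if m[fate_col] == fate_key]
        if not types:
            # No cells at the fate time point for this clone.
            fates[clone_id] = ("no_fate_cells", 0)
            continue
        fate, n = Counter(types).most_common(1)[0]
        fates[clone_id] = (fate, 100 * n / len(types))
    return fates

# Toy clone table rows (made-up values).
rows = [
    {"clone.id": 1, "day": "d2", "cell_type2": "progenitor"},
    {"clone.id": 1, "day": "d5", "cell_type2": "neuron"},
    {"clone.id": 1, "day": "d5", "cell_type2": "neuron"},
    {"clone.id": 1, "day": "d5", "cell_type2": "glia"},
    {"clone.id": 1, "day": "d5", "cell_type2": "neuron"},
    {"clone.id": 2, "day": "d2", "cell_type2": "progenitor"},
]
fates = assign_fate_sketch(rows)
print(fates)
```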
- celltag_tools.tools.naive_atac_rna_pairing(ct_obj, seed=100, state_day=None, add_fate=True)
Performs a naive pairing of cells labeled as ATAC with those labeled as RNA within the same clone, using the clone_table of a CellTagData object. Pairing is random and ensures that every ATAC sibling is matched to an RNA sibling; if the two sets differ in size, the pairing may loop over cells again.
- Args:
- ct_obj (CellTagData):
A valid CellTagData object containing clone_table.
- seed (int, optional):
Seed value for NumPy’s random generator to ensure reproducible pairings. Defaults to 100.
- state_day (str | None, optional):
If provided, restricts the pairing to cells in the clone_table where ‘day’ == state_day. Defaults to None.
- add_fate (bool, optional):
If True, attempts to append an additional row containing the fate value for all paired cells. The column ‘fate’ must exist in clone_table. Defaults to True.
- Returns:
- numpy.ndarray:
A 2D array of shape (2, N) or (3, N), where N is the total number of pairs.
- First row: ATAC cell barcodes.
- Second row: RNA cell barcodes.
- Third row (optional): The single fate value repeated for each pair (only if add_fate is True and the ‘fate’ column exists).
- Raises:
- ValueError: If ct_obj is invalid, if clone_table is missing or not a DataFrame,
or if add_fate=True but ‘fate’ column is missing.
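One possible reading of the within-clone pairing is sketched below: both sibling lists are shuffled with a seeded generator and the shorter list is cycled until every cell of the longer list has a partner. This interpretation of "looping over" is an assumption, as is the helper name pair_siblings; the sketch also uses Python's random module rather than NumPy's generator, so it will not reproduce the library's actual pairings for a given seed.

```python
import random
from itertools import cycle, islice

def pair_siblings(atac, rna, seed=100):
    """Randomly pair ATAC and RNA cells of one clone.

    Returns [atac_row, rna_row], two equal-length lists. The shorter list
    is cycled so that every cell appears at least once (assumed behaviour).
    """
    atac, rna = list(atac), list(rna)
    if not atac or not rna:
        return [[], []]  # nothing to pair in this clone

    rng = random.Random(seed)  # seeded for reproducible pairings
    rng.shuffle(atac)
    rng.shuffle(rna)

    n = max(len(atac), len(rna))
    return [list(islice(cycle(atac), n)), list(islice(cycle(rna), n))]

# Toy clone with three ATAC cells and two RNA cells (made-up barcodes).
pairs = pair_siblings(["a1", "a2", "a3"], ["r1", "r2"])
print(pairs)
```

Calling the function again with the same seed returns the same pairing, which is the property the seed argument is documented to provide.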
- celltag_tools.tools.get_clone_celltag_mtx(ct_obj, sig_type='core')
Builds a clone-by-CellTag matrix from the metric-filtered matrix in a CellTagData object, based on which CellTags are present in each clone.
For each clone (from ct_obj.clone_table):
- A sub-matrix of the metric-filtered matrix (ct_obj.metric_mtx) is extracted for cells belonging to that clone.
- Depending on sig_type, a list of CellTags is selected:
“core”: CellTags present in more than one cell of the clone.
“union”: CellTags present in at least one cell of the clone.
- Each clone’s chosen CellTags are accumulated.
Finally, this information is converted into a sparse matrix via table_to_spmtx, returning a clone-by-CellTag matrix of ones (indicating presence of each CellTag in a particular clone).
- Args:
- ct_obj (CellTagData):
A valid CellTagData object, which must include metric_mtx and a clone_table.
- sig_type (str, optional):
Determines which CellTags define the clone’s “signature”. Defaults to “core”.
- “core”: CellTags present in more than one cell of the clone.
- “union”: CellTags present in at least one cell of the clone.
- Returns:
- tuple:
- (sparse_mtx, row_labels, col_labels) as returned by table_to_spmtx, where:
row_labels are clone IDs.
col_labels are CellTag identifiers.
sparse_mtx is a clone-by-CellTag matrix of ones indicating presence.
- Raises:
- ValueError: If ct_obj is invalid or does not contain the required
metric matrix (metric_mtx) or clone_table.
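The difference between the two sig_type options can be shown on a single clone. The sketch below is a simplified stand-in that takes one set of CellTags per cell instead of a sparse sub-matrix; clone_signature is a hypothetical name and the CellTag labels are made up.

```python
from collections import Counter

def clone_signature(cells, sig_type="core"):
    """Select the CellTags that define one clone's signature.

    `cells` is a list with one set of CellTags per cell in the clone.
    "core" keeps CellTags found in more than one cell, "union" keeps
    CellTags found in at least one cell.
    """
    if sig_type not in ("core", "union"):
        raise ValueError("sig_type must be 'core' or 'union'")
    min_cells = 2 if sig_type == "core" else 1

    # Each cell contributes at most one count per CellTag because it is a set.
    cells_per_tag = Counter(tag for tags in cells for tag in tags)
    return sorted(tag for tag, n in cells_per_tag.items() if n >= min_cells)

# One toy clone with three cells.
clone_cells = [{"A", "B"}, {"A", "C"}, {"A", "B"}]
core_sig = clone_signature(clone_cells, "core")
union_sig = clone_signature(clone_cells, "union")
print(core_sig, union_sig)
```

CellTag C appears in only one cell, so it is part of the union signature but not the core signature.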
- celltag_tools.tools.ident_sparse_clones(ct_obj, n_largest=10, density_th=0.2, plot=False, **kwargs)
Identifies the “sparse” clones among the largest clones in a given metadata table, defined by an edge density threshold. Optionally generates a scatter plot of clone size vs. edge density.
- Args:
- clone_info (pd.DataFrame):
A DataFrame containing per-clone metadata, including columns:
- ‘clone.id’
- ‘size’ (number of cells in each clone)
- ‘edge.den’ (edge density of the clone subgraph)
- n_largest (int, optional):
Number of top clones by size to consider for filtering. Defaults to 10.
- density_th (float, optional):
Maximum edge density for a clone to be considered “sparse.” Defaults to 0.2.
- plot (bool, optional):
If True, returns a matplotlib Axes object with a scatter plot of clone size vs. edge density. Defaults to False.
- **kwargs:
Additional keyword arguments passed to the plotting function (e.g., marker size).
- Returns:
- pd.DataFrame | tuple[None, matplotlib.axes.Axes]:
If any sparse clones are found, returns a DataFrame subset of clone_info containing only those sparse clones. If plot=True, also returns the Axes object.
If no sparse clones are found, returns None. If plot=True, returns (None, Axes).
- Notes:
Sparse clones are defined here as clones that rank among the top n_largest by size but have edge.den < density_th.
The optional plotting is handled by plot_size_by_den.
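The selection rule in the notes above (top n_largest clones by size, then edge.den < density_th) can be sketched on a list of dictionaries standing in for the per-clone metadata table. The helper name sparse_clones and the toy values are assumptions; plotting is omitted.

```python
def sparse_clones(clone_info, n_largest=10, density_th=0.2):
    """Return records of clones that are among the `n_largest` by size
    and have an edge density strictly below `density_th`."""
    largest = sorted(clone_info, key=lambda r: r["size"], reverse=True)
    return [r for r in largest[:n_largest] if r["edge.den"] < density_th]

# Toy per-clone metadata (made-up values).
clone_info = [
    {"clone.id": 1, "size": 50, "edge.den": 0.10},
    {"clone.id": 2, "size": 40, "edge.den": 0.90},
    {"clone.id": 3, "size": 30, "edge.den": 0.15},
    {"clone.id": 4, "size": 5, "edge.den": 0.05},
]
sparse = sparse_clones(clone_info, n_largest=3)
sparse_ids = [r["clone.id"] for r in sparse]
print(sparse_ids)
```

Clone 4 has the lowest edge density but is not among the three largest clones, so it is not reported.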
- celltag_tools.tools.fix_sparse_clones(ct_obj, sparse_ids=None)
Reassigns cells from “sparse” clones by splitting them into maximal cliques, then recombines all clones into a new clone table. Useful for refining clone assignments after initial clone calling.
Specifically, for each clone in ct_obj.clone_graph whose index is in sparse_ids, we repeatedly extract the largest clique and mark those cells as a new clone until no edges remain.
- Args:
- ct_obj (CellTagData):
A valid CellTagData object with a clone_graph attribute representing clonal subgraphs and a clone_table.
- sparse_ids (array-like | None, optional):
List of clone IDs (1-based) to be split. If None, the function does nothing and returns immediately. Defaults to None.
- Returns:
- pd.DataFrame | None:
If inplace=True, updates ct_obj.clone_table with the newly rebuilt clone assignments and returns None.
Otherwise, returns a new clone table (pd.DataFrame) without modifying ct_obj.
- Raises:
- ValueError: If cell number checks fail or if ct_obj is not valid.
- Notes:
This function references an inplace check near the end, but there’s no formal inplace argument in its signature. If you want in-place updates, consider adding inplace=True to the signature.
The final clone IDs are re-enumerated starting from 1.
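The clique-splitting loop can be sketched without a graph library. The code below finds a maximum clique with a basic Bron-Kerbosch search, removes it, and repeats until no edges remain. It is a minimal sketch under assumptions: the graph is an adjacency dictionary rather than a networkx graph, the names largest_clique and split_into_cliques are hypothetical, and returning leftover isolated cells separately is a choice of this example, since the handling of such cells is not described above.

```python
def largest_clique(adj):
    """Return one maximum clique of `adj` (node -> set of neighbours)."""
    best = set()

    def expand(r, p, x):
        # Basic Bron-Kerbosch: r is the current clique, p the candidates,
        # x the nodes already explored for this clique.
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

def split_into_cliques(adj):
    """Repeatedly peel off the largest clique until no edges remain."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    new_clones = []
    while any(adj.values()):  # at least one edge is left
        clique = largest_clique(adj)
        new_clones.append(sorted(clique))
        for node in clique:
            del adj[node]
        for nbrs in adj.values():
            nbrs -= clique
    return new_clones, sorted(adj)  # leftover cells have no edges

# Toy sparse clone: a triangle a-b-c, a tail c-f, and a separate pair d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "f"},
    "d": {"e"},
    "e": {"d"},
    "f": {"c"},
}
new_clones, leftover = split_into_cliques(graph)
print(new_clones, leftover)
```

Exhaustive clique search is exponential in the worst case, so this approach is only practical for the small subgraphs typical of individual clones.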