celltag_tools package

Submodules

celltag_tools.celltag_data module

class celltag_tools.celltag_data.CellTagData(ct_reads=None, thresholds=None, seq_sat=None, clone_graph=None, jaccard_mtx=None, clone_table=None, clone_info=None)

Bases: object

A container class for storing and managing CellTag data, including read data, thresholds, various matrices, and clone information. Provides methods for serialization and easy attribute access.

Attributes:
ct_reads (pd.DataFrame | None):

The raw or processed CellTag read data.

thresholds (dict | None):

A dictionary of thresholds used at various processing steps (e.g., ‘starcode’, ‘triplet’, ‘binarization’, etc.).

seq_sat (float | None):

Sequencing saturation value.

clone_graph (igraph.Graph | None):

Graph representing clone relationships among cells.

allow_mtx (celltag_mtx_dict):

Dictionary-like container for the allow matrix and its axes.

bin_mtx (celltag_mtx_dict):

Dictionary-like container for the binarized matrix and its axes.

metric_mtx (celltag_mtx_dict):

Dictionary-like container for the metric-filtered matrix and its axes.

jaccard_mtx (scipy.sparse.spmatrix | None):

Jaccard similarity matrix.

clone_table (pd.DataFrame | None):

Table mapping cells to their clones.

clone_info (pd.DataFrame | None):

Table containing clone level metadata. Defaults to None.

save(path)

Saves the current CellTagData object to a file using pickle serialization.

Args:
path (str):

The file path where the serialized object will be stored.

class celltag_tools.celltag_data.celltag_mtx_dict(initial_data)

Bases: object

A specialized dictionary-like container for a sparse matrix (‘mtx’) and its associated row (‘cells’) and column (‘celltags’) labels, specifically tailored for the CellTagData workflow.

This class strictly maintains three keys:
  • ‘mtx’: A scipy.sparse matrix representing the cell-tag matrix.

  • ‘cells’: A numpy.ndarray of cell identifiers (e.g., barcodes).

  • ‘celltags’: A numpy.ndarray of tag identifiers (e.g., CellTags).

Any attempt to add keys beyond these three raises a KeyError. Standard dictionary operations (e.g., indexing, iteration) are supported, but the structure is ensured to remain consistent with these fixed keys.

Key Features:
  • Only three keys (‘mtx’, ‘cells’, ‘celltags’) are permitted.

  • You can retrieve or set each key via subscript notation (e.g., obj[‘mtx’]).

  • The is_empty() method returns True if all three keys are set to None.

  • The length (len(obj)) is always 3.

  • The __repr__ method provides a concise, formatted view of the data.

Example Usage:

ctdict = celltag_mtx_dict({'mtx': None, 'cells': None, 'celltags': None})
if ctdict.is_empty():
    print("All entries are currently None.")

ctdict['mtx'] = my_sparse_matrix
ctdict['cells'] = my_cell_array
ctdict['celltags'] = my_celltag_array

keys()
values()
items()
is_empty()

Check if all keys are set to None.
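
The fixed-key behaviour described for celltag_mtx_dict can be sketched with a small stand-in class. FixedKeyDict below is a hypothetical illustration, not the library's implementation; it only reproduces the documented rules (three permitted keys, KeyError for anything else, a length of 3, and is_empty()).

```python
class FixedKeyDict:
    """Illustrative stand-in (not the library class) with three fixed keys."""

    _KEYS = ("mtx", "cells", "celltags")

    def __init__(self, initial_data):
        self._data = {key: None for key in self._KEYS}
        for key, value in initial_data.items():
            self[key] = value  # reuse the key check in __setitem__

    def __setitem__(self, key, value):
        if key not in self._KEYS:
            raise KeyError(f"Only {self._KEYS} are allowed, got {key!r}")
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        # The length is fixed because the key set never changes.
        return len(self._KEYS)

    def is_empty(self):
        return all(value is None for value in self._data.values())


ct = FixedKeyDict({"mtx": None, "cells": None, "celltags": None})
empty_before = ct.is_empty()
ct["cells"] = ["cell_1", "cell_2"]
empty_after = ct.is_empty()

try:
    ct["genes"] = []  # not one of the three permitted keys
    rejected = False
except KeyError:
    rejected = True
```

Rejecting unknown keys at assignment time means downstream code can rely on all three entries being present, even when some are still None.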

celltag_tools.plotting module

celltag_tools.plotting.diagnostic_plots(ct_obj, mtx_use='metric')

Generates diagnostic scatter and histogram plots for a specified matrix (allow, bin, or metric) within a CellTagData object.

Args:
ct_obj (CellTagData):

The CellTagData object containing the matrix to be visualized.

mtx_use (str, optional):

The type of matrix to plot. Must be one of {“allow”, “bin”, “metric”}. Defaults to “metric”.

Raises:
ValueError:

If ct_obj is not a valid CellTagData object, or if mtx_use is not in {“metric”, “allow”, “bin”}.

Notes:

The function checks whether the selected matrix dictionary (allow_mtx, bin_mtx, or metric_mtx, depending on mtx_use) is valid, then creates a 2x2 figure showing:
  • A scatter plot of row sums (CellTags per cell).

  • A scatter plot of column sums (cells per CellTag).

  • A histogram of row sums.

  • A histogram of column sums.

celltag_tools.plotting.plot_size_by_den(ct_obj, highlight_clones=None, ax=None, **kwargs)

Creates a scatter plot of clone size versus edge density for visualization, with the option to highlight specific clones by coloring or labeling them.

Args:
ct_obj (CellTagData):

A CellTagData object whose clone_info table holds per-clone metadata, expected to have columns:
  • ‘edge.den’: The clone’s edge density.

  • ‘size’: The clone’s size (number of cells).

  • ‘clone.id’: The clone ID for labeling (if highlighting).

highlight_clones (list | np.ndarray | None, optional):

A list of clone IDs to highlight and label in red. Defaults to None.

ax (matplotlib.axes.Axes | None, optional):

A matplotlib Axes object on which to draw the plot. If None, the current Axes (plt.gca()) is used. Defaults to None.

**kwargs:

Additional keyword arguments passed to sns.scatterplot.

Returns:
matplotlib.axes.Axes:

The Axes object containing the scatter plot.

Raises:
ValueError: If clone_info is missing or invalid, or if highlight_clones is not a valid list/array when provided.

celltag_tools.tools module

celltag_tools.tools.read_celltag(celltag_path, sample_prefix=None, assay='RNA', triplet_th=1, starcode_th=2, starcode_path=None, allowlist_path=None, inplace=True)

Reads and processes CellTag read data from file paths, applying filtering, error correction, and allowlisting. Optionally returns a new CellTagData object or the processed data.

Args:
celltag_path (str | list[str]):

Path or list of paths to the CellTag read files (TSV format).

sample_prefix (str | list[str], optional):

Prefix or list of prefixes to be added to CellTag barcodes. If not provided and multiple paths are specified, prefixes are autogenerated.

assay (str, optional):

Single-cell assay type. Must be either “RNA” or “ATAC”. Defaults to “RNA”.

triplet_th (int, optional):

Threshold for filtering out read triplets (UMI or read occurrences) below this count. Defaults to 1.

starcode_th (int, optional):

Edit distance threshold for collapsing barcodes via Starcode. Defaults to 2.

starcode_path (str, optional):

Path to the Starcode installation directory. Must contain the executable ‘starcode’.

allowlist_path (str, optional):

Path to the allowlist file (TSV) containing valid CellTags.

inplace (bool, optional):

If True, returns a CellTagData object with the processed data set inside it. If False, returns a tuple containing (processed reads, thresholds, sequencing saturation). Defaults to True.

Returns:
CellTagData:

If inplace=True, returns a CellTagData object with ct_reads, thresholds, and seq_sat set.

tuple:
If inplace=False, returns a tuple of:
  • pd.DataFrame: Processed CellTag read data.

  • dict: Dictionary containing thresholds {‘starcode’: starcode_th, ‘triplet’: triplet_th}.

  • float: Sequencing saturation percentage.

Raises:
ValueError: If any of the provided file paths do not exist, if Starcode is not found, if the allowlist is missing, or if assay is not “RNA” or “ATAC”.

celltag_tools.tools.create_allow_mtx(ct_obj, overwrite=False, inplace=True)

Creates a sparse allow matrix (cell x CellTag) in the provided CellTagData object, using either UMI counts (for RNA) or read counts (for ATAC).

Args:
ct_obj (CellTagData):

A valid CellTagData object containing the ‘ct_reads’ attribute.

overwrite (bool, optional):

If False (default), raises an error if an allow matrix already exists. If True, overwrites the existing allow matrix.

inplace (bool, optional):

If True (default), updates the allow_mtx attribute within the ct_obj. If False, returns the created allow matrix and associated row/column labels.

Returns:
tuple:
If inplace=False, returns (allow_mtx, allow_rows, allow_cols), where:
  • allow_mtx (scipy.sparse.csr_matrix): The constructed allow matrix.

  • allow_rows (list): List of cell barcodes (rows).

  • allow_cols (list): List of allowed CellTags (columns).

Raises:
ValueError: If ct_obj is not a CellTagData object, or if the allow matrix already exists and overwrite=False.

celltag_tools.tools.create_bin_mtx(ct_obj, bin_th=1, overwrite=False, inplace=True)

Binarizes the allow matrix from a CellTagData object. The resulting matrix is stored in bin_mtx if ‘inplace=True’, or returned. Values above bin_th are set to True (1), else False (0).

Args:
ct_obj (CellTagData):

A valid CellTagData object containing an allow matrix in allow_mtx.

bin_th (int, optional):

Threshold for binarization. Defaults to 1.

overwrite (bool, optional):

If False (default), raises an error if a binarized matrix already exists. If True, overwrites the existing bin_mtx.

inplace (bool, optional):

If True (default), updates bin_mtx attribute within ct_obj. If False, returns the binarized matrix and associated row/column labels.

Returns:
tuple:
If inplace=False, returns (ct_bin_mtx, cells, celltags), where:
  • ct_bin_mtx (scipy.sparse.csr_matrix): The binarized matrix.

  • cells (list): List of cell barcodes (rows).

  • celltags (list): List of CellTags (columns).

Raises:
ValueError: If ct_obj is not a CellTagData object, if allow_mtx is missing/invalid, or if a binarized matrix already exists and overwrite=False.
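
The binarization rule can be illustrated on a toy sparse table. This is a minimal sketch, not the library code: the dictionary-of-counts layout and the helper name binarize are assumptions, and "above bin_th" is read here as a strict comparison (count > bin_th).

```python
def binarize(counts, bin_th=1):
    """Return the (cell, celltag) entries whose count exceeds bin_th.

    `counts` maps (cell, celltag) pairs to UMI or read counts; entries that
    are absent from the returned set correspond to False (0).
    """
    # Assumption: "above" is interpreted as strictly greater than bin_th.
    return {key for key, count in counts.items() if count > bin_th}


counts = {
    ("cell_1", "tagA"): 5,
    ("cell_1", "tagB"): 1,
    ("cell_2", "tagA"): 2,
}
kept = binarize(counts, bin_th=1)
```

Under this reading, the entry with a count of 1 is dropped when bin_th=1; if the library uses an inclusive comparison instead, that entry would be kept.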

celltag_tools.tools.create_metric_mtx(ct_obj, met_lower=1, met_upper=25, overwrite=False, inplace=True)

Performs metric-based filtering on the binarized cell x CellTag matrix to remove cells with too few or too many CellTags (defined by met_lower and met_upper). Produces a filtered matrix stored in metric_mtx if ‘inplace=True’, or returns it.

Args:
ct_obj (CellTagData):

A valid CellTagData object containing a binarized matrix in bin_mtx.

met_lower (int, optional):

Minimum number of CellTags required for a cell to be retained. Defaults to 1.

met_upper (int, optional):

Maximum number of CellTags allowed for a cell to be retained. Defaults to 25.

overwrite (bool, optional):

If False (default), raises an error if a metric matrix already exists. If True, overwrites the existing metric_mtx.

inplace (bool, optional):

If True (default), updates metric_mtx attribute within ct_obj. If False, returns the filtered matrix and associated row/column labels.

Returns:
tuple:
If inplace=False, returns (celltag_mat_met, cells_met, celltags_met), where:
  • celltag_mat_met (scipy.sparse.csr_matrix): The filtered (metric) matrix.

  • cells_met (ndarray): Array of filtered cell barcodes (rows).

  • celltags_met (ndarray): Array of filtered CellTags (columns).

Raises:
ValueError: If ct_obj is not a CellTagData object, if bin_mtx is missing/invalid, or if a metric matrix already exists and overwrite=False.
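
The filtering step can be sketched in plain Python as follows. The helper name metric_filter and the set-of-pairs input are hypothetical, and both bounds are assumed to be inclusive, which the description above does not state explicitly.

```python
from collections import defaultdict


def metric_filter(binary_entries, met_lower=1, met_upper=25):
    """Keep entries of cells whose CellTag count lies in [met_lower, met_upper]."""
    tags_per_cell = defaultdict(set)
    for cell, tag in binary_entries:
        tags_per_cell[cell].add(tag)

    # Assumption: both bounds are inclusive.
    kept_cells = {
        cell
        for cell, tags in tags_per_cell.items()
        if met_lower <= len(tags) <= met_upper
    }
    return {(cell, tag) for cell, tag in binary_entries if cell in kept_cells}


entries = {
    ("cell_1", "A"), ("cell_1", "B"),
    ("cell_2", "A"), ("cell_2", "B"), ("cell_2", "C"), ("cell_2", "D"),
    ("cell_3", "A"),
}
# cell_3 has too few CellTags and cell_2 too many, so only cell_1 survives.
filtered = metric_filter(entries, met_lower=2, met_upper=3)
```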

celltag_tools.tools.call_clones(ct_obj, jaccard_th=0.7, return_graph=False, overwrite=False, inplace=True)

Identifies clonal relationships among cells based on the Jaccard similarity of their CellTag profiles. Optionally returns the Jaccard matrix, a graph representation, and a clone table, or stores them within the given CellTagData object.

Args:
ct_obj (CellTagData):

A valid CellTagData object containing a filtered matrix in metric_mtx.

jaccard_th (float, optional):

Threshold for Jaccard similarity to consider cells part of the same clone. Defaults to 0.7.

return_graph (bool, optional):

If True, additionally returns the graph representation of cell clones. Defaults to False.

overwrite (bool, optional):

If False (default), raises an error if the Jaccard matrix or clone table already exist. If True, overwrites existing data.

inplace (bool, optional):

If True (default), updates the ct_obj with jaccard_mtx, clone_table, and optionally clone_graph if return_graph=True. If False, returns the requested data.

Returns:
tuple | None:
Depending on inplace and return_graph:
  • If inplace=False and return_graph=False: (jac_mat, clones).

  • If inplace=False and return_graph=True: (jac_mat, clone_graph, clones).

  • If inplace=True, the function sets the following attributes on the CellTagData object:
    • ct_obj.jaccard_mtx

    • ct_obj.clone_table

    • ct_obj.thresholds[“jaccard”]

    • ct_obj.clone_graph (only if return_graph=True)

Where:
  • jac_mat (scipy.sparse.csr_matrix): Jaccard similarity matrix.

  • clone_graph (networkx.Graph): Graph where nodes represent cells, and edges represent similarity > jaccard_th.

  • clones (pd.DataFrame): Table mapping cells to their assigned clones.

Raises:
ValueError: If ct_obj is not a CellTagData object, if metric_mtx is missing/invalid, or if jaccard_mtx or clone_table already exist and overwrite=False.
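
One way to picture clone calling is to link cells whose Jaccard similarity exceeds the threshold and then group linked cells. The sketch below makes several assumptions that the description above does not confirm: clones are taken to be connected components of the thresholded graph, singleton cells are dropped, and clone IDs are numbered from 1. The name call_clones_sketch and the dictionary input are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def call_clones_sketch(cell_tags, jaccard_th=0.7):
    """Group cells connected by edges with similarity > jaccard_th."""
    parent = {cell: cell for cell in cell_tags}

    def find(cell):
        # Union-find lookup with path halving.
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]
            cell = parent[cell]
        return cell

    for a, b in combinations(cell_tags, 2):
        if jaccard(cell_tags[a], cell_tags[b]) > jaccard_th:
            parent[find(a)] = find(b)

    groups = {}
    for cell in cell_tags:
        groups.setdefault(find(cell), []).append(cell)

    # Assumption: a clone needs at least two cells; IDs start at 1.
    clones = sorted(sorted(members) for members in groups.values() if len(members) > 1)
    return {clone_id: members for clone_id, members in enumerate(clones, start=1)}


cell_tags = {
    "cell_1": {"A", "B", "C"},
    "cell_2": {"A", "B", "C"},
    "cell_3": {"A", "B", "C", "D"},
    "cell_4": {"X", "Y"},
    "cell_5": {"X", "Y"},
    "cell_6": {"Q"},
}
clones = call_clones_sketch(cell_tags)
```

Here cell_3 joins the first clone because its similarity to cell_1 is 3/4, which is above the default threshold of 0.7; raising jaccard_th to 0.8 would leave it unassigned.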

celltag_tools.tools.assign_fate(ct_obj, fate_col='day', fate_key='d5', cell_type_key='cell_type2', inplace=False)

Assigns a “fate” to each clone in a CellTagData object’s clone table, based on the most frequent cell type (cell_type_key) present at a specified time point (fate_key in column fate_col).

For each clone (identified by clone.id), the function finds rows in the clone table where fate_col == fate_key and determines the most common cell_type_key among those rows. This value is assigned as the clone’s “fate,” along with the percentage of cells (fate_pct) that match this fate within that clone at fate_key. If no cells meet the fate criteria (e.g., time point is missing), the clone is labeled with fate=’no_fate_cells’ and fate_pct=0.

Args:
ct_obj (CellTagData):

A valid CellTagData object containing clone_table.

fate_col (str, optional):

Column name in clone_table that defines the time point or condition used to assign fate. Defaults to ‘day’.

fate_key (str, optional):

A value in fate_col specifying which rows represent the “fate” condition. Defaults to ‘d5’.

cell_type_key (str, optional):

Column name in clone_table that specifies the cell type. Defaults to ‘cell_type2’.

inplace (bool, optional):

If True, updates ct_obj.clone_table directly. If False (default), returns a modified DataFrame without changing ct_obj.

Returns:
pandas.DataFrame | None:
  • If inplace=False, returns the updated clone table with new columns fate and fate_pct.

  • If inplace=True, the function returns None and updates ct_obj.clone_table in place.

Raises:
ValueError: If ct_obj is not a CellTagData object, if clone_table is missing or invalid, or if the specified columns (fate_col, cell_type_key) are not found in the table.
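
The fate rule can be sketched without pandas by treating the clone table as a list of row dictionaries. The helper name assign_fate_sketch is hypothetical. Two details are assumptions: fate_pct is expressed on a 0 to 100 scale, and ties between equally common cell types are broken by first appearance.

```python
from collections import Counter, defaultdict


def assign_fate_sketch(rows, fate_col="day", fate_key="d5", cell_type_key="cell_type2"):
    """Map each clone ID to (fate, fate_pct) using rows where fate_col == fate_key."""
    types_at_fate = defaultdict(list)
    clone_ids = set()
    for row in rows:
        clone_ids.add(row["clone.id"])
        if row[fate_col] == fate_key:
            types_at_fate[row["clone.id"]].append(row[cell_type_key])

    fates = {}
    for clone_id in sorted(clone_ids):
        cell_types = types_at_fate.get(clone_id, [])
        if not cell_types:
            # No cells at the fate time point for this clone.
            fates[clone_id] = ("no_fate_cells", 0)
            continue
        fate, count = Counter(cell_types).most_common(1)[0]
        # Assumption: percentage on a 0-100 scale.
        fates[clone_id] = (fate, 100 * count / len(cell_types))
    return fates


rows = [
    {"clone.id": 1, "day": "d5", "cell_type2": "typeA"},
    {"clone.id": 1, "day": "d5", "cell_type2": "typeA"},
    {"clone.id": 1, "day": "d5", "cell_type2": "typeB"},
    {"clone.id": 1, "day": "d3", "cell_type2": "typeB"},
    {"clone.id": 2, "day": "d3", "cell_type2": "typeA"},
]
fates = assign_fate_sketch(rows)
```

Clone 1 is assigned typeA because two of its three d5 cells carry that label; the d3 cell is ignored. Clone 2 has no d5 cells and therefore receives the no_fate_cells label.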

celltag_tools.tools.naive_atac_rna_pairing(ct_obj, seed=100, state_day=None, add_fate=True)

Performs a naive pairing of cells labeled as ATAC with those labeled as RNA within the same clone, using clone_table of a CellTagData object. Random pairing is done so that every ATAC sibling is matched to an RNA sibling, potentially looping over if the sets differ in size.

Args:
ct_obj (CellTagData):

A valid CellTagData object containing clone_table.

seed (int, optional):

Seed value for NumPy’s random generator to ensure reproducible pairings. Defaults to 100.

state_day (str | None, optional):

If provided, restricts the pairing to cells in the clone_table where ‘day’ == state_day. Defaults to None.

add_fate (bool, optional):

If True, attempts to append an additional row containing the fate value for all paired cells. The column ‘fate’ must exist in clone_table. Defaults to True.

Returns:
numpy.ndarray:

A 2D array of shape (2, N) or (3, N), where N is the total number of pairs:
  • First row: ATAC cell barcodes.

  • Second row: RNA cell barcodes.

  • Third row (optional): The single fate value repeated for each pair (only if add_fate is True and the ‘fate’ column exists).

Raises:
ValueError: If ct_obj is invalid, if clone_table is missing or not a DataFrame, or if add_fate=True but the ‘fate’ column is missing.
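
The "looping over" behaviour for a single clone might look like the sketch below. It is illustrative only: it uses Python's random module rather than NumPy's generator, the helper name pair_atac_rna is hypothetical, and it assumes RNA cells are the ones reused when ATAC cells outnumber them.

```python
import random
from itertools import cycle


def pair_atac_rna(atac_cells, rna_cells, seed=100):
    """Pair every ATAC cell with an RNA cell, reusing RNA cells if needed."""
    if not atac_cells or not rna_cells:
        return []
    rng = random.Random(seed)  # seeded for reproducible pairings
    shuffled_rna = list(rna_cells)
    rng.shuffle(shuffled_rna)
    # cycle() starts over when the RNA cells run out; zip() stops once
    # every ATAC cell has a partner.
    return list(zip(atac_cells, cycle(shuffled_rna)))


pairs = pair_atac_rna(["atac_1", "atac_2", "atac_3"], ["rna_1", "rna_2"])
```

With three ATAC cells and two RNA cells, the third ATAC cell is paired with the same RNA cell as the first.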

celltag_tools.tools.get_clone_celltag_mtx(ct_obj, sig_type='core')

Builds a clone-by-CellTag matrix from the metric-filtered matrix in a CellTagData object, based on which CellTags are present in each clone.

For each clone (from ct_obj.clone_table):
  • A sub-matrix of the metric-filtered matrix (ct_obj.metric_mtx) is extracted for cells belonging to that clone.

  • Depending on sig_type, a list of CellTags is selected:
    • “core”: CellTags present in more than one cell of the clone.

    • “union”: CellTags present in at least one cell of the clone.

  • Each clone’s chosen CellTags are accumulated.

Finally, this information is converted into a sparse matrix via table_to_spmtx, returning a clone-by-CellTag matrix of ones (indicating presence of each CellTag in a particular clone).

Args:
ct_obj (CellTagData):

A valid CellTagData object, which must include metric_mtx and a clone_table.

sig_type (str, optional):

Determines which CellTags define the clone’s “signature”. Defaults to “core”.
  • “core”: CellTags present in more than one cell of the clone.

  • “union”: CellTags present in at least one cell of the clone.

Returns:
tuple:
(sparse_mtx, row_labels, col_labels) as returned by table_to_spmtx, where:
  • row_labels are clone IDs.

  • col_labels are CellTag identifiers.

  • sparse_mtx is a clone-by-CellTag matrix of ones indicating presence.

Raises:
ValueError: If ct_obj is invalid or does not contain the required metric matrix (metric_mtx) or clone_table.
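
The difference between the two signature types can be shown for a single clone. This is a sketch with a hypothetical helper (clone_signature) that takes the CellTag set of each cell in the clone; it is not the library's matrix-based implementation.

```python
from collections import Counter


def clone_signature(cell_tag_sets, sig_type="core"):
    """Return the sorted CellTags that define one clone's signature."""
    # Count in how many cells of the clone each CellTag occurs.
    counts = Counter(tag for tags in cell_tag_sets for tag in tags)
    if sig_type == "core":
        return sorted(tag for tag, n in counts.items() if n > 1)
    if sig_type == "union":
        return sorted(counts)
    raise ValueError("sig_type must be 'core' or 'union'")


clone_cells = [{"A", "B"}, {"A", "B", "C"}, {"A"}]
core = clone_signature(clone_cells, "core")
union = clone_signature(clone_cells, "union")
```

CellTag C occurs in only one cell, so it appears in the union signature but not in the core signature.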

celltag_tools.tools.ident_sparse_clones(ct_obj, n_largest=10, density_th=0.2, plot=False, **kwargs)

Identifies the “sparse” clones among the largest clones in a given metadata table, defined by an edge density threshold. Optionally generates a scatter plot of clone size vs. edge density.

Args:
ct_obj (CellTagData):

A CellTagData object whose clone_info table holds per-clone metadata, including columns:
  • ‘clone.id’

  • ‘size’ (number of cells in each clone)

  • ‘edge.den’ (edge density of the clone subgraph)

n_largest (int, optional):

Number of top clones by size to consider for filtering. Defaults to 10.

density_th (float, optional):

Maximum edge density for a clone to be considered “sparse.” Defaults to 0.2.

plot (bool, optional):

If True, returns a matplotlib Axes object with a scatter plot of clone size vs. edge density. Defaults to False.

**kwargs:

Additional keyword arguments passed to the plotting function (e.g., marker size).

Returns:
pd.DataFrame | None | tuple[pd.DataFrame | None, matplotlib.axes.Axes]:
  • If any sparse clones are found, returns a DataFrame subset of clone_info containing only those sparse clones. If plot=True, also returns the Axes object.

  • If no sparse clones are found, returns None. If plot=True, returns (None, Axes).

Notes:
  • Sparse clones are defined here as clones that rank among the top n_largest by size but have edge.den < density_th.

  • The optional plotting is handled by plot_size_by_den.
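
The selection rule in the notes above can be sketched on a list of row dictionaries. The helper name sparse_clones is hypothetical, and the real function operates on a DataFrame rather than a list.

```python
def sparse_clones(clone_info, n_largest=10, density_th=0.2):
    """Return rows among the n_largest clones whose edge density is below density_th."""
    largest = sorted(clone_info, key=lambda row: row["size"], reverse=True)[:n_largest]
    return [row for row in largest if row["edge.den"] < density_th]


clone_info = [
    {"clone.id": 1, "size": 40, "edge.den": 0.10},
    {"clone.id": 2, "size": 35, "edge.den": 0.90},
    {"clone.id": 3, "size": 5, "edge.den": 0.05},
]
# Clone 3 is sparse but is not among the two largest clones, so it is skipped.
sparse = sparse_clones(clone_info, n_largest=2)
```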

celltag_tools.tools.fix_sparse_clones(ct_obj, sparse_ids=None)

Reassigns cells from “sparse” clones by splitting them into maximal cliques, then recombines all clones into a new clone table. Useful for refining clone assignments after initial clone calling.

Specifically, for each clone in ct_obj.clone_graph whose index is in sparse_ids, we repeatedly extract the largest clique and mark those cells as a new clone until no edges remain.

Args:
ct_obj (CellTagData):

A valid CellTagData object with a clone_graph attribute representing clonal subgraphs and a clone_table.

sparse_ids (array-like | None, optional):

List of clone IDs (1-based) to be split. If None, the function does nothing and returns immediately. Defaults to None.

Returns:
pd.DataFrame | None:
  • If inplace=True, updates ct_obj.clone_table with the newly rebuilt clone assignments and returns None.

  • Otherwise, returns a new clone table (pd.DataFrame) without modifying ct_obj.

Raises:

ValueError: If cell number checks fail or if ct_obj is not valid.

Notes:
  • This function references an inplace check near the end, but there’s no formal inplace argument in its signature. If you want in-place updates, consider adding inplace=True to the signature.

  • The final clone IDs are re-enumerated starting from 1.
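
The clique-peeling idea can be sketched on a plain adjacency dictionary. This is not the library's graph code: it uses a basic Bron-Kerbosch search written for illustration, the helper names are hypothetical, and cells left without edges are simply reported as leftovers because the description above does not say how they are handled.

```python
def largest_clique(adjacency):
    """Return one maximum clique of an undirected graph {node: set(neighbors)}."""
    best = set()

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            # `clique` is maximal; keep it if it is the largest seen so far.
            if len(clique) > len(best):
                best = set(clique)
            return
        for node in list(candidates):
            expand(clique | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return best


def split_into_cliques(adjacency):
    """Peel off the largest clique repeatedly until no edges remain."""
    remaining = {node: set(neighbors) for node, neighbors in adjacency.items()}
    groups = []
    while any(remaining.values()):
        clique = largest_clique(remaining)
        groups.append(sorted(clique))
        for node in clique:
            del remaining[node]
        for neighbors in remaining.values():
            neighbors -= clique
    # Assumption: cells with no remaining edges are returned separately.
    return groups, sorted(remaining)


adjacency = {
    "a": {"b", "c", "f"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
    "f": {"a"},
}
groups, leftovers = split_into_cliques(adjacency)
```

In this toy graph the triangle a-b-c is extracted first, then the pair d-e, and f is left without a partner.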

celltag_tools.utils module

celltag_tools.utils.jaccard_similarities(mat)

Computes the Jaccard similarity for all pairs of columns in a given sparse matrix.

Args:
mat (scipy.sparse.spmatrix):

A binary sparse matrix (rows x columns). Each column is a feature vector to be compared with every other column.

Returns:
scipy.sparse.spmatrix:

A sparse matrix (same shape as mat.T * mat) where each entry (i, j) represents the Jaccard similarity between columns i and j of the input matrix. The diagonal is set to 0.
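
For intuition, the column-wise Jaccard computation can be written with Python sets, where each column is represented by the row indices of its nonzero entries. For a binary matrix the intersection sizes used here equal the entries of mat.T * mat. The helper name and the list-of-sets input are assumptions made for this sketch, and the output is a dense nested list rather than a sparse matrix.

```python
def jaccard_columns(columns):
    """Pairwise Jaccard similarity between columns given as sets of row indices."""
    n = len(columns)
    sim = [[0.0] * n for _ in range(n)]  # diagonal stays at 0
    for i in range(n):
        for j in range(i + 1, n):
            union = len(columns[i] | columns[j])
            if union:
                value = len(columns[i] & columns[j]) / union
                sim[i][j] = sim[j][i] = value
    return sim


# Column 0 has ones in rows 0-2, column 1 in rows 1-2, column 2 in row 3.
columns = [{0, 1, 2}, {1, 2}, {3}]
sim = jaccard_columns(columns)
```

Columns 0 and 1 share two rows out of three in their union, giving a similarity of 2/3, while column 2 shares nothing with the others.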

celltag_tools.utils.table_to_spmtx(row_data, col_data, count_data)

Converts row, column, and count data into a CSR (Compressed Sparse Row) matrix.

Args:
row_data (array-like):

Row labels (e.g., cell barcodes).

col_data (array-like):

Column labels (e.g., CellTag identifiers).

count_data (array-like):

Counts or other values to populate the sparse matrix.

Returns:
tuple:
A tuple (celltag_mat, cells, celltags) where:
  • celltag_mat (scipy.sparse.csr_matrix): The constructed sparse matrix of shape (len(unique_rows), len(unique_columns)).

  • cells (numpy.ndarray): Sorted unique row labels.

  • celltags (numpy.ndarray): Sorted unique column labels.
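
The label-to-index mapping behind this conversion can be sketched in plain Python. The sketch returns a dictionary of (row index, column index) entries instead of a CSR matrix, and it sums repeated (row, column) pairs, which mirrors how SciPy merges duplicate coordinates when building a CSR matrix; whether the library relies on that behaviour is an assumption. The name table_to_coo is hypothetical.

```python
def table_to_coo(row_data, col_data, count_data):
    """Map labels to sorted integer indices and collect the matrix entries."""
    rows = sorted(set(row_data))
    cols = sorted(set(col_data))
    row_index = {label: i for i, label in enumerate(rows)}
    col_index = {label: j for j, label in enumerate(cols)}

    entries = {}
    for r, c, value in zip(row_data, col_data, count_data):
        key = (row_index[r], col_index[c])
        entries[key] = entries.get(key, 0) + value  # duplicates are summed
    return entries, rows, cols


entries, rows, cols = table_to_coo(
    ["c2", "c1", "c2"],
    ["tB", "tA", "tA"],
    [3, 1, 2],
)
```

The resulting matrix shape is (len(rows), len(cols)), and the sorted label lists give the meaning of each row and column index.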

celltag_tools.utils.check_mtx_dict(target_mtx_dict)

Validates that the provided matrix dictionary conforms to the expected structure for CellTagData matrices (e.g., allow_mtx, bin_mtx, metric_mtx).

Args:
target_mtx_dict (celltag_mtx_dict):
A dictionary-like object expected to contain:
  • ‘mtx’: A scipy.sparse.spmatrix

  • ‘cells’: A numpy.ndarray of cell identifiers

  • ‘celltags’: A numpy.ndarray of cell tag identifiers

Raises:
ValueError: If target_mtx_dict is not a celltag_mtx_dict, if it does not have exactly three keys (‘mtx’, ‘cells’, ‘celltags’), or if the types of those values are incorrect.

celltag_tools.utils.find_homoplasy(n_cells, moi, barcode_abundance, ct_min=2, ct_max=25, n_iters=1000, verbose=False)

Simulates CellTag signatures in a population of cells to estimate the rate of CellTag signature duplication (homoplasy) across unrelated cells (i.e. false clones).

In each iteration:
  1. A Poisson-distributed random count of CellTags is assigned to each cell (mean = moi).

  2. Cells with CellTag counts outside [ct_min, ct_max] are filtered out.

  3. CellTags are sampled from the provided abundance distribution and assigned to each remaining cell.

  4. The duplication rate is computed as the fraction of cell pairs sharing the exact same CellTag signature.

Args:
n_cells (int):

The number of cells to simulate in each iteration (prior to filtering).

moi (float):

The mean of the Poisson distribution from which the CellTag counts per cell are drawn.

barcode_abundance (pd.DataFrame | list):

A DataFrame containing CellTag abundances (first column) with barcodes as the index, or a list of barcodes (assumed uniform abundance).

ct_min (int, optional):

The minimum allowed number of CellTags in a cell (inclusive). Defaults to 2.

ct_max (int, optional):

The maximum allowed number of CellTags in a cell (inclusive). Defaults to 25.

n_iters (int, optional):

The number of Monte Carlo simulation iterations to run. Defaults to 1000.

verbose (bool, optional):

If True, prints progress messages every 10 iterations. Defaults to False.

Returns:
list[float]:

A list of duplication rates (homoplasy) across the simulation iterations. Each entry represents the duplication rate in one iteration.

Raises:
ValueError:

If barcode_abundance is neither a DataFrame nor a list.

Example:
>>> # Using a uniform abundance of barcodes
>>> homoplasy_rates = find_homoplasy(
...     n_cells=1000,
...     moi=5,
...     barcode_abundance=["tagA", "tagB", "tagC"],
...     ct_min=2,
...     ct_max=25,
...     n_iters=10,
...     verbose=True
... )
>>> print(homoplasy_rates)

Notes:
  • The duplication rate is the proportion of pairs of cells that share the exact same set of CellTags. It’s computed as:

    net_dup_pairs / comb(len(filtered_cells), 2).

  • comb(x, 2) is shorthand for binomial coefficient C(x, 2) = x*(x-1)/2.
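
The duplication-rate formula in the notes above can be checked on a handful of signatures. This sketch covers only that final step, not the Poisson sampling; the helper name duplication_rate is hypothetical, and returning 0.0 for fewer than two cells is an assumption.

```python
from collections import Counter
from math import comb


def duplication_rate(signatures):
    """Fraction of cell pairs that share the exact same CellTag signature."""
    n_cells = len(signatures)
    if n_cells < 2:
        return 0.0  # assumption: no pairs means no duplication
    # frozenset makes signatures hashable and order-independent.
    counts = Counter(frozenset(sig) for sig in signatures)
    dup_pairs = sum(comb(n, 2) for n in counts.values())
    return dup_pairs / comb(n_cells, 2)


signatures = [{"A", "B"}, {"B", "A"}, {"A", "C"}, {"A", "B"}, {"D", "E"}]
rate = duplication_rate(signatures)
```

Three cells share the signature {A, B}, which gives C(3, 2) = 3 duplicated pairs out of C(5, 2) = 10 possible pairs, so the rate is 0.3.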

celltag_tools.utils.get_clone_cell_embed(adata_obj, ct_obj, clone_weight=1)

Creates a combined AnnData object containing both single-cell RNA data and clone-level “pseudo-cells” co-embedded in a knowledge graph, based on the connectivities in adata_obj and the clone assignments in ct_obj.clone_table.

The new connectivity graph is constructed by:
  1. Scaling down the original adata_obj.obsp[‘connectivities’] by 1 / clone_weight if clone_weight >= 1.

  2. Building a sparse clone-cell connectivity matrix from ct_obj.clone_table.

  3. Combining the two connectivity matrices into a larger graph with rows/columns for both cells and clones.

  4. Storing the result in adata_obj_coembed.obsp[‘connectivities’].

Args:
adata_obj (anndata.AnnData):

The AnnData object containing single-cell data and a precomputed neighbors graph in adata_obj.obsp[‘connectivities’].

ct_obj (CellTagData):

A valid CellTagData object containing a clone_table with columns for clone IDs and cell barcodes.

clone_weight (float, optional):

A scaling factor for weighting or penalizing the clone-cell connections relative to cell-cell connections. Defaults to 1.

Returns:
anndata.AnnData:

A new AnnData object containing:
  • .obs_names: The concatenation of the original cell barcodes and the clone IDs.

  • .obsp[‘connectivities’]: The merged connectivity matrix for cells and clones.

  • .uns[‘neighbors’]: Copied parameters from the original adata_obj.

Raises:
ValueError: If ct_obj is invalid or missing clone_table, or if adata_obj.obsp[‘connectivities’] is empty.

celltag_tools.utils.merge_nn(nn_graph, all_cells, cell_list)

Merges a given list of cells with their nearest neighbors as defined by a nearest-neighbor graph.

For each cell in cell_list, the function retrieves its neighbors from nn_graph (row corresponding to that cell in all_cells) and unions them into a set.

Args:
nn_graph (scipy.sparse.spmatrix or numpy.ndarray):

A nearest-neighbor matrix where row i contains nonzero entries at the columns corresponding to the neighbors of cell i.

all_cells (array-like):

A list or array of all cell identifiers, matching the rows/columns of nn_graph.

cell_list (array-like):

A list of cell identifiers whose neighbors should be collected together.

Returns:
set:

A set of cell identifiers including all cell_list cells plus any of their nearest neighbors found in nn_graph.
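
The neighbor-merging logic can be sketched with a list-based stand-in for the graph, where nn_rows[i] lists the column indices of the nonzero entries in row i. That representation and the helper name merge_nn_sketch are assumptions for illustration; the documented function takes a sparse matrix or array instead.

```python
def merge_nn_sketch(nn_rows, all_cells, cell_list):
    """Return cell_list plus every neighbor listed for those cells."""
    index_of = {cell: i for i, cell in enumerate(all_cells)}
    merged = set(cell_list)
    for cell in cell_list:
        # Translate neighbor column indices back into cell identifiers.
        merged.update(all_cells[j] for j in nn_rows[index_of[cell]])
    return merged


all_cells = ["c0", "c1", "c2", "c3"]
nn_rows = [[1], [0, 2], [1], []]
merged = merge_nn_sketch(nn_rows, all_cells, ["c0", "c3"])
```

Only direct neighbors are added: c1 is included as a neighbor of c0, but c2 (a neighbor of c1) is not.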

Module contents

Software for CellTag Clonal analysis

class celltag_tools.CellTagData(ct_reads=None, thresholds=None, seq_sat=None, clone_graph=None, jaccard_mtx=None, clone_table=None, clone_info=None)

Bases: object

A container class for storing and managing CellTag data, including read data, thresholds, various matrices, and clone information. Provides methods for serialization and easy attribute access.
