APIs

I/O

mito.io.make_afm(path_ch_matrix: str, path_meta: str = None, sample: str = None, pp_method: str = 'maegatk', scLT_system: str = 'MAESTER', ref: str = 'rCRS', kwargs: dict = {}) → AnnData[source]

Creates an annotated Allele Frequency Matrix (AFM) from the outputs of different scLT systems and pre-processing pipelines.

Parameters:

path_ch_matrix (str) – Path to folder with necessary data for provided scLT_system.
path_meta (str, optional) – Path to .csv file with cell meta-data. Default is None.
sample (str, optional) – Sample name to append to preprocessed cell barcodes (CBs). Default is None.
pp_method (str, optional) – Preprocessing method (MAESTER data only). Available options: mito_preprocessing, maegatk, cellsnp-lite, freebayes, samtools. Default is ‘maegatk’.
scLT_system (str, optional) – scLT system (i.e., marker) used for tracing. Available options: MAESTER, RedeeM, Cas9, scWGS. Default is ‘MAESTER’.
ref (str, optional) – Path to MT-reference genome. The user can provide a custom FASTA file. Default is ‘rCRS’.
kwargs (dict, optional) – Optional arguments for specific scLT_system readers. Default is {}.

Returns:

afm – The assembled Allele Frequency Matrix (AFM).

Return type:

AnnData

mito.io.read_coverage(afm_raw: AnnData, path_coverage: str, sample: str) → DataFrame[source]: Read coverage table from mito_preprocessing/maegatk output.

mito.io.read_newick(path, X_raw: DataFrame = None, X_bin: DataFrame = None, D: DataFrame = None, meta: DataFrame = None) → CassiopeiaTree[source]

Read a newick string as CassiopeiaTree object.

Parameters:

path (str) – Path to newick string.
X_raw (pd.DataFrame, optional) – Raw allelic frequency table. Cell x variants. Default is None.
X_bin (pd.DataFrame, optional) – Binary (1,0) cell genotypes. Cell x variants. Default is None.
D (pd.DataFrame, optional) – Cell x cell distance matrix. Default is None.
meta (pd.DataFrame, optional) – Cell metadata. Cell x covariates. Default is None.

Returns:

tree – Tree built from the newick string.

Return type:

CassiopeiaTree

mito.io.write_newick(tree: CassiopeiaTree, path: str)[source]

Write a CassiopeiaTree as a newick string.

Parameters:

tree (CassiopeiaTree) – Tree to write.
path (str) – Path to newick string.
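
For readers unfamiliar with the format, the following is a minimal, dependency-free sketch of what a newick string encodes. The nested-tuple tree and the `to_newick` helper are hypothetical illustrations and are not part of mito or Cassiopeia; only the topology is serialized, without branch lengths.

```python
from typing import Tuple, Union

# Hypothetical tree encoding: a leaf is a string, an internal node is a tuple of children.
Node = Union[str, Tuple["Node", ...]]


def to_newick(node: Node) -> str:
    """Serialize a nested-tuple tree into a newick string (topology only)."""

    def _fmt(n: Node) -> str:
        if isinstance(n, str):
            return n  # leaf: just its name
        # internal node: comma-separated children wrapped in parentheses
        return "(" + ",".join(_fmt(child) for child in n) + ")"

    return _fmt(node) + ";"  # a newick string ends with a semicolon


example = to_newick((("cell_A", "cell_B"), "cell_C"))
print(example)
```

Here cell_A and cell_B form a clade that is sister to cell_C.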

Preprocessing

mito.pp.annotate_vars(afm: AnnData, overwrite: bool = False)[source]: Annotate MT-SNV properties as in Weng et al., 2024, and, earlier, Miller et al., 2022. Create vars_df and update .var.

mito.pp.call_genotypes(afm: AnnData, bin_method: str = 'MiTo', t_vanilla: float = 0.0, min_AD: int = 2, t_prob: float = 0.7, min_cell_prevalence: float = 0.1, k: int = 5, gamma: float = 0.25, n_samples: int = 100, resample: bool = False)[source]

Call genotypes. The ‘bin’ layer is added in-place.

Three strategies are implemented:

* “vanilla”: simple, hard thresholding on raw AF values or number of alternative allele counts.
* “MiTo”: hybrid MiTo genotype calling strategy (see mito.pp.genotype_MiTo).
* “MiTo_smooth”: MiTo with kNN smoothing of posterior probability before genotype calling.

Parameters:

afm (AnnData) – Allele Frequency Matrix.
bin_method (str, optional) – Genotyping strategy. Default is “MiTo”.
t_prob (float, optional) – Threshold on posterior probabilities. Default is 0.7.
t_vanilla (float, optional) – Threshold on raw allele frequencies. Default is 0.
min_AD (int, optional) – Minimum number of alternative UMI counts to assign the ‘mut’ (1) genotype. Default is 2.
min_cell_prevalence (float, optional) – Minimum cell prevalence to use probabilistic genotyping. Default is 0.1.
k (int, optional) – Number of neighbors for kNN search (if bin_method is “MiTo_smooth”). Default is 5.
gamma (float, optional) – Correction factor weight from neighboring cells (if bin_method is “MiTo_smooth”). Default is 0.25.
n_samples (int, optional) – Number of cell profile replicates (if bin_method is “MiTo_smooth”). Default is 100.
resample (bool, optional) – Generate in-silico replicates of cell profiles before kNN (if bin_method is “MiTo_smooth”). Default is False.
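
As an illustration of the “vanilla” strategy, here is a minimal sketch of hard thresholding in plain Python. The function name `genotype_vanilla` is hypothetical, and combining the two thresholds with a logical AND is an assumption; the library may apply them differently.

```python
from typing import List


def genotype_vanilla(
    af: List[List[float]],
    ad: List[List[int]],
    t_vanilla: float = 0.0,
    min_AD: int = 2,
) -> List[List[int]]:
    """Return a cell x variant binary matrix (1 = 'mut', 0 = wild-type).

    Assumption: a cell is called 'mut' when its AF exceeds t_vanilla AND
    it has at least min_AD alternative allele counts.
    """
    return [
        [
            1 if (af_ij > t_vanilla and ad_ij >= min_AD) else 0
            for af_ij, ad_ij in zip(af_row, ad_row)
        ]
        for af_row, ad_row in zip(af, ad)
    ]


# Two cells x two variants: allele frequencies and alternative allele counts.
af = [[0.00, 0.10], [0.02, 0.50]]
ad = [[0, 3], [1, 12]]
genotypes = genotype_vanilla(af, ad)
print(genotypes)
```

The second cell has AF > 0 for the first variant but only one supporting count, so it stays wild-type under these thresholds.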

mito.pp.compute_distances(afm: AnnData, distance_key: str = 'distances', metric: str = 'weighted_jaccard', precomputed: bool = False, bin_method: str = 'MiTo', binarization_kwargs: Dict[str, Any] = {}, ncores: int = 1, rescale: bool = True, verbose: bool = True)[source]

Pairwise cell-cell (or sample-) distance computation in some character space (e.g., MT-SNVs mutation space). Updates the afm.obsp slot in-place.

Parameters:

afm (AnnData) – Allele Frequency Matrix (.X slot or ‘bin’ layer present).
distance_key (str, optional) – Key in afm.obsp at which distances will be stored. Default is “distances”.
metric (str, optional) – Distance metric to use. Default is “weighted_jaccard”.
precomputed (bool, optional) – If True, use precomputed genotypes; otherwise, recompute from scratch. Default is False.
bin_method (str, optional) – Genotyping method. Default is “MiTo”.
binarization_kwargs (dict, optional) – Keyword arguments for the discretization function. Default is {}.
ncores (int, optional) – Number of cores for parallel computation. Default is 1.
rescale (bool, optional) – Whether to apply min-max rescaling to distance values. Default is True.
verbose (bool, optional) – Whether to print verbose logging. Default is True.
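
The sketch below illustrates pairwise distance computation on binary genotypes followed by min-max rescaling. It uses the plain (unweighted) Jaccard distance as a stand-in: the exact definition of the “weighted_jaccard” metric is not given here, so this is an assumption for illustration only, and the helper names are hypothetical.

```python
from typing import List


def jaccard_distance(a: List[int], b: List[int]) -> float:
    """1 - |intersection| / |union| for two binary genotype vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - inter / union


def pairwise_distances(X: List[List[int]], rescale: bool = True) -> List[List[float]]:
    """Cell x cell distance matrix, optionally min-max rescaled to [0, 1]."""
    n = len(X)
    D = [[jaccard_distance(X[i], X[j]) for j in range(n)] for i in range(n)]
    if rescale:
        flat = [d for row in D for d in row]
        lo, hi = min(flat), max(flat)
        if hi > lo:  # avoid division by zero when all distances are equal
            D = [[(d - lo) / (hi - lo) for d in row] for row in D]
    return D


# Three cells x four variants (binary genotypes).
X_bin = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 1, 1]]
D = pairwise_distances(X_bin)
```

With rescaling, the most distant pair of cells ends up at exactly 1 and the diagonal stays at 0.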

mito.pp.compute_lineage_biases(afm: AnnData, lineage_column: str, target_lineage: str, bin_method: str = 'MiTo', binarization_kwargs: Dict[str, Any] = {}, alpha: float = 0.05) → DataFrame[source]

Compute MT-SNVs enrichment scores for a given lineage category using Fisher’s exact test.

Parameters:

afm (AnnData) – Allele Frequency Matrix.
lineage_column (str) – Field in afm.obs containing the ‘lineage’ categorical variable.
target_lineage (str) – The category in afm.obs[lineage_column] to test for MT-SNV enrichment.
bin_method (str, optional) – Genotyping method. Default is “MiTo”.
binarization_kwargs (dict, optional) – Additional keyword arguments for genotyping. Default is {}.
alpha (float, optional) – Family-wise error rate for p-value correction. Default is 0.05.

Returns:

results – DataFrame containing computed statistics (e.g., -log10(FDR) from Fisher’s exact test).

Return type:

pd.DataFrame
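
To make the enrichment statistic concrete, here is a dependency-free sketch of a one-sided Fisher’s exact test on a single MT-SNV. The function name and the 2x2 table layout are illustrative assumptions, and the multiple-testing correction that yields the reported FDR is deliberately left out.

```python
from math import comb, log10


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided p-value P(X >= a) for the 2x2 table [[a, b], [c, d]].

    Assumed layout:
      a = mutated cells in the target lineage, b = mutated cells elsewhere,
      c = unmutated cells in the target lineage, d = unmutated cells elsewhere.
    """
    n_mut = a + b        # total mutated cells
    n_target = a + c     # total cells in the target lineage
    n = a + b + c + d    # all cells
    upper = min(n_mut, n_target)
    # Hypergeometric tail: sum the probabilities of tables at least as extreme.
    tail = sum(
        comb(n_mut, k) * comb(n - n_mut, n_target - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, n_target)


p_value = fisher_exact_greater(3, 1, 1, 3)
enrichment = -log10(p_value)  # uncorrected; the library reports -log10(FDR)
```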

mito.pp.filter_MiTo(afm: AnnData, min_cov: float = 5, min_var_quality: float = 30, min_frac_negative: float = 0.2, min_n_positive: int = 5, af_confident_detection: float = 0.02, min_n_confidently_detected: int = 2, min_mean_AD_in_positives: float = 1.25, min_mean_DP_in_positives: float = 25) → AnnData[source]

MiTo custom filter. Retains variants with:

- At least min_cov mean site coverage (across cells)
- At least min_var_quality mean variant allele basecall quality (across cells)
- At least n cells * min_frac_negative negative cells
- At least min_n_positive (AF > 0) cells
- At least min_n_confidently_detected cells in which the variant has been detected with AF greater than af_confident_detection
- At least min_mean_AD_in_positives mean AD in positive cells
- At least min_mean_DP_in_positives mean DP in positive cells

Parameters:

afm (AnnData) – Allele Frequency Matrix.
min_cov (float) – Minimum mean site coverage (across cells). Default is 5.
min_var_quality (float) – Minimum mean variant allele basecall quality (across cells). Default is 30.
min_frac_negative (float) – Minimum fraction of negative cells (expressed as a fraction of total cells). Default is 0.2.
min_n_positive (int) – Minimum number of cells with AF > 0. Default is 5.
min_n_confidently_detected (int) – Minimum number of cells in which the variant has been detected with an AF greater than af_confident_detection. Default is 2.
af_confident_detection (float) – Allele frequency threshold for confident detection. Default is 0.02.
min_mean_AD_in_positives (float) – Minimum mean alternative allele count (AD) in positive cells. Default is 1.25.
min_mean_DP_in_positives (float) – Minimum mean total UMI counts (DP) in positive cells. Default is 25.

Returns:

afm – Filtered Allele Frequency Matrix.

Return type:

AnnData
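
The criteria above can be sketched for a single variant as follows. This is a hypothetical helper, not the library’s implementation: it assumes “negative” means AF equal to 0, “positive” means AF greater than 0, and that per-cell site coverage and basecall quality are averaged over all cells.

```python
from statistics import mean
from typing import Sequence


def passes_mito_filter(
    af: Sequence[float],
    ad: Sequence[float],
    dp: Sequence[float],
    site_cov: Sequence[float],
    var_quality: Sequence[float],
    min_cov: float = 5,
    min_var_quality: float = 30,
    min_frac_negative: float = 0.2,
    min_n_positive: int = 5,
    af_confident_detection: float = 0.02,
    min_n_confidently_detected: int = 2,
    min_mean_AD_in_positives: float = 1.25,
    min_mean_DP_in_positives: float = 25,
) -> bool:
    """Check one variant (per-cell vectors) against the listed thresholds."""
    n_cells = len(af)
    positives = [i for i, x in enumerate(af) if x > 0]  # assumed: AF > 0
    if not positives:
        return False
    n_negative = n_cells - len(positives)               # assumed: AF == 0
    n_confident = sum(1 for x in af if x > af_confident_detection)
    return (
        mean(site_cov) >= min_cov
        and mean(var_quality) >= min_var_quality
        and n_negative >= n_cells * min_frac_negative
        and len(positives) >= min_n_positive
        and n_confident >= min_n_confidently_detected
        and mean([ad[i] for i in positives]) >= min_mean_AD_in_positives
        and mean([dp[i] for i in positives]) >= min_mean_DP_in_positives
    )


# A variant detected in 6 of 10 cells, with good coverage and quality.
af_ok = [0, 0, 0, 0, 0.01, 0.03, 0.05, 0.1, 0.2, 0.5]
ad_ok = [0, 0, 0, 0, 1, 2, 3, 5, 10, 20]
dp_ok = [80, 90, 100, 110, 100, 70, 60, 50, 50, 40]
keep_ok = passes_mito_filter(af_ok, ad_ok, dp_ok, [50] * 10, [35] * 10)

# A variant detected in only 2 cells fails min_n_positive.
af_rare = [0] * 8 + [0.3, 0.4]
ad_rare = [0] * 8 + [9, 12]
keep_rare = passes_mito_filter(af_rare, ad_rare, [30] * 10, [50] * 10, [35] * 10)
```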

mito.pp.filter_afm(afm: AnnData, lineage_column: str = None, min_cell_number: int = 0, cells: Iterable[str] = None, filtering: str = 'MiTo', filtering_kwargs: Dict[str, Any] = {}, filter_moran: bool = True, moran_I_pvalue: float = 0.01, max_AD_counts: int = 2, variants: Iterable[str] = None, min_n_var: int = 1, fit_mixtures: bool = False, only_positive_deltaBIC: bool = False, filter_dbs: bool = True, compute_enrichment: bool = True, bin_method: str = 'MiTo', binarization_kwargs: Dict[str, Any] = {}, metric: str = 'weighted_jaccard', ncores: int = 8, spatial_metrics: bool = False, tree_kwargs: Dict[str, Any] = {}, return_tree: bool = False)[source]

Filter an Allele Frequency Matrix for downstream analysis.

This function implements different strategies to subset the detected cells and MT-SNVs to those that exhibit optimal properties for single-cell lineage tracing (scLT). The user can tune filtering method defaults via the filtering_kwargs argument. Pre-computed sets of cells and variants can be selected without relying on any specific method (the function ensures integrity of the AFM AnnData object after subsetting).

Parameters:

afm (AnnData) – Allele Frequency Matrix.
lineage_column (str, optional) – Lineage column of interest in afm.obs. Default is None.
min_cell_number (int, optional) – Minimum number of cells required for groups in afm.obs[lineage_column]. Default is 0.
cells (Iterable[str], optional) – Pre-defined list of cells to retain. Default is None.
filtering (str, optional) – MT-SNVs filtering strategy. See mito.pp.filters for available strategies and parameters. Default is MiTo.
filtering_kwargs (dict, optional) – Additional keyword arguments for the selected filtering method. Default is {}.
filter_moran (bool, optional) – Whether to remove MT-SNVs that are not spatially auto-correlated. Default is True.
moran_I_pvalue (float, optional) – P-value threshold for Moran’s I statistics. Default is 0.01.
max_AD_counts (int, optional) – Retain an MT-SNV if at least one cell has this number of alternative allele counts. Default is 2.
variants (Iterable[str], optional) – Pre-defined list of variants to retain. Default is None.
min_n_var (int, optional) – Retain cells with at least this number of MT-SNVs. Default is 1.
fit_mixtures (bool, optional) – Whether to fit MQuad (Kwok et al., 2022) binomial mixtures. Default is False.
only_positive_deltaBIC (bool, optional) – Retain only MT-SNVs with positive deltaBIC (from MQuad). Default is False.
filter_dbs (bool, optional) – Filter MT-SNVs from dbSNP and REDIdb database. Default is True.
compute_enrichment (bool, optional) – Whether to compute MT-SNVs enrichment in the lineage_column. Default is True.
bin_method (str, optional) – Genotyping method. Default is MiTo.
binarization_kwargs (dict, optional) – Additional keyword arguments for genotyping. Default is {}.
metric (str, optional) – Distance metric to use. Default is weighted_jaccard.
ncores (int, optional) – Number of cores to use for distance computations and fitting MQuad mixtures, if necessary. Default is 8.
spatial_metrics (bool, optional) – Whether to compute “spatial” connectivity metrics for filtered MT-SNVs. Default is False.
tree_kwargs (dict, optional) – Additional keyword arguments for tree inference (i.e., mito.tl.build_tree). Default is {}.
return_tree (bool, optional) – Whether to return a CassiopeiaTree if spatial_metrics is True. Default is False.

Returns:

Filtered Allelic Frequency Matrix.

Return type:

AnnData

mito.pp.filter_baseline(afm: AnnData, min_site_cov: int = 5, min_var_quality: int = 30, min_n_positive: int = 2, only_genes: bool = True) → AnnData[source]: Compute summary stats and baseline filter MT-SNVs (MAESTER, RedeeM).

mito.pp.filter_cell_clones(afm: AnnData, column: str = 'GBC', min_cell_number: int = 10) → AnnData[source]: Filter only cells from groups in afm.obs[column] with more than min_cell_number cells.
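
A minimal sketch of this group-size filter on a plain list of labels. The helper name is hypothetical, and the strict “more than” comparison follows the wording of the description above.

```python
from collections import Counter
from typing import List, Sequence


def keep_large_groups(labels: Sequence[str], min_cell_number: int = 10) -> List[int]:
    """Return indices of cells whose group has more than min_cell_number cells."""
    counts = Counter(labels)
    return [i for i, lab in enumerate(labels) if counts[lab] > min_cell_number]


# Six cells from three clones of sizes 3, 1 and 2.
labels = ["A", "A", "A", "B", "C", "C"]
kept = keep_large_groups(labels, min_cell_number=1)
print(kept)
```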

mito.pp.filter_cells(afm: AnnData, cell_subset: Iterable[str] = None, cell_filter: str = 'filter1', nmads: int = 5, mean_cov_all: float = 20, median_cov_target: int = 25, min_perc_covered_sites: float = 0.75) → AnnData[source]

Filter cells from a MAESTER/RedeeM Allele Frequency Matrix.

Parameters:

afm (AnnData) – Allele Frequency Matrix.
cell_subset (Iterable[str], optional) – Subset of cells to retain. Default is None.
cell_filter (str, optional) – Cell filtering strategy. Default is “filter1”. Options are:
- “filter1”: Filter cells based on mean MT-genome coverage (all sites).
- “filter2”: Filter cells based on median target MT-sites coverage and minimum percentage of target sites covered (MAESTER only).
nmads (int, optional) – Number of median absolute deviations (MADs) used to filter cells with high MT-library UMI counts. Default is 5.
mean_cov_all (float, optional) – Minimum mean consensus (at least 3 supporting reads) UMI coverage across the MT-genome per cell. Default is 20.
median_cov_target (int, optional) – Minimum median UMI coverage at target MT-sites (only for MAESTER data). Default is 25.
min_perc_covered_sites (float, optional) – Minimum fraction of MT target sites covered (only for MAESTER data). Default is 0.75.

Returns:

Filtered Allele Frequency Matrix.

Return type:

AnnData
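
The nmads parameter can be read as a median-absolute-deviation (MAD) outlier rule. The sketch below assumes the upper bound is median + nmads * MAD, which is one common convention; the exact rule used by the library is not stated here, and the helper name is hypothetical.

```python
from statistics import median
from typing import Sequence


def mad_upper_bound(values: Sequence[float], nmads: int = 5) -> float:
    """Upper bound for outlier removal: median + nmads * MAD (assumed rule)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med + nmads * mad


# Per-cell MT-library UMI counts, with one clear outlier.
umi_counts = [10, 12, 11, 13, 12, 100]
bound = mad_upper_bound(umi_counts, nmads=5)
kept_counts = [v for v in umi_counts if v <= bound]
```

Here the median is 12 and the MAD is 1, so cells above 17 counts are dropped.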

mito.pp.kNN_graph(X: array = None, D: array = None, k: int = 10, from_distances: bool = False, nn_kwargs: Dict[str, Any] = {}) → Tuple[array, csr_matrix, csr_matrix][source]

kNN graph computation.

Parameters:

X (np.array) – Feature matrix (observations x features).
D (np.array, optional) – Pairwise distance matrix. Default is None.
k (int, optional) – Number of neighbors. Default is 10.
from_distances (bool, optional) – Whether to start from precomputed distances. Default is False.
nn_kwargs (dict, optional) – Additional keyword arguments for kNN search.

Returns:

A tuple containing:
- A numpy array of shape (n_samples, k) with the indices of the k-nearest neighbors.
- A csr_matrix representing the connectivity matrix of the kNN graph.
- A csr_matrix representing the distances corresponding to the kNN graph.

Return type:

tuple of (np.array, csr_matrix, csr_matrix)
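
To illustrate the from_distances case, here is a small sketch that derives a neighbor index and a dense connectivity matrix from a precomputed distance matrix using plain lists. The helper is hypothetical, and excluding each cell from its own neighbor list is an assumption.

```python
from typing import List, Tuple


def knn_from_distances(
    D: List[List[float]], k: int = 10
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (index, connectivities) for a square distance matrix D.

    index[i] holds the k nearest neighbors of observation i, closest first.
    connectivities[i][j] is 1 if j is among the neighbors of i, else 0.
    """
    n = len(D)
    index = []
    for i in range(n):
        row = D[i]
        # Rank all other observations by distance (self excluded by assumption).
        others = sorted((j for j in range(n) if j != i), key=lambda j: row[j])
        index.append(others[:k])
    connectivities = [[1 if j in index[i] else 0 for j in range(n)] for i in range(n)]
    return index, connectivities


D_example = [
    [0, 1, 4, 9],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [9, 6, 3, 0],
]
knn_index, knn_conn = knn_from_distances(D_example, k=2)
```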

mito.pp.reduce_dimensions(afm: AnnData, layer: str = 'bin', distance_key: str = 'distances', seed: int = 1234, method: str = 'UMAP', k: int = 10, n_comps: int = 2, ncores: int = 8, metric: str = 'weighted_jaccard', bin_method: str = 'MiTo', binarization_kwargs: Dict[str, Any] = {})[source]

Dimensionality reduction for an Allele Frequency Matrix.

Parameters:

afm (AnnData) – Allele Frequency Matrix.
layer (str, optional) – Layer to use. Default is “bin”.
distance_key (str, optional) – afm.obsp key to append distances. Default is “distances”.
seed (int, optional) – Random seed. Default is 1234.
method (str, optional) – Dimensionality reduction method. Default is “UMAP”.
k (int, optional) – Number of neighbors to use for kNN search. Default is 10.
n_comps (int, optional) – Number of dimensions of the output embedding. Default is 2.
metric (str, optional) – Dissimilarity metric to use. Default is “weighted_jaccard”.
bin_method (str, optional) – Genotyping method. Default is “MiTo”.
binarization_kwargs (dict, optional) – Keyword arguments for binarization. Default is {}.

Tools

class mito.tl.MiToTreeAnnotator(tree: CassiopeiaTree)[source]

Bases: object

MiTo tree annotation class. Performs clonal inference from an arbitrary MT-SNVs-based phylogeny.

clonal_inference(similarity_tresholds: Iterable[int] = [85, 90, 95, 99], mut_enrichment_tresholds: Iterable[int] = [3, 5, 10], merging_treshold: Iterable[float] = [0.25, 0.5, 0.75], af_treshold: float = 0.0, weight_silhouette: float = 0.3, weight_n_clones: float = 0.4, weight_similarity: float = 0.3, max_fraction_unassigned: float = 0.05, n_cores: int = None)[source]: Optimize thresholds for self.infer_clones and pick clonal labels with best silhouette scores across the attempted splits.

compute_cell_fitness()[source]: LBI method (Neher et al., 2014) from Cassiopeia.

compute_expansions()[source]: Call cassiopeia.tools.compute_expansion_pvalues. Compute clonal expansion p-values as described in Yang, Jones et al., bioRxiv (2021).

extract_mut_order(pval_tresh: float = 0.01)[source]: Extract the diagonal order of MT-SNVs using mutation assignments, to create an ordered list of MT-SNVs for plotting.

get_M(alpha: float = 0.05, n_cores: int = None)[source]: Compute the “mutation enrichment” matrix, M. M is a mut x clade matrix storing for each mutation i and clade j the enrichment value defined as -log10(pval) from a Fisher’s Exact test. Uses joblib for parallel processing.

get_T(with_root: bool = True)[source]: Compute the “cell assignment” matrix, T. T is a cell x clade (internal node) binary matrix mapping each cell i to every clade j.
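
The “cell assignment” matrix T can be illustrated on a toy tree. The sketch below uses a hypothetical dictionary encoding of the tree (internal node mapped to its children) rather than a CassiopeiaTree, so it only shows the shape and meaning of T, not how the method computes it.

```python
from typing import Dict, List, Tuple


def cell_assignment_matrix(
    children: Dict[str, List[str]], root: str, with_root: bool = True
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (cells, clades, T), with T[i][j] = 1 if cell i belongs to clade j."""
    leaves_under: Dict[str, set] = {}

    def collect(node: str) -> List[str]:
        if node not in children:  # a leaf, i.e. a cell
            return [node]
        leaves = [leaf for child in children[node] for leaf in collect(child)]
        leaves_under[node] = set(leaves)
        return leaves

    cells = collect(root)
    # Optionally drop the root clade, which trivially contains every cell.
    clades = [c for c in leaves_under if with_root or c != root]
    T = [[int(cell in leaves_under[c]) for c in clades] for cell in cells]
    return cells, clades, T


# Toy tree: root -> (n1, c3), n1 -> (c1, c2).
toy_tree = {"root": ["n1", "c3"], "n1": ["c1", "c2"]}
cells, clades, T = cell_assignment_matrix(toy_tree, "root")
```

Each row is a cell and each column an internal node; the root column is all ones because every cell descends from the root.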

infer_clones(similarity_percentile: float = 85, mut_enrichment_treshold: int = 5) → DataFrame[source]: A MT-SNVs-specific re-adaptation of the recursive approach described in the MethylTree paper (… et al., 2025).

resolve_ambiguous_clones(df_predict: DataFrame, merging_treshold: float = 0.7, af_treshold: float = 0.0, add_to_meta: bool = False) → Tuple[Series, Series][source]: Final clonal resolution process. Iteratively tries to merge similar clones. First, the (raw) AF matrix is aggregated at the clonal level using MiTo clones. Then, clone-clone similarities are computed as 1 minus the weighted Jaccard distance between these aggregated MT-SNV clonal profiles. At each round, the smallest “ambiguous” clone is selected for merging with its smallest “interacting clone”. If the merge is successful, the clonal assignment table is updated and the process continues with further merging rounds until no ambiguous clones remain. Unresolved clones (if any) are annotated as NaNs in the final MiTo clone column, which is appended to self.tree.cell_meta.

mito.tl.bootstrap_MiTo(afm: AnnData, boot_replicate: str = 'observed', boot_strategy: str = 'feature_resampling', frac_char_resampling: float = 0.8) → AnnData[source]: Bootstrap MAESTER/RedeeM Allele Frequency matrices.
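
One plausible reading of the “feature_resampling” strategy is sketched below: a random fraction of the characters (variants) is kept for each replicate. Sampling without replacement, the seeding, and the helper name are all assumptions for illustration; the library’s actual resampling scheme may differ.

```python
import random
from typing import List, Sequence


def resample_features(
    variants: Sequence[str], frac_char_resampling: float = 0.8, seed: int = 0
) -> List[str]:
    """Keep a random fraction of the variants, preserving their original order."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    n_keep = max(1, round(len(variants) * frac_char_resampling))
    chosen = set(rng.sample(range(len(variants)), n_keep))  # without replacement
    return [v for i, v in enumerate(variants) if i in chosen]


all_variants = [f"var_{i}" for i in range(10)]
replicate = resample_features(all_variants, frac_char_resampling=0.8, seed=1)
```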

mito.tl.build_tree(afm: AnnData, precomputed: bool = False, distance_key: str = 'distances', metric: str = 'weighted_jaccard', bin_method: str = 'MiTo', solver: str = 'UPMGA', ncores: int = 1, min_n_positive_cells: int = 2, filter_muts: bool = False, max_frac_positive: float = 0.95, binarization_kwargs: Dict[str, Any] = {}, solver_kwargs: Dict[str, Any] = {}) → CassiopeiaTree[source]

Wrapper around cassiopeia lineage solvers. MW Jones et al., 2020.

Parameters:

afm (AnnData) – Allele Frequency Matrix.
precomputed (bool, optional) – Whether to use precomputed data. Default is False.
distance_key (str, optional) – Key in afm.obsp where distances are stored. Default is “distances”.
metric (str, optional) – Distance metric to use. Default is “weighted_jaccard”.
bin_method (str, optional) – Genotyping method. Default is “MiTo”.
solver (str, optional) – Lineage solver to use. Default is “UPMGA”.
ncores (int, optional) – Number of cores to use for computation. Default is 1.
min_n_positive_cells (int, optional) – Minimum number of positive cells required. Default is 2.
filter_muts (bool, optional) – Whether to filter mutations. Default is False.
max_frac_positive (float, optional) – Maximum fraction of positive cells allowed. Default is 0.95.
binarization_kwargs (dict, optional) – Additional keyword arguments for genotyping. Default is {}.
solver_kwargs (dict, optional) – Additional keyword arguments for the solver. Default is {}.

Returns:

Solved single-cell phylogeny.

Return type:

CassiopeiaTree

mito.tl.coarse_grained_tree(tree: CassiopeiaTree, groupby: str) → CassiopeiaTree[source]: Take a full cell phylogeny and coarse-grain it into a clone or “groupby” phylogeny.

mito.tl.compute_clonal_fate_bias(tree: CassiopeiaTree, state_column: str, clone_column: str, target_state: str | Any) → DataFrame[source]: Compute clonal fate biases towards a target_state, reported as -log10(FDR) from Fisher’s exact test.

mito.tl.compute_scPlasticity(tree: CassiopeiaTree, meta_column: str)[source]: Compute scPlasticity as in Yang et al., 2022. https://www.sc-best-practices.org/trajectories/lineage_tracing.html#

mito.tl.nb_regression(df: DataFrame, features: Iterable[str], predictor: str) → DataFrame[source]

Negative binomial regression approach to associate clonal-level features to gene expression.

Parameters:

df (pd.DataFrame (clone/sample x features/covariates)) – Input data table. Contains raw counts for all genes, and covariates of interest.
features (list, str) – List of variables to test the GLM model coefficients on.
predictor (str) – Model specification via formula interface. Formula is in the form: “gene ~ predictor”. Example predictor: “fitness + counts”

Returns:

Table of regression results.

Return type:

pd.DataFrame

Plotting

mito.pl.MT_coverage_by_gene_polar(cov: DataFrame, sample: str = None, subset: Iterable[str] = None, ax: Axes = None) → Axes[source]: Plot coverage and muts across MT-genome positions, with annotated genes.

mito.pl.MT_coverage_polar(cov: DataFrame, var_subset: Iterable[str] = None, ax: Axes = None, n_xticks: int = 6, xticks_size: float = 7, yticks_size: float = 2, xlabel_size: float = 6, ylabel_size: float = 9, kwargs_main: Dict[str, Any] = {}, kwargs_subset: Dict[str, Any] = {}) → Axes[source]: Plot coverage and muts across MT-genome positions.

mito.pl.draw_embedding(afm: AnnData, basis: str = 'X_umap', feature: str = None, ax: Axes = None, categorical_cmap: str | Dict[str, Any] = ['#1f77b4', '#ff7f0e', '#279e68', '#d62728', '#aa40fc', '#8c564b', '#e377c2', '#b5bd61', '#17becf', '#aec7e8', '#ffbb78', '#98df8a', '#ff9896', '#c5b0d5', '#c49c94', '#f7b6d2', '#dbdb8d', '#9edae5', '#ad494a', '#8c6d31'], continuous_cmap: str = 'viridis', size: float = None, frameon: bool = False, outline: bool = False, legend: bool = False, loc: str = 'center left', bbox_to_anchor: Tuple[float, float] = (1, 0.5), artists_size: float = 10, label_size: float = 10, ticks_size: float = 10, kwargs: Dict[str, Any] = {}) → Axes[source]

sc.pl.embedding, with some defaults and a custom legend.

Parameters:

afm (AnnData) – Allele Frequency Matrix with some basis to plot in afm.obsm.
basis (str, optional) – Key in afm.obsm. Default is “X_umap”.
feature (str, optional) – Feature to plot. Default is None.
ax (matplotlib.axes.Axes, optional) – Axes object to populate. Default is None.
categorical_cmap (str or dict, optional) – Color palette for categoricals. Default is sc.pl.palettes.vega_20_scanpy.
continuous_cmap (str, optional) – Color palette for continuous data. Default is “viridis”.
size (float, optional) – Point size. Default is None.
frameon (bool, optional) – Whether to draw a frame around the axes. Default is False.
outline (bool, optional) – Whether to draw a fancy outline around dots. Default is False.
legend (bool, optional) – Whether to automatically draw a legend. Default is False.
loc (str, optional) – Which corner of the legend to anchor. Default is “center left”.
bbox_to_anchor (tuple of float, optional) – Anchor ‘loc’ legend corner to ax.transformed coordinates. Default is (1, 0.5).
artists_size (float, optional) – Size of legend artists. Default is 10.
label_size (float, optional) – Size of legend labels. Default is 10.
ticks_size (float, optional) – Size of legend ticks. Default is 10.
kwargs (dict, optional) – Kwargs to sc.pl.embedding. Default is {}

Returns:

ax – Axes object.

Return type:

matplotlib.axes.Axes

mito.pl.heatmap_distances(afm: AnnData, distance_key: str = 'distances', tree: CassiopeiaTree = None, vmin: float = 0.25, vmax: float = 0.95, cmap: str = 'Spectral', ax: Axes = None) → Axes[source]

Heatmap cell/cell pairwise distances.

Parameters:

afm (AnnData) – Allele Frequency Matrix.
distance_key (str, optional) – Distance key in afm.obsp. Default is “distances”.
tree (CassiopeiaTree, optional) – Tree from which cell ordering can be retrieved. Default is None.
vmin (float, optional) – Minimum value for the colorbar. Default is 0.25.
vmax (float, optional) – Maximum value for the colorbar. Default is 0.95.
cmap (str, optional) – Color map for cell-cell distances. Default is “Spectral”.
ax (matplotlib.axes.Axes, optional) – Axes object to draw on. Default is None.

Returns:

ax – Axes object.

Return type:

matplotlib.axes.Axes

mito.pl.heatmap_variants(afm: AnnData, tree: CassiopeiaTree = None, label: str = 'Allelic Frequency', annot: str = None, annot_cmap: Dict[str, Any] = None, layer: str = None, ax: Axes = None, cmap: str = 'mako', vmin: float = 0, vmax: float = 0.1, kwargs: Dict[str, Any] = {}) → Axes[source]

Heatmap cell x variants.

Parameters:

afm (AnnData) – Allele Frequency Matrix.
tree (CassiopeiaTree, optional) – Tree from which cell ordering can be retrieved. Default is None.
label (str, optional) – Label for layer colorbar. Default is “Allelic Frequency”.
annot (str, optional) – afm.obs column to annotate. Default is None.
annot_cmap (dict, optional) – Color mapping for afm.obs[annot]. Default is None.
layer (str, optional) – Layer to plot. Default is None.
ax (matplotlib.axes.Axes, optional) – Axes object to draw on. Default is None.
cmap (str, optional) – Color map for layer. Default is “mako”.
vmin (float, optional) – Minimum value for the colorbar. Default is 0.
vmax (float, optional) – Maximum value for the colorbar. Default is 0.1.
kwargs (dict, optional) – Optional kwargs to plu.plot_heatmap. Default is {}.

Returns:

ax – Axes object.

Return type:

matplotlib.axes.Axes

mito.pl.mut_profile(mut_list: Iterable[str], figsize: Tuple[float, float] = (6, 3), legend_kwargs: Dict[str, Any] = {}) → Figure[source]: Re-implementation of MutationProfile_bulk, from Weng et al., 2024.

mito.pl.packed_circle_plot(df: DataFrame, ax: Axes = None, covariate: str = None, color: str = None, cmap: Dict[str, Any] = None, color_by: str = None, alpha: float = 0.5, linewidth: float = 1.2, t_cov: float = 0.01, annotate: bool = False, fontsize: float = 6, ascending: bool = False, fontcolor: Any = 'white', fontweight: str = 'normal') → Axes[source]: Circle plot. Packed.

mito.pl.plot_ncells_nAD(afm: AnnData, ax: Axes = None, title: str = None, xticks: Iterable[Any] = None, yticks: str = None, s: float = 5, color: Any = 'k', alpha: float = 0.7, **kwargs) → Axes[source]: Plot similar to the one in Weng et al., 2024, and in the two follow-up commentaries from Lareau and Weng, 2024. For each variant, plot the number of positive cells (x-axis) vs the mean number of AD in positive cells (y-axis).

mito.pl.plot_tree(tree: CassiopeiaTree, ax: Axes = None, orient: float | str = 90, extend_branches: bool = True, angled_branches: bool = True, add_root: bool = False, features: Iterable[str] = None, categorical_cmaps: Dict[str, str | Dict[str, Any]] = None, continuous_cmaps: Dict[str, str | Dict[str, Any]] = None, characters: Iterable[str] = None, cont_character_cmap: str = 'mako', bin_character_cmap: Dict[str, Any] = None, layer: str = 'raw', vmin_characters: float = 0, vmax_characters: float = 0.05, colorstrip_spacing: float = 0.25, colorstrip_width: float = 1.5, labels: bool = True, label_size: float = 10, label_offset: float = 2, meta_branches: DataFrame = None, cov_branches: str = None, cmap_branches: str | Dict[str, Any] = 'Spectral_r', cov_leaves: str = None, cmap_leaves: str | Dict[str, Any] = 'tab20', feature_internal_nodes: str = None, cmap_internal_nodes: str | Dict[str, Any] = 'Spectral_r', vmin: float = None, vmax: float = None, vmin_internal_nodes: float = 0.2, vmax_internal_nodes: float = 0.8, vmin_leaves: float = None, vmax_leaves: float = None, internal_node_labels: bool = False, internal_node_subset: Iterable[str] = None, internal_node_label_size: float = 7, show_internal: bool = False, leaves_labels: bool = False, leaf_label_size: float = 5, colorstrip_kwargs: Dict[str, Any] = {}, leaf_kwargs: Dict[str, Any] = {}, internal_node_kwargs: Dict[str, Any] = {}, branch_kwargs: Dict[str, Any] = {}, x_space: float = 1.5) → Axes[source]

Plotting function that extends capabilities in cs.plotting.local.plot_matplotlib from Cassiopeia, MW Jones et al, 2020.

Parameters:

tree (CassiopeiaTree) – Tree to plot.
ax (matplotlib.axes.Axes, optional) – Axes object to draw on. Default is None.
orient (float or str, optional) – Tree layout in polar (90) or cartesian coordinates (e.g., “down”). Default is 90.
extend_branches (bool, optional) – Equal length branch from leaf to root. Default is True.
angled_branches (bool, optional) – Make branches angled, not round. Default is True.
add_root (bool, optional) – Add root to tree. Default is False.
features (Iterable[str], optional) – Features in tree.cell_meta to plot. Default is None.
categorical_cmaps (dict of {str: str or dict}, optional) – Dictionary of colors for categorical features. Default is None.
continuous_cmaps (dict of {str: str or dict}, optional) – Dictionary of colors for continuous features. Default is None.
characters (Iterable[str], optional) – List of characters to plot. Default is None.
cont_character_cmap (str, optional) – Color map for characters (“raw” layer). Default is “mako”.
bin_character_cmap (dict, optional) – Colors for binary character states (“transformed” layer). Default is None.
layer (str, optional) – Layer in tree.layers to plot, if characters is not None. Default is “raw”.
vmin_characters (float, optional) – Minimum value for character colorbar. Default is 0.
vmax_characters (float, optional) – Maximum value for character colorbar. Default is 0.05.
colorstrip_spacing (float, optional) – Relative amount of spacing between colorstrips. Default is 0.25.
colorstrip_width (float, optional) – Relative colorstrip width. Default is 1.5.
labels (bool, optional) – Draw labels for features and characters. Default is True.
label_size (float, optional) – Features and character label size. Default is 10.
label_offset (float, optional) – Features and character label offset. Default is 2.
meta_branches (pd.DataFrame, optional) – Annotation table for branches. Default is None.
cov_branches (str, optional) – Branch feature to plot. Default is None.
cmap_branches (str or dict, optional) – Color map for branch feature. Default is “Spectral_r”.
cov_leaves (str, optional) – Leaf feature to plot. Default is None.
cmap_leaves (str or dict, optional) – Color map for leaves feature. Default is “tab20”.
vmin_leaves (float, optional) – Min value for leaves cmap.
vmax_leaves (float, optional) – Max value for leaves cmap.
feature_internal_nodes (str, optional) – Internal node feature to plot. Default is None.
cmap_internal_nodes (str or dict, optional) – Color map for internal nodes feature. Default is “Spectral_r”.
vmin_internal_nodes (float, optional) – Minimum value for internal node feature colorbar. Default is 0.2.
vmax_internal_nodes (float, optional) – Maximum value for internal node feature colorbar. Default is 0.8.
internal_node_labels (bool, optional) – Draw internal node names on location. Default is False.
internal_node_subset (Iterable[str], optional) – Subset of internal nodes to plot. Default is None.
internal_node_label_size (float, optional) – Internal node name/label size. Default is 7.
show_internal (bool, optional) – Show internal nodes. Default is False.
leaves_labels (bool, optional) – Plot leaves names. Default is False.
leaf_label_size (float, optional) – Leaf name/label size. Default is 5.
colorstrip_kwargs (dict, optional) – Additional colorstrip keyword arguments. Default is {}.
leaf_kwargs (dict, optional) – Additional leaves keyword arguments. Default is {}.
internal_node_kwargs (dict, optional) – Additional internal nodes keyword arguments. Default is {}.
branch_kwargs (dict, optional) – Additional branch keyword arguments. Default is {}.

Returns:

ax – Axes object.

Return type:

matplotlib.axes.Axes

mito.pl.vars_AF_spectrum(afm: AnnData, ax: Axes = None, color: str = 'b', **kwargs) → Axes[source]: Ranked AF distributions (as in Miller et al., 2022).

Utils

mito.ut.AOC(D1: array, D2: array, k: int = 10, n_trials: int = 1000)[source]: Agreement of Closeness (AOC) metric calculation. See Weng et al., 2024.

mito.ut.CI(tree: CassiopeiaTree) → float[source]: Calculate the Consistency Index (CI) of tree characters.

mito.ut.NN_entropy(index: array, labels: array) → float[source]

Calculate the median (over cells) lentiviral-labels Shannon Entropy, given an index matrix of a KNN graph.

Parameters:

index (np.array) – Array of shape (n_cells, k-neighbors) containing cell neighbors indices.
labels (pd.Series) – Discrete-valued batch annotation for each cell (length n_cells).

Returns:

NN Shannon Entropy score.

Return type:

float
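
A minimal sketch of this score in plain Python: for each cell, the Shannon entropy of the labels found among its neighbors is computed, and the median over cells is returned. The log base (2 here) and the exclusion of the cell’s own label are assumptions, and the helper name is hypothetical.

```python
from collections import Counter
from math import log2
from statistics import median
from typing import List, Sequence


def nn_entropy(index: Sequence[Sequence[int]], labels: Sequence[str]) -> float:
    """Median (over cells) Shannon entropy of the labels of each cell's neighbors."""
    entropies: List[float] = []
    for neighbors in index:
        counts = Counter(labels[j] for j in neighbors)
        total = sum(counts.values())
        # Entropy in bits; a neighborhood with a single label scores 0.
        h = -sum((c / total) * log2(c / total) for c in counts.values())
        entropies.append(h)
    return median(entropies)


# Three cells, two neighbors each; labels are e.g. lentiviral clone identities.
toy_index = [[1, 2], [0, 2], [0, 1]]
toy_labels = ["a", "a", "b"]
score = nn_entropy(toy_index, toy_labels)
```

Lower values indicate neighborhoods dominated by a single label.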

mito.ut.RI(tree: CassiopeiaTree) → float[source]: Calculate the Retention Index (RI) of tree characters.

mito.ut.calculate_corr_distances(tree: CassiopeiaTree) → float[source]: Calculate correlation between tree and character matrix cell-cell distances. Used in Yang et al., 2023.

mito.ut.custom_ARI(g1: Iterable[Any], g2: Iterable[Any]) → float[source]: Compute scIB (Luecken et al., 2022) modified Adjusted Rand Index.

mito.ut.distance_AUPRC(D: array, labels: Iterable[Any]) → float[source]

Uses a n x n distance matrix D as a binary classifier for a set of labels (1,…,n). Reports Area Under Precision Recall Curve. Used in Ludwig et al., 2019.

Parameters:

D (np.array) – Array of shape (n_cells, n_cells) containing cell-cell distances.
labels (pd.Series) – Discrete-valued batch annotation for each cell (length n_cells).

Returns:

AUPRC score.

Return type:

float
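
The idea of using a distance matrix as a classifier can be sketched as follows: every pair of cells is ranked by distance, pairs sharing a label count as positives, and the area under the precision-recall curve is approximated by average precision. The pairwise framing, the use of average precision, and the helper name are assumptions made for illustration.

```python
from typing import List, Sequence


def distance_auprc(D: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Average precision for 'same label' prediction from pairwise distances."""
    n = len(labels)
    pairs = [
        (D[i][j], labels[i] == labels[j]) for i in range(n) for j in range(i + 1, n)
    ]
    pairs.sort(key=lambda p: p[0])  # closest pairs are ranked first
    n_pos = sum(1 for _, same in pairs if same)
    if n_pos == 0:
        return float("nan")  # undefined without any same-label pair
    hits, ap = 0, 0.0
    for rank, (_, same) in enumerate(pairs, start=1):
        if same:
            hits += 1
            ap += hits / rank  # precision at each recovered positive
    return ap / n_pos


toy_labels = ["a", "a", "b", "b"]
toy_D = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.2, 0.9, 0.0, 0.3],
    [0.9, 0.9, 0.3, 0.0],
]
auprc = distance_auprc(toy_D, toy_labels)
```

In this toy case one different-label pair is closer than a same-label pair, so the score falls below 1.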

mito.ut.genotype_mix(ad: array, dp: array, t_prob: float = 0.7, t_vanilla: float = 0, debug: bool = False, min_AD: int = 1) → array[source]: Derive a discrete genotype (1:’MUT’, 0:’WT’) for each cell, given the AD and DP counts of one of its candidate mitochondrial variants.

mito.ut.kbet(index: array, batch: Series, alpha: float = 0.05, only_score: bool = True) → Tuple[float, float, float][source]

Computes the kBET metric (Buttner et al., 2018) to assess batch effects for an index matrix of a KNN graph.

Parameters:

index (np.array) – Array of shape (n_cells, n_neighbors) containing kNN indices.
batch (pd.Series) – Discrete-valued batch annotation for each cell (length n_cells).
alpha (float, optional) – Significance level of the chi-squared test. Default is 0.05.
only_score (bool, optional) – If True, return only the accept rate; otherwise, return full kBET results. Default is True.

Returns:

kBET statistics, where:
- stat_mean is the mean test statistic,
- pvalue_mean is the mean p-value,
- accept_rate is the overall acceptance rate.

Return type:

tuple of (stat_mean, pvalue_mean, accept_rate)

mito.ut.subsample_afm(afm, n_clones=3, ncells=100, freqs=array([0.3, 0.3, 0.4]))[source]