Module: Cell_BLAST.data

Dataset utilities

Classes:

Dataset(data_dict)

Functions:

annotation_confidence(adata, annotation[, ...])

Compute annotation confidence of each obs (cell) based on sample silhouette score.

compute_libsize(adata)

Compute library size

find_variable_genes(adata[, slot, ...])

A reimplementation of the Seurat v2 "mean.var.plot" gene selection method in the "FindVariableGenes" function, with the extended ability to select variable genes within specified groups of cells and then combine the results of the individual groups.

map_vars(adata, mapping[, map_hvg])

Map variables of input dataset to some other terms, e.g. gene ortholog groups, or orthologous genes in another species.

normalize(adata[, target])

Obs-wise normalization of expression matrix.

read_table(filename[, orientation, sparsify])

Read expression matrix from a plain-text file

select_vars(adata, var_names)

Select variables, with special support for variables that do not exist in the input (such variables are filled with zeros).

write_table(adata, filename[, orientation])

Write the expression matrix to a plain-text file.

class Cell_BLAST.data.Dataset(data_dict)

Cell_BLAST.data.annotation_confidence(adata, annotation, used_vars=None, metric='cosine', return_group_percentile=True)

Compute annotation confidence of each obs (cell) based on sample silhouette score.

Parameters:
  • adata (AnnData) – Input dataset.

  • annotation (Union[str, List[str]]) – Specifies the annotation for which confidence will be computed. If passed an array-like, it should be 1-dimensional with length equal to the number of obs, and will be used directly as the annotation. If passed a string, it should be a column name in obs.

  • used_vars (Optional[List[str]]) – Specifies the variables used to evaluate the metric. If not specified, all variables are used.

  • metric (str) – Specifies distance metric used to compute sample silhouette scores. See sklearn.metrics.silhouette_samples() for available options.

  • return_group_percentile (bool) – Whether to return the within-group confidence percentile in addition to the raw sample silhouette score.

Return type:

Tuple[ndarray, ndarray]

Returns:

  • confidence – 1-dimensional numpy array containing annotation confidence for each obs.

  • group_percentile – 1-dimensional numpy array containing within-group percentile for each obs.
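
To make the two return values concrete, here is a minimal pure-Python sketch of a per-sample silhouette score under cosine distance, followed by a within-group percentile. It is not Cell_BLAST's implementation (the documentation above points to sklearn.metrics.silhouette_samples() for the actual metric). The helper names and the percentile definition (fraction of same-group cells with a strictly lower score) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """Cosine distance, i.e. 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_scores(points, labels):
    """Per-sample silhouette s = (b - a) / max(a, b); needs >= 2 distinct labels."""
    scores = []
    for i, p in enumerate(points):
        # Distances from sample i to every other sample, grouped by label.
        by_label = defaultdict(list)
        for j, q in enumerate(points):
            if i != j:
                by_label[labels[j]].append(cosine_distance(p, q))
        own = by_label.pop(labels[i], None)
        if not own:
            scores.append(0.0)  # singleton group: score defined as 0
            continue
        a = sum(own) / len(own)  # mean distance within own group
        b = min(sum(d) / len(d) for d in by_label.values())  # nearest other group
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def within_group_percentile(scores, labels):
    """Hypothetical percentile: share of same-group cells with a lower score."""
    percentiles = []
    for score, label in zip(scores, labels):
        group = [s for s, l in zip(scores, labels) if l == label]
        if len(group) < 2:
            percentiles.append(0.0)
            continue
        lower = sum(1 for s in group if s < score)
        percentiles.append(lower / (len(group) - 1))
    return percentiles


# Toy data: three "A" cells, three "B" cells, and one cell that sits
# among the "B" cells but is annotated as "A".
points = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
          [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],
          [0.1, 0.9]]
labels = ["A", "A", "A", "B", "B", "B", "A"]

confidence = silhouette_scores(points, labels)
group_percentile = within_group_percentile(confidence, labels)
```

In this toy data the last cell receives a negative silhouette score and the lowest percentile within group "A", which is the kind of cell a low annotation confidence is meant to flag.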

Cell_BLAST.data.compute_libsize(adata)

Compute library size

Parameters:

adata (AnnData) – Input dataset.

Return type:

None

Cell_BLAST.data.find_variable_genes(adata, slot='variable_genes', x_low_cutoff=0.1, x_high_cutoff=8.0, y_low_cutoff=1.0, y_high_cutoff=inf, num_bin=20, binning_method='equal_frequency', grouping=None, min_group_frac=0.5)

A reimplementation of the Seurat v2 “mean.var.plot” gene selection method in the “FindVariableGenes” function, with the extended ability to select variable genes within specified groups of cells and then combine the results of the individual groups. This is useful for minimizing batch effects during feature selection.

Parameters:
  • adata (AnnData) – Input dataset.

  • slot (str) – Slot in var to store the variable genes

  • x_low_cutoff (float) – Minimal log mean cutoff

  • x_high_cutoff (float) – Maximal log mean cutoff

  • y_low_cutoff (float) – Minimal log VMR cutoff

  • y_high_cutoff (float) – Maximal log VMR cutoff

  • num_bin (int) – Number of bins based on mean expression.

  • binning_method (str) – How binning should be done based on mean expression. Available choices include {“equal_width”, “equal_frequency”}.

  • grouping (Optional[str]) – Specify a column in the obs table that splits cells into several groups. Gene selection is performed in each group separately and results are combined afterwards.

  • min_group_frac (float) – The minimal fraction of groups in which a gene must be selected for it to be kept in the final result.

Returns:

ax – VMR plot (a dict of plots if grouping is specified)
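
The interaction between grouping and min_group_frac can be sketched as a simple vote across groups. The snippet below is an illustration in plain Python, not the library's code: the function name is hypothetical, the per-group gene lists are made up, and treating the threshold as inclusive (>=) is an assumption.

```python
from collections import Counter


def combine_group_selections(selected_by_group, min_group_frac=0.5):
    """Keep genes selected in at least `min_group_frac` of the groups.

    `selected_by_group` maps a group name (e.g. a batch) to the genes
    selected within that group. The inclusive threshold is an assumption.
    """
    n_groups = len(selected_by_group)
    # Count, for each gene, the number of groups in which it was selected.
    votes = Counter(
        gene for genes in selected_by_group.values() for gene in set(genes)
    )
    return sorted(g for g, n in votes.items() if n / n_groups >= min_group_frac)


# Made-up per-group selections for three batches.
selected_by_group = {
    "batch1": ["A", "B", "C"],
    "batch2": ["B", "C", "D"],
    "batch3": ["C", "E"],
}

kept_half = combine_group_selections(selected_by_group, min_group_frac=0.5)
kept_all = combine_group_selections(selected_by_group, min_group_frac=1.0)
```

With min_group_frac=0.5, genes chosen in only one of the three batches are dropped, so genes that look variable only because of a single batch do not enter the final set.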

Cell_BLAST.data.map_vars(adata, mapping, map_hvg=None)

Map variables of input dataset to some other terms, e.g. gene ortholog groups, or orthologous genes in another species.

Note that “raw”, “varm” and “layers” will be discarded.

Parameters:
  • adata (AnnData) – Input dataset.

  • mapping (DataFrame) – A 2-column data frame defining variable name mapping. First column is source variable name and second column is target variable name.

  • map_hvg (Optional[List[str]]) – Specify var slots containing highly variable genes that should also be mapped.

Returns:

mapped – Mapped dataset.
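
As an illustration of what a 2-column mapping does to an expression matrix, the sketch below maps source variables onto target terms in plain Python. It is not the library's implementation: the function name is hypothetical, and two behaviors are assumptions, namely that sources sharing a target are summed and that mapping rows whose source is absent from the dataset are skipped.

```python
def map_vars_sketch(matrix, var_names, mapping):
    """Map columns of an obs x var matrix onto target terms.

    `mapping` is a list of (source, target) pairs, mirroring the
    2-column data frame described above. Assumptions: values of
    sources sharing a target are summed; unknown sources are skipped.
    """
    # Target order follows first appearance in the mapping.
    targets = []
    for _, target in mapping:
        if target not in targets:
            targets.append(target)
    target_index = {t: i for i, t in enumerate(targets)}
    source_index = {v: i for i, v in enumerate(var_names)}

    mapped = [[0.0] * len(targets) for _ in matrix]
    for source, target in mapping:
        if source not in source_index:
            continue  # source variable not present in the dataset
        s, t = source_index[source], target_index[target]
        for row_in, row_out in zip(matrix, mapped):
            row_out[t] += row_in[s]
    return mapped, targets


# Two cells, three genes; geneA and geneB belong to the same ortholog group.
matrix = [[1, 2, 3],
          [4, 5, 6]]
var_names = ["geneA", "geneB", "geneC"]
mapping = [("geneA", "OG1"), ("geneB", "OG1"), ("geneC", "OG2"),
           ("geneZ", "OG3")]  # geneZ is not in the dataset

mapped, targets = map_vars_sketch(matrix, var_names, mapping)
```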

Cell_BLAST.data.normalize(adata, target=10000.0)

Obs-wise normalization of expression matrix.

Parameters:
  • adata (AnnData) – Input dataset.

  • target (float) – Target value of normalization.

Return type:

None
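
A minimal sketch of obs-wise normalization, assuming it means rescaling each obs (row) so that its total equals target. The function name is hypothetical and the sketch returns a new matrix, whereas the documented function returns None and presumably modifies adata in place.

```python
def normalize_rows(matrix, target=10000.0):
    """Scale each row so that its sum equals `target`.

    Rows that sum to zero are left as zeros to avoid division by zero.
    """
    normalized = []
    for row in matrix:
        libsize = sum(row)  # library size of this obs
        scale = target / libsize if libsize > 0 else 0.0
        normalized.append([value * scale for value in row])
    return normalized


# Two cells with different sequencing depth, plus one empty cell.
counts = [[1, 1, 2],
          [0, 5, 5],
          [0, 0, 0]]
normalized = normalize_rows(counts, target=10000.0)
```

After scaling, the first two rows both sum to 10000, so cells sequenced at different depths become directly comparable.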

Cell_BLAST.data.read_table(filename, orientation='cg', sparsify=False, **kwargs)

Read expression matrix from a plain-text file

Parameters:
  • filename (str) – Name of the file to read from.

  • orientation (str) – Specifies whether the matrix in the file is in \(cell \times gene\) or \(gene \times cell\) orientation.

  • sparsify (bool) – Whether to convert the expression matrix into sparse format.

  • kwargs – Additional keyword arguments will be passed to pandas.read_csv().

Returns:

loaded_dataset – An ad.AnnData object loaded from the file.

Cell_BLAST.data.select_vars(adata, var_names)

Select variables, with special support for variables that do not exist in the input (such variables are filled with zeros).

Note that “raw”, “varm” and “layers” will be discarded.

Parameters:
  • adata (AnnData) – Input dataset.

  • var_names (List[str]) – Variables to select.

Returns:

selected – Dataset with selected variables.
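
The zero-filling behavior described above can be sketched on a plain obs x var matrix. This is an illustration only; the function name is hypothetical and the real function operates on an AnnData object.

```python
def select_vars_sketch(matrix, var_names, selected):
    """Reorder columns to match `selected`, filling missing variables with 0."""
    index = {name: i for i, name in enumerate(var_names)}
    return [
        [row[index[name]] if name in index else 0 for name in selected]
        for row in matrix
    ]


matrix = [[1, 2, 3],
          [4, 5, 6]]
var_names = ["A", "B", "C"]

# "X" is not in the input, so its column is filled with zeros.
selected_matrix = select_vars_sketch(matrix, var_names, ["C", "X", "A"])
```

This is useful when two datasets must share exactly the same variables in the same order, even if some genes were never measured in one of them.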

Cell_BLAST.data.write_table(adata, filename, orientation='cg', **kwargs)

Write the expression matrix to a plain-text file. Note that the obs (cell) meta table, the var (gene) meta table and data in the uns slot are discarded; only the expression matrix is written to the file.

Parameters:
  • adata (AnnData) – Input dataset.

  • filename (str) – Name of the file to be written.

  • orientation (str) – Specifies whether to write in \(obs \times var\) or \(var \times obs\) orientation, should be among {“cg”, “gc”}.

  • kwargs – Additional keyword arguments will be passed to pandas.DataFrame.to_csv().

Return type:

None
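
To show how the two orientations differ on disk, the sketch below writes a small matrix as tab-separated text using only the standard library. It is not the library's writer (which passes its keyword arguments to pandas.DataFrame.to_csv()). The function name is hypothetical, and reading "cg" as cells in rows and "gc" as genes in rows is an assumption based on the option names.

```python
import csv
import io


def write_table_sketch(matrix, obs_names, var_names, orientation="cg"):
    """Serialize an obs x var matrix as tab-separated text.

    Assumption: "cg" puts cells (obs) in rows, "gc" puts genes (var) in rows.
    """
    if orientation == "cg":
        header, index, rows = var_names, obs_names, matrix
    elif orientation == "gc":
        # Transpose so that each output row corresponds to one gene.
        header, index, rows = obs_names, var_names, [list(c) for c in zip(*matrix)]
    else:
        raise ValueError("orientation should be 'cg' or 'gc'")

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(header))  # empty corner cell above the index
    for name, row in zip(index, rows):
        writer.writerow([name] + list(row))
    return buffer.getvalue()


matrix = [[1, 2, 3],
          [4, 5, 6]]
obs_names = ["cell1", "cell2"]
var_names = ["g1", "g2", "g3"]

text_cg = write_table_sketch(matrix, obs_names, var_names, orientation="cg")
text_gc = write_table_sketch(matrix, obs_names, var_names, orientation="gc")
```

Only the matrix and the obs and var names appear in the output, which matches the note above that the meta tables and the uns slot are not written.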