Module: Cell_BLAST.data
Dataset utilities
Functions:
- annotation_confidence: Compute annotation confidence of each obs (cell) based on sample silhouette score.
- Compute library size.
- find_variable_genes: A reimplementation of the Seurat v2 "mean.var.plot" gene selection method in the "FindVariableGenes" function, with the extended ability to select variable genes within specified groups of cells and then combine the results of individual groups.
- map_vars: Map variables of the input dataset to other terms, e.g. gene ortholog groups, or orthologous genes in another species.
- normalize: Obs-wise normalization of the expression matrix.
- read_table: Read an expression matrix from a plain-text file.
- select_vars: Select variables, with special support for variables absent from the input (absent variables are filled with zeros).
- write_table: Write the expression matrix to a plain-text file.
- Cell_BLAST.data.annotation_confidence(adata, annotation, used_vars=None, metric='cosine', return_group_percentile=True)
Compute annotation confidence of each obs (cell) based on sample silhouette score.
- Parameters:
adata (AnnData) – Input dataset.
annotation (Union[str, List[str]]) – Specifies the annotation for which confidence will be computed. If passed an array-like, it should be 1-dimensional with length equal to the number of obs, and will be used directly as the annotation. If passed a string, it should be a column name in obs.
used_vars (Optional[List[str]]) – Specifies the variables used to evaluate metric. If not specified, all variables are used.
metric (str) – Specifies the distance metric used to compute sample silhouette scores. See sklearn.metrics.silhouette_samples() for available options.
return_group_percentile (bool) – Whether to return the within-group confidence percentile instead of the raw sample silhouette score.
- Returns:
confidence – 1-dimensional numpy array containing the annotation confidence for each obs.
group_percentile – 1-dimensional numpy array containing the within-group percentile for each obs.
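The silhouette-based confidence described above can be sketched in plain NumPy. This is an illustration, not Cell_BLAST's code: it hard-codes the cosine metric (the documented default), and the rank-based definition of the within-group percentile is an assumption. sklearn.metrics.silhouette_samples(X, labels, metric="cosine") computes the same per-sample score when scikit-learn is available.

```python
import numpy as np


def cosine_distance_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances between rows of x (assumes no all-zero rows)."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def silhouette_samples_cosine(x: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-sample silhouette s = (b - a) / max(a, b) using cosine distance."""
    dist = cosine_distance_matrix(x)
    unique = np.unique(labels)
    scores = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own group
        if not same.any():
            continue  # singleton groups keep a score of 0, as in scikit-learn
        a = dist[i, same].mean()  # mean distance within own group
        b = min(dist[i, labels == other].mean()  # nearest other group
                for other in unique if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores


def within_group_percentile(scores: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Hypothetical percentile (0-100): share of same-label scores <= each score."""
    pct = np.empty_like(scores)
    for label in np.unique(labels):
        mask = labels == label
        group = scores[mask]
        pct[mask] = [(group <= s).mean() * 100 for s in group]
    return pct
```

A high percentile marks a cell whose annotation agrees well with its neighbors relative to other cells carrying the same label, which makes scores comparable across labels of different compactness.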
- Cell_BLAST.data.find_variable_genes(adata, slot='variable_genes', x_low_cutoff=0.1, x_high_cutoff=8.0, y_low_cutoff=1.0, y_high_cutoff=inf, num_bin=20, binning_method='equal_frequency', grouping=None, min_group_frac=0.5)
A reimplementation of the Seurat v2 “mean.var.plot” gene selection method in the “FindVariableGenes” function, with the extended ability to select variable genes within specified groups of cells and then combine the results of individual groups. This is useful for minimizing batch effects during feature selection.
- Parameters:
adata (AnnData) – Input dataset.
slot (str) – Slot in var in which to store the variable genes.
x_low_cutoff (float) – Minimal log mean cutoff.
x_high_cutoff (float) – Maximal log mean cutoff.
y_low_cutoff (float) – Minimal log VMR (variance-to-mean ratio) cutoff.
y_high_cutoff (float) – Maximal log VMR cutoff.
num_bin (int) – Number of bins based on mean expression.
binning_method (str) – How binning should be done based on mean expression. Available choices are {“equal_width”, “equal_frequency”}.
grouping (Optional[str]) – Specifies a column in the obs table that splits cells into several groups. Gene selection is performed in each group separately and the results are combined afterwards.
min_group_frac (float) – The minimal fraction of groups in which a gene must be selected for it to be kept in the final result.
- Returns:
ax – VMR plot (a dict of plots if grouping is specified).
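The selection logic described above (bin genes by log mean, standardize log VMR within each bin, apply the x and y cutoffs, then vote across groups) can be sketched as follows. This is a simplified, hypothetical version operating on a dense cell-by-gene count matrix: it uses log1p(mean) and log(variance / mean) directly, whereas Seurat v2 and Cell_BLAST may transform log-normalized data differently, and applying the y cutoffs to the within-bin z-scored log VMR follows Seurat v2's scaling rather than anything confirmed for Cell_BLAST. Plotting and writing to adata.var are omitted.

```python
import numpy as np


def select_variable_genes(x, x_low_cutoff=0.1, x_high_cutoff=8.0,
                          y_low_cutoff=1.0, y_high_cutoff=np.inf,
                          num_bin=20, binning_method="equal_frequency"):
    """Boolean mask of variable genes for one cell-by-gene count matrix x."""
    eps = 1e-12  # keeps the logs finite for all-zero or constant genes
    mean = x.mean(axis=0)
    var = x.var(axis=0, ddof=1)
    log_mean = np.log1p(mean)
    log_vmr = np.log(var / (mean + eps) + eps)

    # Bin genes by log mean expression.
    if binning_method == "equal_width":
        edges = np.linspace(log_mean.min(), log_mean.max(), num_bin + 1)
    elif binning_method == "equal_frequency":
        edges = np.quantile(log_mean, np.linspace(0.0, 1.0, num_bin + 1))
    else:
        raise ValueError("binning_method must be 'equal_width' or 'equal_frequency'")
    bins = np.clip(np.searchsorted(edges, log_mean, side="right") - 1, 0, num_bin - 1)

    # Z-score log VMR within each bin; bins with < 2 genes or zero spread stay 0.
    scaled = np.zeros_like(log_vmr)
    for b in np.unique(bins):
        in_bin = bins == b
        if in_bin.sum() > 1:
            sd = log_vmr[in_bin].std(ddof=1)
            if sd > 0:
                scaled[in_bin] = (log_vmr[in_bin] - log_vmr[in_bin].mean()) / sd

    return ((log_mean > x_low_cutoff) & (log_mean < x_high_cutoff)
            & (scaled > y_low_cutoff) & (scaled < y_high_cutoff))


def select_variable_genes_grouped(x, groups, min_group_frac=0.5, **kwargs):
    """Select per group; keep genes selected in >= min_group_frac of the groups."""
    masks = [select_variable_genes(x[groups == g], **kwargs)
             for g in np.unique(groups)]
    frac_selected = np.mean(masks, axis=0)  # fraction of groups selecting each gene
    return frac_selected >= min_group_frac
```

Selecting within each group before voting means a gene that looks variable only because two batches differ in mean expression is less likely to survive, which is the batch-effect motivation stated above.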
- Cell_BLAST.data.map_vars(adata, mapping, map_hvg=None)
Map variables of the input dataset to other terms, e.g. gene ortholog groups, or orthologous genes in another species.
Note that “raw”, “varm” and “layers” will be discarded.
- Returns:
mapped – Mapped dataset.
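One way to picture the variable mapping described above is as a multiplication by a source-to-target indicator matrix. The sketch below assumes the mapping is a two-column table (source variable, target term) and that expression values mapped to the same term are summed; both the input format and the summation are assumptions for illustration, and map_vars_sketch is a hypothetical name, not part of Cell_BLAST.

```python
import pandas as pd


def map_vars_sketch(x: pd.DataFrame, mapping: pd.DataFrame) -> pd.DataFrame:
    """Aggregate cell-by-gene columns of x onto target terms by summation."""
    # Column 0 is the source variable, column 1 the target term.
    pairs = mapping.iloc[:, :2].drop_duplicates()
    pairs = pairs[pairs.iloc[:, 0].isin(x.columns)]  # ignore unknown sources
    # 0/1 indicator matrix: rows are source variables, columns are target terms.
    indicator = pd.crosstab(pairs.iloc[:, 0], pairs.iloc[:, 1])
    return x[indicator.index] @ indicator
```

For example, with genes g1 and g2 both mapped to T1 and g3 mapped to T2, each cell's T1 value is the sum of its g1 and g2 values.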
- Cell_BLAST.data.normalize(adata, target=10000.0)
Obs-wise normalization of the expression matrix.
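The entry above does not spell out what target means. A common convention, assumed here, is that each obs (cell) is rescaled so that its values sum to target. The minimal dense-matrix sketch below follows that assumption; normalize_rows is a hypothetical name, not Cell_BLAST's implementation.

```python
import numpy as np


def normalize_rows(x: np.ndarray, target: float = 10000.0) -> np.ndarray:
    """Scale each row (obs) of x so that it sums to `target` (assumed meaning)."""
    libsize = x.sum(axis=1, keepdims=True)  # per-cell library size
    # Rows with library size 0 get a scale of 0 instead of dividing by zero.
    scale = np.divide(target, libsize,
                      out=np.zeros_like(libsize, dtype=float),
                      where=libsize > 0)
    return x * scale
```

Scaling to a fixed total removes differences in sequencing depth between cells before genes are compared.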
- Cell_BLAST.data.read_table(filename, orientation='cg', sparsify=False, **kwargs)
Read an expression matrix from a plain-text file.
- Parameters:
filename (str) – Name of the file to read from.
orientation (str) – Specifies whether the matrix in the file is in \(cell \times gene\) (“cg”) or \(gene \times cell\) (“gc”) orientation.
sparsify (bool) – Whether to convert the expression matrix into sparse format.
kwargs – Additional keyword arguments will be passed to pandas.read_csv().
- Returns:
loaded_dataset – An ad.AnnData object loaded from the file.
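The orientation handling described above can be sketched with pandas. This sketch assumes a delimited file whose first column holds row names, and it returns a (matrix, obs_names, var_names) tuple instead of an ad.AnnData object to avoid the anndata dependency; the result can be wrapped with anndata.AnnData(X=matrix, obs=pd.DataFrame(index=obs_names), var=pd.DataFrame(index=var_names)). read_table_sketch is a hypothetical name.

```python
import io

import pandas as pd


def read_table_sketch(source, orientation="cg", sparsify=False, **kwargs):
    """Return (matrix, obs_names, var_names) read from a delimited text table."""
    # First column holds row names; extra kwargs go straight to pandas.read_csv.
    df = pd.read_csv(source, index_col=0, **kwargs)
    if orientation == "gc":
        df = df.T  # genes were rows in the file; make cells the rows
    elif orientation != "cg":
        raise ValueError("orientation must be 'cg' or 'gc'")
    matrix = df.to_numpy()
    if sparsify:
        import scipy.sparse  # only needed when sparse output is requested
        matrix = scipy.sparse.csr_matrix(matrix)
    return matrix, df.index.to_numpy(), df.columns.to_numpy()


if __name__ == "__main__":
    # A gene-by-cell table: rows g1, g2 and columns c1, c2.
    text = "gene,c1,c2\ng1,1,2\ng2,3,4\n"
    matrix, obs_names, var_names = read_table_sketch(io.StringIO(text), orientation="gc")
    print(matrix)
    print(obs_names, var_names)
```

Tab-separated files can be read by passing sep="\t", which is forwarded to pandas.read_csv().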
- Cell_BLAST.data.select_vars(adata, var_names)
Select variables, with special support for variables absent from the input (absent variables are filled with zeros).
Note that “raw”, “varm” and “layers” will be discarded.
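The zero-filling behavior described above amounts to building an output matrix with one column per requested name and copying over only the columns that exist. The dense NumPy sketch below illustrates this; select_vars_sketch and its arguments are hypothetical and do not reproduce Cell_BLAST's AnnData handling.

```python
import numpy as np


def select_vars_sketch(x, var_names, selected):
    """Columns of x for `selected`, with names missing from var_names as zeros."""
    index = {name: i for i, name in enumerate(var_names)}
    out = np.zeros((x.shape[0], len(selected)), dtype=x.dtype)
    for j, name in enumerate(selected):
        if name in index:  # absent variables keep their zero column
            out[:, j] = x[:, index[name]]
    return out
```

This keeps the output aligned to a fixed gene list, for example when a query dataset lacks some genes used by a reference model.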
- Cell_BLAST.data.write_table(adata, filename, orientation='cg', **kwargs)
Write the expression matrix to a plain-text file. Note that the obs (cell) meta table, the var (gene) meta table and data in the uns slot are discarded; only the expression matrix is written to the file.
- Parameters:
adata (AnnData) – Input dataset.
filename (str) – Name of the file to be written.
orientation (str) – Specifies whether to write in \(obs \times var\) or \(var \times obs\) orientation; should be one of {“cg”, “gc”}.
kwargs – Additional keyword arguments will be passed to pandas.DataFrame.to_csv().
- Return type: