scfocus.utils.preprocess

scfocus.utils.preprocess(_adata, n_top_genes)

Preprocess single-cell RNA-seq data using scanpy.

This function runs a standard preprocessing pipeline: count normalization, log transformation, identification of highly variable genes, and PCA.

Parameters:

_adata (anndata.AnnData) – Annotated data matrix with cells as observations and genes as variables. Note: The leading underscore tells Streamlit's cache not to hash this argument; it does not mean the object is left untouched. This function modifies the AnnData object in place.
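Streamlit documents that cached-function parameters whose names start with an underscore are left out of the cache key. The stdlib-only toy below models that rule to show how it interacts with in-place modification. `toy_cache` and `fake_preprocess` are hypothetical names, and this is an illustration of the rule, not Streamlit's implementation or the scfocus code. Under this rule, a second call with a different object but the same `n_top_genes` is served from the cache, so the function body (and its in-place edits) does not run for the second object.

```python
import functools
import inspect


def toy_cache(func):
    """Toy model: arguments whose names start with "_" are excluded
    from the cache key. Not Streamlit's implementation."""
    sig = inspect.signature(func)
    results = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Build the key only from arguments that are not underscore-prefixed.
        key = tuple(
            (name, value)
            for name, value in bound.arguments.items()
            if not name.startswith("_")
        )
        if key not in results:
            # The body runs only on a cache miss.
            results[key] = func(*args, **kwargs)
        return results[key]

    return wrapper


@toy_cache
def fake_preprocess(_adata, n_top_genes):
    # Stand-in for in-place preprocessing: mutate the unhashed argument.
    _adata["preprocessed"] = True
    _adata["n_top_genes"] = n_top_genes


first, second = {}, {}
fake_preprocess(first, 2000)
fake_preprocess(second, 2000)  # same key, so the body is skipped
print(first, second)
```

Whether this matters in practice depends on how scfocus calls the cached function; treat it as a reason to check, not as a description of scfocus behavior.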
n_top_genes (int) – Number of highly variable genes to identify.

Notes

This function uses Streamlit’s caching mechanism to avoid redundant computations. The preprocessing steps are:

1. Total count normalization to 10,000 counts per cell
2. Log transformation (log1p)
3. Highly variable gene identification
4. PCA on highly variable genes
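The first two steps reduce to simple arithmetic: each count is divided by its cell's total, multiplied by 10,000, and passed through log(1 + x). The NumPy sketch below works this through on a made-up 2-cell by 3-gene matrix, assuming the default behavior of scanpy's total-count normalization (no exclusion of highly expressed genes) and natural-log log1p. Cells with zero total counts would divide by zero here; the sketch does not handle them.

```python
import numpy as np

# Hypothetical 2-cell x 3-gene count matrix; both cells have 10 total counts.
counts = np.array([[1.0, 3.0, 6.0],
                   [0.0, 5.0, 5.0]])

target_sum = 1e4
# Step 1: divide each row by its total, then scale to 10,000 counts per cell.
normalized = counts / counts.sum(axis=1, keepdims=True) * target_sum
# Step 2: natural log of (1 + x); zeros stay zero.
logged = np.log1p(normalized)

print(normalized)
print(logged)
```

Here the rows become [1000, 3000, 6000] and [0, 5000, 5000], so every cell sums to 10,000 before the log transform.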