scfocus.utils.preprocess
- scfocus.utils.preprocess(_adata, n_top_genes)
Preprocess single-cell RNA-seq data using scanpy.
This function performs standard preprocessing steps including count normalization, log transformation, identification of highly variable genes, and PCA.
- Parameters:
_adata (anndata.AnnData) – Annotated data matrix with cells as observations and genes as variables. Note: Despite the underscore prefix (required by Streamlit caching), this function modifies the AnnData object in place.
n_top_genes (int) – Number of highly variable genes to identify.
Notes
This function uses Streamlit’s caching mechanism to avoid redundant computations. The preprocessing steps are:
1. Total count normalization to 10,000 counts per cell
2. Log transformation (log1p)
3. Highly variable gene identification
4. PCA on highly variable genes