API Reference
xcell.XCell
X-Cell: a diffusion language model for genome-scale perturbation prediction.
X-Cell predicts the transcriptional response to genetic perturbations from a set of control cells. It operates on sets of cells (not individual cells) and refines predictions iteratively via a masked diffusion process.
Available variant:
"mini"— 55M parameters, initialized from scGPT, runs on a single GPU.
Examples
Load X-Cell Mini and predict the response to a BRCA1 knockdown:
import anndata as ad from xcell import XCell model = XCell.from_pretrained("Xaira-Therapeutics/X-Cell", variant="mini") adata = ad.read_h5ad("control_cells.h5ad") predictions = model.predict(adata, perturbation="BRCA1")
Predict from multiple .h5ad files:
predictions = model.predict( ... ["screen1.h5ad", "screen2.h5ad"], ... perturbation="BRCA1", ... )
from_pretrained(model_id='Xaira-Therapeutics/X-Cell', variant='mini', device=None, cache_dir=None)
classmethod
Load a pretrained X-Cell model from HuggingFace Hub.
Parameters
model_id:
HuggingFace repository ID. Defaults to "Xaira-Therapeutics/X-Cell".
variant:
Model variant. Currently only "mini" (55M) is available.
device:
PyTorch device string (e.g. "cuda", "cpu").
Defaults to CUDA if available, otherwise CPU.
cache_dir:
Local directory for caching downloaded weights.
Returns
XCell A loaded model instance ready for inference.
Raises
ValueError
If variant is not one of the supported variants.
predict(data, perturbation, n_cells=64, n_diffusion_steps=4, batch_size=8)
Predict the transcriptional response to a perturbation.
Parameters
data: Control cell expression. Accepts:
- an :class:`anndata.AnnData` object,
- a path (``str`` or :class:`pathlib.Path`) to an ``.h5ad`` file,
- a list of ``.h5ad`` file paths (cells are pooled across files).
Expression values should be log-normalized (log1p CP10k). Genes not
present in the X-Cell vocabulary are zero-imputed.
perturbation:
HGNC gene symbol of the CRISPRi knockdown to simulate (e.g. "BRCA1").
n_cells:
Number of control cells to sample per prediction set. Default 64.
n_diffusion_steps:
Number of iterative diffusion refinement steps at inference. Default 4.
batch_size:
Number of cell sets to process in parallel per forward pass.
Returns
AnnData
Predicted perturbed expression. Shape matches the input data.
- ``.X`` — predicted log-normalized expression (log1p CP10k)
- ``.obs["perturbation"]`` — perturbation name
- ``.var`` — gene metadata (same as input)
Raises
RuntimeError
If the model has not been loaded via :meth:from_pretrained.