X-Cell

A diffusion language model for genome-scale perturbation prediction across diverse cellular contexts.

Figure: X-Cell architecture.

X-Cell predicts genome-scale transcriptional responses to genetic perturbations across diverse cellular contexts. Trained on X-Atlas/Pisces — 25.6 million perturbed single cells across 7 CRISPRi Perturb-seq screens — X-Cell integrates multi-modal biological priors through cross-attention and generalizes zero-shot to unseen cell types and perturbations.

Preprint · 🤗 Model Weights · 🤗 Dataset

Availability

Model weights and inference code are coming soon. The API examples below reflect the planned interface. Watch the GitHub repository for release updates.


Key Results

  • State-of-the-art fold-change prediction


    X-Cell achieves a Pearson Δ of 0.51 on held-out iPSC perturbations, higher than the next-best method.

  • Zero-shot T-cell inactivation


    Predicts CD3 complex inactivators and novel regulators (LRBA, APPL2) confirmed by an independent primary T-cell screen.

  • LLM-class scaling laws


    Training loss scales as a power law (α = 0.32) across models from 83M to 4.9B parameters, matching the behavior of large language models.

  • Zero-shot cell type generalization


    Generalizes to melanocyte progenitors and primary human CD4+ T cells using test-time adaptation on unlabeled controls.
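
To give a feel for what an exponent of α = 0.32 implies, the sketch below assumes the simplest power-law form, L(N) = c · N^(-α), with N the parameter count and no irreducible-loss term. That functional form and the helper name `loss_ratio` are assumptions for illustration only; the fit reported in the preprint may be parameterized differently.

```python
ALPHA = 0.32  # exponent quoted in Key Results


def loss_ratio(n_small: float, n_large: float, alpha: float = ALPHA) -> float:
    """Return L(n_large) / L(n_small) under the assumed form L(N) = c * N**(-alpha).

    The constant c cancels, so only the ratio of model sizes matters.
    """
    return (n_large / n_small) ** (-alpha)


# Model sizes quoted in Key Results: 83M and 4.9B parameters.
ratio = loss_ratio(83e6, 4.9e9)
print(f"Assumed loss ratio, 4.9B vs 83M parameters: {ratio:.2f}")

# Effect of doubling the parameter count under the same assumption.
per_doubling = loss_ratio(1, 2)
print(f"Assumed loss ratio per doubling: {per_doubling:.3f}")
```

Under this assumed form, scaling from 83M to 4.9B parameters would bring the loss down to a little over a quarter of its starting value, and each doubling of parameters would reduce it by roughly a fifth.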


Installation

pip install xcell

Quick Start

import anndata as ad
from xcell import XCell

# Load pretrained X-Cell Mini
model = XCell.from_pretrained("Xaira-Therapeutics/X-Cell", variant="mini")

# Predict from an AnnData object
adata = ad.read_h5ad("your_control_cells.h5ad")
predictions = model.predict(adata, perturbation="BRCA1")

# Or predict from one or more .h5ad file paths directly
predictions = model.predict(
    ["screen1.h5ad", "screen2.h5ad"],
    perturbation="BRCA1",
)

See Quick Start for full examples including batch prediction and output interpretation.


Model

| Model | Parameters | Description | Weights |
| --- | --- | --- | --- |
| X-Cell Mini | 55M | Fast inference; initialized from scGPT | 🤗 Xaira-Therapeutics/X-Cell |

Dataset: X-Atlas/Pisces

The largest CRISPRi Perturb-seq compendium to date, comprising 25.6 million perturbed single cells across 7 diverse biological contexts.

| Screen | Context | Perturbations |
| --- | --- | --- |
| HCT116 | Colorectal cancer | 18,924 |
| HEK293T | Kidney epithelial | 18,312 |
| HepG2 | Hepatocellular carcinoma | 9,735 |
| iPSC | Induced pluripotent stem cells | 10,095 |
| Jurkat Resting | T lymphoblastic leukemia | 10,872 |
| Jurkat Active | CD3/CD28-stimulated T cells | 10,878 |
| iPSC Multi-Diff | Multi-lineage differentiation | 12,175 |

Test perturbation sets available at 🤗 Xaira-Therapeutics/X-Atlas-Pisces.
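
When scoring predictions against a test perturbation set, a metric such as the Pearson Δ quoted in Key Results can be useful. The sketch below assumes one common definition, which may not match the preprint exactly: the Pearson correlation, across genes, between predicted and observed expression changes relative to the control mean. The function names and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pearson_delta(
    predicted: Sequence[float],
    observed: Sequence[float],
    control: Sequence[float],
) -> float:
    """Correlate predicted and observed per-gene changes relative to control."""
    pred_delta = [p - c for p, c in zip(predicted, control)]
    obs_delta = [o - c for o, c in zip(observed, control)]
    return pearson(pred_delta, obs_delta)


# Toy per-gene mean expression for four genes (illustrative values only).
control = [1.0, 2.0, 3.0, 4.0]
observed = [0.2, 2.1, 3.5, 4.0]
predicted = [0.5, 2.0, 3.4, 4.1]

score = pearson_delta(predicted, observed, control)
print(f"Pearson delta on toy data: {score:.3f}")
```

Subtracting the control mean first means the score reflects how well the direction and relative size of the changes are captured, rather than how similar the raw expression profiles are.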


Citation

If you use X-Cell or X-Atlas/Pisces in your research, please cite:

@article{xcell2026,
  title   = {X-Cell: Scaling Causal Perturbation Prediction Across Diverse
             Cellular Contexts via Diffusion Language Models},
  year    = {2026},
}

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.