Skip to content

Dataset: X-Atlas/Pisces

X-Atlas/Pisces is the largest CRISPRi Perturb-seq compendium to date, comprising 25.6 million perturbed single-cell transcriptomes across 7 biologically diverse contexts.

Screens

Screen Cell Type Perturbations Perturbed Cells Median KD %
HCT116 Colorectal cancer 18,924 3.4M 70%
HEK293T Kidney epithelial 18,312 4.5M 48%
HepG2 Hepatocellular carcinoma 9,735 2.6M 85%
iPSC Induced pluripotent stem cells 10,095 4.2M 82%
Jurkat Resting T lymphoblastic leukemia 10,872 2.8M 79%
Jurkat Active CD3/CD28-stimulated T cells 10,878 2.8M 71%
iPSC Multi-Diff Multi-lineage differentiation 12,175 5.1M 96%

Data Access

Test perturbation sets (held-out genes from HepG2 and iPSC, plus full Jurkat Resting/Active screens) are available on HuggingFace:

🤗 Xaira-Therapeutics/X-Atlas-Pisces

Format

Data is provided as .h5ad files (AnnData format) with:

  • .X — log-normalized expression (log1p CP10k)
  • .obs["perturbation"] — gene target of CRISPRi knockdown
  • .obs["is_control"] — boolean flag for non-targeting controls
  • .var_names — ENSEMBL gene IDs

Context-Dependent Biology

A key finding from X-Atlas/Pisces is that perturbation effects are strongly context-dependent. Hierarchical clustering of perturbation effect profiles (F1 scores from a per-perturbation binary classifier) reveals three classes:

  • Context-independent: core essential machinery (e.g., mitochondrial ribosome subunits, oxidative phosphorylation) — enriched in shared metabolic functions
  • Context-specific: lineage-defining regulators — enriched in cell-type-specific pathways (e.g., hypoxia response in HepG2, neural crest differentiation in iPSC)
  • Conserved proximal / variable distal: perturbations where the direct consequence is consistent but downstream cascades diverge by context

This context-dependence motivates X-Cell's cross-attention architecture, which conditions predictions on multi-modal biological priors rather than learning context-invariant representations.