02.16.2026
5 Mins

MetaOmics-10T

MetaOmics-10T: The Foundational Dataset to Unlock Causal Modeling of Microbial Ecosystems

We propose MetaOmics-10T, an openly shareable, foundational dataset to unlock AI-accelerated discovery in microbial ecosystems. The dataset directly enables three high-impact AI tasks: (1) forecasting ecosystem dynamics, (2) predicting counterfactual outcomes of interventions, and (3) inverse design of microbial therapies under safety constraints. MetaOmics-10T combines 10 trillion base pairs reclaimed from public archives using a Quality-Aware Tokenization (QA-Token) framework with 100,000+ interventional trajectories generated via model-guided experimental design. The result is a first-of-its-kind, probabilistic, intervention-ready corpus that addresses the principal bottleneck for causal modeling in microbiome science and provides an empirical testbed to assess the reach and limits of causal inference at scale.

We are deploying the first multi-omic foundation model across the World Wide FINGERS Network.
