Published

NovoExpert: State-of-the-art ADMET prediction on TDC benchmarks

Five SOTA wins on Therapeutics Data Commons ADMET endpoints using CatBoost ensembles with MapLight and GIN fingerprints.

Ari Harrison, April 5, 2026

Abstract

We present NovoExpert, a family of ADMET prediction models achieving state-of-the-art performance on five Therapeutics Data Commons (TDC) benchmark endpoints. Our approach combines CatBoost gradient-boosted trees with MapLight (2573-bit) and GIN (300-dimensional) molecular fingerprints, supplemented by Chemprop v2 directed message-passing neural networks for specific endpoints.

Results

| Endpoint | Metric | Score | Improvement |
|---|---|---|---|
| CYP2D6 Veith | AUPRC | 0.778 | +0.028 |
| CYP3A4 Veith | AUPRC | 0.916 | +0.016 |
| CYP3A4 Substrate | AUPRC | 0.648 | +0.004 |
| Clearance Hepatocyte | Spearman | 0.602 | +0.024 |
| DILI | AUROC | 0.922 | +0.006 |
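
For readers less familiar with the three metrics in the table, the following is a minimal, dependency-free sketch of how each can be computed. It is illustrative only and is not the evaluation code behind the reported scores. Average precision is used here as one common estimate of AUPRC; whether it matches the benchmark's evaluator exactly is an assumption. All function names are hypothetical.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic computed on average ranks."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of AUPRC (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at each rank where a positive is found
    return total / hits
```

AUPRC and AUROC apply to the binary classification endpoints, while Spearman correlation applies to the continuous Clearance Hepatocyte endpoint, where only the ordering of predictions is scored.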

Method

For four of the five winning endpoints, the final model is a CatBoost model (a classifier, or a regressor for the continuous Clearance Hepatocyte endpoint) trained on concatenated MapLight and GIN fingerprints. For DILI, a Chemprop v2 D-MPNN achieved the best performance. All models were validated using the TDC benchmark scaffold split.
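
The two mechanical steps in this section, feature concatenation and scaffold-grouped splitting, can be sketched as follows. This is a simplified illustration under stated assumptions, not the NovoExpert pipeline or TDC's own split code. The fingerprints and scaffold identifiers are assumed to be precomputed elsewhere, and the function names and the 80% training fraction are hypothetical.

```python
from collections import defaultdict
from typing import Sequence

MAPLIGHT_DIM = 2573  # MapLight fingerprint length stated in the abstract
GIN_DIM = 300        # GIN embedding length stated in the abstract


def concat_features(maplight: Sequence[float], gin: Sequence[float]) -> list[float]:
    """Join the two per-molecule vectors into a single feature row."""
    if len(maplight) != MAPLIGHT_DIM or len(gin) != GIN_DIM:
        raise ValueError("unexpected fingerprint length")
    return list(maplight) + list(gin)


def scaffold_split(
    scaffolds: Sequence[str], frac_train: float = 0.8
) -> tuple[list[int], list[int]]:
    """Split molecule indices so that no scaffold appears in both sets.

    scaffolds[i] is a precomputed scaffold identifier for molecule i
    (for example a Murcko scaffold SMILES); computing it is out of scope here.
    The default frac_train is an assumption, not a value from the text.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest scaffold groups first, so rarer scaffolds tend to land in the test set.
    ordered = sorted(groups.values(), key=len, reverse=True)
    cutoff = frac_train * len(scaffolds)
    train: list[int] = []
    test: list[int] = []
    for members in ordered:
        # A whole scaffold group goes to one side; it is never divided.
        if len(train) + len(members) <= cutoff:
            train.extend(members)
        else:
            test.extend(members)
    return train, test
```

Keeping each scaffold group intact means the held-out molecules are structurally distinct from the training molecules, which is a harder and more realistic test than a random split. Each concatenated row has 2573 + 300 = 2873 features and could then be passed to a gradient-boosted tree learner.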