Abstract
Drug discovery remains a slow and costly process, in part because efficacy, toxicity, and physicochemical liabilities must be screened across a vast chemical space. Stand-alone, single-task predictors can help, but they lead to fragmented workflows and make it hard to reuse learned representations, data processing, and infrastructure across endpoints (i.e., prediction tasks). Here we present CheMLT-F, a compact multitask transformer that fuses encoders for molecular and protein sequences to learn a unified representation spanning 680+ endpoints, including diverse toxicities, physicochemical properties, and drug–target interactions. Across 13 public benchmarks, CheMLT-F matches state-of-the-art toxicity classifiers and sets new performance marks for physicochemical property prediction, while remaining competitive for drug–target affinity (KIBA and Davis). Moreover, CheMLT-F demonstrates competitive performance on an external protein-family benchmark spanning seven target superfamilies, indicating broad generalizability in bioactivity prediction. Multitask parameter sharing keeps the model lightweight and inference-efficient, and its modular heads make extensions to new endpoints straightforward. By replacing many individual models with a single, extensible backbone, CheMLT-F streamlines in silico screening and lowers the barrier to broad, data-driven decision-making in early drug discovery.
Scientific contribution
We introduce a unified transformer architecture that jointly models molecular and protein sequences across hundreds of pharmacologically relevant endpoints spanning toxicity, physicochemical properties, and drug–target interactions. A tailored training strategy that combines partial encoder freezing, global–local loss balancing, and weighted task sampling reduces trainable parameters and deployment complexity while preserving strong cross-domain generalization.
Comprehensive evaluation across 13 public datasets, including scaffold-aware and random data splits, demonstrates competitive accuracy with substantially lower operational overhead than maintaining numerous single-task models, establishing a scalable foundation for extensible and holistic predictive modeling in computational drug discovery.
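To make the idea of weighted task sampling concrete, the following is a minimal sketch of one common variant, in which each task is drawn with probability proportional to its dataset size raised to an exponent. It is not taken from the CheMLT-F implementation: the exponent `alpha`, the function names, and the task sizes are all hypothetical and chosen only for illustration.

```python
import random


def task_sampling_weights(task_sizes, alpha=0.5):
    """Return sampling probabilities proportional to size ** alpha.

    alpha = 1.0 samples tasks in proportion to their size;
    alpha = 0.0 samples all tasks uniformly.
    (Assumed scheme; the paper's actual weighting may differ.)
    """
    scaled = {name: size ** alpha for name, size in task_sizes.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


def sample_task(weights, rng):
    """Draw one task name according to the given probabilities."""
    names = list(weights)
    probs = [weights[name] for name in names]
    return rng.choices(names, weights=probs, k=1)[0]


# Hypothetical task sizes, for illustration only.
task_sizes = {"toxicity": 10_000, "solubility": 1_000, "affinity": 100_000}
weights = task_sampling_weights(task_sizes, alpha=0.5)

# Seeded generator so the schedule is reproducible.
rng = random.Random(0)
schedule = [sample_task(weights, rng) for _ in range(5)]
```

With an exponent below 1, large tasks are still sampled more often than small ones, but less dominantly than under size-proportional sampling, which is one way to keep small endpoints from being starved during multitask training.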