Bioinformatics
Diffusion Models
Drug Discovery

Diffusion Models for De Novo Molecular Design: Where We Are and Where We're Going

We review recent progress in applying score-based and DDPM-style diffusion models to molecular generation, and outline the open challenges in conditional, scaffold-aware, and target-specific generation.

Admin User
March 10, 2025 · 9 min read

Introduction

Diffusion models have reshaped generative AI across images, audio, and video. The same principles are now being applied to molecular design, where they enable AI systems to generate novel drug-like molecules with controllable structural and pharmacological properties.

Ligand-Based and Structure-Based Approaches

Early generative work on molecules relied primarily on ligand-based representations: molecules were encoded as SMILES strings or molecular graphs, and variational autoencoders or flow-based models were trained to generate similar structures. While effective, these approaches ignore the 3D geometry of the target protein's binding site.
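To make the two ligand-based representations concrete, here is a minimal sketch that writes the same molecule (ethanol) as a SMILES string and as a graph. The variable names and the `adjacency` helper are illustrative assumptions, not part of any particular toolkit; a real pipeline would typically parse SMILES with a cheminformatics library rather than write the graph by hand.

```python
# Ethanol in two ligand-based representations (illustrative only).

# 1) SMILES: a linear string; hydrogens are implicit.
smiles = "CCO"

# 2) Molecular graph: heavy atoms as nodes, bonds as (i, j, bond_order) edges.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]


def adjacency(num_atoms, bonds):
    """Build an undirected adjacency list from a bond list (hypothetical helper)."""
    adj = {i: [] for i in range(num_atoms)}
    for i, j, _order in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


graph = adjacency(len(atoms), bonds)
```

Neither view carries 3D coordinates, which is the limitation that structure-based methods set out to address.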

Structure-based methods address this directly by conditioning generation on the 3D pocket geometry of the target. Our MolDiff framework extends this idea by jointly modeling scaffold constraints and pocket geometry inside a single diffusion denoising process.

Scaffold-Conditioned Diffusion

Scaffold-based drug design preserves a core substructure (the scaffold) while generating diverse R-groups. MolDiff incorporates scaffold conditioning as a masked diffusion objective: scaffold atoms are fixed, while peripheral atoms are iteratively denoised over the course of the reverse diffusion process.
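The sampling side of this idea can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the MolDiff implementation: it works on 3D coordinates only, `denoise_step` is a hypothetical stand-in for a trained network, and the scaffold is kept at its clean positions by re-imposing it after every reverse step. The function names and the toy denoiser are invented for this example.

```python
import random


def sample_with_fixed_scaffold(scaffold_coords, is_scaffold, denoise_step,
                               num_steps=50, seed=0):
    """Reverse-diffusion loop in which scaffold atoms never move.

    scaffold_coords: one (x, y, z) tuple per atom; entries for peripheral
                     atoms are ignored and may be None.
    is_scaffold:     one bool per atom, True for scaffold atoms.
    denoise_step:    callable(coords, t) -> coords; hypothetical stand-in
                     for one step of a learned denoiser.
    """
    rng = random.Random(seed)
    # Peripheral atoms start from Gaussian noise; scaffold atoms start fixed.
    coords = [
        tuple(xyz) if fixed else tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        for xyz, fixed in zip(scaffold_coords, is_scaffold)
    ]
    for t in range(num_steps, 0, -1):
        proposal = denoise_step(coords, t)
        # Apply the mask: keep the scaffold, accept the update elsewhere.
        coords = [
            tuple(xyz) if fixed else tuple(new)
            for xyz, new, fixed in zip(scaffold_coords, proposal, is_scaffold)
        ]
    return coords


def toy_denoise_step(coords, t):
    """Placeholder denoiser: shrinks every atom toward the origin."""
    return [tuple(0.9 * c for c in atom) for atom in coords]


scaffold = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), None, None]
mask = [True, True, False, False]
sample = sample_with_fixed_scaffold(scaffold, mask, toy_denoise_step)
```

The toy denoiser tries to move every atom, but the mask discards its updates to the two scaffold atoms, so only the peripheral atoms change across steps. A practical sampler would also handle atom types and bonds, and might noise the scaffold to match each timestep before denoising; both are omitted here for brevity.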

Explainability Matters

Generative models are only useful in practice if medicinal chemists can understand and trust their outputs. Our ExplainMol toolkit provides substructure attribution maps that highlight which atoms and bonds contribute most to predicted binding affinity, directly linking model predictions to interpretable chemistry.
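As a rough illustration of what an atom-level attribution map computes, the sketch below uses occlusion: remove one atom at a time and record how much the predicted score drops. This is a generic technique chosen for the example; the article does not say which attribution method ExplainMol uses, and `toy_affinity` and `TOY_WEIGHTS` are hypothetical placeholders for a real binding-affinity predictor.

```python
def occlusion_attribution(num_atoms, score_fn):
    """Attribution of atom i = score(all atoms) - score(all atoms except i)."""
    all_atoms = frozenset(range(num_atoms))
    baseline = score_fn(all_atoms)
    return [baseline - score_fn(all_atoms - {i}) for i in range(num_atoms)]


# Toy stand-in for an affinity model: each element adds a fixed amount.
atoms = ["C", "C", "O"]
TOY_WEIGHTS = {"C": 0.5, "O": 2.0}


def toy_affinity(kept_atoms):
    """Hypothetical scorer: sum of per-element weights over the kept atoms."""
    return sum(TOY_WEIGHTS[atoms[i]] for i in kept_atoms)


attributions = occlusion_attribution(len(atoms), toy_affinity)
```

With this additive toy scorer, each atom's attribution equals its own weight, so the oxygen stands out. With a real, non-additive model the scores would reflect interactions between atoms, and bond-level attributions would need a similar loop over bonds.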

Open Challenges

Multi-target generation, synthesizability constraints, and ADMET-aware generation (accounting for absorption, distribution, metabolism, excretion, and toxicity) remain unsolved. We outline our roadmap for addressing each in upcoming work.
