‘Diffusion Beats Autoregressive in Data-Constrained Settings’ - Paper read + a win for open review

Video
Published August 7, 2025

In this video we take a look at the paper ‘Diffusion Beats Autoregressive in Data-Constrained Settings’ (https://arxiv.org/abs/2507.15857) and a great discussion on Twitter that helped push the results and related hypotheses further, to the benefit of all.

After I shared it, some of the authors pointed to a parallel work that addressed some issues in the original but reached the same overall conclusion: in data-constrained settings, diffusion LLMs learn more than autoregressive models, but only after a lot of compute: https://jinjieni.notion.site/Diffusion-Language-Models-are-Super-Data-Learners-239d8f03a866800ab196e49928c019ac - very much worth checking out as a postscript to the video.

Tweets referenced:

https://x.com/giffmana/status/1947729993607348255
https://x.com/giffmana/status/1949001902970339471
https://x.com/mihirp98/status/1953196510725980173
https://x.com/giffmana/status/1953206125639123188