New Ways to Scale Inference Time Compute of LLMs: Parallel Scaling, Diffusion and More
Looking at the paper ‘Parallel Scaling Law for Language Models’ (https://arxiv.org/abs/2505.10475), with detours into ‘Large Language Models to Diffusion Finetuning’, to examine approaches that spend more compute per token at inference time without scaling up the total parameter count. These research directions complement existing inference-time scaling work such as reasoning models (o1/R1).