What is Speculative Sampling
A quick explainer video on a technique called ‘speculative sampling’ or ‘assisted generation’, which speeds up language model sampling by using a smaller ‘draft’ model. On some types of data this can give a 2x speedup with no loss in accuracy! Let me know if you have suggestions for other topics you’d like covered.

http://jalammar.github.io/illustrated-gpt2/
https://huggingface.co/blog/assisted-generation
https://arxiv.org/abs/2302.01318 (Accelerating Large Language Model Decoding with Speculative Sampling)
https://proceedings.neurips.cc/paper/2018/file/c4127b9194fe8562c64dc0f5bf2c93bc-Paper.pdf (Blockwise Parallel Decoding for Deep Autoregressive Models)
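If you'd like to see the draft-and-verify idea in code, here is a minimal sketch in plain Python. It is a toy, not a real implementation: `target_probs` and `draft_probs` are hypothetical stand-ins for the large model and the small draft model, and the four-token vocabulary is made up. The accept/reject rule is my reading of the scheme in the speculative sampling paper linked above: keep a drafted token with probability min(1, p/q), and on the first rejection resample from the leftover probability mass max(0, p - q).

```python
import random

VOCAB = 4  # toy vocabulary size (assumption for illustration)


def target_probs(context):
    """Hypothetical 'large' model: next-token distribution given the context."""
    last = context[-1] if context else 0
    weights = [1.0 + ((last + i) % VOCAB) for i in range(VOCAB)]
    total = sum(weights)
    return [w / total for w in weights]


def draft_probs(context):
    """Hypothetical 'small' model: the target distribution blurred toward uniform."""
    return [0.5 * p + 0.5 / VOCAB for p in target_probs(context)]


def sample(weights, rng):
    """Draw one token index in proportion to the given non-negative weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def speculative_step(context, k, rng):
    """Draft k tokens, verify them against the target, return the tokens kept."""
    # 1. The draft model proposes k tokens one at a time (cheap).
    drafted, q_list = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_list.append(q)
        ctx.append(tok)

    # 2. The target model scores every drafted position plus one extra.
    #    A real system would do this in a single parallel forward pass.
    p_list = [target_probs(list(context) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept each drafted token with probability min(1, p/q).
    kept = []
    for i, tok in enumerate(drafted):
        p, q = p_list[i], q_list[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            kept.append(tok)
        else:
            # First rejection: resample from the leftover mass and stop.
            residual = [max(0.0, a - b) for a, b in zip(p, q)]
            kept.append(sample(residual, rng))
            return kept

    # Every draft was accepted, so take one bonus token from the target.
    kept.append(sample(p_list[k], rng))
    return kept
```

A generation loop would just call this repeatedly and extend the context with whatever comes back:

```python
rng = random.Random(0)
tokens = [0]
while len(tokens) < 20:
    tokens.extend(speculative_step(tokens, k=4, rng=rng))
```

Each call yields between 1 and k + 1 tokens for one (conceptual) pass of the large model, which is where the speedup comes from when the draft model guesses well. The rejection step is what keeps the output distributed as if it had been sampled from the large model directly.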