Abstract: Speculative decoding has proven to be a powerful method for speeding up autoregressive inference by allowing tokens to be generated in parallel using a draft-then-verify approach. However, ...