What's new? Speculative decoding pairs a heavy main model with a light drafter that pre-generates tokens for the main model to check; Gemma 4 models now run on consumer GPUs and edge devices.
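
To make the drafter idea concrete, here is a minimal sketch of greedy speculative decoding, assuming both models can be treated as plain functions from a token list to the next token. The names `toy_target`, `toy_drafter`, `speculative_decode`, and the draft length `k` are hypothetical and chosen for illustration. They are not Gemma's API or any library's API, and real systems verify the whole draft in one batched forward pass rather than in a Python loop.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: the heavy model generates one token per call."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: NextToken,
    drafter: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: the drafter proposes, the target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The light drafter pre-generates k candidate tokens.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(tokens + draft))

        # 2. The heavy model checks each candidate against its own choice.
        accepted: List[int] = []
        for tok in draft:
            expected = target(tokens + accepted)
            if expected != tok:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


# Hypothetical stand-ins for real models (assumption, for illustration only).
def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50


def toy_drafter(seq: List[int]) -> int:
    # Agrees with the target except when the sequence length is a multiple of 3.
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode(toy_target, toy_drafter, prompt, max_new_tokens=10)
    slow = greedy_decode(toy_target, prompt, max_new_tokens=10)
    print(fast == slow)
```

In this greedy variant the output matches what the heavy model would have produced on its own, because every kept token is one the target itself would have chosen. Any speedup comes from how many draft tokens are accepted per verification round.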