Removing the embedding from my embedding: a byte transformer with a 0-parameter input layer (25M, single RTX 4070)

Hi everyone, a follow-up — and a slightly absurd experiment that worked.

Since the last post, the substrate ablation toolkit has shipped inside the encoder (`hsl_embedding.ablation`: capacity-matched hsl / learned / random / permuted arms, as discussed in this thread). While running the full A/B comparison, I got curious about a stranger question:

**what happens if I remove the embedding from my…
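To make the ablation arms and the idea of a "0-parameter input layer" concrete, here is a minimal, dependency-free sketch. It is not the `hsl_embedding.ablation` API and does not reproduce the hsl scheme. Every function name here is hypothetical, and the `bits` arm (a byte's 8 bits as ±1, tiled to the model width) is a stand-in chosen only because it has no trainable parameters. It also assumes that "capacity-matched" means every arm emits vectors of the same width, so the transformer behind the input layer stays identical.

```python
import random

VOCAB = 256  # one row per possible byte value
ARMS = ("learned", "random", "permuted", "bits")  # hypothetical arm names


def gaussian_table(d_model: int, seed: int = 0) -> list[list[float]]:
    """A 256 x d_model table; trainable in the 'learned' arm, frozen in 'random'."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(VOCAB)]


def permuted_table(table: list[list[float]], seed: int = 0) -> list[list[float]]:
    """'permuted' arm: the same rows, reassigned to different byte values."""
    order = list(range(len(table)))
    random.Random(seed).shuffle(order)
    return [list(table[i]) for i in order]


def bit_features(byte_value: int, d_model: int) -> list[float]:
    """Hypothetical zero-parameter input: the byte's 8 bits as +/-1, tiled to d_model."""
    bits = [1.0 if (byte_value >> k) & 1 else -1.0 for k in range(8)]
    return [bits[i % 8] for i in range(d_model)]


def trainable_input_params(arm: str, d_model: int) -> int:
    """Only the learned arm adds trainable parameters at the input layer."""
    return VOCAB * d_model if arm == "learned" else 0


def embed(data: bytes, arm: str, d_model: int = 16) -> list[list[float]]:
    """Map raw bytes to d_model-wide vectors; every arm yields the same shape."""
    if arm not in ARMS:
        raise ValueError(f"unknown arm: {arm!r}")
    if arm == "bits":
        return [bit_features(b, d_model) for b in data]
    table = gaussian_table(d_model)
    if arm == "permuted":
        table = permuted_table(table)
    return [table[b] for b in data]


if __name__ == "__main__":
    for arm in ARMS:
        vecs = embed(b"hi", arm)
        print(arm, len(vecs), len(vecs[0]), trainable_input_params(arm, 16))
```

Every arm hands the model a sequence of 16-wide vectors. Only the learned arm adds parameters at the input (256 × 16 = 4096 here), which is the sense in which the other arms have a "0-parameter" input layer.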
