Show HN: WaveletLM – wavelet-based, attention-free model with O(n log n) scaling https://ift.tt/GAW4jpX
WaveletLM is a wavelet-based, attention-free architecture. It replaces self-attention with a five-stage pipeline: a learned lifting wavelet decomposition, a Fast Walsh-Hadamard Transform (FWHT), per-scale gated spectral mixing with SwiGLU activation, an inverse FWHT, and wavelet reconstruction. Combined with expanded MLPs and sparse product-key memory, this yields a model with O(n log n) scaling in sequence length. At 23.8 PPL on WikiText-103, WaveletLM beats both GPT-2 Medium, which was trained on 80× more data, and Transformer-XL Standard, which uses recurrence to extend its effective context. The model is undertrained and underregularized due to budget constraints, so there is much room for improvement. I invite anyone who is curious to examine the model, test it out, and extend it further. All code and weights are fully open source, and a PG-19 run will be completed in 2-3 days. Generations can be done in 4-...
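For readers unfamiliar with the two transforms named in the post, here is a minimal pure-Python sketch of a single lifting step and an FWHT, each with its inverse. It is not the WaveletLM code: the function names (`haar_lift`, `haar_unlift`, `fwht`, `ifwht`) are hypothetical, and the fixed Haar predict/update pair stands in for the learned lifting filters the post describes. It only illustrates why both operations are cheap and exactly invertible.

```python
from typing import List, Tuple


def haar_lift(x: List[float]) -> Tuple[List[float], List[float]]:
    """One lifting step with fixed Haar filters (assumption: the real model learns these)."""
    even, odd = x[0::2], x[1::2]
    # Predict: estimate each odd sample from its even neighbour, keep the residual.
    detail = [o - e for e, o in zip(even, odd)]
    # Update: correct the even samples so they carry the pairwise mean.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def haar_unlift(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_lift by undoing the update, then the predict step."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out: List[float] = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


def fwht(x: List[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:  # log2(n) butterfly stages, each touching all n entries
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def ifwht(y: List[float]) -> List[float]:
    """Inverse FWHT: the transform is its own inverse up to a factor of 1/n."""
    n = len(y)
    return [v / n for v in fwht(y)]


if __name__ == "__main__":
    approx, detail = haar_lift([1.0, 3.0, 5.0, 7.0])
    print(approx, detail)
    print(haar_unlift(approx, detail))
    print(ifwht(fwht([1.0, 0.0, 1.0, 0.0])))
```

The FWHT loop runs log2(n) stages of n additions and subtractions with no multiplications, which is where an n log n cost comes from; a single lifting step is linear in n.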