Independent AI Researcher (Lambda). I study deep neural architectures and their underlying mathematics.
A brief introduction to the seminal Transformer paper.
A groundbreaking training approach from DeepSeek.
A brand-new optimizer from Kimi (Moonshot AI).
How reasoning emerges in large language models.
LongCat ZigZag Attention.
Exclusive Self Attention.
A new learning paradigm.
Architecture behind Nested Learning.
Deep Delta Learning.
Recursive Language Models.
Understanding Manifold Constrained Hyper Connections.
Deriving Manifold Constrained Hyper Connections.
A memory technique introduced by DeepSeek.
Understanding and Implementation.