Arjun Kocher
Writing
Implementations
Transformer
Multi-Head Attention
RoPE
ALiBi
Transformer-XL
RETRO
Compressive Transformer
miniGPT
SwiGLU
kNN-LM
GLU
Feedback Transformer
Notes
Twitter
Ask me anything
Independent AI Researcher
I study deep neural architectures and their underlying mathematics.
Start here:
RL Algorithm
Notes
handwritten worksheets behind the essays