Hello, world.
I am starting this notebook the way every program does: with a small, declarative “hello,” a low-stakes line that says I am here, I intend to learn, and I am going to leave a trail.
For the last decade I’ve shipped software for a living. Now I want to climb underneath it. The plan is old-fashioned: start at the math, work up through the silicon, and only then, slowly, through the systems that train and serve modern models.
A small map of where I’m going
There are roughly four directions I want to push in, and they will take turns being interesting:
- Math. Linear algebra first, then multivariable calculus, then probability. Not a survey; I want to be able to derive, not just recognize.
- Software. Python by day, C and Rust on the weekends, with detours into CUDA when I am brave.
- Hardware. What an SM (streaming multiprocessor) is, what an HBM stack costs you, and why bandwidth, not FLOPs, is the budget that matters.
- Models. Eventually. First as a reader, then as a re-implementer, and then, hopefully, as someone who can have an opinion.
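The bandwidth-over-FLOPs point in the Hardware bullet can be made concrete with a back-of-the-envelope roofline estimate. This is a sketch, not a benchmark: the two peak numbers are assumptions in the rough range of an H100-class GPU (check a datasheet before trusting them), and the traffic model assumes every operand crosses HBM exactly once, with no caching.

```python
# Illustrative peak numbers, roughly H100-class. These are ASSUMPTIONS,
# not measured values.
PEAK_FLOPS = 989e12        # dense BF16 FLOP/s (assumed)
PEAK_BANDWIDTH = 3.35e12   # HBM bytes/s (assumed)


def attainable_flops(intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(PEAK_FLOPS, intensity * PEAK_BANDWIDTH)


def matvec_intensity(n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for y = A @ x with an n x n matrix A."""
    flops = 2 * n * n                                # one multiply + one add per entry of A
    bytes_moved = bytes_per_elem * (n * n + 2 * n)  # read A and x, write y
    return flops / bytes_moved


def matmul_intensity(n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for C = A @ B with n x n matrices, each moved once."""
    flops = 2 * n ** 3
    bytes_moved = bytes_per_elem * 3 * n * n         # read A and B, write C
    return flops / bytes_moved


# The "ridge point": intensity at which memory stops being the bottleneck.
ridge = PEAK_FLOPS / PEAK_BANDWIDTH
for name, ai in [("matvec", matvec_intensity(8192)), ("matmul", matmul_intensity(8192))]:
    frac = attainable_flops(ai) / PEAK_FLOPS
    print(f"{name}: {ai:.1f} FLOP/byte, {frac:.1%} of peak (ridge ≈ {ridge:.0f})")
```

Under these assumptions a matrix-vector product does about one FLOP per byte it moves, so it can use only a fraction of a percent of peak compute, while a large square matmul sits far past the ridge point. Most of the interesting engineering lives in the gap between those two numbers.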
A taste of what posts will look like
To shake the site out, here is one math idea, one diagram, and one snippet of code: the three things I expect to lean on most.
Math
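The gradient of a scalar field $f : \mathbb{R}^n \to \mathbb{R}$ is the vector of its partial derivatives. It points in the direction of steepest ascent, and its negation is the workhorse of every optimizer I will write this year:
$$\nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}(\mathbf{x}), \ldots, \frac{\partial f}{\partial x_n}(\mathbf{x}) \right)^{\top}$$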
Gradient descent is then nothing more than the rule $\mathbf{x}_{t+1} = \mathbf{x}_t - \eta \nabla f(\mathbf{x}_t)$, and most of training is a discussion about how to be clever with $\eta$.
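The definition is easy to check numerically: approximate each partial derivative with a central difference, $\partial f / \partial x_i \approx \big(f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x} - h\mathbf{e}_i)\big) / 2h$, and compare against the gradient derived by hand. A minimal sketch on the quadratic $f(x, y) = (x - 3)^2 + (y + 1)^2$; the helper name `numerical_gradient` and the step size `h = 1e-6` are my own illustrative choices.

```python
import numpy as np


def numerical_gradient(f, x, h=1e-6):
    """Approximate ∇f(x) for a 1-D vector x using central differences."""
    x = np.array(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h  # h * e_i, a nudge along coordinate i only
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


f = lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2
x = np.array([0.0, 0.0])
g = numerical_gradient(f, x)
print(g)  # close to the analytic gradient [2(x - 3), 2(y + 1)] = [-6, 2]

# One step of x_{t+1} = x_t - eta * ∇f(x_t) with eta = 0.1.
eta = 0.1
x_next = x - eta * g
print(x_next)  # close to [0.6, -0.2], i.e. moving toward the minimum at (3, -1)
```

Central differences are a debugging tool rather than a training tool: they cost two function evaluations per coordinate, which is hopeless for millions of parameters but perfect for checking a hand-written gradient.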
Diagrams
Here is the mental model I keep returning to: the loop a learning system runs forever.
Everything else (Adam, batch norm, attention, scaling laws) is commentary on this picture.
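The same loop can also be written down as code. This is a sketch under an assumption: that the loop is the usual predict → measure loss → compute gradient → update cycle. It fits a hypothetical two-parameter linear model to synthetic data with mean squared error; the data, learning rate, and step count are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (hypothetical): y = 2x + 1 plus a little noise.
X = rng.uniform(-1, 1, size=100)
y = 2 * X + 1 + 0.05 * rng.normal(size=100)

w, b = 0.0, 0.0
eta = 0.1
for step in range(500):
    pred = w * X + b               # 1. predict
    err = pred - y
    loss = np.mean(err ** 2)       # 2. measure (mean squared error)
    grad_w = 2 * np.mean(err * X)  # 3. gradient of the loss w.r.t. w
    grad_b = 2 * np.mean(err)      #    ... and w.r.t. b
    w -= eta * grad_w              # 4. update, then go around again
    b -= eta * grad_b

print(f"w ≈ {w:.2f}, b ≈ {b:.2f}, loss ≈ {loss:.4f}")
```

Swapping step 4 for Adam, or step 1 for a transformer, changes the details but not the shape of the loop.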
For diagrams that need annotated cells with pointers, like “what’s on the stack right now,” I use a tiny declarative DSL that renders to a real hand-drawn SVG, with rough.js under the hood (the same engine Excalidraw uses).
Just write the schema in JSON inside the figure; the page renders the picture on load. For one-off illustrations I’ll also reach for Excalidraw: draw it, choose Export → SVG, and paste the SVG into a <figure class="diagram">. Together, the two cover most of what Crafting Interpreters and Game Programming Patterns do by hand.
Code
A tiny gradient descent in Python that I will reuse and abuse for months:
import numpy as np
def gradient_descent(grad, x0, lr=1e-2, steps=1_000):
    """Take `steps` along -∇f starting from x0."""
    # np.array (not np.asarray) copies x0, so the in-place update below
    # never mutates an array the caller passed in.
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad(x)
    return x
# Minimize f(x, y) = (x - 3)^2 + (y + 1)^2
grad_f = lambda v: np.array([2 * (v[0] - 3), 2 * (v[1] + 1)])
print(gradient_descent(grad_f, [0.0, 0.0]))  # → ~[3, -1]
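To see why so much of training is about $\eta$, here is a sketch that reruns the same descent on the same quadratic with a few learning rates. The function is repeated so the snippet runs on its own; the 50-step budget and the particular rates are arbitrary choices for illustration.

```python
import numpy as np


def gradient_descent(grad, x0, lr=1e-2, steps=1_000):
    """Take `steps` along -∇f starting from x0 (a copy of x0 is updated)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad(x)
    return x


grad_f = lambda v: np.array([2 * (v[0] - 3), 2 * (v[1] + 1)])

# On this quadratic each step multiplies the distance to (3, -1) by (1 - 2 * lr),
# so the iteration converges only when 0 < lr < 1.
for lr in [0.01, 0.5, 0.9, 1.1]:
    x = gradient_descent(grad_f, [0.0, 0.0], lr=lr, steps=50)
    print(f"lr={lr}: distance to optimum = {np.linalg.norm(x - np.array([3.0, -1.0])):.3g}")
```

A rate that is too small crawls, `lr=0.5` lands exactly on the minimum in one step (it happens to be the inverse of the curvature here), `lr=0.9` oscillates but still converges, and past `lr=1` each step overshoots by more than it corrects, so the iterate blows up.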
What’s next
Three posts queued in my head:
- Why a matrix is a function, and what changes when you start believing it.
- A from-scratch autograd in 90 lines, with no magic.
- What an H100 actually does in a microsecond: a budget post.
If any of that sounds like fun, subscribe to the feed or just check back. The site is small and the writing will be slow, but consistent.
Onward.