Conventions

Figures throughout this book use a consistent visual language for tensor shapes, dimensions, and components. This page collects those conventions in one place so you can decode any diagram at a glance.

Dimension names and abbreviations

Tensors in figures are labeled with their shapes using the symbols below. Table 1 gives each symbol's full name, any alternative notation used elsewhere (in parentheses), and example values.

Table 1: Standard dimension symbols used in figures and text.

Symbol	Full name	Example values (Llama 3 70B)
B	Batch size	1–64
D	Model dimension (\(d_\text{model}\))	8192
\(\textcolor{#2563EB}{d_K}\)	Head dimension, usually \(\textcolor{#2563EB}{d_K} = \textcolor{#1E40AF}{D} / \textcolor{#A855F7}{H}\)	128
F	FFN intermediate dimension (\(d_\text{ff}\))	28672 (SwiGLU)
H	Number of attention heads (\(n_\text{heads}\))	64
\(\textcolor{#C084FC}{H_\text{KV}}\)	Number of KV heads (\(n_\text{kv}\))	8 (with GQA)
L	Number of layers	80
S	Sequence length	Prefill: prompt length (e.g., 2048); Decode: 1
V	Vocabulary size	128256
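
As a quick sanity check on Table 1, the relationships among these symbols can be worked out in a few lines of Python. This is only a sketch: the variable names are illustrative, and the projection shapes assume a common layout in which the query projection maps D to H·d_K and each key or value projection maps D to H_KV·d_K. That layout is an assumption, not something stated on this page.

```python
# Example values from Table 1 (Llama 3 70B).
D = 8192     # model dimension
H = 64       # number of attention heads
H_KV = 8     # number of KV heads (with GQA)

# Head dimension: d_K = D / H, the usual case noted in Table 1.
d_K = D // H

# Assumed projection weight shapes as (rows, columns); illustrative only.
w_q_shape = (D, H * d_K)       # query projection
w_kv_shape = (D, H_KV * d_K)   # key or value projection under GQA

# Activation shapes (S, D) for the two phases listed in Table 1.
prefill_shape = (2048, D)  # prefill: S is the prompt length
decode_shape = (1, D)      # decode: S is 1

print(d_K, w_q_shape, w_kv_shape)
# prints: 128 (8192, 8192) (8192, 1024)
```

With 8 KV heads instead of 64, the assumed key and value projections are one eighth the width of the query projection.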

Tensor edge colors

Tensor rectangles use colored edges to indicate which dimension runs along that axis. Top/bottom edges represent one dimension; left/right edges represent another. This makes it possible to see at a glance how dimensions flow through a computation.

Edge colors encode tensor dimensions.
Edge	Dimension
	Model dimension (D)
	Head / Q-K dimension (\(d_K\))
	FFN intermediate dimension (F)
	Number of attention heads (H)
	Number of KV heads (\(H_\text{KV}\))
	Number of layers (L)
	Sequence length (S)
	Vocabulary (V)
	Default / other

Tensor fill colors

The fill color of a tensor rectangle indicates its role in the computation.

Fill colors encode tensor roles.
Fill	Meaning
	Input / activations (neutral)
	Weight matrices
	Queries (Q)
	Keys (K)
	Values (V)
	Attention scores / weights
	KV cache entries

Component colors

Architectural block diagrams use a separate set of fill colors to distinguish transformer components.

Component colors in architectural block diagrams.
Color	Component
	Embedding layers
	Layer normalization / Add & Norm
	Self-attention blocks
	Feed-forward blocks
	Linear projections
	Softmax

Reading a tensor data flow diagram

To illustrate how these conventions combine, here is how a typical tensor rectangle should be read:

The top and bottom edge colors identify the dimension that runs horizontally, that is, the number of columns.
The left and right edge colors identify the dimension that runs vertically, that is, the number of rows.
The shape label (e.g., (S, D)) confirms the dimensions explicitly.
The fill color helps disambiguate what kind of tensor it is (query, key, weight, etc.).

When two tensors share an edge color on matching sides, those dimensions are compatible for the operation that connects them: a matrix multiply, a concatenation, or an elementwise operation. For matrix multiplication, the number of columns of the left-hand tensor must match the number of rows of the right-hand tensor. For elementwise operations, all dimensions must usually match; the exception is a dimension of size 1, which is broadcast to match the other tensor. A sample diagram is explained in the Forward Pass Data Flow section.
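
The compatibility rules above can be sketched as two small shape-checking helpers in plain Python. The function names `matmul_shape` and `elementwise_shape` are hypothetical, introduced only for illustration, and the elementwise helper assumes both shapes have the same number of dimensions.

```python
def matmul_shape(left, right):
    """Return the result shape of a 2-D matrix multiply, or raise if incompatible."""
    rows_l, cols_l = left
    rows_r, cols_r = right
    # Columns of the left-hand tensor must match rows of the right-hand tensor.
    if cols_l != rows_r:
        raise ValueError(f"cannot multiply {left} by {right}")
    return (rows_l, cols_r)


def elementwise_shape(a, b):
    """Return the result shape of an elementwise op, broadcasting size-1 dimensions."""
    if len(a) != len(b):
        raise ValueError(f"rank mismatch: {a} vs {b}")
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)   # sizes match, or b is broadcast along this axis
        elif x == 1:
            out.append(y)   # a is broadcast along this axis
        else:
            raise ValueError(f"incompatible shapes: {a} vs {b}")
    return tuple(out)


# Example values from Table 1: S = 2048 (prefill), D = 8192, F = 28672.
S, D, F = 2048, 8192, 28672
print(matmul_shape((S, D), (D, F)))        # (2048, 28672)
print(elementwise_shape((S, D), (1, D)))   # (2048, 8192)
```

In diagram terms, the shared D edge between the (S, D) and (D, F) rectangles is what the first check verifies; a mismatch such as (S, D) times (F, D) raises an error.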