Conventions

Figures throughout this book use a consistent visual language for tensor shapes, dimensions, and components. This page collects those conventions in one place so you can decode any diagram at a glance.

Dimension names and abbreviations

Tensors in figures are labeled with their shapes using the single-letter abbreviations below. The full names and abbreviations match those used in the text.

Table 1: Standard dimension symbols used in figures and text.
Symbol                                Full name                                       Example values (Llama 3 70B)
B                                     Batch size                                      1–64
D                                     Model dimension (\(d_\text{model}\))            8192
\(\textcolor{#2563EB}{d_K}\)          Head dimension (\(D / H\))                      128
F                                     FFN intermediate dimension (\(d_\text{ff}\))    28672 (SwiGLU)
H                                     Number of attention heads (\(n_\text{heads}\))  64
\(\textcolor{#C084FC}{H_\text{KV}}\)  Number of KV heads                              8 (with GQA)
L                                     Number of layers                                80
S                                     Sequence length                                 Prefill: prompt length (e.g., 2048); Decode: 1
V                                     Vocabulary size                                 128256
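
The relationships between the symbols in Table 1 can be checked with a few lines of Python. The sketch below is illustrative only: the `Dims` class and its property names are hypothetical and do not come from this book's code, and the GQA property assumes the usual arrangement in which query heads are split evenly across KV heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dims:
    """Hypothetical container for the symbols in Table 1."""
    B: int     # batch size
    S: int     # sequence length
    D: int     # model dimension
    H: int     # number of attention heads
    H_KV: int  # number of KV heads
    F: int     # FFN intermediate dimension
    L: int     # number of layers

    @property
    def d_K(self) -> int:
        # Table 1 defines the head dimension as D / H.
        assert self.D % self.H == 0, "D must be divisible by H"
        return self.D // self.H

    @property
    def queries_per_kv_head(self) -> int:
        # Assumption: with GQA, H / H_KV query heads share each KV head.
        assert self.H % self.H_KV == 0, "H must be divisible by H_KV"
        return self.H // self.H_KV


# Example values from Table 1, with B = 1 and a 2048-token prompt.
prefill = Dims(B=1, S=2048, D=8192, H=64, H_KV=8, F=28672, L=80)
# During decode, one new token is processed per step, so S = 1.
decode = Dims(B=1, S=1, D=8192, H=64, H_KV=8, F=28672, L=80)

print(prefill.d_K)                        # 128
print(prefill.queries_per_kv_head)        # 8
print((prefill.B, prefill.S, prefill.D))  # (1, 2048, 8192)
print((decode.B, decode.S, decode.D))     # (1, 1, 8192)
```

The printed head dimension matches the 128 listed in the table, and the only difference between the prefill and decode activations is the sequence axis.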

Tensor edge colors

Tensor rectangles use colored edges to indicate which dimension runs along that axis. Top/bottom edges represent one dimension; left/right edges represent another. This makes it possible to see at a glance how dimensions flow through a computation.

Edge colors encode tensor dimensions.
Edge Dimension
Model dimension (D)
Head / Q-K dimension (\(d_K\))
FFN intermediate dimension (F)
Number of attention heads (H)
Number of KV heads (\(H_\text{KV}\))
Number of layers (L)
Sequence length (S)
Vocabulary (V)
Default / other

Tensor fill colors

The fill color of a tensor rectangle indicates its role in the computation.

Fill colors encode tensor roles.
Fill Meaning
Input / activations (neutral)
Weight matrices
Queries (Q)
Keys (K)
Values (V)
Attention scores / weights
KV cache entries

Component colors

Architectural block diagrams use a separate set of fill colors to distinguish transformer components.

Component colors in architectural block diagrams.
Color Component
Embedding layers
Layer normalization / Add & Norm
Self-attention blocks
Feed-forward blocks
Linear projections
Softmax

Reading a data flow diagram

To illustrate how these conventions combine, here is how a typical tensor rectangle should be read:

  • The top and bottom edge colors identify the dimension that runs horizontally (the number of columns).
  • The left and right edge colors identify the dimension that runs vertically (the number of rows).
  • The shape label (e.g., (S, D)) confirms the dimensions explicitly.
  • The fill color helps disambiguate what kind of tensor it is (query, key, weight, etc.).

When two tensors share an edge color on a matching side, those dimensions are compatible for the operation that connects them: a matrix multiply, a concatenation, or an elementwise operation. For matrix multiplication, the number of columns of the left-hand tensor must match the number of rows of the right-hand tensor. For elementwise operations, all dimensions must usually match; the exception is a dimension of size 1, which is broadcast to match the other tensor.
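
The two compatibility rules above can be expressed as small shape checks. This is a minimal sketch in plain Python, not code from this book: `matmul_shape` and `broadcast_shape` are hypothetical helper names, shapes are plain tuples, and the broadcasting helper only handles shapes of equal rank.

```python
def matmul_shape(left, right):
    """Result shape of left @ right for 2-D shapes given as (rows, cols)."""
    (m, k_left), (k_right, n) = left, right
    # Columns of the left tensor must equal rows of the right tensor.
    if k_left != k_right:
        raise ValueError(f"inner dimensions differ: {k_left} vs {k_right}")
    return (m, n)


def broadcast_shape(a, b):
    """Result shape of an elementwise op, allowing size-1 broadcasting."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles shapes of equal rank")
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)   # equal sizes, or b is stretched along this axis
        elif x == 1:
            out.append(y)   # a is stretched along this axis
        else:
            raise ValueError(f"incompatible dimensions: {x} vs {y}")
    return tuple(out)


S, D, F = 2048, 8192, 28672  # example values from Table 1

# Activations (S, D) times a weight (D, F): the D edges meet, leaving (S, F).
print(matmul_shape((S, D), (D, F)))     # (2048, 28672)

# Elementwise op with a (1, D) row: the size-1 axis is broadcast across S.
print(broadcast_shape((S, D), (1, D)))  # (2048, 8192)
```

In a diagram, the first call corresponds to the D-colored top edge of the activations meeting the D-colored side edge of the weight; a mismatch such as `matmul_shape((S, D), (F, D))` raises a `ValueError`.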