4×4 Matrices & the MVP Pipeline

The longest day of the project — the heart of everything that follows.

📐 Not familiar with vectors or matrices? Read the Math Primer first — it covers everything you need for this section.

The goal

Up to this point, triangles exist directly in screen coordinates. To render real 3D objects, vertices need to be transformed from 3D world space all the way to screen pixels. That journey requires three transformations chained together: Model, View, and Projection — the MVP pipeline.

Why 4×4 matrices

3D transformations like rotation and scale are linear transformations — functions T where:

\[ T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \qquad T(\alpha\mathbf{u}) = \alpha T(\mathbf{u}) \]

Linear transformations can be represented as matrix multiplications. But translation is not linear: if translating by t means T(v) = v + t, then T(u + v) = u + v + t, while T(u) + T(v) = (u + t) + (v + t) = u + v + 2t. The two differ whenever t ≠ 0, so the first property fails. Even more tellingly, a linear transformation must map the origin to the origin, and translation obviously doesn't.

So 3×3 matrices can't represent translation. The solution is homogeneous coordinates: add a 4th component w to every vector. For points, w = 1. For directions, w = 0.

\[ \mathbf{v}_{3D} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \quad \rightarrow \quad \mathbf{v}_{4D} = \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \]

With this extra dimension, translation can be encoded in the 4th column of a 4×4 matrix — and now every transformation, including translation, is a matrix multiplication. That's why 4×4.

// Matrix-matrix multiplication
Mat4 Mat4::operator*(const Mat4& other) const {
    Mat4 result{};
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++)
                result.m[i*4+j] += m[i*4+k] * other.m[k*4+j];
    return result;
}

// Matrix-vector multiplication
Vec4 Mat4::operator*(const Vec4& v) const {
    return {
        m[0]*v.x + m[1]*v.y + m[2]*v.z + m[3]*v.w,
        m[4]*v.x + m[5]*v.y + m[6]*v.z + m[7]*v.w,
        m[8]*v.x + m[9]*v.y + m[10]*v.z + m[11]*v.w,
        m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w
    };
}
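The two operators above index `m` as a flat array of 16 floats, but the types themselves are not shown in this section. A minimal sketch of what they might look like, assuming row-major storage (the `at` helper and `identity_matrix` are hypothetical names added for illustration):

```cpp
// Hypothetical minimal types: the section's real Vec4 and Mat4 are not shown,
// so this layout is an assumption inferred from how m[] is indexed above.
struct Vec4 { float x, y, z, w; };

struct Mat4 {
    float m[16];  // row-major: element (row, col) is stored at m[row * 4 + col]

    float at(int row, int col) const { return m[row * 4 + col]; }

    // Matrix-vector product: dot each row with the vector.
    Vec4 operator*(const Vec4& v) const {
        return {
            m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w,
            m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w,
            m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w,
            m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w
        };
    }
};

// Identity: leaves every vector unchanged.
inline Mat4 identity_matrix() {
    return {{ 1,0,0,0,  0,1,0,0,  0,0,1,0,  0,0,0,1 }};
}
```

With this layout, the translation column discussed earlier lives at `m[3]`, `m[7]`, and `m[11]`.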

Model matrix — transforming the object

The model matrix moves an object from its own local coordinate system into the world space — the shared coordinate system where all objects in the scene coexist. It's built by combining three individual transformations.

Scale

Scale multiplies each coordinate by a factor. For uniform scale k, the basis vectors just get stretched:

\[ S = \begin{pmatrix} k & 0 & 0 & 0 \\ 0 & k & 0 & 0 \\ 0 & 0 & k & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

Multiplying a point (x, y, z, 1) by S gives (kx, ky, kz, 1). w stays 1.

inline Mat4 scale_matrix(float k) {
    return { k,0,0,0,  0,k,0,0,  0,0,k,0,  0,0,0,1 };
}

Translation

Translation adds an offset to each coordinate. With homogeneous coordinates, it fits cleanly in the 4th column:

\[ T = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

Multiplying (x, y, z, 1) by T gives (x+tx, y+ty, z+tz, 1) — exactly the translation we want. This is why w = 1 for points: it activates the translation column. For direction vectors (w = 0), translation is correctly ignored.

inline Mat4 move_matrix(float x, float y, float z) {
    return { 1,0,0,x,  0,1,0,y,  0,0,1,z,  0,0,0,1 };
}
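To see the difference between w = 1 and w = 0 concretely, here is a small self-contained sketch. The `Vec4` and `Mat4` definitions are assumed stand-ins for the project's own types:

```cpp
// Assumed stand-ins for the project's types (row-major Mat4).
struct Vec4 { float x, y, z, w; };

struct Mat4 {
    float m[16];
    Vec4 operator*(const Vec4& v) const {
        return {
            m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w,
            m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w,
            m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w,
            m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w
        };
    }
};

inline Mat4 move_matrix(float x, float y, float z) {
    return {{ 1,0,0,x,  0,1,0,y,  0,0,1,z,  0,0,0,1 }};
}

// A point (w = 1) picks up the offset stored in the 4th column.
inline Vec4 translate_point(const Mat4& T, float x, float y, float z) {
    return T * Vec4{x, y, z, 1.0f};
}

// A direction (w = 0) multiplies the 4th column by zero, so it is unchanged.
inline Vec4 translate_direction(const Mat4& T, float x, float y, float z) {
    return T * Vec4{x, y, z, 0.0f};
}
```

With `move_matrix(10, 20, 30)`, the point (1, 2, 3) becomes (11, 22, 33), while the direction (1, 2, 3) stays (1, 2, 3).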

Rotation

Rotation is where the geometry becomes interesting. To build a rotation matrix, start with the 2D case: rotating the basis vector e₁ = (1, 0) by angle θ gives (cos θ, sin θ). That's just the definition of cosine and sine on the unit circle. The rotated e₂ = (0, 1) gives (-sin θ, cos θ).

Those rotated basis vectors become the columns of the rotation matrix. The interactive demo below shows exactly this:

[Interactive demo: drag θ to explore]

A matrix is a compact way of expressing what happens to each component of a vector separately — all at once. The animation above shows exactly this: rotating a point by θ produces two equations:

\[ x' = x \cdot \cos\theta - y \cdot \sin\theta \qquad y' = x \cdot \sin\theta + y \cdot \cos\theta \]

A matrix-vector multiplication works by dot-producting each row with the input vector. So if we pack those equations into rows:

Row 1 = [cos θ, −sin θ, 0, 0] · [x, y, z, 1] = x·cos θ − y·sin θ = x' ✓
Row 2 = [sin θ, cos θ, 0, 0] · [x, y, z, 1] = x·sin θ + y·cos θ = y' ✓
Row 3 = [0, 0, 1, 0] · [x, y, z, 1] = z (unchanged) ✓

The matrix IS those equations — just packed into rows. For rotation around Z:

\[ R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

The first column is where X ends up, the second is where Y ends up. Z stays untouched. The same logic applies to Rx (rotating in the YZ plane) and Ry (rotating in the XZ plane), each leaving their respective axis fixed. The full model rotation is Rz · Ry · Rx — applied right to left.

inline Mat4 roty_matrix(float angle) {
    return {
         cosf(angle), 0, sinf(angle), 0,
         0,           1, 0,           0,
        -sinf(angle), 0, cosf(angle), 0,
         0,           0, 0,           1,
    };
}
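The model matrix later calls `rotx_matrix` and `rotz_matrix`, which are not listed here. A sketch of how they could look, following the same row-major pattern as `roty_matrix` and the R_z matrix above (the bodies are an assumption, not the project's actual code; `std::cos`/`std::sin` are used in place of `cosf`/`sinf`):

```cpp
#include <cmath>

struct Mat4 { float m[16]; };  // assumed row-major stand-in for the project's Mat4

// Rotation in the YZ plane; the X axis stays fixed.
inline Mat4 rotx_matrix(float angle) {
    const float c = std::cos(angle), s = std::sin(angle);
    return {{
        1, 0,  0, 0,
        0, c, -s, 0,
        0, s,  c, 0,
        0, 0,  0, 1,
    }};
}

// Rotation in the XY plane; the Z axis stays fixed.
inline Mat4 rotz_matrix(float angle) {
    const float c = std::cos(angle), s = std::sin(angle);
    return {{
        c, -s, 0, 0,
        s,  c, 0, 0,
        0,  0, 1, 0,
        0,  0, 0, 1,
    }};
}
```

Reading the columns confirms the "where does each axis end up" rule: for a 90° turn, the first column of `rotz_matrix` is (0, 1, 0), so X lands on Y, and the second column of `rotx_matrix` is (0, 0, 1), so Y lands on Z (up to floating-point rounding).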

Putting the model matrix together

Scale first, then rotate, then translate — order matters because matrix multiplication is not commutative:

\[ M = T \cdot R_z \cdot R_y \cdot R_x \cdot S \]

const Mat4 modelMatrix =
    move_matrix(move.x, move.y, move.z) *
    rotz_matrix(rotz) * roty_matrix(roty) * rotx_matrix(rotx) *
    scale_matrix(scale);
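A small experiment makes the non-commutativity tangible. The sketch below uses assumed stand-ins for the project's types and compares T · S against S · T on the same point:

```cpp
// Assumed stand-ins for the project's types (row-major Mat4).
struct Vec4 { float x, y, z, w; };

struct Mat4 {
    float m[16];

    Mat4 operator*(const Mat4& other) const {
        Mat4 result{};  // zero-initialized accumulator
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                for (int k = 0; k < 4; k++)
                    result.m[i*4+j] += m[i*4+k] * other.m[k*4+j];
        return result;
    }

    Vec4 operator*(const Vec4& v) const {
        return {
            m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w,
            m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w,
            m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w,
            m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w
        };
    }
};

inline Mat4 scale_matrix(float k) {
    return {{ k,0,0,0,  0,k,0,0,  0,0,k,0,  0,0,0,1 }};
}

inline Mat4 move_matrix(float x, float y, float z) {
    return {{ 1,0,0,x,  0,1,0,y,  0,0,1,z,  0,0,0,1 }};
}

// Scale first, then translate (the order used for the model matrix).
inline Vec4 scale_then_move(const Vec4& p) {
    return (move_matrix(3, 0, 0) * scale_matrix(2)) * p;
}

// Translate first, then scale: the offset gets scaled too.
inline Vec4 move_then_scale(const Vec4& p) {
    return (scale_matrix(2) * move_matrix(3, 0, 0)) * p;
}
```

For the point (1, 0, 0), `scale_then_move` gives x = 2·1 + 3 = 5, while `move_then_scale` gives x = 2·(1 + 3) = 8.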

View matrix — camera space

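The view matrix transforms the world so that the camera sits at the origin, looking down a fixed axis. Everything else moves, not the camera. The matrices in this section put the camera's forward direction on +Z: the view matrix below stores f in its third row, and the projection matrix copies +z into w. OpenGL's convention is the opposite (the camera looks down −Z), which would mean negating f in the view matrix and using −1 in the last row of the projection.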

[Interactive demo: toggle to see the transformation]

Imagine the camera is at (−5, 0, 0) looking at the origin. We want to know where the world origin ends up in camera space. To place the camera at (−5, 0, 0), we translated it by −5 along X. To undo that for any world point, we do the opposite: translate by +5. The world origin goes from (0, 0, 0) to (5, 0, 0). Ignoring rotation for the moment, that is 5 units ahead of the camera along the direction it is looking, which makes sense.

The pattern: whatever we did to place the camera in the world, we do the opposite to transform world points into camera space. And the matrix that does the opposite of another matrix is its inverse.

In practice, placing the camera in the world involves two steps: rotate it to face the right direction, then translate it to its position. So the full camera-to-world transform is T × R, and the view matrix is its inverse: (T × R)⁻¹.

Step 1 — Compute the camera's three axes in world space.

Forward (where the camera looks):

\[ \mathbf{f} = \text{normalize}(\text{center} - \text{eye}) \]

Right — perpendicular to forward. We use an artificial world up = (0, 1, 0) as a reference, since we don't know the camera's up yet but we know the world's Y is "up":

\[ \mathbf{r} = \text{normalize}(\mathbf{f} \times \mathbf{up}_{world}) \]

Camera's true up — perpendicular to both:

\[ \mathbf{u} = \mathbf{r} \times \mathbf{f} \]

Step 2 — Build the camera-to-world transform.

Placing the camera in the world takes two steps: rotate first, then translate to eye:

\[ M_{cam} = T_{eye} \times R_{cam} \]

The translation matrix T simply puts the camera position in the 4th column:

\[ T_{eye} = \begin{pmatrix} 1 & 0 & 0 & \text{eye}_x \\ 0 & 1 & 0 & \text{eye}_y \\ 0 & 0 & 1 & \text{eye}_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

The rotation matrix R answers: where does each camera axis point in world space?

The camera's X axis is the standard basis vector e₁ = (1, 0, 0). When you multiply a matrix R by e₁, the result is always the first column of R, because every other column is multiplied by a zero component of e₁. So the first column of R directly controls where the camera's X axis ends up in the world. If we want it to point in direction r, we set the first column to r. The same logic applies to the camera's Y axis (e₂ → second column = u) and Z axis (e₃ → third column = f):

\[ R_{cam} = \begin{pmatrix} r_x & u_x & f_x & 0 \\ r_y & u_y & f_y & 0 \\ r_z & u_z & f_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

Step 3 — Invert.

\[ V = (T_{eye} \times R_{cam})^{-1} = R_{cam}^{-1} \times T_{eye}^{-1} \]

T⁻¹ is straightforward: same matrix, but negate the translation (replace eye with −eye).

R⁻¹ is elegant: since r, u, f are orthonormal (unit vectors, mutually perpendicular), R is orthogonal. For orthogonal matrices, the inverse equals the transpose — rows and columns swap. The columns of R (which were r, u, f) become its rows.

Multiplying R^T × T⁻¹ and combining gives the final view matrix:

\[ V = \begin{pmatrix} r_x & r_y & r_z & -(\mathbf{r} \cdot \text{eye}) \\ u_x & u_y & u_z & -(\mathbf{u} \cdot \text{eye}) \\ f_x & f_y & f_z & -(\mathbf{f} \cdot \text{eye}) \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
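For readers who want the intermediate step, the product behind the matrix above can be written out as follows, using the transpose of R_cam and T_eye with eye negated:

\[ R_{cam}^{T} \times T_{eye}^{-1} = \begin{pmatrix} r_x & r_y & r_z & 0 \\ u_x & u_y & u_z & 0 \\ f_x & f_y & f_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & -\text{eye}_x \\ 0 & 1 & 0 & -\text{eye}_y \\ 0 & 0 & 1 & -\text{eye}_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

The upper-left 3×3 block passes through unchanged because it meets the identity block of the second matrix. The 4th column is each row dotted with (−eye_x, −eye_y, −eye_z, 1); for the first row:

\[ r_x(-\text{eye}_x) + r_y(-\text{eye}_y) + r_z(-\text{eye}_z) + 0 \cdot 1 = -(\mathbf{r} \cdot \text{eye}) \]

The second and third rows give −(u · eye) and −(f · eye) in the same way.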

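inline Mat4 lookAt_matrix(Vec3 eye, Vec3 center) {
    // Reference "up": the world's Y axis, as described in Step 1.
    // Degenerate if forward is parallel to it (the cross product becomes zero).
    const Vec3 temp_up{0.0f, 1.0f, 0.0f};

    Vec3 forward = (center - eye).normalize();
    Vec3 right   = forward.cross(temp_up).normalize();
    Vec3 up      = right.cross(forward);  // already unit length

    // Rows are the camera axes; operator* between two Vec3 is the dot product.
    return {
        right.x,   right.y,   right.z,   -(right * eye),
        up.x,      up.y,      up.z,      -(up * eye),
        forward.x, forward.y, forward.z, -(forward * eye),
        0,         0,         0,          1
    };
}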
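A quick way to sanity-check a view matrix is to feed it points whose camera-space position is known in advance. The sketch below is self-contained, so `Vec3`, `Vec4`, and `Mat4` are assumed stand-ins for the project's types, and the up reference is hard-coded to (0, 1, 0):

```cpp
#include <cmath>

// Assumed stand-ins for the project's types.
struct Vec3 {
    float x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    float operator*(const Vec3& o) const { return x*o.x + y*o.y + z*o.z; }  // dot product
    Vec3 cross(const Vec3& o) const {
        return {y*o.z - z*o.y, z*o.x - x*o.z, x*o.y - y*o.x};
    }
    Vec3 normalize() const {
        const float len = std::sqrt((*this) * (*this));
        return {x / len, y / len, z / len};
    }
};

struct Vec4 { float x, y, z, w; };

struct Mat4 {
    float m[16];  // row-major
    Vec4 operator*(const Vec4& v) const {
        return {
            m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w,
            m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w,
            m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w,
            m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w
        };
    }
};

inline Mat4 lookAt_matrix(Vec3 eye, Vec3 center) {
    const Vec3 world_up{0.0f, 1.0f, 0.0f};  // assumed reference up
    const Vec3 forward = (center - eye).normalize();
    const Vec3 right   = forward.cross(world_up).normalize();
    const Vec3 up      = right.cross(forward);
    return {{
        right.x,   right.y,   right.z,   -(right * eye),
        up.x,      up.y,      up.z,      -(up * eye),
        forward.x, forward.y, forward.z, -(forward * eye),
        0,         0,         0,          1
    }};
}

// Transforms a world-space point into camera space.
inline Vec4 to_camera_space(const Mat4& view, const Vec3& p) {
    return view * Vec4{p.x, p.y, p.z, 1.0f};
}
```

With the camera at (−5, 0, 0) looking at the origin, the eye itself maps to (0, 0, 0) and the world origin maps to (0, 0, 5): five units straight ahead along the camera's forward axis. Note that this construction breaks down when the camera looks straight up or down, because forward is then parallel to the reference up and the cross product has zero length.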

Projection matrix — perspective

The projection matrix maps the camera's frustum — the truncated pyramid of everything visible, bounded by a near plane and a far plane — into clip space. Clip space is an intermediate 4D space that keeps the original depth (w) intact before dividing by it. This is useful for clipping: a point is inside the view frustum if −w ≤ x ≤ w, −w ≤ y ≤ w, and −w ≤ z ≤ w — no division needed. After clipping, we divide by w to get NDC.
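The inside test described above translates almost literally into code. A minimal sketch (the function name is hypothetical, and `Vec4` is an assumed stand-in for the project's type):

```cpp
struct Vec4 { float x, y, z, w; };  // assumed stand-in for the project's Vec4

// True when a clip-space point lies inside the view volume:
// every coordinate must fall within [-w, w]. No division is needed.
inline bool inside_clip_volume(const Vec4& c) {
    return -c.w <= c.x && c.x <= c.w &&
           -c.w <= c.y && c.y <= c.w &&
           -c.w <= c.z && c.z <= c.w;
}
```

A point with negative w fails automatically, since the interval [−w, w] is then empty; with the matrices in this section, that corresponds to a point behind the camera.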

How focal length relates to FOV. Before building the matrix, we need a way to scale points onto the image plane. The demo below shows the geometry:

[Interactive demo: drag FOV to see how the image plane scales]

Focal length from FOV. In the image plane, we want points to map to NDC y ∈ [-1, 1]. A point at depth z with height y projects to:

\[ y_{\text{proj}} = \frac{f \cdot y}{z} \]

For this to land in [-1, 1], we want f to make the half-height of the frustum at distance f equal to 1. That gives:

\[ \tan\left(\frac{\text{FOV}}{2}\right) = \frac{1}{f} \quad \Rightarrow \quad f = \frac{1}{\tan(\text{FOV}/2)} \]

X gets divided by the aspect ratio to correct for non-square screens.
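To get a feel for the numbers, here is a small sketch of the focal-length formula. It assumes the FOV is given in degrees; the function name is hypothetical:

```cpp
#include <cmath>

// f = 1 / tan(FOV / 2), with the FOV given in degrees (an assumption).
inline float focal_from_fov_degrees(float fov_degrees) {
    const float pi = 3.14159265358979f;
    const float fov_radians = fov_degrees * pi / 180.0f;
    return 1.0f / std::tan(fov_radians / 2.0f);
}
```

A 90° FOV gives f = 1, and a 60° FOV gives f ≈ 1.732. A narrower field of view means a larger f, which magnifies the image like a zoom lens.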

Storing z in w. Matrix multiplication is a linear operation: each output component is a weighted sum of the input components. Dividing by z (which is what perspective requires) is non-linear, so no matrix can express it. The solution: store z in the w component using the 1 at row 3, column 2 (zero-based), so that after the multiplication w = z. Then we do the division manually; that's the perspective divide. The third row has two values left to determine, A and B:

\[ P = \begin{pmatrix} f/\text{aspect} & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & 1 & 0 \end{pmatrix} \]

Multiplying this by a vertex (x, y, z, 1) gives clip space (fx/aspect, fy, Az+B, z). After dividing by w = z, the z component becomes:

\[ z_{\text{ndc}} = \frac{A \cdot z + B}{z} \]

We want this to map near → −1 and far → +1:

\[ \frac{A \cdot \text{near} + B}{\text{near}} = -1 \quad \Rightarrow \quad A \cdot \text{near} + B = -\text{near} \quad (1) \]

\[ \frac{A \cdot \text{far} + B}{\text{far}} = +1 \quad \Rightarrow \quad A \cdot \text{far} + B = \text{far} \quad (2) \]

Subtracting (1) from (2): A(far − near) = far + near, so:

\[ A = \frac{\text{near}+\text{far}}{\text{far}-\text{near}} \]

Substituting back into (1):

\[ B = \frac{2 \cdot \text{near} \cdot \text{far}}{\text{near}-\text{far}} \]

Replacing A and B, the full projection matrix is:

\[ P = \begin{pmatrix} f/\text{aspect} & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & \frac{\text{near}+\text{far}}{\text{far}-\text{near}} & \frac{2 \cdot \text{near} \cdot \text{far}}{\text{near}-\text{far}} \\ 0 & 0 & 1 & 0 \end{pmatrix} \]

The last row stores the original z into w — that's what makes the perspective divide work.

[Interactive demo: step through how the frustum transforms into clip space]

inline Mat4 projection_matrix(float fov, float aspect, float near, float far) {
    const float f = 1.0f / tanf(radianes(fov) / 2.0f);
    return {
        f/aspect, 0,  0,                         0,
        0,        f,  0,                         0,
        0,        0,  (near+far)/(far-near),     2*(near*far)/(near-far),
        0,        0,  1,                         0
    };
}

Perspective divide & viewport transform

After multiplying by MVP, each vertex is in clip space — a 4D vector (x, y, z, w) where w holds the original depth. Dividing everything by w gives NDC (Normalized Device Coordinates), where every visible point is in the cube [-1, 1]³:

\[ \text{NDC} = \left(\frac{x_c}{w_c},\; \frac{y_c}{w_c},\; \frac{z_c}{w_c}\right) \]

This division is what produces perspective — far objects have larger w, so dividing by it makes them appear smaller.

Vec3 ndc = {clip.x/clip.w, clip.y/clip.w, clip.z/clip.w};
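As a concrete illustration, take two clip-space points that share the same x and y but differ in w. The sketch below wraps the divide in a hypothetical helper, with assumed stand-in types:

```cpp
struct Vec3 { float x, y, z; };     // assumed stand-ins for the project's types
struct Vec4 { float x, y, z, w; };

// Perspective divide: clip space to NDC. Assumes clip.w is not zero,
// which holds for points that passed the clip test against a near plane > 0.
inline Vec3 perspective_divide(const Vec4& clip) {
    return { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w };
}
```

A point with clip coordinates (1, 1, 0, 2) lands at NDC x = 0.5, while (1, 1, 0, 4), twice as deep, lands at x = 0.25: the farther point is pulled toward the center of the screen.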

The viewport transform maps NDC to pixel coordinates. NDC x lives in [−1, 1]. Adding 1 shifts it to [0, 2]. Dividing by 2 gives [0, 1]. Multiplying by (W−1) gives [0, W−1] — exactly the screen pixel range. Y is flipped because NDC +Y is up but screen +Y is down: we use (1 − ndc_y) instead of (1 + ndc_y) to flip the axis.

\[ \text{screen}_x = \frac{(1 + \text{ndc}_x)(W-1)}{2} \qquad \text{screen}_y = \frac{(1 - \text{ndc}_y)(H-1)}{2} \]

[Interactive demo: see how NDC coordinates map to screen pixels]

Vec2 screen = {
    (1 + ndc.x) * (WIDTH  - 1) / 2,
    (1 - ndc.y) * (HEIGHT - 1) / 2  // Y flipped: NDC +Y is up, screen +Y is down
};
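Checking the corners is a quick way to confirm the mapping and the Y flip. A sketch as a standalone function (the name and the explicit width/height parameters are hypothetical; the original uses the WIDTH and HEIGHT constants directly):

```cpp
struct Vec2 { float x, y; };  // assumed stand-in for the project's Vec2

// Maps NDC in [-1, 1] to pixel coordinates in [0, width-1] x [0, height-1].
// Y is flipped: NDC +Y points up, screen +Y points down.
inline Vec2 ndc_to_screen(float ndc_x, float ndc_y, int width, int height) {
    return { (1.0f + ndc_x) * (width  - 1) / 2.0f,
             (1.0f - ndc_y) * (height - 1) / 2.0f };
}
```

On an 800×600 target, NDC (−1, 1) maps to pixel (0, 0) in the top-left corner, NDC (1, −1) maps to (799, 599) in the bottom-right, and NDC (0, 0) maps to (399.5, 299.5).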

Putting it all together

The MVP matrix chains all three transformations. Applied right to left — scale and rotate first, then position in the world, then to camera space, then project:

\[ \text{MVP} = P \cdot V \cdot M \]

const Mat4 MVP =
    projection_matrix(fov, (float)WIDTH/HEIGHT, near, far) *     // P
    lookAt_matrix(eye, center) *                                  // V
    move_matrix(move.x, move.y, move.z) *                        // M: translate
    rotz_matrix(rotz) * roty_matrix(roty) * rotx_matrix(rotx) * // M: rotate
    scale_matrix(scale);                                          // M: scale

// Every vertex goes through the full pipeline:
Vec4 clip   = MVP * Vec4{ver.x, ver.y, ver.z, 1.0f};
Vec3 ndc    = {clip.x/clip.w, clip.y/clip.w, clip.z/clip.w};
Vec2 screen = {(1+ndc.x)*(WIDTH-1)/2, (1-ndc.y)*(HEIGHT-1)/2};

In the rasterizer, this runs for every vertex of every triangle, every frame. The screen-space positions go into the bounding box and edge function tests from the previous sections — the pipeline connects directly to the pixel loop.

Bugs

BUG: Cube moves in the direction it rotates (MVP applied in reverse)

What happened: Moving the cube in Z sent it flying in the wrong direction depending on its rotation. Everything was distorted and unpredictable.

Cause: The MVP matrix was being applied in the wrong order: projection was the first transformation applied, not the last. Since a matrix product applies its rightmost factor to the vertex first, the code had it completely backwards.

Fix: Reverse the multiplication order to P × V × M. The vertex experiences scale, then rotation, then translation, then view, then projection, in that sequence.

BUG: Epileptic cube (random color every frame, interior faces visible)

What happened: The cube was flickering with completely random colors every frame, and the interior faces were visible through the exterior.

Cause: Two problems at once. First, a random color was being assigned per triangle per frame: no fixed color, just noise. Second, the z-buffer depth values had the wrong sign: −w was being stored instead of +w, so the depth test was inverted and back faces were drawing over front faces.

Fix: Assign a fixed color per face. Fix the sign of w in the depth value: store +w so the depth test correctly picks the nearest surface.
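The sign issue is easier to see with the depth test written out. The sketch below is not the project's z-buffer (that code lives in the previous section and is not shown here); it assumes the buffer is cleared to +infinity and that a smaller positive w means a nearer surface:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical depth buffer that stores +w per pixel.
struct DepthBuffer {
    int width, height;
    std::vector<float> depth;

    DepthBuffer(int w, int h)
        : width(w), height(h),
          depth(static_cast<std::size_t>(w) * h,
                std::numeric_limits<float>::infinity()) {}

    // Returns true, and records the new depth, when the fragment is nearer
    // than whatever is already stored at (x, y).
    bool test_and_write(int x, int y, float w_depth) {
        float& stored = depth[static_cast<std::size_t>(y) * width + x];
        if (w_depth < stored) {
            stored = w_depth;
            return true;
        }
        return false;
    }
};
```

With +w stored, a fragment at w = 5 is kept, a later one at w = 9 is rejected, and one at w = 2 replaces it. If −w is stored instead, the comparison flips (−9 < −5), so the farther surface wins, which matches the symptom described above.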

[Figure: Epileptic cube. Random color per triangle per frame + no backface culling = chaos.]

BUG: Abstract art (renderer not cleared between frames)

What happened: Rotating and moving the cube produced a beautiful abstract painting, with every frame's output layered on top of the previous ones.

Cause: The framebuffer and z-buffer were being cleared each frame, but SDL_RenderClear was not being called. SDL's renderer kept the previous frame's content painted on screen underneath the new one.

Fix: Add SDL_RenderClear(renderer) at the start of every frame. One line.

[Figure: Abstract art bug. Not clearing the renderer: every frame painted on top of the last.]

Result

[Figure: Cube rotating correctly. The cube fully colored and rotating, with model, view, and projection all working together.]

With the MVP pipeline working, 3D objects can be positioned, oriented, and projected correctly onto the screen. The next step is loading real objects from files — the OBJ parser.
