Last updated: March 2025

Stages of Compilation in C: A Deep Dive Into All 4 Phases (With Real Output)

By CoodeVerse Editorial Team

When you run gcc hello.c -o hello, it looks like one operation. In reality, the gcc driver walks your code through four distinct stages in sequence, each one transforming it one step closer to something a CPU can execute. (Modern GCC folds preprocessing into the compiler proper, cc1, but the stages remain logically separate and each can be run and inspected on its own.) Understanding these stages turns mysterious errors into diagnosable problems and opens up debugging techniques many developers never use.

This guide goes deep on each stage — what it does, what its output looks like, what can go wrong, and how to inspect the intermediate files yourself.

Stage 1: Preprocessing (cpp)   hello.c → hello.i
Stage 2: Compilation (cc1)     hello.i → hello.s
Stage 3: Assembly (as)         hello.s → hello.o
Stage 4: Linking (ld)          hello.o + libc → hello

The Source File We'll Trace Through Every Stage

We'll use a simple but representative C program throughout this guide. It uses a macro, a header include, a function, and a library call — enough to show something interesting at each stage.

demo.c: our example source file (C)
#include <stdio.h>
#include <math.h>

#define SQUARE(x)  ((x) * (x))

double circle_area(double r) {
    return 3.14159 * SQUARE(r);
}

int main() {
    double r = 5.0;
    printf("Area = %.2f\n", circle_area(r));
    printf("sqrt(r) = %.4f\n", sqrt(r));
    return 0;
}

To compile this fully: gcc -Wall -g demo.c -o demo -lm

Stage 1: Preprocessing

demo.c → cpp (C preprocessor) → demo.i

The preprocessor (cpp) runs first and handles every line beginning with #. It does not understand C syntax — it operates purely on text, performing three kinds of transformation:

1a — File inclusion (#include)

Every #include directive is replaced with the complete text content of the referenced file. #include <stdio.h> inserts the entire contents of stdio.h from the system include path. Because header files include other header files, a single include can expand to hundreds or thousands of lines.

Inspect the preprocessed output (Shell)
gcc -E demo.c -o demo.i   # Stop after preprocessing
wc -l demo.i
    892 demo.i              # 12 lines of source → 892 lines after expansion (exact count varies by system)
# Representative lines from demo.i (exact content varies by system):
# 1 "/usr/include/stdio.h" 1 3 4
extern int printf(const char *__restrict __format, ...);

Why this matters for debugging: If a macro is producing unexpected code, run gcc -E and search the .i file for the code around the macro call (the macro name itself no longer appears after expansion). You'll see exactly what the preprocessor substituted, which is often very different from what you intended.

1b — Macro expansion (#define)

Every occurrence of a macro name is replaced with its defined value. In our example, SQUARE(r) becomes ((r) * (r)) — the parentheses in the definition are there to prevent operator precedence bugs when the argument is an expression.

Macro expansion example: before and after (C)
/* Source */
#define SQUARE(x)  ((x) * (x))
return 3.14159 * SQUARE(r);

/* After preprocessing — SQUARE(r) expanded */
return 3.14159 * ((r) * (r));

1c — Conditional compilation (#ifdef / #if)

The preprocessor includes or excludes entire blocks of code based on whether a symbol is defined. This is used for platform-specific code, debug builds, and feature flags.

Conditional compilation example (C)
#ifdef DEBUG
    printf("debug: r = %.2f\n", r);   // Included only if -DDEBUG is passed to gcc
#endif

Building with and without the flag (Shell)
# Compile with debug output:
gcc -DDEBUG demo.c -o demo -lm
# Compile without:
gcc demo.c -o demo -lm

Common preprocessing error: "No such file or directory" on an #include line means the header path is wrong or the library's development package isn't installed. On Debian or Ubuntu, install the matching -dev package; for example, #include <openssl/ssl.h> needs sudo apt install libssl-dev. The standard <math.h> header ships with the C library's own development files, so it needs no extra package.

Stage 2: Compilation (C → Assembly)

demo.i → cc1 (C compiler proper) → demo.s

The compiler proper (cc1 inside GCC) takes the preprocessed C text and translates it into assembly language — a human-readable representation of CPU instructions specific to the target architecture. On a modern x86-64 Linux machine, the output uses AT&T syntax assembly.

This is where most of the heavy lifting happens: syntax validation, type checking, optimization passes, and code generation. Nearly all of the diagnostics enabled by -Wall and -Wextra are generated here.

Inspect the assembly output (Shell)
gcc -S -O0 demo.c -o demo.s   # -O0 = no optimization, easier to read
cat demo.s

demo.s: x86-64 assembly output, excerpt (x86-64 Assembly)
        .globl  circle_area
        .type   circle_area, @function
circle_area:
        pushq   %rbp
        movq    %rsp, %rbp
        movsd   %xmm0, -8(%rbp)       # store parameter r
        movsd   -8(%rbp), %xmm1       # load r
        mulsd   -8(%rbp), %xmm1       # r * r  (SQUARE macro expanded)
        movsd   .LC0(%rip), %xmm0     # load 3.14159
        mulsd   %xmm1, %xmm0          # 3.14159 * (r*r)
        popq    %rbp
        ret
.LC0:
        .long   -266631570            # low 32 bits of 3.14159 as an IEEE 754 double
        .long   1074340345            # high 32 bits (0x400921F9)

The Compiler Explorer trick: Paste any C function into godbolt.org and see the assembly output in real time. Try changing -O0 to -O2 and watch how the compiler eliminates unnecessary instructions. This is the fastest way to understand what your code actually costs at the CPU level.

What the compiler checks at this stage

Syntax errors, type errors, and the warnings enabled by -Wall and -Wextra all come from Stage 2.

Stage 3: Assembly (Assembly → Object Code)

demo.s → as (GNU assembler) → demo.o

The assembler (as) converts the assembly text into binary machine code — the actual zeros and ones that the CPU executes. The output is an object file (.o on Linux/macOS, .obj on Windows).

An object file is not yet runnable. It is a self-contained binary for one .c file, but it still has holes — references to functions like printf and sqrt that are defined in other files or libraries. These are called unresolved external references.

Create and inspect the object file (Shell)
gcc -c demo.c -o demo.o          # Stop after assembly
file demo.o
demo.o: ELF 64-bit LSB relocatable, x86-64, not stripped

nm demo.o                          # View the symbol table

nm demo.o: symbol table output
Address           Type  Symbol name  Meaning
0000000000000000  T     circle_area  Defined in this file (text section)
0000000000000046  T     main         Defined in this file (text section)
                  U     printf       Undefined, needs linker
                  U     sqrt         Undefined, needs linker

The symbol table is the key interface between object files. T symbols are defined here (the linker can use them). U symbols are undefined — the linker must find them in another object file or library, or it will report undefined reference.

Practical use: nm is invaluable for diagnosing link errors. If you get "undefined reference to foo", run nm -A *.o | grep foo (-A prefixes each line with its file name). If foo appears only as U, no object file you built defines it: you forgot to compile the file containing foo, you're missing a library, or you never wrote the function body. If it appears as T in some object file, check that the file is actually on the link command line.

Disassemble the object file to verify machine code (Shell)
objdump -d demo.o               # Disassemble to verify machine code
0000000000000000 <circle_area>:
   0:  55                      push   %rbp
   1:  48 89 e5                mov    %rsp,%rbp
   4:  f2 0f 11 45 f8          movsd  %xmm0,-0x8(%rbp)

Stage 4: Linking

demo.o + libc / libm → ld (GNU linker) → demo

The linker (ld) combines one or more .o object files with library files and resolves every unresolved symbol reference. For each U symbol in the symbol tables, the linker searches the provided libraries for a matching T definition and connects the call site to that implementation.

The output is a complete, self-describing executable — on Linux an ELF (Executable and Linkable Format) binary, on macOS a Mach-O binary, on Windows a PE (Portable Executable) file.

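Linking: GCC orchestrates ld automatically (Shell)
# Full build: GCC runs ld internally with the correct flags
gcc demo.o -o demo -lm

# List symbols still undefined in the final binary
nm demo | grep " U "
# With the default dynamic link, printf and sqrt still show up as U:
# the linker has recorded which shared library provides them, and the
# dynamic loader binds them when the program starts. In a -static
# build the list is normally empty.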

# Show which shared libraries the executable needs at runtime
ldd demo
    linux-vdso.so.1 (0x00007ffd...)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6

The most common linking error and its fix: undefined reference to 'sqrt' even though you included <math.h>. Including a header gives the compiler only the function declaration. The linker still needs the function implementation, which lives in libm. Fix: add -lm at the end of your GCC command: gcc demo.c -o demo -lm. The -l flag must come after the source or object files that use the library, because GNU ld processes its inputs from left to right.

Static vs Dynamic Linking

The linker can connect your program to library code in two fundamentally different ways. Understanding the difference explains why some executables are large self-contained files while others are tiny but require system libraries to be installed.

Static vs dynamic linking commands (Shell)
# Default: dynamic linking
gcc demo.c -o demo -lm
ls -lh demo
-rwxr-xr-x  16K  demo

# Static linking: copies all libraries into the binary
gcc -static demo.c -o demo_static -lm
ls -lh demo_static
-rwxr-xr-x 868K  demo_static   # ~54x larger

Multi-File Compilation and Separate Compilation

Real C projects are split across many .c files — each one compiled to its own .o object file, then all linked together. This is called separate compilation and it is essential for large codebases: changing one file only requires recompiling that one file, not the entire project.

Multi-file project: correct build (Shell)
# Project structure:
# main.c  — entry point, calls functions from utils.c and math_helpers.c
# utils.c — string/IO utilities
# math_helpers.c — custom math functions

# Option A: compile all at once (simplest)
gcc -Wall main.c utils.c math_helpers.c -o app -lm

# Option B: separate compilation (faster rebuilds)
gcc -Wall -c main.c         -o main.o
gcc -Wall -c utils.c        -o utils.o
gcc -Wall -c math_helpers.c -o math_helpers.o
gcc main.o utils.o math_helpers.o -o app -lm   # Link step

# After changing only utils.c, just recompile that file:
gcc -Wall -c utils.c -o utils.o
gcc main.o utils.o math_helpers.o -o app -lm

Use a Makefile for larger projects. A Makefile automates separate compilation: make compares file timestamps and recompiles only the .c files whose sources or listed dependencies have changed. For a 100-file project, this can reduce, say, a 30-second full rebuild to a 1-second incremental rebuild. Learning make is the natural next step after understanding separate compilation.

Frequently Asked Questions

What are the 4 stages of compilation in C?
The 4 stages are: (1) Preprocessing, which expands #include and #define directives; output is a .i file. (2) Compilation, which translates preprocessed C into architecture-specific assembly; output is a .s file. (3) Assembly, which converts assembly into binary machine code; output is a .o object file. (4) Linking, which combines object files and resolves external references using library files; output is the final executable.

What does the C preprocessor do?
The C preprocessor handles three tasks before the compiler sees the code: file inclusion (#include inserts the full contents of header files), macro expansion (#define replaces macro names with their defined values), and conditional compilation (#ifdef / #endif includes or excludes blocks of code based on defined symbols). You can see its output with gcc -E file.c -o file.i.

What is an object file?
An object file (.o) is the binary output of compiling a single .c file. It contains machine code for that file's functions and a symbol table listing what it defines (T symbols) and what it references but doesn't define (U symbols, unresolved). Object files cannot run on their own: the linker must combine the object files and resolve their U symbols before producing a runnable executable.

What does "undefined reference" mean?
"Undefined reference" is a linker error: the linker found a call to a function but couldn't find where that function is implemented. Common causes: (1) using math functions like sqrt() without -lm; (2) splitting code across multiple .c files but only compiling one; (3) declaring a function prototype but never writing the body; (4) missing a custom library's -L path or -l flag. Diagnose with nm *.o | grep " U " to list the symbols each object file leaves unresolved.

What is the difference between static and dynamic linking?
Static linking (gcc -static) copies all library code into the executable at build time, producing a large but fully self-contained binary with no runtime dependencies. Dynamic linking (the default) stores a reference to a shared library (.so, .dylib, .dll) that the OS loads at runtime. Dynamic executables are much smaller and multiple programs can share one copy of a library in memory, but the library must be installed on the target system.

What is separate compilation?
Separate compilation means compiling each .c source file into its own .o object file independently, then linking all the object files together. The key benefit is build speed: when you change one file, only that file needs to be recompiled, not the entire project. A project with 500 source files can go from a 5-minute full rebuild to a 2-second incremental rebuild when using separate compilation with a Makefile or build system.
