Last updated: December 2025

C++ Debugging & Optimization: GDB, Valgrind, ASan, Move Semantics & Profiling (2025)

By CoodeVerse Editorial Team · Verified 2025 · 22 min read · Intermediate · GCC/Clang · C++11/17
Difficulty: Intermediate. Prerequisites: Classes & Objects, Constructors & Destructors

⚡ Quick Answer: Debugging & Optimization in C++

C++ gives you maximum control and maximum performance, but that comes with maximum responsibility for correctness. This guide covers the complete workflow: finding bugs with the right tools, measuring where time is spent, and applying targeted optimizations. Every section includes code or commands you can run yourself.

In this guide:
Bug Types: 6 categories of C++ bugs
Debug Tools: GDB, ASan, Valgrind
GDB Session: a real command walkthrough
Sanitizers: ASan, UBSan, TSan
Compiler Flags: -O0, -O2, -O3, -march
Move Semantics: benchmarked performance
Cache & Memory: reserve, SoA, memory layout
Profiling: gprof, perf, callgrind

🐛 Section 1

Bug Types: 6 Categories With Real Examples

Category | When detected | Example | Caught by
Compile-time errors | At compilation | Missing semicolons, type mismatches | Compiler (-Wall adds warnings)
Linker errors | At linking | Undefined reference to a symbol | Linker output
Runtime errors | During execution | Null dereference, division by zero | GDB, ASan
Logic errors | Wrong output | Off-by-one, wrong operator | Tests, GDB print
Memory errors | Now or later | Leak, buffer overflow, use-after-free | ASan, Valgrind
Undefined behavior | Any time | Signed overflow, uninitialized read | UBSan (overflow); Valgrind or Clang's MSan (uninitialized reads)
bug_examples.cpp: examples of logic, memory, and undefined-behavior bugs (C++)
#include <climits>   // INT_MAX
#include <iostream>
#include <vector>
using namespace std;

// Logic error: returns a*b instead of a+b
int add(int a, int b) { return a * b; }  // BUG: should be a+b

// Memory error: write beyond array bounds
void bufferOverflow() {
    int arr[5] = {};
    arr[10] = 42;  // BUG: out-of-bounds write (UB, caught by ASan)
}

// Memory leak: allocated but never freed
void memoryLeak() {
    int* p = new int(100);
    cout << *p << endl;
    // BUG: delete p; missing, caught by Valgrind / ASan's leak checker
}

// Undefined behavior: signed integer overflow
void intOverflow() {
    int x = INT_MAX;
    x++;  // BUG: UB, caught by -fsanitize=undefined
    cout << x << endl;
}

int main() {
    cout << add(3, 5) << endl;  // prints 15 (not 8): logic error
    // The other functions are not called here; call them one at a time
    // to see each tool's report.
    return 0;
}
Output (wrong)
15 ← expected 8; logic error in add()
🔧 Section 2

Debugging Tools Comparison

Tool | Invocation | Detects | Overhead | Best for
GDB | gdb ./prog | Crashes, logic errors, call stack | Low | Step-through, variable inspection
LLDB | lldb ./prog | Same as GDB | Low | macOS / Clang users
AddressSanitizer (ASan) | -fsanitize=address | Buffer overflows, use-after-free, leaks | ~2x | Daily development, CI pipelines
UndefinedBehaviorSanitizer (UBSan) | -fsanitize=undefined | Signed overflow, null dereference, out-of-bounds indexing | ~1.1x | Development and CI builds
ThreadSanitizer (TSan) | -fsanitize=thread | Data races, lock-order inversions (potential deadlocks) | ~5-15x | Multithreaded programs
Valgrind Memcheck | valgrind ./prog | Leaks, uninitialized reads, invalid accesses | ~20-30x | Leak hunting without recompiling
Valgrind Callgrind | valgrind --tool=callgrind ./prog | Call counts and instruction counts per function | High | Detailed profiling
Cppcheck | cppcheck prog.cpp | Static analysis: null pointers, dead code, some UB | None (static) | Pre-commit checks
Recommended development build command (ASan + UBSan):
g++ -Wall -Wextra -Werror -g -O0 -fsanitize=address -fsanitize=undefined -std=c++17 prog.cpp -o prog
Add this to your Makefile's debug target. It catches compile-time warnings, memory errors, and undefined behavior in a single build. ThreadSanitizer can't be combined with ASan in the same binary, so give -fsanitize=thread its own build.
🖥️ Section 3

GDB Walkthrough — Real Session With Commands

GDB Command Reference

Command | Short form | What it does
break main | b main | Set a breakpoint at main()
break prog.cpp:25 | b prog.cpp:25 | Set a breakpoint at line 25
run [args] | r | Start or restart the program
next | n | Execute the next line (step over function calls)
step | s | Execute the next line (step into function calls)
continue | c | Run until the next breakpoint
print x | p x | Print the value of variable x
display x | disp x | Print x after every step
backtrace | bt | Show the full call stack (key for segfaults)
info locals | (none) | Print all local variables
watch x | (none) | Break when x changes value
up / down | (none) | Move up or down the call stack
list | l | Show source code around the current line
quit | q | Exit GDB
segfault_demo.cpp: a program with a null dereference (C++)
#include <iostream>
using namespace std;

int* findValue(int* arr, int n, int target) {
    for (int i = 0; i < n; i++)
        if (arr[i] == target) return &arr[i];
    return nullptr;  // not found
}

int main() {
    int data[] = {1, 3, 5, 7};
    int* result = findValue(data, 4, 99);  // 99 not in array → nullptr
    cout << *result << endl;  // BUG: dereferencing nullptr → SEGFAULT
    return 0;
}
GDB session: finding the null dereference (Shell)
$ g++ -g -O0 segfault_demo.cpp -o demo
$ gdb ./demo
(gdb) run

Program received signal SIGSEGV, Segmentation fault.
0x0000000000401198 in main () at segfault_demo.cpp:13
13          cout << *result << endl;

(gdb) backtrace
#0  0x0000000000401198 in main () at segfault_demo.cpp:13

(gdb) print result
$1 = (int *) 0x0     ← NULL pointer! That's the bug.

(gdb) quit
Fix
if (result != nullptr) cout << *result; else cout << "Not found";
🛡️ Section 4

AddressSanitizer, UBSan & Valgrind

heap_overflow.cpp and its ASan output (C++)
#include <iostream>
using namespace std;

int main() {
    int* arr = new int[5];
    arr[7] = 42;   // heap-buffer-overflow: valid range is arr[0..4]
    delete[] arr;
    return 0;
}
ASan Output (g++ -fsanitize=address -g)
==1234== ERROR: AddressSanitizer: heap-buffer-overflow
WRITE of size 4 at 0x... thread T0
#0 0x401... in main heap_overflow.cpp:6

0x... is located 8 bytes to the right of 20-byte region
allocated by main at heap_overflow.cpp:5
ASan is your first line of defense. At compile time, add -fsanitize=address -g (plus -fno-omit-frame-pointer for more complete stack traces). At runtime, expect roughly 2x overhead. ASan reports the exact line of the bad access and the line where the memory was allocated, and it works on heap, stack, and global buffers. False positives are rare in practice.
ub_examples.cpp and its UBSan output (C++)
#include <iostream>
#include <climits>
using namespace std;

int main() {
    // UB 1: signed integer overflow
    int x = INT_MAX;
    cout << x + 1 << endl;    // UB: signed overflow

    // UB 2: array out-of-bounds
    int arr[3] = {1,2,3};
    cout << arr[5] << endl;  // UB: index out of bounds

    return 0;
}
UBSan Output (g++ -fsanitize=undefined -g)
ub_examples.cpp:8: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'

ub_examples.cpp:12: runtime error: index 5 out of bounds for type 'int [3]'
leak.cpp and its Valgrind output (C++)
#include <iostream>
using namespace std;

void allocateWithoutFree() {
    int* data = new int[100];
    data[0] = 42;
    // BUG: delete[] data; missing
}

int main() {
    allocateWithoutFree();
    return 0;
}
Running it under Valgrind (Shell)
g++ -g leak.cpp -o leak
valgrind --leak-check=full ./leak
Valgrind Output
==5678== 400 bytes in 1 blocks are definitely lost
==5678== at 0x...: operator new[] (vg_replace_malloc.c:...)
==5678== by 0x...: allocateWithoutFree() (leak.cpp:5)
==5678== by 0x...: main (leak.cpp:11)
==5678== LEAK SUMMARY: definitely lost: 400 bytes
Fix: Add delete[] data; before returning, or replace int* data = new int[100]; with std::vector<int> data(100); — which deallocates automatically via RAII.
⚡ Section 5

Compiler Optimization Flags (-O0 to -O3)

Flag | What it enables | Use when
-O0 | No optimization. Fastest compile; generated code maps most directly onto the source. | Debugging (together with -g)
-Og | Optimizations that don't interfere with debugging. | Debug builds that need some speed
-O1 | Basic optimizations: dead code elimination, constant folding, loop-invariant motion. Compiles faster than -O2. | Rarely needed
-O2 | All of -O1 plus more inlining, sibling (tail) call optimization, instruction scheduling, and auto-vectorization (Clang; GCC 12 and later with a conservative cost model). The most common production choice. | Production builds
-O3 | All of -O2 plus more aggressive inlining, loop transformations such as unswitching and peeling, and more vectorization. Can increase binary size. | Numeric/compute-intensive code
-Os | Optimize for code size; disables optimizations that grow the code. | Embedded, firmware
-march=native | Use every instruction set the build machine supports (SSE4, AVX2, etc.). Not portable: the binary may fail with an illegal-instruction error on older CPUs. | Local benchmarks, HPC
-flto | Link-time optimization across translation units. Pass it both when compiling and when linking. | Large multi-file projects
Measuring the -O0 vs -O2 effect on a tight loop (Shell)
# Compile both versions
g++ -O0 -std=c++17 bench.cpp -o bench_O0
g++ -O2 -std=c++17 bench.cpp -o bench_O2
g++ -O3 -march=native -std=c++17 bench.cpp -o bench_O3

# Time each
time ./bench_O0   # ~2800ms
time ./bench_O2   # ~350ms  (8x faster)
time ./bench_O3   # ~280ms  (about 20% faster again on this numeric loop)
Golden rule: never profile or benchmark -O0 builds, because -O0 numbers say nothing about production performance. Measure with the flags you ship (usually -O2, or -O3 for numeric code). Conversely, avoid debugging -O2 builds: the optimizer reorders and eliminates code, so GDB often shows variables as <optimized out>.
🚀 Section 6

Move Semantics & emplace_back — Benchmarked

move_benchmark.cpp: copy vs move vs emplace (C++)
#include <chrono>
#include <iostream>
#include <string>
#include <utility>   // std::move
#include <vector>
using namespace std;
using namespace std::chrono;

int main() {
    const int N = 100000;

    // 1. push_back with copy: builds s, then copies it into the vector
    auto t1 = steady_clock::now();   // steady_clock: monotonic, suited to timing
    vector<string> v1;
    v1.reserve(N);
    for (int i=0; i<N; i++) {
        string s = "hello_world_" + to_string(i);
        v1.push_back(s);   // copy: allocates and copies the characters
    }
    auto t2 = steady_clock::now();

    // 2. push_back with std::move: transfer the buffer instead of copying it
    vector<string> v2;
    v2.reserve(N);
    for (int i=0; i<N; i++) {
        string s = "hello_world_" + to_string(i);
        v2.push_back(std::move(s));  // move: O(1) instead of O(len)
    }
    auto t3 = steady_clock::now();

    // 3. emplace_back: the argument is already a temporary std::string,
    //    so the element is move-constructed from it (same work as case 2)
    vector<string> v3;
    v3.reserve(N);
    for (int i=0; i<N; i++)
        v3.emplace_back("hello_world_" + to_string(i));
    auto t4 = steady_clock::now();

    cout << "push_back (copy):   "
         << duration_cast<microseconds>(t2-t1).count() << " µs\n";
    cout << "push_back (move):   "
         << duration_cast<microseconds>(t3-t2).count() << " µs\n";
    cout << "emplace_back:       "
         << duration_cast<microseconds>(t4-t3).count() << " µs\n";
    return 0;
}
Output (approximate, -O2)
push_back (copy): 4200 µs
push_back (move): 2100 µs ← 2x faster
emplace_back: 1800 µs ← 2.3x faster
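Move semantics rules: (1) Prefer emplace_back when you can pass constructor arguments, because the element is then built directly inside the vector. When you already have an object, as in case 3 above where the argument is a std::string temporary, emplace_back performs the same move as push_back(std::move(x)), so any gap between cases 2 and 3 is mostly measurement noise. (2) Use std::move(x) when you no longer need x; it transfers the heap buffer in O(1). (3) Call reserve(n) when you know the final size, to eliminate reallocations. (4) Mark move constructors noexcept. During reallocation, std::vector moves elements only if the move constructor is noexcept (or the type can't be copied); otherwise it copies them.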
🧠 Section 7

Cache-Friendly Code: reserve, SoA, and Memory Layout

cache_demo.cpp: vector::reserve, plus SoA vs AoS (C++)
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>
using namespace std;
using namespace std::chrono;

const int N = 10000000;

// AoS (Array of Structures): x,y,z,mass interleaved in memory
struct ParticleAoS { float x,y,z,mass; };

// SoA (Structure of Arrays): each field stored contiguously
struct ParticlesSoA {
    vector<float> x,y,z,mass;
    explicit ParticlesSoA(int n) { x.resize(n); y.resize(n); z.resize(n); mass.resize(n,1.0f); }
};

int main() {
    // 1. reserve() eliminates reallocations
    vector<int> slow, fast;
    // slow: no reserve, so the buffer is reallocated O(log n) times
    auto t1 = steady_clock::now();
    for (int i=0; i<N; i++) slow.push_back(i);
    auto t2 = steady_clock::now();
    fast.reserve(N);  // fast: one allocation up front, no reallocations
    for (int i=0; i<N; i++) fast.push_back(i);
    auto t3 = steady_clock::now();
    cout << "Without reserve: " << duration_cast<milliseconds>(t2-t1).count() << "ms\n";
    cout << "With reserve:    " << duration_cast<milliseconds>(t3-t2).count() << "ms\n";

    // 2. SoA mass sum (cache-friendly: all masses contiguous)
    ParticlesSoA soa(N);
    auto t4 = steady_clock::now();
    float totalSoA = accumulate(soa.mass.begin(), soa.mass.end(), 0.0f);
    auto t5 = steady_clock::now();

    // 3. AoS mass sum (cache-unfriendly: mass values are 16 bytes apart)
    vector<ParticleAoS> aos(N, {0,0,0,1.0f});
    auto t6 = steady_clock::now();
    float totalAoS = 0;
    for (const auto& p : aos) totalAoS += p.mass;
    auto t7 = steady_clock::now();

    // Use the totals so the optimizer can't delete the loops as dead code
    volatile float sink = totalSoA + totalAoS;
    (void)sink;

    cout << "SoA mass sum: " << duration_cast<milliseconds>(t5-t4).count() << "ms\n";
    cout << "AoS mass sum: " << duration_cast<milliseconds>(t7-t6).count() << "ms\n";
    return 0;
}
Output (approximate, -O2, 10M elements)
Without reserve: 185ms
With reserve: 95ms ← 2x faster
SoA mass sum: 12ms
AoS mass sum: 35ms ← SoA 3x faster due to cache
📊 Section 8

Profiling: gprof, perf, callgrind

Profiling workflow (Shell)
# ====== gprof (function-level timing) ======
g++ -O2 -pg prog.cpp -o prog     # compile with -pg
./prog                             # run to generate gmon.out
gprof ./prog gmon.out | head -40   # top functions by time

# ====== perf (Linux — no recompile needed) ======
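perf record -g ./prog              # sample with call graph (best with -fno-omit-frame-pointer, or use --call-graph dwarf)
perf report                        # interactive TUI report
perf stat ./prog                   # high-level counters (cycles, instructions/IPC, branch misses)

# ====== Valgrind callgrind (per-function and per-line instruction counts; build with -g) ======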
valgrind --tool=callgrind ./prog
kcachegrind callgrind.out.*        # GUI call graph viewer

# ====== Quick: time the whole program ======
time ./prog                        # wall clock time
/usr/bin/time -v ./prog            # +memory, page faults
Profile before you optimize. As a rule of thumb (the "80/20 rule"), most execution time is spent in a small fraction of the code. A profiler shows which functions are hot; optimizing cold code wastes effort and hurts readability. The workflow: profile → identify the bottleneck → apply a targeted optimization → profile again to verify the improvement.

Best Practices Checklist

✅ Compile with -Wall -Wextra

Catches uninitialized variables, format mismatches, sign mismatches before runtime. Add -Werror in CI to force fixes.

✅ Use -fsanitize=address,undefined

Development build must include both sanitizers. Catches memory errors and undefined behavior at ~2x overhead — use in every CI pipeline.

✅ Profile before optimizing

Use gprof or perf to find the real bottleneck before writing a single optimization. Premature optimization is the root of much wasted effort.

✅ Use -O2 for production

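-O2 is often 5-10x faster than -O0 on tight loops, and it is safe for code without undefined behavior: UB that happens to "work" at -O0 can break at -O2, which is one more reason to run UBSan. Reach for -O3 only after profiling shows arithmetic bottlenecks.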

✅ reserve() before filling vectors

If you know the final size, call reserve(n) before any push_back. This eliminates the O(log n) reallocations; the benchmark above measured roughly 2x faster insertion.

✅ emplace_back over push_back

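emplace_back forwards its arguments to the element's constructor, so the object is built directly in the container with no temporary. Prefer it when you pass constructor arguments; for an object you already have, push_back(std::move(x)) does the same work.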

✅ Smart pointers over raw new

std::unique_ptr (created with std::make_unique) frees its object automatically. This eliminates the most common source of leaks and the need for manual delete in destructors.

✅ Prefer std::vector over std::list

Vector's contiguous, cache-friendly layout usually beats list's O(1) insertion in practice, because list must still walk node by node to reach the insertion point. Use vector by default; benchmark before switching to list.

FAQ / Interview Questions

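What is the difference between Valgrind and AddressSanitizer?
Valgrind works on any binary without recompilation. It runs roughly 20-30x slower and detects leaks, uninitialized reads, and invalid memory accesses. AddressSanitizer requires recompilation (-fsanitize=address) but runs at only ~2x overhead, which makes it practical for CI pipelines. ASan does not detect reads of uninitialized memory; Clang's MemorySanitizer (-fsanitize=memory) covers that. Use ASan for daily development. Use Valgrind when you can't or don't want to recompile (for example, third-party binaries) or when you need uninitialized-read checks.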
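How do I debug a segmentation fault?
There are three approaches:
(1) GDB: compile with -g -O0 and run gdb ./prog. Type run, and when it crashes type backtrace to see the exact call stack and line.
(2) ASan: compile with -fsanitize=address -g and run. ASan reports the error type (for example, SEGV on a null address or a heap-buffer-overflow), the faulting line, and, for heap errors, the allocation site.
(3) Core dump: run ulimit -c unlimited, run the program until it crashes, then run gdb ./prog core and type backtrace. Where the core file lands depends on the system's /proc/sys/kernel/core_pattern setting. On systemd-based distributions, coredumpctl debug opens the most recent dump.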
What is undefined behavior, and why is it dangerous?
Undefined behavior means the C++ standard places no requirements on what happens. The program may crash, produce wrong output, appear correct in debug builds but fail in release, or behave differently between compilers. Compilers exploit UB for optimization: a loop that overflows a signed integer may have its bounds check removed, because the compiler is allowed to assume signed overflow never happens. Detect UB with -fsanitize=undefined, and detect uninitialized reads with Valgrind or MSan. Common UB includes signed integer overflow, null pointer dereference, out-of-bounds array access, and reading an uninitialized variable.
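What are move semantics, and when should I use std::move?
Move semantics transfer ownership of heap resources (the buffer pointer plus size and capacity) instead of copying the data. Moving a 1M-element vector is O(1): the new vector takes over the source's internal pointers (typically three). Copying is O(n): it allocates a new block and copies every element. Use std::move(x) when you no longer need x, since x is left in a valid but unspecified state. Use emplace_back to construct an element directly in a container from constructor arguments. Mark move constructors noexcept, because std::vector only moves elements during reallocation when the move constructor is noexcept.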
Should I use -O2 or -O3?
Use -O2 for most production code. It is well-tested and often provides a 5-8x speedup over -O0 on tight loops, provided the code is free of UB. Use -O3 for compute-heavy numeric code (scientific computing, DSP, graphics pipelines), where aggressive vectorization and loop transformations help. Be cautious with -O3 for code with complex control flow, because larger binaries can hurt instruction-cache performance. Always measure by compiling both and benchmarking; -O3 is not always faster than -O2 in practice.
Why is std::vector usually faster than std::list?
A cache miss that goes to main memory costs roughly 200 cycles, versus about 4 cycles for an L1 hit, so CPUs depend heavily on their caches. std::vector stores data contiguously, so the hardware prefetcher loads upcoming elements before they are needed and most accesses hit L1/L2. std::list stores each node in a separate heap allocation, so every traversal step follows a pointer to a possibly distant address and is likely to miss the cache. Benchmarks commonly show vector outperforming list up to very large element counts, even with frequent insertions and deletions. The reason is that list must first walk node by node to reach the insertion point, while vector's O(n) element shift runs over contiguous, prefetched memory.


CoodeVerse Editorial Team

Senior systems engineers and performance specialists. All benchmarks run on GCC 13, -O2, Linux x86-64. Code tested with AddressSanitizer and UBSan.