C++ Debugging & Optimization: GDB, Valgrind, ASan, Move Semantics & Profiling (2025)
⚡ Quick Answer: Debugging & Optimization in C++
- Debugging tools: GDB (breakpoints, backtrace), AddressSanitizer (-fsanitize=address), Valgrind (--leak-check=full)
- Bug categories: compile-time · runtime · logic · memory · undefined behavior
- Memory errors: ASan catches leaks, buffer overflows, use-after-free at ~2x overhead
- Compiler optimization: -O0 (debug) · -O2 (production) · -O3 (aggressive numeric code)
- Move semantics: std::move transfers O(1) instead of O(n) copy
- Profile first: use gprof or perf before optimizing — 80% of time in 20% of code
- Cache friendliness: prefer std::vector over std::list; SoA over AoS for SIMD
C++ gives you maximum control and maximum performance — but that comes with maximum responsibility for correctness. This guide covers the complete workflow: finding bugs with the right tools, measuring where time is spent, and applying the right optimization techniques. Zero theoretical fluff — every section has working code you can run.
Bug Types
5 categories of C++ bugs
Debug Tools
GDB, ASan, Valgrind
GDB Session
Real command walkthrough
Sanitizers
ASan · UBSan · TSan
Compiler Flags
-O0 -O2 -O3 -march
Move Semantics
Benchmarked performance
Cache & Memory
reserve, SoA, alignment
Profiling
gprof, perf, callgrind
Bug Types — 5 Categories With Real Examples
| Category | When detected | Example | Caught by |
|---|---|---|---|
| Compile-time errors | At compilation | Missing semicolons, type mismatches | Compiler (-Wall) |
| Linker errors | At linking | Undefined reference to symbol | Linker output |
| Runtime errors | During execution | Null dereference, division by zero | GDB, ASan |
| Logic errors | Wrong output | off-by-one, wrong operator | Tests, GDB print |
| Memory errors | Now or later | Leak, buffer overflow, use-after-free | ASan, Valgrind |
| Undefined behavior | Any time | Signed overflow, uninitialized read | UBSan, ASan |
#include <climits>   // INT_MAX, used in intOverflow()
#include <iostream>
#include <vector>
using namespace std;
// Logic error: returns a*b instead of a+b
int add(int a, int b) { return a * b; } // BUG: should be a+b
// Memory error: write beyond array bounds
void bufferOverflow() {
int arr[5] = {};
arr[10] = 42; // BUG: out-of-bounds write (UB, caught by ASan)
}
// Memory leak: allocated but never freed
void memoryLeak() {
int* p = new int(100);
cout << *p << endl;
// BUG: delete p; missing — caught by Valgrind / ASan
}
// Undefined behavior: signed integer overflow
void intOverflow() {
int x = INT_MAX;
x++; // BUG: UB — caught by -fsanitize=undefined
cout << x << endl;
}
int main() {
cout << add(3, 5) << endl; // prints 15 (not 8) — logic error
return 0;
}
15 ← expected 8; logic error in add()
Debugging Tools Comparison
| Tool | Invocation | Detects | Overhead | Best for |
|---|---|---|---|---|
| GDB | gdb ./prog | Crashes, logic, call stack | Low | Step-through, variable inspection |
| LLDB | lldb ./prog | Same as GDB | Low | macOS / Clang users |
| AddressSanitizer | -fsanitize=address | Buffer overflows, UAF, leaks | ~2x | Daily dev, CI pipelines |
| UBSanitizer | -fsanitize=undefined | Signed overflow, null deref, OOB | ~1.1x | All production code |
| ThreadSanitizer | -fsanitize=thread | Data races, deadlocks | ~5-15x | Multithreaded programs |
| Valgrind Memcheck | valgrind ./prog | Leaks, uninitialized reads | ~10-20x | No-recompile leak hunting |
| Valgrind Callgrind | valgrind --tool=callgrind | Function call counts & time | ~10-20x | Detailed profiling |
| Cppcheck | cppcheck prog.cpp | Static: null ptr, dead code, UB | None (static) | Pre-commit checks |
g++ -Wall -Wextra -Werror -g -O0 -fsanitize=address -fsanitize=undefined -std=c++17 prog.cpp -o prog
Add this to your Makefile's debug target. It catches compile-time issues, memory errors, and undefined behavior simultaneously.
GDB Walkthrough — Real Session With Commands
GDB Command Reference
| Command | Short form | What it does |
|---|---|---|
| break main | b main | Set breakpoint at main() |
| break prog.cpp:25 | b prog.cpp:25 | Set breakpoint at line 25 |
| run [args] | r | Start/restart program |
| next | n | Execute next line (step over functions) |
| step | s | Execute next line (step into functions) |
| continue | c | Run until next breakpoint |
| print x | p x | Print value of variable x |
| display x | disp x | Print x after every step |
| backtrace | bt | Show full call stack (KEY for segfaults) |
| info locals | | Print all local variables |
| watch x | | Break when x changes value |
| up / down | | Navigate the call stack |
| list | l | Show source code around current line |
| quit | q | Exit GDB |
#include <iostream>
using namespace std;
int* findValue(int* arr, int n, int target) {
for (int i = 0; i < n; i++)
if (arr[i] == target) return &arr[i];
return nullptr; // not found
}
int main() {
int data[] = {1, 3, 5, 7};
int* result = findValue(data, 4, 99); // 99 not in array → nullptr
cout << *result << endl; // BUG: dereferencing nullptr → SEGFAULT
return 0;
}
$ g++ -g -O0 segfault_demo.cpp -o demo
$ gdb ./demo
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
0x0000000000401198 in main () at segfault_demo.cpp:13
13 cout << *result << endl;
(gdb) backtrace
#0 0x0000000000401198 in main () at segfault_demo.cpp:13
(gdb) print result
$1 = (int *) 0x0 ← NULL pointer! That's the bug.
(gdb) quit
Fix: check the pointer before dereferencing it:
if (result != nullptr) cout << *result; else cout << "Not found";
AddressSanitizer, UBSan & Valgrind
#include <iostream>
using namespace std;
int main() {
int* arr = new int[5];
arr[7] = 42; // heap-buffer-overflow: valid range is arr[0..4]
delete[] arr;
return 0;
}
==1234== ERROR: AddressSanitizer: heap-buffer-overflow
WRITE of size 4 at 0x... thread T0
#0 0x401... in main heap_overflow.cpp:6
0x... is located 8 bytes to the right of 20-byte region
allocated by main at heap_overflow.cpp:5
Compile with -fsanitize=address -g. Expect roughly 2x runtime overhead. ASan reports the exact line of the
bad access and the line where the memory was allocated. It works on heap, stack, and global
buffers, with essentially no false positives in practice.
#include <iostream>
#include <climits>
using namespace std;
int main() {
// UB 1: signed integer overflow
int x = INT_MAX;
cout << x + 1 << endl; // UB: signed overflow
// UB 2: array out-of-bounds
int arr[3] = {1,2,3};
cout << arr[5] << endl; // UB: index out of bounds
return 0;
}
ub_examples.cpp:7: runtime error: signed integer overflow:
2147483647 + 1 cannot be represented in type 'int'
ub_examples.cpp:11: runtime error: index 5 out of bounds
for type 'int [3]'
#include <iostream>
using namespace std;
void allocateWithoutFree() {
int* data = new int[100];
data[0] = 42;
// BUG: delete[] data; missing
}
int main() {
allocateWithoutFree();
return 0;
}
g++ -g leak.cpp -o leak
valgrind --leak-check=full ./leak
==5678== 400 bytes in 1 blocks are definitely lost
==5678== at 0x...: operator new[] (vg_replace_malloc.c:...)
==5678== by 0x...: allocateWithoutFree() (leak.cpp:5)
==5678== by 0x...: main (leak.cpp:11)
==5678== LEAK SUMMARY: definitely lost: 400 bytes
Fix: add delete[] data; before returning, or replace int* data = new int[100]; with std::vector<int> data(100);, which deallocates automatically via RAII.
Compiler Optimization Flags (-O0 to -O3)
| Flag | What it enables | Use when |
|---|---|---|
| -O0 | No optimization. Fast compilation; the generated code maps closely to the source. | Debugging (pair with -g) |
| -Og | Minimal optimization that doesn't interfere with debugging. | Debug builds requiring some speed |
| -O1 | Basic: dead code elimination, constant folding. Slightly faster compile than -O2. | Rarely needed |
| -O2 | All -O1 plus: function inlining, loop invariant motion, vectorization, tail calls. Most common production choice. | Production builds |
| -O3 | All -O2 plus: more aggressive inlining, loop unrolling, more vectorization passes. Can increase binary size. | Numeric/compute-intensive code |
| -Os | Optimize for code size. Disables opts that increase size. | Embedded, firmware |
| -march=native | Use all CPU instructions available on the build machine (SSE4, AVX2, etc.). Not portable. | Local benchmarks, HPC |
| -flto | Link-time optimization: optimize across translation units. | Large multi-file projects |
# Compile all three versions
g++ -O0 -std=c++17 bench.cpp -o bench_O0
g++ -O2 -std=c++17 bench.cpp -o bench_O2
g++ -O3 -march=native -std=c++17 bench.cpp -o bench_O3
# Time each
time ./bench_O0 # ~2800ms
time ./bench_O2 # ~350ms (8x faster)
time ./bench_O3 # ~280ms (slightly more on numeric code)
Move Semantics & emplace_back — Benchmarked
#include <iostream>
#include <vector>
#include <chrono>
#include <string>
using namespace std;
using namespace std::chrono;
int main() {
const int N = 100000;
// 1. push_back with copy — creates temp, then copies into vector
auto t1 = high_resolution_clock::now();
vector<string> v1;
v1.reserve(N);
for (int i=0; i<N; i++) {
string s = "hello_world_" + to_string(i);
v1.push_back(s); // copy
}
auto t2 = high_resolution_clock::now();
// 2. push_back with std::move — transfer instead of copy
vector<string> v2;
v2.reserve(N);
for (int i=0; i<N; i++) {
string s = "hello_world_" + to_string(i);
v2.push_back(move(s)); // move: O(1) instead of O(len)
}
auto t3 = high_resolution_clock::now();
// 3. emplace_back with a temporary: the string expression is built first,
//    then move-constructed into the vector's storage (no named variable)
vector<string> v3;
v3.reserve(N);
for (int i=0; i<N; i++)
v3.emplace_back("hello_world_" + to_string(i)); // temporary is moved into place
auto t4 = high_resolution_clock::now();
cout << "push_back (copy): "
<< duration_cast<microseconds>(t2-t1).count() << " µs\n";
cout << "push_back (move): "
<< duration_cast<microseconds>(t3-t2).count() << " µs\n";
cout << "emplace_back: "
<< duration_cast<microseconds>(t4-t3).count() << " µs\n";
return 0;
}
push_back (copy): 4200 µs
push_back (move): 2100 µs ← 2x faster
emplace_back: 1800 µs ← 2.3x faster
(1) Use emplace_back instead of push_back when you can pass constructor arguments: the element is constructed directly in the vector with no temporary.
(2) Use std::move(x) when you no longer need x: ownership transfers in O(1).
(3) Always reserve(n) when you know the final size: this eliminates reallocations.
(4) Mark move constructors noexcept: during reallocation, std::vector moves elements only if the move constructor is noexcept (or the type cannot be copied); otherwise it copies.
Cache-Friendly Code: reserve, SoA, and Memory Layout
#include <iostream>
#include <vector>
#include <chrono>
#include <numeric>
using namespace std;
using namespace std::chrono;
const int N = 10000000;
// AoS (Array of Structures) — x,y,z,mass interleaved in memory
struct ParticleAoS { float x,y,z,mass; };
// SoA (Structure of Arrays) — each field contiguous
struct ParticlesSoA {
vector<float> x,y,z,mass;
ParticlesSoA(int n) { x.resize(n); y.resize(n); z.resize(n); mass.resize(n,1.0f); }
};
int main() {
// 1. reserve() eliminates reallocations
vector<int> slow, fast;
// slow: no reserve — triggers O(log n) reallocations
auto t1 = high_resolution_clock::now();
for (int i=0; i<N; i++) slow.push_back(i);
auto t2 = high_resolution_clock::now();
fast.reserve(N); // fast: pre-allocate — O(1) insertions
for (int i=0; i<N; i++) fast.push_back(i);
auto t3 = high_resolution_clock::now();
cout << "Without reserve: " << duration_cast<milliseconds>(t2-t1).count() << "ms\n";
cout << "With reserve: " << duration_cast<milliseconds>(t3-t2).count() << "ms\n";
// 2. SoA mass sum (cache-friendly: all masses contiguous)
ParticlesSoA soa(N);
auto t4 = high_resolution_clock::now();
float totalSoA = accumulate(soa.mass.begin(), soa.mass.end(), 0.0f);
auto t5 = high_resolution_clock::now();
// 3. AoS mass sum (cache-unfriendly: mass scattered every 16 bytes)
vector<ParticleAoS> aos(N, {0,0,0,1.0f});
auto t6 = high_resolution_clock::now();
float totalAoS = 0;
for (auto& p : aos) totalAoS += p.mass;
auto t7 = high_resolution_clock::now();
cout << "SoA mass sum: " << duration_cast<milliseconds>(t5-t4).count() << "ms\n";
cout << "AoS mass sum: " << duration_cast<milliseconds>(t7-t6).count() << "ms\n";
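// Use both sums so the optimizer cannot discard the timed loops as dead code
volatile float sink = totalSoA + totalAoS;
(void)sink;
return 0;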
}
Without reserve: 185ms
With reserve: 95ms ← 2x faster
SoA mass sum: 12ms
AoS mass sum: 35ms ← SoA 3x faster due to cache
Profiling: gprof, perf, callgrind
# ====== gprof (function-level timing) ======
g++ -O2 -pg prog.cpp -o prog # compile with -pg
./prog # run to generate gmon.out
gprof ./prog gmon.out | head -40 # top functions by time
# ====== perf (Linux — no recompile needed) ======
perf record -g ./prog # sample with call graph
perf report # interactive TUI report
perf stat ./prog # high-level counters (IPC, cache misses)
# ====== Valgrind callgrind (per-line counts) ======
valgrind --tool=callgrind ./prog
kcachegrind callgrind.out.* # GUI call graph viewer
# ====== Quick: time the whole program ======
time ./prog # wall clock time
/usr/bin/time -v ./prog # +memory, page faults
Best Practices Checklist
✅ Compile with -Wall -Wextra
Catches uninitialized variables, format mismatches, sign mismatches before runtime. Add -Werror in CI to force fixes.
✅ Use -fsanitize=address,undefined
Development build must include both sanitizers. Catches memory errors and undefined behavior at ~2x overhead — use in every CI pipeline.
✅ Profile before optimizing
Use gprof or perf to find the real bottleneck before writing a single optimization. Premature optimization is the root of much wasted effort.
✅ Use -O2 for production
-O2 typically gives a 5-10x speedup over -O0 for numeric code. It does not change the behavior of well-defined code, though it can expose latent undefined behavior. Reach for -O3 only after profiling shows arithmetic bottlenecks.
✅ reserve() before filling vectors
If you know the final size, call reserve(n) before any push_back. Eliminates O(log n) reallocations — typically 2x faster insertion.
✅ emplace_back over push_back
emplace_back constructs directly in the container — no temporary, no move. Prefer it over push_back for non-trivial types.
✅ Smart pointers over raw new
unique_ptr ties deallocation to scope, which removes most opportunities for leaks and eliminates the need for manual delete in destructors.
✅ Prefer std::vector over std::list
Vector's cache-friendly layout beats list's O(1) insertion in practice for almost all sizes. Use vector by default; benchmark before switching to list.
FAQ / Interview Questions
Valgrind needs no recompilation but runs ~10-20x slower; ASan requires recompiling (-fsanitize=address) but runs at only ~2x overhead, making it practical for CI pipelines. Use ASan for daily development; use Valgrind when you can't or don't want to recompile (third-party binaries) or need deeper analysis.
(1) GDB: compile with -g -O0, run gdb ./prog, execute run, and when it crashes type backtrace, which shows the exact call stack and line. (2) ASan: compile with -fsanitize=address -g and run. ASan reports the error type (null dereference, buffer overflow), the faulting line, and the allocation site. (3) Core dump: ulimit -c unlimited, run the program, then gdb ./prog core and backtrace.
Undefined behavior can be detected at runtime with -fsanitize=undefined. Common UB: signed integer overflow, null pointer dereference, out-of-bounds array access, uninitialized variable read.
Use std::move(x) when you no longer need x. Use emplace_back to construct directly in a container without a temporary. Always mark move constructors noexcept: std::vector only uses move during reallocation when the move constructor is noexcept.
Use -O2 for most production code: it's well-tested, safe, and typically provides 5-8x speedup over -O0. Use -O3 for compute-heavy numeric code (scientific computing, DSP, graphics pipelines) where aggressive vectorization and loop unrolling help. Be cautious with -O3 for code with complex control flow, because larger binary size can hurt the instruction cache. Always measure: compile both and benchmark; -O3 is not always faster than -O2 in practice.