What is debugging in C++?

Debugging in C++ is the systematic process of finding, analyzing, and fixing errors that prevent a program from running correctly. C++ bugs fall into five categories: (1) Compile-time errors (syntax errors, type mismatches) caught by the compiler. (2) Linker errors (undefined references) caught at link time. (3) Runtime errors (segfaults, null dereferences) that crash the program. (4) Logic errors (wrong output, no crash) that require careful testing. (5) Memory errors (leaks, buffer overflows, use-after-free) that require tools like Valgrind or AddressSanitizer.

What are the best C++ debugging tools in 2025?

The essential C++ debugging tools are: GDB (GNU Debugger) — free, powerful, command-line debugger for Linux/macOS. LLDB — Clang's debugger, similar commands to GDB, preferred on macOS. Visual Studio Debugger — best-in-class GUI debugger for Windows. Valgrind — detects memory leaks, use-after-free, uninitialized reads (Linux). AddressSanitizer (-fsanitize=address) — 2x overhead runtime memory error detector, faster than Valgrind. UndefinedBehaviorSanitizer (-fsanitize=undefined) — catches signed overflow, null dereference, out-of-bounds. Cppcheck — static analyzer, no runtime needed. CLion / VS Code — IDE-integrated debuggers with GUI.

How do you use GDB to debug a C++ program?

(1) Compile with debug symbols: g++ -g -O0 prog.cpp -o prog. (2) Launch: gdb ./prog. (3) Set a breakpoint: break main or break prog.cpp:15. (4) Run: run [args]. (5) Step one line: next (n). Step into functions: step (s). (6) Print a variable: print x or display x (prints after every step). (7) See the call stack: backtrace (bt). (8) Continue to next breakpoint: continue (c). (9) Inspect memory: x/10i $pc (next 10 instructions). (10) Quit: quit. The backtrace command is the most important for segfaults — it shows exactly which function call chain led to the crash.

How do you find memory leaks in C++?

Three approaches: (1) Valgrind: valgrind --leak-check=full --show-leak-kinds=all ./prog reports every leaked allocation with file and line (when built with -g), at roughly 10-20x slowdown. (2) AddressSanitizer: compile with g++ -fsanitize=address -g prog.cpp and run normally; its leak checker reports leaks at exit with roughly 2x slowdown, which makes it faster than Valgrind for daily development. (3) Smart pointers: use unique_ptr and shared_ptr instead of raw new/delete. Their destructors free the resource when the owner goes out of scope (RAII), which removes most leaks by construction. Reference cycles between shared_ptr objects can still leak and need weak_ptr to break them.

What is AddressSanitizer (ASan) and how do you use it?

AddressSanitizer is a fast runtime memory error detector built into GCC and Clang. It detects: heap buffer overflow, stack buffer overflow, use-after-free, use-after-return, use-after-scope, double-free, memory leaks. To use: g++ -fsanitize=address -fsanitize=undefined -g prog.cpp -o prog && ./prog. When an error occurs, ASan prints the exact error type, the stack trace, the allocation site, and the memory address involved. ~2x slower than normal but much faster than Valgrind. Use ASan for daily development and CI pipelines.

What causes a segmentation fault in C++ and how do I debug it?

A segfault (SIGSEGV) occurs when your program accesses a memory address it doesn't own. Common causes: dereferencing a null pointer, array out-of-bounds access, use-after-free, stack overflow from infinite recursion, writing to a read-only string literal. To debug: (1) Compile with -g and run under GDB — on crash, type backtrace to see the exact line. (2) Compile with -fsanitize=address — ASan prints the crash type, address, and full call stack. (3) Enable core dumps (ulimit -c unlimited) and load in GDB: gdb ./prog core.

What C++ compiler flags should I use for optimization?

GCC/Clang optimization flags: -O0 (no optimization; the default, with the fastest compile and the best debugging experience). -O1 (basic optimizations, no aggressive transforms). -O2 (recommended for production: enables most optimizations that do not trade much code size for speed, including inlining of small functions and loop optimizations; Clang and GCC 12+ also auto-vectorize at this level). -O3 (aggressive: more inlining and a more aggressive vectorization cost model; can increase binary size and is occasionally slower because of instruction cache pressure). -Os (optimize for binary size, useful for embedded). -march=native (optimize for the exact CPU running the build; not portable). Use -O2 for production, -O0 or -Og for debugging, and profile before reaching for -O3.

What is the difference between -O2 and -O3 in GCC?

-O2 enables: dead code elimination, constant propagation, common subexpression elimination, inlining of small functions, loop-invariant code motion, tail call optimization, and (in GCC 12 and later) conservative vectorization. -O3 adds on top of -O2: more aggressive inlining (larger functions), further loop transformations such as peeling and unswitching, and a more aggressive vectorization cost model. Branch hints such as __builtin_expect are written in source code and are not something -O3 switches on. When to use -O3: compute-heavy numeric code (scientific, DSP, graphics). When to avoid -O3: code with many branches or complex control flow where increased binary size hurts instruction cache. Profile both and compare.

How do I profile a C++ program to find bottlenecks?

Three profiling tools: (1) gprof: compile with g++ -pg prog.cpp, run ./prog to generate gmon.out, then gprof ./prog gmon.out | less to see which functions consume the most CPU time. (2) perf (Linux): perf record ./prog && perf report samples the CPU program counter and lists the hottest functions. No recompile is needed, though building with -g gives better symbol and line information; flame graphs require separate tooling on top of the perf data. (3) Valgrind's callgrind: valgrind --tool=callgrind ./prog && kcachegrind callgrind.out.* gives a detailed call graph with per-line counts. Rule: always profile BEFORE optimizing. Premature optimization wastes time and degrades readability. Profiling shows where most of the time is spent, and that is where optimization pays off.

How does std::vector reserve improve performance?

When std::vector runs out of capacity it allocates a larger block (typically 1.5x or 2x the old capacity, depending on the standard library) and moves or copies every existing element into it, an O(n) operation. Because capacity grows geometrically, pushing n elements into an empty vector triggers only O(log n) reallocations and O(n) total work (amortized O(1) per push_back), but each reallocation still costs an allocation plus a full pass over the elements. vector.reserve(n) pre-allocates capacity for n elements before insertion, eliminating all of those reallocations. Result: the same O(n) complexity with a smaller constant factor, and pointers and iterators into the vector stay valid while you fill it. Use reserve when you know the approximate final size: if reading n elements from input, call v.reserve(n) before the loop.

What is cache-friendly code in C++ and why does it matter?

Modern CPUs execute instructions far faster than main memory can deliver data: a load that misses every cache costs on the order of 200 cycles. They therefore work on data in cache (approximate latencies: L1 ~4 cycles, L2 ~12 cycles, L3 ~40 cycles, RAM ~200 cycles). Cache-friendly code accesses memory sequentially so the CPU's prefetcher loads upcoming data before it's needed. Arrays and std::vector store data contiguously, so iterating them is cache-friendly. std::list stores nodes in scattered heap locations, so every access risks a cache miss. Rule: prefer std::vector over std::list for sequential access even if insertions are O(n). The cache advantage usually makes vector faster in practice, often up to very large element counts; benchmark before switching to list.

How do you debug an infinite loop in C++?

To debug an infinite loop: (1) Run the program under GDB: gdb ./prog, then run, then press Ctrl+C to interrupt; GDB stops at the current line inside the loop. (2) Type backtrace to see the call stack. (3) Print the loop variables and the values the loop condition depends on, for example print i. (4) Use display variable to print it after every step. (5) Alternatively, add an iteration counter, increment it on its own line, and check it with assert(iteration < 10000000). When the counter exceeds the limit, the assertion prints the file and line and aborts; run under GDB (or load the core dump) to get a backtrace at that point. Keep the increment outside the assert, because assert is compiled out when NDEBUG is defined. (6) For release builds, add fprintf(stderr, "iteration %d\n", i) and watch stderr output.

What is const correctness in C++ and how does it help optimization?

Const correctness means marking variables, parameters, and member functions const whenever they should not modify data: const int x = 5; void f(const string& s); int size() const. Benefits: (1) Documents intent: callers know the function won't modify their data. (2) Helps the optimizer in limited cases: an object that is itself declared const cannot legally change, so the compiler may fold its value or keep it in a register. A const reference or pointer alone gives no such guarantee, because the underlying object may be modified through another path. (3) Lets functions accept temporaries and const objects by const reference without a copy. (4) Prevents accidental modification bugs. Best practice: use const by default and remove it only when modification is genuinely needed.

How do I fix a C++ program that crashes with 'corrupted heap' or 'double free'?

Heap corruption and double-free crashes are caused by: (1) Calling delete on the same pointer twice. (2) Writing beyond an allocated buffer (heap buffer overflow). (3) Using a pointer after free. Diagnosis: compile with -fsanitize=address -g and run — ASan reports the exact error, the double-free location, and the original allocation site with file and line numbers. Fix patterns: use unique_ptr (automatically prevents double-free), set raw pointers to nullptr after delete (delete nullptr is a safe no-op), use vector instead of manual arrays.

What are the most important C++ debugging tips for beginners?

Top 5 beginner debugging tips: (1) Always compile with -Wall -Wextra: many bugs show up as warnings first. (2) Add -fsanitize=address -fsanitize=undefined to your development build: it reports memory errors and undefined behavior at the point where they happen. (3) When you get a segfault, compile with -g, run under GDB, and type backtrace to see the exact line. (4) Use assert() to verify invariants: assert(ptr != nullptr), assert(i < n). A failed assertion stops the program with the file and line number instead of letting it continue with silently wrong behavior. (5) Binary search for the bug: add a print or assertion in the middle of the code; if it fires, the bug is before; if not, the bug is after. Narrow down the range.

How does inline function optimization work in C++?

Inlining replaces a function call with the function's actual code at the call site, eliminating function call overhead (stack frame setup, return address push/pop). The compiler inlines functions automatically at -O2 and above when the function is small. The inline keyword is a hint to the compiler, not a command — modern compilers often ignore it if the function is too large. Use constexpr for compile-time computed functions. Use __attribute__((always_inline)) (GCC) to force inlining. Avoid over-inlining: large functions inlined many times inflate the binary and hurt instruction cache performance.

What is structure of arrays (SoA) vs array of structures (AoS) in C++ optimization?

Array of Structures (AoS): struct Particle { float x,y,z,mass; }; Particle particles[N]; keeps all data for one particle together. Good for single-particle access, bad for processing one field across all particles (the mass values sit 16 bytes apart, so each cache line carries mostly unused data). Structure of Arrays (SoA): float x[N], y[N], z[N], mass[N]; keeps each field contiguous. Good for SIMD vectorization and processing one field across all particles, because the CPU loads 4-8 floats at a time. For physics simulations, graphics, and DSP that process millions of elements with the same operation, SoA can be 4-8x faster due to better SIMD utilization and cache efficiency.

Last updated: December 2025

C++ Debugging & Optimization: GDB, Valgrind, ASan, Move Semantics & Profiling (2025)

Q: How do move semantics improve C++ performance?

Move semantics (C++11) transfer ownership of resources instead of copying them. When you return a vector from a function or pass a temporary to push_back, the compiler can move (steal the pointer) instead of copying all elements. std::move(x) casts x to an rvalue reference, telling the compiler to use the move constructor or move assignment. After a move, x is in a valid but unspecified state. Performance impact: moving a 1M-element vector is O(1) (pointer swap) instead of O(n) (full copy). emplace_back constructs directly in the container, avoiding an extra copy.

By CoodeVerse Editorial Team · December 16, 2025 · 22 min read · GCC/Clang · C++11/17

Difficulty: Intermediate. Prerequisites: Classes & Objects, Constructors & Destructors

⚡ Quick Answer: Debugging & Optimization in C++

Debugging tools: GDB (breakpoints, backtrace), AddressSanitizer (-fsanitize=address), Valgrind (--leak-check=full)
Bug categories: compile-time · runtime · logic · memory · undefined behavior
Memory errors: ASan catches leaks, buffer overflows, use-after-free at ~2x overhead
Compiler optimization: -O0 (debug) · -O2 (production) · -O3 (aggressive numeric code)
Move semantics: std::move transfers O(1) instead of O(n) copy
Profile first: use gprof or perf before optimizing — 80% of time in 20% of code
Cache friendliness: prefer std::vector over std::list; SoA over AoS for SIMD

C++ gives you maximum control and maximum performance — but that comes with maximum responsibility for correctness. This guide covers the complete workflow: finding bugs with the right tools, measuring where time is spent, and applying the right optimization techniques. Zero theoretical fluff — every section has working code you can run.

🐛 Bug Types: 5 categories of C++ bugs
🔧 Debug Tools: GDB, ASan, Valgrind
🖥️ GDB Session: Real command walkthrough
🛡️ Sanitizers: ASan · UBSan · TSan
⚡ Compiler Flags: -O0 -O2 -O3 -march
🚀 Move Semantics: Benchmarked performance
🧠 Cache & Memory: reserve, SoA, alignment
📊 Profiling: gprof, perf, callgrind

🐛 Section 1

Bug Types — 5 Categories With Real Examples

Category	When detected	Example	Caught by
Compile-time errors	At compilation	Missing semicolons, type mismatches	Compiler (-Wall)
Linker errors	At linking	Undefined reference to symbol	Linker output
Runtime errors	During execution	Null dereference, division by zero	GDB, ASan
Logic errors	Wrong output	off-by-one, wrong operator	Tests, GDB print
Memory errors	Now or later	Leak, buffer overflow, use-after-free	ASan, Valgrind
Undefined behavior	Any time	Signed overflow, uninitialized read	UBSan, ASan

bug_examples.cpp: one of each bug type (C++)

#include <climits>   // INT_MAX
#include <iostream>
#include <vector>
using namespace std;

// Logic error: returns a*b instead of a+b
int add(int a, int b) { return a * b; }  // BUG: should be a+b

// Memory error: write beyond array bounds
void bufferOverflow() {
    int arr[5] = {};
    arr[10] = 42;  // BUG: out-of-bounds write (UB, caught by ASan)
}

// Memory leak: allocated but never freed
void memoryLeak() {
    int* p = new int(100);
    cout << *p << endl;
    // BUG: delete p; missing — caught by Valgrind / ASan
}

// Undefined behavior: signed integer overflow
void intOverflow() {
    int x = INT_MAX;
    x++;  // BUG: UB — caught by -fsanitize=undefined
    cout << x << endl;
}

int main() {
    cout << add(3, 5) << endl;  // prints 15 (not 8) — logic error
    return 0;
}

Output (wrong)

15 ← expected 8; logic error in add()

🔧 Section 2

Debugging Tools Comparison

Tool	Invocation	Detects	Overhead	Best for
GDB	gdb ./prog	Crashes, logic, call stack	Low	Step-through, variable inspection
LLDB	lldb ./prog	Same as GDB	Low	macOS / Clang users
AddressSanitizer	-fsanitize=address	Buffer overflows, UAF, leaks	~2x	Daily dev, CI pipelines
UBSanitizer	-fsanitize=undefined	Signed overflow, null deref, OOB	~1.1x	All development and CI builds
ThreadSanitizer	-fsanitize=thread	Data races, deadlocks	~5-15x	Multithreaded programs
Valgrind Memcheck	valgrind ./prog	Leaks, uninitialized reads	~10-20x	No-recompile leak hunting
Valgrind Callgrind	valgrind --tool=callgrind	Function call counts & time	~10-20x	Detailed profiling
Cppcheck	cppcheck prog.cpp	Static: null ptr, dead code, UB	None (static)	Pre-commit checks

Recommended development build command (warnings plus ASan and UBSan):
g++ -Wall -Wextra -Werror -g -O0 -fsanitize=address -fsanitize=undefined -std=c++17 prog.cpp -o prog
Add this to your Makefile's debug target. It catches compile-time issues, memory errors, and undefined behavior simultaneously.

🖥️ Section 3

GDB Walkthrough — Real Session With Commands

GDB Command Reference

Command	Short form	What it does
break main	b main	Set breakpoint at main()
break prog.cpp:25	b prog.cpp:25	Set breakpoint at line 25
run [args]	r	Start/restart program
next	n	Execute next line (step over functions)
step	s	Execute next line (step into functions)
continue	c	Run until next breakpoint
print x	p x	Print value of variable x
display x	disp x	Print x after every step
backtrace	bt	Show full call stack (KEY for segfaults)
info locals		Print all local variables
watch x		Break when x changes value
up / down		Navigate the call stack
list	l	Show source code around current line
quit	q	Exit GDB

segfault_demo.cpp: program with null dereference (C++)

#include <iostream>
using namespace std;

int* findValue(int* arr, int n, int target) {
    for (int i = 0; i < n; i++)
        if (arr[i] == target) return &arr[i];
    return nullptr;  // not found
}

int main() {
    int data[] = {1, 3, 5, 7};
    int* result = findValue(data, 4, 99);  // 99 not in array → nullptr
    cout << *result << endl;  // BUG: dereferencing nullptr → SEGFAULT
    return 0;
}

GDB session: finding the null dereference (Shell)

$ g++ -g -O0 segfault_demo.cpp -o demo
$ gdb ./demo
(gdb) run

Program received signal SIGSEGV, Segmentation fault.
0x0000000000401198 in main () at segfault_demo.cpp:13
13          cout << *result << endl;

(gdb) backtrace
#0  0x0000000000401198 in main () at segfault_demo.cpp:13

(gdb) print result
$1 = (int *) 0x0     ← NULL pointer! That's the bug.

(gdb) quit

Fix

if (result != nullptr) cout << *result; else cout << "Not found";

🛡️ Section 4

AddressSanitizer, UBSan & Valgrind

heap_overflow.cpp + ASan output (C++)

#include <iostream>
using namespace std;

int main() {
    int* arr = new int[5];
    arr[7] = 42;   // heap-buffer-overflow: valid range is arr[0..4]
    delete[] arr;
    return 0;
}

ASan Output (g++ -fsanitize=address -g)

==1234== ERROR: AddressSanitizer: heap-buffer-overflow
WRITE of size 4 at 0x... thread T0
    #0 0x401... in main heap_overflow.cpp:6

0x... is located 8 bytes to the right of 20-byte region
allocated by main at heap_overflow.cpp:5

ASan is your first line of defense. Compile time: add -fsanitize=address -g. Runtime: ~2x overhead. Reports the exact line of the bad access AND the line where the memory was allocated. Works on heap, stack, and global buffers. Zero false positives in practice.

ub_examples.cpp + UBSan output (C++)

#include <iostream>
#include <climits>
using namespace std;

int main() {
    // UB 1: signed integer overflow
    int x = INT_MAX;
    cout << x + 1 << endl;    // UB: signed overflow

    // UB 2: array out-of-bounds
    int arr[3] = {1,2,3};
    cout << arr[5] << endl;  // UB: index out of bounds

    return 0;
}

UBSan Output (g++ -fsanitize=undefined -g)

ub_examples.cpp:8: runtime error: signed integer overflow:
2147483647 + 1 cannot be represented in type 'int'

ub_examples.cpp:12: runtime error: index 5 out of bounds
for type 'int [3]'

leak.cpp + Valgrind output (C++)

#include <iostream>
using namespace std;

void allocateWithoutFree() {
    int* data = new int[100];
    data[0] = 42;
    // BUG: delete[] data; missing
}

int main() {
    allocateWithoutFree();
    return 0;
}

run with valgrind (Shell)

g++ -g leak.cpp -o leak
valgrind --leak-check=full ./leak

Valgrind Output

==5678== 400 bytes in 1 blocks are definitely lost
==5678==    at 0x...: operator new[] (vg_replace_malloc.c:...)
==5678==    by 0x...: allocateWithoutFree() (leak.cpp:5)
==5678==    by 0x...: main (leak.cpp:11)
==5678== LEAK SUMMARY: definitely lost: 400 bytes

Fix: Add delete[] data; before returning, or replace int* data = new int[100]; with std::vector<int> data(100); — which deallocates automatically via RAII.

⚡ Section 5

Compiler Optimization Flags (-O0 to -O3)

Flag	What it enables	Use when
-O0	No optimization. Compiles fastest, and the generated code maps closely onto source lines.	Debugging (pair with -g)
-Og	Minimal optimization that doesn't interfere with debugging.	Debug builds requiring some speed
-O1	Basic: dead code elimination, constant folding. Slightly faster compile than -O2.	Rarely needed
-O2	All -O1 plus: function inlining, loop-invariant code motion, tail calls, and (on Clang and GCC 12+) loop vectorization. Most common production choice.	Production builds
-O3	All -O2 plus: more aggressive inlining, loop unrolling, more vectorization passes. Can increase binary size.	Numeric/compute-intensive code
-Os	Optimize for code size. Disables opts that increase size.	Embedded, firmware
-march=native	Use all CPU instructions available on the build machine (SSE4, AVX2, etc.). Not portable.	Local benchmarks, HPC
-flto	Link-time optimization: optimize across translation units.	Large multi-file projects

measuring -O0 vs -O2 effect on tight loop (Shell)

# Compile both versions
g++ -O0 -std=c++17 bench.cpp -o bench_O0
g++ -O2 -std=c++17 bench.cpp -o bench_O2
g++ -O3 -march=native -std=c++17 bench.cpp -o bench_O3

# Time each
time ./bench_O0   # ~2800ms
time ./bench_O2   # ~350ms  (8x faster)
time ./bench_O3   # ~280ms  (slightly more on numeric code)

Golden rule: never profile or benchmark -O0 builds. The -O0 numbers are meaningless for production performance. Always measure with -O2 (or -O3 for numeric code). And avoid debugging -O2 builds when you can: the compiler's transformations reorder code and make variables show up as <optimized out> in GDB. Use -O0 or -Og for debugging.

🚀 Section 6

Move Semantics & emplace_back — Benchmarked

move_benchmark.cpp: copy vs move vs emplace (C++)

#include <iostream>
#include <vector>
#include <chrono>
#include <string>
using namespace std;
using namespace std::chrono;

int main() {
    const int N = 100000;

    // 1. push_back with copy — creates temp, then copies into vector
    auto t1 = high_resolution_clock::now();
    vector<string> v1;
    v1.reserve(N);
    for (int i=0; i<N; i++) {
        string s = "hello_world_" + to_string(i);
        v1.push_back(s);   // copy
    }
    auto t2 = high_resolution_clock::now();

    // 2. push_back with std::move — transfer instead of copy
    vector<string> v2;
    v2.reserve(N);
    for (int i=0; i<N; i++) {
        string s = "hello_world_" + to_string(i);
        v2.push_back(move(s));  // move: O(1) instead of O(len)
    }
    auto t3 = high_resolution_clock::now();

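    // 3. emplace_back: forwards its argument to string's constructor. The argument
    //    here is already a temporary string, so the element is move-constructed from it
    vector<string> v3;
    v3.reserve(N);
    for (int i=0; i<N; i++)
        v3.emplace_back("hello_world_" + to_string(i));  // moves the temporary into place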
    auto t4 = high_resolution_clock::now();

    cout << "push_back (copy):   "
         << duration_cast<microseconds>(t2-t1).count() << " µs\n";
    cout << "push_back (move):   "
         << duration_cast<microseconds>(t3-t2).count() << " µs\n";
    cout << "emplace_back:       "
         << duration_cast<microseconds>(t4-t3).count() << " µs\n";
    return 0;
}

Output (approximate, -O2)

push_back (copy):   4200 µs
push_back (move):   2100 µs  ← 2x faster
emplace_back:       1800 µs  ← 2.3x faster

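Move semantics rules: (1) Prefer emplace_back when you can pass constructor arguments directly: the element is constructed in the vector's storage with no temporary. Passing an already-built object to emplace_back is no cheaper than push_back. (2) Use std::move(x) when you no longer need x: it transfers ownership in O(1). (3) Always reserve(n) when you know the final size: it eliminates reallocations. (4) Mark move constructors noexcept: during reallocation std::vector moves elements only if the move constructor is noexcept (or the type cannot be copied); otherwise it falls back to copying.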

🧠 Section 7

Cache-Friendly Code: reserve, SoA, and Memory Layout

cache_demo.cpp: vector::reserve + SoA vs AoS (C++)

#include <iostream>
#include <vector>
#include <chrono>
#include <numeric>
using namespace std;
using namespace std::chrono;

const int N = 10000000;

// AoS (Array of Structures) — x,y,z,mass interleaved in memory
struct ParticleAoS { float x,y,z,mass; };

// SoA (Structure of Arrays) — each field contiguous
struct ParticlesSoA {
    vector<float> x,y,z,mass;
    ParticlesSoA(int n) { x.resize(n); y.resize(n); z.resize(n); mass.resize(n,1.0f); }
};

int main() {
    // 1. reserve() eliminates reallocations
    vector<int> slow, fast;
    // slow: no reserve — triggers O(log n) reallocations
    auto t1 = high_resolution_clock::now();
    for (int i=0; i<N; i++) slow.push_back(i);
    auto t2 = high_resolution_clock::now();
    fast.reserve(N);  // fast: pre-allocate — O(1) insertions
    for (int i=0; i<N; i++) fast.push_back(i);
    auto t3 = high_resolution_clock::now();
    cout << "Without reserve: " << duration_cast<milliseconds>(t2-t1).count() << "ms\n";
    cout << "With reserve:    " << duration_cast<milliseconds>(t3-t2).count() << "ms\n";

    // 2. SoA mass sum (cache-friendly: all masses contiguous)
    ParticlesSoA soa(N);
    auto t4 = high_resolution_clock::now();
    float totalSoA = accumulate(soa.mass.begin(), soa.mass.end(), 0.0f);
    auto t5 = high_resolution_clock::now();

    // 3. AoS mass sum (cache-unfriendly: mass scattered every 16 bytes)
    vector<ParticleAoS> aos(N, {0,0,0,1.0f});
    auto t6 = high_resolution_clock::now();
    float totalAoS = 0;
    for (auto& p : aos) totalAoS += p.mass;
    auto t7 = high_resolution_clock::now();

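    cout << "SoA mass sum: " << duration_cast<milliseconds>(t5-t4).count() << "ms\n";
    cout << "AoS mass sum: " << duration_cast<milliseconds>(t7-t6).count() << "ms\n";
    // Print the sums so the optimizer cannot discard the summation loops as dead code
    cout << "Checksums: " << totalSoA << " " << totalAoS << "\n";
    return 0;
}

Output (approximate, -O2, 10M elements)

Without reserve: 185ms
With reserve:    95ms   ← 2x faster
SoA mass sum: 12ms
AoS mass sum: 35ms    ← SoA 3x faster due to cache
Checksums: 1e+07 1e+07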

📊 Section 8

Profiling: gprof, perf, callgrind

profiling workflow (Shell)

# ====== gprof (function-level timing) ======
g++ -O2 -pg prog.cpp -o prog     # compile with -pg
./prog                             # run to generate gmon.out
gprof ./prog gmon.out | head -40   # top functions by time

# ====== perf (Linux — no recompile needed) ======
perf record -g ./prog              # sample with call graph
perf report                        # interactive TUI report
perf stat ./prog                   # high-level counters (IPC, cache misses)

# ====== Valgrind callgrind (per-line counts) ======
valgrind --tool=callgrind ./prog
kcachegrind callgrind.out.*        # GUI call graph viewer

# ====== Quick: time the whole program ======
time ./prog                        # wall clock time
/usr/bin/time -v ./prog            # +memory, page faults

Profile before you optimize. Typically 80% of execution time is spent in less than 20% of the code. Profiling shows you exactly which functions are hot. Optimizing cold code wastes time and hurts readability. The workflow: profile → identify the bottleneck → apply targeted optimization → profile again to verify improvement.

Best Practices Checklist

✅ Compile with -Wall -Wextra

Catches uninitialized variables, format mismatches, sign mismatches before runtime. Add -Werror in CI to force fixes.

✅ Use -fsanitize=address,undefined

Development build must include both sanitizers. Catches memory errors and undefined behavior at ~2x overhead — use in every CI pipeline.

✅ Profile before optimizing

Use gprof or perf to find the real bottleneck before writing a single optimization. Premature optimization is the root of much wasted effort.

✅ Use -O2 for production

-O2 is often several times faster than -O0 for numeric code (about 8x in the loop benchmark above) and is safe for code that is free of undefined behavior. Reach for -O3 only after profiling shows arithmetic bottlenecks.

✅ reserve() before filling vectors

If you know the final size, call reserve(n) before any push_back. Eliminates O(log n) reallocations — typically 2x faster insertion.

✅ emplace_back over push_back

emplace_back constructs the element directly in the container from constructor arguments, with no temporary and no move. Prefer it over push_back when you would otherwise build a temporary of a non-trivial type.

✅ Smart pointers over raw new

unique_ptr frees its object automatically when the owner goes out of scope, which rules out most leaks and removes the need for manual delete in destructors.

✅ Prefer std::vector over std::list

Vector's cache-friendly layout beats list's O(1) insertion in practice for almost all sizes. Use vector by default; benchmark before switching to list.

FAQ / Interview Questions

What is the difference between Valgrind and AddressSanitizer?

Valgrind works on any binary without recompilation — ~10-20x slowdown, detects leaks, uninitialized reads, and invalid memory access. AddressSanitizer requires recompilation (-fsanitize=address) but runs at only ~2x overhead, making it practical for CI pipelines. Use ASan for daily development; use Valgrind when you can't or don't want to recompile (third-party binaries) or need deeper analysis.

How do you debug a segmentation fault in C++?

Three approaches: (1) GDB: compile with -g -O0, run gdb ./prog, execute run, and when it crashes type backtrace to see the exact call stack and line. (2) ASan: compile with -fsanitize=address -g and run; ASan reports the error type (for example SEGV on a null dereference, or heap-buffer-overflow), the faulting line, and, for heap errors, the allocation site. (3) Core dump: ulimit -c unlimited, run the program, then gdb ./prog core and backtrace.

What is undefined behavior in C++ and why is it dangerous?

Undefined behavior means the C++ standard makes no guarantee — the program may crash, produce wrong output, appear correct in debug builds but fail in release, or behave differently between compilers. Compilers exploit UB for optimization: a loop that overflows a signed integer may have its bounds check removed because "signed overflow never happens." Detect UB with -fsanitize=undefined. Common UB: signed integer overflow, null pointer dereference, out-of-bounds array access, uninitialized variable read.

How do move semantics improve C++ performance?

Move semantics transfer ownership of heap resources (pointer + size) instead of copying the data. Moving a 1M-element vector is O(1) — swap three pointers; copying is O(n) — allocate new block and copy all elements. Use std::move(x) when you no longer need x. Use emplace_back to construct directly in a container without a temporary. Always mark move constructors noexcept — std::vector only uses move during reallocation when noexcept.

When should I use -O2 vs -O3?

Use -O2 for most production code — it's well-tested, safe, and typically provides 5-8x speedup over -O0. Use -O3 for compute-heavy numeric code (scientific computing, DSP, graphics pipelines) where aggressive vectorization and loop unrolling help. Be cautious with -O3 for code with complex control flow — larger binary size can hurt instruction cache. Always measure: compile both and benchmark; -O3 is not always faster than -O2 in practice.

Why is std::vector usually faster than std::list even for insertions?

Main memory is slow relative to the CPU: a load that misses every cache costs on the order of 200 cycles, versus about 4 for an L1 hit, so performance depends heavily on cache behavior. std::vector stores data contiguously, so the CPU prefetcher loads upcoming elements before they're needed, resulting in mostly L1/L2 cache hits. std::list stores nodes in scattered heap locations, so every traversal step is a dependent pointer load that is likely to miss the cache. In typical benchmarks vector outperforms list even when insertions and deletions are frequent, because finding the position dominates and the O(n) shift is a fast sequential memory operation. Measure with your own element size and access pattern before choosing list.


CoodeVerse Editorial Team

Senior systems engineers and performance specialists. All benchmarks run on GCC 13, -O2, Linux x86-64. Code tested with AddressSanitizer and UBSan.
