Program analysis techniques—originally from compiler research—have become essential for security. They help with testing, formal verification, and profiling. Two main dimensions to consider:
Static analysis examines code without running it. It can consider all possible execution paths, which makes it sound (no missed bugs) but often imprecise—it may report issues that cannot occur in practice (false positives).
The program is split into basic blocks (single-entry, single-exit chunks of code). Directed edges show possible control flow. This graph is the foundation for many analysis algorithms.
Dynamic analysis runs the program and observes its behavior. Each run reveals only the paths actually taken, so you need many executions to get broad coverage. The upside: observations are concrete and precise—no false positives if the environment is correct.
Binary analysis works directly on compiled executables. It is critical when source code is unavailable—for legacy systems, commercial software, and especially malware. Typical goals:
Perfectly detecting all vulnerabilities is undecidable—it would require solving the halting problem. In practice we use approximations, which lead to false positives (reported bugs that aren’t real) and false negatives (missed bugs).
Static heuristics and stack layout changes to detect buffer overflows at compile time. Effective but can produce false positives and miss some vulnerabilities.
Runtime protection: insert a canary value before the return address. If an overflow overwrites the return address, it also corrupts the canary, which is checked before returning—preventing exploitation even when the bug exists.
Host-based Intrusion Detection Systems (HIDS) monitor activity on a single machine. They use either misuse detection (signatures of known attacks) or anomaly detection (models of normal behavior; deviations trigger alerts). Program analysis helps build these models.
HIDS often models sequences of system calls. Common representations:
Static vs dynamic models: Static analysis derives “all possible” valid sequences—no false positives, but some attacks may still match. Dynamic analysis learns from observed runs—more realistic but can miss attacks and may flag benign behavior as anomalous.
Mimicry attacks: An attacker can insert extra, harmless system calls so the sequence matches a learned model and evades detection.
Symbolic execution runs a
program with symbolic inputs (e.g., x0, y0) instead of
concrete numbers. Variables hold expressions like
x0 + y0. When execution hits a branch (e.g.,
if (x > 0)), the analysis records the condition and
explores both paths. At the end, an SMT solver finds
concrete input values that satisfy the path conditions—yielding test
inputs that reach specific code.
Maps each variable to a symbolic expression. Example:
x↦x0, y↦y0, z↦x0+y0. As
execution proceeds, expressions grow (e.g.,
z↦x0+y0*2).
Records conditions along the current path. At
if (x > 0), one branch adds x0 > 0, the
other adds x0 ≤ 0. To reach a target, we ask the SMT
solver: “Find x0, y0 such that pct holds.” The solution gives
concrete inputs that trigger that path.
KLEE, Java PathFinder, S2E, and CUTE combine symbolic execution with concrete runs. They explore many paths (like static analysis) but generate real test inputs (like dynamic analysis). Scaling is limited by the number of feasible paths.