Program Analysis

Topics Static · Dynamic · Binary Analysis · Symbolic Execution · HIDS
01 //

Program Analysis Approaches

Program analysis techniques—originally from compiler research—have become essential for security. They help with testing, formal verification, and profiling. Two main dimensions to consider:

Source vs Binary
  • Source code: Easier to analyze; variable names and structure give meaning
  • Binary: Used when source is unavailable—malware, legacy apps, commercial software
Static vs Dynamic
  • Static: No execution; analyzes the code itself and can explore all possible paths
  • Dynamic: Runs the program and observes behavior; reflects what actually happens
02 //

Static Analysis

Static analysis examines code without running it. It can consider all possible execution paths, which makes it sound (no missed bugs) but often imprecise—it may report issues that cannot occur in practice (false positives).

Control Flow Graph (CFG)

The program is split into basic blocks (single-entry, single-exit chunks of code). Directed edges show possible control flow. This graph is the foundation for many analysis algorithms.

03 //

Dynamic Analysis

Dynamic analysis runs the program and observes its behavior. Each run reveals only the paths actually taken, so you need many executions to get broad coverage. The upside: observations are concrete and precise—no false positives if the environment is correct.

04 //

Binary Analysis

Binary analysis works directly on compiled executables. It is critical when source code is unavailable—for legacy systems, commercial software, and especially malware. Typical goals:

05 //

Can We Find Vulnerabilities?

Perfectly detecting all vulnerabilities is undecidable—it would require solving the halting problem. In practice we use approximations, which lead to false positives (reported bugs that aren’t real) and false negatives (missed bugs).

ProPolice (Stack Smash Protector)

Static heuristics and stack layout changes to detect buffer overflows at compile time. Effective but can produce false positives and miss some vulnerabilities.

StackGuard

Runtime protection: insert a canary value before the return address. If an overflow overwrites the return address, it also corrupts the canary, which is checked before returning—preventing exploitation even when the bug exists.

06 //

Host-based IDS

Host-based Intrusion Detection Systems (HIDS) monitor activity on a single machine. They use either misuse detection (signatures of known attacks) or anomaly detection (models of normal behavior; deviations trigger alerts). Program analysis helps build these models.

System-Call Models

HIDS often models sequences of system calls. Common representations:

  • Sliding window: Short sequences of N consecutive calls; simple but limited context
  • NFA (Nondeterministic Finite Automaton): State machine that accepts normal call sequences
  • PDA (Pushdown Automaton): Can model nested call patterns (e.g., open/close pairs)

Static vs dynamic models: Static analysis derives “all possible” valid sequences—no false positives, but some attacks may still match. Dynamic analysis learns from observed runs—more realistic but can miss attacks and may flag benign behavior as anomalous.

Mimicry attacks: An attacker can insert extra, harmless system calls so the sequence matches a learned model and evades detection.

07 //

Symbolic Execution

Symbolic execution runs a program with symbolic inputs (e.g., x0, y0) instead of concrete numbers. Variables hold expressions like x0 + y0. When execution hits a branch (e.g., if (x > 0)), the analysis records the condition and explores both paths. At the end, an SMT solver finds concrete input values that satisfy the path conditions—yielding test inputs that reach specific code.

Symbolic Store (σs)

Maps each variable to a symbolic expression. Example: x↦x0, y↦y0, z↦x0+y0. As execution proceeds, expressions grow (e.g., z↦x0+y0*2).

Path Constraint (pct)

Records conditions along the current path. At if (x > 0), one branch adds x0 > 0, the other adds x0 ≤ 0. To reach a target, we ask the SMT solver: “Find x0, y0 such that pct holds.” The solution gives concrete inputs that trigger that path.

Tools

KLEE, Java PathFinder, S2E, and CUTE combine symbolic execution with concrete runs. They explore many paths (like static analysis) but generate real test inputs (like dynamic analysis). Scaling is limited by the number of feasible paths.

08 //

Summary

Key Takeaways
  • Dynamic analysis: Observes real execution; no false positives if the environment is correct, but coverage is limited to paths that run
  • Static analysis: Can be sound (no missed bugs) but imprecise; false positives are common
  • Symbolic execution: Explores many paths and generates concrete inputs; limited by path explosion