Program Analysis | CS 6262 Network Security

01 //

Program Analysis Approaches

Program analysis techniques—originally from compiler research—have become essential for security. They help with testing, formal verification, and profiling. Two main dimensions to consider:

Source vs Binary

Source code: Easier to analyze; variable names and structure give meaning
Binary: Used when source is unavailable—malware, legacy apps, commercial software

Static vs Dynamic

Static: No execution; analyzes the code itself and can explore all possible paths
Dynamic: Runs the program and observes behavior; reflects what actually happens

02 //

Static Analysis

Static analysis examines code without running it. It can consider all possible execution paths, which makes it sound (no missed bugs) but often imprecise—it may report issues that cannot occur in practice (false positives).

Control-flow analysis: Which instructions can follow which; often represented as a Control Flow Graph (CFG)
Data-flow analysis: How values propagate through variables and memory
Control-dependence / data-dependence: Which operations depend on which branches or data

Control Flow Graph (CFG)

The program is split into basic blocks (single-entry, single-exit chunks of code). Directed edges show possible control flow. This graph is the foundation for many analysis algorithms.

03 //

Dynamic Analysis

Dynamic analysis runs the program and observes its behavior. Each run reveals only the paths actually taken, so you need many executions to get broad coverage. The upside: observations are concrete and precise—no false positives if the environment is correct.

What to observe: System calls, I/O, file access, memory allocations
Instrumentation: Insert extra code (hooks, probes) to log events or check invariants
Tools: Valgrind (memory errors), DynInst (runtime code patching)
Limitation: Only sees executed paths; hidden behavior (e.g., malware triggers) may never appear

04 //

Binary Analysis

Binary analysis works directly on compiled executables. It is critical when source code is unavailable—for legacy systems, commercial software, and especially malware. Typical goals:

Reverse engineering to understand behavior
Signature generation for detection
Identifying behavior patterns (e.g., malicious system-call sequences)

05 //

Can We Find Vulnerabilities?

Perfectly detecting all vulnerabilities is undecidable—it would require solving the halting problem. In practice we use approximations, which lead to false positives (reported bugs that aren’t real) and false negatives (missed bugs).

ProPolice (Stack Smash Protector)

Static heuristics and stack layout changes to detect buffer overflows at compile time. Effective but can produce false positives and miss some vulnerabilities.

StackGuard

Runtime protection: insert a canary value before the return address. If an overflow overwrites the return address, it also corrupts the canary, which is checked before returning—preventing exploitation even when the bug exists.

06 //

Host-based IDS

Host-based Intrusion Detection Systems (HIDS) monitor activity on a single machine. They use either misuse detection (signatures of known attacks) or anomaly detection (models of normal behavior; deviations trigger alerts). Program analysis helps build these models.

System-Call Models

HIDS often models sequences of system calls. Common representations:

Sliding window: Short sequences of N consecutive calls; simple but limited context
NFA (Nondeterministic Finite Automaton): State machine that accepts normal call sequences
PDA (Pushdown Automaton): Can model nested call patterns (e.g., open/close pairs)

Static vs dynamic models: Static analysis derives “all possible” valid sequences—no false positives, but some attacks may still match. Dynamic analysis learns from observed runs—more realistic but can miss attacks and may flag benign behavior as anomalous.

Mimicry attacks: An attacker can insert extra, harmless system calls so the sequence matches a learned model and evades detection.

07 //

Symbolic Execution

Symbolic execution runs a program with symbolic inputs (e.g., x0, y0) instead of concrete numbers. Variables hold expressions like x0 + y0. When execution hits a branch (e.g., if (x > 0)), the analysis records the condition and explores both paths. At the end, an SMT solver finds concrete input values that satisfy the path conditions—yielding test inputs that reach specific code.

Symbolic Store (σ_s)

Maps each variable to a symbolic expression. Example: x↦x0, y↦y0, z↦x0+y0. As execution proceeds, expressions grow (e.g., z↦x0+y0*2).

Path Constraint (pct)

Records conditions along the current path. At if (x > 0), one branch adds x0 > 0, the other adds x0 ≤ 0. To reach a target, we ask the SMT solver: “Find x0, y0 such that pct holds.” The solution gives concrete inputs that trigger that path.

Tools

KLEE, Java PathFinder, S2E, and CUTE combine symbolic execution with concrete runs. They explore many paths (like static analysis) but generate real test inputs (like dynamic analysis). Scaling is limited by the number of feasible paths.

08 //

Summary

Key Takeaways

Dynamic analysis: Observes real execution; no false positives if the environment is correct, but coverage is limited to paths that run
Static analysis: Can be sound (no missed bugs) but imprecise; false positives are common
Symbolic execution: Explores many paths and generates concrete inputs; limited by path explosion