Malware Analysis | CS 6262 Network Security

01 //

Malware Categories

Malware is software designed to infiltrate or damage systems. Understanding how malware is classified and how attackers hide it helps defenders build better detection tools.

Viruses – Infect legitimate programs; replicate when the host runs
Worms – Spread over the network without user action
Trojans – Disguised as legitimate software; users install them willingly
Botnets – Networks of compromised machines controlled remotely
Spyware / adware – Harvest data or push unwanted content

Viruses

Attach to executables; spread by running infected files

Worms

Exploit network services to spread automatically

Trojans

Claim to be useful software; hide malicious payload

02 //

Traditional Detection

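Signature-based detection matches known byte sequences in a file against a database of malware signatures. The signatures come from manual analysis and reverse engineering of captured samples. Because they are purely syntactic, they are easily evaded: changing the bytes without changing the behavior breaks the match.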
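As a minimal sketch of byte-sequence matching, the Python below scans a buffer for known patterns. The signature names and byte values are made up for illustration; real scanners use far larger databases and faster multi-pattern matching.

```python
# Hypothetical signature database: name -> byte pattern (made-up values).
SIGNATURES = {
    "Demo.A": bytes.fromhex("deadbeef90"),
    "Demo.B": b"EVIL_PAYLOAD",
}

def scan(data: bytes) -> list[str]:
    """Return the names of all signatures whose bytes occur in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)

clean = b"\x00" * 16 + b"hello world"
infected = b"MZ" + b"\x90" * 8 + b"EVIL_PAYLOAD" + b"\x00" * 4
# One changed byte is enough to defeat an exact byte match.
mutated = infected.replace(b"EVIL_PAYLOAD", b"EVIL-PAYLOAD")

print(scan(clean))     # []
print(scan(infected))  # ['Demo.B']
print(scan(mutated))   # []
```

The last call shows why syntactic signatures are fragile: the mutated buffer differs by a single byte and is no longer matched.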

03 //

Evasion Methods

Malware authors use various techniques to evade detection. These fall into three broad categories depending on what the defender is doing.

Against signature detection: Polymorphism, metamorphism—change the binary so byte-pattern signatures no longer match
Against dynamic analysis: Anti-debugging, anti-VM, emulator detection—detect sandbox environments and refuse to run
Against static analysis: Anti-disassembly, packing, control-flow obfuscation—make code hard to inspect without executing it

Vs Signature

Polymorphism and metamorphism alter the binary so static byte signatures fail.

Vs Dynamic

Anti-debugging, anti-VM, and emulator detection—malware exits or sleeps if run in a lab.

Vs Static

Anti-disassembly, packing, and control-flow obfuscation hide the real code.

04 //

Polymorphic Code

Polymorphic malware encrypts its main body and uses a different encryption key for each copy. The decryptor (which must run in the clear) may have several variants or be obfuscated. Byte-sequence signatures on the body fail because the ciphertext changes every time.

Encrypted body – The real payload is encrypted; each instance uses a different key
Varying decryptor – A few decryptor variants, or obfuscation, make it harder to signature
Why signatures fail – The encrypted bytes change; the decryptor may be small and variable

Detection Approaches

Signature the decryptor – Works if the decryptor is not heavily obfuscated
Emulation – Run the decryptor in a safe emulator; once decrypted, the body can be scanned
Malware response – Many polymorphic samples use anti-emulation to refuse to run in sandboxes
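The encrypted-body idea and the emulate-then-scan response can be sketched in a few lines of Python. This is a toy model under stated assumptions: XOR stands in for the encryption, the keys are fixed here for reproducibility (a real sample would pick a fresh key per copy), and `emulate_decryptor` is a hypothetical stand-in for actually running the decryptor in an emulator.

```python
SIGNATURE = b"EVIL_PAYLOAD"                # made-up byte signature
BODY = b"\x90\x90" + SIGNATURE + b"\xc3"   # made-up "malicious" body

def xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def make_instance(body: bytes, key: bytes) -> dict:
    """One polymorphic copy: the key travels with the cleartext decryptor."""
    return {"key": key, "ciphertext": xor(body, key)}

def emulate_decryptor(instance: dict) -> bytes:
    """Stand-in for running the decryptor in an emulator (assumption)."""
    return xor(instance["ciphertext"], instance["key"])

copy_a = make_instance(BODY, b"\x13\x37\xc0\xde")
copy_b = make_instance(BODY, b"\xa5\x5a\x0f\xf0")

print(copy_a["ciphertext"] != copy_b["ciphertext"])  # True: bytes differ per copy
print(SIGNATURE in copy_a["ciphertext"])             # False: body signature fails
print(SIGNATURE in emulate_decryptor(copy_a))        # True: scan after decryption
```

The sketch leaves out the part that makes this hard in practice: a sample with anti-emulation checks would never reach the decrypted state inside the emulator.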

05 //

Metamorphic Code

Metamorphic malware avoids encryption entirely. Instead, it rewrites its own code so each instance looks different but behaves the same. The entire body is obfuscated through transformations like code reordering, garbage insertion, equivalent instruction replacement, jump insertion, and packing.

Polymorphic vs metamorphic: Polymorphic encrypts the body; metamorphic rewrites it in-place
Transformations: Register renaming, no-op insertion, instruction reordering, equivalent opcode substitution
Example: W32/Simile used register renaming, no-op insertion, and instruction reordering to generate thousands of unique variants

Why Detection Is Hard

Deciding whether two pieces of code are semantically equivalent is undecidable in general. Syntactic signatures fail because the surface form changes between instances. Semantics-based detection (e.g., behavior, data-flow) is more robust than pure syntax.
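To make the gap between syntax and semantics concrete, here is a small Python sketch over a toy assembly listing. It handles only two of the transformations named above (no-op insertion and register renaming) by normalizing them away. The instruction strings and the `normalize` helper are illustrative assumptions, not a real detector; instruction reordering or equivalent-opcode substitution would defeat it.

```python
import re

REGISTER = re.compile(r"\b(eax|ebx|ecx|edx)\b")  # small register subset

def normalize(instructions: list[str]) -> list[str]:
    """Drop no-ops and rename registers by order of first appearance."""
    mapping: dict[str, str] = {}
    normalized = []
    for instruction in instructions:
        if instruction.strip() == "nop":
            continue
        def canonical(match: re.Match) -> str:
            register = match.group(0)
            mapping.setdefault(register, f"r{len(mapping)}")
            return mapping[register]
        normalized.append(REGISTER.sub(canonical, instruction))
    return normalized

variant_a = ["mov eax, 5", "add eax, ebx", "push eax"]
variant_b = ["nop", "mov ecx, 5", "nop", "add ecx, edx", "push ecx"]

print(variant_a == variant_b)                        # False: surface forms differ
print(normalize(variant_a) == normalize(variant_b))  # True: same after normalizing
print(normalize(variant_b))  # ['mov r0, 5', 'add r0, r1', 'push r0']
```

Each new transformation needs its own normalization rule, which is one practical face of the undecidability result.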

06 //

Anti-Static Analysis

These techniques make static inspection (disassembly, decompilation) difficult or misleading.

Anti-disassembly: Mix code and data so the disassembler misinterprets bytes; use indirect jumps so control flow is unclear; exploit variable-length x86 instructions so alignment is ambiguous
Self-modifying code: Generate or decrypt new code at runtime—the real code is invisible until execution
Packing: Encapsulate the real payload in a compressed or encrypted form; unpack only when the program runs
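The anti-disassembly trick of mixing a junk byte into the code stream can be shown with a toy variable-length instruction set. Everything here is hypothetical (the opcodes are invented, not x86); the point is only that a disassembler reading bytes in order and one following control flow can disagree about what the program contains.

```python
# Toy variable-length instruction set (hypothetical, not x86):
# opcode -> (mnemonic, total length in bytes)
OPCODES = {
    0x01: ("nop", 1),
    0x02: ("push", 2),          # push imm8
    0x03: ("jmp", 2),           # jmp rel8, relative to the next instruction
    0x04: ("ret", 1),
    0x05: ("call_payload", 1),
}

def decode(code: bytes, pc: int):
    """Decode one instruction at pc; unknown bytes become 1-byte 'db' data."""
    name, size = OPCODES.get(code[pc], ("db", 1))
    operand = code[pc + 1] if size == 2 and pc + 1 < len(code) else None
    return name, operand, size

def linear_sweep(code: bytes):
    """Disassemble byte after byte from offset 0, ignoring control flow."""
    pc, listing = 0, []
    while pc < len(code):
        name, operand, size = decode(code, pc)
        listing.append((pc, name, operand))
        pc += size
    return listing

def follow_flow(code: bytes, entry: int = 0):
    """Disassemble along the path execution would actually take."""
    pc, listing, seen = entry, [], set()
    while 0 <= pc < len(code) and pc not in seen:
        seen.add(pc)
        name, operand, size = decode(code, pc)
        listing.append((pc, name, operand))
        if name == "ret":
            break
        pc += size
        if name == "jmp" and operand is not None:
            pc += operand
    return listing

# jmp +1 skips a junk byte (0x02) that a linear sweep decodes as "push",
# swallowing the real instruction that follows it.
code = bytes([0x03, 0x01, 0x02, 0x05, 0x04])
print(linear_sweep(code))  # [(0, 'jmp', 1), (2, 'push', 5), (4, 'ret', None)]
print(follow_flow(code))   # [(0, 'jmp', 1), (3, 'call_payload', None), (4, 'ret', None)]
```

The linear sweep never shows `call_payload`. Following the jump recovers it here, but an indirect jump whose target is computed at runtime would hide the path from this approach as well.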

07 //

Dynamic Analysis

Running malware in a sandbox (e.g., VM) defeats most anti-static techniques, because the real code must eventually execute and can then be observed. However, malware can detect analysis environments and refuse to run.

VM detection: Nopill, Vmdetect, Redpill check CPU flags, timing, or artifacts that betray a hypervisor (Red Pill, for example, reads the interrupt descriptor table address with the SIDT instruction)
Debugger detection: IsDebuggerPresent, timing checks—detect if a debugger is attached and exit or behave differently
System-call tracing: DLL hooking, kernel drivers, or VMM-level interception (e.g., CWSandbox, TTAnalyze) to log behavior

Tracing Approaches

Hook system calls via DLL injection, a kernel driver, or the virtual machine monitor. Tools like CWSandbox and TTAnalyze capture API calls and file/network activity.
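The hooking idea can be illustrated in Python by wrapping functions so that every call is logged before being forwarded. This is only an analogy for DLL-level hooking: `FakeApi`, `install_hooks`, and `sample` are hypothetical names invented for the sketch, and no real sandbox API is being modeled.

```python
class FakeApi:
    """Hypothetical stand-in for an OS API surface; not a real library."""
    def create_file(self, path):
        return f"handle:{path}"

    def connect(self, host, port):
        return True

def install_hooks(api, names, log):
    """Replace each named function with a wrapper that logs, then forwards."""
    for name in names:
        original = getattr(api, name)
        # Defaults bind the current name/original to this wrapper.
        def hooked(*args, _original=original, _name=name):
            log.append((_name, args))
            return _original(*args)
        setattr(api, name, hooked)

def sample(api):
    """Stand-in for the program under analysis."""
    api.create_file("C:\\temp\\dropper.tmp")
    api.connect("203.0.113.5", 6667)

api, log = FakeApi(), []
install_hooks(api, ["create_file", "connect"], log)
sample(api)
for entry in log:
    print(entry)
```

The wrapper sits inside the same process as the sample, so the sample could in principle inspect the function and notice the replacement. Kernel-driver and VMM-level interception move the hook out of the sample's reach.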

Evasion

Malware detects tracer/VM and exits or sleeps. Defenders must hide the analysis environment. Rootkits can also detect kernel drivers used for tracing.

08 //

Unpackers

Packed malware hides its real code until runtime. Unpackers automatically reveal the hidden code so it can be analyzed statically or used for signatures.

PolyUnpack: Builds a static model of the packed executable's code as it appears on disk; the sample is then single-stepped while following the instruction pointer (EIP), and when execution diverges from the model (code runs that was not in it), that code is considered unpacked
Renovo / OmniUnpack: Detect when code that was written to memory is then executed; that code is the unpacked payload. Renovo tracks writes at fine granularity (per instruction); OmniUnpack tracks them coarsely (per page, checking at system calls)
Common packers: UPX, Armadillo, commercial protectors—often add anti-debug, anti-trace, and obfuscation

PolyUnpack

Static model of the on-disk code; execution outside the model = unpacked region. Single-step and track EIP.
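A minimal sketch of the model-divergence check, assuming the static model is a map from address to instruction text and the single-step trace is a list of (EIP, instruction) pairs. Both representations, the addresses, and the `find_unpacked` helper are simplifications chosen for illustration.

```python
def find_unpacked(static_model: dict, trace: list):
    """Return (step, eip, instruction) for the first executed instruction
    that the static model does not contain, or None if none is found."""
    for step, (eip, instruction) in enumerate(trace):
        if static_model.get(eip) != instruction:
            return step, eip, instruction
    return None

# What static disassembly of the packed file shows (made-up listing).
static_model = {
    0x401000: "mov esi, packed_data",
    0x401005: "call decrypt_loop",
    0x40100A: "jmp 0x402000",
}

# What single-stepping observes (made-up trace, loop body omitted).
trace = [
    (0x401000, "mov esi, packed_data"),
    (0x401005, "call decrypt_loop"),
    (0x40100A, "jmp 0x402000"),
    (0x402000, "push ebp"),  # code that only exists after unpacking
]

hit = find_unpacked(static_model, trace)
print(hit[0], hex(hit[1]), hit[2])  # 3 0x402000 push ebp
```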

Renovo / OmniUnpack

Heuristic: when freshly written memory is executed, it's unpacked code. Renovo is fine-grained (instruction); OmniUnpack is coarse (page/syscall).
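The write-then-execute heuristic can be sketched over a recorded list of memory events. The event format, the addresses, and the `written_then_executed` helper are assumptions for illustration; the two granularities mirror the fine-grained and coarse variants described above.

```python
PAGE_SIZE = 0x1000

def written_then_executed(events, granularity="instruction"):
    """Return executed addresses whose location was written to earlier.

    events: list of ("write" | "exec", address) pairs in execution order.
    granularity: "instruction" tracks exact addresses, "page" tracks pages.
    """
    def unit(address):
        return address if granularity == "instruction" else address // PAGE_SIZE

    written, hits = set(), []
    for kind, address in events:
        if kind == "write":
            written.add(unit(address))
        elif kind == "exec" and unit(address) in written:
            hits.append(address)
    return hits

events = [
    ("exec", 0x401000),   # unpacking stub runs
    ("write", 0x402010),  # stub writes decoded bytes
    ("write", 0x402011),
    ("exec", 0x401005),
    ("exec", 0x402010),   # freshly written bytes are executed
    ("exec", 0x402400),   # same page, but this address was never written
]

print([hex(a) for a in written_then_executed(events)])          # ['0x402010']
print([hex(a) for a in written_then_executed(events, "page")])  # ['0x402010', '0x402400']
```

Page-level tracking is cheaper but over-approximates: it flags any execution on a dirty page, including addresses that were never rewritten.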

Packers

UPX, Armadillo, commercial protectors. Often include anti-debug, anti-trace, and obfuscation.

09 //

Trigger-based Behavior

Malware often hides malicious behavior behind a trigger—a condition that may not be met during a short sandbox run. A single execution path misses the hidden code.

Timebombs: Act only after a date or delay; sandbox may not run long enough
Logic bombs: Variable-based conditions (e.g., "run only if file X exists")
Bot commands: Malicious behavior triggered by remote commands; no command = benign behavior
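Conditional obfuscation (Sharif et al.): The key or trigger value is not in the code; e.g., if (Hash(cmd)==H), where only the hash H is stored and the hidden code is encrypted with a key derived from cmd. Finding the right cmd requires inverting the hash, which is hard for symbolic execution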
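A sketch in the spirit of the hash-guarded trigger, assuming SHA-256 for the check, a second hash of the command as the decryption key, and XOR as the cipher. The command string, the placeholder payload, and the helper names are invented for illustration and are not taken from the cited work.

```python
import hashlib

def xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def derive_key(cmd: bytes) -> bytes:
    # A second, different hash of cmd, so the key cannot be computed from H.
    return hashlib.sha256(b"key:" + cmd).digest()

# "Build time": done once by the author; the secret command is not shipped.
_secret = b"activate-42"
_plaintext = b"<hidden behavior would go here>"
H = hashlib.sha256(_secret).hexdigest()
ENCRYPTED = xor(_plaintext, derive_key(_secret))

def handle(cmd: bytes):
    """Shipped logic: only H and ENCRYPTED are present in the sample."""
    if hashlib.sha256(cmd).hexdigest() != H:
        return None                        # wrong command: nothing revealed
    return xor(ENCRYPTED, derive_key(cmd))

print(handle(b"hello"))        # None
print(handle(b"activate-42"))  # b'<hidden behavior would go here>'
```

An analyst who has only `H` and `ENCRYPTED` must find a preimage of `H` to reach the guarded branch, and forcing the branch without the right `cmd` yields a wrong key and garbage output.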

Multipath Exploration

Multipath exploration systems such as Moser et al.'s and BitScope use an emulator, taint tracking, and path constraints. They save and reload execution state and use an SMT solver to find inputs that drive execution down alternative paths. Hash functions are non-linear and designed to be hard to invert, so symbolic execution struggles to discover inputs that satisfy conditions like Hash(input)==target. Input discovery often requires heuristics or concrete execution.
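The exploration loop can be sketched in a much simplified form. The Python below swaps in forced branch outcomes for constraint solving and re-runs the program from the start instead of saving and reloading state, so it illustrates only the path-enumeration idea. The `sample` program, its branch labels, and the `explore` helper are hypothetical.

```python
def sample(branch):
    """Stand-in program; branch(label) models a check on tainted input."""
    behaviors = ["check_for_updates"]
    if branch("date >= trigger_date"):
        behaviors.append("timebomb_payload")
    if branch("file X exists"):
        behaviors.append("logicbomb_payload")
    return behaviors

def explore(program):
    """Run program once per combination of branch outcomes it can reach."""
    results = {}
    pending = [()]  # prefixes of forced branch outcomes still to try
    while pending:
        forced = pending.pop()
        outcomes = []

        def branch(label):  # label is for readability only
            index = len(outcomes)
            if index < len(forced):
                outcome = forced[index]
            else:
                outcome = False
                # Schedule the untaken side of this new branch for later.
                pending.append(tuple(outcomes) + (True,))
            outcomes.append(outcome)
            return outcome

        behaviors = program(branch)
        results[tuple(outcomes)] = behaviors
    return results

for path, behaviors in sorted(explore(sample).items()):
    print(path, behaviors)
```

A single default run corresponds to the `(False, False)` path and shows only the benign behavior; the other three paths expose the triggered payloads. A real system must also produce inputs that make each forced outcome consistent, which is where hash-guarded conditions become an obstacle.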

10 //

Summary

Key Takeaways

Polymorphic: Encrypted body, varying keys; decryptor may be obfuscated; signatures fail on the body
Metamorphic: Full code rewrite; semantics preserved; equivalence undecidable; syntax-based signatures fail
Unpackers: PolyUnpack (static model divergence), Renovo/OmniUnpack (write-then-execute heuristic)
Behavioral detection: Runtime templates and system-call monitoring; evaded by trigger-based and conditional behavior