Even legitimate sites can serve malware. Example (USAToday.com): a compromised ad network served malicious JavaScript; users were redirected to rogue AV sites and tricked into downloading malware.
Researchers monitored Alexa top 25,000 domains (of 252M). Browser in VM visited each; traffic analyzed for drive-by downloads.
New features → new exploits (e.g. PDF with embedded malicious Flash exploiting Acrobat Reader). Legitimate sites host drive-bys. Vulnerabilities reported months prior may remain unpatched. Social engineering: geo-location, temporally relevant events (e.g. "July 4th fireworks") make attacks compelling.
Defense-in-depth exists, but malware evolves to evade. Each layer has weaknesses.
| Defense | Evasion |
|---|---|
| Firewall | C&C traffic looks like normal web traffic |
| IPS/IDS | Custom encodings, encryption evade payload analysis |
| User Access Control | Users often consent without understanding |
| Antivirus | Signature matching fails against obfuscation; needs heuristics |
Parts or all of an executable are compressed, encrypted, or transformed. Unpack code is included; reverses transformation at runtime. Transformed code looks like random data; signature scanners ineffective.
Each packed instance looks different because the decryption key is chosen randomly each time the malware is packed.
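The packing idea above can be sketched in Python. This is a toy that assumes a single repeating-key XOR as the transform; `STUB`, `SIGNATURE`, and all helper names are hypothetical, and real packers use far stronger and more varied transforms.

```python
import hashlib
import os

KEY_LEN = 16
STUB = b"<unpack stub>"        # hypothetical placeholder for the decryptor code
SIGNATURE = b"EVIL-SIGNATURE"  # hypothetical byte pattern an AV scanner looks for

def xor(data: bytes, key: bytes) -> bytes:
    """XOR with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def pack(payload: bytes) -> bytes:
    """Toy packer: stub + fresh random key + transformed payload."""
    key = os.urandom(KEY_LEN)  # new random key for every packed instance
    return STUB + key + xor(payload, key)

def unpack(packed: bytes) -> bytes:
    """What the stub does at run time: recover the key, reverse the XOR."""
    body = packed[len(STUB):]
    key, encrypted = body[:KEY_LEN], body[KEY_LEN:]
    return xor(encrypted, key)

def scan(sample: bytes) -> bool:
    """Naive signature scanner: exact byte-pattern match."""
    return SIGNATURE in sample

payload = b"...code..." + SIGNATURE + b"...more code..."
a, b = pack(payload), pack(payload)
print(scan(payload))  # True: the unpacked payload carries the signature
print(scan(a))        # the packed copy looks like random data to the scanner
print(hashlib.sha256(a).hexdigest(), hashlib.sha256(b).hexdigest())  # two instances, two hashes
print(unpack(a) == payload)  # True: the stub restores the original at run time
```

Because the key is random, two packed copies of the same payload differ byte for byte, and the signature only reappears after `unpack` runs. The stub itself stays constant in this toy, which is the part a scanner could still match; polymorphic packers typically vary it as well.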
Attacker server continuously sends updated obfuscated malware to compromised hosts. disc.exe (new): 28% AV detection. postcard.exe (older): 90% detection. By the time researchers analyze one variant, it's obsolete.
Figure: AV detection of ~200K samples over 6 months, split into samples detected on the first day, detected with a delay (~54 days on average), and still undetected after 6 months.
| Technique | Hides From |
|---|---|
| Rootkits | Users |
| Mapping honeypots / security sites | Security mechanisms |
| Nonce-based encryption | Researchers (harder cryptanalysis) |
Goals: network/host detection and blocking, forensics and remediation, threat and trend analysis. Malware authors make analysis challenging; automation is essential.
1. Fully automated · 2. Static properties · 3. Interactive behavior · 4. Manual code reversing
1. Manual reversing · 2. Interactive · 3. Static · 4. Fully automated
Robust analyzers (in-memory hooks, CPU emulation) are invasive. Malware can detect them and refuse to run or alter behavior. Observer affects the observed. Dynamic analyzer detection is a standard malware feature.
In-guest analyzers (e.g. in-memory hooks): no higher privilege than the malware; side effects discoverable; exception handling issues.
Emulation-based analyzers (e.g. QEMU): instruction semantics differ from real hardware. Example: x86 instructions are at most 15 bytes; a 16-byte instruction raises an exception on bare metal, while QEMU executes it silently.
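The 16-byte-instruction check can be modeled as follows. This is only a model of the detection logic, not real execution: actually running the bytes needs native code. `bare_metal` and `lax_emulator` are hypothetical stand-ins for hardware that enforces the 15-byte limit and an emulator that does not; the probe pads a one-byte NOP (`0x90`) with redundant operand-size prefixes (`0x66`).

```python
MAX_X86_INSN_LEN = 15  # architectural limit on x86 instruction length

# 15 redundant operand-size prefixes + a one-byte NOP opcode = 16 bytes.
PROBE = bytes([0x66] * 15 + [0x90])

class Fault(Exception):
    """Stand-in for the exception real hardware raises on the probe."""

def bare_metal(insn: bytes) -> None:
    # Hypothetical model of hardware: over-long instructions fault.
    if len(insn) > MAX_X86_INSN_LEN:
        raise Fault("instruction exceeds 15 bytes")

def lax_emulator(insn: bytes) -> None:
    # Hypothetical model of an emulator that decodes the prefixes
    # but never enforces the length limit, so the probe "runs".
    pass

def looks_emulated(execute) -> bool:
    """Malware-side check: a silent run suggests an imperfect emulator."""
    try:
        execute(PROBE)
    except Fault:
        return False  # behaved like real hardware
    return True

print(len(PROBE))                   # 16
print(looks_emulated(bare_metal))   # False
print(looks_emulated(lax_emulator)) # True
```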
Determining whether two Turing machines have equal languages is undecidable. Cannot guarantee emulator matches hardware in all cases. Identical notion of time (network timing, covert channels) is also undecidable.
Ether fulfills transparency via hardware virtualization (Intel VT). Hypervisor has higher privilege than the guest kernel; analyzer runs outside the guest; minimal side effects (trap flag, syscall handling); time via RDTSC, which can be made privileged so it traps to the hypervisor, letting the analyzer control (and falsify) the time the guest sees.
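Why controlling RDTSC matters can be shown with a small simulation of a malware timing check. All tick counts, the threshold, and the `GuestClock` class are hypothetical; the sketch assumes the analyzer simply subtracts its own overhead from the counter the guest reads, which is one possible policy, not necessarily Ether's.

```python
NATIVE_TICKS = 1_000        # hypothetical cost of the timed code on bare metal
ANALYSIS_OVERHEAD = 50_000  # hypothetical extra ticks added by fine-grained tracing
THRESHOLD = 10_000          # hypothetical cutoff used by the malware

class GuestClock:
    """Models the guest's view of the time-stamp counter."""

    def __init__(self, hide_overhead: bool):
        self.real = 0    # ticks actually elapsed, including analysis work
        self.hidden = 0  # ticks the analyzer chooses not to show the guest
        self.hide_overhead = hide_overhead

    def run_traced(self, native_ticks: int) -> None:
        self.real += native_ticks + ANALYSIS_OVERHEAD
        if self.hide_overhead:
            self.hidden += ANALYSIS_OVERHEAD

    def rdtsc(self) -> int:
        # With RDTSC trapped, the hypervisor decides what value the guest sees.
        return self.real - self.hidden

def malware_timing_check(clock: GuestClock) -> bool:
    """True if the sample concludes it is being analyzed."""
    start = clock.rdtsc()
    clock.run_traced(NATIVE_TICKS)
    return clock.rdtsc() - start > THRESHOLD

print(malware_timing_check(GuestClock(hide_overhead=False)))  # True
print(malware_timing_check(GuestClock(hide_overhead=True)))   # False
```

Without control of the time source the overhead is visible (51,000 ticks against a 10,000 cutoff); once the analyzer answers RDTSC itself, the guest sees only the native 1,000 ticks.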
Extends Xen: an Ether hypervisor component plus a userspace component in Dom0. Malware runs in DomU (Windows guest). VM Exits (hardware traps) provide visibility, allowing instruction-by-instruction or system-call-by-system-call examination.
Extracts hidden code from obfuscated malware. Outperforms PolyUnpack, Renovo on packers like Armadillo, Obsidium, Themida, ThemidaVM.
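The notes do not say how EtherUnpack locates hidden code. A common heuristic, assumed here, is write-then-execute tracking: flag any instruction fetched from memory the sample itself wrote during execution, since that code was not present in the file on disk. With instruction-level visibility through VM Exits, a hypervisor can observe both events; the class below is a hypothetical model of that bookkeeping, not Ether's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UnpackMonitor:
    """Toy write-then-execute tracker (hypothetical, for illustration)."""
    written: set = field(default_factory=set)             # addresses the sample wrote
    unpacked_entries: list = field(default_factory=list)  # executions of written code

    def on_write(self, addr: int) -> None:
        self.written.add(addr)

    def on_execute(self, addr: int) -> None:
        # Running bytes the program wrote itself means it is executing
        # dynamically generated (e.g. unpacked) code; a real tool would
        # dump memory at this point.
        if addr in self.written:
            self.unpacked_entries.append(addr)
            self.written.discard(addr)  # report once; a rewrite re-arms it

mon = UnpackMonitor()
for a in range(0x401000, 0x401010):  # stub decrypts 16 bytes in place
    mon.on_write(a)
mon.on_execute(0x400500)  # stub code from the original image: not flagged
mon.on_execute(0x401000)  # jump into the freshly written region: flagged
print([hex(a) for a in mon.unpacked_entries])  # ['0x401000']
```

Discarding the address after reporting keeps the output to one entry per unpacking layer, while still catching code that is overwritten and executed again.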
Records system calls. Norman Sandbox, Anubis often fail on packed samples; EtherTrace succeeds across packing tools.
Malware is transformed into bytecode (language L) + emulator (runs on x86). Obfuscated binary = emulator + bytecode. Commercial tools: VMProtect, Code Virtualizer.
Emulator maintains VPC (Virtual Program Counter) pointing to next bytecode. Fetches bytecode → decodes opcode/operands → dispatches to execute routine (x86). Language L can be randomly generated.
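The fetch-decode-dispatch loop can be sketched with a toy stack machine. The operations, the `random_language` helper, and the one-byte encoding are all hypothetical; the emulator is written in Python here, whereas in obfuscated malware it is x86 code. Each call to `random_language` plays the role of a freshly generated L.

```python
import random

OPS = ["PUSH", "ADD", "MUL", "HALT"]  # semantics of the hypothetical language L

def random_language(seed: int) -> dict:
    """Give each operation a random one-byte opcode: a fresh L per build."""
    rng = random.Random(seed)
    return dict(zip(OPS, rng.sample(range(256), len(OPS))))

def compile_program(lang: dict, program: list) -> bytes:
    """Encode (op, *operands) tuples as bytecode in language `lang`."""
    out = bytearray()
    for op, *operands in program:
        out.append(lang[op])
        out.extend(operands)  # PUSH carries a one-byte immediate
    return bytes(out)

def emulate(lang: dict, bytecode: bytes) -> int:
    """Fetch-decode-dispatch loop driven by a virtual program counter."""
    decode = {code: op for op, code in lang.items()}
    stack, vpc = [], 0
    while True:
        op = decode[bytecode[vpc]]  # fetch the byte at VPC and decode it
        vpc += 1
        if op == "PUSH":            # dispatch to the matching execute routine
            stack.append(bytecode[vpc])
            vpc += 1
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "MUL":
            stack.append(stack.pop() * stack.pop())
        elif op == "HALT":
            return stack.pop()

PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("HALT",)]
lang1, lang2 = random_language(1), random_language(2)
bc1, bc2 = compile_program(lang1, PROGRAM), compile_program(lang2, PROGRAM)
print(emulate(lang1, bc1), emulate(lang2, bc2))  # 20 20
print(bc1.hex(), bc2.hex())  # same program, different encodings
```

Both builds compute (2 + 3) * 4, but a static analyzer sees only opaque data in `bc1` and `bc2`; the program's logic is recoverable only by understanding the emulator.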
| Approach | Impact |
|---|---|
| Static (whitebox) | Bytecode is data; only emulator analyzable |
| Greybox | Analysis on emulator, not malware; paths explored in emulator, not actual code |
| Manual RE | Doesn't scale; each instance can have new L and emulator |
Challenges: Bytecode location unknown; emulator (decode/dispatch/execute) unknown; VPC may be in correlated variables. Approach: Abstract variable binding → identify candidate VPCs → identify fetch-decode-execute loop → extract bytecode syntax/semantics → build CFG. Automated; no prior knowledge of L.
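The "identify candidate VPCs" step can be illustrated on a heavily simplified trace. This sketch assumes variables are already bound to names (skipping abstract variable binding), that the dispatch-loop iterations are already delimited, and that the VPC lives in a single variable rather than correlated ones; the trace format and register names are hypothetical.

```python
def candidate_vpcs(iterations: list) -> list:
    """
    Each iteration records:
      'vars':  values of candidate variables when the iteration starts
      'fetch': the address the loop read its opcode from
    Keep only variables whose value equals the fetch address every time.
    """
    names = set(iterations[0]["vars"])
    for it in iterations:
        names = {n for n in names if it["vars"].get(n) == it["fetch"]}
    return sorted(names)

trace = [
    {"vars": {"esi": 0x5000, "ecx": 0}, "fetch": 0x5000},
    {"vars": {"esi": 0x5002, "ecx": 1}, "fetch": 0x5002},
    {"vars": {"esi": 0x5003, "ecx": 2}, "fetch": 0x5003},
]
print(candidate_vpcs(trace))  # ['esi']

# The fetched addresses also bound where the bytecode lives.
fetched = [it["fetch"] for it in trace]
print(hex(min(fetched)), hex(max(fetched)))  # 0x5000 0x5003
```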
Formal framework for transparent analysis; five requirements; Xen extension; EtherUnpack/EtherTrace evaluation. http://ether.gtisc.gatech.edu/