Advanced Malware Analysis

01 //

Malware Prevalence

Even legitimate sites can serve malware. Example: USAToday.com, where a compromised ad network served malicious JavaScript that redirected users to a rogue AV site and tricked them into downloading malware.

Alexa Case Study

Researchers monitored the Alexa top 25,000 domains (of 252M). A browser in a VM visited each domain, and the resulting traffic was analyzed for drive-by downloads.

39 domains resulted in drive-by downloads
87% involved Java exploits
46% served exploits via ad networks
7.8M users served malicious content; 1.2M likely compromised
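
To put these figures in proportion, the short calculation below derives two ratios from the numbers above. The variable names are illustrative; only the four input figures come from the case study.

```python
# Figures reported in the case study.
monitored_domains = 25_000
drive_by_domains = 39
users_served = 7_800_000
users_compromised = 1_200_000

# Share of monitored domains that led to a drive-by download.
domain_share = drive_by_domains / monitored_domains

# Share of exposed users who were likely compromised.
compromise_rate = users_compromised / users_served

print(f"{domain_share:.3%} of monitored domains")    # 0.156% of monitored domains
print(f"{compromise_rate:.1%} of exposed users")     # 15.4% of exposed users
```

A tiny fraction of popular domains is enough to expose millions of users, and roughly one in six or seven of those exposed was likely compromised.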

Attack Vectors

New features → new exploits (e.g. a PDF with embedded malicious Flash that exploits Acrobat Reader)
Legitimate sites host drive-by downloads
Vulnerabilities reported months earlier may remain unpatched
Social engineering: geo-location and temporally relevant events (e.g. "July 4th fireworks") make attacks compelling

02 //

Malware Evolution & Traditional Defenses

Defense-in-depth exists, but malware evolves to evade. Each layer has weaknesses.

Defense	Evasion
Firewall	C&C traffic looks like normal web traffic
IPS/IDS	Custom encodings, encryption evade payload analysis
User Account Control (UAC)	Users often consent without understanding
Antivirus	Signature matching fails against obfuscation; needs heuristics

03 //

Malware Obfuscation

Packing

Part or all of an executable is compressed, encrypted, or otherwise transformed. The packed file includes unpack code that reverses the transformation at runtime. The transformed code looks like random data, so signature scanners are ineffective.
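
One way to see what "looks like random data" means is to measure byte entropy before and after a transformation. The sketch below is an illustration only: it uses zlib compression as the transformation and a made-up vocabulary as a stand-in for program content, and the function name `shannon_entropy` is an assumption, not something from the source.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


# Stand-in for program content: text built from a small, fixed vocabulary.
rng = random.Random(0)
words = ["mov", "push", "call", "ret", "jmp", "xor", "add", "cmp"]
plain = " ".join(rng.choices(words, k=5000)).encode()

# Compression is one of the transformations a packer may apply.
transformed = zlib.compress(plain, 9)

print(f"plain:       {shannon_entropy(plain):.2f} bits/byte")
print(f"transformed: {shannon_entropy(transformed):.2f} bits/byte")
```

The transformed bytes score much closer to the 8 bits/byte of uniformly random data, which is why a byte-pattern signature has nothing stable to match against.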

Packing Flow

Program A → Encrypt/Compress/Transform → Program A'

Each packed instance looks different because a new random encryption key is chosen on every packing run.
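
The effect of a per-run key can be shown on harmless bytes. The following is a minimal sketch, assuming a repeating-key XOR as the transformation and a one-byte length prefix for the key; real packers use far more elaborate schemes, and the names `pack`, `unpack`, and `SIGNATURE` are hypothetical.

```python
import secrets
from itertools import cycle
from typing import Optional

# Hypothetical byte pattern a signature scanner would search for.
SIGNATURE = b"KNOWN_BAD_MARKER"


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (its own inverse)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def pack(payload: bytes, key: Optional[bytes] = None) -> bytes:
    """Return [key length][key][transformed payload]; a fresh key per call."""
    if key is None:
        key = secrets.token_bytes(16)
    return bytes([len(key)]) + key + xor_bytes(payload, key)


def unpack(packed: bytes) -> bytes:
    """Play the role of the unpack code: recover the original bytes."""
    key_len = packed[0]
    key = packed[1:1 + key_len]
    return xor_bytes(packed[1 + key_len:], key)


payload = b"header " + SIGNATURE + b" trailer"
first, second = pack(payload), pack(payload)

print("signature in original:", SIGNATURE in payload)
print("instances identical:  ", first == second)
print("round trip ok:        ", unpack(first) == payload == unpack(second))
```

Both instances unpack to the same bytes, yet each is built with its own key, so a pattern extracted from one packed file is of little use against the next.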

Server-Side Polymorphism

Waledac Example

Attacker server continuously sends updated obfuscated malware to compromised hosts. disc.exe (new): 28% AV detection. postcard.exe (older): 90% detection. By the time researchers analyze one variant, it's obsolete.

AV Detection Study (McAfee)

Study scope: 200K samples over 6 months

53% - detected on first day
32% - detected with delay (~54 days avg)
15% - still undetected after 6 months

Obfuscation Targets

Technique	Hides From
Rootkits	Users
Mapping honeypots / security sites	Security mechanisms
Nonce-based encryption	Researchers (harder cryptanalysis)

04 //

Malware Analysis

Goals: network/host detection and blocking, forensics and remediation, threat and trend analysis. Malware authors make analysis challenging; automation is essential.

Why Automation?

Hundreds of thousands of new instances daily
DIY kits, packing tools, server-side polymorphism increase volume
Collected from crawlers, mail filters, honeypots, user submissions
Manual analysis untenable

Analysis Environments & Difficulty

Easiest → Hardest

1. Fully automated · 2. Static properties · 3. Interactive behavior · 4. Manual code reversing

Most → Least Info

1. Manual reversing · 2. Interactive · 3. Static · 4. Fully automated

05 //

The Malware Uncertainty Principle

Robust analyzers (in-memory hooks, CPU emulation) are invasive. Malware can detect them and refuse to run or alter behavior. Observer affects the observed. Dynamic analyzer detection is a standard malware feature.

Formal Transparency Requirements

1. Higher privilege - analyzer has more privilege than malware
2. No non-privileged side effects - malware cannot detect analyzer without privileged ops
3. Identical instruction execution semantics - same as real hardware
4. Transparent exception handling - same as real hardware
5. Identical measure of time - timing indistinguishable

Why Existing Tools Fall Short

In-Guest / VMs

No higher privilege; side effects discoverable; exception handling issues.

Emulation (QEMU, Simics)

Different instruction semantics. Example: an x86 instruction stretched to 16 bytes exceeds the 15-byte maximum. Bare metal raises an illegal instruction exception; QEMU executes it silently.
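
The length example amounts to a differential test: feed the same bytes to two implementations and compare the outcome. The sketch below models this with a toy one-instruction "ISA" (redundant 0x66 prefixes followed by a NOP). The functions `strict_execute` and `lenient_execute` are hypothetical stand-ins for hardware-like and emulator-like behavior, not real hardware or QEMU code.

```python
MAX_X86_INSTRUCTION_LENGTH = 15  # architectural limit in bytes
OPERAND_SIZE_PREFIX = 0x66       # prefix byte that can be repeated redundantly
NOP = 0x90


class InstructionFault(Exception):
    """Stand-in for the exception raised on an over-long instruction."""


def decode_length(code: bytes) -> int:
    """Length of the leading instruction in the toy ISA: prefixes, then NOP."""
    i = 0
    while i < len(code) and code[i] == OPERAND_SIZE_PREFIX:
        i += 1
    if i >= len(code) or code[i] != NOP:
        raise ValueError("toy decoder only understands prefixed NOPs")
    return i + 1


def strict_execute(code: bytes) -> str:
    """Hardware-like model: enforce the length limit before executing."""
    if decode_length(code) > MAX_X86_INSTRUCTION_LENGTH:
        raise InstructionFault("instruction longer than 15 bytes")
    return "executed"


def lenient_execute(code: bytes) -> str:
    """Emulator-like model that forgets to check the length limit."""
    decode_length(code)
    return "executed"


overlong = bytes([OPERAND_SIZE_PREFIX] * 15 + [NOP])  # 16 bytes in total
for name, run in (("strict", strict_execute), ("lenient", lenient_execute)):
    try:
        print(name, "->", run(overlong))
    except InstructionFault as fault:
        print(name, "-> fault:", fault)
```

Any input on which the two models disagree is an observable difference, which is exactly what the identical-semantics requirement rules out.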

EQTM Undecidability

Determining whether two Turing machines accept the same language (EQTM) is undecidable, so there is no general way to guarantee that an emulator matches hardware in all cases. Guaranteeing an identical notion of time (network timing, covert channels) is likewise undecidable.

06 //

Ether Malware Analyzer

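Ether fulfills the transparency requirements via hardware virtualization (Intel VT). The hypervisor has higher privilege than the guest kernel; the analyzer runs outside the guest; side effects are minimal (trap flag, syscall handling); and time is read through RDTSC, which can be treated as a privileged instruction so the analyzer can lie about its value.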
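
The idea of "lying" about time can be sketched as a virtual counter that leaves out cycles spent inside the analyzer. This is a simplified model under stated assumptions: the class `VirtualTsc` and its methods are hypothetical, and the source does not say how Ether actually adjusts the value it returns.

```python
class VirtualTsc:
    """Toy time-stamp counter that hides analyzer overhead from the guest."""

    def __init__(self) -> None:
        self._real_cycles = 0      # cycles that actually elapsed
        self._hidden_cycles = 0    # cycles consumed by the analyzer

    def guest_runs(self, cycles: int) -> None:
        """Account for cycles spent executing guest code."""
        self._real_cycles += cycles

    def analyzer_runs(self, cycles: int) -> None:
        """Account for cycles spent in the analyzer (e.g. handling a trap)."""
        self._real_cycles += cycles
        self._hidden_cycles += cycles

    def rdtsc(self) -> int:
        """Value reported to the guest when it reads the counter."""
        return self._real_cycles - self._hidden_cycles


tsc = VirtualTsc()
tsc.guest_runs(100)
before = tsc.rdtsc()
tsc.analyzer_runs(5_000)   # trap handled outside the guest
tsc.guest_runs(100)
after = tsc.rdtsc()

print(after - before)  # 100: the guest sees only its own cycles
```

A timing check inside the guest then measures the same interval it would see without the analyzer, at least for this local clock. External time sources such as network timing are not covered by this model.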

Architecture

Extends Xen. Ether hypervisor component + userspace in Dom0. Malware runs in DomU (Windows guest). VM Exits (hardware traps) provide visibility. Instruction-by-instruction or system-call-by-system-call examination.

EtherUnpack & EtherTrace

EtherUnpack

Extracts hidden code from obfuscated malware. Outperforms PolyUnpack, Renovo on packers like Armadillo, Obsidium, Themida, ThemidaVM.
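
The source does not describe how EtherUnpack locates hidden code. A heuristic commonly used by unpackers is write-then-execute detection: memory that was written at runtime and is later executed is treated as unpacked code. The sketch below assumes that heuristic and a simplified event stream of `(kind, address)` pairs; the function name and event format are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, int]  # ("write" | "exec", address)


def find_unpacked_code(events: Iterable[Event]) -> List[int]:
    """Return addresses that were executed after having been written."""
    written: Set[int] = set()
    reported: Set[int] = set()
    hits: List[int] = []
    for kind, address in events:
        if kind == "write":
            written.add(address)
        elif kind == "exec" and address in written and address not in reported:
            reported.add(address)
            hits.append(address)
    return hits


events = [
    ("exec", 0x1000),   # unpack code runs from its original location
    ("write", 0x2000),  # ...and writes the hidden code elsewhere
    ("write", 0x2001),
    ("exec", 0x1004),
    ("exec", 0x2000),   # control transfers into freshly written memory
    ("exec", 0x2001),
]

print([hex(a) for a in find_unpacked_code(events)])  # ['0x2000', '0x2001']
```

The first reported address is a candidate for the original entry point of the hidden program, which is where a memory dump would be most useful.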

EtherTrace

Records system calls. Norman Sandbox, Anubis often fail on packed samples; EtherTrace succeeds across packing tools.
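
As a rough picture of what a system-call recorder produces, the sketch below turns a stream of intercepted calls into a readable trace for one process. Everything here is an assumption for illustration: the `SyscallExit` record, the syscall numbers (real numbers vary between Windows versions), and the output format are not taken from Ether.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical number-to-name table; real numbers differ per Windows version.
SYSCALL_NAMES = {0x10: "NtCreateFile", 0x11: "NtWriteFile", 0x12: "NtClose"}


@dataclass(frozen=True)
class SyscallExit:
    """One intercepted system call, as seen from outside the guest."""
    process: str
    number: int
    args: Tuple[int, ...]


def trace_process(exits: Iterable[SyscallExit], target: str) -> List[str]:
    """Return formatted trace lines for the target process only."""
    lines = []
    for event in exits:
        if event.process != target:
            continue  # ignore activity from other guest processes
        name = SYSCALL_NAMES.get(event.number, f"syscall_{event.number:#x}")
        rendered = ", ".join(f"{arg:#x}" for arg in event.args)
        lines.append(f"{name}({rendered})")
    return lines


exits = [
    SyscallExit("sample.exe", 0x10, (0x1,)),
    SyscallExit("explorer.exe", 0x12, (0x8,)),
    SyscallExit("sample.exe", 0x11, (0x1, 0x400)),
    SyscallExit("sample.exe", 0x99, ()),
]

for line in trace_process(exits, "sample.exe"):
    print(line)
```

Because the recording happens outside the guest, the trace does not depend on whether the sample was packed; the calls are observed only when the unpacked code actually issues them.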

07 //

Emulator-Based Obfuscation

Malware is transformed into bytecode (language L) + emulator (runs on x86). Obfuscated binary = emulator + bytecode. Commercial tools: VMProtect, Code Virtualizer.

Fetch-Decode-Execute

Emulator maintains VPC (Virtual Program Counter) pointing to next bytecode. Fetches bytecode → decodes opcode/operands → dispatches to execute routine (x86). Language L can be randomly generated.
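
The fetch-decode-execute cycle and a randomly generated language L can be sketched with a toy stack machine. This is a minimal illustration, not how VMProtect or Code Virtualizer work internally: the four operations, the one-byte encoding, and all function names are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

OPERATIONS = ("PUSH", "ADD", "MUL", "HALT")


def generate_language(seed: int) -> Dict[str, int]:
    """Assign each operation a random, distinct one-byte opcode."""
    rng = random.Random(seed)
    return dict(zip(OPERATIONS, rng.sample(range(256), len(OPERATIONS))))


def assemble(program: Sequence[Tuple], language: Dict[str, int]) -> bytes:
    """Encode (operation, *operands) tuples as bytecode in the given language."""
    out = bytearray()
    for operation, *operands in program:
        out.append(language[operation])
        out.extend(operands)
    return bytes(out)


def run(bytecode: bytes, language: Dict[str, int]) -> int:
    """Interpret bytecode; vpc is the virtual program counter."""
    decode = {opcode: name for name, opcode in language.items()}
    stack: List[int] = []
    vpc = 0
    while True:
        opcode = bytecode[vpc]       # fetch
        operation = decode[opcode]   # decode
        vpc += 1
        if operation == "PUSH":      # execute
            stack.append(bytecode[vpc])
            vpc += 1
        elif operation == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif operation == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif operation == "HALT":
            return stack.pop()


# (2 + 3) * 4
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("HALT",)]

for seed in (1, 2):
    language = generate_language(seed)
    bytecode = assemble(PROGRAM, language)
    print(bytecode.hex(), "->", run(bytecode, language))
```

Both runs compute 20, but the two bytecode strings will almost certainly differ because each seed yields its own opcode assignment. To a static analyzer the bytecode is just data, and only the `run` loop looks like code.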

Impacts on Analysis

Approach	Impact
Static (whitebox)	Bytecode is data; only emulator analyzable
Greybox	Analysis on emulator, not malware; paths explored in emulator, not actual code
Manual RE	Doesn't scale; each instance can have new L and emulator

Reverse Engineering Emulators

Challenges: bytecode location unknown; emulator (decode/dispatch/execute) unknown; VPC may be in correlated variables.

Approach: abstract variable binding → identify candidate VPCs → identify fetch-decode-execute loop → extract bytecode syntax/semantics → build CFG. Automated; no prior knowledge of L.
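
The "identify candidate VPCs" step can be illustrated with a heavily simplified heuristic: a variable whose value is repeatedly used as the address of a memory read behaves like a program counter over the bytecode. The sketch below assumes a pre-recorded trace of per-step variable values and read addresses. The trace format, the function name, and the 90% threshold are all hypothetical and stand in for the real analysis, which works on abstract variable bindings.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# One step of the trace: variable values, and addresses read during the step.
Step = Tuple[Dict[str, int], Set[int]]


def candidate_vpcs(trace: List[Step], min_ratio: float = 0.9) -> List[str]:
    """Return variables whose value was a read address in most steps."""
    hits: Counter = Counter()
    for variables, read_addresses in trace:
        for name, value in variables.items():
            if value in read_addresses:
                hits[name] += 1
    needed = min_ratio * len(trace)
    return sorted(name for name, count in hits.items() if count >= needed)


trace: List[Step] = [
    ({"var_a": 0x5000, "var_b": 3, "var_c": 0x7000}, {0x5000, 0x7000}),
    ({"var_a": 0x5002, "var_b": 5, "var_c": 0x7000}, {0x5002}),
    ({"var_a": 0x5003, "var_b": 5, "var_c": 0x7004}, {0x5003, 0x7004}),
    ({"var_a": 0x5005, "var_b": 9, "var_c": 0x7004}, {0x5005}),
]

print(candidate_vpcs(trace))  # ['var_a']
```

Here `var_a` is used as a read address in every step and is reported, while `var_c` is used only half the time and is filtered out. The addresses a confirmed VPC points to also reveal where the bytecode lives, which addresses the first challenge listed above.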

08 //

Summary

Advanced Malware Analysis - Takeaways

Prevalence - legitimate sites compromised; ad networks; drive-bys; social engineering
Obfuscation - packing, polymorphism; evades signatures; server-side updates obsolete analysis
Transparency - analyzer must be invisible; higher privilege, no side effects, identical semantics/time
Ether - hardware virtualization; Xen + Intel VT; EtherUnpack, EtherTrace
Emulator obfuscation - bytecode + emulator; thwarts static/greybox; automated RE possible via VPC identification