Advanced Malware Analysis

Topics: Prevalence · Obfuscation · Transparency · Ether · Emulator Obfuscation
01 //

Malware Prevalence

Even legitimate sites can serve malware. Example: on USAToday.com, a compromised ad network served malicious JavaScript that redirected users to a rogue AV site, where they were tricked into downloading malware.

Alexa Case Study

Researchers monitored Alexa top 25,000 domains (of 252M). Browser in VM visited each; traffic analyzed for drive-by downloads.

  • 39 domains resulted in drive-by downloads
  • 87% involved Java exploits
  • 46% served exploits via ad networks
  • 7.8M users served malicious content; 1.2M likely compromised

Attack Vectors

New features → new exploits (e.g. PDF with embedded malicious Flash exploiting Acrobat Reader). Legitimate sites host drive-bys. Vulnerabilities reported months prior may remain unpatched. Social engineering: geo-location, temporally relevant events (e.g. "July 4th fireworks") make attacks compelling.

02 //

Malware Evolution & Traditional Defenses

Defense-in-depth exists, but malware evolves to evade. Each layer has weaknesses.

Defense               Evasion
Firewall              C&C traffic looks like normal web traffic
IPS/IDS               Custom encodings and encryption evade payload analysis
User Account Control  Users often consent without understanding
Antivirus             Signature matching fails against obfuscation; needs heuristics

03 //

Malware Obfuscation

Packing

Parts or all of an executable are compressed, encrypted, or transformed. Unpack code is included; reverses transformation at runtime. Transformed code looks like random data; signature scanners ineffective.

Packing Flow
Program A → Encrypt/Compress/Transform → Program A'

Each packed instance looks different because the packer picks a new random key every time it runs.
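The packing flow can be sketched in a few lines of Python. This is a toy illustration, not how any real packer works: the XOR cipher, the 16-byte key, the `SIGNATURE` marker and all function names are assumptions made for the example. It shows the two properties described above: the signature bytes no longer appear in the packed output, and the bundled unpack step restores the original program.

```python
import os
import zlib
from typing import Tuple

# Hypothetical byte pattern that a signature scanner might look for.
SIGNATURE = b"EVIL_PAYLOAD_MARKER"


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (toy cipher, illustration only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def pack(program: bytes) -> Tuple[bytes, bytes]:
    """Compress, then encrypt with a fresh random key. Returns (key, body)."""
    key = os.urandom(16)
    return key, xor_bytes(zlib.compress(program), key)


def unpack(key: bytes, body: bytes) -> bytes:
    """Reverse the transformation, as the bundled unpack code would at runtime."""
    return zlib.decompress(xor_bytes(body, key))


program_a = b"header" + SIGNATURE + b"rest of program A"
key1, packed1 = pack(program_a)
key2, packed2 = pack(program_a)

print(SIGNATURE in program_a)              # signature visible in Program A
print(SIGNATURE in packed1)                # signature hidden in Program A'
print(unpack(key1, packed1) == program_a)  # unpacking restores Program A
```

Because a new key is drawn on every call to `pack`, `packed1` and `packed2` differ even though they come from the same input, which is why a signature written for one packed instance does not match the next.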

Server-Side Polymorphism

Waledac Example

Attacker server continuously sends updated obfuscated malware to compromised hosts. disc.exe (new): 28% AV detection. postcard.exe (older): 90% detection. By the time researchers analyze one variant, it's obsolete.

AV Detection Study (McAfee)

200K samples tracked over 6 months:

  • 53% detected on first day
  • 32% detected with delay (~54 days avg)
  • 15% still undetected after 6 months
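To put the percentages in absolute terms, the short calculation below converts them to sample counts. It assumes the "200K" figure means exactly 200,000 samples, which the notes do not state precisely, so the counts are approximate.

```python
TOTAL_SAMPLES = 200_000  # assumption: "200K" taken as exactly 200,000

shares_percent = {
    "detected on first day": 53,
    "detected with delay": 32,
    "undetected after 6 months": 15,
}

# Integer arithmetic keeps the counts exact for whole percentages.
counts = {label: TOTAL_SAMPLES * pct // 100 for label, pct in shares_percent.items()}

for label, count in counts.items():
    print(f"{label}: {count}")
```

Under that assumption, roughly 30,000 samples were still undetected at the end of the study window.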

Obfuscation Targets

Technique                            Hides From
Rootkits                             Users
Mapping honey pots / security sites  Security mechanisms
Nonce-based encryption               Researchers (harder cryptanalysis)

04 //

Malware Analysis

Goals: network/host detection and blocking, forensics and remediation, threat and trend analysis. Malware authors make analysis challenging; automation is essential.

Why Automation?
  • Hundreds of thousands of new instances daily
  • DIY kits, packing tools, server-side polymorphism increase volume
  • Collected from crawlers, mail filters, honeypots, user submissions
  • Manual analysis untenable

Analysis Environments & Difficulty

Easiest → Hardest

1. Fully automated · 2. Static properties · 3. Interactive behavior · 4. Manual code reversing

Most → Least Info

1. Manual reversing · 2. Interactive · 3. Static · 4. Fully automated

05 //

The Malware Uncertainty Principle

Robust analyzers (in-memory hooks, CPU emulation) are invasive. Malware can detect them and refuse to run or alter behavior. Observer affects the observed. Dynamic analyzer detection is a standard malware feature.

Formal Transparency Requirements
  1. Higher privilege - analyzer has more privilege than malware
  2. No non-privileged side effects - malware cannot detect analyzer without privileged ops
  3. Identical instruction execution semantics - same as real hardware
  4. Transparent exception handling - same as real hardware
  5. Identical measure of time - timing indistinguishable

Why Existing Tools Fall Short

In-Guest / VMs

No higher privilege; side effects discoverable; exception handling issues.

Emulation (QEMU, Simics)

Different instruction semantics. Example: x86 instructions may be at most 15 bytes long. Given a 16-byte instruction, bare metal raises an illegal instruction exception, while QEMU executes it silently.
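The length check can be modeled with a toy comparison. The sketch below does not decode real x86 and does not invoke QEMU; `hardware_model` and `lenient_emulator_model` are hypothetical stand-ins that capture only the one behavioral difference described above. The overlong test input is a one-byte NOP (0x90) preceded by 15 redundant operand-size prefixes (0x66), for 16 bytes in total.

```python
MAX_X86_INSTRUCTION_BYTES = 15


class IllegalInstruction(Exception):
    """Stands in for the exception raised by real hardware."""


def hardware_model(encoded: bytes) -> str:
    """Toy model of bare metal: rejects instructions longer than 15 bytes."""
    if len(encoded) > MAX_X86_INSTRUCTION_BYTES:
        raise IllegalInstruction(f"{len(encoded)}-byte instruction")
    return "executed"


def lenient_emulator_model(encoded: bytes) -> str:
    """Toy model of an emulator that omits the length check."""
    return "executed"


def is_distinguishable(encoded: bytes) -> bool:
    """True if the two models behave differently on this input."""
    try:
        hardware_result = hardware_model(encoded)
    except IllegalInstruction:
        hardware_result = "fault"
    return hardware_result != lenient_emulator_model(encoded)


overlong = bytes([0x66] * 15 + [0x90])  # 15 prefixes + NOP = 16 bytes
normal = bytes([0x66, 0x90])            # 1 prefix + NOP = 2 bytes

print(is_distinguishable(overlong))
print(is_distinguishable(normal))
```

Any input on which the two models disagree gives malware a way to tell the environments apart, which is what requirement 3 rules out.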

EQTM Undecidability

Determining whether two Turing machines accept the same language is undecidable, so no procedure can guarantee that an emulator matches hardware in all cases. Guaranteeing an identical notion of time (network timing, covert channels) is also undecidable.

06 //

Ether Malware Analyzer

Ether meets the transparency requirements via hardware virtualization (Intel VT). The hypervisor has higher privilege than the guest kernel; the analyzer sits outside the guest; side effects are minimal (trap flag, syscall handling); and RDTSC is treated as a privileged operation, so the analyzer can intercept it and return adjusted time values.
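The idea that the analyzer "can lie" about time can be illustrated with a toy virtual counter. This is a sketch of the concept only, not Ether's implementation: the class `VirtualTsc`, its methods and the fixed one-cycle-per-instruction cost are all assumptions for the example. The guest-visible counter advances only for guest work, so cycles spent inside the analyzer never show up in the value a trapped RDTSC would return.

```python
class VirtualTsc:
    """Toy guest-visible time-stamp counter owned by the analyzer."""

    def __init__(self, cycles_per_instruction: int = 1) -> None:
        self.cycles_per_instruction = cycles_per_instruction
        self.guest_cycles = 0     # time the guest is allowed to observe
        self.analyzer_cycles = 0  # analysis overhead, never reported to the guest

    def guest_instruction(self) -> None:
        """Account for one guest instruction."""
        self.guest_cycles += self.cycles_per_instruction

    def analyzer_overhead(self, cycles: int) -> None:
        """Account for time spent in the analyzer (hidden from the guest)."""
        self.analyzer_cycles += cycles

    def rdtsc(self) -> int:
        """Value returned to the guest when its RDTSC is intercepted."""
        return self.guest_cycles


tsc = VirtualTsc()
start = tsc.rdtsc()
for _ in range(10):
    tsc.guest_instruction()
    tsc.analyzer_overhead(50_000)  # e.g. logging each instruction
elapsed = tsc.rdtsc() - start

print(elapsed)              # what the guest measures
print(tsc.analyzer_cycles)  # what the analysis actually cost
```

This only covers time sources the analyzer controls. An external clock, such as a remote server reached over the network, is outside its reach.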

Architecture

Extends Xen. Ether hypervisor component + userspace in Dom0. Malware runs in DomU (Windows guest). VM Exits (hardware traps) provide visibility. Instruction-by-instruction or system-call-by-system-call examination.

EtherUnpack & EtherTrace

EtherUnpack

Extracts hidden code from obfuscated malware. Outperforms PolyUnpack, Renovo on packers like Armadillo, Obsidium, Themida, ThemidaVM.

EtherTrace

Records system calls. Norman Sandbox, Anubis often fail on packed samples; EtherTrace succeeds across packing tools.

07 //

Emulator-Based Obfuscation

Malware is transformed into bytecode (language L) + emulator (runs on x86). Obfuscated binary = emulator + bytecode. Commercial tools: VMProtect, Code Virtualizer.

Fetch-Decode-Execute

Emulator maintains VPC (Virtual Program Counter) pointing to next bytecode. Fetches bytecode → decodes opcode/operands → dispatches to execute routine (x86). Language L can be randomly generated.
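A minimal sketch of such an emulator, written in Python rather than x86 for readability. The four-operation stack language, the function names and the seed-based opcode assignment are all assumptions for illustration; commercial tools use far richer instruction sets. The sketch shows the VPC-driven fetch-decode-execute loop and how the opcode encoding of L can be randomized per instance.

```python
import random

OPERATIONS = ["PUSH", "ADD", "MUL", "HALT"]


def generate_language(seed: int) -> dict:
    """Randomly assign a distinct one-byte opcode to each operation."""
    rng = random.Random(seed)
    opcodes = rng.sample(range(256), len(OPERATIONS))
    return dict(zip(OPERATIONS, opcodes))


def assemble(program: list, language: dict) -> bytes:
    """Encode (operation, *operands) tuples as bytecode in the given language."""
    out = bytearray()
    for op, *operands in program:
        out.append(language[op])
        out.extend(operands)
    return bytes(out)


def run(bytecode: bytes, language: dict) -> int:
    """Fetch-decode-execute loop driven by a virtual program counter."""
    decode = {opcode: op for op, opcode in language.items()}
    stack = []
    vpc = 0
    while True:
        op = decode[bytecode[vpc]]  # fetch the byte at VPC and decode it
        vpc += 1
        if op == "PUSH":            # dispatch to the execute routine
            stack.append(bytecode[vpc])
            vpc += 1
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "HALT":
            return stack[-1]


# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("HALT",)]

lang_a = generate_language(seed=1)
lang_b = generate_language(seed=2)
result_a = run(assemble(program, lang_a), lang_a)
result_b = run(assemble(program, lang_b), lang_b)
print(result_a, result_b)
```

Both instances compute the same result, yet their bytecode is encoded under different opcode assignments, so an analyst who has decoded one language gains little for the next instance.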

Impacts on Analysis

Approach           Impact
Static (whitebox)  Bytecode is data; only emulator analyzable
Greybox            Analysis on emulator, not malware; paths explored in emulator, not actual code
Manual RE          Doesn't scale; each instance can have new L and emulator

Reverse Engineering Emulators

Challenges: Bytecode location unknown; emulator (decode/dispatch/execute) unknown; VPC may be in correlated variables.

Approach: Abstract variable binding → identify candidate VPCs → identify fetch-decode-execute loop → extract bytecode syntax/semantics → build CFG. Automated; no prior knowledge of L.
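The "identify candidate VPCs" step can be illustrated with a simplified heuristic. This is not the algorithm from the research the notes summarize. It assumes abstract variable binding has already produced, for each emulator step, a snapshot of variable values and the set of memory addresses read. A variable whose value repeatedly matches an address being read behaves like a pointer into the bytecode. The trace format, the names `v0`/`v1` and the 0.9 threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# One step of the recorded trace: (variable values, addresses read in that step).
Step = Tuple[Dict[str, int], Set[int]]


def candidate_vpcs(trace: List[Step], min_fraction: float = 0.9) -> List[str]:
    """Return variables whose value is a read address in most steps."""
    hits = Counter()
    for variables, reads in trace:
        for name, value in variables.items():
            if value in reads:
                hits[name] += 1
    threshold = min_fraction * len(trace)
    return sorted(name for name, count in hits.items() if count >= threshold)


# Hypothetical trace: v0 walks through the bytecode, v1 holds unrelated data
# that coincides with a read address only once.
trace = [
    ({"v0": 0x1000, "v1": 0x1001}, {0x1000, 0x1001}),
    ({"v0": 0x1002, "v1": 5}, {0x1002}),
    ({"v0": 0x1003, "v1": 9}, {0x1003, 0x1004}),
]

print(candidate_vpcs(trace))
```

A real analysis would also have to handle a VPC spread across correlated variables, as the challenges above note. This sketch treats each variable independently.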

08 //

Summary

Advanced Malware Analysis - Takeaways
  • Prevalence - legitimate sites compromised; ad networks; drive-bys; social engineering
  • Obfuscation - packing, polymorphism; evades signatures; server-side updates obsolete analysis
  • Transparency - analyzer must be invisible; higher privilege, no side effects, identical semantics/time
  • Ether - hardware virtualization; Xen + Intel VT; EtherUnpack, EtherTrace
  • Emulator obfuscation - bytecode + emulator; thwarts static/greybox; automated RE possible via VPC identification

Further Reading

Formal framework for transparent analysis; five requirements; Xen extension; EtherUnpack/EtherTrace evaluation. http://ether.gtisc.gatech.edu/