Malware is software designed to infiltrate or damage systems. Understanding how attackers classify and hide malware helps defenders build better detection tools.
Viruses: attach to executables and spread when infected files are run.
Worms: exploit network services to spread automatically.
Trojan horses: pose as useful software while hiding a malicious payload.
Signature-based detection matches known byte sequences found in malware. Signatures are derived through manual analysis and reverse engineering. Such syntactic signatures are easily evaded.
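A byte-sequence signature scan can be sketched in a few lines. This is only an illustration: the signature names and byte patterns below are invented and do not correspond to real malware.

```python
# Hypothetical signature database: name -> byte pattern (made up for this sketch).
SIGNATURES = {
    "Demo.A": bytes.fromhex("deadbeef4142"),
    "Demo.B": b"EVIL_MARKER",
}

def scan(data: bytes) -> list[str]:
    """Return the names of all signatures whose bytes occur anywhere in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)

sample = b"\x00\x01" + bytes.fromhex("deadbeef4142") + b"\x90\x90"
print(scan(sample))             # ['Demo.A']
print(scan(b"harmless file"))   # []
```

Because the match is purely syntactic, changing a single byte inside the pattern is enough to avoid detection.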
Malware authors use various techniques to evade detection. These fall into three broad categories depending on what the defender is doing.
Against signature scanning: polymorphism and metamorphism alter the binary so static byte signatures fail.
Against dynamic analysis: anti-debugging, anti-VM, and emulator detection let malware exit or sleep when it is run in a lab.
Against static analysis: anti-disassembly, packing, and control-flow obfuscation hide the real code.
Polymorphic malware encrypts its main body and uses a different encryption key for each copy. The decryptor (which must run in the clear) may have several variants or be obfuscated. Byte-sequence signatures on the body fail because the ciphertext changes every time.
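The effect on byte signatures can be shown with a harmless stand-in. In this sketch the "body" is just an ASCII string and the cipher is a single-byte XOR; both are assumptions chosen for brevity, not a description of any real sample.

```python
BODY = b"HARMLESS_DEMO_BODY"   # stands in for the encrypted main body
SIGNATURE = b"DEMO_BODY"       # byte signature taken from the plaintext body

def xor(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)

copy_a = xor(BODY, 0x21)   # one copy, one key
copy_b = xor(BODY, 0x5A)   # another copy, another key

print(SIGNATURE in BODY)     # True: the signature matches the plaintext
print(SIGNATURE in copy_a)   # False: the ciphertext no longer contains it
print(SIGNATURE in copy_b)   # False
print(copy_a != copy_b)      # True: every copy looks different on disk
```

The decryptor is the only part that stays in the clear, which is why detection effort shifts to it.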
Metamorphic malware avoids encryption entirely. Instead, it rewrites its own code so each instance looks different but behaves the same. The entire body is obfuscated through transformations like code reordering, garbage insertion, equivalent instruction replacement, jump insertion, and packing.
Deciding whether two pieces of code are semantically equivalent is undecidable in general, so detectors can only approximate it. Syntactic signatures fail because the surface form changes. Semantics-based detection (e.g., behavior, data-flow) is more robust than pure syntax.
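One practical approximation is to normalize code before comparing it. The sketch below assumes a toy instruction notation and a hand-written rule table. It ignores details such as CPU flag effects, so it is an illustration of the idea rather than a sound equivalence check.

```python
# Toy rewrite rules (assumed for illustration; flag effects are ignored).
CANONICAL = {
    "xor eax, eax": "mov eax, 0",
    "sub eax, eax": "mov eax, 0",
    "add ebx, 1": "inc ebx",
}
# Instructions treated as garbage because they leave register values unchanged.
GARBAGE = {"nop", "mov eax, eax", "xchg eax, eax"}

def normalize(code: list[str]) -> list[str]:
    """Drop garbage instructions and map equivalent forms to one canonical form."""
    return [CANONICAL.get(ins, ins) for ins in code if ins not in GARBAGE]

variant_1 = ["mov eax, 0", "inc ebx", "call payload"]
variant_2 = ["nop", "xor eax, eax", "mov eax, eax", "add ebx, 1", "nop", "call payload"]

print(variant_1 == variant_2)                        # False: different surface form
print(normalize(variant_1) == normalize(variant_2))  # True: same normalized form
```

Code reordering and jump insertion are not handled here; they require control-flow analysis rather than instruction-level rewriting.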
Anti-disassembly, packing, and control-flow obfuscation make static inspection (disassembly, decompilation) difficult or misleading.
Running malware in a sandbox (e.g., VM) defeats most anti-static techniques—the real code must execute to be observed. However, malware can detect analysis environments and refuse to run.
Dynamic analysis tools record behavior by hooking system calls via DLL injection, a kernel driver, or the virtual machine monitor. Tools like CWSandbox and TTAnalyze capture API calls and file/network activity.
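The hooking idea, interposing on a call, recording it, then forwarding it, can be sketched at the language level. The `FakeOS` class and its methods are hypothetical stand-ins for an OS API surface. Real sandboxes hook at the DLL, kernel, or hypervisor level, not like this.

```python
import functools

call_log = []  # recorded (name, args) tuples, analogous to an API-call trace

def hook(obj, name):
    """Replace obj.<name> with a wrapper that logs the call, then forwards it."""
    original = getattr(obj, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        call_log.append((name, args))
        return original(*args, **kwargs)

    setattr(obj, name, wrapper)
    return original  # returned so the caller could restore it later

class FakeOS:
    """Hypothetical API surface used only for this sketch."""
    def open_file(self, path):
        return f"handle:{path}"

    def connect(self, host, port):
        return True

api = FakeOS()
for fn in ("open_file", "connect"):
    hook(api, fn)

# The "sample" runs against the hooked API; its behavior is unchanged but recorded.
api.open_file("C:\\temp\\a.txt")
api.connect("198.51.100.7", 80)
print(call_log)
```

The wrapper returns whatever the original call returns, so the monitored program sees no functional difference.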
Malware that detects a tracer or VM exits or sleeps, so defenders must hide the analysis environment. Rootkits can also detect kernel drivers used for tracing.
Packed malware hides its real code until runtime. Unpackers automatically reveal the hidden code so it can be analyzed statically or used for signatures.
One approach builds a static model of the original code and treats any execution outside the model as an unpacked region. The unpacker single-steps the program and tracks EIP (the instruction pointer).
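A minimal sketch of this check, with the static model simplified to a list of address ranges and the single-stepped run represented as a list of EIP values. All addresses are hypothetical.

```python
# Address ranges of code visible in the file on disk (hypothetical values).
STATIC_MODEL = [(0x401000, 0x401200)]   # e.g. the unpacking stub

def in_model(eip: int) -> bool:
    """True if eip falls inside any range of the static model."""
    return any(start <= eip < end for start, end in STATIC_MODEL)

def find_unpacked_entry(eip_trace):
    """Return the first executed address outside the static model, or None."""
    for eip in eip_trace:
        if not in_model(eip):
            return eip
    return None

# EIP values as a single-stepping tracer might record them.
trace = [0x401000, 0x401005, 0x40100A, 0x401005, 0x40100A, 0x405000, 0x405003]
print(hex(find_unpacked_entry(trace)))   # 0x405000
```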
Another approach uses a heuristic: when freshly written memory is executed, it is unpacked code. Tracking can be fine-grained (per instruction) or coarse (per page or system call).
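The write-then-execute heuristic can be sketched over a recorded event stream. The event format and the `granularity` parameter are assumptions made for this example: a granularity of 1 tracks individual addresses, and 0x1000 tracks 4 KiB pages.

```python
def detect(events, granularity=1):
    """events: iterable of ("write" | "exec", address).
    Return the first executed address whose tracking unit was written
    earlier in the run, or None if no such address exists."""
    written = set()
    for kind, addr in events:
        unit = addr // granularity
        if kind == "write":
            written.add(unit)
        elif kind == "exec" and unit in written:
            return addr
    return None

events = [
    ("exec", 0x401000),
    ("write", 0x405010),
    ("write", 0x405011),
    ("exec", 0x401005),
    ("exec", 0x405000),   # same page as the writes, but not a written address
    ("exec", 0x405010),   # a written address
]

print(hex(detect(events)))                      # 0x405010
print(hex(detect(events, granularity=0x1000)))  # 0x405000
```

Page-level tracking is cheaper but less precise: it flags the first execution anywhere in a written page, even at an address that was never written.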
Common packers include UPX, Armadillo, and commercial protectors. They often include anti-debug, anti-trace, and obfuscation features.
Malware often hides malicious behavior behind a trigger—a condition that may not be met during a short sandbox run. A single execution path misses the hidden code.
Example: if (Hash(cmd)==H). Finding the right cmd requires inverting a hash, which is hard for symbolic execution.
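A hash-guarded trigger can be made concrete with a short sketch. The choice of SHA-256, the command string, and the wordlist are all assumptions for illustration. The point is that the guard reveals only the digest, so an analyst falls back on guessing concrete inputs.

```python
import hashlib

# Digest embedded in the sample. Here it is derived from a made-up command
# so the demo is self-contained; an analyst would see only the digest.
H = hashlib.sha256(b"activate").hexdigest()

def handle(cmd: bytes) -> str:
    """Toy trigger check in the shape of: if (Hash(cmd) == H)."""
    if hashlib.sha256(cmd).hexdigest() == H:
        return "hidden behavior"
    return "benign behavior"

def discover_trigger(candidates):
    """Heuristic input discovery: run each candidate concretely."""
    for cmd in candidates:
        if handle(cmd) == "hidden behavior":
            return cmd
    return None

wordlist = [b"start", b"update", b"activate", b"stop"]
print(discover_trigger(wordlist))       # b'activate'
print(discover_trigger([b"a", b"b"]))   # None
```

The search succeeds only when the trigger happens to be in the candidate list, which is why such conditions remain a hard case.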
Tools like Moser/Bitscope use an emulator, taint tracking, and path constraints. They save/reload state and use an SMT solver to explore alternative paths. Hash functions are non-linear, so symbolic execution struggles to discover inputs that satisfy conditions like Hash(input)==target. Input discovery often requires heuristics or concrete execution.
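The path-exploration idea can be sketched with a toy explorer. This is a deliberate simplification and not how the tools above are implemented: instead of restoring a saved snapshot it re-runs the program from the start with a forced prefix of branch outcomes, and it forces outcomes directly rather than asking an SMT solver for consistent inputs. The `program` function and its branch names are hypothetical.

```python
def program(state, decide):
    """Toy sample with two input-dependent branches.
    decide(name) supplies the branch outcome chosen by the explorer."""
    state["log"].append("start")
    if decide("date_is_trigger_day"):
        state["log"].append("delete_files")
    if decide("cmd_is_magic"):
        state["log"].append("open_backdoor")
    return state["log"]

def explore(prog):
    """Run prog once per combination of branch outcomes."""
    paths = []
    worklist = [[]]                  # each entry is a forced prefix of outcomes
    while worklist:
        forced = worklist.pop()
        taken = []

        def decide(name):
            if len(taken) < len(forced):
                outcome = forced[len(taken)]      # replay the forced prefix
            else:
                outcome = False                   # default direction
                worklist.append(taken + [True])   # schedule the other direction
            taken.append(outcome)
            return outcome

        log = prog({"log": []}, decide)           # fresh state for every run
        paths.append((tuple(taken), log))
    return paths

for outcomes, log in explore(program):
    print(outcomes, log)
```

A single run with default inputs would observe only the `start` event; exploring all four paths exposes both hidden behaviors.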