Get rid of clichés:
"Most anti-virus products detect malware only through simple checksums. This is often the case for anti-virus engines integrated into network gateways." People mostly believe the reason is that network gateways have limited resources to process the sheer amount of data passing through them "in real time", and that their OS is "embedded" and thus limited. So people conclude that the bottom line is: "Too easy to bypass!"
Let's be clear: reality is more subtle, and in some cases totally different. For educational purposes, however, we are going to try to understand how to bypass a simple pattern matching attempt. For the absolute beginner, let me draw an analogy: a crook wants to get through customs at an airport. Fortunately, the customs officers have a warrant with a picture of him. The criminal will avoid going to prison simply by wearing sunglasses and a black hat when he arrives at the customs barrier. Pattern matching detection bypassed!
Script kiddies often use tricks like these to bypass anti-virus software. But is that really enough? Do we have ways to limit or prevent this kind of trick?
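To make the analogy concrete, here is a minimal sketch of the "warrant with a picture" approach and of the sunglasses trick. Everything in it is an assumption made for illustration: the payload bytes are harmless placeholders, and `detect_by_checksum` and `detect_by_pattern` are hypothetical helpers, not the API of any real engine.

```python
import hashlib

# Harmless placeholder standing in for a known malicious file (assumption).
ORIGINAL = b"\x90\x90HELLO-I-AM-A-FAKE-VIRUS\xcc"

# The scanner's "warrant": a checksum of the known file and a raw byte pattern.
KNOWN_HASHES = {hashlib.sha256(ORIGINAL).hexdigest()}
KNOWN_PATTERNS = [b"FAKE-VIRUS"]


def detect_by_checksum(data: bytes) -> bool:
    """Flag the file only if its whole-file checksum is already known."""
    return hashlib.sha256(data).hexdigest() in KNOWN_HASHES


def detect_by_pattern(data: bytes) -> bool:
    """Flag the file if it contains one of the exact byte sequences."""
    return any(pattern in data for pattern in KNOWN_PATTERNS)


# "Sunglasses": one extra trailing byte is enough to change the checksum.
with_sunglasses = ORIGINAL + b"\x00"

# "Black hat": changing one byte inside the pattern defeats the exact match.
with_hat = ORIGINAL.replace(b"FAKE-VIRUS", b"FAKE_VIRUS")

if __name__ == "__main__":
    print("original, checksum:  ", detect_by_checksum(ORIGINAL))
    print("sunglasses, checksum:", detect_by_checksum(with_sunglasses))
    print("sunglasses, pattern: ", detect_by_pattern(with_sunglasses))
    print("black hat, pattern:  ", detect_by_pattern(with_hat))
```

Note that the exact byte pattern still catches the padded file, which is why it is a step up from a plain checksum. In a real sample the attacker would also have to make sure the modified bytes keep the program working, which this toy example ignores.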
The ancestor of pattern matching is the detection of an exact sequence of bytes. It was naturally improved by introducing wildcards, also known as jokers, such as " ? " and " * ". The former matches any single byte; the latter acts as a repeater and matches zero or more occurrences of the preceding pattern.
By improving pattern matching in this way, we have created an equivalent of regular expressions, but for binary content. Unfortunately the mouse (the attacker) has other tricks up its sleeve: detection can be evaded by inserting runs of bytes or code snippets that do nothing between the malicious patterns. This is called "junk" code. The next improvement is then to act as a CPU and follow the execution flow of the scanned sample. For example, we can follow the control-transfer instructions (e.g., 'call', 'jmp', ...) and in some cases skip the junk code, whose sole purpose is to hide the malicious code by creating noise.
The next tactic used by malware authors needs some explanation. It relates to obfuscated executable files, which are something between a mandarin and an onion. :D
Let me explain why. When you want to eat a mandarin, you peel it to remove its skin (its "layer") and get to the juicy segments. An obfuscated sample is an executable wrapped in an obfuscation layer whose sole purpose is to hide the layer underneath. When the outer layer is executed by the CPU, it reveals a second layer. This second layer is the body of the virus, or it can be yet another obfuscation layer, like an onion made of several tiers.
Of course, this culinary analogy is nothing more than an analogy: in reality, the upper layer of an executable file is not physically "removed". Rather, the bytes of the upper layer are transformed into the bytes of the layer underneath, through mathematical operations applied by a short piece of code at the beginning of the executable file. This code is responsible for the "peeling" and is usually called the "unpacker".
This unpacker code can consist of many steps. For the sake of simplicity, we will cover the case of an unpacker that applies a rather simple transformation to the bytes of the upper layer in order to reveal the layer underneath. We will skip the mathematical explanation of the possible transformations (sign up for a cryptanalysis course or ask Crypto Girl ;) ). The transformation used here consists of an xor operation and an addition. Please take a look at the following animation:
First, the unpacker calls a function "aa_decrypt" that applies the transformation at the address provided as the function's "src" argument. Here src has the value 0x65d000, and the data at that address does not look like valid instructions. After the "aa_decrypt" routine has run, we obtain valid ones. The execution flow can then continue into the de-obfuscated underlying layer located at 0x65d000.
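For readers who prefer code to animations, here is a rough sketch of what such a routine might look like. Only the name "aa_decrypt", its "src" argument and the use of an xor and an addition come from the sample described above. The key values, the order of the two operations, the byte-by-byte processing and the `aa_encrypt` helper are all assumptions made for illustration.

```python
XOR_KEY = 0x5A  # assumed value, not taken from the real sample
ADD_KEY = 0x13  # assumed value, not taken from the real sample


def aa_decrypt(src: bytes) -> bytes:
    """Unpacker side: xor each byte, then add a constant (modulo 256)."""
    return bytes(((b ^ XOR_KEY) + ADD_KEY) & 0xFF for b in src)


def aa_encrypt(plain: bytes) -> bytes:
    """Packer side (hypothetical helper): the exact inverse of aa_decrypt."""
    return bytes(((b - ADD_KEY) & 0xFF) ^ XOR_KEY for b in plain)


# A tiny, valid x86 sequence standing in for the hidden layer:
# push ebp; mov ebp, esp; sub esp, 0x10; ret
hidden_layer = bytes.fromhex("55 8b ec 83 ec 10 c3")

# What sits on disk at "src" before the unpacker runs.
packed = aa_encrypt(hidden_layer)

# What is in memory once the unpacker has run.
unpacked = aa_decrypt(packed)

if __name__ == "__main__":
    print("on disk  :", packed.hex(" "))
    print("in memory:", unpacked.hex(" "))
```

With these assumed keys, a static pattern written against the hidden layer, such as the bytes 8b ec, finds nothing in the packed bytes: the pattern only exists once the unpacker has run.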
Now you have a better idea of what an obfuscation layer is. Like any disguise, it can itself be detected, for instance by targeting the unpacker code with pattern matching. Nevertheless, there are two issues with this solution. First, an accuracy problem arises: several viruses from different families will be detected under the same name because they share the same unpacker code. Second, a "false positive" can occur for the same reason. A false positive is the detection of a clean file as malicious: a nightmare scenario for the AV industry. Some clean programs hide their code for copyright or intellectual property reasons, and they use obfuscation layers peeled by unpackers that may also be used by cybercriminals.
The goal now is to match dynamically generated malicious patterns rather than static ones as before. This technology is linked to "sandboxing" and "emulation". We will see next time what those buzzwords mean. I agree they are not as trendy as "Cloud", "APT" and the rest of the troop. :D
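As a rough illustration of both issues, the sketch below builds a wildcard signature for a made-up unpacker stub and runs it against three files. The stub bytes, the signature, the file contents and the `compile_signature` helper are all hypothetical. The signature syntax (hex bytes, "?" for any single byte, "*" to repeat the preceding item zero or more times) is just one possible reading of the jokers discussed earlier.

```python
import re


def compile_signature(sig: str) -> re.Pattern:
    """Translate a space-separated hex signature into a bytes regex.

    "?" matches any single byte; "*" repeats the preceding item zero or
    more times. This syntax is an assumption made for this sketch.
    """
    parts = []
    for token in sig.split():
        if token == "?":
            parts.append(b".")
        elif token == "*":
            parts.append(b"*")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that "." also matches the byte 0x0a.
    return re.compile(b"".join(parts), re.DOTALL)


# Hypothetical stub: xor byte [eax], K1; add byte [eax], K2; inc eax; loop.
# The keys are wildcarded, and "90 *" tolerates NOP junk between the steps.
UNPACKER_SIG = compile_signature("80 30 ? 90 * 80 00 ? 40 E2 ?")

stub_plain = bytes.fromhex("80 30 5a 80 00 13 40 e2 f7")
stub_junk = bytes.fromhex("80 30 77 90 90 80 00 01 40 e2 f5")

# Placeholder files: two unrelated "families" and one clean protected app.
virus_family_1 = stub_plain + b"<encrypted body of family 1>"
virus_family_2 = stub_junk + b"<encrypted body of family 2>"
clean_app = stub_plain + b"<encrypted body of a legitimate program>"
unprotected_app = b"<plain program without any unpacker stub>"


def scan(data: bytes) -> str:
    """Return one detection name for anything carrying the stub."""
    return "Packed/AA.gen" if UNPACKER_SIG.search(data) else "clean"


if __name__ == "__main__":
    for name, data in [("virus_family_1", virus_family_1),
                       ("virus_family_2", virus_family_2),
                       ("clean_app", clean_app),
                       ("unprotected_app", unprotected_app)]:
        print(f"{name:16} -> {scan(data)}")
```

Both families end up under the same made-up name, "Packed/AA.gen", and the legitimate protected program is flagged as well, because the signature only ever looks at the shared stub.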
-- the Reverse naM