
Android byte-code obfuscation challenge

By Axelle Apvrille | July 30, 2012

DexLabs' @thuxnder has recently posted an Android challenge which is interesting both as a challenge and as a proof of concept (PoC), because it shows how to fool Dex disassemblers.

Basically, his strategy consists of an opaque branch condition, always true at runtime, that jumps over the next instruction, a fill-array-data-payload Dalvik instruction. Further Dalvik instructions then follow the fill-array-data-payload.

Most disassemblers decode one instruction after the other (a linear sweep). Trusting the size declared by the fill-array-data-payload, they interpret the final instructions as meaningless array data rather than as code. Such obfuscation strategies are well known on x86, but they still work here because Dalvik disassemblers are still in their infancy.
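To make the trick concrete, here is a minimal Python sketch, not taken from @thuxnder's write-up, of how a linear sweep sizes a fill-array-data-payload. According to the Dalvik bytecode format, the payload starts with the identifier 0x0300, a 16-bit element width and a 32-bit element count, and spans (size * element_width + 1) / 2 + 4 code units of 16 bits each. A disassembler that trusts this header skips that many units, whatever they really contain. The function name and the sample header are hypothetical.

```python
import struct

# Identifier of the fill-array-data-payload pseudo-instruction in Dalvik bytecode.
FILL_ARRAY_DATA_PAYLOAD_IDENT = 0x0300


def payload_size_in_code_units(code: bytes, offset: int = 0) -> int:
    """Return how many 16-bit code units a fill-array-data-payload spans.

    `code` is little-endian Dalvik bytecode; `offset` is the byte offset
    of the payload header (ident, element_width, size).
    """
    ident, element_width, size = struct.unpack_from("<HHI", code, offset)
    if ident != FILL_ARRAY_DATA_PAYLOAD_IDENT:
        raise ValueError(f"no fill-array-data-payload at offset {offset}")
    # 4 code units of header, then the data rounded up to whole code units.
    return (size * element_width + 1) // 2 + 4


# Hypothetical header announcing 10 one-byte elements: a linear sweep skips
# 9 code units (4 of header + 10 bytes of "data"), even if they hold real code.
header = struct.pack("<HHI", FILL_ARRAY_DATA_PAYLOAD_IDENT, 1, 10)
print(payload_size_in_code_units(header))  # 9
```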

Please see @thuxnder's write-up and illustrations for more information.

However, there are other ways to disassemble code. For instance, a disassembler that follows the control flow could notice that the first branch is always taken, so that the instruction right after it is never executed, and consequently interpret the code that follows as real instructions.
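The difference between the two approaches can be illustrated with a toy model in Python. This is a sketch under loud assumptions: instructions are plain tuples rather than real Dalvik encodings, the opaque predicate is modeled as a hypothetical `if-eq v0, v0` (the challenge's actual predicate may differ), and the function names are made up for illustration.

```python
from typing import List, Tuple

# Toy program: one tuple per instruction slot. This is NOT the real Dalvik
# encoding; it only models the layout described above.
PROGRAM: List[Tuple] = [
    ("if-eq", "v0", "v0", 2),          # 0: hypothetical opaque predicate, always taken
    ("payload", 3),                    # 1: payload header claiming the next 3 slots as data
    ("const-string", "v1", "hidden"),  # 2: real code hidden inside the "data"
    ("invoke", "v1"),                  # 3
    ("return-void",),                  # 4
]


def linear_sweep(prog: List[Tuple]) -> Tuple[List[int], List[int]]:
    """Decode slot after slot, trusting payload headers to mark data."""
    code, data, i = [], [], 0
    while i < len(prog):
        code.append(i)
        if prog[i][0] == "payload":
            n = prog[i][1]
            data.extend(range(i + 1, min(i + 1 + n, len(prog))))
            i += 1 + n
        else:
            i += 1
    return code, data


def follow_control_flow(prog: List[Tuple], entry: int = 0) -> List[int]:
    """Decode only reachable slots, pruning a fall-through that cannot happen."""
    seen, work = set(), [entry]
    while work:
        i = work.pop()
        if i in seen or not 0 <= i < len(prog):
            continue
        seen.add(i)
        op = prog[i][0]
        if op == "if-eq":
            _, a, b, target = prog[i]
            work.append(target)
            if a != b:  # v0 == v0 always holds, so only differing operands may fall through
                work.append(i + 1)
        elif op == "goto":
            work.append(prog[i][1])
        elif op not in ("return-void", "payload"):
            work.append(i + 1)
    return sorted(seen)


print(linear_sweep(PROGRAM))         # ([0, 1], [2, 3, 4])
print(follow_control_flow(PROGRAM))  # [0, 2, 3, 4]
```

The linear sweep reports the hidden slots 2 to 4 as data, whereas the flow-following pass decodes them as code. Real tools need much more (for example, constant propagation to prove that a predicate is opaque), so treat this only as an illustration.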

Fortunately, IDA Pro more or less understands the code correctly. Figure 1, below, shows how IDA Pro renders an obfuscated method named download().

There is an initial test, if-eq, which is always true and jumps to a bunch of bytes. Press C to tell IDA Pro that those bytes are Dalvik instructions and not pure data, and see the difference in Figure 2.

Figure 1. De-obfuscated methods with IDA Pro

Figure 2. Dalvik byte-code is now readable

The challenge is to find a password (see Figure 3).

Figure 3. Main screen of the challenge when launched.

Thanks to @thuxnder for letting me post about this interesting challenge. Please be aware that the rest of this post spoils the fun: if you intend to try the challenge, stop reading here and give it a try first.

We inspect methods of the DropActivity class, as that's where the interesting code lies. There are 7 methods: down_exec, download, exec, genchecksum, onCreate, onCreateOptionsMenu and test.

As DexLabs' write-up mentions the exec() method, that's where I started my analysis. The de-obfuscated code shows that the method:

  1. builds a path string /temp/file

  2. loads the Dex file at this path

  3. loads a class named 'bad'

  4. gets a method named 'getFlag' in that class (this is not visible in the illustration, but believe me, it does so afterwards :)

  5. invokes the method, which returns a string

Of course, I was extremely curious to know which Dex file the challenge was loading. Moreover, note that the challenge's package is named org.dexlabs.poc.dexdropper, i.e. 'dexdropper'. So I then looked at the download() method.

The method builds a URL to which it appends the filename 'payload.apk'.

invoke-direct                   {v10, v8}, (ref) imp. @ _def_String__init_@VL>
new-instance                    v11,
new-instance                    v12,
invoke-static                   {v10},
move-result-object              v13
invoke-direct                   {v12, v13}, (ref) imp. @ _def_StringBuilder__init_@VL>
const-string                    v13, aPayload_apk # "payload.apk"
invoke-virtual                  {v12, v13},

But where is the beginning of the URL? It is decrypted just before:

; initializing the encrypted array data
  new-array                       v7, v11,
  fill-array-data                 v7, encrypted_arraydata
; decrypting each byte of the URL.
; v5 is the loop index: goes from 0 to the length of the array
; v7 is the encrypted array
  array-length                    v11, v7
  if-lt                           v5, v11, decrypt_url
; performs an XOR with 0x23
  aget-byte                       v11, v7, v5
  xor-int/lit8                    v11, v11, 0x23
  int-to-byte                     v11, v11
  aput-byte                       v11, v8, v5
  add-int/lit8                    v5, v5, 1
  goto                            loop_decrypt_url
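For readers who prefer Python to IDC, the loop above boils down to a single-byte XOR with 0x23. The sketch below is a hypothetical equivalent, not the script used for Figure 4, and since the real encrypted bytes are not reproduced in this post, it round-trips a made-up URL instead.

```python
XOR_KEY = 0x23  # single-byte key used by the decryption loop in download()


def xor_decrypt(encrypted: bytes, key: int = XOR_KEY) -> bytes:
    """XOR every byte with the same key, like the aget-byte/xor-int/aput-byte loop."""
    return bytes(b ^ key for b in encrypted)


# Hypothetical ciphertext: XOR with the same key is its own inverse.
ciphertext = xor_decrypt(b"http://example.com/")
print(xor_decrypt(ciphertext).decode())  # http://example.com/
```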

In Figure 4, we apply a simple IDC script to XOR the encrypted array and reveal the cleartext URL.

Figure 4. Applying an XOR script to reveal the plaintext URL.

So we download the other package from the decrypted URL and, of course, analyze it. Actually, the payload is very small and carries no interesting information apart from a bad.getFlag() method that returns the string "payload has been downloaded and executed". It's a challenge, not malware :) Good.

So, basically, we know what the challenge is meant to do: download a payload, execute it, and display that string. But how can we trigger this functionality? Obviously, we've got to find a password.

If we inspect genchecksum(), we see that it computes and returns the MD5 hash of a file. At this point, I can't resist saying that MD5 is crap, and that it is not a checksum but a hash function. Anyway, the caller of genchecksum() is down_exec(). This method puts the pieces together: it downloads the payload (download()), computes its MD5 hash (genchecksum()) and, if the hash matches the expected value, executes it (exec()).
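The check performed by down_exec() can be sketched in Python as follows. This is an analogue under assumptions, not the app's code: the function names are hypothetical, the digest is assumed to be compared as a lowercase hex string, and the challenge's expected hash is not given here, so the example uses the MD5 of the bytes `abc`.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of a file, read in chunks (a Python analogue of genchecksum())."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def may_execute(path: Path, expected_md5: str) -> bool:
    """down_exec()-style gate: only hand the file to exec() on a hash match."""
    return md5_of_file(path) == expected_md5.lower()


# Hypothetical payload standing in for the downloaded payload.apk.
with tempfile.TemporaryDirectory() as tmp:
    payload = Path(tmp) / "payload.apk"
    payload.write_bytes(b"abc")
    print(md5_of_file(payload))  # 900150983cd24fb0d6963f7d28e17f72
    print(may_execute(payload, "900150983cd24fb0d6963f7d28e17f72"))  # True
```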

Where is the password? To be perfectly honest, I then analyzed the test() method. Test functions make me suspicious ;)

Figure 5. De-obfuscated code of method test()

The topmost lines of Figure 5 show some encrypted data being written to a byte array. Then, oh hooo, this byte array is decrypted. It is not a simple XOR, but a chained XOR: the first byte is XORed with 0x00, and each following byte is XORed with the previous encrypted byte. In cryptology, this chaining resembles Cipher Block Chaining (CBC) mode, with 0x00 as the initialization vector.
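Here is a minimal Python sketch of that chained XOR, starting from 0x00: decryption is simply p[i] = c[i] XOR c[i-1]. The helper names and the sample data are hypothetical, since the actual array from test() is not reproduced in this post.

```python
def chained_xor_decrypt(ciphertext: bytes, iv: int = 0x00) -> bytes:
    """p[i] = c[i] ^ c[i-1], where c[-1] is the initial value (0x00 here)."""
    plaintext, previous = bytearray(), iv
    for c in ciphertext:
        plaintext.append(c ^ previous)
        previous = c  # chain on the previous *encrypted* byte
    return bytes(plaintext)


def chained_xor_encrypt(plaintext: bytes, iv: int = 0x00) -> bytes:
    """Inverse operation: c[i] = p[i] ^ c[i-1]."""
    ciphertext, previous = bytearray(), iv
    for p in plaintext:
        previous = p ^ previous
        ciphertext.append(previous)
    return bytes(ciphertext)


# Hypothetical data to show the round trip.
secret = chained_xor_encrypt(b"flag")
print(chained_xor_decrypt(secret))  # b'flag'
```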

I wrote a quick XOR program and decrypted the value:

I downloaded that file and read the following message:

$ cat flag_dexloaderobf_07201
this is not the password you are looking for ;)

if you are interested in Android RE and obfuscation techniques, send us an email

Argh! So, this is not the password :(

Let's look at Figure 5 again. Once the array has been decrypted, the plaintext is put in v1 and ... not used. However, we see that the method retrieves the password entered in the text field and compares it with an expected value stored in the field DropActivity_exi___.

This field, a 5-byte array, is filled in the class constructor, but it is encrypted :(

const/4                         v0, 5
new-array                       v0, v0,
fill-array-data                 v0, arraydata_3597C
iput-object                     v0, this, DropActivity_exi___

It is decrypted in the onCreate() method. Once again, the algorithm is an XOR, this time keyed with the length of the array, which is also the length of the password, i.e. 0x05.

  iget-object                     v1, this, DropActivity_exi___
  array-length                    v1, v1
  if-lt                           v0, v1, decrypt_password ; decrypt each byte

  iget-object                     v1, this, DropActivity_exi___
  iget-object                     v2, this, DropActivity_exi___
  aget-byte                       v2, v2, v0
  iget-object                     v3, this, DropActivity_exi___
  array-length                    v3, v3
  xor-int/2addr                   v2, v3 ; XOR with the length of the array
  int-to-byte                     v2, v2
  aput-byte                       v2, v1, v0
  add-int/lit8                    v0, v0, 1 ; increment counter
  goto                            loop_decrypt_pass

So, you simply have to XOR the ciphertext 0x73, 0x6a, 0x62, 0x64, 0x77 with 0x05. The result is: vogar. That's a small town in Iceland.
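The arithmetic is easy to double-check in Python, using the ciphertext bytes quoted above and, as in onCreate(), the array length as key:

```python
# Encrypted password bytes stored in DropActivity_exi___ by the constructor.
encrypted = bytes([0x73, 0x6A, 0x62, 0x64, 0x77])

# onCreate() XORs each byte with the array length, i.e. 5.
key = len(encrypted)
password = bytes(b ^ key for b in encrypted).decode("ascii")
print(password)  # vogar
```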

Figure 6. Challenge is solved!

Apart from the fun, this challenge also demonstrates a fairly simple way to fool most disassemblers. IDA Pro is not completely fooled, and since the challenge was released, Androguard has been patched.


-- the Crypto Girl

Update July 30th: a few updates, thanks to @thuxnder. The important Dalvik instruction is fill-array-data-payload (not fill-array-data).
