From Geost to Locker: Monitoring the Evolution of Android Malware Obfuscation
We looked into the evolution of an Android malware's obfuscation methods through samples nearly a year apart, Geost and Locker. Adding context to this discussion is the discovery that the authors of the malware used an external obfuscation service.
In 2019, I looked into Geost, an Android trojan with interesting layers of obfuscation. This entry serves to show how its obfuscation method has evolved by comparing my findings from 2019 with new samples from 2020. It is also part of a larger research endeavor done with Masarah Paquet-Clouston, Maria Jose Erquiaga, and Sebastian Garcia.
Our joint investigation started with researchers looking into the activity of an Android trojan botnet. They discovered that the adversary group was in fact using an external service for APK obfuscation that had not been observed before. They then scrutinized this service by understanding its usage and uncovering its clients. We share the findings from our joint investigation in a paper and three blog entries.
This blog describes the technical aspects of the obfuscation methods that were used and the evolution of the obfuscator's code over time, as reflected in different samples. The other two blogs can be found at GoSecure and Stratosphere. They focus on topics surrounding the obfuscation method, including its discovery, its effectiveness in hiding malware, and its economic value for adversaries.
As mentioned earlier, I first analyzed a Geost sample in 2019. Nearly a year later, I analyzed newer samples as part of the joint investigation. The second time, I analyzed three applications that were obfuscated with the same service: a Locker application, an SMS stealer, and a piece of adware. This blog concentrates on the differences between the Geost malicious application from 2019 and the Locker-obfuscated application from 2020. These differences show the evolution of the code generated by the obfuscation service in approximately a year.
It is evident in the preceding figure that all symbol names were obfuscated. The previous figure also shows the first level of obfuscation that I encountered. Interestingly, in one package, original class names were replaced by consecutively generated strings starting from one letter (such as “a” to “z” ) to two letters (such as “aa”, “ab”, and so on), with the last in the package ending in “fm”. The rest of the class names were replaced by randomly generated strings with lengths ranging from six to 12 characters, as in the first stage. It is probable that the change was done by the service obfuscator. It is also likely that symbols that were already in the former malicious APK had been obfuscated prior to the external obfuscation.
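The sequential naming scheme described above can be reproduced with a short sketch. It assumes that the obfuscator walks through every lowercase name in order without skipping any (for example, reserved words such as "do"), which the samples do not confirm; the function names are hypothetical.

```python
from itertools import count, product
from string import ascii_lowercase


def sequential_names():
    """Yield "a".."z", then "aa", "ab", ... in the order described above."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def names_up_to(last):
    """Return every generated name up to and including `last`."""
    # Guard against inputs that the generator would never produce.
    if not last or any(ch not in ascii_lowercase for ch in last):
        raise ValueError("expected a non-empty lowercase ASCII name")
    names = []
    for name in sequential_names():
        names.append(name)
        if name == last:
            return names


if __name__ == "__main__":
    # Under the no-gaps assumption, this is an upper bound on the number
    # of classes in the sequentially named package.
    print(len(names_up_to("fm")))
```

Under that assumption, a package whose last class is "fm" would hold 169 classes; any skipped names would lower the count.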
Strings in the malicious Geost sample were replaced by scrambled byte arrays; indeed, this is another level of obfuscation. A descrambling method is inserted in each class that formerly contained at least one string. This method always has three arguments that define the parameters of a descrambling algorithm; however, arguments are randomly ordered in each method to complicate the descrambling task. As a consequence, a single descrambling Python script did not suffice, and modifications had to be made for each class. An example of a descrambling method is illustrated in Figure 2.
This approach makes reverse engineering the second stage at least as difficult as, if not more difficult than, the first. The same approach also proved to be an effective way of preventing unwanted information leaks compared to other malicious APK samples.
Higher amount of junk code
The amount of junk code in samples from mid-2020 is significantly higher than the amount in Geost samples from nearly a year earlier. This impression is supported by the numbers in Table 1. In addition to a few dropper code changes, the number of junk classes and the number of lines of junk code have both increased significantly.
| Sample | Dropper + Reflection Code: Classes | Dropper + Reflection Code: Lines | Dropper + Reflection Code: Size (kB) | Attached Libraries: Classes | Attached Libraries: Lines | Attached Libraries: Size (kB) | Total: Classes | Total: Lines | Total: Size (kB) |
|---|---|---|---|---|---|---|---|---|---|
| Geost | 52 | 12098 | 751 | 115 | 4553 | 145 | 167 | 16651 | 896 |
| Locker | 363 | 20979 | 1009 | 352 | 22087 | 691 | 715 | 43066 | 1701 |
| SMS_1 | 357 | 20657 | 1010 | 1424 | 86543 | 2970 | 1781 | 107200 | 3981 |
| SwimmingPool | 366 | 21585 | 1043 | 352 | 22092 | 697 | 718 | 43677 | 1740 |
Table 1. A comparison of the complexities of different dropper APK samples
Encrypted malicious APK file name
In the case of the Geost sample, the encrypted second-stage DEX file is stored in a file named “.cache” in the root APK directory. This file is renamed to “.localsinfotimestamp1494987116” before being decrypted and stored in the “ydxwlab.jar” file. In all the newer samples, there is also a directory named “tracks” in the APK root directory, as seen in Figure 3. This directory contains several randomly named files.
Files smaller than 100 bytes contain randomly generated strings. The file “online.m3u” contains the path “tracks/radio.ogg”. This radio.ogg file contains an encrypted malicious payload.
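A minimal sketch of how an analyst might follow this indirection is shown below. It assumes that the playlist file can be found by its base name anywhere in the archive and that its first non-comment line is the payload path; the function name find_payload is hypothetical and this is not the dropper's own logic.

```python
import io
import zipfile


def find_payload(apk_bytes):
    """Return (path, encrypted_bytes) for the file named by online.m3u, or None."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        names = apk.namelist()
        # Assumption: the playlist is identified by its base name only.
        playlist = next(
            (n for n in names if n.rsplit("/", 1)[-1] == "online.m3u"), None
        )
        if playlist is None:
            return None
        text = apk.read(playlist).decode("utf-8", errors="replace")
        # Take the first non-empty line that is not an M3U comment.
        target = next(
            (ln.strip() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")),
            None,
        )
        if target is None or target not in names:
            return None
        return target, apk.read(target)
```

The returned bytes are still encrypted; the sketch only locates the file.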
Notably, there is also a small variation from the previous Geost sample: a very simple header is now prepended to the encrypted data. This header contains the length, encoded as four little-endian bytes, as indicated in Figure 4. Also, the name of the temporary decrypted DEX file has been changed to “skjoxawp.jar”.
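Reading such a header takes only a few lines. The sketch below assumes that the four bytes sit at offset 0 and form an unsigned integer; it returns the declared length without checking it against the remaining data, since the text does not state exactly what the length covers. The function name split_header is hypothetical.

```python
import struct


def split_header(blob):
    """Split a payload file into (declared_length, remaining_bytes)."""
    if len(blob) < 4:
        raise ValueError("file is too short to hold a 4-byte header")
    # "<I" means little-endian, unsigned 32-bit (signedness is an assumption).
    (declared_length,) = struct.unpack_from("<I", blob, 0)
    return declared_length, blob[4:]
```

For example, a file starting with the bytes 05 00 00 00 declares a length of 5.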
String concatenation method
Even though all strings used in the first stage are encrypted, for some reason three of the strings are split into two parts and then concatenated by an obscure method that I have named combineStrings. This method concatenates the strings passed as the first and third arguments only if the second argument is greater than zero. The split strings are “forName”, “android.app.Instrumentation”, and “newApplication”. This detail shows that the authors of the obfuscation code are especially cautious with these strings. All three strings are used in the core function, which loads and launches the second stage's code, as seen in Figure 5.
Here, the difference between the older and newer samples is evident only in the branch of code where the second argument is less than or equal to zero (see Figure 6), which is never actually executed. As a result, the reason for the change is unknown.
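The observable behavior of combineStrings can be modeled as follows. This is only a Python model of the executed branch, not the decompiled code; the points at which the strings are split and the argument value 1 are hypothetical, and the other branch is left unimplemented because it is never executed and differs between samples.

```python
def combine_strings(first, flag, third):
    """Model of combineStrings: join first and third when flag is positive."""
    if flag > 0:
        return first + third
    # The samples never reach this branch, so it is not modeled here.
    raise NotImplementedError("branch is never executed in the samples")


if __name__ == "__main__":
    # Hypothetical split points for the three strings named in the text.
    print(combine_strings("for", 1, "Name"))
    print(combine_strings("android.app.", 1, "Instrumentation"))
    print(combine_strings("new", 1, "Application"))
```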
Invoke method evolution
As with my previous entry, part of the code obfuscation is done by replacing certain system method calls with Java Reflection equivalents. In this method, the string arguments for reflection calling are encrypted. Figure 7 shows an example of this method.
I have named this method invokeMethod. Its main task is to validate arguments and call the corresponding Method.invoke(). In the new samples, this method is much more complex than the version used for Geost. This is one of the main indications that the dropper code is still evolving and that the actors behind this service are still investing in code development.
The difference in complexity between both versions is illustrated in Figures 8 to 11. Figure 8 shows the invokeMethod of Geost, while Figures 9 to 11 show the code of the same method for Locker.
False flag code
Among the parts of the dropper code that were already described, the false flag code is probably the least important. Still, it is worth mentioning since, surprisingly, changes were made to it in the new samples. All the strings in the dropper are encrypted and the symbol names are obfuscated, with the exception of the HttpURLConnection class and method symbols in the Geost sample. After deobfuscation, it is clear that the related parts of the code are never executed. This could be an additional diversionary tactic devised by the author, meant to redirect the analyst’s attention to a false flag. This false flag mimics what appears to be the command-and-control (C&C) opening sequence, as shown in Figure 12.
This code was placed in invokeMethod() in the Geost sample. It should also be noted that the newer samples don’t use the HttpURLConnection class. Instead, they use the SQLiteQueryBuilder class with other related methods, placing the false flag in the deleteFile() method, as seen in Figure 13.
Potential for automated payload decryption
All the strings in the first stage, together with the payload, are encrypted with the RC4 algorithm, and the decryption key is hardcoded. This is true for all analyzed samples, both new and old. It allows scripted payload decryption and further analysis once the key is found.
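Because RC4 is symmetric, a scripted decryptor only needs the standard algorithm and the recovered key. The following is a plain textbook RC4 sketch rather than code taken from the samples; the key used in the usage example is a well-known public test value, not a key from any sample.

```python
def rc4(key, data):
    """Encrypt or decrypt `data` with RC4 (the operation is its own inverse)."""
    if not key:
        raise ValueError("key must not be empty")
    # Key-scheduling algorithm (KSA)
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the input
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


if __name__ == "__main__":
    ciphertext = rc4(b"Key", b"Plaintext")
    print(ciphertext.hex())
    print(rc4(b"Key", ciphertext))
```

In practice, the `data` argument would be the bytes of the encrypted payload file and `key` the byte array recovered from the dropper.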
Fortunately, finding the key is not difficult, as there is a declaration of a byte[] variable that is immediately initialized with array values containing the key. Aside from this one byte[] variable, I did not find another such declaration. The rest of the array variables that are initialized at declaration are of int type. Byte array initialization literals do appear elsewhere, but they are placed directly as string constructor arguments.
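A search for such a declaration can be scripted against decompiled Java source. The sketch below assumes that the decompiler emits the initializer on the declaration itself, in the form `byte[] name = {...}` or `byte[] name = new byte[]{...}`; real decompiler output may differ, and the function name find_byte_arrays is hypothetical.

```python
import re

# byte[] declarations with an immediate initializer list
DECLARATION = re.compile(
    r"byte\s*\[\]\s*(\w+)\s*=\s*(?:new\s+byte\s*\[\]\s*)?\{([^{}]*)\}"
)
# Decimal or hexadecimal literals, optionally negative (Java bytes are signed)
NUMBER = re.compile(r"-?(?:0[xX][0-9a-fA-F]+|\d+)")


def find_byte_arrays(source):
    """Map each initialized byte[] variable name to its value as bytes."""
    found = {}
    for name, body in DECLARATION.findall(source):
        values = [
            int(token, 16) if "x" in token.lower() else int(token)
            for token in NUMBER.findall(body)
        ]
        # Fold signed Java byte values into the 0..255 range.
        found[name] = bytes(value & 0xFF for value in values)
    return found
```

If only one byte[] declaration exists, as described above, the single entry in the returned mapping is the candidate key.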
Conclusion
The analysis of the samples that were obfuscated by the automated obfuscation service proves that the actors behind this service are still investing in its improvement. Thankfully, the changes in the code seem to have little to no impact yet on the quality of the obfuscation and final detection rate. Aside from the changes in the dropper code, the amount of inserted junk code has significantly increased. The file that contains the payload has been enriched by the change in header — possibly as an attempt to improve decryption reliability or to make decryption of the payload more complicated. Both the location and the name of the payload have also been changed to mimic a set of audio track files.
Despite these changes, characteristics of both the old and new samples make them possible and easy to decrypt. The fixed name of the directory and payload files, for example, makes the retrohunting of malicious files simple. Strings from the malicious payload that are stored in the APK resource files also make detection and retrohunting easier as these strings allow payload family classification.
However, despite these pitfalls, it is still worth monitoring the possible evolution of this obfuscation service. Thousands of samples obfuscated by this service also attest to its usability. It is, after all, easy to use and decreases the chances of detecting the obfuscated APKs. Generally, therefore, it succeeds in its primary goal of hiding malicious content.
This blog post is the result of a collaboration among Masarah Paquet-Clouston from GoSecure, Vit Sembera from Trend Micro, and Maria Jose Erquiaga and Sebastian Garcia from the Stratosphere Laboratory.