While in my hotel room in Paris during Hexacon 2022, I started drafting what would later become our binary zero-day identification feature. The idea was rather simple: executable binary goes in, vulnerabilities go out. In this blog post, I aim to provide insight into the design and implementation decisions made during the development of this feature, recognizing their potential value to security researchers in the community, as I believe transparency in our process is crucial before releasing any new functionality.

The ONEKEY Product Cybersecurity and Compliance Platform already supports a large array of firmware formats thanks to unblob and can uncover security issues in many different areas, such as insecure configurations, hardcoded credentials, insecure communications, and lack of binary hardening. It also performs zero-day identification in scripting languages such as PHP, Lua, and Python, a feature we released last year.

Even though we already covered a large area, one final frontier remained: executable binaries. Of course, we were already supplying detailed information about those binaries such as symbols, imported libraries, or binary hardening flags. This approach allows our customers to pinpoint enhancements, like enabling binary hardening flags for all their binaries, thereby increasing the cost of exploit development in the event of a vulnerability being discovered in any of them.

An executable binary is a file that holds instructions in a format that the computer’s CPU can directly understand and execute. This file typically has a specific format, such as PE on Windows or ELF on Linux. The file is usually obtained by compiling code from languages like C or C++.

Still, we were not looking into those binaries for dangerous code paths or insecure function calls, leaving our customers blind to some risks that may be lurking in the dark. We therefore agreed on two things with the team: we would build a framework to shed some light on those areas, and we would take our time building it properly rather than rush it, since we were quite aware of existing tools' limitations.

Most platforms claiming to perform “zero-day” identification have some shortcomings:

This generates too much noise for analysts to skim through, and another, more meticulous approach is needed to reduce the noise, filter out false positives, and supply a meaningful risk assessment.

Prioritizing Binaries 

The platform automatically decides which binaries are analyzed, so you do not need to investigate and click through them manually. A binary is selected for further analysis if it is remotely reachable by attackers. To the platform, remotely reachable means the executable consumes data over the network, either as a client or as a server, including executables launched by front-end servers, such as through inetd or as CGI scripts.

This way, anything that gets reported by the platform has the potential to be remotely exploitable. 

This noise reduction approach is at the heart of everything we do when adding new features to the product. We will always prioritize a low false positive rate over a low false negative rate. This approach does not prevent us from being aware of its limits. We know, for example, that our current analysis pipeline will not report vulnerabilities in executable binaries that are not directly exposed but still can be exploited remotely. This includes “second order vulnerabilities” where, for example, an executable is reading from a file or shared memory created and written to by a remotely exposed binary. This is an area of improvement we want to cover in the future. 

Gaining Visibility into Executable Binaries 

Our first goal is to gain access to the source code that was compiled to produce the executable binary. This process is known as decompilation and was pioneered by Dr Cristina Cifuentes. IDA Pro has been the de facto disassembler and decompiler for many years, but the last 10 years have seen the emergence of some solid competitors like radare2, Binary Ninja, Hopper, rev.ng, and Ghidra.

Decompilation is the process of reversing the compilation process. When a program is written in a high-level programming language like C++ or Java, it needs to be translated into machine code (or an intermediate code) that the computer’s CPU can execute. This translation is done through a process called compilation, resulting in an executable binary file.

Decompilation, on the other hand, involves taking that compiled binary and trying to convert it back into a higher-level programming language or something close to it. It’s like reverse engineering the original source code from the compiled program.

After some experiments, we decided to use Ghidra as our decompilation backbone for the following reasons:

The next step was to choose whether we would apply static analysis to disassembly, intermediate language (IL), or decompiled code. While we’re quite familiar with Intermediate Language like ESIL and P-Code, it seemed to us that working on decompiled code would be the best way forward.

An Intermediate Language (IL) representation, also known as Intermediate Code or Intermediate Representation, is an abstraction used in the compilation process of programming languages. It sits between the high-level source code written by the programmer and the low-level machine code executed by the computer’s CPU.

This assumption was based on a few low-signal observations. Among them:

After backing up this theory with a few experiments of our own, we knew that we wanted to apply static code analysis to decompiled code obtained by Ghidra. For anyone working on similar analysis pipelines, we also consider Ghidra’s P-Code to CPG to be an approach worth pursuing. The requirement that our end users should be able to walk through the decompiled code to assess findings led us to use decompiled code, among other things.

Enriching Ghidra Decompiled Code 

One important aspect of working on decompiled code is to understand the limitations of your decompiler. On that subject, an excellent paper from Dramko et al., A Taxonomy of C Decompiler Fidelity Issues, provides a helpful way of classifying the different issues you see in decompiled code.

Fidelity issues can be categorized as either readability issues or correctness issues. While readability is important from a manual reverser’s perspective, correctness takes precedence given our purpose.

We apply different strategies to “enrich” the decompiled code. Let’s explore two of them: data type recovery, and recompilable C.

Data Type Recovery 

One main issue in Ghidra decompiled code is that the decompiler can be very dumb when recovering variables’ data types. Even when presented with a simple C code like this:

char* my_name = "Quentin";
printf("your name: %s\n", my_name);

Ghidra will output something along the lines of:

printf(DAT_14326789, DAT_14326700); 

This is because type recovery for the data labeled DAT_14326789 did not work, so the Ghidra decompiler does not try to resolve its value when generating the decompiled code from P-Code.

It’s a major issue since we want our parser to understand that DAT_14326789 is a string so that we can apply filters on those values. For a rather naive example: a sscanf call is only problematic if the format string contains %s. So, we need the data type to be properly recovered. 

We apply this recovery pass during post-processing, before writing the decompiled code to files. 

Recompilable C 

Even if you end up using a fuzzy parser to ingest your decompiled C code, it’s a good idea to get rid of as many Ghidra quirks as possible. This will help the parser during the ingestion phase, improve data representation, and help you write better queries over the recovered Abstract Syntax Tree. 

The fine folks at Johns Hopkins APL introduced the concept of “Recompilable C” in their presentation about CodeCut (editor’s note: I can’t find their YouTube presentation back for the life of me, so if you’re the author or know where this talk is, get in touch!). In it, they list what blocks decompiled C code from being recompiled. Additionally, they show what can be done to improve decompiled code so that it can be ingested by a parser.

Specifically, the key issues they saw are: 

We therefore apply different strategies to get rid of these quirks, helping the language parser in the process.

Testing our Code Scanner

With our decompiler chosen and its output enriched so that our parser can easily ingest it, we needed to test our code scanner.

The code scanner integration is unit tested and our rules are validated using integration test files. If you write a new rule, you have to write C code corresponding to positive and negative matching scenarios. This C code is compiled with GCC using different optimization levels, and each compiled ELF file is then decompiled into integration test files. This way we can immediately spot if a decompilation artifact would block a rule from working.

Further Noise Reduction

Our aim is to report issues that are remotely exploitable. For the exploitable part, we had to take care of systemic false positives that would get our customers into constant alert fatigue. 

Unsafe Copy from Uncontrolled Files 

One of our taint tracking sources considers file input. For example, a command injection or buffer overflow could be triggered by controlling the content of a file being read by an executable binary.

Let’s look at this code snippet taken from an old busybox ‘arp_show’ implementation: 

int arp_show() {
    char line[256];
    char ip[32];
    char hw[32];
    char flags[32];
    char addr[32];
    FILE *__stream;

    __stream = fopen("/proc/net/arp","r");

    if (__stream == (FILE *)0x0) { 
        puts("no proc fs mounted!"); 
    } 
    else { 
        fgets(line,0x100,__stream); 
        sscanf(line, "%s %s %s %s", ip, hw, flags, addr);
    }
}

A naive static analysis tool would consider that you have an unsafe copy operation since you’re reading 256 bytes from the file and copying unbounded content to 32 bytes long buffers. 

It’s bad. Or is it? The thing is that /proc/net/arp content can’t be controlled, which means there’s no way for this code path to be exploited. Other examples include unsafe data copies from /proc/partitions.

To filter out this noise, our ruleset now verifies that any content read from a file comes from a file that can be manipulated by an unprivileged user.

Expected Behaviors

We consider environment variables as a valid source in our taint tracker. For example, CGI scripts are known to consume data sent by the HTTP server through environment variables. Something we quickly found was that we were matching on exploitable paths, but these paths were expected behavior.

For example, many Linux binaries rely on the PAGER environment variable to select which utility to launch when you run, say, git log. Others are documented behavior, like LDAPRC (OpenLDAP) or LIBSMB_PROG (Samba).

Since we still want to report dangerous unchecked use of environment variables, we implemented a filter-list approach, only considering environment variables not in the filter list as valid sources. In the process, we discovered that these environment variables could be abused in GTFOBins-style scenarios, but that is something for another blog post.

Unsafe Copy, Large Enough Buffers 

We also quickly figured out that naive rules to detect unsafe copy operations (e.g. strcpy, strcat) won’t cut it if we want to limit the noise. That’s why we use a combined approach that filters out scenarios where the destination buffer is dynamically allocated based on the source size, or where the source buffer size is checked against a constant bound prior to the copy operation. This ensures that the vulnerable code path is not only reachable, but also actively exploitable.

What It Looks Like 

To end users, the interface looks exactly the same as for static code analysis of scripting languages. The left pane shows the propagation of user controlled input from source to sink. The right pane shows the decompiled code. The highlighted line indicates where the user controlled data is being read, manipulated, or used.

As you click through the propagators, you follow the tainted data right up to the location where it’s used: in this example, a call to system(), meaning arbitrary command injection.

What Can You Expect? 

We’re slowly rolling out the feature to our customers using an opt-in approach, so that customers who only want SBOM (Software Bill of Materials) and compliance keep the platform as it is, while others can experience the new zero-day identification feature.

The binary analysis feature currently looks for the following vulnerability classes: 

Soon, the platform will also report: 

Our Results So Far 

So far, in the sample set of firmware images we used for testing and evaluation of new features, we’ve found hundreds of vulnerabilities affecting more than 20 vendors. We’re currently in the CVD process and will publish details as the bugs get fixed.

With our upcoming advisory on Cisco WAP, you’ll be able to see it in action. We were inspired by an advisory from Synacktiv and scanned the firmware to find dozens of variants.

Our plan for the near future is to expand the ruleset while improving the signal-to-noise ratio, collaborate with our customers in designing custom rules adapted to their needs, and continue digging through the pile of vulnerabilities we still have to triage.