Binary Static Analysis - The Final Frontier
While in my hotel room in Paris during Hexacon 2022, I started drafting what would later become our binary zero-day identification feature. The idea was rather simple: executable binary goes in, vulnerabilities go out. In this blog post, I aim to explain the design and implementation decisions we made while developing this feature. I believe they can be valuable to security researchers in the community, and that being transparent about our process is crucial before releasing any new functionality.
The ONEKEY Product Cybersecurity and Compliance Platform already supports a large array of firmware formats thanks to unblob, and can uncover security issues in many different areas, such as insecure configuration, hardcoded credentials, insecure communications, lack of binary hardening, and, since last year, zero-day vulnerabilities in scripting languages such as PHP, Lua, or Python.
Even though we already covered a lot of ground, one final frontier remained: executable binaries. Of course, we were already supplying detailed information about those binaries, such as symbols, imported libraries, or binary hardening flags. This allows our customers to pinpoint improvements, like enabling binary hardening flags for all their binaries, thereby increasing the cost of exploit development if a vulnerability is ever discovered in any of them.
Still, we were not looking into those binaries for dangerous code paths or insecure function calls, leaving our customers blind to some risks that may be lurking in the dark. We therefore agreed on two things with the team: we would build a framework to shed some light on those areas, and we would take the time to build it properly rather than rush it, since we were quite aware of existing tools' limitations.
Most platforms claiming to perform “zero-day” identification do have some shortcomings:
- Some consider any binary calling a dangerous function such as strcpy to be vulnerable, even if the call is made with a constant value and the binary is not exposed to user-controlled input in any way;
- Others leave the decision of which binaries should be analyzed to their end users, as if they had the time and the technical ability to find which binaries are remotely reachable;
- They very rarely take binary hardening into consideration when assessing the risks. Should a stack buffer overflow in a binary protected with NX, PIE, ASLR, Stack Canary, and full RELRO really be reported as a high severity finding?
This generates too much noise for analysts to sift through. A different, more meticulous approach is needed to reduce the noise, filter out false positives, and provide a meaningful risk assessment.
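To make the hardening point more concrete, here is a minimal C sketch of how hardening flags could feed into a severity rating. It assumes a 64-bit ELF in the host's byte order (firmware is often 32-bit or big-endian, which this sketch does not handle) and reads only the program headers: PIE is derived from e_type == ET_DYN, NX from PT_GNU_STACK, and (at least partial) RELRO from PT_GNU_RELRO. Stack canary and full RELRO detection are left out. The hardening_t type, read_hardening, and the three-level scoring in stack_overflow_severity are hypothetical and do not describe ONEKEY's actual risk model.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hardening properties that can be read from the ELF program headers alone. */
typedef struct {
    bool pie;   /* e_type == ET_DYN: position-independent executable */
    bool nx;    /* PT_GNU_STACK present and not executable */
    bool relro; /* PT_GNU_RELRO present (partial RELRO at least) */
} hardening_t;

typedef enum { SEVERITY_LOW, SEVERITY_MEDIUM, SEVERITY_HIGH } severity_t;

/* Returns false if buf does not hold a well-formed 64-bit ELF header and program header table.
   Assumes the file's byte order matches the host's. */
bool read_hardening(const unsigned char *buf, size_t len, hardening_t *out)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return false;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return false;
    if (eh.e_phnum > 0 && eh.e_phentsize != sizeof(Elf64_Phdr))
        return false;
    if (eh.e_phoff > len || (len - eh.e_phoff) / sizeof(Elf64_Phdr) < eh.e_phnum)
        return false;

    out->pie = (eh.e_type == ET_DYN);
    out->nx = false;    /* no PT_GNU_STACK header: assume an executable stack, to be cautious */
    out->relro = false;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, buf + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_GNU_STACK)
            out->nx = (ph.p_flags & PF_X) == 0;
        else if (ph.p_type == PT_GNU_RELRO)
            out->relro = true;
    }
    return true;
}

/* Hypothetical policy: a stack overflow starts as HIGH and is downgraded as mitigations accumulate. */
severity_t stack_overflow_severity(const hardening_t *h)
{
    int mitigations = (int)h->nx + (int)h->pie + (int)h->relro;
    if (mitigations == 3)
        return SEVERITY_LOW;
    if (mitigations >= 1)
        return SEVERITY_MEDIUM;
    return SEVERITY_HIGH;
}
```

A real model would weigh each mitigation differently and also consider stack canaries, which require looking at imported symbols such as __stack_chk_fail rather than program headers.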
Prioritizing Binaries
The platform automatically decides which binaries are analyzed by default, so you do not need to investigate and select them manually. The platform selects a binary for further analysis if it is remotely reachable by attackers. To the platform, remotely reachable means that the executable consumes data over the network, either as a client or a server, including executables launched by front-end servers, such as services started through inetd or CGI scripts run by a web server.
This way, anything that gets reported by the platform has the potential to be remotely exploitable.
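As an illustration of the selection step described above, the following C sketch flags a binary as remotely reachable if it imports socket-related libc functions, references well-known CGI environment variables, or is launched from an inetd configuration entry. The symbol and variable lists, the function name is_remotely_reachable, and the assumption that imported symbols and referenced strings have already been extracted are all hypothetical; the platform's actual criteria are not described in this post. The heuristic is also crude: a socket import alone does not prove the binary reads attacker-supplied data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: libc imports suggesting the binary exchanges data over the network. */
static const char *const NETWORK_IMPORTS[] = {
    "socket", "accept", "bind", "listen", "connect", "recv", "recvfrom", "recvmsg",
};

/* Hypothetical: environment variables through which a web server passes CGI request data. */
static const char *const CGI_VARIABLES[] = {
    "QUERY_STRING", "REQUEST_METHOD", "CONTENT_LENGTH", "HTTP_COOKIE",
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns true if any of the n_items strings in items appears in list. */
static bool any_in(const char *const *items, size_t n_items,
                   const char *const *list, size_t n_list)
{
    for (size_t i = 0; i < n_items; i++)
        for (size_t j = 0; j < n_list; j++)
            if (strcmp(items[i], list[j]) == 0)
                return true;
    return false;
}

/* imports: imported symbol names; strings: strings referenced by the binary;
   inetd_service: true if an inetd configuration entry launches this binary
   (inetd hands the accepted connection to the service as stdin/stdout). */
bool is_remotely_reachable(const char *const *imports, size_t n_imports,
                           const char *const *strings, size_t n_strings,
                           bool inetd_service)
{
    if (inetd_service)
        return true;
    if (any_in(imports, n_imports, NETWORK_IMPORTS, COUNT(NETWORK_IMPORTS)))
        return true;
    /* CGI programs are started by the web server and read request data from the environment. */
    return any_in(strings, n_strings, CGI_VARIABLES, COUNT(CGI_VARIABLES));
}
```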
This noise reduction approach is at the heart of everything we do when adding new features to the product. We will always prioritize a low false positive rate over a low false negative rate. We are nevertheless aware of the limits of this approach. We know, for example, that our current analysis pipeline will not report vulnerabilities in executable binaries that are not directly exposed but can still be exploited remotely. This includes “second order vulnerabilities” where, for example, an executable reads from a file or shared memory created and written to by a remotely exposed binary. This is an area of improvement we want to cover in the future.
Gaining Visibility into Executable Binaries
Our first goal is to recover an approximation of the source code that was compiled to produce the executable binary. This process is known as decompilation and was pioneered by Dr Cristina Cifuentes. IDA Pro has been the de facto disassembler and decompiler for many years, but the last 10 years have seen the emergence of some solid competitors like radare2, Binary Ninja, Hopper, rev.ng, and Ghidra.
After some experiments, we decided to use Ghidra as our decompilation backbone for the following reasons:
- it's open source
- it's well maintained
- we're used to it
- it can be scripted in Python
The next step was to choose whether we would apply static analysis to disassembly, an intermediate language (IL), or decompiled code. While we're quite familiar with intermediate languages such as ESIL and P-Code, it seemed to us that working on decompiled code would be the best way forward.
This assumption was based on a few low-signal observations, among them:
- The Ghidra2CPG approach, which generates a Joern Code Property Graph (CPG) from Ghidra disassembly and was presented by F. Yamaguchi & C. Ursache in their talk “Ghidra2CPG: From Graph Queries to Vulnerabilities in Binary Code” at NoHatCon 2021, is interesting but has limited architecture support.
- A paper presented by Mantovani et al. in 2022, “The Convergence of Source Code and Binary Vulnerability Discovery – A Case Study”, was hopeful about closing the gap between source code and decompiled code to the point where decompiled code becomes usable for source-level analysis.
After backing up this theory with a few experiments of our own, we knew that we wanted to apply static code analysis to decompiled code obtained from Ghidra. Among other factors, the requirement that our end users be able to walk through the decompiled code to assess findings pushed us in that direction. For anyone working on similar analysis pipelines, we also consider converting Ghidra's P-Code to a CPG to be an approach worth pursuing.
Enriching Ghidra Decompiled Code
One important aspect of working on decompiled code is understanding the limitations of your decompiler. On that subject, the paper “A Taxonomy of C Decompiler Fidelity Issues” by Dramko et al. offers an excellent way of categorizing the different issues you see in decompiled code.
Fidelity issues can be categorized as either readability issues or correctness issues. While readability is important from a manual reverser's perspective, correctness takes precedence for our purpose.
We apply different strategies to "enrich" the decompiled code. Let's explore two of them: data type recovery and recompilable C.
Data Type Recovery
One major issue in Ghidra decompiled code is that the decompiler can be very dumb when recovering variables' data types. Even when presented with simple C code like this:
char *my_name = "Quentin";
printf("your name: %s\n", my_name);
Ghidra will output something along the lines of:
printf(DAT_14326789, DAT_14326700);
This is because type recovery failed for the data labeled DAT_14326789, so the Ghidra decompiler does not try to resolve its value when generating the decompiled code from P-Code.
This is a major issue since we want our parser to understand that DAT_14326789 is a string so that we can apply filters on those values. For a rather naive example: a sscanf call is only problematic if the format string contains %s. So, we need the data type to be properly recovered.
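Once a label such as DAT_14326789 has been resolved to its string value, the naive sscanf check mentioned above becomes a simple pass over the format string. The following C sketch (has_unbounded_string_conversion is a hypothetical name, not part of any library) reports %s or %[ conversions that have no maximum field width. It ignores %%, assignment-suppressed conversions such as %*s, and the POSIX m allocation modifier, since none of these write an unbounded amount of data into a caller-supplied buffer.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the scanf-family format string fmt contains a %s or %[ conversion
   without a maximum field width, i.e. one that can write an unbounded number of
   bytes into the caller's buffer. */
bool has_unbounded_string_conversion(const char *fmt)
{
    const char *p = fmt;
    while ((p = strchr(p, '%')) != NULL) {
        p++;                                   /* skip '%' */
        if (*p == '%') {                       /* "%%" matches a literal percent sign */
            p++;
            continue;
        }
        bool suppressed = false, has_width = false, allocates = false;
        if (*p == '*') {                       /* "%*s" reads input but stores nothing */
            suppressed = true;
            p++;
        }
        /* Skip field width digits, length modifiers and the POSIX 'm' allocation flag. */
        for (; *p != '\0'; p++) {
            if (isdigit((unsigned char)*p))
                has_width = true;
            else if (*p == 'm')
                allocates = true;              /* "%ms": scanf allocates the buffer itself */
            else if (strchr("hljztL", *p) == NULL)
                break;
        }
        if (*p == '\0')
            break;                             /* truncated conversion at end of string */
        if ((*p == 's' || *p == '[') && !suppressed && !has_width && !allocates)
            return true;
        if (*p == '[') {                       /* skip the scanset up to its closing ']' */
            p++;
            if (*p == '^')
                p++;
            if (*p == ']')
                p++;                           /* a leading ']' belongs to the set */
            while (*p != '\0' && *p != ']')
                p++;
            if (*p == '\0')
                break;
        }
        p++;                                   /* move past the conversion specifier */
    }
    return false;
}
```

For example, the check returns true for "%s %s %s %s" and false for "%31s %31s", which is exactly the distinction a rule needs once the format string is no longer hidden behind a DAT_ label.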
We apply this data type recovery pass during post-processing, before writing the decompiled code to files.
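A recovery pass of this kind can be approximated by checking whether the bytes behind a DAT_ label look like a C string. The sketch below is a hypothetical helper, not Ghidra API: it assumes the segment containing the address has already been loaded into memory together with its base virtual address, and it accepts printable ASCII (plus \n, \t and \r) up to a NUL terminator. A post-processing step could then replace the label with the recovered string literal, turning printf(DAT_14326789, DAT_14326700) back into something much closer to the original source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the string stored at virtual address addr, or NULL if
   the bytes there do not look like a printable, NUL-terminated C string of at least
   min_len characters. segment holds the size loaded bytes of a segment mapped at base. */
const char *recover_cstring(const uint8_t *segment, size_t size, uint64_t base,
                            uint64_t addr, size_t min_len)
{
    if (addr < base || addr - base >= size)
        return NULL;                          /* address outside this segment */
    size_t off = (size_t)(addr - base);
    size_t len = 0;
    while (off + len < size && segment[off + len] != '\0') {
        uint8_t c = segment[off + len];
        if (!((c >= 0x20 && c < 0x7f) || c == '\n' || c == '\t' || c == '\r'))
            return NULL;                      /* binary data, not text */
        len++;
    }
    if (off + len == size || len < min_len)
        return NULL;                          /* no terminator, or too short to trust */
    return (const char *)(segment + off);
}
```

The min_len threshold is a trade-off: short byte runs that happen to be printable are common in data sections, so a small minimum length avoids turning random constants into bogus strings.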
Recompilable C
Even if you end up using a fuzzy parser to ingest your decompiled C code, it’s a good idea to get rid of as many Ghidra quirks as possible. This will help the parser during the ingestion phase, improve data representation, and help you write better queries over the recovered Abstract Syntax Tree.
The fine folks at Johns Hopkins APL introduced the concept of "Recompilable C" in their presentation about CodeCut (editor's note: I can't find their YouTube presentation for the life of me, so if you're the author or know where this talk is, get in touch!). In it, they list what prevents decompiled C code from being recompiled, and show what can be done to improve decompiled code so that it can be ingested by a parser.
Specifically, the key issues they saw are:
- Code that assumes external symbols are defined elsewhere
- Ghidra intrinsics (e.g., SUB84, CONCAT44)
- Undefined types, for which Ghidra usually knows the size of access (e.g., undefined2, undefined4)
- Weird memory syntax for unaligned memory reads/writes (e.g., CTS.data_5_2_ = read_frame_copy._4_2_;)
We therefore apply different strategies to get rid of these quirks, helping the language parser in the process.
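One possible way to handle the intrinsics and undefined types is a small compatibility header prepended to every decompiled file. The C sketch below assumes Ghidra's usual conventions: undefinedN is an N-byte value of unknown type, CONCATxy(a, b) concatenates an x-byte high part a with a y-byte low part b, and SUBxy(v, c) extracts y bytes of an x-byte value starting c bytes above its least significant byte. Only the two intrinsics named above are covered. The read_u16_at helper is a hypothetical replacement target for field syntax such as read_frame_copy._4_2_ (2 bytes at offset 4); a real pass would still have to rewrite those accesses in the decompiled text, which a header alone cannot do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ghidra placeholder types: only the access size is known. */
typedef uint8_t  undefined;
typedef uint8_t  undefined1;
typedef uint16_t undefined2;
typedef uint32_t undefined4;
typedef uint64_t undefined8;

/* CONCAT44(hi, lo): 4-byte high part followed by 4-byte low part -> 8-byte value. */
#define CONCAT44(hi, lo) \
    ((((uint64_t)(uint32_t)(hi)) << 32) | (uint64_t)(uint32_t)(lo))

/* SUB84(v, c): 4 bytes of the 8-byte value v, starting c bytes (0..4) above its
   least significant byte. */
#define SUB84(v, c) ((uint32_t)((uint64_t)(v) >> ((c) * 8)))

/* Hypothetical replacement for "var._4_2_"-style accesses (2 bytes at byte offset 4
   of var). memcpy is well defined for unaligned addresses; the value is read in
   host byte order. */
static inline uint16_t read_u16_at(const void *base, size_t offset)
{
    uint16_t value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```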
Testing our Code Scanner
Now that we had chosen our decompiler and enriched its output so that our parser could easily ingest it, we needed to test our code scanner.
The code scanner integration is unit tested and our rules are validated using integration test files. If you write a new rule, you have to write C code corresponding to positive and negative matching scenarios. This C code is compiled with GCC using different optimization levels, and each compiled ELF file is then decompiled into integration test files. This way we can immediately spot if a decompilation artifact would block a rule from working.
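For illustration, a positive/negative pair for a format string rule might look like the C sketch below. The function names and the HYPOTHETICAL_TAINT environment variable are made up for this example; the point is that both functions receive the same tainted data, but only the first one passes it as the format argument. Compiling such a file at each GCC optimization level (-O0 through -O3) and decompiling each result shows whether inlining or other optimizations hide the pattern from the rule.

```c
#include <stdio.h>
#include <stdlib.h>

/* POSITIVE case: the rule MUST match, because tainted data is used as the format. */
int format_sink_positive(char *out, size_t out_len, const char *user)
{
    return snprintf(out, out_len, user);
}

/* NEGATIVE case: the rule MUST NOT match, because the format is a constant. */
int format_sink_negative(char *out, size_t out_len, const char *user)
{
    return snprintf(out, out_len, "%s", user);
}

/* Test-binary entry point: the environment variable acts as the taint source. */
int run_fixture(char *out, size_t out_len, int positive)
{
    const char *user = getenv("HYPOTHETICAL_TAINT");
    if (user == NULL)
        user = "";
    return positive ? format_sink_positive(out, out_len, user)
                    : format_sink_negative(out, out_len, user);
}
```

Expect GCC to warn about the positive case when -Wformat-security is enabled; that warning flags precisely the pattern the rule is meant to catch.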
Further Noise Reduction
Our aim is to report issues that are remotely exploitable. For the exploitable part, we had to take care of systemic false positives that would otherwise keep our customers in constant alert fatigue.
Unsafe Copy from Uncontrolled Files
One of the taint sources our tracker considers is file input. For example, a command injection or buffer overflow could be triggered by controlling the content of a file read by an executable binary.
Let's look at this code snippet taken from an old BusyBox arp_show implementation:
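#include <stdio.h>

int arp_show(void)
{
    char line[256];
    char ip[32];
    char hw[32];
    char flags[32];
    char addr[32];
    FILE *__stream = fopen("/proc/net/arp", "r");

    if (__stream == (FILE *)0x0) {
        puts("no proc fs mounted!");
    } else {
        /* Read one line (at most 255 characters) from the kernel-generated ARP table. */
        fgets(line, 0x100, __stream);
        /* Each %s is unbounded, while every destination buffer is only 32 bytes long. */
        sscanf(line, "%s %s %s %s", ip, hw, flags, addr);
        fclose(__stream);
    }
    return 0;
}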
A naive static analysis tool would flag this as an unsafe copy operation, since you're reading a line of up to 255 characters from the file and copying unbounded content into 32-byte buffers.
It's bad. Or is it? The thing is that the content of /proc/net/arp can't be controlled, which means there's no way for this code path to be exploited. Other examples include unsafe data copies from /proc/partitions.
To filter out this noise, our ruleset now verifies that any content read from a file comes from a file that can be manipulated by an unprivileged user.
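A minimal version of that check can be sketched in C under loudly stated assumptions: it only knows about a hypothetical, hard-coded list of kernel-generated pseudo-files (covering the /proc/net/arp and /proc/partitions examples above) and treats any other constant path as potentially attacker-controlled. A real implementation would also need to resolve paths against the extracted firmware filesystem and inspect ownership and permissions. The function name file_is_taint_source is hypothetical, and the choice to not report unknown paths (NULL here) is this sketch's own, made in line with favoring a low false positive rate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of kernel-generated pseudo-files: their content is produced by
   the kernel and cannot be written by an unprivileged user. */
static const char *const KERNEL_GENERATED[] = {
    "/proc/net/",
    "/proc/partitions",
    "/proc/meminfo",
    "/sys/",
};

/* Returns true if data read from path should be treated as attacker-controlled. */
bool file_is_taint_source(const char *path)
{
    if (path == NULL)
        return false;  /* path not recovered: do not report, favoring few false positives */
    for (size_t i = 0; i < sizeof KERNEL_GENERATED / sizeof KERNEL_GENERATED[0]; i++) {
        const char *prefix = KERNEL_GENERATED[i];
        if (strncmp(path, prefix, strlen(prefix)) == 0)
            return false;
    }
    return true;  /* e.g. /tmp/...: may be writable by an unprivileged user */
}
```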
Expected Behaviors
We consider environment variables a valid source in our taint tracker. For example, CGI scripts are known to consume data sent by the HTTP server through environment variables. We quickly found that some of the exploitable paths we matched were, in fact, expected behavior.
For example, many Linux binaries rely on the PAGER environment variable to select which utility to launch when you run, say, git log. Others are documented behavior, like LDAPRC (OpenLDAP) or LIBSMB_PROG (Samba).
Since we still want to report dangerous unchecked uses of environment variables, we implemented a filter-list approach, only considering environment variables not in the filter list as valid sources. In the process, we discovered that these environment variables could be abused in GTFOBins-style scenarios, but this is something for another blog post.
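In its simplest form, the filter list amounts to an exact-match lookup before an environment variable is accepted as a taint source. The C sketch below is hypothetical: env_is_taint_source is an illustrative name, and the list only contains the three variables named above, whereas a usable list would be much longer and might need per-binary exceptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter list: variables whose influence on the program is documented,
   expected behavior. Only the three examples from the text are listed here. */
static const char *const ENV_FILTER_LIST[] = {
    "PAGER",        /* pager selection, e.g. for git log */
    "LDAPRC",       /* OpenLDAP */
    "LIBSMB_PROG",  /* Samba */
};

/* Returns true if a getenv() result for this variable should be treated as tainted. */
bool env_is_taint_source(const char *name)
{
    for (size_t i = 0; i < sizeof ENV_FILTER_LIST / sizeof ENV_FILTER_LIST[0]; i++)
        if (strcmp(name, ENV_FILTER_LIST[i]) == 0)
            return false;
    return true;
}
```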
Unsafe Copy, Large Enough Buffers
We also quickly figured out that naive rules to detect unsafe copy operations (e.g. strcpy, strcat) won't cut it if we want to limit the noise. That's why we use a combined approach to filter out scenarios where the destination buffer is dynamically allocated based on the source size, or where the source buffer size is checked against constant sizes prior to the copy operation. This way, we make sure that the vulnerable code path is not only reachable but also actively exploitable.
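To illustrate the difference, the C sketch below contrasts one copy that should be reported with two that the filters described above are meant to discard. The function names and the DST_SIZE constant are hypothetical; they only show the code shapes the filters should recognize, not how the matching itself is implemented.

```c
#include <stdlib.h>
#include <string.h>

#define DST_SIZE 32  /* hypothetical fixed destination size */

/* REPORTED: arbitrary-length source copied into a fixed-size stack buffer. */
size_t copy_unsafe(const char *src)
{
    char dst[DST_SIZE];
    strcpy(dst, src);            /* overflows dst when strlen(src) >= DST_SIZE */
    return strlen(dst);
}

/* FILTERED: destination is allocated from the source length. */
char *copy_dynamic(const char *src)
{
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL)
        return NULL;
    strcpy(dst, src);            /* always fits: dst holds strlen(src) + 1 bytes */
    return dst;
}

/* FILTERED: source length is checked against a constant before the copy. */
int copy_checked(char dst[DST_SIZE], const char *src)
{
    if (strlen(src) >= DST_SIZE)
        return -1;               /* reject instead of overflowing */
    strcpy(dst, src);
    return 0;
}
```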
What It Looks Like
To end users, the interface looks exactly the same as for static code analysis of scripting languages. The left pane shows the propagation of user controlled input from source to sink. The right pane shows the decompiled code. The highlighted line indicates where the user controlled data is being read, manipulated, or used.
As you click through the propagators, you'll follow the tainted data right up to the location where it's used. In this example, that is a call to system(), meaning arbitrary command injection.
What Can You Expect?
We’re slowly rolling out the feature to our customers using an opt-in approach, so that customers who only want SBOM (Software Bill of Materials) and compliance keep the platform as it is, while others can experience the new zero-day identification feature.
The binary analysis feature currently looks for the following vulnerability classes:
- Insecure Service Launch – An insecure service launch is when a binary enables or executes a network service like telnet or SSH, allowing remote access. Usually found in hidden "debug" features of embedded devices.
- Command Injection – A command injection vulnerability occurs when untrusted user input is improperly processed by a program, allowing an attacker to execute arbitrary commands on a system.
- Stack Buffer Overflows – A stack buffer overflow occurs when more data is written to a buffer on the program's stack than it can hold, overwriting adjacent memory (such as saved return addresses) and potentially letting an attacker hijack the program's execution.
- Heap Buffer Overflows – A heap buffer overflow occurs when more data is written to a buffer allocated on the program's heap than it can hold, overwriting adjacent memory and potentially letting an attacker hijack the program's execution.
- Format String – A format string vulnerability occurs when unchecked user input is used as the format string argument of a function such as printf, potentially leading to unauthorized access, information disclosure, or arbitrary code execution.
Soon, the platform will also report:
- Insecure Communication – Calls to network communication libraries made in an insecure manner, such as explicitly disabling certificate validation on TLS connections, or using plaintext in the first place.
Our Results So Far
So far, in the sample set of firmware images we use for testing and evaluating new features, we've found hundreds of vulnerabilities affecting more than 20 vendors. We're currently in the coordinated vulnerability disclosure (CVD) process and will publish details as the bugs get fixed.
With our upcoming advisory on Cisco WAP, you'll be able to see it in action. We were inspired by an advisory from Synacktiv and scanned the firmware to find dozens of variants.
Our plan for the near future is to expand the ruleset while improving the signal-to-noise ratio, collaborate with our customers on designing custom rules adapted to their needs, and keep digging through the pile of vulnerabilities we still have to triage.
About ONEKEY
ONEKEY is the leading European specialist in Product Cybersecurity & Compliance Management and part of the investment portfolio of PricewaterhouseCoopers Germany (PwC). The unique combination of an automated Product Cybersecurity & Compliance Platform (PCCP) with expert knowledge and consulting services provides fast and comprehensive analysis, support, and management capabilities to improve product security and compliance, from purchasing, design, development, and production through to the end of the product lifecycle.
CONTACT:
Sarah Fortmann
Head of Marketing
sara.fortmann@onekey.com
euromarcom public relations GmbH
+49 611 973 150
team@euromarcom.de