Unblob 2024 Highlights: Sandboxing, Reporting, and Community Milestones

Quentin Kaiser

Lead Security Researcher

December 5, 2024

Table of contents

Unblob & Friends

True Sandboxing with Landlock

Improved Reporting of Carves

Better Randomness Measurement with χ²

Integration in Binary Ninja

Unblob & Academic Research

New Format Handlers

Sunsetting Python 3.8

Internship Program

What to Expect in 2025

Time flies! It's been almost a year since we last talked about unblob. We've been busy this year, mostly with improved reporting and sandboxing of unblob operations. We were not very present at conferences and workshops, but we'll try to make a comeback sooner or later.

We went past last year's objective of 2,000 stars on GitHub but are stuck there. Don't hesitate to star the project, since it helps it gain visibility!

In the meantime, let's have a look at what happened in unblob land in 2024.

Unblob & Friends

First and foremost, we want to express our gratitude to the people spreading the unblob gospel. Specifically, we want to thank:

all the contributors who reported issues, opened pull requests or simply discussed all things unblob
the FACT maintainer team for their recognition of unblob's usefulness and their questions about integration in FACT core
Eloïse Brocas from Quarkslab for demonstrating unblob during her "Exploring Firmwares" workshop at hack.lu 2024
Brandon Miller from Vector35 for integrating unblob in Binary Ninja

Again, if you used unblob in your work, as a hobby, or in workshops, tell us at research@onekey.com! Even if you only used it during that one pentest, we want to know :)

True Sandboxing with Landlock

Last year we introduced the FileSystem API in order to protect against path traversals and file writes outside the extraction directory. Although valid, this approach is limited by the fact that it only works if the handler developer is willing to use it the right way. We still strongly recommend using it if you develop handlers, as it makes traversal attempts visible in the logs and in the report file.

We were longing for stronger sandboxing and we found it with landlock. Landlock, as described by chrysn, who initially suggested it to us in 2023, is:

[A] recent Linux access control mechanism similar to OpenBSD's unveil. [It] allows the process to take away its own privileges to the file system (or other system calls), and limit itself to writing to the files that it intends to write to (eg. the JSON report and the unpacking file). Unlike chroot, it requires no elevated privileges; plus it can be configured to run on a best-effort base in case the requirements on a relatively recent kernel are not acceptable.

Landlock support landed in unblob in two steps: first, we introduced it in unblob-native (our Rust-based implementation for low-level and memory-intensive operations), then we made unblob use the sandboxing API exposed by unblob-native.

Right now, unblob and all the subprocesses it launches are restricted to the following access:

read access to / (python, shared libraries, extractor binaries and so on)
read-write access to /dev/shm (multi-processing)
read-write access to the extraction directory
directory creation on the extraction directory parent
read-write on the log file
read-write on the report file
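
To make the rule list above concrete, here is a minimal sketch that models it as data and checks a path against it. It is not unblob's or unblob-native's code and it enforces nothing; the names Access, Rule, build_rules and is_allowed are hypothetical, and the example paths are made up. It assumes, as Landlock does for directory rules, that a rule on a directory also covers everything beneath it.

```python
import posixpath
from dataclasses import dataclass
from enum import Flag, auto
from pathlib import PurePosixPath


class Access(Flag):
    READ = auto()
    WRITE = auto()
    MAKE_DIR = auto()


@dataclass(frozen=True)
class Rule:
    path: PurePosixPath
    access: Access


def build_rules(extract_dir, log_file, report_file):
    """Mirror the six rules listed in the article (hypothetical helper)."""
    extract_dir = PurePosixPath(extract_dir)
    read_write = Access.READ | Access.WRITE
    return [
        Rule(PurePosixPath("/"), Access.READ),
        Rule(PurePosixPath("/dev/shm"), read_write),
        Rule(extract_dir, read_write),
        Rule(extract_dir.parent, Access.MAKE_DIR),
        Rule(PurePosixPath(log_file), read_write),
        Rule(PurePosixPath(report_file), read_write),
    ]


def is_allowed(rules, path, wanted):
    """True if the union of all rules covering `path` grants `wanted`.

    Only '..' segments are normalized here; symlinks are not resolved,
    which the kernel does but this model does not.
    """
    path = PurePosixPath(posixpath.normpath(path))
    granted = Access(0)
    for rule in rules:
        if path == rule.path or rule.path in path.parents:
            granted |= rule.access
    return (granted & wanted) == wanted


rules = build_rules("/tmp/out/fw.bin_extract", "/tmp/unblob.log", "/tmp/report.json")
print(is_allowed(rules, "/etc/passwd", Access.READ))                      # True
print(is_allowed(rules, "/etc/passwd", Access.WRITE))                     # False
print(is_allowed(rules, "/tmp/out/fw.bin_extract/../../x", Access.WRITE)) # False
```

The model is default-deny: anything not granted by a rule covering the path is refused, which is what limits the blast radius of a misbehaving extractor.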

With all of these limitations, the blast radius of a third-party extractor going haywire is severely limited. Of course, this sandboxing is only enabled, transparently, on Linux systems running a kernel that supports the landlock API. On other systems, you'll see a message in the log when unblob cannot activate the feature.
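
If you are curious whether your own kernel supports landlock, the sketch below probes it directly. This is not how unblob detects support (unblob relies on unblob-native for that); it is a standalone check that assumes the generic Linux syscall number 444 for landlock_create_ruleset and the LANDLOCK_CREATE_RULESET_VERSION flag value from the kernel's UAPI headers. A few architectures number their syscalls differently, so treat the constant as an assumption.

```python
import ctypes
import sys

# Assumed values from the kernel's Landlock UAPI (generic syscall table).
SYS_LANDLOCK_CREATE_RULESET = 444
LANDLOCK_CREATE_RULESET_VERSION = 1


def landlock_abi_version():
    """Return the Landlock ABI version of the running kernel, or None
    when Landlock is unavailable (non-Linux, old kernel, or disabled)."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    # With a NULL attribute pointer and size 0, the VERSION flag asks the
    # kernel for the highest supported ABI instead of creating a ruleset.
    result = libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
        ctypes.c_void_p(None),
        ctypes.c_size_t(0),
        ctypes.c_uint(LANDLOCK_CREATE_RULESET_VERSION),
    )
    return result if result >= 1 else None


version = landlock_abi_version()
print("landlock ABI:", version if version is not None else "not available")
```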

If you want to learn more about landlock, have a look at the kernel documentation or read through the landlock.io material from l0kod.

Improved Reporting of Carves

Unblob has two modes of operation once it's done scanning a file. Either the whole file is of a specific format and the extractor is called on it, extracting the content to a dedicated extraction directory, or each identified chunk in the file is carved out to a dedicated carve directory by unblob itself.

These carve directories were not reported by unblob because they were not relevant to us. However, this caused problems for researchers who needed reporting information about everything created on the filesystem.

We therefore introduced carve reporting and made the extract and carve suffixes configurable. With these changes, the "picture" provided by the JSON report is complete. Since the output directory can change depending on whether or not the source file is made of multiple chunks, the directory path is provided as part of the summary output on the console.
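
The extract-versus-carve decision described above can be sketched as follows. This is an illustration, not unblob's implementation: Chunk and plan_output are hypothetical names, the suffix values in the usage lines are placeholders, and the "start-end.handler" naming of carved files is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chunk:
    start: int    # inclusive offset in the source file
    end: int      # exclusive offset in the source file
    handler: str  # name of the format that matched


def plan_output(
    file_name: str,
    file_size: int,
    chunks: List[Chunk],
    extract_suffix: str,
    carve_suffix: str,
) -> Tuple[str, str, List[str]]:
    """Return (mode, output directory, carved file names)."""
    whole_file = (
        len(chunks) == 1
        and chunks[0].start == 0
        and chunks[0].end == file_size
    )
    if whole_file:
        # One format covers the entire file: hand it to the extractor.
        return ("extract", file_name + extract_suffix, [])
    # Otherwise every identified chunk is carved out on its own.
    carved = [f"{c.start}-{c.end}.{c.handler}" for c in chunks]
    return ("carve", file_name + carve_suffix, carved)


print(plan_output("fw.bin", 100, [Chunk(0, 100, "squashfs")], "_extract", "_carve"))
print(plan_output("fw.bin", 100, [Chunk(0, 40, "gzip"), Chunk(40, 100, "squashfs")],
                  "_extract", "_carve"))
```

The two calls show why the output directory depends on the input: the same file name ends up under a different suffix depending on how many chunks were found.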

Better Randomness Measurement with χ²

Entropy calculation using Shannon entropy is performed on unknown chunks (i.e. chunks that do not match known formats) so that users can assess whether it's a highly compressed or encrypted chunk. It's been there in unblob since the early days but an acquaintance recently introduced us to χ², which is better at differentiating between compressed and encrypted data.

χ² tests are effective for distinguishing compressed from encrypted data because they evaluate the uniformity of byte distributions more rigorously than Shannon entropy.

In compressed files, bytes often cluster around certain values due to patterns that still exist (albeit less detectable), resulting in a non-uniform distribution. Encrypted data, by contrast, exhibits nearly perfect uniformity, as each byte value from 0–255 is expected to appear with almost equal frequency, making it harder to detect any discernible patterns.

The χ² distribution is calculated for the stream of bytes in the chunk and expressed as an absolute number and a percentage which indicates how frequently a truly random sequence would exceed the value calculated. The percentage is the only value that is of interest from unblob's perspective, so that's why we only return it.
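
For readers who want to experiment, here is a small pure-Python sketch of both measures. It is not the code unblob runs. The percentage uses the Wilson-Hilferty normal approximation of the χ² distribution with 255 degrees of freedom rather than an exact computation, so values near the extremes are only indicative. The function names are hypothetical and the input is assumed to be non-empty.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def chi_square(data: bytes) -> float:
    """Pearson chi-square statistic against a uniform byte distribution."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def chi_square_percentage(data: bytes) -> float:
    """Approximate share (0 to 100) of truly random sequences that would
    exceed the statistic of `data` (Wilson-Hilferty approximation)."""
    k = 255  # 256 possible byte values minus one
    x = chi_square(data)
    z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 100 * 0.5 * math.erfc(z / math.sqrt(2))


text = b"the quick brown fox " * 1000
flat = bytes(range(256)) * 100  # perfectly uniform, which is also not random

for name, sample in (("text", text), ("flat", flat)):
    print(name, round(shannon_entropy(sample), 3), chi_square_percentage(sample))
```

The flat sample is a useful reminder that both tails matter: a distribution that is too uniform lands close to 100%, just as repetitive text lands close to 0%.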

According to the ent documentation:

We [can] interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect. Percentages between 90% and 95% and 5% and 10% indicate the sequence is “almost suspect”.
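
The thresholds quoted above translate into a small lookup. This is a sketch with a hypothetical function name; how the exact boundary values (1, 5, 10, 90, 95, 99) are treated is an assumption, since the quote does not say whether they are inclusive, and the "not flagged" label for the middle range is ours rather than ent's.

```python
def interpret_percentage(percentage: float) -> str:
    """Map a chi-square percentage (0 to 100) to the wording used by ent."""
    if percentage > 99 or percentage < 1:
        return "almost certainly not random"
    if percentage >= 95 or percentage <= 5:
        return "suspect"
    if percentage >= 90 or percentage <= 10:
        return "almost suspect"
    return "not flagged"


for value in (0.01, 3.0, 7.5, 50.0, 97.0, 99.9):
    print(value, "->", interpret_percentage(value))
```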

We introduced χ² computation for unknown chunks in unblob and renamed the EntropyReport to RandomnessReport since it now contains information about Shannon entropy and probability. The --entropy-depth command line switch and AnalysisConfig setting have been replaced with randomness-depth.

You can see how effective that measure is in the screenshots below. Here's the probability for an English text:

Here it is for the same file but XOR'ed with a 16-byte key. See how this has no impact on χ² while it raises the Shannon entropy?

We see more movement when we compress the file with gzip:

But the pattern is different than when we apply proper encryption:

Math is magic!

Integration in Binary Ninja

The fine folks at Binary Ninja released Blob Extractor, a Binary Ninja plugin that leverages the Unblob API to identify and extract compressed archives, file-systems, and other blobs embedded in container binaries such as flash dumps or firmware images.

I think the plugin developer describes the use case best:

A common workflow for firmware reverse engineering is starting with a raw flash dump that was extracted via hardware reverse engineering techniques. As you know, raw flash dumps often consist of multiple container formats. Consider embedded Linux: within a single flash dump you might have a u-boot bootloader, a compressed init RAM disk CPIO, a Linux kernel, a squashfs compressed root file system and more. To access the various programs you wish to reverse engineer, you would use a tool like Unblob or Binwalk to extract to a directory, and then need to manually load the files you care about into a program like Binary Ninja for analysis.

By integrating this capability into Binary Ninja we can leverage the unblob API to extract to a temporary directory, and then allow the user to multi-select files they care about and import them into a Binja project for reverse engineering (using Binja's API). It also provides some other helper features like the ability to click on extracted text files to open in an external text editor so you can quickly view configuration files. The goal is to accelerate manual reverse engineering of firmware flash dumps and to help look at the system as a whole, as opposed to independent programs.

We were not aware this was in the works and it was a really pleasant surprise. It really demonstrates the power of our API. If you have ideas or are actively working on a project that leverages the Unblob API, talk to us.

Unblob & Academic Research

Unblob was actively used by the academic research community in 2024.

Specifically, it was used as the extraction step in ERS0: Enhancing Military Cybersecurity with AI-Driven SBOM for Firmware Vulnerability Detection and Asset Management. It was also cited in Mens Sana In Corpore Sano: Sound Firmware Corpora for Vulnerability Research (an excellent paper from Fraunhofer FKIE and Osnabrück University, by the way): "The community uses tools that unpack known formats, [...] Yet, the challenge persists as formats and encryption changes."

Finally, it appeared in Harden-IoT: hardening the EoL devices by intercepting the attack vector for future B5G/6G IoT: "in this module, the firmware image is received as the input. Instead of using traditional tools such as binwalk, we use the more powerful unpacking tool unblob to extract the file system (usually named rootfs) from the original firmware".

Thanks for putting your trust in unblob. Let us know if you use it in academic settings and need assistance.

New Format Handlers

The steady flow of handlers being released to the public has decreased a bit in 2024, but we still released a few new formats:

SquashFS version 1
Autel ECC firmware, based on sector7-nl's initial work
YAFFS with page size of 2032 bytes (yes that exists)

Sunsetting Python 3.8

Python 3.8 reached EOL on 2024-10-07. Our dependencies will likely drop support for that version as well, so keeping up support for it will become a burden sooner or later. We released version 24.12.4 today, which is the last release that will support Python 3.8.

In the meantime, we already support Python 3.13, which will be supported until October 2029.

Internship Program

We're trying our best to set up some kind of internship program for unblob, where CS students can join the unblob maintainer team and either focus on writing handlers or help improve unblob core. Something similar to Google Summer of Code.

In 2023, Antoine joined us and released a steady flow of new handlers (almost one every week over the course of his 14-week internship!).

In 2025, Raphael will join us, so expect a bunch of handlers to drop between February and May.

What to Expect in 2025

In 2025 we hope to make the API documentation even easier to use and navigate. We want to improve error reporting when running in non-verbose mode. As explained earlier, a bunch of handlers should probably drop at some point, thanks to our intern.

Also, we've seen a lot of love for Kaitai to define firmware structures and we completely understand, as it's something we considered prior to going with dissect.cstruct. We may go with a KaitaiHandler similar to what we did with StructHandler so that it's even easier to port extractors to unblob.

Stay safe and enjoy the rest of the year.

The unblob maintainer team.
