Hello from the unblob development team! We’ve been hard at work these last few months since we first introduced unblob to the world at Blackhat Arsenal and DEFCON Demo Labs.
First, the team joined the Infosec Campus Podcast episode 41 for a discussion on unblob and security tools development in general. Marton shared some details about unblob architecture at Hacktivity Budapest and Quentin had the opportunity to present it with a focus on security hardening at Black Alps.
We’re also very happy to see real-world adoption of unblob. The EMBA maintainers were the first to integrate it into their framework. SEC-Consult researchers have started using it during their tests. We’ve seen a few hints on Twitter that people have adopted it as a first step in their vulnerability research, too.
As the project continues to grow, we wanted to share a few things with you.
A Long and Dangerous Path to Proper Tar Extraction
Our search for a proper tar extractor has finally concluded. While the extractors behind most of our handlers did not change at all, we had some issues with the one we assigned to the tar handler.
We started off with GNU tar, but quickly found out that it was vulnerable to path traversal through symlinks (this was prior to the global fix). 7zip seemed to be the obvious choice, so we switched. It turns out 7zip has a very interesting way of extracting symlinks: for 7zip, a symlink is a plaintext file holding the target path as text. This is known behavior and they don’t plan on changing it.
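To make the symlink problem concrete, here is a minimal sketch in Python. It is not unblob's code: the helper names build_malicious_tar and find_escaping_symlinks and the archive contents are made up for illustration. It builds a tar archive in memory that contains a symlink pointing outside the extraction directory plus a file that would be written through that symlink, then flags the link by normalizing its target relative to the directory the link lives in.

```python
import io
import posixpath
import tarfile


def build_malicious_tar() -> bytes:
    """Build an in-memory tar holding a symlink that escapes the extraction
    root and a regular file that would be written through that symlink."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo("innocent")
        link.type = tarfile.SYMTYPE
        link.linkname = "../../outside"  # hypothetical target outside the root
        tar.addfile(link)

        payload = b"pwned\n"
        evil = tarfile.TarInfo("innocent/evil.txt")
        evil.size = len(payload)
        tar.addfile(evil, io.BytesIO(payload))
    return buf.getvalue()


def find_escaping_symlinks(data: bytes) -> list[str]:
    """Return the names of symlink members whose target leaves the archive root."""
    escaping = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        for member in tar.getmembers():
            if not member.issym():
                continue
            # Normalize the target relative to the directory holding the link.
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(member.name), member.linkname)
            )
            if target == ".." or target.startswith("../") or posixpath.isabs(target):
                escaping.append(member.name)
    return escaping


escaping = find_escaping_symlinks(build_malicious_tar())
print(escaping)  # ['innocent']
```

An extractor that creates the symlink first and then blindly writes innocent/evil.txt ends up writing outside of the extraction directory, which is the class of bug described above.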
At that time we needed a quick solution, so we moved to Python’s tarfile module. Big mistake. Turns out tarfile is vulnerable to straight-up path traversals. We still don’t understand why the Python maintainers did not fix this, but anyway. phLaul mentioned it to Quentin in the Hexacon lobby, so he drafted a PR on the train home.
Our implementation now simply inherits from TarFile in the tarfile library and overrides the extract function to add some defensive programming against path traversals. A re-worked example is shown below:
    import logging
    from pathlib import Path
    from tarfile import TarFile

    # Stdlib logger so that the example runs on its own.
    logger = logging.getLogger(__name__)


    def is_safe_path(basedir: Path, path: Path) -> bool:
        """Return True if basedir/path stays inside basedir once resolved."""
        try:
            basedir.joinpath(path).resolve().relative_to(basedir.resolve())
        except ValueError:
            return False
        return True


    class SafeTarFile(TarFile):
        def extract(self, member, path="", set_attrs=True, *, numeric_owner=False):
            path_as_path = Path(str(path))
            member_name_path = Path(str(member.name))

            # Skip members that would land outside of the target directory.
            if not is_safe_path(path_as_path, member_name_path):
                logger.warning("traversal attempt: %s", member_name_path)
                return

            super().extract(member, path, set_attrs, numeric_owner=numeric_owner)
At last, we have a working tar extractor!
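As a quick sanity check of the path test, the sketch below runs the same is_safe_path helper on a few member names. The extraction directory name and the sample member names are made up for illustration; nothing is written to disk.

```python
from pathlib import Path


def is_safe_path(basedir: Path, path: Path) -> bool:
    """Return True if basedir/path stays inside basedir once resolved."""
    try:
        basedir.joinpath(path).resolve().relative_to(basedir.resolve())
    except ValueError:
        return False
    return True


base = Path("extract_dir")  # hypothetical extraction directory
results = {
    name: is_safe_path(base, Path(name))
    for name in ("etc/passwd", "a/../b.txt", "../outside.txt", "/etc/passwd")
}
for name, ok in results.items():
    print(f"{name!r}: {'safe' if ok else 'traversal'}")
```

Relative names that stay below the extraction directory pass, while names that climb out with .. or that are absolute are rejected, because joining an absolute path discards the base directory entirely.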
Lifting Unblob’s Architecture Constraint
We knew from the beginning that Hyperscan was limited to x86/x86-64 platforms, but did not care that much since our environment runs on the amd64 architecture. Soon after the public release we got our first requests for 64-bit ARM support and we started devising ways of doing it with vectorscan.
It took some time to mature, but regressions introduced by the hyperscan Python library kickstarted the project to write our own wrapper around hyperscan/vectorscan. vlaci single-handedly wrote and released pyperscan, a Python wrapper around the hyperscan/vectorscan library built using CPython and Rust.
We moved to pyperscan in early December, meaning that unblob can now run on Apple M1 or Linux on aarch64 with similar performance. We’re currently packaging everything to make sure you can have a smooth installation experience.
Lifting Unblob’s Size Constraint
We recently identified that unblob could not work with files larger than 4 GB. This is because Hyperscan uses 32-bit integers for offsets, and therefore messes up scanning for files larger than 2^32 bytes.
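The arithmetic behind this limit is easy to check. The sketch below is plain Python, not Hyperscan code. It only assumes that an offset is stored in an unsigned 32-bit integer, and the helper name as_uint32 is made up for illustration. How exactly Hyperscan misbehaves past the limit is not modeled here.

```python
UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit integer can hold


def as_uint32(offset: int) -> int:
    """Truncate an offset to 32 bits, as an unsigned 32-bit field would."""
    return offset & UINT32_MAX


four_gib = 4 * 1024**3            # 4 GiB expressed in bytes
real_offset = four_gib + 0x1000   # a match 4 KiB past the 4 GiB mark

print(four_gib == 2**32)             # True
print(hex(as_uint32(real_offset)))   # 0x1000
```

Under that assumption, a match located just past the 4 GiB mark becomes indistinguishable from one near the start of the file.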
We lifted that constraint by moving from vectored mode scanning to streaming mode scanning. A byproduct of that change is a significant speed improvement, with processing time reduced by roughly 15%.
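The general idea of streaming can be sketched without Hyperscan at all. The example below is an illustration in plain Python, not the pyperscan or Hyperscan API: it reads data chunk by chunk, keeps a small overlap so that matches spanning a chunk boundary are not missed, and reports absolute offsets with Python's arbitrary-precision integers. The function name stream_find, the chunk size, and the sample magic bytes are assumptions chosen for the example.

```python
import io
from typing import BinaryIO, Iterator


def stream_find(
    stream: BinaryIO, magic: bytes, chunk_size: int = 64 * 1024
) -> Iterator[int]:
    """Yield the absolute offset of every occurrence of `magic` in `stream`."""
    if not magic:
        raise ValueError("magic must not be empty")
    keep = len(magic) - 1  # bytes to carry over between chunks
    tail = b""             # end of the previous window, shorter than `magic`
    consumed = 0           # absolute offset of the first byte of `tail`
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        start = 0
        while (pos := window.find(magic, start)) != -1:
            yield consumed + pos
            start = pos + 1
        # Keep just enough bytes to catch a match split across two chunks.
        tail = window[-keep:] if keep else b""
        consumed += len(window) - len(tail)


data = b"\x00" * 10 + b"hsqs" + b"\x00" * 5 + b"hsqs"
offsets = list(stream_find(io.BytesIO(data), b"hsqs", chunk_size=4))
print(offsets)  # [10, 19]
```

With a chunk size of 4, both occurrences straddle a chunk boundary, yet they are reported once each at their absolute position. Memory use stays bounded by the chunk size regardless of how large the input is.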
Binwalk Forever Day
We reported a path traversal bug affecting binwalk’s PFS extractor on October 26th through a pull request. No news since then, so beware of firmware you don’t know! This bug can lead to remote command execution, as demonstrated on stage at Black Alps by Quentin.
We noted that some projects like EMBA started backporting the patch, so different revisions of binwalk will probably start to appear here and there.
Small Changes, Big Impact
Of course, those big changes should not overshadow all the smaller improvements we added to the framework over the last 3 months:
- We fixed our CPIO heuristics
- We fixed Jefferson
- We improved handling of truncated tar archives
- We added support for ZIP64
- We improved initrd extraction from kernels
- We squashed a bug in our XZ handler
- We got rid of an infinite loop in our zlib handler
Our Wish List for 2023
As the year comes to a close, here’s our wish list for 2023:
- improved compression stream handlers
- first external contributor
- better documentation
- better console output
Until then, we wish you a very pleasant end of year. Here’s to a healthy 2023 full of extracted firmware!