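ClamAV released a critical patch a few days ago with fixes for two vulnerabilities reported by Simon Scannell: CVE-2023-20052, an XML external entity injection in the DMG file parser, and CVE-2023-20032, a heap buffer overflow in the HFS+ file parser.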

The description of those bugs got our attention since we have format handlers in unblob for both DMG and HFS+. We therefore decided to spend some time trying to understand them and learn if we may be affected by similar bugs.

To do so, we performed patch diffing by comparing ClamAV versions 1.0.0 and 1.0.1, downloaded from their release page on GitHub. The fix is not yet visible in their Git history, so we had to do it manually.
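A minimal sketch of such a manual comparison, assuming both release tarballs have already been extracted into two directories. The function name `diff_trees` and the restriction to `.c` and `.h` files are illustrative choices, not a description of the exact tooling used for this analysis.

```python
import difflib
from pathlib import Path


def diff_trees(old_root, new_root, suffixes=(".c", ".h")):
    """Yield unified-diff lines for source files that differ between two trees."""
    old_root, new_root = Path(old_root), Path(new_root)
    for old_file in sorted(old_root.rglob("*")):
        if not old_file.is_file() or old_file.suffix not in suffixes:
            continue
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        if not new_file.is_file():
            continue  # files removed in the new release are ignored in this sketch
        old_lines = old_file.read_text(errors="replace").splitlines(keepends=True)
        new_lines = new_file.read_text(errors="replace").splitlines(keepends=True)
        if old_lines != new_lines:
            yield from difflib.unified_diff(
                old_lines,
                new_lines,
                fromfile=f"a/{rel.as_posix()}",
                tofile=f"b/{rel.as_posix()}",
            )
```

Calling `"".join(diff_trees("clamav-1.0.0", "clamav-1.0.1"))` (hypothetical directory names) would produce a patch-like text that can be read top to bottom.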


The first bug, CVE-2023-20052, is fixed with this patch:

< #define DMG_XML_PARSE_OPTS ((1 << 1 | 1 << 11 | 1 << 16) | CLAMAV_MIN_XMLREADER_FLAGS)

The fix simply removes the XML_PARSE_NOENT flag from the libxml2 parsing options. This flag controls whether or not the parser is allowed to perform entity substitutions and can lead to XML External Entity Injection (XXE) if left enabled.
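To make the magic numbers in that define easier to read, here is a small sketch that maps the three bit shifts to their names in libxml2's `xmlParserOption` enum. The helper `decode_options` is hypothetical, and `CLAMAV_MIN_XMLREADER_FLAGS` is deliberately left out because its value is not shown above.

```python
# Bit values from libxml2's xmlParserOption enum (only the three used here).
XML_PARSER_OPTIONS = {
    1 << 1: "XML_PARSE_NOENT",     # substitute entities while parsing
    1 << 11: "XML_PARSE_NONET",    # forbid network access
    1 << 16: "XML_PARSE_COMPACT",  # compact small text nodes
}


def decode_options(mask):
    """Return the names of the known option bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(XML_PARSER_OPTIONS.items()) if mask & bit]


vulnerable = 1 << 1 | 1 << 11 | 1 << 16
print(decode_options(vulnerable))
# ['XML_PARSE_NOENT', 'XML_PARSE_NONET', 'XML_PARSE_COMPACT']

# Clearing bit 1 is the effect of the fix described above.
print(decode_options(vulnerable & ~(1 << 1)))
# ['XML_PARSE_NONET', 'XML_PARSE_COMPACT']
```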

This is a very common mistake, especially with a counter-intuitive name like “NOENT” introduced in 2016.

Exploitation Strategy

DMG files are composed of a “Data fork” containing disk blocks, followed by a Property List and a trailer. Here’s an excerpt from our DMG handler in unblob with some nice ASCII art:

# NOTE: the koly block is a trailer
# ┌─────────────┐ │
# │Data fork    │ │DataForkLength
# │contains     │ │
# │disk blocks  │ │
# │             │ │
# │             │ ▼
# ├─────────────┤ │
# │XML plist    │ │XMLLength
# ├─────────────┤ ▼
# │koly trailer │
# └─────────────┘

To exploit this vulnerability, you would need to place your malicious XXE payload within the property list to make libxml2 substitute and resolve external entities.
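The sketch below shows how the property list can be located from the trailer. It assumes the commonly documented UDIF layout: a 512-byte trailer starting with the `koly` magic, with big-endian 64-bit `XMLOffset` and `XMLLength` fields at byte offsets 216 and 224. These offsets and the helper name `locate_plist` are assumptions for illustration and are not taken from the unblob handler.

```python
import struct

KOLY_SIZE = 512  # assumed size of the trailer at the very end of the image


def locate_plist(image: bytes):
    """Return (offset, length) of the XML property list described by the trailer."""
    if len(image) < KOLY_SIZE:
        raise ValueError("image too small to hold a koly trailer")
    trailer = image[-KOLY_SIZE:]
    if trailer[:4] != b"koly":
        raise ValueError("koly magic not found")
    # Assumed layout: XMLOffset at byte 216, XMLLength at byte 224, both big-endian.
    xml_offset, xml_length = struct.unpack_from(">QQ", trailer, 216)
    if xml_offset + xml_length > len(image) - KOLY_SIZE:
        raise ValueError("property list overlaps the trailer")
    return xml_offset, xml_length
```

Whatever sits in that byte range is what the parser hands to libxml2, which is why the property list is the place an attacker-controlled document type declaration would have to live.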

Exploitation Limitations

Since the XML_PARSE_NONET flag is set, the parser will not be able to establish outbound connections. This means that exfiltrating local file content with this bug is not possible. This is the error message that you get when you try anyway:

clamscan ../../samples/malicious.dmg 
Loading:    16s, ETA:   0s [========================>]    8.65M/8.65M sigs       
Compiling:   3s, ETA:   0s [========================>]       41/41 tasks 

I/O error : Attempt to load network entity
/home/quentin/research/clamav/samples/malicious.dmg: OK

----------- SCAN SUMMARY -----------
Known viruses: 8653236
Engine version: 1.0.0
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 2.00 MB
Data read: 0.02 MB (ratio 127.75:1)
Time: 21.642 sec (0 m 21 s)
Start Date: 2023:02:20 17:14:15
End Date:   2023:02:20 17:14:36

At this point, it is still unclear how one can leak content remotely with this bug, but the investigation is ongoing. We’re exploring two ideas at the moment: verbose logging and substituted XML content being written to temporary files.

Edit 21/02/2023: it’s possible to leak local files if the binary runs with --debug enabled. In the excerpt below we make it dump the content of /etc/passwd through XXE:

clamscan --debug malicious.dmg
LibClamAV debug: clean_cache_add: f7e8ac0c33c257d878344560fcc78d40 (level 0)
LibClamAV debug: cli_scandmg: Matched blkx
LibClamAV debug: cli_scandmg: wanted blkx, text value is root:x:0:0:root:/root:/bin/bash


The second bug, CVE-2023-20032, is a heap buffer overflow affecting the hfsplus_fetch_node function. ClamAV fixed it by expanding the function signature with a buffSize integer value, which is used later in the code to perform a size check.

/* ClamAV 1.0.0 */
static cl_error_t hfsplus_fetch_node(cli_ctx *ctx, hfsPlusVolumeHeader *volHeader, hfsHeaderRecord *catHeader,
                                     hfsHeaderRecord *extHeader, hfsPlusForkData *catFork, uint32_t node, uint8_t *buff);

/* ClamAV 1.0.1 */
static cl_error_t hfsplus_fetch_node(cli_ctx *ctx, hfsPlusVolumeHeader *volHeader, hfsHeaderRecord *catHeader,
                                     hfsHeaderRecord *extHeader, hfsPlusForkData *catFork, uint32_t node, uint8_t *buff, size_t buffSize);

A size check was also added to the function body, just before the call to fmap_readn (the test of buffOffset + readSize against buffSize):

/* Fetch a node's contents into the buffer */
static cl_error_t hfsplus_fetch_node(cli_ctx *ctx, hfsPlusVolumeHeader *volHeader, hfsHeaderRecord *catHeader,
                                     hfsHeaderRecord *extHeader, hfsPlusForkData *catFork, uint32_t node, uint8_t *buff,
                                     size_t buffSize)
{
    bool foundBlock = false;
    uint64_t catalogOffset;
    uint32_t startBlock, startOffset;
    uint32_t endBlock, endSize;
    uint32_t curBlock;
    uint32_t extentNum = 0, realFileBlock;
    uint32_t readSize;
    size_t fileOffset = 0;
    uint32_t searchBlock;
    uint32_t buffOffset = 0;

    /* Make sure node is in range */
    if (node >= catHeader->totalNodes) {
        cli_dbgmsg("hfsplus_fetch_node: invalid node number " STDu32 "\n", node);
        return CL_EFORMAT;
    }

    /* Need one block */
    /* First, calculate the node's offset within the catalog */
    catalogOffset = (uint64_t)node * catHeader->nodeSize;
    /* Determine which block of the catalog we need */
    startBlock  = (uint32_t)(catalogOffset / volHeader->blockSize);
    startOffset = (uint32_t)(catalogOffset % volHeader->blockSize);
    endBlock    = (uint32_t)((catalogOffset + catHeader->nodeSize - 1) / volHeader->blockSize);
    endSize     = (uint32_t)(((catalogOffset + catHeader->nodeSize - 1) % volHeader->blockSize) + 1);
    cli_dbgmsg("hfsplus_fetch_node: need catalog block " STDu32 "\n", startBlock);
    if (startBlock >= catFork->totalBlocks || endBlock >= catFork->totalBlocks) {
        cli_dbgmsg("hfsplus_fetch_node: block number invalid!\n");
        return CL_EFORMAT;
    }

    for (curBlock = startBlock; curBlock <= endBlock; ++curBlock) {

        foundBlock  = false;
        searchBlock = curBlock;
        /* Find which extent has that block */
        for (extentNum = 0; extentNum < 8; extentNum++) {
            hfsPlusExtentDescriptor *currExt = &(catFork->extents[extentNum]);

            /* Beware empty extent */
            if ((currExt->startBlock == 0) || (currExt->blockCount == 0)) {
                cli_dbgmsg("hfsplus_fetch_node: extent " STDu32 " empty!\n", extentNum);
                return CL_EFORMAT;
            }
            /* Beware too long extent */
            if ((currExt->startBlock & 0x10000000) && (currExt->blockCount & 0x10000000)) {
                cli_dbgmsg("hfsplus_fetch_node: extent " STDu32 " illegal!\n", extentNum);
                return CL_EFORMAT;
            }
            /* Check if block found in current extent */
            if (searchBlock < currExt->blockCount) {
                cli_dbgmsg("hfsplus_fetch_node: found block in extent " STDu32 "\n", extentNum);
                realFileBlock = currExt->startBlock + searchBlock;
                foundBlock    = true;
                break; /* stop searching once the extent holding the block is found */
            } else {
                cli_dbgmsg("hfsplus_fetch_node: not in extent " STDu32 "\n", extentNum);
                searchBlock -= currExt->blockCount;
            }
        }

        if (foundBlock == false) {
            cli_dbgmsg("hfsplus_fetch_node: not in first 8 extents\n");
            cli_dbgmsg("hfsplus_fetch_node: finding this node requires extent overflow support\n");
            return CL_EFORMAT;
        }

        /* Block found */
        if (realFileBlock >= volHeader->totalBlocks) {
            cli_dbgmsg("hfsplus_fetch_node: block past end of volume\n");
            return CL_EFORMAT;
        }
        fileOffset = realFileBlock * volHeader->blockSize;
        readSize   = volHeader->blockSize;

        if (curBlock == startBlock) {
            fileOffset += startOffset;
        } else if (curBlock == endBlock) {
            readSize = endSize;
        }

        /* Size check added in 1.0.1: refuse reads that would overrun buff */
        if ((buffOffset + readSize) > buffSize) {
            cli_dbgmsg("hfsplus_fetch_node: Not enough space for read\n");
            return CL_EFORMAT;
        }

        if (fmap_readn(ctx->fmap, buff + buffOffset, fileOffset, readSize) != readSize) {
            cli_dbgmsg("hfsplus_fetch_node: not all bytes read\n");
            return CL_EFORMAT;
        }
        buffOffset += readSize;
    }

    return CL_CLEAN;
}

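Without this protection, the call to fmap_readn that follows the check could lead to a heap buffer overflow. The function simply reads readSize bytes from the ctx->fmap file map, starting at offset fileOffset, into the destination buff + buffOffset. Nothing else ties readSize, which is derived from volHeader->blockSize, to the size of the destination buffer.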
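The loop arithmetic can be replayed outside of ClamAV to see where the overflow comes from. This is a simplified model, not ClamAV code: it assumes the destination buffer is exactly nodeSize bytes, skips the extent lookup and the actual file reads, and uses hypothetical header values. Other validation in the parser may constrain which values are reachable in practice.

```python
def simulate_fetch_node(node, node_size, block_size, patched):
    """Replay the read sizes of the fetch loop; return bytes written past the buffer."""
    catalog_offset = node * node_size
    start_block = catalog_offset // block_size
    end_block = (catalog_offset + node_size - 1) // block_size
    end_size = (catalog_offset + node_size - 1) % block_size + 1

    buff_size = node_size  # assumption: the caller allocated nodeSize bytes
    buff_offset = 0
    for cur_block in range(start_block, end_block + 1):
        read_size = block_size
        if cur_block == start_block:
            pass  # only fileOffset is adjusted; readSize stays a full block
        elif cur_block == end_block:
            read_size = end_size
        if patched and buff_offset + read_size > buff_size:
            # stands in for the CL_EFORMAT return added by the fix
            raise ValueError("Not enough space for read")
        buff_offset += read_size
    return max(0, buff_offset - buff_size)


# Hypothetical header values: a 512-byte node on a volume with 4096-byte blocks.
print(simulate_fetch_node(0, 512, 4096, patched=False))   # 3584
print(simulate_fetch_node(0, 4096, 4096, patched=False))  # 0
```

In this model, a node smaller than a block is enough: the first block is always read in full, so the copy runs past the end of the node buffer by blockSize minus nodeSize bytes. With `patched=True` the same input raises instead of overflowing.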

The destination buffer is a node allocated in hfsplus_walk_catalog like this:

static cl_error_t hfsplus_walk_catalog(cli_ctx *ctx, hfsPlusVolumeHeader *volHeader, hfsHeaderRecord *catHeader,
                                       hfsHeaderRecord *extHeader, hfsHeaderRecord *attrHeader, const char *dirname)
    // --snip--
    uint8_t *nodeBuf = NULL;
    nodeLimit = MIN(catHeader->totalNodes, HFSPLUS_NODE_LIMIT);
    thisNode  = catHeader->firstLeafNode;
    nodeSize  = catHeader->nodeSize;
    nodeBuf   = cli_malloc(nodeSize);
    // --snip--

cli_malloc is a simple wrapper around malloc with some sanity checks:

void *cli_malloc(size_t size)
{
    void *alloc;

    if (!size || size > CLI_MAX_ALLOCATION) {
        cli_errmsg("cli_malloc(): Attempt to allocate %lu bytes. Please report to\n", (unsigned long int) size);
        return NULL;
    }

    alloc = malloc(size);

    if (!alloc) {
        cli_errmsg("cli_malloc(): Can't allocate memory (%lu bytes).\n", (unsigned long int) size);
        return NULL;
    } else
        return alloc;
}

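This vulnerability provides a rather strong exploitation primitive, as it allows the attacker to control the size of the allocated buffer (catHeader->nodeSize), the number of bytes copied into it (derived from volHeader->blockSize), and the copied bytes themselves, which are read from the scanned file.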

More details about the HFS+ format can be found on Apple’s website.

Exploitation Strategy

We have yet to explore actual exploitation on a default installation of ClamAV. Please note that clamscan and libclamav are hardened on modern Linux distros (NX, stack canary, PIE, fortify, partial RELRO). An attacker would – at the very least – need to find an information leak (maybe by exploiting the XXE?) on top of finding interesting objects to overwrite on the heap in order to take control of process execution.

Folks at Qualys recently demonstrated they could exploit a double free vulnerability affecting OpenSSH on OpenBSD, so I’m betting someone out there can definitely write an exploit for this ClamAV vuln.

Key Takeaways

As already demonstrated in our previous blog posts, file format parsing is a difficult and complex endeavor. The R&D team at ONEKEY always keeps an eye on recently published vulnerabilities and looks into them to check if there are new things to learn and adopt in our own products such as unblob.

Speaking of unblob, we do not parse the DMG property list XML structure and we default to defusedxml for any XML parsing needs. So, we’re safe on that side – at least for now. Additionally, we rely on 7zip to perform extraction of DMG and HFS, which does not seem to be affected by similar memory corruption in the HFS catalog parsing code.
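For readers who want to see what that kind of hardening amounts to, here is a minimal sketch using only the Python standard library. It is not unblob code and not the defusedxml implementation; it only illustrates the general idea of refusing any document that declares an entity. The names `EntitiesForbidden` and `parse_without_entities` are chosen for this example.

```python
import xml.parsers.expat


class EntitiesForbidden(ValueError):
    """Raised when a document declares an entity."""


def parse_without_entities(data: bytes):
    """Parse XML with expat, aborting on the first entity declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def reject_entity(name, is_parameter_entity, value, base,
                      system_id, public_id, notation_name):
        # Internal and external entity declarations both end up here.
        raise EntitiesForbidden(f"entity declaration rejected: {name}")

    parser.EntityDeclHandler = reject_entity
    parser.Parse(data, True)
```

A plain property list parses without complaint, while a document whose DOCTYPE declares an entity pointing at a local file is rejected before any substitution can happen.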