The fact that ZIP files put the catalog/directory at the end is such a nostalgia trip. Back in the day it meant that if you naïvely downloaded the file, a partial download would be totally useless. Fortunately, in the early 2000s, we got HTTP Range requests and a bunch of zip-aware downloaders that would fetch the catalog first, so you could preview a zip you were downloading and even extract part of a file! Good times. Well, not as good as now, but amusing to think of today.
> ... a partial download would be totally useless ...
no, not totally. The directory at the end of the archive points backwards to local headers, which in turn include all the necessary information, e.g. the compressed size inside the archive, compression method, the filename and even a checksum.
If the archive isn't some recursive/polyglot nonsense as in the article, it's essentially just a tightly packed list of compressed blobs, each with a neat local header in front (that even includes a magic number!); the directory at the end is really just for quick access.
If your extraction program supports it (or you are sufficiently motivated to cobble together a small C program with zlib...), you can salvage what you have by linearly scanning and extracting the archive, somewhat like a fancy tarball.
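A minimal sketch of that linear scan, in Python rather than C, assuming ordinary entries (stored or deflate, with real sizes in the local headers and no data-descriptor streaming):

    # salvage.py - walk a possibly truncated zip front to back and pull out
    # whatever complete entries are present.
    import struct, sys, zlib

    LOCAL_SIG = b"PK\x03\x04"          # local file header magic

    def salvage(path):
        data = open(path, "rb").read()
        pos = 0
        while True:
            pos = data.find(LOCAL_SIG, pos)
            if pos < 0 or pos + 30 > len(data):
                break                   # no more complete local headers
            (_, _, flags, method, _, _, crc, csize, _,
             nlen, elen) = struct.unpack_from("<4s5H3I2H", data, pos)
            name = data[pos + 30:pos + 30 + nlen].decode("cp437", "replace")
            start = pos + 30 + nlen + elen
            blob = data[start:start + csize]
            if flags & 0x08 or len(blob) < csize:
                break                   # streamed or truncated entry: stop here
            raw = blob if method == 0 else zlib.decompress(blob, -15)
            ok = zlib.crc32(raw) == crc
            print(f"{name}: {len(raw)} bytes, crc {'ok' if ok else 'BAD'}")
            pos = start + csize

    if __name__ == "__main__":
        salvage(sys.argv[1])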
> the directory at the end is really just for quick access.
No, its purpose was to allow multi-disk (floppy) archives. You would insert the last disk, then the other ones, one by one…
XPS (Microsoft's alternative to PDF) supported this. XPS files were ZIP files under the hood and were handled directly by some printers. The problem was that the printer never had enough memory to hold a large file, so you had to structure the document in a way that it could be read a page at a time from the start.
At work, our daily build (actually 4x per day) is a handful of zip files totaling some 7GB. The script to get the build would copy the archives over the network, then decompress them into your install directory.
This worked great on campus, but when everyone went remote during COVID it didn't anymore. It went from three minutes to like twenty minutes.
However: most files change only rarely. I don't need all the files, just the ones which are different. So I wrote a scanner thing which compares each zip entry's recorded size and checksum to the local file's. If they're the same, we skip it; otherwise we decompress it out of the zip file. This cut the time to get the daily build from 20 minutes to 4 minutes.
Obviously this isn't resilient against an attacker (CRC32 is not secure), but as an internal tool it's awesome.
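A rough sketch of that sync step with Python's zipfile (the file names here are made up): compare each entry's recorded size and CRC-32 against what's already on disk and only extract on mismatch.

    # sync_from_zip.py - extract only the entries that differ from the local copy.
    import os, zipfile, zlib

    def local_crc32(path, chunk=1 << 20):
        crc = 0
        with open(path, "rb") as f:
            while block := f.read(chunk):
                crc = zlib.crc32(block, crc)
        return crc

    def sync(archive, dest):
        with zipfile.ZipFile(archive) as z:
            for info in z.infolist():
                if info.is_dir():
                    continue
                target = os.path.join(dest, info.filename)
                if (os.path.exists(target)
                        and os.path.getsize(target) == info.file_size
                        and local_crc32(target) == info.CRC):
                    continue            # unchanged, skip the decompression
                z.extract(info, dest)

    sync("daily_build.zip", "install_dir")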
How would this have compared to using rsync?
Not as much geek cred for using an off the shelf solution? ;)
Partial zips shouldn't be totally useless, and a good unzip tool should be able to repair such partial downloads. In addition to the catalog at the end, zip files also have local headers before each file entry. So unless you are dealing with a maliciously crafted zip file, or a zip file combined with something else, parsing it from the start should produce an identical result. Some zip parsers even default to sequential parsing behavior.
This redundant information has led to multiple vulnerabilities over the years, because it means a maliciously crafted zip file with conflicting headers can have two different interpretations when processed by two different parsers.
Partial downloads weren't useless, though, as other commenters have said.
The PKZIP tools came with PKZIPFIX.EXE, which would scan the file from the beginning and rebuild a missing central directory. You could extract any files up to the truncated one where your download stopped.
I hate that the most common video container on the web does this too. Most non-"stream-ready" mp4 files lack even basic information such as height/width until the file has completed loading.[1]
Well, what do you want it to do? It doesn't know the full directory with offsets until it's done compressing, and a dispersed directory would have a lousy access pattern for quick listing. And you know, if you are compressing you probably want the smallest file, so duplicate directories are not ideal.
Debian's `unzip` utility, which is based on Info-ZIP but carries a number of patches, errors out on overlapping files, though not before making a 21 MB file named `0` - presumably the only non-overlapping file.

    unzip zbsm.zip
    Archive: zbsm.zip
    inflating: 0
    error: invalid zip file with overlapped components (possible zip bomb)

This seems to have been done in a patch to address https://nvd.nist.gov/vuln/detail/cve-2019-13232: https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...
Yep, these kinds of format shenanigans are increasingly rejected for security reasons. Not zip bombs specifically, but to prevent parser mismatch vulnerabilities (i.e. two parser implementations decompressing the same zip file to different contents, without reporting an error).
I think these mitigations are misguided and I've had false-positives at least once. Rather than caring about structural details (overlapping files etc.), decompressors should just limit the overall decompression ratio by default (bytes in vs bytes out). It shouldn't matter how the ratio is achieved.
I wonder if there are any reverse zip bombs? E.g. a really big .zip file that takes a long time to unzip, but you get only a few bytes of content.
Like bomb the CPU time instead of memory.
Trivially. Zip file headers specify where the data is. All other bytes are ignored.
That's how self-extracting archives and installers work, and they are also valid zip files. The extractor part is just a regular executable containing a zip decompressor that decompresses the archive appended to it, i.e. itself.
This is specific to zip files, not the deflate algorithm.
There are also deflate-specific tricks you can use - just spam empty non-final blocks ad infinitum.
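For example (a sketch, using the easiest block type to write by hand): an empty stored block is three header bits, padding to the byte boundary, then LEN=0 and NLEN=0xFFFF, so each non-final one is the five bytes 00 00 00 FF FF. Repeat it as often as you like and the stream still inflates to nothing:

    import zlib

    EMPTY_NONFINAL = b"\x00\x00\x00\xff\xff"  # BFINAL=0, BTYPE=00 (stored), LEN=0, NLEN=0xFFFF
    EMPTY_FINAL    = b"\x01\x00\x00\xff\xff"  # same block, but with BFINAL=1

    stream = EMPTY_NONFINAL * 1_000_000 + EMPTY_FINAL   # ~5 MB of deflate input
    assert zlib.decompress(stream, -15) == b""          # that inflates to 0 bytes

Stored blocks are cheap to skip, though; the Huffman-table trick in the next comment burns more CPU per input byte.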
If you want to spin more CPU, you'd probably want to define random Huffman trees and then never use them.
I had Claude implement the random-Huffman-trees strategy and it works alright (~20 MB/s decompression speed), but a minimal Huffman tree that only encodes the end-of-block symbol works out even slower (~10 MB/s), presumably because each tree is more compact. The minimal version boils down to a stream of dynamic-Huffman blocks that each define a table for just the end-of-block symbol and then immediately terminate.
That would be a big zip file, but would not take a long time to unzip.
Isn't that mathematically impossible?
I'm pretty sure it's mathematically guaranteed that you have to be bad at compressing something. You can't compress data to less than its entropy, so compressing totally random bytes (where entropy = size) would have a high probability of not compressing at all, unless identifiable patterns appear in the data by sheer coincidence. Having established that you have incompressible data, the least bad option is to signal to the decompressor to reproduce the data verbatim, without any compression. The compressor would increase the size of the data by including that signal somehow. Therefore there is always some input for a compressor that causes it to produce a larger output, even by some minuscule amount.
Why's that? I'm not really sure how DEFLATE works, but I can imagine a crappy compression scheme where "5 0" means "00000". So if you try to compress "0" you get "1 0", which is longer than the input. In fact, I bet this is true for any well-compressed format. Like zipping a JPEG XL image will probably yield something larger. Much larger, though... I don't know how you do that.
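The counting argument behind the two comments above, in compact form: a lossless compressor is injective, and there are more inputs of a given length than there are strictly shorter outputs.

    % For every n, some n-bit input must map to an output of at least n bits:
    \[
      \bigl|\{0,1\}^{n}\bigr| = 2^{n}
      \qquad\text{but}\qquad
      \Bigl|\,\bigcup_{k=0}^{n-1}\{0,1\}^{k}\Bigr| = \sum_{k=0}^{n-1} 2^{k} = 2^{n}-1 < 2^{n}.
    \]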
Previously discussed in 2019, https://news.ycombinator.com/item?id=20352439
Someone shared a link to that site in a conversation earlier this year on HN. For a long time now, I've had a gzip bomb sitting on my server that I serve to people who make certain categories of malicious requests, such as attempts to log in to WordPress on a site not using WordPress. That post got me thinking about alternative types of bombs, particularly as newer compression standards have become ubiquitous and supported in browsers and http clients.
I spent some time experimenting with brotli as a compression bomb to serve to malicious actors: https://paulgraydon.co.uk/posts/2025-07-28-compression-bomb/
Unfortunately, as far as I can see, malicious actors are all using clients that only accept gzip rather than brotli'd content, and I'm the only one to have ever triggered the bomb, back when I was doing the initial setup!
In one of my previous jobs, I got laid off in the most condescending way, only to be asked days later by my former boss to send her some documents. If only I knew about this then...
You have bigger enemies more worthy of that personal risk. This comment bewilders me a bit.
Don't commit felonies because you're unhappy with your former employer.
Is it a felony to crash someone's computer?
Would it even crash a computer? They would fill up their hard drive, but that would just yield warnings to the user in most operating systems. Chances are they would kill it manually because it would take a long time.
If it causes more than $5k in damage. Otherwise, it's a misdemeanor.
But you probably don't want to be investigated for either.
A deliberate act of revenge against a former employer... wouldn't be given much benefit of the doubt by the courts.
Violations of the Computer Fraud and Abuse Act (CFAA) can be either misdemeanors or felonies. It's definitely broad enough that doing so could get you in serious trouble if pursued.
Possibly, yes.
If done deliberately…
Nothing wrong with getting some satisfaction. Just don't do it in a way that can be traced back to you.
Related. Others?
A better zip bomb [WOOT '19 Paper] [pdf] - https://news.ycombinator.com/item?id=20685588 - Aug 2019 (2 comments)
A better zip bomb - https://news.ycombinator.com/item?id=20352439 - July 2019 (131 comments)
A valid HTML zip bomb - https://news.ycombinator.com/item?id=44670319 - July 2025 (37 comments)
I use zip bombs to protect my server - https://news.ycombinator.com/item?id=43826798 - April 2025 (452 comments)
How to defend your website with ZIP bombs (2017) - https://news.ycombinator.com/item?id=38937101 - Jan 2024 (75 comments)
The Most Clever 'Zip Bomb' Ever Made Explodes a 46MB File to 4.5 Petabytes - https://news.ycombinator.com/item?id=20410681 - July 2019 (5 comments)
Defending a website with Zip bombs - https://news.ycombinator.com/item?id=14707674 - July 2017 (183 comments)
Zip Bomb - https://news.ycombinator.com/item?id=4616081 - Oct 2012 (108 comments)
Okay, so I know back in the day you could choke scanning software (i.e. email attachment scanners) by throwing a zip bomb into them. I believe the software has gotten smarter these days so it won’t simply crash when that happens - but how is this done? How does one detect a zip bomb?
I don't understand the code itself, but here's Debian's patch to detect overlapping zip bombs in `unzip`:
https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...
    The detection maintains a list of covered spans of the zip files
    so far, where the central directory to the end of the file and any
    bytes preceding the first entry at zip file offset zero are
    considered covered initially. Then as each entry is decompressed
    or tested, it is considered covered. When a new entry is about to
    be processed, its initial offset is checked to see if it is
    contained by a covered span. If so, the zip file is rejected as
    invalid.

So effectively it seems as though it just keeps track of which parts of the zip file have already been 'used', and if a new entry in the zip file starts in a 'used' section then it fails.

I wonder if this has actually been used for backing up in real use cases (think how LVM or ZFS do snapshotting)?
I.e. an advanced compressor could abuse the zip file format to share base data for files which only incrementally change (get appended to, for instance).
And then this patch would disallow such practice.
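A toy version of that covered-span bookkeeping (just the idea, not the actual unzip code): remember every byte range already claimed by a processed entry, and reject any new entry whose offset lands inside one.

    # Each processed entry (and the central directory) claims a span of the
    # file; a later entry may not start inside an already-claimed span.
    class CoverageMap:
        def __init__(self):
            self.spans = []                     # (start, end) pairs, end exclusive

        def is_covered(self, offset):
            return any(start <= offset < end for start, end in self.spans)

        def add(self, start, end):
            self.spans.append((start, end))

    cov = CoverageMap()
    cov.add(1000, 5000)                         # first entry occupied bytes 1000..4999
    print(cov.is_covered(1200))                 # True: overlapping entry, reject
    print(cov.is_covered(5000))                 # False: starting right after is fine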
For any compression algorithm in general, you keep track of A = {uncompressed bytes produced} and B = {compressed bytes consumed} while decompressing, and bail out when either of the following occurs (a sketch follows below):
1. A exceeds some unreasonable threshold
2. A/B exceeds some unreasonable threshold
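A sketch of those two checks wrapped around zlib's streaming API (the thresholds and the raw-deflate assumption are arbitrary here):

    import zlib

    MAX_OUTPUT = 1 << 30        # cap on A: at most 1 GiB decompressed
    MAX_RATIO  = 1000           # cap on A/B: at most 1000x expansion

    def bounded_inflate(compressed, chunk=1 << 16):
        d = zlib.decompressobj(-15)             # raw deflate stream
        out_total, in_total, result = 0, 0, []
        for i in range(0, len(compressed), chunk):
            piece = compressed[i:i + chunk]
            in_total += len(piece)
            out = d.decompress(piece, MAX_OUTPUT + 1 - out_total)
            out_total += len(out)
            if (out_total > MAX_OUTPUT
                    or out_total > MAX_RATIO * in_total
                    or d.unconsumed_tail):
                raise ValueError("possible decompression bomb: size or ratio limit hit")
            result.append(out)
        return b"".join(result)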
In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.
On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times, so it wouldn't necessarily trigger your A/B heuristic.
Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If the deflate authors had had this idea when they designed the algorithm, I bet files larger than an "unreasonable" 16MB would be forbidden.
Embarrassingly simple for a scanner too, as you just mark the file as suspicious when this happens. You can be wrong sometimes and this is expected.
Decompression is equivalent to executing code for a specialized virtual machine. It should be possible to automate this process of finding "small" programs that generate "large" outputs. Could even be an interesting AI benchmark.
Many of them already do this. [0]
It is a much easier problem to solve than you would expect. No need to drag in a data centre when heuristics can get you close enough.
[0] https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...
My guess is this is a subset of the halting problem (does this program accept data with non-halting decompression), and is therefore beautifully undecidable. You are free to leave zip/tgz/whatever fork bombs as little mines for live-off-the-land advanced persistent threats in your filesystems.
It's not. Decompression always ends, since it progresses through the stream, always moving forward. But it might take a while.
Is it possible to implement something similar but with a protocol that supports compression? Can we have a zip bomb but with a compressed http response that gets decompressed on the client? There are many protocols that support compression in some way.
Previously: I use zip bombs to protect my server (idiallo.com) 1076 points https://news.ycombinator.com/item?id=43826798
There was https://idiallo.com/blog/zipbomb-protection earlier this year. It sends highly compressed output of /dev/zero. No overlapping files or recursively compressed payloads.
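The payload side of that approach is easy to reproduce (a sketch; the zeros here stand in for /dev/zero, and the trick is to serve the pre-compressed bytes verbatim with a Content-Encoding: gzip header so the client does the inflating):

    # make_bomb.py - compress a big run of zeros once, store the result, and
    # serve it as-is with "Content-Encoding: gzip".
    import zlib

    def gzip_bomb(uncompressed_size=10 * 1024**3, chunk=1 << 20):
        c = zlib.compressobj(9, zlib.DEFLATED, 31)   # wbits=31 -> gzip container
        zeros = b"\x00" * chunk
        out = bytearray()
        for _ in range(uncompressed_size // chunk):
            out += c.compress(zeros)
        out += c.flush()
        return bytes(out)

    payload = gzip_bomb()
    print(f"{len(payload) / 1e6:.1f} MB on the wire for 10 GB of zeros")
    with open("bomb.gz", "wb") as f:
        f.write(payload)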
(2019) with last update in 2023.
Added. Thanks!