Hacker News

That reduces, but doesn't eliminate, the amount of information you're leaking. If you pad the data to a multiple of some fixed block size, you'll still learn something across the boundary between two block sizes. An attacker can do a CRIME-style attack that includes a wrong password guess plus some of their own padding, and vary the amount of padding until it is just big enough to take n + 1 blocks instead of n. Then they can vary the password until it goes back to taking n blocks.
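A minimal sketch of that boundary-probing attack, using zlib as the compressor and a hypothetical 16-byte cipher block (the secret, the guesses, and the block size are all made up for illustration):

```python
import random
import zlib

BLOCK = 16                        # hypothetical block-cipher block size
SECRET = b"password=hunter2"      # non-attacker-controlled part of the message
PAD_POOL = random.Random(0).randbytes(4096)  # incompressible attacker filler

def blocks(guess: bytes, pad_len: int) -> int:
    """All the attacker observes: ciphertext length, in blocks."""
    body = PAD_POOL[:pad_len] + guess + SECRET
    return -(-len(zlib.compress(body)) // BLOCK)   # ceiling division

wrong = b"password=zzzzzzz"
right = b"password=hunter2"

# Grow the attacker-controlled padding until a wrong guess tips the
# message over into n + 1 blocks...
base = blocks(wrong, 0)
pad_len = 0
while blocks(wrong, pad_len) == base:
    pad_len += 1

# ...then, at that boundary, a correct guess compresses better (it becomes
# a back-reference to the secret) and drops back to n blocks.
print(blocks(right, pad_len), blocks(wrong, pad_len))
```

The boundary turns a one-byte difference in compressed size into a visible one-block difference in ciphertext size.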

Randomizing the padding limit also reduces the leaked data, but doesn't eliminate it: your random numbers come from some distribution, so the attacker just has to repeat each inbound message many times, and do some stats. If a correct password guess gives them, say, 1000 bytes with standard deviation 500, and an incorrect one gives them 1001 with standard deviation 500, they simply need to issue a ton of requests.
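Back-of-the-envelope version of that "do some stats" step, under the illustrative numbers above (a 1-byte gap between means, standard deviation 500): the standard error of a difference of two sample means shrinks as sigma * sqrt(2/n), so the required number of observations grows as the square of (noise / signal):

```python
import math

sigma = 500.0   # padding noise (std. dev.), from the example above
delta = 1.0     # 1000 vs. 1001 bytes: one byte of signal
z = 1.96        # ~95% confidence on the difference of two sample means

# We need the standard error of the mean difference, sigma * sqrt(2/n),
# to drop below delta / z, i.e. n >= 2 * (z * sigma / delta)^2.
n = math.ceil(2 * (z * sigma / delta) ** 2)
print(n)  # on the order of two million requests per password guess
```

Expensive, but entirely mechanical; the randomness only raises the price, it doesn't close the channel.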

(If you're padding the data to a fixed absolute size, period, and you know no message is larger than that, then sure, please pad the message, but there's also not a whole lot of point in compressing it at all. Leave it uncompressed, and pad that.)



> That reduces, but doesn't eliminate, the amount of information you're leaking.

All cryptographic systems save the one-time pad leak some information, in the sense that they are not information-theoretically secure. For instance, we know that the output of standard block or stream ciphers (with fixed plaintext) is distributed over an exponentially small fraction of the possible output space.

So really the question here is whether one can pad to the point where it is computationally infeasible to launch this kind of attack, and whether this padding amount is so large as to defeat the compression entirely.

For example, distinguishing two normal distributions with variance 1 whose means are ~ 2^(-k) apart can require ~ 2^(2k) trials for a constant probability of hypothesis-test success (say 1/3).
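A quick Monte Carlo check of that scaling, using a simple midpoint-threshold test between N(0, 1) and N(2^(-k), 1); the choice of k and the sample counts are arbitrary:

```python
import random

def success_rate(k: int, n: int, reps: int = 1000, seed: int = 0) -> float:
    """Empirical probability of correctly detecting the shifted
    distribution N(2^-k, 1) from n samples, deciding 'shifted' whenever
    the sample mean exceeds the midpoint between the two means."""
    rng = random.Random(seed)
    delta = 2.0 ** -k
    wins = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(delta, 1.0) for _ in range(n)) / n
        wins += sample_mean > delta / 2
    return wins / reps

# With one sample the test is barely better than a coin flip; with
# n = 2^(2k) samples the sample mean's std. dev. shrinks to match the
# gap between the means, and the test succeeds with constant probability.
k = 3
print(success_rate(k, 1), success_rate(k, 2 ** (2 * k)))
```

The sample mean of n draws has standard deviation n^(-1/2), so closing a 2^(-k) gap takes n ~ 2^(2k): exactly the quadratic blow-up from the request-count estimate above.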

Has there been any work analyzing this rigorously, taking the length leakage as side information available to the attacker?


You still want to compress in the first place as a means of removing entropy and, where possible, some known plaintext. Most systems that give you the option of compressing or not do so because you may have already compressed the data by some other method (e.g. it's a video file or an already-compressed transport archive).

The addition of padding, even worthless (hopefully at least pseudo-random) garbage, up to some reasonable minimum message size, is a way of making analysis of message /size/ useless.


I've heard this advice a few times before, but I've never seen a rigorous analysis of what "removing entropy" is supposed to mean. Any half-reasonable encryption system will deal just fine with low-entropy inputs, and produce equally unpredictable output. (ECB mode is not a half-reasonable encryption system, but even, say, 1DES-CTR satisfies this requirement.)

Compressing inputs that are completely not controlled by an attacker is fine. For instance, gzipping static files on your CDN is totally fine.

Padding to a minimum message size does not make size analysis useless. It only hides message sizes below the threshold. If an attacker can control part of the input (which is the threat model for things like CRIME, where an attacker-controlled URL and a non-attacker-controlled cookie header are part of the same HTTP request), they can just provide their own padding to push the input past the minimum size. Setting a fixed and unchangeable size for all messages works (... provided there's nothing secret in the number of messages!).



