Hacker News

That reduces, but doesn't eliminate, the amount of information you're leaking. If you pad the data to a multiple of some fixed block size, you'll still learn something across the boundary between two block sizes. An attacker can do a CRIME-style attack that includes a wrong password guess plus some of their own padding, and vary the amount of padding until it is just big enough to take n + 1 blocks instead of n. Then they can vary the password until it goes back to taking n blocks.
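A minimal sketch of that boundary-probing attack, using zlib as the compressor and a hypothetical 16-byte cipher block (the secret, the guesses, and the block size are all made up for illustration):

```python
import random
import zlib

BLOCK = 16                        # hypothetical block-cipher block size
SECRET = b"password=hunter2"      # non-attacker-controlled part of the message
PAD_POOL = random.Random(0).randbytes(4096)  # incompressible attacker filler

def blocks(guess: bytes, pad_len: int) -> int:
    """All the attacker observes: ciphertext length, in blocks."""
    body = PAD_POOL[:pad_len] + guess + SECRET
    return -(-len(zlib.compress(body)) // BLOCK)   # ceiling division

wrong = b"password=zzzzzzz"
right = b"password=hunter2"

# Grow the attacker-controlled padding until a wrong guess tips the
# message over into n + 1 blocks...
base = blocks(wrong, 0)
pad_len = 0
while blocks(wrong, pad_len) == base:
    pad_len += 1

# ...then, at that boundary, a correct guess compresses better (it becomes
# a back-reference to the secret) and drops back to n blocks.
print(blocks(right, pad_len), blocks(wrong, pad_len))
```

The boundary turns a one-byte difference in compressed size into a visible one-block difference in ciphertext size.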

Randomizing the padding limit also reduces the leaked data, but doesn't eliminate it: your random numbers come from some distribution, so the attacker just has to repeat each inbound message many times, and do some stats. If a correct password guess gives them, say, 1000 bytes with standard deviation 500, and an incorrect one gives them 1001 with standard deviation 500, they simply need to issue a ton of requests.
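Back-of-the-envelope version of that "do some stats" step, under the illustrative numbers above (a 1-byte gap between means, standard deviation 500): the standard error of a difference of two sample means shrinks as sigma * sqrt(2/n), so the required number of observations grows as the square of (noise / signal):

```python
import math

sigma = 500.0   # padding noise (std. dev.), from the example above
delta = 1.0     # 1000 vs. 1001 bytes: one byte of signal
z = 1.96        # ~95% confidence on the difference of two sample means

# We need the standard error of the mean difference, sigma * sqrt(2/n),
# to drop below delta / z, i.e. n >= 2 * (z * sigma / delta)^2.
n = math.ceil(2 * (z * sigma / delta) ** 2)
print(n)  # on the order of two million requests per password guess
```

Expensive, but entirely mechanical; the randomness only raises the price, it doesn't close the channel.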

(If you're padding the data to a fixed absolute size, period, and you know no message is larger than that, then sure, please pad the message, but there's also not a whole lot of point in compressing it at all. Leave it uncompressed, and pad that.)



> That reduces, but doesn't eliminate, the amount of information you're leaking.

All cryptographic systems save the one-time pad leak some information, in the sense that they are not information-theoretically secure. For instance, we know that the output of standard block or stream ciphers (with fixed plaintext) is distributed over an exponentially small fraction of the possible output space.

So really the question here is whether one can pad to the point where it is computationally infeasible to launch this kind of attack, and whether this padding amount is so large as to defeat the compression entirely.

For example, distinguishing two normal distributions with variance 1 whose means are ~ 2^(-k) apart can require ~ 2^(2k) trials for a constant probability of hypothesis-test success (say 1/3).
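A quick Monte Carlo check of that scaling, using a simple midpoint-threshold test between N(0, 1) and N(2^(-k), 1); the choice of k and the sample counts are arbitrary:

```python
import random

def success_rate(k: int, n: int, reps: int = 1000, seed: int = 0) -> float:
    """Empirical probability of correctly detecting the shifted
    distribution N(2^-k, 1) from n samples, deciding 'shifted' whenever
    the sample mean exceeds the midpoint between the two means."""
    rng = random.Random(seed)
    delta = 2.0 ** -k
    wins = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(delta, 1.0) for _ in range(n)) / n
        wins += sample_mean > delta / 2
    return wins / reps

# With one sample the test is barely better than a coin flip; with
# n = 2^(2k) samples the sample mean's std. dev. shrinks to match the
# gap between the means, and the test succeeds with constant probability.
k = 3
print(success_rate(k, 1), success_rate(k, 2 ** (2 * k)))
```

The sample mean of n draws has standard deviation n^(-1/2), so closing a 2^(-k) gap takes n ~ 2^(2k): exactly the quadratic blow-up from the request-count estimate above.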

Has there been any work analyzing this rigorously, taking the length leakage as side information available to the attacker?


You still want to compress in the first place as a means of removing entropy and, where possible, some known plaintext. Most systems that give you the option of compressing or not do so because you may have already compressed the data by some other method (e.g. it's a video file or an already-compressed transport archive).

The addition of padding, even worthless (hopefully at least pseudo-random) garbage, up to some reasonable minimum message size, is a way of making analysis of message /size/ useless.


I've heard this advice a few times before, but I've never seen a rigorous analysis of what "removing entropy" is supposed to mean. Any half-reasonable encryption system will deal just fine with low-entropy inputs, and produce equally unpredictable output. (ECB mode is not a half-reasonable encryption system, but even, say, 1DES-CTR satisfies this requirement.)

Compressing inputs that are completely not controlled by an attacker is fine. For instance, gzipping static files on your CDN is totally fine.

Padding to a minimum message size does not make size analysis useless. It only hides message sizes below the threshold. If an attacker can control part of the input (which is the threat model for things like CRIME, where an attacker-controlled URL and a non-attacker-controlled cookie header are part of the same HTTP request), they can just provide their own padding to push the input past the minimum size. Setting a fixed and unchangeable size for all messages works (... provided there's nothing secret in the number of messages!).



