
The fact that this is being used in an analytics product that claims to be compliant with all privacy laws is horrifying. There’s no way this is compliant and it’s deceptive.


Please explain why this isn’t compliant?


Arguably this can become personally identifiable, much like a person's height of 7 feet becomes personally identifiable. How many 7-foot people live in Elko, Nevada? (I have no idea; perhaps there's an entire colony of them.) But most very tall people, well, stand out. "You're that tall guy from Elko!"

Early on, it's not personally identifiable. No doubt there can be a lot of folks visiting the site only 10 times and never again.

But as someone continues to visit, they begin to narrow down who they are: "You're that guy that comes in here every day with a yellow hat." They may not "know" who you are, but they "know" who you are.

Eventually, there may be that one person that has the highest hit rate, who always stands out.


> there may be that one person that has the highest hit rate, who always stands out.

They could stop incrementing once they get to 10 (or something that's high but common enough to be shared by 1,000s of people).
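A minimal Python sketch of that suggestion. It assumes the per-day variant of the scheme, where the seconds past midnight in Last-Modified encode how many earlier visits were made that day. The function name `capped_last_modified` and the cap of 10 are hypothetical illustrations, not anything from the article.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CAP = 10  # hypothetical: stop counting past the 10th visit of the day


def capped_last_modified(if_modified_since: Optional[datetime], now: datetime, cap: int = CAP) -> datetime:
    """Per-day visit counter carried in Last-Modified that saturates at `cap` visits (UTC only)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if if_modified_since is None or if_modified_since < midnight:
        return midnight  # first visit today: everyone gets the same value
    # Value meaning "cap or more visits today".
    ceiling = midnight + timedelta(seconds=cap - 1)
    # Increment by one second, but never past the ceiling, so every heavy visitor shares one value.
    return min(if_modified_since + timedelta(seconds=1), ceiling)
```

With the cap in place, the "one person with the highest hit rate" collapses into the same bucket as everyone else who visited 10 or more times that day.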


> You're that guy that comes in here every day with a yellow hat

Yes, but you have absolutely nothing at all to associate that back to a person. Where are you going to find "personal information of some kind about the people who visit your site a lot"? You're not collecting it.


See my reply to b34r. In addition, assigning users into “anonymous” cohorts is a similar principle to FLoC, which is likely not GDPR compliant https://searchengineland.com/googles-current-floc-tests-aren...


> Processing personal data to generate the cohort assignment without the proper consent could also be a violation

Using personal data to assign a cohort counts as using personal data. Duh. The approach described in the article doesn't use any personal data, though?


> Using personal data to assign a cohort counts as using personal data. Duh. The approach described in the article doesn't use any personal data, though?

Quoting the European commission:

"Personal data is any information that relates to an identified or identifiable living individual. Different pieces of information, which collected together can lead to the identification of a particular person, also constitute personal data."

I'd hazard a guess that it's the second part under which the EC might find this to be within scope.


If I gave you a list of all the last-modified headers from a day, how would you use that information to identify a person?


The definition of personal data under the GDPR is anything that can be used to uniquely identify a natural person (with sufficiently high probability). Both cookies and date-modified meet that definition identically, as do IP addresses.

That doesn't mean you can't use it at all. It just places strong restrictions on what purposes you can use it for. The important point is that those restrictions are the same under GDPR for all of these technologies. It doesn't matter how you uniquely identify users; what matters is what you do with that information.


They don't assign a unique date-modified to each user. They assign everyone the same date modified on their first visit of the day. I don't accept that this could be used to uniquely identify a natural person.

You may be able to look at the headers and see that a certain user made the most requests that day. That still tells you nothing about their identity.


Nothing in the technique described here allows one to identify an individual, directly or indirectly, because the 'identifiers' are not unique and are really no different from standard 'last-modified' dates. Even if they were unique, further data would have to be collected to identify individuals and turn everything into personal data.

What the technique may fall foul of, though, are cookie laws.


You can't just put "anonymous" in scare quotes without explaining how it could deanonymize you. You're sitting there with full access to the count data they collect. Use any statistical methods you like and figure out which visits were mine.


That seems very different, as those cohorts are based on actual personal data (correct me if I’ve misunderstood this about FLoC). That’s fundamentally different from a counter I think.


Yes that’s right, FLoC is explicitly using personal data. But now consider that that data is “you visited a gardening website in the past month” and compare it with “you visited this website 3 times yesterday” and the two methods don’t look so different.


I guess we all have different instincts when it comes to this, but I find it much more expected and acceptable that a website can see that I’m returning, than that they get to know about random other interests I have based on my general browsing history.


The article you quote does not suggest that "assigning users into “anonymous” cohorts is ... is likely not GDPR compliant" and I fail to see how that would be the case. Rather it seems to mention concerns that processing personal data to do so may be problematic.


Because the GDPR isn't about any specific technology, but concerns any processing of personal data:

https://gdpr.eu/what-is-gdpr/

Edit: Huh, I stand corrected. I don't know if this would count as personal data.


Storing a cache header is not an issue, but if it is used as a unique identifier for user analytics purposes, it is almost certainly personally identifying information, at least after combining with other data. Since they are not disclosing that they store something they use to ID users, it is likely a GDPR violation, at least in spirit, and that spirit is exactly what GDPR seeks to control.


> after combining with other data

The post says that they don't combine datapoints because that would negate privacy.


They don’t but anyone using their service could.


It is personal data regardless of how it is used. The only question is whether that use of personal data is permissible.

Using it for user analytics, which is neither required to run the service, nor in the user's interest, nor reasonably expected by the user, is almost definitely an illegitimate use.


I assume because it stores persistent data on the user's PC without consent (last-modified), just like a cookie.


This is a form of data collection and tracking that is definitely against GDPR unless the user is informed of it and consents to it. As it stands, there is no such notification or consent. IANAL, but I strongly suspect it will get you fined in the EU.


What personal information is being collected here?


GDPR doesn't just cover personal info, it also forbids tracking without consent, which includes cookies and other means. This is just a technical trick to track someone sans cookie, so I'm 100% certain they will fine anyone doing it unless they get consent.


The GDPR is entirely about personal data stored by the processor [0]. In principle, if the tracking is entirely client-side, and never produces any traces in how the client accesses your server, then the GDPR alone has no ability to stop it. (Not to say that it cannot run afoul of other regulations.) If the results of the tracking are somehow sent back to your server, then it most likely becomes personal data subject to the GDPR.

[0] https://gdpr-text.com/read/article-1/


Why? It’s anonymous and doesn’t collect any user data other than IP and stuff from the user agent


It’s not anonymous in a low-entropy situation. A user can be indirectly identified. This would violate GDPR.


I don’t see how it can be used as described to identify an individual person.

Multiple requests end up with the same timestamp, which means individuals are not traceable but are countable in aggregate.


Only multiple requests within a given second get the same timestamp. So if you have fewer than 86,400 hits per day (the number of seconds in a day), all your timestamps could be unique.

Edit: I misread the article here, where it said each visit incremented the counter by one second. So my calculation is not correct!


No, they are truncating the timestamp to the day. So all visitors to the site on a specific day get the same initial timestamp.


Ah so they are, thanks! That’s much better. Though for a very, very low-traffic site this would still let me track unique visitors.


It is designed to track unique visitors, but not differentiate between them at all.

Both you and I visit the same new site today, and we both get a file our browser caches with today's date at 00:00:01. Tomorrow, when we go to the same site, our browser says we got the file yesterday, so the server sends a new modified date to the browser, set to tomorrow's date at 00:00:02. Both of us have the same "new" file with the new modification date/time.

If I go back the following day, the only thing the server knows for certain, from this header alone, is that I've visited twice before. So I'm not counted as a unique visitor.
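A minimal Python sketch of one reading of the scheme described in this thread: a visitor's first request of the day gets midnight as Last-Modified, and each later request that day gets one second more. The per-day reset is an assumption (the comments here describe the rollover between days slightly differently), and the helpers `next_last_modified`, `respond`, and `visit_number` are hypothetical names, not the article's code.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def next_last_modified(if_modified_since: Optional[datetime], now: datetime) -> datetime:
    """Pick the Last-Modified value to send back (hypothetical helper, UTC only)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if if_modified_since is None or if_modified_since < midnight:
        # No cached copy, or one from an earlier day: first visit today.
        # Every such visitor receives the identical value.
        return midnight
    # Repeat visit today: add one second per visit.
    return if_modified_since + timedelta(seconds=1)


def respond(if_modified_since_header: Optional[str], now: datetime) -> str:
    """Turn the request's If-Modified-Since header into a Last-Modified header value."""
    ims = parsedate_to_datetime(if_modified_since_header) if if_modified_since_header else None
    return format_datetime(next_last_modified(ims, now), usegmt=True)


def visit_number(last_modified: datetime) -> int:
    """Seconds past midnight encode how many earlier visits were made today."""
    midnight = last_modified.replace(hour=0, minute=0, second=0, microsecond=0)
    return int((last_modified - midnight).total_seconds()) + 1


now = datetime(2024, 1, 2, 15, 30, tzinfo=timezone.utc)
print(respond(None, now))                             # Tue, 02 Jan 2024 00:00:00 GMT
print(respond("Tue, 02 Jan 2024 00:00:00 GMT", now))  # Tue, 02 Jan 2024 00:00:01 GMT
print(respond("Mon, 01 Jan 2024 00:00:05 GMT", now))  # Tue, 02 Jan 2024 00:00:00 GMT
```

In this sketch the header never carries a per-user value: everyone on their k-th visit of the day holds exactly the same timestamp.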

That this could be used by assigning a unique timestamp to each visitor is where everyone's mind is going, and it feels like half are annoyed there's another way to leak information, and the other half are annoyed they didn't think of it prior to the end-of-year marketing bonus deadline.


The technique could be used for a lot of tracking.

However, it sounds like they're using it just for quite minimal tracking. It sounds like the only thing they're tracking is how many people viewed the site how many times. They'll know that on a particular day, 1 person viewed the site 500 times, but won't know anything identifying about that person (e.g. IP, name, gender, any sort of unique ID).
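Assuming the server logs only the decoded visit number of each request (1 for a first visit today, 2 for a second, and so on) and nothing else, the "1 person viewed the site 500 times" kind of statistic can be recovered from those numbers alone. A minimal sketch; `visits_histogram` is a hypothetical name, and it assumes no visitor clears their cache mid-day (which would skew the counts).

```python
from collections import Counter
from typing import Dict, Iterable


def visits_histogram(visit_numbers: Iterable[int]) -> Dict[int, int]:
    """Map k -> number of visitors who made exactly k visits today."""
    # Every visitor with at least k visits sends exactly one request numbered k,
    # so the count of requests numbered k equals the visitors with >= k visits.
    at_least = Counter(visit_numbers)
    histogram = {}
    for k in sorted(at_least):
        exactly = at_least[k] - at_least[k + 1]  # Counter returns 0 for missing keys
        if exactly:
            histogram[k] = exactly
    return histogram


# Visitor A came 3 times and visitor B once: the log holds only [1, 2, 3, 1].
print(visits_histogram([1, 2, 3, 1]))  # {1: 1, 3: 1}
```

The day's unique-visitor count is just the number of requests numbered 1; in this sketch the log records nothing that links one request to another.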


How do you go from timestamp to identifying someone?

~Every HTTP response has a Date field with a second-resolution timestamp that might be unique. Are you equally concerned about that?


But how do I then tie that unique timestamp to an actual person? Which is what GDPR is concerned about.

(edit: spelling)


Birthday paradox means that will be far lower.


No it wouldn't.


Yes it would because a unique time stamp allows me to indirectly identify a user.


It is not a unique timestamp though. Each day, all visitors start at 00:00:00. All users that visit the site a second time get the timestamp 00:00:01 and so on.


Where are people getting these insane readings of GDPR? Any bit of entropy is not going to violate GDPR. First, an active client-server connection is required for any kind of supposed "identity" contained here, and that connection would of course include far more unique bits of identity/entropy, such as the IP address. Second, even if the full DB of page view counts were leaked, you could not actually use it to identify a user.

You have somehow perverted GDPR to mean `no client may ever hold a unique state`. Good luck to anyone making a claim that this is NOT possible in anything but the most rudimentary application.


How?


I agree. Well crafted laws (like the GDPR) forbid any kind of tracking without consent. It’s the what and not the how. It doesn’t matter if it’s via cookies or any other way.



