Statement on Google’s conduct by founder of CelebrityNetWorth.com (2019) [pdf]

teddyh · on Aug 10, 2020

The primary case here made against Google might be questionable behavior from Google, but might not be illegal, since simple numbers like a dollar amount can’t be copyrighted, and AFAIK there is no other legal impediment to Google doing what they did (IANAL).

What looks far more inculpatory to me is this, on the last page:

[…] Because it controls essentially the entire internet, Google has endless levers at its disposal to significantly harm or snuff out a rival.

I don’t think it’s a coincidence that our organic rankings have continued to suffer since I’ve become a vocal critic of their practices. Earlier this year, within weeks of the publication of a Wired magazine article that included quotes from me and a recap of our story, the CNW mobile app was banned from the Google Play store without explanation or recourse. As a result we were also banned from using the Google’s dominant Ad Mob mobile ad platform.

This is far more damning of Google abusing its monopoly, not only on search, but also on the Android platform.

tareqak · on Aug 10, 2020

> And what about those conjured celebrities I added as a precaution? All five were all scraped right into Google’s search result pages. It provided undeniable proof that after being turned down, Google simply went ahead and stole the entire database of content CNW took eight years and over a million dollars to build.

A number might not be copyrightable, but what about a news article? What would happen if all the world’s new sites just denied the Google crawler using robots.txt? What if they also returned 404s if accessed from an IP address known to be owned by Google?

Jefff8 · on Aug 10, 2020

Databases have protection under EU law. https://ec.europa.eu/digital-single-market/en/protection-dat...

The EU is also interested in Google, and this looks like a slam-dunk case on the face of it.

wombatmobile · on Aug 10, 2020

Right on, Jeff8.

And when Teddyh says "simple numbers like a dollar amount can’t be copyrighted" he's ignoring the fact that google isn't just copying dollar amounts. Google's copying a data set which establishes a dollar amount as a net worth of a particular person. That's a database.

The only difficulty for CNW will be coming up with the money, time, and emotional energy to achieve a result in court over as many years as it takes Google to exhaust all legal possibilities, of which there are as many as the defendant has dollars.

The problem is, we don't have a justice system, we have a legal system.

scoopertrooper · on Aug 10, 2020

The mere fact that he compiled the database isn't sufficient to qualify for copyright protection. He'll have to show that he used a sufficient level of creativity to arrive at the numbers he displayed.

Based on his testimony, I suspect he'd have a reasonable case to argue, but he'd have to show that his researchers carefully weighed different sources and made other judgements for each individual celebrity rather than just running some script over public property records.

https://en.wikipedia.org/wiki/Wikipedia:Copyright_in_lists#C...

mthoms · on Aug 10, 2020

The article says they employed a number of researchers to come up with the numbers.

I don't think property records (or any other publicly available information) would be enough to come up with an accurate figure in an automated fashion. So, there's some educated guesses (aka "informed estimations") going on and therefore a form of "creativity" (IMHO).

scoopertrooper · on Aug 10, 2020

I agree if the testimony is taken at face value, it makes a pretty reasonable case. However, there's nothing to say that the values he arrived at were accurate (not that it'd necessarily be material to the case) or that his 'researchers' were not just some content farm to write up articles around those numbers in order to enhance SEO. These are the kind of things that'd be deliberated upon in a court case.

mthoms · on Aug 10, 2020

I think we agree. If the numbers required actual research they should be copyright-able.

esrauch · on Aug 11, 2020

I'm not sure about that; the mass of the moon to more decimal places would surely require actual research potentially at great expense, but the resulting number shouldn't be copyrightable since the resulting number is a fact about nature.

scoopertrooper · on Aug 10, 2020

If each entry was individually researched by a person that made judgements by weighing various sources of information against each other to arrive at an estimate, then yes I think they'd have a pretty strong legal argument.

trogdor · on Aug 10, 2020

Amusingly, the reality that their net worth figures are (for the most part) made up works to their advantage here.

trogdor · on Aug 10, 2020

Why would that be necessary? Google copied celebrities the CNW founder invented out of thin air. Not just their net worth, but their very existence.

scoopertrooper · on Aug 11, 2020

Itd be hard to calculate damages for copying the net worths of imaginary people.

tareqak · on Aug 10, 2020

I would argue that even the appearance of taking someone else's work should be a damning blow to Google internally. Google prides itself of having some of the world's best academic researchers under their roof, and plagiarism in an academic context is usually a career-ending affair.

I finally understand what people meant when they would say "for some people, the Internet is Google". Google started off as a search engine, and many people were able to search for answers to their questions such that Google's search engine became the de facto platform for question-answering for this group of people. Over time, Google developed capabilities or acquired capabilities in answering questions in growing number of niches e.g. scraping websites for snippets like in this article or acquiring companies that had developed this sort of capabilities like ITA for flights [0]. As a result, the Google platform that is search started to vertically integrate these different niches bit by bit starving out incumbent competitors via prioritization of search results.

I can can think of one analogy that would be more obviously problematic and troubling: imagine a library system spread across a large geographic area with many branches that was wealthy enough to buy their own book publisher. Maybe it starts out with non-fiction and the quality of the published books remains high. What if the library system then started to prefer this associated publisher? At one point, the library network might end up saying that the other non-fiction books published in the same time period are not being lent out enough to be worth stocking, and so removes / puts them away in long-term storage. One day this library system buys a new publisher, but this time it is fiction, but maybe just one genre like adventure or children's books. Then, the same pattern of behavior emerges.

The library system / network I described above sounds like a very useful and successful platform: almost like a public utility.

Another example might be a telegraph / telephone company anytime before ~1950 buying national newspapers.

[0] https://www.google.com/press/ita/comp.html

an_opabinia · on Aug 10, 2020

Nobody forces you to (checks notes) create celebrity net worth databases.

I just feel like we live in a time with more information than ever, the problem clearly being inaccurate information or propaganda. The problem isn’t insufficient marginal knowledge. Obviously we live in this high knowledge world because of, not despite, Google, so it’s hard to see if you’d really be on the right side of justice if you’re against “big knowledge”

dwild · on Aug 10, 2020

> Nobody forces you to (checks notes) create celebrity net worth databases.

So? What does it have to do with having rights over it? You can say that with literally anything.

You go on afterward arguing about inaccurate information, which has nothing to do with the current subject, and still fail to see why it would be important to protect revenue source over information that was curated...

nkrisc · on Aug 10, 2020

How is Google "Big Knowledge" when the actively play a role in deciding what knowledge gets disseminated or buried? You said yourself the problem is inaccurate information or propaganda which Google is a part of if they're suppressing information they don't like.

fortran77 · on Aug 10, 2020

And these numbers aren't "facts". They're the result of research based on estimates from information about movie deals, real estate investments, business deals, combined with some methodology for producing an estimate based on that. This is particular creative expression of an idea, not just a compilation of facts.

mfer · on Aug 10, 2020

Taking someone else's copy written content and presenting it as your own can be seen as content theft. Imagine if a book author did that to a chapter of a book.

When the one doing it can essential present the case that they should be allowed to do it or they will destroy your business then it's an abuse of power. When they are in a position to actually do it because they have so much of the power in the space it can be argued to be a monopoly and abuse of monopoly power.

Google isn't just copying a number. They are copying text from other sites to present.

dumbfounder · on Aug 10, 2020

It's much less clear in this case. What about displaying the title of the website, is that ok? Just about everyone would say yes. What about the first few sentences of the page? Ok, I think we are good here, probably ok. What about an intelligent snippet of the website that highlights your search terms? Well, ok we are starting to get a bit gray, because now for some searches we don't need to go to the website anymore. But is it wrong or illegal? Well what about formatting that in a nice way that prominently displays the answer to your query? Well now we have an issue, but how big of a leap was it to get there? But is it wrong because they targeted the website and built parsers specifically for it, or because it is done at all? I don't know. But I think it was a slippery slope that got them there.

teddyh · on Aug 10, 2020

> Taking someone else's copy written content […]

(You should read up a bit on this; it’s obvious you have not since you call it “copy written” when the term is “copyright” and a work can therefore be “copyrighted”. The term is about who has the right to copy a given work, and has nothing to do with “writing”.)

Copying text someone else has written is indeed copyright infringement (commonly, but technically inaccurately, called “theft”). Except for the narrowly defined classes of copying called “fair use”, Google would indeed be breaking the law if they did that, but:

> Google isn't just copying a number. They are copying text from other sites to present.

That’s not what the topic was about, and therefore not what my comment was about. My comment concerned the specific case made by the PDF document, which was about Google copying individual dollar amounts, not text.

hluska · on Aug 10, 2020

> (You should read up a bit on this; it’s obvious you have not since you call it “copy written” when the term is “copyright” and a work can therefore be “copyrighted”. The term is about who has the right to copy a given work, and has nothing to do with “writing”.)

You used someone’s grammar as a reason to piss all over a reasonable comment, merely because it didn’t 100% validate your own?

You should read up a bit on basic human decency and even confidence. Otherwise, you just look like a know it all who can’t handle learning new things. That’s weak.

teddyh · on Aug 10, 2020

Maybe you should realize that merely commenting upon someone’s word choices, inferring inexperience, and providing some elucidation, is not the same as “piss all over”. I mean, nowhere did I say anything else negative about anything in mfer’s comment; I did not even disagree with anything they said!

Ergo, please take your lecturing about “basic human decency” somewhere else.

Proziam · on Aug 10, 2020

Maybe, just maybe, you should take the criticism to mean that your intent didn't come across well and that you have an opportunity to improve how you communicate with others.

Food for thought.

teddyh · on Aug 14, 2020

Of course I did that, too.

bryanrasmussen · on Aug 10, 2020

>(You should read up a bit on this; it’s obvious you have not since you call it “copy written” when the term is “copyright” and a work can therefore be “copyrighted”. The term is about who has the right to copy a given work, and has nothing to do with “writing”.)

Perhaps the parent comment was from some country where things are different, like the language.

teddyh · on Aug 10, 2020

Perhaps, but this case was explicitly about US copyright, and may even hinge on the specific intricacies on whether database copyright in US law covers this specific case.

renlo · on Aug 10, 2020

NAL and the only information I have is from a 2 credit intellectual property class I took in college, but facts (like dollar amounts) are not copyrightable, but curated collections of facts (like an almanac) are copyrightable. It seems this would fall under the latter category. Also from the LinkedIn - hiQ Labs lawsuit, it seems that unfair business practices are factored in. Google is implementing unfair business practices that stifle competition by taking the data to make a competing product.

mfer · on Aug 10, 2020

IANAL either. But, there is a difference between taking a straight dollar amount (a fact) and description written that includes the fact.

If all Google did was display the dollar estimate that would be one thing. They go beyond that to include textual content pulled directly from sites.

hyperdimension · on Aug 11, 2020

Thank you for not saying 'IANAL.'

I can't help but to read it, uh, the 'other' way, and I always think "well there's nothing wrong with that, but why bring it up here?"

55555 · on Aug 11, 2020

Copywriting ("copy[,] written") is an industry term.

teddyh · on Aug 11, 2020

True, but that term means something completely different from what was discussed here, and the sentence does not make sense if you assume that meaning. It’s quite obvious to me that they, like many others, erroneously thought of “copyright” as “copywrite”.

pkilgore · on Aug 10, 2020

Just because it wasn't illegal (or questionably legal) yesterday doesn't mean we can't make it illegal tomorrow.

I would surmise that the most popular laws ever written by humanity, collectively, started with a bunch of people thinking some version of "That behavior is questionable."

So sure, the last part is bad too. But that statement was being provide to people who write new laws. Not quite understanding why the current legality of Google's copying is that significant in that context.

PedroBatista · on Aug 10, 2020

The "it's not illegal" game is what every dominant company or powerful mobster wants you to play. Because it's their game and they spend a great deal of money to ensure they always win.

teddyh · on Aug 10, 2020

Yes. Therefore, if we want to win, we should not play that game. Which is why I pointed to something else in the document which much more strongly implicated wrongdoing by Google.

throwaway2048 · on Aug 10, 2020

Also a great deal of money to shape what's legal.

mrtksn · on Aug 10, 2020

> but might not be illegal, since simple numbers like a dollar amount can’t be copyrighted

It might be the case in the USA but there this thing called "Database right": https://en.wikipedia.org/wiki/Database_right

In the case of the business in question, they employed people and spent resources to create that database. It's only fair if they get compensated for their work if the numbers that they collected is to be copied.

ehnto · on Aug 10, 2020

Well we can hope to surface avenues of monopolistic behavior that allow a company to abuse their position within the confines of the law through committees like this. Just because it follows the letter of the law doesn't mean it's not malicious or damaging.

onetimemanytime · on Aug 10, 2020

>>but might not be illegal, since simple numbers like a dollar amount can’t be copyrighted, and AFAIK there is no other legal impediment to Google doing what they did (IANAL).

Forbes for one spends a LOT of money hiring people to try to estimate the net-worth. So it is a number--but one that someone paid a lot of money to get it. Illegal? Maybe. Not-ethical? Very much so IMO

trogdor · on Aug 10, 2020

How do you know how much money they spend?

onetimemanytime · on Aug 11, 2020

No, but I've read of them calling several people, looking at documents people send to prove their worth etc etc. Still money

teddyh · on Aug 10, 2020

Well, yes; I did say it was questionable.

jtbayly · on Aug 10, 2020

Then protect it with a non-disclosure agreement and don’t make it public. But certainly you can’t copyright it.

Jefff8 · on Aug 10, 2020

Databases do have copyright protection under EU law if they are original.

mlthoughts2018 · on Aug 11, 2020

A curated database like this can be copyrighted. The author even mentions this in the article, that the net worth numbers are not considered as commodity information like the height of a famous building.

teddyh · on Aug 11, 2020

He says that the information was hard for him to obtain, but he does not actually, what I can see, argue that the information is covered by copyright.

hluska · on Aug 10, 2020

> This is far more damning of Google abusing its monopoly, not only on search, but also on the Android platform.

No, it’s really not. I just visited celebritynetworth.com. One article took 462 http requests, loaded 9mb of content and though DOM content loaded in 434ms, it took 1 minute to finish loading everything.

The sheer amount of adware on this site is staggering.

Let’s put down the pitchforks and put on our critical thinking caps. Does serving all that adware provide anything close to an acceptable user experience?? And if not, doesn’t it look more like this website merely fell out of search engine fashion as things like performance and UX began to trump other factors?

cik2e · on Aug 10, 2020

You’ve totally glossed over the fact that there was an additional slide in the rankings after the site owner went public with this story. And what about the app being banned from the App Store?

Why would someone go through the trouble of compiling extensive research for free? Ads pay for things that people want but are unwilling to actually spend dollars on. That’s the state of things, whether we like it or not. Don’t hate the player, hate the game.

hluska · on Aug 10, 2020

I think that again, we need to put our pitchforks down and think through this critically. Maybe someone from Google did decide that CNW is about to blow the lid on their scam and decide to shit all over its business model. But let’s step back for an instant and think over some other possibilities.

Google released some pretty major core updates in 2018 and 2019. A lot of these updates seemed to focus on what the SEO community has taken to refer to as EAT (expertise, authority and trust). Google’s own guidance on some of these indicate a similar theme - they’re interested in sources, quality of analysis and other trust metrics. Some major sites saw some really massive drops over these years.

When I checked out CNW earlier on today, I read a couple of articles. There were no sources at all. The net worth calculation seems to be the individual’s reported earnings less state and federal taxes. And the content looks like it was lifted from Wikipedia - in some cases, entire sentences were reprinted word for word.

To get that, my browser had to make 426 requests, download 9mb of mostly adware and display ads almost everywhere.

Maybe this is all a big Google ploy but I think it’s more likely that their algorithm has changed to favour high quality sites as opposed to sites that used to get search performance because of valuable keywords in their domain.

As for their Android app, I’ve never owned a phone that runs Android so I can’t find and download an Android version but I decided to go down the rabbit hole and install CNW in my iPhone. They’ve apparently been ad free for two years, which is nicer than their web experience. But the content is the same kind of stuff as their website - entire sentences seem lifted directly from Wikipedia, their news section is similar, it’s just slightly repackaged stories from other sources.

Thanks to today, I now know that Takeshi 6ix9ine made $5m for a livestream which pushed his net worth to a reported $8 million. I’ve never felt so old or out of touch in my life...:)

arh68 · on Aug 11, 2020

> I now know..

You prove it is an acceptable user experience. Of course it's easy to scroll past ads.

When I read Encyclopedia Brittanica, there are no sources. Yes, it's a pain in the butt to grab the book off of the shelf, but it'd still be wrong for Google to copy it.

If the data is no better than Wikipedia, why didn't Google copy Wikipedia? With all their PhDs and $100M bonuses, they could not do better. They copied CNW because the CNW dataset was better. The fakes prove it. To deny the CNW's data's worth is illogical, pitchforks or no pitchforks.

flokie · on Aug 11, 2020

Just curious - how do you know it's lifted from Wikipedia vs. the other way around? It's quite hard to trace the sources.

E.g. did a quick search for Jerry Seinfeld, CNW is a lot more robust than Wikipedia (or the inc. magazine article it cites)

mcv · on Aug 10, 2020

That's not the issue OP is addressing. If Google is ranking them lower because it's a slow, crappy, ad-ridden site, that's understandable. It's stealing their data and presenting it as their own that's the problem here.

A site being badly designed does not mean they're suddenly not protected by copyright.

hluska · on Aug 10, 2020

As I mentioned to another commenter, I took a deep down into the CNW rabbit hole and even downloaded their app. I read a few articles and compared them to the individuals’ Wikipedia pages. The content appears to be a mild rewrite of their Wikipedia pages and in many cases, entire sentences are copied verbatim.

As for net worth, near as I can tell, for minor celebrities, they take their reported earnings, subtract relevant taxes and call that their net worth. For major celebrities, they seem to rely on Forbes and/or Business Insider as their sources.

ethanwillis · on Aug 10, 2020

Maybe if they served up some of Google's AMP™ pages they would get higher search rankings.

hluska · on Aug 10, 2020

Actually, they use AMP. I read the 6ix9ine article on AMP and it was a much nicer mobile experience.

thefreeman · on Aug 10, 2020

I know you are being sarcastic but it would result in a 10x better experience for the user.

ethanwillis · on Aug 10, 2020

The joke I'm trying to make is that if they had a much nicer site design it still wouldn't be ranked higher. It would only be ranked higher if and only if Amp was used... because of bias.

toyg · on Aug 10, 2020

Any right or principle worth defending will always apply to you as well as to people you don't like.

hluska · on Aug 10, 2020

I get your point but I went down a CNW rabbit hole and I’ve got to tell you, there isn’t a whole lot of original reporting being done. There are no sources and in many cases (6ix9ine is a great example), the content seems to be a rewrite of his Wikipedia page. In other cases, entire sentences are reprinted verbatim from Wikipedia. The net worth calculation is basically reported earnings less relevant taxes. For major celebrities, they seem to rely on publications like Forbes and Business Insider coverage.

throwaway2048 · on Aug 10, 2020

So they deserve to have their content hijacked (unattributed) by Google?

hluska · on Aug 10, 2020

Here’s a fun rabbit hole for you to go down. Find CNW’s article on 6ix9ine and compare it to his Wikipedia page. Then take his reported earnings and subtract relevant taxes.

bufferoverflow · on Aug 10, 2020

It's not just the numbers though. It's a database of names with numbers, all clearly stolen.

I'm pretty sure it's covered by copyright infringement laws.

MaximumYComb · on Aug 10, 2020

Can ficticious characters be copyrighted? Google stole those from home too.

jcims · on Aug 10, 2020

Given enough time tinkering with the prompt, and assuming the Wired article and website were in the training data, I think GPT-3 could probably suss out the relationship between the critical comments in Wired and the net worth website. It clearly can identify sentiment and maybe from that even infer the strained relationship of the site to Google

Seems plausible that Search could autonomously and possibly without human intent begin deranking sites that directly or indirectly surface negative opinions. At some point we could quite literally have to worry about hurting the feelings of our search engines.

jcims · on Aug 10, 2020

Just tried this through AI Dungeon to see what happens:

Prompt

You are Bob Warner, founder of CelebrityNetWorth.com, and are speaking at a hearing before the House Judiciary Subcommittee on Antitrust, Commercial and Administrative Law. The subject is Google. You start your testimony as follows:

"Thank you, Chairman Cicilline, Ranking Member Sensenbrenner, Committee Chair Nadler and Ranking Member Collins,

GPT-3 Response

for inviting me to testify today about Google's anticompetitive conduct in the search market. I am honored that my company, CelebrityNetWorth.com, has been chosen by the subcommittee to speak on this important issue.

I want to begin by thanking the members of this committee for their interest in our industry. As you know, we are an information technology business that provides Internet marketing services to small businesses and individuals. We have been providing these services since 1996. Our service allows users to post advertisements on the Internet in exchange for money. In addition, we provide other services such as web hosting and e-mail services.

CelebrityNetWorth.com is an information website that provides public information about celebrities. It was founded in 2008 and has been very successful. We do not collect user information; we simply provide the public with information about celebrities. For example, on our website, you can find out how much money Kim Kardashian is worth.

We have thousands of users and track over 3 million individuals. We get a combined 6 million monthly page views. We receive an average of 75,000 unique visitors per month. We are very successful and I thank this subcommittee for the help it has given us over the years.

Our website has had many successes, but our crowning achievement was exposing Google for the shakedown racket that it is. Our story began in 2006 when we tried to enter the search engine market. At that time, Google had a near monopoly in the search engine market.

In 2009, we applied to be listed in the Google search engine as a secondary link. We went through all the proper channels and even paid money to Google. After we paid them, our website's popularity increased substantially.

jcims · on Aug 10, 2020

Then I backed off the context quite a bit. (Note I prompt for Google in the middle) Doesn't surface the conflict noted in the PDF but makes an interesting observation about FaceBook.

Prompt

You are Bob Warner, founder of CelebrityNetWorth.com, and are having lunch with an old friend of yours named Phil. You're discussing what the last ten years has been like with the site and some of the major challenges you've faced. The conversation goes as follows:

Phil: Hey Bob, I've been thinking about how much things have changed over the years and how you've had to adapt to keep up with them. What would you say are some of the major challenges you've had keeping CelebrityNetWorth.com up and running and making money?

GPT-3 Response

Bob: Well, it's a combination of several factors. One is that we've always tried to stay on top of all the new sites popping up and trying to get our content out there first. We were also lucky enough to be in business long before most other sites were. So even though we started off small, we've managed to grow into one of the largest sites around. Another factor is that we've always kept our prices low because we didn't want to compete with free sites. Nowadays, many people can get their information for free but they still charge a lot of money for their services. So we offer a good deal for our users and we've found that they keep coming back to us.

We also realized that we had to start diversifying our income streams because advertising alone just isn't enough anymore. There's so much out there that it's actually really hard to keep people's attention, and a lot of people block ads nowadays.

Prompt

Phil: What do you think about Google?

GPT-3 Response

Bob: I've been getting a lot of questions about them. I don't think they're a threat to us because we offer way more information than they do. We're not in competition with them.

However, I'm getting a little worried about Facebook. I see them moving into our turf by creating their own "Celebrity Pages" and getting content from their users.

WalterGR · on Aug 10, 2020

Google is incredibly shady. There’s been a manual penalty in place against my website The Online Slang Dictionary (http://onlineslangdictionary.com) for the better part of a decade. I know this because the data suggested it - and then a Google employee confirmed it.

It would be easy to explain it away because Aaron Peckham, owner of Urban Dictionary, worked at Google when Matt Cutts was head of the Web Spam team, and they knew each other. But that’s incredibly circumstantial. I do know that when I confronted Cutts about the penalty here on HN he lied about it - which does lead one to wonder about his motive.

I’ve written in length about the manual penalty here: http://onlineslangdictionary.com/pages/google-panda-penalty/

Edit 0825 Pacific: I have to work. I’ll respond to comments later. You can also reach out directly - contact info is in my profile.

dang · on Aug 10, 2020

> I do know that when I confronted Cutts about the penalty here on HN he lied about it

That's an extreme thing to say and crosses into personal attack. The thread you seem to be referring to (https://news.ycombinator.com/item?id=5418864) doesn't seem to support what you've said here. At a minimum, you're breaking the site guideline which says: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith." https://news.ycombinator.com/newsguidelines.html

I'm not saying your claims about Google and your site are false—I have no idea; it sounds like you have every right to be frustrated. Also, I don't know Matt. But these sorts of accusations ought not to be slung carelessly on HN. Since we're trying to have a site that staves off internet-default outcomes, we all need to be careful about this.

I have experience with these dynamics in the HN context and can tell you that people jump to conclusions all the time about why they were penalized or banned. Their conclusions are almost always wrong and overly dramatized. Frequently they take to the forum to declaim about how badly they were treated, and they always put it in strikingly specific and factual-sounding terms, as if they know for sure. But they don't know for sure; in fact they don't know at all. They've just completely made it up.

The HN context is simpler and has lower stakes than Google. (At least here, people can get factual answers to specific questions if they ask.) But that only strengthens the point: these domains are complicated and the dramatic explanations that people come up with out of frustration are almost always wrong. Of course it's frustrating is Google has penalized your site, but please don't let that boil over into accusations against specific people unless you can strictly demonstrate what you're saying.

WalterGR · on Aug 10, 2020

I understand. Feel free to kill this thread.

please don't let that boil over into accusations against specific people unless you can strictly demonstrate what you're saying.

I realized that I’m using stronger verbiage here than in the article on my website. I haven’t updated the article in some time. If I recall correctly, it doesn’t contain any evidence/proof that explicitly contradicts specifically what he told me in the linked HN thread. I will update the article, and keep in mind the contents of your comment. Out of legitimate curiosity, at what level in the org chart is it okay to say that someone lied? We can say that the CEO lied. Matt was a senior employee who was also very public: he was the face of Google’s ranking algorithms.

dang · on Aug 11, 2020

I don't really think in terms like "at what level in the org chart is it okay", but if pressed I guess I'd have to say that it's never ok to say that someone lied when you don't have evidence of that.

Keep in mind that the word "lie" is strong medicine. It implies not only that someone said something untrue but that they deliberately intended to deceive.

Mandatum · on Aug 11, 2020

I think it's fair to say HN policies around naming and shaming are pretty cut-and-dry in the comment section: that is, unless it's the subject of the article or link by the OP, it's not okay.

So from what I can tell, if someone writes an article about an employee of a company, people can say whatever they like in the comment section - anecdotes as colleagues, stories of similar environments, a mix of facts and claims. The article in question may be mostly heresy, and we've seen in the past this happen with the likes of Bezos, or very public people within our field like Stallman, or founders with questionable morals (see the latest AI controversies, or ads, or hacking, or regulations)..

But as soon as we specifically name someone in the comments, it's considered a witch-hunt. I think this is only true the first time it happens, or prior to them being targeted outside of this community.

Once someone writes a blog post with at least some facts on the subject, or the mainstream media writes about them - suddenly they're fair-game.

dang, we know this isn't going to get any easier to moderate. And I think writing broad, sweeping rules around this won't work. However, could you possibly write a post about it? You've expanded in the comments section in response to my own failure to live up to these standards (sorry again!), which I think was a good amount of information.

https://news.ycombinator.com/item?id=23306743

I think it could be good for some community self-reflection, and to see where the line of discourse vs gossip sits today.

dang · on Aug 11, 2020

It's my intention to collect the moderation comments on various of these perennial topics, cherry-pick the best bits, and turn them into either some essays or perhaps a kind of super-FAQ. That's why I link to old moderation comments using hn.algolia.com; the answers have started to converge in the last few years, and maybe there's a blissful future in which we can simply link to official explanations rather than endlessly reworking them. So yes in principle, though when there will be time for it is another question.

rozab · on Aug 10, 2020

Do you have any evidence at all for these extraordinary claims? Why would Urban Dictionary go to such lengths to suppress a competitor which posed no threat[0]?

I also notice that the homepage hasn't been updated at all for at least 8 years[1]. I suspect this may have had something to do with its decline.

[0]: https://trends.google.com/trends/explore?date=2004-01-10%202...

[1]: https://web.archive.org/web/20120107121814/http://onlineslan...

DanBC · on Aug 10, 2020

You may want to read these posts:

https://news.ycombinator.com/item?id=9977372

https://news.ycombinator.com/item?id=5419890

ender7 · on Aug 10, 2020

I have to say, after reading those posts, Matt is looking more trustworthy than the OP.

Especially considering the definitely-not-a-sock-puppet post by jimboykin [1], an account that was created immediately after this thread and has but a single post on HN.

[1] https://news.ycombinator.com/item?id=5476100

sct202 · on Aug 10, 2020

I mean part of this issue is that there is no feedback to the website owner, which is shown by the guy only getting a response here on HN.

Like isn't it kind of insane that the top Google search engineer provided customer service to OP, but he couldn't get a response thru a normal channel.

WalterGR · on Aug 10, 2020

Thank you for the links.

When I refer to Matt Cutts lying, I mostly mean his comment here: https://news.ycombinator.com/item?id=5426209

I have no idea what that business is with that company.

I did everything in his comment to address the claimed automated penalties: removed every single ad and every single citation from the site. No changes in ranking at all, after months. I forget how many. 6 months?

That’s because Matt Cutts was lying: there was a manual penalty in place at that time.

WalterGR · on Aug 10, 2020

I have evidence - it’s in the link. I also have proof, but I’m not comfortable with publishing my communication with the Google employee right now.

I have no idea why the penalty is in place. What I describe is “incredibly circumstantial.”

You say that my site poses no threat. Is that because it hasn’t seen traffic growth or become popular? The site is no threat exactly because the penalty is in place and has been for almost a decade. That’s a long time for a website. Traffic growth was very healthy before all of this.

The home page has been updated more recently than that, but you’re right that it still has the same page structure. With the manual penalty in place, it doesn’t matter how often I update the skin of the site or the home page text: the site is dead.

jasonwatkinspdx · on Aug 10, 2020

Just to save everyone the effort, the closest thing to evidence amid that blog post's many wild allegations is that the pandas update changes around citations hurt the site's google flow. This isn't surprising. That hit a lot of low quality websites that were essentially arbitraging google to aggregate content duplicated elsewhere on the web. I worked for a similar business with considerably more traffic. We didn't interpret it as some directed conspiracy against us. We knew from day 1 that's a risk of developing such a site, and why original content still matters a great deal. We adapted and moved on.

The remaining proof claimed is that a whistleblower told him Matt Cutts implemented a penalty manually. But while our poster is quite willing to share the allegation, they're not willing to share the communication.

There's no proof on offer here.

maps7 · on Aug 10, 2020

Unfortunately I followed the links before reading your comment. I wasted my time - there's no proof of lies or anything here. Just some claims.

sroussey · on Aug 10, 2020

Breaking the rules gets you dinged until you fix things. Repeatedly breaking rules results in more permanent dings. Both with search and with Adsense.

specialist · on Aug 10, 2020

Edit: Is there any way to prove that Google isn't/hasn't unfairly penalized this site?

I'm more willing to suspend Hitchens Razor when a well known defendant has such a long history of doing much worse.

WalterGR · on Aug 11, 2020

Is there any way to prove that Google isn't/hasn't unfairly penalized this site?

No.

temp667 · on Aug 10, 2020

Dude - I went to your side. I looked up a random word - it had a 3 word definition with 12 seconds to interaction, at least a second of blocking content, a pageload time of 25 seconds (!! for three words!), 352 requests! 1.5MB page size, and content farm crap filling out the rest of the page.

I mean, I'm a user of the web, and I hated sites like this.

You score terribly among all major search providers I tried BTW. Your page-speed score was an F.

You have something like 50 redirect chains on a page with three words!!

Some of these chains are 20 items long (ie, 20 redirects). I mean - how does this even happen?

Quick heads up - This is going to rank terribly everywhere.

Yes, google does "penalize" content farms, and they did a big redo a while back to really try to push this type of 99% repeating content off their index. I've no doubt they hit some good sites, yours doesn't look like one at first glance, but even if I just go unlucky there were tons of sites (review sites / how to sites) that just had endless repeating content with very little originality.

How do folks at google even have time to deal with sites like this and these types of complaints "I confronted Matt" etc...

WalterGR · on Aug 10, 2020

I have no idea what you’re talking about. The site has never had any of that.

Did you go to the right site?

http://onlineslangdictionary.com/

What URLs are you seeing this on? What browser and OS?

I wonder if anyone else can repro what this new user is describing.

arh68 · on Aug 11, 2020

No issues rendering in Firefox on Linux.

GET /random-word/ does indeed 302 exactly once.

22 requests, 344.58 KB / 12.62 KB transferred, Finish: 619 ms, DOMContentLoaded: 483 ms, load: 640 ms. Pretty okay.

sundarurfriend · on Aug 12, 2020

Be pretty ironic if they went to the Urban Dictionary, it does sound like a plausible UD experience.

someperson · on Aug 10, 2020

> Due to the time suck of fighting Google's penalty, this site is not yet served via SSL

You probably should get onto enabling SSL/TLS as soon as possible. It supposedly effects search engine ranking.

WalterGR · on Aug 10, 2020

Very true. Due to the penalty, the site gets less and less traffic every year. Enabling SSL/TLS might help a bit, but I have to face facts: the site is dead.

badwolf · on Aug 10, 2020

Not to belittle your experience, but it seems this may be a bit of a negative feedback loop. Haven't updated the homepage because "what's the point I'm already punished" etc...

WalterGR · on Aug 10, 2020

I spent 80 hours a week from when the penalties started in 2011 until I went broke and got outside employment in 2014. I’ve worked on it for years worth of time since then, including updating the home page. (It’s a content site. Updating the home page is a thing, but people come to the site via searches for slang terms, not for the home page...)

There is nothing I can do with the manual penalty in place, and it doesn’t make sense to throw good time after bad. It’s been 9 years since the penalties started and there’s been no improvement since then. Not only is it not healthy to continue banging my head against the wall of Google, since we’re on a site for a startup incubator: it doesn’t make monetary sense.

paganel · on Aug 10, 2020

re: Matt Cutts

I suspect he was the Googleguy from the Webmasterworld.com forums around 2002-2003, when Google (the company) was telling website owners: "trust us, we only need your website data in order to index it and to make it more easily searchable, no foul play here, we're never going to become a portal like Yahoo, we're never going to become your competitors".

Suffice is to say that things didn't turn out that way, and it has always bugged me that the people directly involved in this deception (i.e. people like Matt Cutts) have never even said "we're sorry". They just took their fat pay-checks and vested their shares and they were done.

daveed · on Aug 10, 2020

Are you saying that the only thing that Google did was to deprioritize sites with a lot of citations? Not that they deprioritized your site specifically? This kind of flag would affect every site they crawl. I feel like it's much more likely that it was made to prevent some class of abusers, and you were caught up in the scope of their net.

I mean, every time they make a change in ranking some people get hurt and some people get helped. If this is a universal change and not tied to your domain I have a hard time believing that a team would've added this flag because a former employee makes a competing site with a slightly different model from you.

WalterGR · on Aug 10, 2020

Are you saying that the only thing that Google did was to deprioritize sites with a lot of citations?

That was the automated penalty. The rest of your comment is correct. The manual penalty came later.

I have a hard time believing that a team would've added this flag because a former employee makes a competing site with a slightly different model from you.

As I said, that’s incredibly circumstantial. I don’t know the motive, cause, etc. I’m not going to publish my communication with the Google employee who informed me about the manual penalty at this time, so what I’ve written in that article is the extent of the public information at this time.

lalaland1125 · on Aug 10, 2020

You appear to have roughly the same ranking in both Bing and Google for two words that I tried ("bro", "womp womp"). Do you believe there is also a conspiracy against your website at Microsoft?

WalterGR · on Aug 10, 2020

Nope.

It’s been 9 years since the penalties started. One could say that the site had taken something of a popularity hit since then.

mawise · on Aug 10, 2020

While this is all very shady, it also seems to be a poor long-term business decision for Google.

"Our revenue and profits have dropped accordingly,which has forced me to cut our staff down to a bare bones team ... the quality of our content and product have dropped"

If CNW is no longer able provide that data because of financial insolvency, then Google no longer has a source they can scrape. This smells like the poor short-term incentives that drive individual roles instead of simply unethical behavior on Google's part.

flutterdude420 · on Aug 10, 2020

> If the penalty weren't in place, we would get about double the visitors of that hate site. Given the rise of white nationalism in the West, does it make sense that a single person within Google can destroy a dictionary, while hate sites flourish?

Going to need evidence here. Your post comes off as a bit of a one-man pity party without it.

WalterGR · on Aug 10, 2020

Yeah that text is... long in the tooth. It was written before all of the deplatformings, when Google was sending tons of traffic to such sites.

Aside from that, I tried to be as neutral as possible. But the site is my life’s work so far - it’s hard to leave emotion out of it.

At this time I’m not going to publish my communication with the Google employee who informed me about the penalty, so if you think it’s a pity party, I’m okay with that. I’d rather that be the case than risk the livelihood of someone who did the right thing.

cynicalkane · on Aug 10, 2020

> I do know that when I confronted Cutts about the penalty here on HN he lied about it - which does lead one to wonder about his motive

The preamble to this post hints of paranoia, but this is where we know for sure. There's no reason to believe Cutts would have any personal knowledge or involvement or motivation, except that he knew another person who might have had some weak interest in maybe doing something. Of course, the poster feels that, having drawn this chain of connections, that he must be right and Cutts must be lying.

The poster feels entitled now to personally attacking a specific well-known search engineer, one who has been active here in the past. This seems like a conduct violation on here or any reasonable discussion site.

WalterGR · on Aug 10, 2020

There's no reason to believe Cutts would have any personal knowledge

Cutts was the head of the Web Spam team at the time. That team is responsible for penalties.

He and I had an exchange here years ago. The text of his comment implies that he looked up penalties, so he had knowledge: https://news.ycombinator.com/item?id=5426209

This seems like a conduct violation on here or any reasonable discussion site.

If that’s the case, I’m happy to discuss it with dang or another admin.

epanchin · on Aug 10, 2020

How did you site rank on other search engines? Do you have evidence it outranked UD on yahoo, etc?

WalterGR · on Aug 10, 2020

Very unfortunately I don’t have historical ranking data.

adventured · on Aug 10, 2020

I have 20+ years of SEO experience, going back before it was commonly called SEO, before Google even existed. I cut my SEO teeth figuring out how to manipulate AltaVista rankings.

Here, I've marked up a typical content page of yours, showing what's wrong very blatantly. It took about 30 seconds to diagnose the whole site based on the repeating content farm structures and thin content problems.

https://i.imgur.com/k5brlYr.png

That page maybe has 20 or 30 words of quality content on it. That will get you an extreme demotion from Google if you're running a whole site like that in a content farm structure.

Your site was very likely penalized due to low quality, shallow content. Google tagged it as being a content farm. No conspiracy is necessary, it's obvious why it's penalized. Any SEO expert would point out the countless problems in the first few minutes of a discussion with them.

I looked at numerous pages, all were of very low quality, and thin / shallow on content.

On one page I checked there were 471 words, most of which were repeating low value words or from low value segments of the page (eg the "Link to this slang definition" section). Google knows the page is thin on quality content, aka shallow. I found this to hold true across all the pages I checked.

On most of the pages I checked the most frequent non-common words were "definitions" and "include" - ie hyper low value content. Your typical page looks like it might have 50 or fewer words of quality non-repeating content (content unique to just that page, written by a human). That's hundreds of words shy of what it needs to be at an absolute minimum.

The pages all have repeating content farm structures (which were increasingly common 8-12 years ago) with shallow content in those areas of the page, which is something Google sought to clean up around the time you're claiming the penalty occurred, which makes perfect sense.

You built a content farm, with a lot of shallow content pages, Google dramatically changed how it treats those types of sites, and your site most likely got nuked because of it. If you build a content farm structured site, where all the pages have repeating segments and your offset of positive value only consists of dozens of words of quality text on each page, you stand zero chance of getting traffic. That started being true 8-10 years ago as Google updated its algorithms to target content farms, to wipe them out of existence. They killed a billion dollar site, eHow, circa 2010-2011 in the process:

https://en.wikipedia.org/wiki/EHow

WalterGR · on Aug 10, 2020

Thanks for your feedback. It’s quite similar to other feedback I’ve gotten, from non-employees on Google’s webmaster forums.

My site is the only site on the web with legitimate definitions of slang terms and citations of their use and other data. My site offensivest.com is the only large-scale analysis of the vulgarity of text works - using data from the site. Until a couple years ago, it was the only slang thesaurus available without payment on the web. I invite you to check out similar sites to calibrate your sense of whether it’s low or high quality. For example, you could compare my site’s content, ads, and ranking to internetslang.com. https://internetslang.com/DBA-meaning-definition.asp Is representative of the site.

I don’t understand your saying that my site is a content farm. Almost every definition has been hand-edited by me. The citations are all blacked out. In terms of calibration, https://urbanthesaurus.org/ is another good site to look at. It takes its definitions directly from Urban Dictionary.

But fundamentally it boils down to: a Google employee looked it up and told me that there’s a manual penalty against my site.

adventured · on Aug 10, 2020

> It’s quite similar to other feedback I’ve gotten, from volunteers on Google’s webmaster forums.

Then why haven't you corrected the thin content problem? Your site is drowning in content farm structures and thin content.

> My site is the only site on the web with legitimate definitions of slang terms and citations of their use and other data.

What you're saying has nothing to do with the problem. Your pages do not have enough original content on them, they're suffering from an extreme shallow content penalty. If only 5-10% of the words on your pages are original, quality content and the rest is content farm structured words, you are going to get massively penalized for that.

> Almost every definition has been hand-edited by me. The citations are all blacked out. I don’t understand your saying that my site is a content farm.

I'm saying that Google has judged it to be a content farm based on how they rank content and how they look for mass repeating structures on a site's pages vs how much high quality original content is on each page. This isn't complex SEO, this is very, very simple to diagnose. I wasn't joking when I said it took all of 30 seconds to figure out the problem with your site.

If all you have on a given page is 30 to 50 words of original text (text you wrote by hand so to speak), you will not rank, you will not get much traffic. That's SEO 101. Most of your topic page layout consists of repeating structures, such as: the header, the footer, the map, share / tweet / email section, related words, how common is this slang, how vulgar is this slang, link to this slang. That's all of zero or negative value.

One more time: ~95% of your page structure is of zero value for SEO purposes, and the only value on the pages is a few dozen words of original text. It's not enough; it's not even remotely close to being enough. That needs to be hundreds of words ideally, at a minimum.

> But fundamentally it boils down to: a Google employee looked it up and told me that there’s a manual penalty against my site.

That's very clearly not the problem with your site. It's the extremely shallow/thin content stacked against the repeating content farm structures on every page.

mcintyre1994 · on Aug 10, 2020

> And what about those conjured celebrities I added as a precaution? All five were all scraped right into Google’s search result pages. It provided undeniable proof that after being turned down, Google simply went ahead and stole the entire database of content CNW took eight years and over a million dollars to build.

I remember a few years ago Google had a big blog post about how they’d injected some fake search results to catch Bing scraping like this. It certainly seemed like they thought it was a bad thing back then.

strgcmc · on Aug 10, 2020

I worked at Microsoft from 2010-2012, and the crux of the issue there was that, for Microsoft users who had the Bing toolbar installed, the toolbar would track what you were visiting in your browser and use that session/journey information to try and improve the relevancy of Bing results. If a lot of people who searched for "best pancake house" (regardless of what search engine you used) end up visiting ihop.com within the same browser session, well then Bing would want to rank ihop.com higher for that search phrase.

Google noticed this, and for unique/low-traffic search terms, was able to synthetically generate enough "fake" traffic that the "fake" traffic became the dominating signal for those terms, and therefore Bing started directing users to the fake results.

The bad thing here IMO is the level of tracking of users via this toolbar, but fundamentally this seems as bad as any other digital fingerprinting or advertiser tracking as anything else that's become common on the web. This is not to excuse Microsoft for doing a bad thing, but it really had very little to do with "scraping Google", which somehow became the popular media takeaway for this.

EDIT: a decent contemporaneous article in Wired: https://www.wired.com/2011/02/bing-copies-google/, mentioning how Microsoft was using the clickstream data from the browser/toolbar, not scraping Google results per se

whalabi · on Aug 11, 2020

That's the first I've heard of that side of the story, and I'm pretty appalled because everyone I know believes "Bing scraped Google" - which I think was how Google framed it.

I wonder how common misunderstandings like this can be prevented or treated.

mcintyre1994 · on Aug 11, 2020

Thanks for the correction - I never really heard any followup on this at the time and definitely missed this explanation. Obviously the tracking is bad but it's definitely a different thing to the way it was presented as them actually crawling a competitor.

sidibe · on Aug 10, 2020

Recently one of the lyrics sites had a similar complaint that might be the same thing happening in this case (and maybe it was the same issue for Bing's case too).

If you specify that you don't want Google to take snippets, but a third party ignores the robots.txt or meta tags and has a permissive one of its own Google can just scrape it from there.

A missing detail here is what did the textbox for the fake celebrities' net worth on Google link to?

giarc · on Aug 10, 2020

I believe it was Genius that was using rare unicode characters in the lyrics that don't really show up when viewing the page. They were able to see that Google had those same characters in their lyric results and therefore Google was just scraping Genius.

sidibe · on Aug 10, 2020

Ah thanks, that's the case. There's definitely some wrongdoing there but it's murky who is responsible since Google actually got the lyrics from LyricFind who stole it from Genius.

giarc · on Aug 12, 2020

Ironically this news came out today.

https://techcrunch.com/2020/08/11/court-dismisses-genius-law...

jefftk · on Aug 10, 2020

Playing around with real celebrities:

* https://www.google.com/search?q=larry+david+net+worth -> featured snippet from Wikipedia, showing an estimate

* https://www.google.com/search?q=bill+gates+net+worth -> database style response, showing answer with no link

throwaway189262 · on Aug 10, 2020

And yet I constantly get CAPTCHA's just from blocking tracking cookies.

The biggest scraper of them all goes to incredible lengths to prevent scraping. How ironic.

I hope the US govt demolishes these monopolies. It's not just in web either, the closest historical precedent I can think of is the Robber Barons of the 30's

ehnto · on Aug 10, 2020

It's them protecting their technology moat I figure, not so ironic, more of a hoarder protecting their hoard.

Google234 · on Aug 10, 2020

There’s a thing called a robot.txt.

xondono · on Aug 10, 2020

Yes, and google has been observed several times to ignore the robots.txt file

Google234 · on Aug 10, 2020

No, this is true in general. Maybe there’s been some bugs (I would be surprised, can you link these multiple examples?) but Google respects robot.txt files

pkilgore · on Aug 10, 2020

Even assuming Google would not ignore that for a popular enough site (they would know!), there is no robot.txt setting that says "index my site, but please don't steal my content, thanks."

throwaway189262 · on Aug 11, 2020

Yeah then they won't index you at all and you lose all your traffic instantly. How about a setting to index my site without scaping my content and using it all over the place without attribution

alex_young · on Aug 10, 2020

I searched for a couple of celebrities, and in one case celebritynetworth.com is the top result, in the other it is the third result.

Both queries show info boxes with the estimated net worth, both cite other pages (no link to celebritynetworth.com), and one showed another number than celebritynetworth.com.

celebritynetworth.com has a robots.txt. This file has no restriction on Google or any other bot.

Each celebrity page seems to have the following meta tag:

  <meta name=googlebot content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1">

Given the above, it seems like Google is correctly listing celebritynetworth.com high in it's rankings, and using other resources for the info box, which is what celebritynetworth.com is asking them to do.

While it's hard to speculate about what happened in the past, it looks like Google is doing the correct thing now, and that this information isn't actually so novel in general.

tomduncalf · on Aug 10, 2020

"The Featured Snippet also incorporated images of the celebrity scraped from the web to create a widget that took up 40% of a desktop result page and 80% of a mobile result page. This is still how Google displays most net worth results today"

FWIW I tried "larry david net worth" and "craig david net worth" on Google from the UK and didn't get this result - Larry David was a snippet from Wikipedia, clearly attributed, and Craig David was from CNW, again clearly attributed - basically an oversize search result with the relevant bit highlighted. Not sure if this is a recent change in response to this or not

gundmc · on Aug 10, 2020

Out of curiosity, I tried searching a number of "$X Net Worth Queries" to see how it looks today. Snoop Dogg, Mike Trout, Will Smith.

For each I got an infobox sourced from Wikipedia or a news article and Celebrity Net Worth was somewhere in the top 1-3 organic results.

philprx · on Aug 10, 2020

When you consider the impact/cost of an anti-trust case vs. the ease for Google to change rankings so that the evidence site (CNW) is back in the top 3 results...

Of course you would do that if you were Google... And here there's nothing illegal, it would just defuse more journalistic bombs for Google. Smart move.

jarym · on Aug 10, 2020

I only see one solution: Make Google split the search business out so that it only provides search and nothing else. That was Google's stated 'point' of their own site and they should be held to it.

They are like a bad rash, just ripping off from businesses of all sizes and relying on their dominance in search and financial muscle to get away with it.

Siphoning off search into its own unit would defang Google's ability to use their search engine dominance to upend others business models and would also reduce the incentives to do such shady things as well.

mikece · on Aug 10, 2020

Another solution is to make Google pay for the estimated damages caused to the website (I think it's reasonable to extrapolate growth for a few more years) as well as a massive punitive damage as well because of the callousness and deliberate manner in which they contacted the site owners AND TOLD HIM THEY WOULD TAKE WHAT THEY WANT. I hope Google is forced to pay CNW no less than one billion dollars for this obvious abuse of their position and theft of their content.

redm · on Aug 10, 2020

I think this quote is the most interesting in the testimony:

In June 2019, search engine analyst Rand Fishkin put together a report about Google using data from web analytics firm Jumpshot. The data show that today an estimated 48.96% of all Google searches end with the searcher NOT clicking through to a website. The same report estimates that 7% of all search clicks go to a paid ad result and 12% go to properties owned by Google’s parent company Alphabet. Moreover, those stats do not even show the full extent of the problem because the data largely relied upon desktop devices and could not track searches that took users to a Google-owned app like the YouTube or Google Maps.

praveen9920 · on Aug 10, 2020

I'm not sure if this is ethical and legal problem. But one thing for sure is Google has too much leverage both technically and financially.

Google has slowly increased the reliability on Google for consumers everywhere that they are essentially a monopoly which is bad for the market. It is about time a new player comes into picture with same user interface and different business model which can empower these internet entrepreneurs.

techntoke · on Aug 10, 2020

The same could be said for Microsoft, Amazon, Apple, etc. At least Google open sources their operating system, browser and are mostly responsible for Kubernetes. If people are going to have such high expectations of one company why do they not equally apply the judgement.

throwaway2048 · on Aug 10, 2020

There is also similar problems with all those companies too, not sure why you think that is some kind of defence of Google.

techntoke · on Aug 10, 2020

Because those companies are ignored while everyone jumps aboard the Google hate train, and if anything is going to be done to fix it then those changes need to be applied equally to all companies at the same time. Otherwise it does more harm than good.

throwaway2048 · on Aug 10, 2020

People complain about the actions of all those companies quite regularly, Google isn't a victim here.

tyingq · on Aug 10, 2020

Doesn't this behavior eventually hurt Google? If there's no incentive for third parties to collate, verify, and organize facts...eventually there isn't anyone to scrape. And the facts go stale.

pwinnski · on Aug 10, 2020

Whether it will hurt them before the heat-death of the universe is an open question.

On an internet populated entirely by 100 people, it wouldn't take long for shady practices to hurt shady companies. On the internet we have, in which there seems to be an unending supply of people falling over themselves in their eagerness to see their content--any content--displayed on Google's pages for free, the "eventually" in your sentence is doing quite a lot of work.

pkilgore · on Aug 10, 2020

What's funny, is that it hurts the consumer.

Which (for worse) has been the reason to roll back all the antitrust laws that would have stopped Google from ever becoming powerful enough to do this in the first place.

Google of course argues "faster results" are good for the consumer, but if you strip away the incentives for publishing information in the first place, you quickly end up with "bad results" then "no results."

remus · on Aug 10, 2020

It would depend on how the websites google are collecting data from operate. For example, somewhere like wikipedia isn't harmed through google's re-use of the facts they collate.

ubercow13 · on Aug 10, 2020

They are if they have less visitors to see their donation requests, stop making enough money and shut down.

tyingq · on Aug 10, 2020

Wikipedia's traffic from Google is certainly down. https://www.similarweb.com/corp/blog/how-wikipedia-lost-3-bi...

pkilgore · on Aug 10, 2020

Wikipedia is directly harmed, in the most definite way, by not being able to collect donations from the people that rely on the information it collects.

klyrs · on Aug 10, 2020

Once it's too stale, they'll just kill the product. If somebody fills the niche in the future, they can just buy them or scrape them.

tehabe · on Aug 10, 2020

An odd witness for the very true issues people have with Google Search. Google is crediting all websites it uses information from. It might be sad for such a site but I wonder how the creator thought that this is sustainable. Sites with biographical data for famous people already existed and for them adding net worth information is easy. So if not Google had kill them, they might have being killed by IMDb oder Wikipedia.

ehnto · on Aug 10, 2020

According to the document, the net worth was unique researched data that is still yet to be accurately replicated. In other words, the company created the data, they did not simply display it. If they were sunk by IMDb copying their data then the data source goes away too, unless IMDb invests in a research team.

What happens when CNW goes offline, how will Google replace the now missing data? Remembering that celebrities constantly change net worth and new ones show up every week, so it's a moving target that someone needs to chase.

tareqak · on Aug 10, 2020

Crediting the source while providing the information has the same benefit for the user accessing that information, but not the same benefit for website presenting that information. The website loses on whatever benefit a visiting user brings in e.g. advertising revenue, lead-in to a subscription / patronage, a plea for a donation etc.

I still remain very much against ads, but I have a more sympathetic take to financing journalism as a result of reading this account. Perhaps just having the headline be free, but having the rest be behind a paywall is a better middle ground?

dustingetz · on Aug 10, 2020

Surely they can find a stronger example than a spammy SEO business that didn't exist before google, couldn't exist without and doesn't exist after? Google sucks now but not finding this compelling

calcifer · on Aug 10, 2020

Why would the nature of the business be relevant to a House committee hearing? CNW is not on trial here.

pkilgore · on Aug 10, 2020

Why does it matter? Do some people not get the protection of laws? Do some people not get representation before elected bodies? Who decides?

treszkai · on Aug 10, 2020

Information without DRM can easily become a public good: it can be copied without limits and a loss in quality. Once it's out in the open, people will access it easier, but that also makes it near-impossible for the information-owner to make a living. At the beginning, Google was beneficial to both the information-seekers and the information-owners because it connected them (while showing ads in the search results). With the "featured snippet" it increased the user experience by a little (and multiply this with the user count), while depriving the original owner of information from its entire revenue. (Much like how Spotify made music cheaper for everyone, at the expense of artists.) The nice way to solve this would be if Google gave a cut to CelebrityNetWorth, as Spotify gives to artists. (While Google would bring CNW more requests this way, their price would also be depressed; Spotify doesn't pave the way to the riches. [1])

[1]: https://www.digitalmusicnews.com/2018/04/12/streaming-music-...

manish_gill · on Aug 10, 2020

I have Zero respect for the Google engineers and PMs who worked on this project, scraping content without attributing to the original source, and subsequent change in the search ranking of the original source.

A very bad taste in the mouth indeed.

dimsumsumdim · on Aug 10, 2020

I am not trying to defend Google or its practices. But how is the activity of CNW researchers different from what is Google doing? Unless the author elaborates on their method, "Educated guess" sounds like scrapping content without sourcing the author in the first place.

mrtksn · on Aug 10, 2020

AFAIK there's no other database that they scrapped from. Reading sources like news articles, documents etc. is very different from copying the database that someone else created.

It's like the difference of tapping Yelp's API and create a competing service and looking at Yelp, Google, Squirespace etc. to create a restorant recommendations website.

dimsumsumdim · on Aug 10, 2020

Humans can also scrap. If CNW employs researchers, they must be using sources. These sources are not referenced in their content. Either CNW is merely making up the net worth figures, or their researchers are pulling data from somewhere to provide information about 25,000+ celebrities. If the second is valid, they aren't that different from what Google is doing.

mrtksn · on Aug 10, 2020

I answered your question about how CNW researchers are different. Using sources to create a knowledge base for something is very different than hooking up to a knowledge base that was already curated and displaying it as your own without compensating the people who curated it.

Yes, sure. It would be nice of them to disclose their own sources too but I fail to see how this is related to Google straight out copying their curated database and showing it to its own users.

Like, why are we discussing it? What is the agenda here? Are we trying to conclude that CNW owners are also not that nice? Are you claiming that CNW copied your own research and demand justice too? If you explain what is your point here, maybe we can have a productive discussion instead of repeating the same stuff.

sidibe · on Aug 10, 2020

Understable not to include the sources after that email exchange because if they source the articles on their site Google could just cite those articles. What isnt clear to me is where the fake ones were attributed to in the info box. For the real celebrities it seems like they're mostly from Wikipedia which cites articles (who maybe look it up from CNW?)

arh68 · on Aug 11, 2020

Indeed. When you're in the business of taking IP from small companies and calling it yours, canaries start to choke. It's like making a pink version of anything: you're done, you're out of ideas, you have ceased to innovate.

switch11 · on Aug 10, 2020

agreed 100%

We went through something very similar

where Google 'disappeared' an entire category of apps from the App Store because one of our competitor/friend companies in the same area as us was growing very fast

That guy started talking about adding thousands of new users a day. Within 2 weeks Google had removed ALL (every single one) apps in that category

Also this is a search vertical. So it's the very definition of anti trust

*

anyways, it's good that people are talking about this

fakedang · on Aug 11, 2020

Which category, if I may ask?

errantmind · on Aug 10, 2020

It looks more like an ethical argument than a legal one. I don't understand how this data would be protected under US law. Can someone explain if you think otherwise?

Is the issue that the data was scraped? Or is the issue Google appropriated the scraped data for use in their search results?

From my perspective it just looks like a bad business model: spend a lot of time and effort estimating net worth, and then publish it on the open web. Publicly available data can be freely and legally scraped (and I wouldn't want it any other way): https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-l...

bachmeier · on Aug 10, 2020

> It looks more like an ethical argument than a legal one. I don't understand how this data would be protected under US law. Can someone explain if you think otherwise?

That question kind of misses the point. This was a US House hearing. The right question, in the context of such a hearing, is whether it should be legal. The other thing to keep in mind is that this was the action of a company with considerable market power. There's not much that's off limits in principle once it turns into an antitrust case.

Having said that:

> it just looks like a bad business model

seems about right to me. No other companies had done this, and that might be because of the difficulty of monetizing numbers. Few people are going to pay for the information, but it's ultimately just a number. Consumers are better off the faster they get the number they're looking for.

mike_d · on Aug 10, 2020

This has been settled case law for decades thanks to sports almanacs. Facts (like someones net worth) cannot be copyrighted. However a compilation of facts can be.

If CNW had published a book, Google purchased it off of Amazon, and paid an intern to retype it in to fact boxes - that is clear copyright violation.

The fact that Google bought the data from a third party data broker who scraped CNW does not absolve them of liability in my mind.

piscisaureus · on Aug 10, 2020

> Facts (like someones net worth) cannot be copyrighted. However a compilation of facts can be.

... in the US.

In the EU and UK there is separate copyright-like legislation that covers this, so-called "database rights".

https://en.m.wikipedia.org/wiki/Database_right

mthoms · on Aug 10, 2020

I'm not sure the data in question here would qualify as "facts" like a batting average or win percentage would. These are researched estimates. Different outlets would come up with different numbers depending on the researchers and techniques they employ. Ergo it requires a degree of skill and creativity.

But as you say, the point is moot. The collection itself should certainly be covered by copyright whether its contents are a mere collection of "facts" or not.

JimDabell · on Aug 10, 2020

> Facts (like someones net worth) cannot be copyrighted. However a compilation of facts can be.

It’s my understanding that this ceased to be true in the USA in 1991 and that a minimal degree of creativity is required [0]. Other countries have started following suit more recently.

[0] https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....

arcticfox · on Aug 10, 2020

IMO there's a huge difference between a list of names and numbers generated by a research team estimating someone's net worth, and a phone book.

One is a list of informed estimates, the other is a list of (essentially) facts.

Informed guesses are IMO essentially much longer pieces of creative content, just condensed into a number.

errantmind · on Aug 10, 2020

What you are saying about compilations makes sense, but was it a compilation in that sense? My interpretation was it was just a bunch of data spread over public pages on a website. It seems like if that constitutes a compilation, it would expand copyright by a lot

mthoms · on Aug 10, 2020

The case you referenced says scraping is legal. It does not say re-publishing the scraped data is legal.

>I don't understand how this data would be protected under US law. Can someone explain if you think otherwise?

For something to be copyright-able it needs to have an element of creativity. That's why you can't copyright facts. Arguably, the net worth estimates require a degree of creativity to come up with. They aren't "facts", they are estimations based on research.

arh68 · on Aug 11, 2020

It is not public nor commodity information.

> We generated content that could not be found anywhere else on the web and that had never existed previously in response to demand in the marketplace. The information we produce to this day is unique. It is not commodity information like the height of a famous building or the time of day.

To call it the open web makes it sound like the wild west, and it's not: are Google SERPs on the open web? Can I copy the search results for 1000 words and paste it on my website? No.

philprx · on Aug 10, 2020

It's the business model of many online blogs and sites, and get payment through ads. Of course it's not a really strong model now that ads revenue are going down. It was a strong one not so long ago.

Now it's clearly IPR theft, present it how you like, but theft is theft.

Plus it's not really smart from Google given their prior contact and request to this company on APIs and providing it for free.

Now the real question: Is Google going to go away with it blaming it on "rogue employee" and continuing on dispensing huge sum of cash and free advisory to gov and internet NGOs in their constant advocacy effort...

errantmind · on Aug 10, 2020

>> Now it's clearly IPR theft, present it how you like, but theft is theft.

I don't quite see the legal basis for it being theft. Can you explain?

philprx · on Aug 10, 2020

IANAL:

1. Google doesn't show in-line'd data box with Net worth

2. Google asks site (CNW) for access, is denied.

3. Google then shows site's data in data box (with CNW site proof of theft with canary tokens being the fake celebrity's net worth) that are now presented as Google's data (no source or fake source), without license from original site.

If that isn't theft of data, what is it?

From the PDFs, there is no license for free re-use ;-)

While I'm all-in for freedom of information and open source documentation license or open data licenses, I fully understand the need of businesses to protect their intellectual property and rights.

teddyh · on Aug 10, 2020

You keep calling it “theft”. It is obviously not literally theft, since no physical object has been taken. The only thing you could possibly mean is “copyright infringement” (which is commonly called theft in imprecise and inflammatory language), but copyright does not apply to a simple fact like a dollar amount, no matter how much research lies behind it.

You might be of the opinion that it should, but that is a different argument, which you did not make.

philprx · on Aug 10, 2020

Ok, sorry sir ;-) Copyright infringement. What's the difference really here?

IANAL means this precisely (and on a side note, I'm here for open exchange and debate, not for claiming to be right ;-)

Interestingly, depending on jurisdictions, there are places on the globe (EU, FR) where I think the terminology is theft not copyright infringement ("theft of database"), even if it does not apply to this US National dispute. Now let's see how this goes with the US gov.

teddyh · on Aug 10, 2020

> Copyright infringement. What's the difference really here?

Upthread, the user “errantmind” asked you to provide your legal basis for it being “theft”. It’s obviously not legally speaking theft, since no physical object has been taken. But let’s be a bit charitable and assume that you meant “copyright infringement” instead. However, you have not made the case for it being copyright infringement either, since Google’s copying (which you do describe in detail) did not, AFAIK, infringe copyright (at least in the US, which is where the investigation referred to in the document is being held).

If you want to argue that what Google did is illegal, then please specify what actual law was broken.

On the other hand, if you instead want to argue that what Google did should be illegal, then please, go ahead and present your case, but this was not the original topic.

thedevilslawyer · on Aug 10, 2020

There is a significant difference, and an entire activism movement in ensuring this is recognized. Thanks for calling out (and I hope realizing) the difference.

lalaland1125 · on Aug 10, 2020

This isn't theft of data or copyright infringement because you can't copyright facts in the US.

mthoms · on Aug 10, 2020

These aren't facts though. They are researched estimates. Different researchers will come up with different numbers.

lalaland1125 · on Aug 10, 2020

All facts are well researched estimates. Our number for the speed of light didn't come out of the ether: someone came up with a very sophisticated experiment to get a highly accurate estimate and different experiments get different estimates. Do you think the speed of light should be copyrightable?

mthoms · on Aug 10, 2020

The speed of light is absolutely not an estimate. It's a mathematically provable fact.

Facts (in this context) are numbers that are not reasonably disputable. The value of pi, the speed of light and my current phone number are all facts.

What I happen to think the net worth of someone else is, is certainly not a fact. You might think it's something else. It would be difficult to prove either one of us right.

lalaland1125 · on Aug 10, 2020

If you don't believe that the speed of light is a physically measured quantity, then replace that question with whatever physical constant you prefer (charge of an electron, gravitational constant, current mass of the earth, etc, etc). Do you think those numbers should be copyrightable? The charge of the electron is particularly interesting as there was actually a lot of debate over that number because some of the original experiments were flawed.

mthoms · on Aug 10, 2020

>If you don't believe that the speed of light is a physically measured quantity, then replace that question with whatever physical constant you prefer

Wait, what? You asserted the speed of light could only be estimated. Not me. A "physical constant" is, by definition, a fact and not copyrightable.

You're now arguing the opposite point you were arguing two comments ago ("all facts are well-researched estimates"). What are you trying to do?

tripletao · on Aug 10, 2020

Your distinction between "measure" and "estimate" seems meaningless to me, and probably to most scientists and engineers. All physical measurements have uncertainty, and many people are engaged as we speak in creative (in the colloquial sense of the word) work seeking to reduce that uncertainty. We tend to say "measurement" when the uncertainty is negligibly small for whatever purpose is at hand, and "estimate" when it's not; but that distinction is as much in how the number will be used as in the number itself.

As to specific examples, the speed of light and charge on an electron are tricky because they're both exact by definition in SI today; but considerable creativity was required to get those definitions. The mass of the Earth is uncertain in kg today, and creative work is ongoing to decrease that uncertainty. So does that make a new estimate of the mass of the Earth copyrightable? Perhaps just the last, previously-unknown digits would be copyrightable?

I'm not aware of case law on anything too closely resembling physical constants; but I'm near-certain the answer if litigated would be no, and I'd guess the argument would be something like "you're uncovering some preexisting truth, not creating something new (no matter how 'creative' your approach to the uncovering)". Returning to the original question, the CNW guy presumably got competent legal advice and concluded the copyright angle wasn't worth pursuing, and instead focused on the antitrust angle in his testimony.

mthoms · on Aug 10, 2020

>So does that make a new estimate of the mass of the Earth copyrightable?

Likely not on its own (due to fair use). But in a "collection" of estimates for the mass of all known celestial bodies then yes, it absolutely would be copyrightable. The same way a collection of stock price targets produced by a research company are copyrightable.

I agree with the rest of your comment in general but it's relevance to the discussion is questionable. In court, the speed of light would certainly be "a fact" unless it was expressed to a precision beyond the margin of error, and that margin of error was relevant to the copyright claim. That's getting way off track from the original point since we'd be talking about a singular number instead of a collection of numbers.

Back to the original discussion: A collection of facts is not copyright-able. The poster said the collection of wealth estimates was factual and therefore not copyright-able. That's provably false since another, equally skilled researcher would come up with different numbers and they can't both be factual. Right?

>the CNW guy presumably got competent legal advice and concluded the copyright angle wasn't worth pursuing

You're trying to imply that this means he had no legal basis for his claim. I find that very disingenuous. Besides, there's no need to speculate on why he didn't pursue legal action, the reasons are clearly outlined in the write up.

Edit: I see now why the "speed of light" example is not a good one since the meter is derived from the speed of light. Perhaps pi is the best example of a simple, agreed upon, physical constant or "fact" for the purpose of this debate.

tripletao · on Aug 10, 2020

Is there case law on the price targets? That one seems yet hazier to me, given the inherent uncertainty of predictions vs. (still uncertain) estimates of current truths. I guess in a deterministic universe they're the same thing, but that seems a bit too philosophical for most lawyers' taste.

In any case, large companies with near-infinite legal budgets still don't get to (openly) violate well-settled law, since judges do see what's happening and seek to minimize the burden on their opponents--and even award legal fees where possible, which it probably would be here under the Copyright Act. Their advantage comes when the law is at least slightly unclear and a victory would be worth a lot more dollars to the large company (for the precedent) than to their opponent. I believe that's the situation here, and the reason why CNW chose not to pursue Google for copyright infringement; though even with matched legal teams, I'd still bet that CNW would lose.

Finally, if that mass of the Earth isn't copyrightable, then no concept of fair use exists, because that concept exists only for copyrighted material; so I don't think your statement there makes sense. If a copyright for the CNW dataset did exist (which I doubt, but maybe), then Google could still argue fair use. I'd bet Google would lose that one, since they're using the whole work, without transformation of the original, in a way that destroys the commercial value of the original.

philwelch · on Aug 10, 2020

“Researched estimates” that in CNW’s case are virtually fictitious. To me, the scandal is Google enabling and boosting this SEO spam in the first place.

mthoms · on Aug 10, 2020

I'm working on the assumption that the testimony provided to congress is truthful.

The research may well be fictitious, but there is no evidence of that.

teddyh · on Aug 10, 2020

Is the fact “CNW has estimated that celebrity A has a net worth of $X” copyrightable? I sincerely doubt it.

bryanrasmussen · on Aug 10, 2020

actually this is interesting right here, because if Google takes CNW's fake celebrity, then that is definite copyright infringement - the fake celebrity was created by CNW and as such is not a fact so Google can't argue that it's not copyrighted because facts are not copyrighted.

lalaland1125 · on Aug 10, 2020

That's not how the courts view these issues. Fake facts hidden within real facts have been litigated many times and has always been shown to be uncopyrightable. See https://en.m.wikipedia.org/wiki/Trap_street#Legal_issues.

bryanrasmussen · on Aug 10, 2020

You are correct, about Trap Streets. I'm not sure if the language is fake facts hidden in real facts, or fake streets in maps. Because it seems pretty specific to trap steets.

If the language is indeed not general beyond trap streets - if I create a fake celebrity and data regarding them, I would argue that is more like making a story. A trap street is a small line on the map (generally hidden somewhere unimportant) and a couple words. A fake celebrity requires much more content.

From the PDF "I added fivecompletely conjured celebrities to the site. I used stock photos with entirely made up names, biographies and net worth numbers. I published the pages backdated several yearsso they were nearly invisible to the world."

Now, let's look at a real celebrity on their site https://www.celebritynetworth.com/richest-celebrities/actors...

It seems reasonable to assume from the pdf that they put the kind of effort into an article on the fake celebrities to make them indistinguishable from real celebrities, it may be decided that this is the same as a trap street. But I don't think it is just a foregone conclusion.

Aside from all that - https://www.theguardian.com/uk/2001/mar/06/andrewclark they might be able to sue in some other place than the U.S and find a more sympathetic court.

dunnock · on Aug 10, 2020

You would not expect of such big business to behave in a different way. What we consumers can do is to try to use alternative services where we can, e.g. I am happy with using Firefox instead of Chrome

jtxx · on Aug 10, 2020

same + duckduckgo & protonmail

not2b · on Aug 10, 2020

Many commenters are interpreting this as strictly a matter of copyright violation, and are asking the question of whether facts can be copyrighted. But that's too narrow a way of looking at this; there's also antitrust law. If Google scrapes everything interesting off of other sites and presents it as their own, even if it finds a way to do this that does not violate copyright, it's anti-competitive behavior that may violate antitrust law.

(I'm not addressing the European database protection rules which may also apply).

rjkennedy98 · on Aug 11, 2020

> Earlier this year, within weeks of the publication of a Wired magazine article that included quotes from me and a recap of our story, the CNW mobile app was banned from the Google Play store without explanation or recourse

A lot of talk here about the scrapping and capturing of the ad revenue, but I thought this was the most scary part. It’s one thing to capture ad revenue from SEO companies, it’s another thing to act like the mob and start taking retribution against anyone that speaks out against you.

Wheaties466 · on Aug 11, 2020

I think thats the point of a monopoly. They have such a large overarching reach that they don't need to respond in a direct way that could be proved to be retribution. They respond in another avenue that affects your business in another platform they control, in ways they don't have to explain.

iJohnDoe · on Aug 10, 2020

I think many would be surprised at the trivial and childish changes that Google does to hurt web sites.

Many don’t speak out for fear of further being punished by Google.

dmcbrayer · on Aug 10, 2020

There's evidence of Google engaging in similar behavior with regard to song lyrics on Genius.com: https://www.rollingstone.com/music/music-news/genius-google-...

I think the points about people voluntarily building their businesses on Google's platform, and being thus beholden to Google's whims are well-made. The law, as it exists, doesn't seem to prevent that kind of thing since they're all private businesses, and this behavior benefits the consumer (arguably) even if it does hurt other businesses. Antitrust law (in the US) seems ill-prepared for the 21st century.

I tend to get interested in the policy concerns. Is web search so important that it should be considered a public utility, and thus regulated by the government? I think there's a case to be made there.

DSingularity · on Aug 11, 2020

Wow! Holy smokes. What a terrible company.

Oddly enough, as their practices may destroy the internet perhaps they will eventually destroy their own livelihood.

Speaking for myself, I no longer search for product reviews and comparisons. Why waste my time when all the content is fake? Probably fake due to the Google search ranking tweaks. Now I go straight to the sellers.

What is sad in all this is all the great people that work for such shitty companies and what that implies regarding the state of our morality as computer scientists and engineers.

dave_aiello · on Aug 11, 2020

The impression I got from reading comments here is that CelebrityNetWorth.com isn't much more than a compilation of net worth figures in a database. I don't think this is a fair analysis of the holistic value of that website.

For instance, when I look at the analysis of Tim Cook, https://www.celebritynetworth.com/richest-businessmen/ceos/t..., who's in the news this week partly because Bloomberg reported that his net worth has reached $1 billion, https://www.bloomberg.com/news/articles/2020-08-10/apple-s-c..., I see the value of CNW's news articles. The CNW news article has a similar value to the analysis that's present in the Bloomberg article reporting on the same thing.

In terms of how much of the intrinsic value of CelebrityNetWorth's website Google should be allowed to report directly in its search results, I'd say it would be fair to report CelebrityNetWorth's estimate of a celebrity's net worth so long as Google clearly attributed the data to the site that is their basis, and provided a link to the original information.

But in my opinion, the real value of CelebrityNetWorth.com in 2020 is the curated biographical content about the celebrities and their analytical pieces presented in context with their net worth estimates.

havelhovel · on Aug 10, 2020

People are saying this was a bad business model, but looking at the numbers it clearly wasn’t—-unless we’re also acknowledging that the bad part of the model was relying on Google to stay true to its stated policies.