If you read my comment carefully, I’m referring to "word2vec and co", meaning similar projects from other sources that use the same approach, and for which similarly large pre-trained corpora exist.

My general criticism is that it’s easy to invent a new method, or to popularize an existing method that’s superior. But that’s not the hard task. Building the algorithm is pretty easy.

Getting a corpus of colloquial, professional, and slang texts in dozens of languages, and enough computational power to actually train on all of it, is usually the hard part for those of us doing independent research or working on open source projects without major corporate sponsors. The reason we use word2vec and similar solutions is not the approach they use, but that they provide a complete package that works well.