Pretty much all these engines (and PG) are fast enough.
Gist/Gin indexes are great and do a fine job to make millions of records searchable in very few milliseconds.
The problem is accuracy.
I’ve tried a few of these and accuracy is wildly different with most of them.
Accuracy depends on how much you index.
For example, even with decently designed weighting: If you index title, subtitle, tags and content — too much of content ruin relevancy.
And yes, we have proper relevancy sorting setup nicely.
The best is something like elastic search, but it does not integrate nicely with PG for our use case. Because of the multi tenant nature of our data setup.
PG will get us to that magic 80% but that leaves the all important 20% which is not great.
Not to mention the inability to index and search Asian character sets.
So even though we exclusively use it, it’s not great and every time our team can’t find something that drives my team into the Psql CLI — I start searching for alternatives again.
The big brick wall is updating, inserting and deleting from an external solution fast enough so we don’t miss stuff.
And quickly searching multi tenant records in that external solution.
And no I’m not ready to use ES as my primary database.
Yup this is exactly the problem, which is why I'm surprised with such decent extension support and heavy use no body has created something that builds better analysis and query pipelines specifically for full text search on top of these databases to solve the relevancy issues.
That's really cool but that's not quite the solution to this problem that I would like.
I haven't used ZomboDB but I have managed plenty of applications where Postgres was the main db and elastic was used for FTS. Zombo looks like it makes it easier to do that type of setup, but Postgres is so high performance (at least at parity on speed w/ Elastic) and Elastic is such a pain in the ass to manage from a DevOps perspective that I'd like to eliminate the need for Elastic by investing some time into Postgres Extensions.
Plenty of great db solutions have come out of Postgres in the last few years based on extensions and fulltext search is one of the areas that has been very quiet, I think we can do better and I'd like to try.
I think in the interim Zombo looks like a really good stopgap though!
Here I'm listing engines based on https://github.com/quickwit-oss/tantivy - tantivy is comparable to Lucene in its scope - but I'm sure there are other engines that could tackle ElasticSearch.
Another thing that could happen is maybe directly embed tantivy in Postgres using an extension, perhaps this could be an option too.
Ah, I guess this term is overloaded. I am not familiar with the specific PG offerings, but more complex search engines are often based on indexing pairs of words (bigrams) as opposed to individual keywords. Pairs of words together are much more meaningful.
Gist/Gin indexes are great and do a fine job to make millions of records searchable in very few milliseconds.
The problem is accuracy.
I’ve tried a few of these and accuracy is wildly different with most of them.
Accuracy depends on how much you index.
For example, even with decently designed weighting: If you index title, subtitle, tags and content — too much of content ruin relevancy.
And yes, we have proper relevancy sorting setup nicely.
The best is something like elastic search, but it does not integrate nicely with PG for our use case. Because of the multi tenant nature of our data setup.
PG will get us to that magic 80% but that leaves the all important 20% which is not great.
Not to mention the inability to index and search Asian character sets.
So even though we exclusively use it, it’s not great and every time our team can’t find something that drives my team into the Psql CLI — I start searching for alternatives again.
The big brick wall is updating, inserting and deleting from an external solution fast enough so we don’t miss stuff.
And quickly searching multi tenant records in that external solution.
And no I’m not ready to use ES as my primary database.