Sharding Pgvector

alexgarcia-xyz · 2025-03-26T18:29:03 1743013743

I really dig more content about how vector databases/tools handle problems like this!

In sqlite-vec, there's only a flat brute-force index (though DiskANN/IVF will be coming soon). But we do have a concept of partition keys[0], which allows you to "internally shard" the vector index based on a user_id or any other value.

   create virtual table vec_documents using vec0(
      document_id integer primary key,
      user_id integer partition key,
      contents_embedding float[1024]
   )

Then at query time, any WHERE constraints on a partition key are pushed-down, so only matching vectors are searched instead of the entire index.

  select
    document_id,
    user_id,
    distance
  from vec_documents
  where contents_embedding match :query
    and k = 20
    and user_id = 123; -- only search documents from user 123

Definitely not as performant as a proper vector index, but a lot of real-world application have these natural groups anyway. Like "search only English documents" or "search entries in this workspace only", or even "search comments only from the past 30 days."

Internally sqlite-vec stores vectors in "chunks", so when partition keys are definds, every chunk is associated with a unique combination of all partition keys. Kinda hard to describe, but if you create a vec0 virtual table and insert some values, you can inspect the internal "shadow tables" in the SQLite database to see how it's all stored.

[0] https://alexgarcia.xyz/sqlite-vec/features/vec0.html#partiti...

ankitml · 2025-03-27T02:51:16 1743043876

Well, I have a brute force strategy for pgvector working reasonably well. Individual, partial indexes. It works for all those queries with category_id=<> clauses. You only need an index for larger categories, for categories with rows below a threshold you dont need index a KNN/dot product would work.

jeffchuber · 2025-03-26T19:06:21 1743015981

This will work very poorly when your data is changing because the centroids degrade and you'll have very poor recall but likely not know it unless you are also monitoring recall.

I didn't see this in the write-up, so adding it here as a common foot gun.

levkk · 2025-03-26T19:10:04 1743016204

Good call-out. The article mentions using a representative sample, but I should definitely put it somewhere on top.

redskyluan · 2025-03-26T18:59:32 1743015572

sharding is a bad solution for any databases, especially vector database. See https://milvus.io/blog/why-manual-sharding-is-a-bad-idea-for...

isoprophlex · 2025-03-26T19:47:01 1743018421

That's an ad, plain and simple, and as such can't be taken seriously.

> When we reached 100 million vector embeddings, query latency spiked to over a second, something far beyond what our customers would tolerate.

I don't know what else they were doing but 1000 msec latency for 100 million embeddings... that's definitely not on pgvector.

redskyluan · 2025-03-27T01:29:31 1743038971

What’s the largest deployment you’ve maintained in production?

If you’ve had the chance to manage one—especially while handling concurrent insert and search operations—you’ve probably felt the pain firsthand and understand just how critical scalability really is.

Insert into PGvector is super slow when data accumulate and the index build will take forever

ankitml · 2025-03-27T02:44:55 1743043495

Depends on the hardware, sharding layout, bulk/individual insert and table bloat. Looks like you are missing pg_repack..

cluckindan · 2025-03-26T21:50:38 1743025838

The article talks about manual sharding, not sharding in general.

charliereese · 2025-03-26T20:23:17 1743020597

Thanks for writing about this :)