Git-like version control requires a Merkle DAG. Unless you know something I don't, there are no RDBMS products that incorporate a Merkle DAG for storage. Dolt is the first.
Table data is stored in a cross between a Merkle DAG and a B Tree (a prolly tree), which is what makes diff / merge performant and scalable. We didn't invent these data structures but we believe we are the first to build a SQL database on them.
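To make the "Merkle" part concrete: each node in such a tree is addressed by a hash of its contents, so identical subtrees get identical addresses and a diff can skip shared subtrees wholesale. A minimal sketch of that property (illustrative only; the serialization and hash choice here are assumptions, not Dolt's actual storage format):

```python
import hashlib

def node_hash(entries):
    """Content address of a node: a hash over its serialized entries.
    Identical subtrees hash identically, so diff/merge can skip them."""
    h = hashlib.sha256()
    for e in entries:
        h.update(repr(e).encode())
    return h.hexdigest()

# Two nodes holding the same rows share an address; a one-row change
# produces a different address, which is what diff detects.
a = node_hash([(1, "alice"), (2, "bob")])
b = node_hash([(1, "alice"), (2, "bob")])
c = node_hash([(1, "alice"), (2, "carol")])
assert a == b and a != c
```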
This is false; you are conflating the abstract algorithm with a narrow implementation. That's like saying the only possible sorting algorithm is quicksort.
With all due respect, you seem to be only loosely familiar with database architecture, both theory and historical practice. Nothing you've described is actually novel. That you are unfamiliar with why no one builds things this way, despite many attempts, does not lend confidence.
I am actually a big fan of people trying unorthodox approaches to databases that have never been tried before; this just isn't such an example. Which doesn't make your approach wrong per se, but it leaves you exposed to learning why other people tried and abandoned it.
Tangentially, the "prolly tree" is intrinsically not a scalable data structure. That may satisfy your design requirements but I can't tell.
I agree with everything you have said in this thread. The one area where I would take a slightly different approach: some of the best algorithmic advances do come from people and teams looking at a problem anew -- without examining all previous work and failures.
Are they likely to find a way to jump forward? No. Are they likely to fall into the same traps and dead ends that are well documented in this space? Yep. But in general I believe we should temper the negative feedback with constructive details:
Hey, have you seen this previous, very similar work in the same space? Here is some info on where and how they failed to scale.
Taking a stance (even an implied one) that this is not novel and therefore you should stop could have been made against many of the teams working in vast areas of algorithms and architectures that have led to jumps forward. CS is, and should be, a science; to that end, our current knowledge and understanding is simply what is known today and must remain mutable to move forward. While their current approach is not scalable and a dead end, who knows whether this experience will lead them to identify previously unrealized solutions (even partial ones).
> you are conflating the abstract algorithm with a narrow implementation
They're literally telling you Git uses a Merkle DAG and they wanted to recreate Git so they used a Merkle DAG. That's not conflating, it's copying.
> you seem to be only loosely familiar with database architecture
Based on what? The only comment GP has made about database architecture in this entire HN thread is "this is not MVCC", which is correct.
> I am actually a big fan of people trying unorthodox approaches to databases that have never been tried before
So stop discouraging the neophyte? Sometimes innovation requires the ignorant to figure things out without knowing better, because that way they won't quit before they've begun. Let them figure it out. And if it's not novel, who cares? It doesn't have to be some perfect architectural masterpiece to solve people's problems. If it works well enough just for WordPress blogs, that's pretty great already.
Sigh. Databases seem to be one of the last bastions of ivory-tower-ism in software engineering.
Databases aren't magic. They are software. And as you say, there is a deep well of literature, and almost all interesting modern databases are open source. So it's possible for anyone with the interest and motivation to learn how they work, and yes, improve them.
> With all due respect, you seem to be only loosely familiar with database architecture, both theory and historical practice. Nothing you've described is actually novel. That you are unfamiliar with why no one builds things this way, despite many attempts, does not lend confidence.
You claim that there is all this information out there saying why this won't work, and that it has been tried, but don't point to any of it.
> Tangentially, the "prolly tree" is intrinsically not a scalable data structure.
First: Dolt publishes industry-standard performance benchmarks, and they are an average of 4.4x slower than MySQL:
This is using the original storage format from Noms, which wasn't written for OLTP workloads. A new storage format, designed for OLTP, is coming and will dramatically improve this:
In any case, ~5x slower already shows, experimentally, that this approach works for database-style problems.
===
Second: In a purely algorithmic sense, prolly trees are the definition of scalable. They are O(log n) (with a large branching factor, hence a large log base) for inserts, deletes, and seeks, and they support efficient ordered scans. They have basically the same algorithmic complexity as B-trees, B+ trees, LSM trees, etc. -- very similar properties to the data structures used by other databases. The problem with prolly trees is actually the reverse: they are scalable, but have large constant factors due to the overhead of hashing.
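Both the hashing overhead and the structural sharing fall out of content-defined chunking: node boundaries are chosen by a hash of the data itself, so the tree shape depends only on the contents, not on the order of edits, and a single-row change perturbs only the node(s) around it. A toy sketch of that idea (the boundary rule, target size, and serialization here are invented for illustration, not taken from Dolt):

```python
import hashlib

def is_boundary(item_bytes, target_size=4):
    # Probabilistic boundary: low bits of a hash decide where a node
    # ends, so the split points depend only on content. target_size=4
    # gives tiny nodes for the demo; real systems target ~4KB.
    digest = hashlib.sha256(item_bytes).digest()
    return digest[-1] % target_size == 0

def chunk(items):
    """Split a sorted sequence of rows into nodes at content-defined
    boundaries. Every row is hashed -- that's the constant-factor cost."""
    nodes, current = [], []
    for item in items:
        current.append(item)
        if is_boundary(repr(item).encode()):
            nodes.append(tuple(current))
            current = []
    if current:
        nodes.append(tuple(current))
    return nodes

rows = [(i, "row%d" % i) for i in range(100)]
before = chunk(rows)

# Edit one row: only the node containing it (and at most one neighbor,
# if the boundary shifts) changes. Everything else is shared.
after_rows = list(rows)
after_rows[50] = (50, "edited")
after = chunk(after_rows)
assert len(set(before) - set(after)) <= 2
```

That locality is what makes diff cost proportional to the size of the change rather than the size of the table.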
But single-digit-constant-factor slower performance than MySQL, with versioning on top, seems like a great product for many applications.
If anyone reading this is interested, the Dolt team did a great job writing up how prolly trees work and how they compare to classic databases here:
Tip: I'd read the jandrewrogers comments before engaging. He actually does know more than you. He makes some unusual claims then doesn't back them up, because he has no need to. I've spent an hour or three and learnt nothing because he's so far ahead I can't keep up.
Perhaps read his profile about the amazing thing he's invented such as http://www.jandrewrogers.com/2015/10/08/spacecurve/
Let me quote a bit. I admit, I don't understand it but then how could I?
> Algorithm design using topology manipulation can be enormously challenging to reason about. You are often taking a conceptually simple algorithm, like a nested loop or hash join, and replacing it with a much more efficient algorithm involving the non-trivial manipulation of complex high-dimensionality constraint spaces that effect the same result. Routinely reasoning about complex object relationships in greater than three dimensions, and constructing correct parallel algorithms that exploit them, becomes easier but never easy.
Incredible! I wish I could grok this stuff.
You should save your breath and just get out of the way of the real experts.
> Table data is stored in a cross between a Merkle DAG and a B Tree (a prolly tree), which is what makes diff / merge performant and scalable. We didn't invent these data structures but we believe we are the first to build a SQL database on them.
https://docs.dolthub.com/architecture/storage-engine/prolly-...