Fast object diffing may seem like something a regular user rarely needs, but it's a very important part of handling vdom/vnode diffs. Recently I researched a few lesser-known web UI libraries for my own use and found that many of them do not diff vnodes correctly, e.g. for RegExp or Date arguments. I don't remember exactly which library had which flaw, but my conclusion was that almost all of the non-actively-maintained libs got it wrong in one way or another. If you want to explore one of them, check that first, because it's a surprising pain point.
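The usual trap is that strict equality tells you nothing useful for those built-ins, so a diff has to compare them by value; a rough sketch of the special-casing needed (not any particular library's code):

console.log(new Date(0) === new Date(0)); // false: two distinct objects
console.log(/ab/g === /ab/g);             // false as well
// so a correct diff compares them by value, e.g.:
const sameDate = (a, b) => a.getTime() === b.getTime();
const sameRegExp = (a, b) => a.source === b.source && a.flags === b.flags;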
But vnodes have a predefined shape, whereas this library is for diffing (relatively) arbitrary objects, right? If your data isn't arbitrarily shaped, I would assume you'd get much better performance by just implementing a simple direct diff, no?
Yes. I should have noted for clarity that vdom diffing is a special case, not a general one like this library's. But their similarity reminded me of the experience.
And, often but probably not always, using a class instance rather than a POJO. Seems unintuitive (function call overhead), but it can signal to the JIT that the object’s type is relatively stable.
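Roughly the pattern being described (the class name here is hypothetical, nothing from microdiff itself):

// A constructor that always assigns the same properties in the same order
// gives every instance the same hidden class, which helps keep property
// access in the hot diff loop monomorphic.
class DiffEntry {
  constructor(type, path, value) {
    this.type = type;
    this.path = path;
    this.value = value;
  }
}

// An object literal gets a stable shape too, as long as it is always
// built the same way; conditionally bolting keys on afterwards is what
// tends to create mixed shapes:
const entry = { type: "CHANGE", path: ["a"], value: 1 };
if (entry.value > 0) entry.oldValue = 0; // now a different hidden class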
I maintain symmetry[1] and wanted to try the benchmark against it, but the results are very inconsistent. Symmetry is anywhere from 50% slower to 20% _faster_ than microdiff on the small object benchmark.
Symmetry doesn't get a lot of activity, but I've been using it in production for many years. It does some more work on array diffing, which this benchmark doesn't cover, by implementing part of Myers' algorithm.
In all seriousness, what's so noteworthy about this? It's one trivial 50-line recursive function. This is the kind of stuff devs just type out instead of searching for an npm package, because it's faster that way.
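Something along these lines covers the typical case (a naive sketch that punts on Dates, RegExps, cycles, and arrays; the names are made up):

// Naive recursive diff: returns a flat list of { path, oldValue, newValue }.
// No cycle handling, no Date/RegExp special cases, arrays treated like
// plain objects - exactly the corners debated elsewhere in this thread.
function naiveDiff(oldObj, newObj, path = [], out = []) {
  const keys = new Set([...Object.keys(oldObj), ...Object.keys(newObj)]);
  for (const key of keys) {
    const a = oldObj[key];
    const b = newObj[key];
    const keyPath = [...path, key];
    if (a && b && typeof a === "object" && typeof b === "object") {
      naiveDiff(a, b, keyPath, out);
    } else if (a !== b) {
      out.push({ path: keyPath, oldValue: a, newValue: b });
    }
  }
  return out;
}

// naiveDiff({ a: 1, b: { c: 2 } }, { a: 1, b: { c: 3 } })
// -> [{ path: ["b", "c"], oldValue: 2, newValue: 3 }]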
You only see the finished product, not the 100 failures and incremental improvements over time. You could certainly write something yourself from scratch, but I doubt it would be competitive against this.
> not the 100 failures and incremental improvements over time
This code has 3 actual code commits over 6 days, none of which changed anything fundamental. What are you talking about?
It's quite possible to compete against this on multiple levels. One would be correctness. Another would be a less wasteful interface. Another would be handling arrays in a more useful way (or at all). Yet another would be safety. Another would be speed and memory use. Yet another would be a non-recursive implementation. Another would be documentation (what exactly does this diff? enumerable or non-enumerable properties? how does it deal with getters? what about inherited properties? I can't know any of this without looking long and hard at the code)... etc. etc.
A note on only the first part: the author might have tested a bunch of stuff locally without committing until they found something worthwhile, perhaps? That would result in very few commits.
Mmm, agree with the parent; it can get faster. Unfortunately lots of perf-critical code is created out of necessity and defaults to staying proprietary.
You could profile this lib, figure out where it's spending most of its time, and see whether there are any ways to improve (which starts getting into the implications of the JS <-> system boundary), but as you get more idiosyncratic you lose flexibility and ease of use.
E.g. type checks, heap allocations, recursion, etc. can get "slow".
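A crude way to start looking, assuming microdiff's default export is a diff(a, b) function (check the README for the exact API):

// Crude timing harness; for real answers use `node --prof` or `node --cpu-prof`
// and look at where the samples actually land.
import diff from "microdiff"; // assuming the default-export diff(a, b) API

const a = { user: { name: "a", tags: ["x", "y"], born: new Date(0) } };
const b = { user: { name: "b", tags: ["x", "z"], born: new Date(0) } };

const start = performance.now();
for (let i = 0; i < 100_000; i++) diff(a, b);
console.log(`${(performance.now() - start).toFixed(1)}ms for 100k diffs`);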
I wouldn't be able to write this. I mean, I could, but I lack the knowledge of TypeScript and the willingness to dedicate the time needed to create this while I have other things to do.
The code is clean, well organized and nice to look at, easy to read through and understand, and it's great that someone has made the effort to create this and share it for us to use.
I'd use the standard String.prototype.padStart. For anything more complicated, but not too complex (say, "libraries" with 100-300 lines of code), it's very tedious and time-consuming to find something on npm that precisely satisfies the specific constraints of my app. This certainly fits in that category.
For libraries above 1-2k lines of code that can be quite complicated (e.g. things like database connectors), it starts being reasonable to mold the expectations of my app around the library.
Also, using large, coherent utility libraries like underscore/lodash is (or at least was) worthwhile.
But composing an app from random, underdocumented two-function "libraries" from npm is just pain if you know what you're looking for: 1) you can't trust the code, so 2) you have to read it anyway, and reading is often slower than writing.
Searching for a package on npm is more time-consuming than writing 300 lines of code? Perhaps your app has some very specific constraints, but for a simple library like this, I have a hard time believing you could code something up on the fly faster than it would take to npm install the package in question.
I have absolutely written a diffing algorithm like this much faster than it would take to find an appropriate version on npm.
Every so often I try to reach for an npm library, only to find out that it's suboptimal, doesn't handle my edge cases, or just plain worse than the code I would have written.
The last time I tried to do this, I wanted a hash map with custom equality functions on the keys, since JS doesn't have one built in. I searched for about 2 hours and didn't find a single one that actually worked (several claimed to, but didn't...).
So I wrote my own in about 100 lines. [0]
Data structures and algorithms tend to be really awful on npm.
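The core of such a thing is small; roughly this, assuming you can supply a hash function alongside the equality function (a sketch, not the linked code):

// Minimal hash map with user-supplied hash and equality functions.
// Buckets live in a plain Map keyed by the user's hash.
class CustomMap {
  constructor(hash, equals) {
    this.hash = hash;
    this.equals = equals;
    this.buckets = new Map();
  }
  set(key, value) {
    const h = this.hash(key);
    const bucket = this.buckets.get(h) ?? [];
    const entry = bucket.find((e) => this.equals(e.key, key));
    if (entry) entry.value = value;
    else bucket.push({ key, value });
    this.buckets.set(h, bucket);
    return this;
  }
  get(key) {
    const bucket = this.buckets.get(this.hash(key));
    return bucket?.find((e) => this.equals(e.key, key))?.value;
  }
  has(key) {
    return this.get(key) !== undefined; // caveat: treats a stored undefined as missing
  }
}

// e.g. keys compared by value rather than by identity:
const m = new CustomMap((p) => `${p.x},${p.y}`, (a, b) => a.x === b.x && a.y === b.y);
m.set({ x: 1, y: 2 }, "hello");
console.log(m.get({ x: 1, y: 2 })); // "hello"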
In the long term, having to manage all these dependencies (keeping track of incompatibilities, testing updates) or, worse, not managing them at all, is time-consuming.
Let's say I maintain a dozen packages; then I'd rather write it once and include it everywhere. I could copy and paste all the common functions instead of including them, but if I notice a bug or have to upgrade the code for some reason, I'd have to edit the code a dozen times. If I only had to maintain one package, I'd prefer the copy&paste approach instead.
I agree it can be pretty easy to set up a basic version of this, but eliminating edge cases, optimizing, and unit testing can make it take significantly longer.
I know some diffing libraries are reluctant to do this for perf reasons, but many of them do not handle cycles correctly, leading to potential infinite loops depending on the shape of the objects and the order of the arguments.
Fast is great, but unless you know that your data doesn't have cycles, infinite loops take a long time :)
What if object references have loops? That doesn't seem to be handled. Also it looks like it will diff not just own properties but properties from prototype objects, too.
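Both points are cheap to address if you accept the overhead; something along these lines (a sketch of the pattern, not this library's code):

// A WeakSet guards against cycles: any object we've already visited is
// treated as a cycle and skipped. Object.keys() restricts the walk to
// own enumerable properties, so nothing is picked up from the prototype.
function guardedWalk(oldObj, newObj, visit, seen = new WeakSet()) {
  if (seen.has(oldObj) || seen.has(newObj)) return; // cycle: bail out
  seen.add(oldObj);
  seen.add(newObj);
  for (const key of Object.keys(oldObj)) { // own properties only
    const a = oldObj[key];
    const b = newObj?.[key];
    if (a && b && typeof a === "object" && typeof b === "object") {
      guardedWalk(a, b, visit, seen);
    } else if (a !== b) {
      visit(key, a, b);
    }
  }
}

// const a = { x: 1 }; a.self = a;
// const b = { x: 2 }; b.self = b;
// guardedWalk(a, b, (key, oldV, newV) => console.log(key, oldV, newV));
// logs "x 1 2" and does not loop forever on .self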
Hey, this is really cool! Not sure how this compares to https://github.com/benjamine/jsondiffpatch, which has been unmaintained for some time. But if this could replace it I'd be happy to start using it.
Thanks for your interest! Currently, Microdiff does not have patching functionality by default, and the API is different. If there is enough interest, though, I might make a compatibility layer to ease the differences, and it probably would not be hard to do yourself either.
I wonder how it plays with React (Native)/Redux. I'm having a bottleneck with state changes that involve many state comparisons, and I'm wondering if I can swap the built-in diff with this to speed up the diffs.
(I admit there are many other parts to optimize too, but if I can just plug this for some extra juice, why not?)
Usually it doesn't make a difference in either speed or size, because of the "interning" of literals. Unless you jsonify it, of course. It could show up if you later compared .type against some "CRE"+"ATE" value built in a roundabout way, but not for literals, e.g.:
// x.js
export const foo = "foo"

// y.js
import * as x from "./x.js"

if (x.foo === "foo") {
  // ^ two interned literals: as cheap as an int comparison
}

const foo2 = "FOO".toLowerCase() // built at runtime, not a literal
if (x.foo === foo2) {
  // ^ questionable
  // depends on jit/optimization
}
Is this a true deep diff utility? Didn’t check the code, but the readme makes no mention of depth capability beyond comparing to some other deep diffing libs.
In JavaScript, forEach, map, reduce, and other similar array methods are considerably slower than a regular for loop. You can see [0] and the linked questions for more technical details.
Function callbacks are definitely slower. You could probably shave off another 10% by changing the array.push() calls to direct assignments: array[array.length] = value. Same with the nested array.map: change it to a for loop.
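Concretely, the kind of rewrite being suggested (whether the ~10% holds will depend on the engine and the data):

const values = [1, 2, 3];

// Callback-based: one function call per element.
const doubledA = values.map((v) => v * 2);

// Hand-rolled: a plain for loop filling a preallocated result array by
// direct index assignment instead of push().
const doubledB = new Array(values.length);
for (let i = 0; i < values.length; i++) {
  doubledB[i] = values[i] * 2;
}

// The push() replacement mentioned above, for when the final length
// isn't known up front:
const out = [];
for (const v of values) out[out.length] = v * 2;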
Seems that this particular library is aware of these potential issues (which motivated me to write this comment): https://github.com/AsyncBanana/microdiff/issues/2