The fastest object diff library in JavaScript (github.com/asyncbanana)
100 points by AsyncBanana on Nov 6, 2021 | 44 comments


Fast diffing of objects may seem rarely needed to a regular user, but it's a very important part of handling vdom/vnode diffs. Recently I researched a few lesser-known web UI libraries for my own use and found that many of them do not diff vnodes correctly, e.g. in the case of RegExp or Date arguments. I don't remember exactly which library had which flaw, but my conclusion was that almost all of the non-actively-maintained libs had this problem in one way or another. If you want to explore, check that first, because it's a surprising pain point.

Seems that this particular library is aware of these potential issues (which motivated me to write this comment): https://github.com/AsyncBanana/microdiff/issues/2
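
For a concrete illustration of the kind of bug I mean (a hypothetical naive check, not any specific library's code):

  // Two Dates with the same timestamp are distinct object references,
  // so a naive identity check reports a change where there is none.
  const a = new Date(2021, 10, 6)
  const b = new Date(2021, 10, 6)
  a === b                     // false, even though they are equal
  a.getTime() === b.getTime() // true; the correct comparison

  // Same problem with RegExp: compare source and flags, not references.
  const r1 = /foo/gi
  const r2 = /foo/gi
  r1 === r2                                        // false
  r1.source === r2.source && r1.flags === r2.flags // true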


But vnodes have a predefined shape, whereas this library is for diffing (relatively) arbitrary objects, right? If your data isn't arbitrarily shaped, I would assume you'd get much better performance by just implementing a simple direct diff, no?
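
Something like this, for example, assuming a fixed vnode shape (field names are hypothetical):

  // A direct, shape-specific diff: no reflection over keys, just
  // explicit comparisons of the fields you know exist.
  function diffVNode(prev, next) {
    const changes = []
    if (prev.tag !== next.tag) changes.push("tag")
    if (prev.key !== next.key) changes.push("key")
    if (prev.text !== next.text) changes.push("text")
    return changes
  }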


Yes. I should have noted for clarity that vdom diffing is a special case, not a general one like the submitted library handles. But their similarity reminded me of the experience.


Ah that's my mistake, I read more into your comment than you wrote. Sorry!

It's a good point about edge cases in diffing. Dynamic value diffing in dynamically-typed languages is an interesting problem to think about.


And, often but probably not always, using a class instance rather than a POJO. Seems unintuitive (function call overhead), but it can signal to the JIT that the object’s type is relatively stable.
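
A sketch of what I mean; every instance comes out of the same constructor, so engines like V8 can keep them all on one hidden class and property access stays monomorphic (engine-dependent behavior, not a guarantee):

  // Object literals assembled in different key orders can end up with
  // different hidden classes; a class pins the shape down.
  class VNode {
    constructor(tag, props, children) {
      this.tag = tag
      this.props = props
      this.children = children
    }
  }

  const node = new VNode("div", {}, []) // same shape as every other VNode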


Are these RegExps/Date objects passed as props? I’m not sure why a VDOM algorithm would have to handle these types of objects.


I maintain symmetry[1] and wanted to try the benchmark against it, but the results are very inconsistent. Symmetry is anywhere from 50% slower to 20% _faster_ than microdiff on the small object benchmark.

Symmetry doesn't get a lot of activity, but I've been using it in production for many years. It does some more work on array diffing, which this benchmark doesn't cover, by implementing part of Myers' algorithm.

[1]: https://github.com/Two-Screen/symmetry/


Here is the microdiff benchmark wrapped in benchmark.js: https://github.com/stephank/js-diff-benchmarks

I added three more libraries, and also the size in bytes of `JSON.stringify(result)` for each. (That was also important for me in making symmetry.)
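
The wrapper follows the usual benchmark.js pattern, roughly like this (a sketch, not the repo's exact code; the sample objects and microdiff's default export are assumptions):

  const Benchmark = require("benchmark")
  const diff = require("microdiff").default

  const objA = { id: 1, name: "a", nested: { x: 1 } }
  const objB = { id: 1, name: "b", nested: { x: 2 } }

  new Benchmark.Suite("Small object (baseline)")
    .add("microdiff", () => diff(objA, objB))
    .on("cycle", (event) => console.log(String(event.target)))
    .run()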

The results from benchmark.js are a lot more consistent on my laptop:

Benchmark: Small object (baseline)

- @n1ru4l/json-patch-plus x 1,426,241 ops/sec ±0.29% (94 runs sampled) (94 bytes)
- deep-diff x 337,122 ops/sec ±0.45% (95 runs sampled) (193 bytes)
- deep-object-diff x 2,138,118 ops/sec ±0.25% (96 runs sampled) (60 bytes)
- diff x 59,277 ops/sec ±0.60% (97 runs sampled) (268 bytes)
- jsondiffpatch x 841,880 ops/sec ±0.38% (96 runs sampled) (113 bytes)
- microdiff x 6,979,471 ops/sec ±0.77% (95 runs sampled) (171 bytes)
- symmetry x 8,666,619 ops/sec ±0.21% (97 runs sampled) (7 bytes)

Benchmark: Large object (300 properties)

- @n1ru4l/json-patch-plus x 16,902 ops/sec ±0.66% (92 runs sampled) (515 bytes)
- deep-diff x 8,378 ops/sec ±0.68% (96 runs sampled) (1194 bytes)
- deep-object-diff x 19,647 ops/sec ±0.40% (94 runs sampled) (410 bytes)
- diff x 737 ops/sec ±0.45% (94 runs sampled) (12051 bytes)
- jsondiffpatch x 14,306 ops/sec ±0.76% (95 runs sampled) (714 bytes)
- microdiff x 22,131 ops/sec ±0.25% (95 runs sampled) (935 bytes)
- symmetry x 21,432 ops/sec ±0.92% (98 runs sampled) (424 bytes)


In all seriousness, what's so noteworthy about this? It's one trivial 50-line recursive function. This is the kind of stuff devs just type out instead of searching for an npm package, because it's faster that way.


You only see the finished product, not the 100 failures and incremental improvements over time. You could certainly write something yourself from scratch, but I doubt it would be competitive against this.


> not the 100 failures and incremental improvements over time

This code has 3 actual code commits over 6 days, none of which changed anything fundamental. What are you talking about?

It's quite possible to compete against this on multiple levels. One would be correctness. Another would be a less wasteful interface. Another would be handling arrays in a more useful way (or at all). Yet another would be safety. Another would be speed and memory use. Yet another would be a non-recursive implementation. Another would be documentation (what does this diff? Enumerable or non-enumerable properties? How does it deal with getters? What about inherited properties? I can't know any of this without looking long and hard at the code)... etc. etc.
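
On the non-recursive point, for example: an explicit stack avoids the call-stack depth limit entirely (a sketch of the idea over own enumerable properties, not a drop-in replacement):

  function diffIterative(a, b) {
    const changes = []
    const stack = [{ left: a, right: b, path: [] }]
    while (stack.length) {
      const { left, right, path } = stack.pop()
      for (const key of Object.keys(left)) {
        const p = path.concat(key)
        if (!Object.prototype.hasOwnProperty.call(right, key)) {
          changes.push({ type: "REMOVE", path: p })
        } else if (
          left[key] !== null && typeof left[key] === "object" &&
          right[key] !== null && typeof right[key] === "object"
        ) {
          // Descend later via the stack instead of recursing now.
          stack.push({ left: left[key], right: right[key], path: p })
        } else if (left[key] !== right[key]) {
          changes.push({ type: "CHANGE", path: p, value: right[key] })
        }
      }
      for (const key of Object.keys(right)) {
        if (!Object.prototype.hasOwnProperty.call(left, key)) {
          changes.push({ type: "CREATE", path: path.concat(key), value: right[key] })
        }
      }
    }
    return changes
  }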


A note on only the first part: the author might have tested a bunch of stuff locally without committing until they found something worthwhile, perhaps? That would result in very few commits.


Nah, this just really isn't a very complex or noteworthy library. There aren't that many edge cases to test. Just look at the test file.


Mmm, agree with parent; it can get faster. Unfortunately, lots of perf-critical code is created out of necessity and stays proprietary by default.

You could profile this lib, figure out where it's spending most of its time, and see if there are any ways to improve it (this starts getting into the implications of the JS <-> system boundary), but as you get more idiosyncratic you lose flexibility and ease of use.

E.g. type checks, heap allocations, recursion, etc. can get "slow".


>Microdiff aims for working for 99% of use cases, instead of making every edge case work.[1]

That's fine, of course, but it probably should be in the README.

[1] https://github.com/AsyncBanana/microdiff/issues/2#issuecomme...


I wouldn't be able to write this. I mean, I could, but I lack the TypeScript knowledge and the willingness to dedicate the time needed to create this while I have other things to do.

The code is clean, well organized and nice to look at, easy to read through and understand, and it's great that someone has made the effort to create this and share it for us to use.


Would you rather implement `leftPad` or use the npm package?


I'd use the standard String.prototype.padStart. For anything more complicated, but not too complex (say, "libraries" with 100-300 lines of code), it's very tedious and time-consuming to find something on npm that precisely satisfies the specific constraints of my app. This certainly fits in that category.
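
For reference:

  "5".padStart(3, "0")   // "005"
  "abc".padStart(2, "0") // "abc" (already longer than the target length)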

For libraries above 1-2k lines of code that can be quite complicated (e.g. things like database connectors), it starts being reasonable to mold my app's expectations around the library.

Also, using large, coherent utility libraries is (or at least was, in the past) worthwhile: libs like underscore/lodash, etc.

But composing an app from random, underdocumented two-function "libraries" from npm is just pain if you know what you're looking for: 1) you can't trust the code, so 2) you have to read it anyway, and reading is often slower than writing.


Searching for a package on npm is more time-consuming than writing 300 lines of code? Perhaps your app has some very specific constraints, but for a simple library like this, I have a hard time believing you could code something up on the fly faster than it would take to npm install the package in question.


I have absolutely written a diffing algorithm like this much faster than it would take to find an appropriate version on npm.

Every so often I try to reach for an npm library, only to find out that it's suboptimal, doesn't handle my edge cases, or just plain worse than the code I would have written.

The last time I tried to do this, I wanted a hash map with custom equality functions on the keys, since JS doesn't have one built in by default. I searched for about 2 hours and didn't find a single one that actually worked (several claimed to, but didn't...)

So I wrote my own in about 100 lines. [0]
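
The core of it is a sketch like this (not the linked code; the hashKey/keysEqual callbacks are assumptions): buckets keyed by a user-supplied hash, with collisions resolved by scanning the bucket with a user-supplied equality function.

  class CustomHashMap {
    constructor(hashKey, keysEqual) {
      this.hashKey = hashKey     // (key) => string | number
      this.keysEqual = keysEqual // (a, b) => boolean
      this.buckets = new Map()
    }

    set(key, value) {
      const h = this.hashKey(key)
      let bucket = this.buckets.get(h)
      if (!bucket) this.buckets.set(h, (bucket = []))
      for (const entry of bucket) {
        if (this.keysEqual(entry.key, key)) { entry.value = value; return }
      }
      bucket.push({ key, value })
    }

    get(key) {
      const bucket = this.buckets.get(this.hashKey(key))
      if (!bucket) return undefined
      for (const entry of bucket) {
        if (this.keysEqual(entry.key, key)) return entry.value
      }
      return undefined
    }
  }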

Data structures and algorithms tend to be really awful on npm.

[0] https://github.com/magcius/noclip.website/blob/master/src/Ha...


In the long term, having to manage all these dependencies, keeping track of incompatibilities, and testing updates (or worse, not managing them at all) is time-consuming.


For trivial functions, I'd rather just copy them into my source code.


Say I maintain a dozen packages; then I would rather write it once and include it everywhere. I could copy and paste all the common functions instead of including them, but if I notice a bug or have to upgrade the code for some reason, I have to edit the code a dozen times. If I only had to maintain one package, then I would prefer the copy-and-paste approach.


I agree it can be pretty easy to set up a basic version of this, but eliminating edge cases, optimizing, and unit testing can make it take significantly longer.


I know some diffing libraries are reluctant to do this for perf reasons, but many of them do not handle cycles correctly, leading to potential infinite loops depending on the shape of the objects and the order of arguments.

Fast is great, but unless you know that your data doesn't have cycles, infinite loops take a long time :)
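
The usual guard is cheap; a sketch using a WeakSet of visited objects (assuming a recursive diff along the lines of this library's):

  // Track objects already being visited; stop descending when one
  // reappears, so cyclic references can't recurse forever.
  function safeDiff(a, b, seen = new WeakSet()) {
    if (seen.has(a)) return [] // `a` is already on the path: cycle
    seen.add(a)
    const changes = []
    for (const key of Object.keys(a)) {
      const av = a[key]
      const bv = b[key]
      if (av !== null && typeof av === "object" &&
          bv !== null && typeof bv === "object") {
        for (const c of safeDiff(av, bv, seen)) {
          changes.push({ ...c, path: [key, ...c.path] })
        }
      } else if (av !== bv) {
        changes.push({ type: "CHANGE", path: [key], value: bv })
      }
    }
    return changes
  }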


Please compare with a modern competitor: json-patch-plus https://github.com/n1ru4l/graphql-live-query/blob/main/packa...

It’s a bit different, but I think it covers more cases.


What if object references have loops? That doesn't seem to be handled. Also it looks like it will diff not just own properties but properties from prototype objects, too.
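
The prototype part is easy to demonstrate: if the iteration uses for...in, inherited enumerable properties come along for the ride, while Object.keys sticks to own properties:

  const proto = { inherited: 1 }
  const obj = Object.create(proto)
  obj.own = 2

  for (const k in obj) console.log(k) // "own", "inherited"
  Object.keys(obj)                    // ["own"]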


> loops

Don't


Ok, let's pretend they don't exist and potentially hang the app.


> const t = true;

That should be the job of a minifier; it makes the code less readable.


Hey, this is really cool! Not sure how this compares to https://github.com/benjamine/jsondiffpatch which has been unmaintained for some time. But if this could replace it I'd be happy to start using it.


Thanks for your interest! Currently, Microdiff does not have patching functionality by default, and the API is different. If there is enough interest, though, I might make a compatibility layer to ease the differences, and it probably would not be hard to do yourself either.


I wonder how it plays with React (Native)/Redux. I'm having a bottleneck with state changes that involve many state comparisons, and I'm wondering if I can swap the built-in diff for this to speed up the diffs.

(I admit there are many other parts to optimize too, but if I can just plug this for some extra juice, why not?)


> type: "CREATE" | "REMOVE" | "CHANGE";

Since performance is a goal, why not define an enum and use integer values? I don't know about speed, but the resulting diff would be smaller.

Also I wonder what "const t = true" is for.
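
Something like this, for instance (a hypothetical alternative, not the library's actual API):

  // Integer-valued change types: smaller when serialized, and
  // comparisons are against numbers rather than strings.
  enum ChangeType {
    Create = 0,
    Remove = 1,
    Change = 2,
  }

  interface Change {
    type: ChangeType
    path: (string | number)[]
  }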


Usually it doesn't make a difference in either speed or size, because of the "interning" of literals. Unless you JSON-ify it, of course. It could show up if you later compared .type with some "CRE"+"ATE" value constructed in a roundabout way, but not for literals, e.g.:

  // x.js
  export const foo = "foo"

  // y.js
  import * as x from "./x.js"

  if (x.foo === "foo") {
    // ^ as cheap as an int comparison
  }

  const foo2 = "FOO".toLowerCase()
  if (x.foo === foo2) {
    // ^ questionable
    // depends on jit/optimization
  }
https://stackoverflow.com/questions/5276915/do-common-javasc...

https://en.wikipedia.org/wiki/String_interning


Interesting that it's so much faster; the code looks very natural and trick-free besides the rich-object check (at least at first glance).

Wonder what everyone else is doing to make theirs slow.

Also curious what MobX's and React's object diffing looks like now.


For a bit of info on why it is often faster, you can look at https://github.com/AsyncBanana/microdiff/issues/2. I will probably add something to the README about that soon too.


Not checking for arrays (and nested array recursion) is most likely the biggest performance/feature difference.

Some diff libraries have React specific optimizations, if values might be React elements/components.


It doesn't handle arrays, for one example.


Is this a true deep diff utility? I didn't check the code, but the README makes no mention of depth capability beyond comparing to some other deep diffing libs.


Yeah, it's just a single function that recurses. So the limit will be your VM's call stack limit.


Nice one. Have you tried replacing the for loops with forEach? It might be even more performant.


In JavaScript, forEach, map, reduce, and other similar array methods are considerably slower than a regular for loop. You can see [0] and the linked questions for more technical details.

[0] https://stackoverflow.com/q/22155280/1470607


Function callbacks are definitely slower. You could probably shave another 10% by changing the array.push() calls to direct assignments: array[array.length] = value. Same with the nested array.map, changing to a for loop.
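
I.e., roughly this (whether it actually wins depends on the engine, so it's worth measuring rather than assuming):

  const input = [1, 2, 3]
  const transform = (x) => x * 2

  const out = []
  for (let i = 0; i < input.length; i++) {
    out[out.length] = transform(input[i]) // direct assignment, no push()
  }
  // Equivalent to: const out = input.map(transform)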



