> Allocation in modern generational GCs is in a completely different performance category than C-style malloc is.
Yes and no. If enough allocation happens that a GC pass occurs while your JS hot loop is still running, you're pretty much screwed and all your performance vanishes. If you want to see this in the extreme, open an Android systrace or a Chrome chrome://tracing result and run a JS profile while you do so. The single largest consumer of JS CPU time is the GC, at nearly 13%.
The problem with modern generational GC languages is that they tend to be designed assuming allocation is always cheap or free, and it just isn't. That's a fairytale.
Also, just because an allocation is handed to you in 6-13ns doesn't mean much when the first access to that allocation is a cache line miss. Do the same thing in C with the data on the stack and it's likely sitting in hot L1, primed and ready to go. Reusing the same memory address range is critical to getting maximum performance, and modern generational GCs are miserably bad at that.
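To make the reuse point concrete, here's a hypothetical JS micro-example (function names are invented for illustration): one version allocates a fresh object every iteration, churning the nursery; the other reuses a single object so the same addresses stay hot in cache.

```javascript
// Per-iteration allocation: every loop trip creates a new short-lived
// object, steadily consuming nursery space.
function sumFresh(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    const point = { x: i, y: i * 2 }; // new allocation each iteration
    total += point.x + point.y;
  }
  return total;
}

// Reuse: one allocation up front, then the same memory is rewritten,
// so the loop touches the same (likely cache-resident) addresses.
function sumReused(n) {
  const point = { x: 0, y: 0 }; // single allocation, reused throughout
  let total = 0;
  for (let i = 0; i < n; i++) {
    point.x = i;
    point.y = i * 2;
    total += point.x + point.y;
  }
  return total;
}
```

Both compute the same result; the difference is allocation pressure, which is what determines whether a minor GC fires inside the loop.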
> Yes and no. If enough allocation happens that a GC pass occurs while your JS hot loop is still running, you're pretty much screwed and all your performance vanishes.
GCs are expensive relative to not doing anything, but minor GCs are very fast.
> The problem with modern generational GC languages is that they tend to be designed assuming allocation is always cheap or free, and it just isn't. That's a fairytale.
I'm not making an argument about language design here. Obviously value types are a good thing.
> Reusing the same memory address range is critical to getting maximum performance, and modern generational GCs are miserably bad at that.
Not at all. The nursery is usually in cache, just as the stack is. Nurseries are usually implemented as two space copying collectors with small spaces, which are excellent for cache locality (arguably even better than stacks, because stacks can grow deep).
> Not at all. The nursery is usually in cache, just as the stack is. Nurseries are usually implemented as two space copying collectors with small spaces, which are excellent for cache locality (arguably even better than stacks, because stacks can grow deep).
I think you're still working under the assumption that the GC is keeping up with the allocation rate, which, if you have a small allocation in a hot loop, will largely not be true. Typically GC'd languages rely on escape analysis to handle this and not the generational GC at all, but if that fails then you're SOL because the generational GC is unable to keep up and unable to keep things in the fast path.
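The kind of pattern escape analysis targets looks like this (an illustrative sketch; whether a given engine actually eliminates the allocation here is not guaranteed):

```javascript
// `delta` never escapes this function: it isn't returned, stored, or
// passed anywhere. An engine performing escape analysis / scalar
// replacement can keep dx and dy in registers and skip the heap
// allocation entirely. If that optimization fails, the object is
// allocated in the nursery on every call instead.
function distance(x1, y1, x2, y2) {
  const delta = { dx: x2 - x1, dy: y2 - y1 };
  return Math.sqrt(delta.dx * delta.dx + delta.dy * delta.dy);
}
```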
> Typically GC'd languages rely on escape analysis to handle this and not the generational GC at all
No, they don't. Most GC'd languages rely on generational GC, because escape analysis on its own doesn't directly provide a lot of performance benefits once you have a generational GC.
> if that fails then you're SOL because the generational GC is unable to keep up and unable to keep things in the fast path.
No, you aren't. Cleaning up dead objects (for example, temporaries created in a hot loop) in a nursery is extremely fast. The entire nursery semispace is typically in L1.
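Why dead temporaries are nearly free in a nursery is easiest to see from how semispace allocation works. Here's a hypothetical sketch in JS (real VMs do this in native code; all names and sizes here are invented): allocation is a pointer bump, and a minor GC copies only survivors, so garbage costs nothing to reclaim.

```javascript
// Toy model of a semispace nursery with bump-pointer allocation.
const NURSERY_SIZE = 16 * 1024; // small enough to stay cache-resident

let fromSpace = new ArrayBuffer(NURSERY_SIZE);
let toSpace = new ArrayBuffer(NURSERY_SIZE);
let bump = 0; // next free offset in fromSpace

function minorGC() {
  // A real collector would copy live objects into toSpace here.
  // Dead temporaries are simply never copied -- reclaiming them is free.
  [fromSpace, toSpace] = [toSpace, fromSpace];
  bump = 0;
}

function allocate(bytes) {
  if (bump + bytes > NURSERY_SIZE) {
    minorGC(); // survivors evacuated, bump reset to 0
  }
  const offset = bump; // allocation is just a pointer bump
  bump += bytes;
  return offset;
}
```

Because `bump` marches linearly through a small buffer, consecutive allocations land on adjacent cache lines, which is the locality argument being made above.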
L1 is only going to be about 64 KB per core (e.g., Kaby Lake). That'll have your stack, your working set, and a tiny bit of your nursery in it... as long as you don't get associativity problems.
If you read a bunch of data before making an object, I could see you easily evicting your nursery from L1 into L2, but then you only have about 4x as much space.