
If anyone is curious, one of the fastest multi-threaded queue implementations out there is the LMAX Disruptor.

https://github.com/LMAX-Exchange/disruptor

https://github.com/LMAX-Exchange/disruptor/wiki/Performance-...

I've started using a variant of this in my .NET Core projects and have found the performance to be astonishing.
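For anyone who hasn't looked at it, the core API is small. Here's a rough sketch against the upstream Java DSL (the .NET port mirrors it closely); the ValueEvent class and the values are placeholders for illustration, not code from my project:

    import com.lmax.disruptor.EventHandler;
    import com.lmax.disruptor.RingBuffer;
    import com.lmax.disruptor.dsl.Disruptor;
    import com.lmax.disruptor.util.DaemonThreadFactory;

    public class DisruptorSketch {
        // Events are pre-allocated once by the ring buffer and reused,
        // so steady-state publishing allocates nothing.
        static final class ValueEvent {
            long value;
        }

        public static void main(String[] args) {
            int bufferSize = 1024; // must be a power of two

            Disruptor<ValueEvent> disruptor = new Disruptor<>(
                    ValueEvent::new, bufferSize, DaemonThreadFactory.INSTANCE);

            // Consumer: runs on its own thread, sees events in sequence order.
            disruptor.handleEventsWith(
                    (EventHandler<ValueEvent>) (event, sequence, endOfBatch) ->
                            System.out.println("got " + event.value));

            disruptor.start();
            RingBuffer<ValueEvent> ringBuffer = disruptor.getRingBuffer();

            // Producer: claim a slot, fill the pre-allocated event, publish.
            for (long i = 0; i < 10; i++) {
                ringBuffer.publishEvent((event, seq, v) -> event.value = v, i);
            }

            disruptor.shutdown();
        }
    }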



Jonathan Blow warns against threaded queues in game development, as normally simulating your world isn't the bottleneck (rendering is) and they will just cause a fair bit of unexpected behavior and debugging.


statements like this really need to be put into context. Maybe that's true for his games, but it's not necessarily true for all games. His latest game, The Witness, is a first-person puzzle game with zero non-player characters, no kinematics, and a variety of puzzles based around light and rendering. He designs games that don't have much to simulate and do have complicated rendering situations.

Meanwhile, Doom Eternal has no main or render thread at all, and instead uses a massively parallel jobs system. https://www.dsogaming.com/news/doom-eternal-does-not-have-a-...


I think the majority of game devs who would get advice from lectures/hacker news comments are probably making games on a small enough scope/scale that choosing a single threaded game logic engine is fairly reasonable. The people at id Software are the best of the best; this kind of reminds me of the fitness "advice" I was once given to not go on long jogs/runs because "all the best marathon runners are super skinny", but that only applies to world class marathoners, not dudes running a 5k on the weekend.


people who work in games also read HN you know.


And said AAA devs aren't "ubermensch". They run a large gamut of specialties and skillsets - plenty of them will benefit from extra context.

And some of those small scale hobby/indie devs may later end up working for "the big leagues" as well, so the extra context can benefit them too.


tbf I said take advice from HN. I expect people deeply involved in games to have their own specialized info us mortals can't touch (or they just do their own testing)


I'd appreciate a link so I could hear his whole argument.

You can argue about when it is appropriate to use threads at all. But, if I'm going to use threads, I use a threaded queue for communication exclusively.
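The pattern is simple either way: the queue is the only thing the two threads share. A minimal sketch in plain Java with made-up names, not code from any real project:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class QueueComms {
        public static void main(String[] args) throws InterruptedException {
            // The queue is the only shared state between the two threads.
            BlockingQueue<String> commands = new ArrayBlockingQueue<>(256);

            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String cmd = commands.take();   // blocks until a message arrives
                        if (cmd.equals("quit")) break;
                        System.out.println("worker handling: " + cmd);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();

            commands.put("load-level");   // blocks if the queue is full (backpressure)
            commands.put("spawn-enemy");
            commands.put("quit");
            worker.join();
        }
    }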


I wish X4 would go this route though, as it is entirely bottlenecked by simulation speed.


I imagine execution order/consistency is very valuable for 4X games, and most of the time, results are dependent on each other (who wins a battle may depend on the current status of an empire, which is dependent on the outcome of various planet-level actions, for example). It'd probably be a very different game if each action were stateless, though it could be a cool exercise.


Note that despite using the same two characters, X4 is definitely not a 4X game.


Whoops, my brain just ran right through the word. Looking at X4, it still looks like an incredibly busy game with a very busy gamestate


In some cases you have to. I'm working on a game and absolutely need separate threads for animation, rendering, and constructing meshes (it's kind of procedural - it's partly a 3D map renderer).


Sim games (eg SimCity, Sims) could spend more time on sim than graphics.


Indeed, a lot of games are actually single threaded.


In recent years there has been a trend shift away from this, at least in the AAA engines, towards a job system. This makes sense: you have a thread per core and you create jobs to “go wide” when you can. See for example Unity’s Job System or the GDC talk by Naughty Dog from a few years ago.

The big games will also prepare data for rendering in parallel (eg culling and sorting and whatnot, although much of this is also done on the GPU).

(Going by GDC talks, the rendering teardown articles and just what I see online from Unity/Unreal. I don’t work in games myself)
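To make the "go wide" idea concrete, here's a very rough sketch with a plain Java thread pool - not how Unity or Naughty Dog actually implement their schedulers, just the shape of it: one worker per core, a frame stage is split into jobs, and the main loop joins on them before moving to the next stage.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class JobSystemSketch {
        public static void main(String[] args) throws Exception {
            // One worker thread per core; jobs are submitted to "go wide" within a frame.
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService workers = Executors.newFixedThreadPool(cores);

            float[] positions = new float[10_000];

            // Split one update stage across jobs, then join before the next stage runs.
            int chunk = positions.length / cores;
            List<Future<?>> frameJobs = new ArrayList<>();
            for (int j = 0; j < cores; j++) {
                final int start = j * chunk;
                final int end = (j == cores - 1) ? positions.length : start + chunk;
                frameJobs.add(workers.submit(() -> {
                    for (int i = start; i < end; i++) {
                        positions[i] += 0.016f; // stand-in for per-entity update work
                    }
                }));
            }
            for (Future<?> f : frameJobs) {
                f.get(); // join point: every job in this stage has finished
            }

            workers.shutdown();
        }
    }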


Which makes total sense because single thread performance is growing more slowly these days. Used to be you'd double every couple of years but today the midrange is only about 50% faster single threaded than it was in '16. Now if you count all the cores you're still seeing things more than double over that time. Compare these similarly priced CPUs from today and a few years ago: i5-6500 and i5-10500. The latter is maybe 30-40% faster single threaded but has more than double the parallel throughput.


It's true, but a lot of the main areas don't multi-thread well, like AI and physics.


I don't know why AI shouldn't thread well, assuming there is more than one actor. As long as they are operating over an immutable view of the game state, each actor should be able to plan independently and enter its commands independently. Likewise, there are probably some tricks you can do with physics. And anyway in most games interactive physics is only done for a few objects in the game world, and those objects are often not interacting with each other, at least not physically. You could cluster the objects that can affect each other and then do each of them single-threaded.
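Roughly what I mean, as a sketch in Java (the types are made up for illustration): every actor plans against the same read-only snapshot, so planning parallelises without locks, and the commands get applied afterwards on one thread.

    import java.util.List;

    public class ParallelAiSketch {
        // Hypothetical types: an immutable view of the world and a queued command.
        record WorldSnapshot(long tick, List<double[]> actorPositions) {}
        record Command(int actorId, String action) {}

        interface Actor {
            Command plan(WorldSnapshot world); // reads the snapshot, never mutates it
        }

        // Every actor plans against the same read-only snapshot, so planning
        // parallelises freely; the resulting commands are applied later, single-threaded.
        static List<Command> planAll(List<Actor> actors, WorldSnapshot snapshot) {
            return actors.parallelStream()
                         .map(actor -> actor.plan(snapshot))
                         .toList();
        }
    }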


> As long as they are operating over an immutable view of the game state

That's a big issue. There is a surprising amount of back and forth between objects in a single step of gameplay/AI.

And, generally gameplay code tends to be a big mess of wild and ever-changing requirements from gameplay designers, extreme time crunch and short term (1 game then burn it) goals. Ivory tower software architecture it is not...

Clustering physics into "islands" is common practice though.
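The island idea, sketched with a plain union-find (not how any particular engine structures it): bodies that share a contact end up in the same island, and each island can then be solved on its own thread because islands don't interact with each other.

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class IslandSketch {
        // Union-find root lookup with path halving.
        static int find(int[] parent, int i) {
            while (parent[i] != i) { parent[i] = parent[parent[i]]; i = parent[i]; }
            return i;
        }

        // contacts[k] = {bodyA, bodyB}; returns islands as lists of body indices.
        // Each island only interacts internally, so islands can be solved in parallel.
        static Collection<List<Integer>> islands(int bodyCount, int[][] contacts) {
            int[] parent = new int[bodyCount];
            for (int i = 0; i < bodyCount; i++) parent[i] = i;
            for (int[] c : contacts) parent[find(parent, c[0])] = find(parent, c[1]);

            Map<Integer, List<Integer>> groups = new HashMap<>();
            for (int i = 0; i < bodyCount; i++)
                groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
            return groups.values();
        }
    }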


I'm not a game dev but I do know how software can become a mess. I think engines that are used by many games have a chance to push good practices here.


I mean, not entirely, though the game logic often will be. But usually there is a fair bit of threading going on, and things are threaded in the engine or the graphics card driver.


Would be nice to see updated benchmarks against the C++ queues - this LMAX queue seems to give 20-25 million messages per second on Sandy Bridge - the best 1P/1C C++ queue is around 250 million messages per second on a 9900K, and I doubt a 9900K is 10 times more performant than a 2600K.

> https://max0x7ba.github.io/atomic_queue/html/benchmarks.html


I am more concerned about worst case latency than chasing message rates in the hundreds of millions per second. The load those events will generate far exceeds the load incurred by creating and processing them on the same physical host, so I'll never be in a situation where 25 vs 250 million makes a difference.

I am also interested in the productivity and safety afforded by high-level languages in this arena. Dealing with memory and threading at the same time is not something I like to do at a low-level.


LMAX is optimizing latency, not throughput.


Can you give more context about your projects i.e. what makes them require a super high-performance queue?


The type of project I am using this for is a centralized client/server UI architecture where 100% of user events are submitted to the queue for processing. This allows for very high throughput user interfaces if you are doing clever things on the server WRT caching of prior-generated content for other events (i.e. all login attempts for the same region will get the same final view).

I found the abstraction this was originally developed for - processing of financial transactions with latency as the primary constraint - to be an excellent analogue for UI event processing. Latency is also a huge concern when the user's eyeballs are in the loop.


And that uses Java...


If you control object pools yourself and don’t use GC, as the LMAX Disruptor does as far as I remember, Java can be blazingly fast.
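The trick is that nothing on the hot path allocates, so the collector has nothing to do. A toy sketch of the idea (single-threaded, made-up Order type; the Disruptor gets the same effect by pre-filling its ring buffer with events):

    import java.util.ArrayDeque;

    public class PoolSketch {
        // A mutable message object that gets reused instead of reallocated.
        static final class Order {
            long id;
            double price;
            void clear() { id = 0; price = 0; }
        }

        // A tiny single-threaded pool: everything is allocated up front,
        // so the hot path does no allocation and triggers no GC.
        static final class OrderPool {
            private final ArrayDeque<Order> free = new ArrayDeque<>();

            OrderPool(int size) {
                for (int i = 0; i < size; i++) free.push(new Order());
            }

            Order acquire() { return free.pop(); }  // throws if exhausted; real pools handle that
            void release(Order o) { o.clear(); free.push(o); }
        }
    }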


Martin Thompson has basically made a career out of writing Java in the style of embedded C because he found enterprise customers that need the performance of embedded C but, being enterprise, insist on absolutely everything being Java.


calling external code from Java adds latency.


You have to make sure all dependencies don't use GC as well right?


Sure, but I'm assuming that if you're writing such high performance limited scope software like the LMAX disruptor, you have few dependencies (looking at their code, it appears that the disruptor code itself has no external dependencies and uses few of the standard library classes outside of NIO bytebuffers).


in LMAX-disruptor's case, they have no runtime dependencies: https://github.com/LMAX-Exchange/disruptor/blob/master/build...


Are you using https://github.com/disruptor-net/Disruptor-net library port or something else?


This is exactly what I am using.


Just curious, are you also making use of Span and Pipelines?


I haven't made much use of Span directly, but I do like using Pipelines for copying streams to other streams (i.e. building AspNetCore proxy abstractions).


Thanks!



