
> My initial reaction was that the extra indirection would slow it down, but of course that's based on nothing but general knowledge and two seconds' thought.

There are already indirections in the current implementation.

The new dict has much better memory locality: the "actual data" (24 bytes per entry on 64-bit platforms) is stored in a dense array instead of the old sparse array, and the sparse array only stores indices into it, so it can be made much more compact (especially since the index width can shrink to match the size of the dense array: for small dicts it stores one-byte indices). So not only is the new dict smaller, its layout is also much friendlier to the cache.
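
If it helps, here's a rough pure-Python sketch of that two-array layout. It's purely illustrative: the class name, the linear probing, and the omitted resizing/deletion logic are my own simplifications, not CPython's actual C implementation.

    import array

    FREE = -1  # marker for an empty slot in the sparse index table

    class CompactDict:
        """Illustrative sketch: dense entries array + compact sparse index."""

        def __init__(self):
            # Dense array of (hash, key, value) entries, in insertion order.
            self.entries = []
            # Sparse table that stores only indices into `entries`.
            # 'b' = signed bytes, enough while the dict is small; a real
            # implementation would widen to 16/32/64-bit indices as it grows.
            self.index = array.array('b', [FREE] * 8)

        def _probe(self, key):
            # Simplified open addressing (linear probing) over the index table.
            mask = len(self.index) - 1
            i = hash(key) & mask
            while True:
                slot = self.index[i]
                if slot == FREE or self.entries[slot][1] == key:
                    return i
                i = (i + 1) & mask

        def __setitem__(self, key, value):
            i = self._probe(key)
            slot = self.index[i]
            if slot != FREE:  # key already present: overwrite in place
                self.entries[slot] = (hash(key), key, value)
                return
            if len(self.entries) >= len(self.index) * 2 // 3:
                raise NotImplementedError("resizing omitted in this sketch")
            self.index[i] = len(self.entries)  # point the slot at the new entry
            self.entries.append((hash(key), key, value))

        def __getitem__(self, key):
            slot = self.index[self._probe(key)]
            if slot == FREE:
                raise KeyError(key)
            return self.entries[slot][2]

The point being: the 24-byte entries live contiguously in `entries`, while the hash-table part is just a tiny table of small integers.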

The original point[0] was actually iteration speed: since the data is stored in a dense array, it can be iterated directly and efficiently instead of having to walk a sparse array and skip the empty entries (which is awful for branch prediction: whether a given slot of the sparse array is empty is essentially random if the hash is any good). A rough sketch of the difference is below.

[0] https://mail.python.org/pipermail/python-dev/2012-December/1...
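
To illustrate that iteration difference (again just a hedged sketch: `table` and `entries` stand in for the old sparse slot array and the new dense entry array from the layout sketch above):

    def iter_old(table):
        # Old layout: walk every slot of the sparse table and skip the holes.
        # Whether a slot is empty is essentially random for a good hash, so
        # the branch is hard to predict and much of what you touch is empty.
        for slot in table:
            if slot is not None:
                yield slot

    def iter_new(entries):
        # New layout: the dense array contains nothing but live entries, so
        # iteration is a straight sequential scan with no skipping.
        for h, key, value in entries:
            yield key, value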



Great, thank you. That's a cool technique and I'll have to remember it.


Thanks for this - I was also wondering about how this would affect cache usage.



