rurban a day ago

Don't compare apples to oranges. unordered_map is so slow because it has to guarantee pointer stability, doing seperate chaining, whilst the open addressing hashtables doing probing and moving do not. They are at least 2x faster.

Compare to linear probing, quadratic probing, double hashing, cuckoo, or swiss tables.

  • tialaramex a day ago

    This seems like it has causality upside down. The pointer stability and weird bucketing API for std::unordered_map happen to be consequence of its design, and you're stupidly allowed to rely on them, and so it can't be trivially replaced with a better hash table.

    But most people didn't want this. std::unordered_map is just a more modern example of WG21's ability to clutch failure from the jaws of success than say std::vector<bool>

    The approach taken in std::unordered_map was a historical footnote when C++ 98 shipped, but what's crazy is that it didn't ship in C++ 98 it's from C++ 11. The ISO document specifying this archaic hash table design as the standard C++ hash table is from the Obama Presidency like Rust 1.0.

    So while maybe it's unfair in one sense, in another sense it makes absolute sense to compare against std::unordered_map because this is what you get out of the box.

    • SkiFire13 a day ago

      > So while maybe it's unfair in one sense, in another sense it makes absolute sense to compare against std::unordered_map because this is what you get out of the box.

      That's not really a reason to omit comparisons with other better implementations.

      • tialaramex a day ago

        That's entirely fair, it would be nice to compare against other popular hash tables too.

    • gpderetta a day ago

      > [pointer stability] you're stupidly allowed to rely on them

      that's a bit unfair. Those are properties that can be very useful if you need them.

      • cb321 a day ago

        Indeed. One of the main reasons STL defaults to a non-moving hash table is because "deleting in the middle of an iteration" was viewed as an important property. Like it or not (I do not), the entire container part of the original 1990s Stepanov STL was oriented around "iteration/iterators" as the high priority abstraction.

        Trying to think like a DB engineer is probably helpful here. Point or range query workloads absolutely afford different optimization/specializations than full table scan iterations.

        Any mistake here is more in thinking "one size fits all workloads". That may as much be a mistake of user communities as of language designers. (E.g. 'expert' advice like "just always use X" instead of "well, it depends...".) So, the charitable reading of tialaramex's "most people" is a weakly articulated assertion about workloads (though I absolutely agree with your specific point).

        • tialaramex a day ago

          I'm not at all convinced I need to be read "charitably". I think it's just straightforwardly true that most people writing C++ with a std::unordered_map do not use this property of std::unordered_map.

          In particular the "deleting arbitrary other entries while walking the unordered map" is a weird thing to be doing. It's not necessarily wrong but it's weird to need precisely this (which it so happens the std::unordered_map can do) but not e.g. inserting arbitrary stuff into the map (which might grow and thus invalidate your iterator)

          Note that "Delete yourself if you want" is an operation we can afford in the otherwise more optimal containers, so if that is the only deletion you want then you again over-paid by choosing this weird container type with pointer stability.

          I do not believe that Stepanov chose this particular hash table because this one had properties he specifically wanted, maybe you have a reference which can prove me wrong? I think this was the type he was most familiar with, and the STL is illustrating generic programming - an important idea - not showing off the most optimal data structures and algorithms.

          • gpderetta a day ago

            The primary reason you might want reference stability is because you might have another pointer to it. Reference stability is generally important for the STL and carefully documented; even for vectors stability is guaranteed on many modify operations. I have seen plenty of code that use reserve+push_back specifically to preserve stability for example.

            The primary reason for unordered_map to preserve stability is that std::map does and the ability of unordered_map to be a drop-in replacement for std::map for code that doesn't need ordering was considered at that time a better tradeoff than the absolute optimal implementation.

            You can of course build a pointer stable map on top of an unstable one by heap allocating every object and storing unique_ptrs to the objects in the map, but when hash_map was initially added to the STL (remember than unordered map is just standardizing the non-standard hash_map), there was no unique_ptr (we do not speak of auto_ptr).

          • cb321 20 hours ago

            I tried to defend you against a hard to support "most people didn't want this" with a workload (either static or dynamic) assertion. You come back saying you don't need such charitability while simultaneously clarifying in terms of a static workload! Seemingly, "most written C++" (again charitably moving from "people writing" to "written code" because the latter is probably more relevant and surely more auditable/studyable - "source code files easier than people").

            I think you should be more careful about claiming you know what people want or know. Many people I know don't seem to know what they want.. it's some melange of desiderata with various eternal tensions. As just one example, I would doubt most people even want performance above all else. I really doubt most people understand well how their various static and dynamic workloads relate to their performance. I doubt I'm alone in having known many C++ programmers who, from spoken conversation, revealed they thought STL map was a hash table, for example.

            I only said "one of the main" and almost added "with pointer stability being the bigger", but it had already been observed by rurban. Delete-while-iterating has close ties with cell motion (though convoluted workarounds may exist, they may well cost more than you personally would like, it sounds). How to best explain all these things is always questionable, but I was trying to supplement. It's generally not easy without visual aids and definitely not if people are being combative. Python has ever & always since the late 80s used open addressing but only more recently tried to `raise` on dict edits during iterations.

            It seems to be flip-flopping that you are now calling separately chained hash tables a "weird container type" when 3 months ago it seemed to be news to you that they were NOT the main strategy for decades before open addressing on some Nim -ish comment on a Carbon thread: https://news.ycombinator.com/item?id=44754821

            Anything else I would say, such as Stepanov not even choosing a hash table in the first place, gpderetta said very well in a sibling. For the record, I still very much disagree with the early (& later) design of the STL and am no fan of WG21. Have a good weekend.

            • tialaramex 15 hours ago

              Rather than evidence of flip-flopping what I'm talking about there is that I consider this to be weird today in 2025. Things change, there's innovation in data structures and algorithms and the hardware we're likely targeting also changes to favour some structures and algorithms over others.

              In 1975 this wouldn't be weird, but also C++ doesn't exist then.

xxs a day ago

Aside already mentioned comparison to unordered_map, there appears to have a bug, on line 61: "p = (p + 1) & LenV". It should be mod (%) like the rest of the code.

Morealso mod is slow in general and it should be replaced by bitwise and (&) and power of 2 sized map, then using LenV-1.

vander_elst a day ago

I understood what's going on in this article only after reading the paper. It's might be good to define things s bit better in the article or say that the paper is a prerequisite for reading the article

  • avadodin a day ago

    The paper itself is an overview that doesn't discuss the cache-friendly optimizations in the original 1986 PhD thesis so it feels like a hash table all the way down.