Though the branch predictor can chew through both layers of indirection: it can actually start fetching code from the target function (and even executing it speculatively) before it has even read the function pointer from the vtable.
Though, that assumes a correct prediction. But modern branch predictors are really good: they can track and correctly predict hundreds (if not thousands) of indirect calls, taking into account the history of the last few branches (so a predictor can even get an idea of which class is currently being executed, and make predictions based on that). They do a really good job of chewing up indirect branches in hot sequences of code.
Virtual functions are probably the most harmful in warm code. We are talking about code that's executed too often to be considered cold, but not often enough to stick around in the branch predictor's cache, executed only a few hundred times a second. It's a death-by-a-thousand-cuts type of thing. And that's where devirtualisation will help the most...
As long as you don't go too far with the inlining and start causing icache misses through code bloat. In an ideal world the compiler would inline enough to devirtualise the call, but not necessarily inline the actual function body (unless it's small, or only called from one place).