> Excellent explanation, thank you.
> I've heard that these demanding drivers only uses 3 cores/threads from the CPU, why
> don't split the load with the fourth? it would really help mid-level CPU like mine.
> And your CPU is a freakin monster really, not only has 2 more cores but has 100
> points more than mine on SingleThreadRating.
Happy to help. Being a technically unofficial forum, MAMEWorld hasn't traditionally been great at offering helpful responses from actual devs, so I'm hoping to gradually turn that tide.
Whereabouts did you hear that only 3 cores are used? I'm pretty sure that's not the case.
There's a fairly limited amount of multithreading that can be applied to emulation. The majority of slow systems in MAME are slow because they have a single beefy CPU, which can't be split up onto multiple cores. Systems like Namco System 22, with multiple somewhat-fast CPUs, would also be difficult to break across multiple cores: these sorts of systems tended to have near-zero latency when communicating between CPUs, and the cost of synchronizing the threads would most likely wipe out any gains from offloading the CPU emulation to other cores.
One of the places where MAME does make use of multithreading, however, is the "poly.h" system - essentially a threaded work-unit system in which individual scanlines of a given polygon are shuffled off to additional cores via "buckets" of scanlines for a given thread to process. In this case, MAME can and does use as many cores as you give it, but there's an aspect of diminishing returns: most games have many small polygons, and the number of polygons spanning a large amount of screen real estate is minimal. So, with a bucketing factor of 8 scanlines, any given polygon 24 pixels in height or less could very likely wind up only being distributed onto, say, 3 cores. The actual numbers might be different from this, but it helps to illustrate the problem.
Polygons must also be drawn in order; otherwise, games that rely on drawing polygons over other polygons might wind up with them sorting backwards on-screen. So, you can't just let the unoccupied cores start grinding through "future" jobs if those jobs would have conflicting Y ranges. Since a lot of games draw models as triangle fans, triangle strips, and the like, sequential polygons tend to have high locality in screen space, which means the code can't really get too far ahead. On top of that, any time the rendering state changes - unless that state is mirrored into each work unit, as the N64 driver currently does - you have the whole temporal issue of a future polygon potentially changing the common renderer state out from under other threads that are still processing a triangle.
One potential solution will be to treat polygons - and, indeed, screens in general - as arrays of horizontal spans rather than (in the case of screens) a single 2D bitmap. Looking to the future, that opens up the potential to spawn off workloads on the GPU, which could make things faster and allow a higher degree of parallelism. There's still the very real fact that in-order polygon commands need to be processed in order, so the key will be to minimize the per-span overhead. However, that's something that will probably take literally years to implement, so it's all theory at this point.
Suffice it to say that MAME will attempt to use as many cores as you tell it to use, but the number of cases where parallelism actually helps the emulation is minimal, and in the case of 3D drivers there's still the aspect of diminishing returns to contend with, performance-wise.