What I find odd is that some see this "opportunity" as a problem. If noise matters most, there is no harm in using a shutter speed that would be just sufficient for an 8MP APS-C sensor; the result won't be worse because of 4x the density, and in fact, if the motion blur is simple and easily deconvolved, the higher-res version will deconvolve more cleanly, with fewer artifacts.
Many talk as if you have to increase your shutter speed with higher pixel density just to keep up with the IQ of lower pixel densities, but every attempt to prove that fails once the results have to be normalized. Most feelings about this "need" are based on uncontrolled, highly confounded memories of experience, and no one can reproduce it in a controlled manner.
Let's do a thought experiment, since most of us have no way of shooting two cameras that expose at exactly the same time with exactly the same camera motion and subject motion. Find an image with high pixel density and some mild motion blur that is visible at 100%. Make two copies of that image and set up a triptych of three 100%-crop windows showing the same crop area. Pixelate the second copy 2x2 and the third 3x3. Every single time, the original will be the most detailed. What many people actually do, and remember as the sum of their experience, is like viewing the subject at 100%, 50%, and 33.333% magnification: through the habit of inspecting at 100%, it is basically as if, instead of pixelating, they had downsampled the copies to 50% and 33% and viewed all three windows at 100%.
Keep in mind that when we create virtual larger pixels with pixelation, we are actually biasing things in favor of those larger pixels: they have weaker virtual AA filters relative to pixel size than real large pixels, and they are more Foveon-like in having full color recorded at each virtual pixel. Even so, the original, with its stronger virtual AA filter and surviving CFA mosaic, looks more detailed.
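For anyone who wants to try the pixelation comparison numerically instead of by eye, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function names (pixelate, downsample, detail), the synthetic striped test image standing in for a real crop, and the use of summed neighbour differences as a crude proxy for visible detail. It only shows the difference between pixelating at constant display size and downsampling to a smaller image.

```python
def pixelate(img, n):
    """Replace each n x n block with its mean; the image keeps its size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, n):
        for bx in range(0, w, n):
            ys = range(by, min(by + n, h))
            xs = range(bx, min(bx + n, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def downsample(img, n):
    """Average each n x n block into ONE pixel; the image shrinks by n.
    Viewing this result at 100% is the habit described above."""
    h = len(img) // n * n
    w = len(img[0]) // n * n
    return [
        [
            sum(img[y + dy][x + dx] for dy in range(n) for dx in range(n)) / (n * n)
            for x in range(0, w, n)
        ]
        for y in range(0, h, n)
    ]


def detail(img):
    """Crude detail proxy (an assumption): summed horizontal neighbour differences."""
    return sum(abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1))


# Hypothetical 12 x 12 test image: vertical stripes two pixels wide.
img = [[1.0 if ((x + 1) // 2) % 2 == 0 else 0.0 for x in range(12)] for _ in range(12)]

print("original      :", detail(img))
print("pixelated 2x2 :", detail(pixelate(img, 2)))
print("pixelated 3x3 :", detail(pixelate(img, 3)))
print("downsampled 2x size:", len(downsample(img, 2)), "x", len(downsample(img, 2)[0]))
```

The pixelated copies stay the same size as the original, so all three are compared at the same subject magnification. The downsampled copy is smaller, and inspecting it at 100% shows the subject at half the magnification, which is the confound in the usual comparison.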
So, if you target pixel-level stability proportional to pixel spacing, you will get more noise, even in a subject-normalized display. But you could have used that higher shutter speed with the lower pixel density, too; you may simply have decided not to because it doesn't have as much potential benefit as there would have been with the higher pixel density. It is easy to get lost in the weeds and lose track of the difference between absolute qualities and qualities relative to high-expectation potential.
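The arithmetic behind that trade-off can be sketched roughly as follows. This is only an illustration under stated assumptions: the same sensor size for both cameras, a 3:2 aspect ratio, blur length proportional to exposure time (constant-velocity motion), and a made-up blur of 0.1% of the frame width. The helper names width_px and blur_px are hypothetical.

```python
import math


def width_px(megapixels, aspect=3 / 2):
    """Pixel width of a sensor with the given pixel count (3:2 aspect assumed)."""
    return math.sqrt(megapixels * 1e6 * aspect)


def blur_px(blur_fraction, megapixels):
    """Blur length in pixels for a blur spanning a fraction of the frame width."""
    return blur_fraction * width_px(megapixels)


blur_fraction = 0.001  # hypothetical: blur covers 0.1% of the frame width

low = blur_px(blur_fraction, 8)    # 8MP sensor
high = blur_px(blur_fraction, 32)  # 32MP sensor, same sensor size, same shutter speed

# Same shutter speed: identical blur as a fraction of the frame,
# but it spans more (smaller) pixels on the denser sensor.
ratio = high / low
print("blur in pixels, 8MP :", round(low, 2))
print("blur in pixels, 32MP:", round(high, 2))
print("linear density ratio:", round(ratio, 3))

# Holding blur constant in PIXELS means shortening the exposure by that
# ratio (assuming blur scales linearly with exposure time).
print("stops of light given up:", round(math.log2(ratio), 3))
```

At the same shutter speed the blur occupies the same fraction of the frame on both sensors, so the normalized result is no worse. It is only when the shutter speed is shortened to keep the blur constant in pixels that light, and therefore noise performance, is given up.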
I have always viewed higher (and higher) resolution sensors as permitting you to do less (and less) capture sharpening to compensate for the AA filter.
Of course, a lot of the time with wildlife, particularly birds, you are cropping hard even with 600mm glass and APS-C sensors; this is the use case I think many have in mind, where they are using the sensor as a variable-sized format to get closer in. This, I think, explains the pixel-peeping fixation.
I generally agree with all your points on comparing sensors at an equivalent physical image size rather than at the pixel level; it is just that people are probably cropping a 32 MP image to 8 MP routinely.










