Only the second half relates to what I was saying, of course; I would never argue that a larger optical aperture (200/2.8 vs. 145/2.8) at the same FOV (made possible by using a larger sensor area) does not capture more relevant total light. I have mentioned a few times in these forums that an open-aperture zoom lens favors the larger sensor when the focal length scales with the sensor diagonal to give the same full-sensor FOV or AOV.
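For the aperture comparison, the arithmetic is simple: the entrance pupil diameter is the focal length divided by the f-number, and light-gathering area scales with the square of that diameter. Here is a minimal sketch, assuming the 200 and 145 figures are focal lengths in millimetres and both lenses are wide open at f/2.8. The function names are just illustrative.

```python
import math


def entrance_pupil_mm(focal_length_mm, f_number):
    """Entrance pupil diameter: D = f / N."""
    return focal_length_mm / f_number


def pupil_area_mm2(focal_length_mm, f_number):
    """Area of a circular entrance pupil, in mm^2."""
    d = entrance_pupil_mm(focal_length_mm, f_number)
    return math.pi * (d / 2) ** 2


# Assumed: 200 mm vs 145 mm focal lengths, both at f/2.8, framing the same FOV
d_large = entrance_pupil_mm(200, 2.8)
d_small = entrance_pupil_mm(145, 2.8)
area_ratio = pupil_area_mm2(200, 2.8) / pupil_area_mm2(145, 2.8)
stops = math.log2(area_ratio)  # light-gathering difference expressed in stops

print(f"pupil diameters: {d_large:.1f} mm vs {d_small:.1f} mm")
print(f"area ratio: {area_ratio:.2f} (about {stops:.2f} stops)")
```

That gives pupils of roughly 71.4 mm and 51.8 mm and an area ratio of about 1.90, a bit under one stop. Because the f-number cancels, the ratio is simply (200/145)^2 whenever both lenses are at the same f-number.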
The second set of images is problematic, IMO, because the two cameras' noise per unit of sensor area is fairly close, and when the levels are that close, differences in conversion style and post-processing artifacts can matter more to apparent visual quality than the actual RAW noise levels do.
If you look at the dark areas in the ISO 6400 comparison at 100% on my 113 PPI monitor, they look less noisy in the 1Dx image. The high-contrast edges look sharper in the 1Dx image, too, even though it was upsampled.
When you zoom in to 400%, though, you can clearly see what is really going on: in any contiguous area of similar color and tone, the 1Dx version is almost completely devoid of detail, as if a steamroller had flattened it into a cartoon, which is the signature of aggressive edge-preserving NR. The higher edge contrast in the 1Dx is really due to heavier sharpening; you can see the sharpening halos, and at 400% it is clear that the 1Dx holds no better information at all.
Also, upsampling is a form of apparent visual (but not informational) noise reduction. Imagine an image of random black and white pixels. They have a lot of contrast and really stand out against each other. Upsample that image just a little, though, with any reasonable-quality resampling method, and there are no longer any 0,0,0 or 255,255,255 pixels; everything falls in a range of something like 80,80,80 to 175,175,175, and it looks less noisy. So when you are inspecting detail and noise closely, upsampling one image and not the other creates a sort of perceptual noise reduction in the upsampled one. Perhaps both should have been upsampled, and both converted with no sharpening or NR in a converter capable of doing so, unless you are only interested in a visceral appreciation of the converters and the implicit and explicit parameters chosen.
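To make the upsampling point concrete, here is a small sketch using only the Python standard library. It builds a hypothetical 100x100 image of random pure-black and pure-white pixels, enlarges it by 25% with plain bilinear interpolation, and compares the pixel-level standard deviation before and after. Bilinear is only a simple stand-in for "reasonable resampling quality"; converters and editors usually offer bicubic or Lanczos. The image size, the 1.25 factor, and the function names are all illustrative assumptions.

```python
import random
import statistics


def bilinear_upsample(img, scale):
    """Resize a 2-D grayscale image (list of rows) by `scale` using bilinear interpolation.

    Each output pixel centre is mapped back to source coordinates (clamped at the
    borders) and takes a weighted average of the four nearest source pixels.
    """
    h, w = len(img), len(img[0])
    new_h, new_w = int(h * scale), int(w * scale)
    out = []
    for y in range(new_h):
        sy = min(max((y + 0.5) / scale - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = min(max((x + 0.5) / scale - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def pixel_std(img):
    """Standard deviation of all pixel values: a crude 'how noisy does it look' number."""
    return statistics.pstdev([v for row in img for v in row])


random.seed(1)
SIZE = 100
# Random black (0) and white (255) pixels: maximum pixel-to-pixel contrast
noise = [[random.choice((0, 255)) for _ in range(SIZE)] for _ in range(SIZE)]
upsampled = bilinear_upsample(noise, 1.25)  # upsample "just a little bit"

before = pixel_std(noise)
after = pixel_std(upsampled)
print(f"pixel std before: {before:.1f}, after: {after:.1f}")
```

The standard deviation drops noticeably even though the upsampled image contains no new detail: each output pixel is just a weighted average of nearby input pixels, so the extremes get pulled toward the middle. Bilinear looks only at the four nearest pixels, so a few output pixels whose neighbours all match stay pure black or white; the overall drop in pixel-to-pixel contrast is what matters here.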