Simple: an f/2.8 lens properly illuminating an APS-C sized sensor puts the same number of photons on each square millimeter, but about 2.3x less light overall, than an f/2.8 lens that illuminates a FF sensor. Exposure per square millimeter is the same, but the total amount of light collected by the (different sized) sensors is different. That's why the whole "equivalence equation" is more complex than just comparing one piece of the whole system.
If you want to understand how to create the overall "same" picture, you need to account for:
- Focal length equivalence
- Total amount of light and size of the entrance pupil
This means a 33mm f/1.3 on a 1.5x crop APS-C sensor puts the same amount of light on the overall sensor as a 50mm f/2 on FF would. Now, given that the amount of light per square millimeter is different for these lenses at max aperture, we need to adjust the "gain", meaning we select a lower ISO value for the APS-C sensor. If we picked ISO 200 on FF, we'd need to use ISO 89 on APS-C (200 divided by 1.5 squared).
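To make the arithmetic above concrete, here is a minimal Python sketch. The function names `equivalent_settings` and `entrance_pupil_mm` are made up for illustration, and it assumes the simple model used in this post: focal length and f-number scale with the crop factor, ISO scales with the crop factor squared.

```python
def equivalent_settings(focal_mm, f_number, iso, crop_factor):
    """Translate full-frame settings to a smaller sensor.

    Assumption: focal length and f-number divide by the crop factor,
    ISO divides by the crop factor squared (the sensor area ratio).
    """
    return {
        "focal_mm": focal_mm / crop_factor,
        "f_number": f_number / crop_factor,
        "iso": iso / crop_factor ** 2,
    }


def entrance_pupil_mm(focal_mm, f_number):
    """Entrance pupil diameter is focal length divided by f-number."""
    return focal_mm / f_number


# The example from the post: 50mm f/2 at ISO 200 on FF, 1.5x crop APS-C.
apsc = equivalent_settings(50.0, 2.0, 200.0, crop_factor=1.5)

print(round(apsc["focal_mm"], 1))  # 33.3
print(round(apsc["f_number"], 2))  # 1.33
print(round(apsc["iso"]))          # 89

# Both setups end up with the same entrance pupil diameter.
print(round(entrance_pupil_mm(50.0, 2.0), 1))                            # 25.0
print(round(entrance_pupil_mm(apsc["focal_mm"], apsc["f_number"]), 1))   # 25.0
```

The matching entrance pupil is the point: same angle of view through the same size of opening means the same total light and the same DoF rendering.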
At this point we basically have equivalence, both in photons per sensor cell (if we go with the same pixel count on both) and in DoF rendering. The problem arises when the APS-C (or even smaller) sensor can't go to very low native ISO values, because its photo cells overflow. It's interesting that it's actually harder to achieve equivalence at low ISO values, which also explains why a larger sensor with larger photo cells can achieve higher dynamic range at base ISO. At higher ISO values readout noise becomes a factor, and that's where smaller photo sites actually help, plus the effect that denoising is more efficient if you have more pixels.
Now, what that means overall is that you can typically use lenses about one stop slower on a FF sensor compared to the APS-C sized sensor and achieve the same overall sensor illumination (total amount of photons recorded by the respective sensor). That's why comparing an f/2.8 lens on, for example, a Sony A7 III with an f/2.8 lens on a Fuji X-H1 (I picked the older camera only to account for the same pixel count) is incorrect: you're ignoring that the FF lens lets more than double the amount of photons onto the FF sensor, which you don't need if you want the same overall sensor readout. It's perfectly fine to use an f/4 lens on the A7 III.
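A rough way to check the "about one stop" claim, again as a sketch under assumptions: light per unit area goes with 1/N² for f-number N, sensor area goes with 1/crop², and a 1.5x crop factor is used for APS-C. The helper names `relative_total_light` and `stops_between` are hypothetical.

```python
import math


def relative_total_light(f_number, crop_factor):
    """Total light on the sensor, relative to f/1 on full frame.

    Assumption: per-area light scales with 1/N^2 and sensor area
    scales with 1/crop^2 (same scene, same shutter speed).
    """
    return 1.0 / (f_number * crop_factor) ** 2


def stops_between(a, b):
    """Difference between two light amounts in stops (factors of two)."""
    return math.log2(a / b)


ff_f28 = relative_total_light(2.8, crop_factor=1.0)
apsc_f28 = relative_total_light(2.8, crop_factor=1.5)
ff_f4 = relative_total_light(4.0, crop_factor=1.0)

# Same f-number: FF collects 2.25x the light, a bit over one stop.
print(round(ff_f28 / apsc_f28, 2))               # 2.25
print(round(stops_between(ff_f28, apsc_f28), 2)) # 1.17

# f/4 on FF still collects slightly more than f/2.8 on APS-C.
print(round(ff_f4 / apsc_f28, 2))                # 1.1
```

So under these assumptions an f/4 lens on FF is not at a disadvantage against an f/2.8 lens on APS-C in total light; it is marginally ahead.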
Therefore, the whole argument for APS-C lenses being so much smaller for the same f-stop, while technically correct, is not an argument about the same illumination of the sensor, and therefore not an argument if you want to actually use a camera with that lens. The factors of f-stop, entrance pupil, sensor size and ISO (gain) together define the whole system. Ignoring one of them when we talk about different systems just shows ignorance, nothing else.
Now, whether this matters in the real world is a different question. I have a FF system (EOS R) and an APS-C system (Fuji X) here. Both have their respective places in my kit, but at least I understand the actual differences, where they come from, when I have an advantage using one or the other, and why APS-C lenses at the same aperture can be smaller than the ones on FF. It's plainly because the sensor is smaller and they let in less than half the light.
That's why I said: claiming f/2.8 on APS-C is the same as f/2.8 on FF is like claiming a two seater car is the same as a four seater car, since every car has exactly one person per seat. The issue is that the four seater still transports twice as many people, but this is happily ignored in the photo world. And don't get me wrong: I'm perfectly fine if people say "it's irrelevant for me, I never need more than two people in the car". Nobody can ever say it's not, because that part is personal. What I'm not fine with is people saying "It's irrelevant for me therefore it has to be irrelevant for you" or "I ignore part of the system and just quote a fact out of context and still expect it to be the only defining factor".