The limit is the wavelength of light. Off the top of my head, I think the wavelength of red light is about 0.7 microns. I'm not sure whether the efficiency of a photosite deteriorates as this limit is approached, but a photosite smaller than the wavelength would seem likely to be less efficient. In fact, if it's possible to make an analogy with radio reception, efficiency might increase dramatically as the photosite size approaches the wavelength of light. Current small P&S cameras have photosites of around 3 microns, so there's a way to go.
The other side of the equation is lens quality. It seems to be a truism that components in a controlled environment (sort of laboratory conditions) develop more rapidly than components that have to interface with the real world. For example, computer chips, camera sensors, hi-fi amplifiers and factory robots, to name a few, develop more rapidly than computer monitors, loudspeakers, camera lenses and automobiles.
All lenses are limited ultimately by diffraction, until someone discovers a way of beating this limitation. But not all lenses are diffraction limited at every f stop. It's very difficult to produce a 35mm lens that's diffraction limited at f8; even more difficult at f5.6 and well nigh impossible at wider apertures, i.e. smaller f-numbers (but perhaps not with an unlimited budget - I stand to be corrected).
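To put rough numbers on this, here is a minimal sketch of the standard Rayleigh spacing for a perfect (diffraction-limited) lens, 1.22 x wavelength x f-number. The 0.55 micron green wavelength and the function name are my own assumptions for illustration, and a real lens at these apertures will usually do worse than this.

```python
def rayleigh_spacing_um(f_number, wavelength_um=0.55):
    """Smallest resolvable spacing (Rayleigh criterion) of an ideal lens, in microns.

    Assumes a perfect circular aperture; wavelength_um=0.55 (green light)
    is an assumed default, not a figure from the post.
    """
    return 1.22 * wavelength_um * f_number


if __name__ == "__main__":
    # Compare a few f stops against a photosite size of a few microns.
    for n in (2.0, 5.6, 8.0):
        print(f"f{n:g}: Rayleigh spacing = {rayleigh_spacing_um(n):.2f} microns")
```

On these assumptions the spacing at f8 comes out a bit over 5 microns and at f2 a bit over 1.3 microns, so very small photosites only pay off if the lens really is diffraction limited at wide apertures.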
With increasing application of nanotechnology and new materials, it might eventually be possible to produce a lens that is truly diffraction limited at f2. Those 1 micron photosites (a guesstimate of the practical limit) might well become a reality.
One should also bear in mind that the Rayleigh limit is at an MTF of around 9% (i.e. the image has lost 91% of its original contrast). Current digital sensors are incapable of picking up such faint signals. I wonder if a photodetector tuned to the frequency of light would be able to.
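As a sanity check on that 9% figure, here is a small sketch using the textbook MTF formula for an ideal lens with a circular aperture, evaluated at the spatial frequency corresponding to the Rayleigh spacing (1/1.22 of the cutoff frequency). The function name is my own, and the formula assumes an aberration-free lens and incoherent light.

```python
import math


def diffraction_mtf(x):
    """MTF of an ideal circular-aperture lens.

    x is spatial frequency divided by the diffraction cutoff frequency,
    where cutoff = 1 / (wavelength * f_number). Assumes no aberrations.
    """
    if x >= 1.0:
        return 0.0  # nothing is transmitted beyond the cutoff
    return (2.0 / math.pi) * (math.acos(x) - x * math.sqrt(1.0 - x * x))


if __name__ == "__main__":
    # The Rayleigh spacing is 1.22 * wavelength * f_number, so its
    # spatial frequency is 1/1.22 of the cutoff.
    rayleigh_x = 1.0 / 1.22
    print(f"MTF at the Rayleigh limit: {diffraction_mtf(rayleigh_x):.1%}")
```

This gives a contrast just under 9% at the Rayleigh limit, which matches the figure above.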
We need some clever physicists to answer such questions.