It is often stated that an image sensor with small pixels will create digital images that have worse performance characteristics (more noise, less dynamic range, lower sensitivity, worse color depth, more lens aberrations, worse diffraction, and more motion blur) than those created by a sensor with large pixels. I disagree.
Of all the reasons why the misconception is so widely believed, these three remain most prominent: flawed image analysis, flawed reasoning, and flawed authorities. But the greatest of these is image analysis.
Flawed Image Analysis.
The five most common types of image analysis mistakes are:
* Unequal spatial frequencies.
* Unequal sensor sizes.
* Unequal processing.
* Unequal standards.
* Unequal technology.
When image analysis is performed correctly, it becomes very clear that small pixel sensors can have worse performance per pixel, yet the same performance when the images are actually displayed or used for the same purpose as those from a large pixel sensor. Furthermore, this condition is common in real-world image sensors. These facts may be surprising, or at least counter-intuitive, to many people who work with digital images until they see it with their own eyes, so examples are included below.
The second most common cause of small-pixel myths is flawed reasoning, such as logical fallacies, poor models, and bad analogies of image sensors. For example, one line of reasoning is that a single pixel, in isolation, when reduced in size, has lower performance; therefore, a sensor full of small pixels creates images that have worse performance than the same sensor full of large pixels. But that is missing the forest for the trees: it fails to account for the effect of a far greater number of pixels. What matters are the results, whether crops, prints, or something else, and it is in the real-world results where the small-pixel myths fall flat.
Thirdly, there is a tremendous number of sources for information about digital imaging, including respected photographers, camera review web sites, forums, magazines, manufacturers, and more. Many of these resources which are regarded as authorities have made the same image analysis and reasoning mistakes discussed here. There are, of course, some who haven't. For example, a paper by G. Agranov at the 2007 International Image Sensor Workshop indicates similar performance between 1.7 micron and 5.6 micron pixels:
The first category of image analysis flaws concerns spatial frequency, which is the most important and fundamental element of image analysis as it pertains to pixel size. This aspect of an image indicates the level(s) of detail under analysis: fine details (high spatial frequencies), coarser information (low spatial frequencies), or an entire spectrum of frequencies. This vital factor is often ignored completely, and at other times poorly understood, but it always has a tremendous impact on the result of any comparison or performance analysis.
The great majority of image analysis is fundamentally based on the performance of a single pixel, so having worse performance per pixel and the same performance in the real world results, where it matters, would seem a contradiction. It isn't.
Performance scales with spatial frequency. In other words, the many important performance characteristics of a digital image are all a function of spatial frequency, including noise, dynamic range, color depth, diffraction, aberrations, and motion blur. Therefore, for any given sensor, analysis of higher spatial frequencies will never show better performance than analysis of lower spatial frequencies. Here are some very good references to explain how performance scales with spatial frequency:
Every image sensor has a sampling rate, and a corresponding Nyquist frequency. This is the highest spatial frequency at which the image sensor samples information. But every resulting digital image also contains information at all lower spatial frequencies. For example, Sensor A may have a native sampling rate of 30 lp/mm, but the resulting digital image also contains information corresponding to 20 lp/mm and 10 lp/mm, which are larger, coarser details. Sensor B may have much smaller pixels and natively sample at 60 lp/mm, but the resulting image still contains all the information of Sensor A; it simply has additional information.
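This containment is easy to demonstrate with a tiny synthetic sketch (1-D, noise-free, with made-up sample counts): an image sampled at the higher rate can always be binned down to reproduce the coarser sampling exactly, so it carries all of the coarser image's information plus more.

```python
import numpy as np

# A 1-D "scene": irradiance sampled very finely (a stand-in for continuous light).
rng = np.random.default_rng(0)
scene = rng.random(2400)  # arbitrary fine-grained detail

# Sensor A: large pixels, each integrating 8 scene samples -> lower Nyquist.
pixels_a = scene.reshape(-1, 8).sum(axis=1)   # 300 samples

# Sensor B: half the pixel pitch, each integrating 4 scene samples -> higher Nyquist.
pixels_b = scene.reshape(-1, 4).sum(axis=1)   # 600 samples

# Binning Sensor B's pixels in pairs reproduces Sensor A:
binned_b = pixels_b.reshape(-1, 2).sum(axis=1)
print(np.allclose(binned_b, pixels_a))  # True
```

The fine sampling never costs low-frequency information; it only adds high-frequency information on top.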
A 100% crop is the most common way to compare image sensors, but it is very misleading when the sensors have different pixel sizes. The reason is that 100% means the maximum spatial frequency. But different pixel sizes sample different spatial frequencies. So 100% crop means higher spatial frequencies for small pixel sensors than it does for big pixel sensors. This results in comparisons of completely different portions of the image. A 100% crop of a small pixel image would show a single leaf, whereas a 100% crop of a large pixel image would show the entire shrub. It's a nonsensical comparison. Failing to account for that important and fundamental difference is one of the most common flaws in such comparisons.
This type of flaw is much more rare in optics analysis. There it is widely understood that standard optical measurements, such as MTF, are naturally and fundamentally a function of spatial frequency. If the MTF of lens A is 30% at 10 lp/mm, and the MTF of lens B is 20% at 100 lp/mm, that does not mean lens A is superior; in fact, the opposite is more likely true. It's necessary to measure the lenses at the *same* frequency, either 30 lp/mm or 100 lp/mm, before drawing conclusions. It's very likely that lens B has a much higher MTF at 10 lp/mm. Of course, comparing MTF without regard for spatial frequency is so obviously wrong that very few people ever make that mistake. However, those same people do not realize they are making the exact same error when they compare image sensors with 100% crops. They are comparing at their respective Nyquist frequencies, but they have different Nyquists, so they are not the same spatial frequency.
Take a 100% crop comparison of a high resolution image (e.g. 15 MP) with a low resolution image (e.g. 6 MP) for example. The high resolution image contains details at a very high spatial frequency (fine details), whereas the low-res image is at a lower spatial frequency (larger details). Higher spatial frequencies have higher noise power than low spatial frequencies. But at the *same* spatial frequency, noise too is the same.
Sometimes it is stated that 100% crop or 1:1 viewing is a sort of absolute reference, or an equal playing field for image comparison. But the reality is that it is completely arbitrary: the level of detail is ignored and the scale is left up to whatever the resolution of the image sensor happens to be. Analogies (especially cars) can go awry quickly, but imagine if the same 100% methodology were used in evaluating the noise of vehicles.
The 100% speed on a sports car might be 220 MPH, while a commuter sedan may top out at 90 MPH. The 100% methodology indicates that the sports car is worse than the sedan. But it would be important to consider what would happen if both cars were driven at the same speed, instead of different speeds (since they have different 100% speeds). It might be that the sports car actually has less noise at 90 MPH than the sedan.
Although it's not necessary, it is always possible to resample any two images (e.g. large pixel and small pixel images) to the same resolution for comparison. This makes it possible to compare 100% crops and draw conclusions about the spatial frequencies under analysis. Since uncorrelated noise adds in quadrature, downsampling to a lower Nyquist frequency decreases the noise power at Nyquist. That is, higher resolutions carry higher noise power, so when the highest frequencies are removed by downsampling, their noise is removed as well.
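As a sanity check on the quadrature claim, here is a minimal simulation with synthetic, uncorrelated noise and arbitrary numbers: 2x2 block-averaging (a crude downsample) averages four independent samples per output pixel, which halves the noise standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Uniform gray frame with uncorrelated per-pixel noise (sigma = 4, arbitrary units).
noisy = 100.0 + rng.normal(0.0, 4.0, size=(2000, 2000))

# 2x2 block-average downsample: halves the linear resolution (and Nyquist).
small = noisy.reshape(1000, 2, 1000, 2).mean(axis=(1, 3))

print(round(noisy.std(), 2))  # ~4.0 per pixel at the native Nyquist
print(round(small.std(), 2))  # ~2.0: averaging 4 independent samples halves sigma
```

Real resampling filters are gentler than block averaging, but the direction of the effect is the same: lower Nyquist, lower noise power.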
However, this fundamental behavior of downsampling is sometimes called into question, as in a blog post by Phil Askey at DPReview.com:
However, it was thoroughly debunked:
There is ample proof that resampling works in practice as well as in theory. Given that fact, as long as small pixel sensors have proportionately higher noise and higher spatial frequencies, it will always be possible to resample the image and get lower noise power at lower spatial frequencies, so that the image is the same as that created by large pixel sensors. For example:
Again, in cases where it's not necessary to resample the image, it is best not to, since the resolution often has a highly beneficial impact on the image, and the noise will be the same either way. But for the purpose of understanding pixel size comparison, it generally helps to perform the analysis with downsampling *and* upsampling, to show the effect for a variety of situations.
Another way to think about it is performance per detail. Say one small but important detail in an image is an eye. A large pixel sensor has a certain performance "per eye", so that over the area of the eye there is a certain noise power, dynamic range, etc. A small pixel sensor, too, has a certain performance per eye, again over the same area, only there are many more pixels. The noise power per pixel is higher, but since each pixel contributes a smaller portion of the eye, the noise power per eye is the same.
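The per-eye argument can be sketched with photon shot noise alone (a simplification; the photon counts and pixel counts below are made up). Whether the light falling on the "eye" is collected by one large pixel or split across sixteen small ones, the summed signal over the region has the same noise.

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 200_000
photons_per_eye = 10_000   # light falling on the "eye" region each exposure

# One large pixel covering the eye: a single Poisson draw per exposure.
large = rng.poisson(photons_per_eye, size=trials)

# Sixteen small pixels covering the same eye: each collects 1/16 of the light,
# and the per-eye signal is their sum.
small = rng.poisson(photons_per_eye / 16, size=(trials, 16)).sum(axis=1)

print(round(large.std()))  # ~100, i.e. sqrt(10,000)
print(round(small.std()))  # ~100: same noise power per eye
```

Each small pixel is individually noisier (625 e- gives an SNR of 25 versus 100), but the aggregate over the eye is identical.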
Here are some images by John Sheehy comparing small pixels and large pixels at low analog gain and high analog gain:
The spatial frequency mistakes may have roots in the fact that the standard engineering measurements for sensor characteristics such as noise are necessarily at the level of the pixel. Sensitivity is measured in photoelectrons per lux-second per pixel. Read noise is measured in RMS e- or ADU per pixel. Dynamic range is measured in stops or dB per pixel. There is nothing wrong with per-pixel measurements per se, but it should be understood how they relate to the image as a whole. The scale, or level of detail, must be accounted for correctly to understand the performance of the image sensor, not just a single pixel of that image sensor.
Image sensor performance, like MTF, cannot be quantified without understanding the effect of spatial frequency.
Unequal sensor sizes.
Sensor size is separate from pixel size. Some assume that the two are always correlated, so that larger sensors have larger pixels, but that is not the case. Sensor size is generally the single most important factor in image sensor performance; therefore, it's always necessary to consider its impact on a comparison of pixel size. The most common form of this mistake goes like this:
* Compacts have smaller pixels than DSLR cameras.
* Compacts have more noise than DSLR cameras.
* Therefore smaller pixels cause more noise.
The logical error is that correlation is not causation. The reality is that it is not the small pixels that cause the noise, but small sensors. A digicam-sized sensor (5.6x4.15mm) with super-large pixels (0.048 MP) will not have superior performance to a 56x41.5mm sensor with super-tiny pixels (48 MP). Even the size of the lens points to this fact: the large sensor will require a lens that is many times larger and heavier for the same f-number and angle of view, and that lens will focus a far greater quantity of light than the very tiny lens on a digicam. When they are both displayed or used in the same way, the large sensor will have far less noise, despite the smaller pixels.
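The arithmetic behind that comparison is simple area scaling. A rough sketch, using the hypothetical sensor dimensions above: at the same f-number, angle of view, and exposure, total light gathered scales with sensor area, and photon shot SNR scales with its square root.

```python
# Hypothetical sensor dimensions from the example above (mm).
small_sensor = (5.6, 4.15)    # digicam-class
large_sensor = (56.0, 41.5)   # 10x linear, so 100x the area

area_ratio = (large_sensor[0] * large_sensor[1]) / (small_sensor[0] * small_sensor[1])
print(round(area_ratio))           # 100 -> ~100x the photons collected
print(round(area_ratio ** 0.5))    # 10  -> ~10x the photon shot SNR
```

Pixel count never enters this calculation; the sensor (and lens) area does.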
There is a strong correlation between all the performance metrics and sensor size; such that larger sensors (with a proportionately larger lens and thinner DOF) have much improved image quality in low light.
Unequal processing.
Unequal processing can include anything that happens after the light hits the pixel, including the ADC, raw preconditioning, raw converter, post processing, etc. An excellent method to draw conclusions about pixel size is to analyze just the raw data itself, before a raw converter can introduce inequalities, bias, and increased experimental error; however, it is possible to get useful information after a raw conversion in certain conditions. The types of errors in this category include the following.
Processed formats that come directly out of the camera, such as JPEG (and including video such as HDV, etc.), are good for drawing conclusions about the utility of that processed format for whatever purpose is needed, but they probably do not accurately reflect the sensor itself. Furthermore, any conclusions, which are necessarily subjective, cannot be generalized to pixel sizes in all cameras. Too much processing has already been applied to the raw data, including noise reduction, saturation, black point, tone curve, and much more, all of which have an effect on apparent noise, sensitivity, color, and dynamic range.
Unequal raw preconditioning.
Most cameras apply a certain amount of processing before the raw file is written. Typically it includes at least hot/dead pixel remapping, but sometimes also other things. Many apply a black clip at or near the mean read noise level. Some remove the masked pixels. Nikon performs a slight white balance. Canon compensates for the sensor's poor angle of response by increasing brightness for f-numbers wider than f/2.8. Pentax applies slight median-filter based noise reduction on raw files when set to ISO 1600. Etc. Much of this pre-processing of the raw file can be factored into the comparison; the chief thing is to be aware of its occurrence.
Unequal raw converters.
One raw converter may use totally different processing than another, even if the settings look the same. For example, setting noise reduction to "off" in most converters does not actually turn off all noise reduction; it just reduces noise reduction to the lowest level the creator is willing to allow.
Another common mistake is to think that a given raw converter will process two different cameras equally. Most popular converters do not, though some do (e.g. dcraw, IRIS, Rawnalyze). One of the most popular converters, ACR, for example, varies greatly in its treatment of cameras, with different styles for different manufacturers, for different models from the same manufacturer, and even from one minor version of ACR to the next.
Furthermore, even if a raw converter is used that can be proven to be totally equal (e.g. dcraw), the method it uses might be better suited to one type of sensor (e.g. strong OLPF, fewer aliases) than another (e.g. weak OLPF, more aliases). Even this can lead to incorrect conclusions, because some cameras combined with certain scenes may benefit more from certain styles of conversion. For example, one demosaic algorithm may give the best results for a sensor with slightly higher noise at Nyquist. One way to work around this type of inequality is to examine and measure the raw data itself before conversion, such as with IRIS, Rawnalyze, dcraw, etc. The important thing is to be aware of the possibility for inequalities to arise from processing.
Unequal standards.
This is the type of inequality that stems from having a different standard, expectation, or purpose for different pixel sizes. It unlevels the playing field. For example, there are some who claim that in order for small pixels to be equal to large pixels, they must have the same performance (e.g. noise power at Nyquist) when displayed at much larger sizes (finer scales). But in reality, to be equal, they only need to have the same noise power when displayed at the same size (same Nyquist).
This can also be manifested in the idea that one should be able to crop a smaller portion of the image (center 10%) from a small pixel sensor and get the same performance as a large crop (center 50%) from a large pixel sensor. The correct method is to crop the same portion out of both sensors for comparison. If one is cropping the center 50% from the large pixel sensor (e.g. 1000x1000), then one should crop the same 50% from the small pixel sensor (2000x2000).
If one only expects the images to be displayed at the same size, or to show the same portion of the scene, and to have the same performance (noise) for the same light, then the expectations are equal.
Unequal technology.
This type of mistake is almost never made in support of large pixels, since large pixels are almost invariably the older technology. However, it is worth pointing out anyway. In one sense, it will never be possible to compare any two cameras with completely equal technology, because even unit-to-unit manufacturing tolerances of the same model will cause there to be inequalities.
Having now discussed the many factors that cause flaws in image analysis as it pertains to pixel size, other related topics will be addressed.
One topic that often comes up in a discussion of pixel size is fill factor, which is the relative area of the photoreceptive portion of the pixel, e.g. photodiode. It is commonly asserted that fill factor gets worse as pixel size shrinks ("smaller buckets means more space between the buckets"). In fact, this has not occurred. "Fill factor pretty much has scaled with technology, ..." (CMOS sensor architect)
Comparing the Sony ICX495AQN to the ICX624, for example, pixel pitch shrank from 4.84µm to 4.12µm, a decrease of 15%. But instead of losing 15% of the photoreceptive area, it actually increased by 7% (22% total).
Another assertion is that full well capacity decreases with pixel size. In fact, sensor designers say the reverse is true: "smaller pixels have greater depth (per unit area) and saturate 'later in time'".
Read noise is important for dynamic range and performance in low light. Sensors of all types vary greatly in read noise; much more so than they vary in quantum efficiency (sensitivity) or full well capacity.
There is a certain sensor design that has low read noise and currently only occurs in some sensors with big pixels (4.7+ microns) using high analog gain, and is accompanied by high read noise at low gain (compared to small pixels). Not all big pixels have this design, and not all big pixels with high analog gain have this characteristic either.
Yet with each decrease in pixel size so far, this design has continued to apply. Since pixel size in large sensors is limited by many factors, including in-camera processing power, it's unknown how small the pixels can shrink before this design is no longer beneficial. Therefore it can't yet be accurately correlated with pixel size.
Another consideration is angle of response. Pixels of all sizes tend to have lower response from oblique angles, such as an ultra wide angle f/1.4 lens, but smaller pixels have even more difficulty because it's hard to scale the depth (z dimension) in proportion with the area (x and y, width and height). But the most typical lenses, such as an f/2.8 "normal" or "wide" (e.g. 50mm or 28mm equivalent) are not a problem for most sensor sizes.
Some assert that smaller pixels put out lower voltages, and the increased amplification necessary results in worse noise. However, Eric Fossum has stated that no sensor designer would allow this to be the case. The simplified description of how it works is given here by Bob Newman:
The accumulated charge of photoelectrons collects on the gate of the source follower transistor which has a gain of something less than one. The output of this amplifier is the voltage output of the cell (actually transferred via the column amplifier). This voltage is determined by the voltage on the source follower gate, which in turn is given by the value of the accumulated charge divided by the capacitance of the gate (by the well known expression V = Q/C).
If you take a cell and scale it uniformly by a factor s, the cell is s^2 times smaller in terms of area, so the accumulated charge is s^2 times smaller. However, the gate capacitance is also s^2 times smaller. So the output voltage is now (Q/s^2)/(C/s^2) = Q/C = V. Scaling has not changed the output voltage of the cell, so no extra amplification is needed.
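The scaling argument can be put in numbers. A minimal sketch, with illustrative charge and capacitance values (not measurements of any real sensor): shrinking both Q and C by the same factor leaves V = Q/C unchanged.

```python
# V = Q/C invariance under uniform scaling (illustrative values only).
def cell_output_voltage(charge_e, capacitance_f):
    q = charge_e * 1.602e-19   # electrons -> coulombs
    return q / capacitance_f   # V = Q / C

s = 2.0            # uniform linear scale factor
charge = 40_000    # full-well electrons (made-up number)
cap = 25e-15       # source-follower gate capacitance in farads (made-up number)

v_big = cell_output_voltage(charge, cap)
v_small = cell_output_voltage(charge / s**2, cap / s**2)
print(abs(v_big - v_small) < 1e-12)  # True: same voltage, no extra amplification
```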
Optical and mechanical issues.
There are many things that can affect the resolution of an image, including diffraction, aberrations, motion blur (from camera shake or subject movement), and mechanical issues such as collimation, back focus, tilt, and manufacturing tolerances.
In the face of these issues, some will claim that small pixels are actually worse than large pixels. This is easily proven false. The reality is that all of these factors may cause diminishing returns, but returns never diminish below 0%.
The most frequently misunderstood factor in diminishing returns is diffraction. As pixel size decreases there are two points: one at which diffraction is just barely beginning to noticeably diminish returns (from 100% of the expected improvement, to, say, 90%); and another where the resolution improvement is so small that it's immeasurable (0-1%). One common mistake is to think both are the same point, but in reality they are often very far apart.
Another diffraction-related mistake is to think that diffraction will ever cause a small pixel sensor to have lower performance. In fact, the worst that can ever happen is for smaller pixels to have a 0% improvement. That is, for performance to be the same.
For example, anyone shooting 5 micron pixels at f/32 because they really need DOF (e.g. macro) is not going to get any benefit from smaller pixels: the returns will be close to 0%. At f/11, the returns will be diminished slightly, but an improvement can still be had from smaller pixels.
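A rough model makes the point. Combining the Airy disk and pixel pitch in quadrature is a common rule of thumb, not a full MTF treatment, but it shows returns shrinking toward zero without ever going negative (the f-numbers and pixel sizes below match the example above).

```python
import math

def blur_um(pixel_um, f_number, wavelength_um=0.55):
    """Rule-of-thumb system blur: Airy disk diameter and pixel pitch
    combined in quadrature. A crude model, not a full MTF analysis."""
    airy = 2.44 * wavelength_um * f_number
    return math.hypot(airy, pixel_um)

for f in (11, 32):
    big, small = blur_um(5.0, f), blur_um(2.5, f)
    gain = (big - small) / big * 100
    print(f"f/{f}: {gain:.1f}% finer system blur from halving the pixel")
```

Halving the pixel yields a few percent at f/11 but under 1% at f/32; in no case does the smaller pixel make the system blur worse.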
Lens aberrations can be an issue too. Usually even the cheapest lenses will have pretty good performance in the center, stopped down. But their corners wide open will sometimes not benefit very much from smaller pixels, so the returns in those mushy corners may be 0-5% due to aberrations.
And there's the mechanical issues. If the collimation is not perfect, but it's good enough for large pixels, then it will have to be better to get the full return of even smaller pixels. This relates to manufacturing tolerances of everything in the image chain: the higher the resolution, the more difficult it is to get full return from that additional resolution. Even things like tripods have to be more steady to prevent diminishing returns.
So essentially the diminishing returns depend on the circumstances, but the higher the resolution, the more often the returns will be diminished.
There are several things to consider with regard to pixel size:
* FPS, in-camera electronics, thermals
* In-camera processing / JPEG
* File size
* Workflow / post processing
* Magnification value
File size is an obvious one. Magnification is why telephoto (wildlife, sports, etc.) and macro shooters often prefer high-pixel-density bodies.
In-camera processing such as JPEGs are affected by pixel density because manufacturers may add stronger median-filter-based noise reduction, which may not be desired and may be difficult to tune with in-camera settings.
Higher pixel densities may require bigger files, slower workflow, and longer processing times. Lower pixel densities may result in smaller files, faster workflow, and shorter processing times. This is an area where there are many possible software solutions for having most of the benefit of smaller pixels without the size/speed downsides. REDCODE is a good example.
Here is the math on a comparison of dynamic range between the LX3 and 5D2. Compare the 2-micron pixels of the LX3 (10.7 stops DR) with the immensely larger 6.4 micron pixels of the 5D2 (11.1 stops DR). Going by the per-pixel numbers, it seems that the smaller LX3 pixels have less dynamic range. But remember that the LX3-sized pixel samples a much, much higher spatial frequency.
At the same spatial frequency, the scaled LX3 pixels have 12.3 stops of dynamic range, 1.2 stops greater.
5D2 maximum signal: 52,300 e-
LX3 maximum signal: 9,000 e-
5D2 read noise at base ISO: 23.5 e-
LX3 read noise at base ISO: 5.6 e-
5D2 per-pixel DR at base ISO: 11.1 stops (log_2(52300/23.5))
LX3 per-pixel DR at base ISO: 10.7 stops (log_2(9000/5.6))
LX3 scaled maximum signal: 92200 (9000 e- * (6.4µm/2.0µm)^2)
LX3 scaled read noise at base ISO: 17.92 (sqrt((5.6 e-)^2 * (6.4µm/2.0µm)^2))
LX3 scaled DR at base ISO: 12.3 stops (log_2(92200/17.92))
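The arithmetic above is easy to reproduce: signal scales linearly with pixel area, read noise scales in quadrature (with the square root of the area ratio), and dynamic range is the log2 of their ratio.

```python
import math

# Per-pixel dynamic range from the published signal/read-noise figures.
dr_5d2 = math.log2(52_300 / 23.5)   # 5D2, 6.4 micron pixels
dr_lx3 = math.log2(9_000 / 5.6)     # LX3, 2.0 micron pixels

# Scale the LX3 pixel to the 5D2's spatial frequency (area ratio ~10.24x).
scale = (6.4 / 2.0) ** 2
lx3_signal = 9_000 * scale            # signal adds linearly
lx3_read = 5.6 * math.sqrt(scale)     # uncorrelated noise adds in quadrature
dr_lx3_scaled = math.log2(lx3_signal / lx3_read)

print(f"{dr_5d2:.1f}, {dr_lx3:.1f}, {dr_lx3_scaled:.1f}")  # 11.1, 10.7, 12.3
```

At the same spatial frequency, the small LX3 pixels come out 1.2 stops ahead.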
With correct image analysis, it is clear that many statements about small pixels are only myth, thanks to the ingenuity of sensor designers.