Why Downsampling an Image Reduces Noise

One of our readers, Mike Baker, sent me the email below today. I thought it was a great and interesting analysis of why downsampling an image reduces noise, so I decided to share it with you (with his permission, of course). Trying to digest this stuff makes my head spin, but it is a great read. You might need to read it several times to understand what he means, especially with all the mathematical formulas (I had to):

You recently commented about downsizing a high-resolution image to a lower resolution in order to reduce the apparent noise. While I knew that this is an effective way to reduce visible noise in images, I had not thought in much detail about the technical reasons why it works.

After a long evening’s thought on the subject, and running a few questions past my friend and fellow engineer, I believe I have a (reasonable, though perhaps not perfect!) handle on the subject…

If the image signal and the image noise had similar properties, averaging neighboring pixels in order to reduce the resolution would not improve the signal-to-noise ratio (SNR). However, signal and noise have different properties.

There is (in general) no relationship between the noise in neighboring pixels. Technical junkies call this “no correlation”.

Correlation, in this context, is the long-term average of the product of two signals, N1 x N2. If two signals are uncorrelated, the mean of their product is zero.
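A quick numerical sanity check makes this concrete. The sketch below (synthetic Gaussian noise in NumPy, not data from an actual sensor) shows that the mean product of two independent noise signals is close to zero, while a signal multiplied by itself averages to its power:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent (uncorrelated) noise signals with RMS = 1
n1 = rng.normal(0.0, 1.0, 1_000_000)
n2 = rng.normal(0.0, 1.0, 1_000_000)

print(np.mean(n1 * n2))  # ~0.0: uncorrelated, so the mean product vanishes
print(np.mean(n1 * n1))  # ~1.0: a signal is perfectly correlated with itself
```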

The signal in neighboring pixels, by contrast, has a high degree of correlation. If you add uncorrelated signals, their “powers” (mean squares) add, meaning the combined signal is the square root of the combined power:

N_comb = sqrt(N1^2 + N2^2), where N1 and N2 are the root-mean-square (RMS) values of the noise. For N1 = N2 = N we get N_comb = sqrt(2)*N.
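The same kind of sketch (same synthetic-noise assumptions as above) verifies the square-root rule: the RMS of the sum of two independent, equal-strength noise signals lands near sqrt(2) times the RMS of either one, not twice it:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1.0  # RMS of each individual noise signal

n1 = rng.normal(0.0, N, 1_000_000)
n2 = rng.normal(0.0, N, 1_000_000)

def rms(x):
    return np.sqrt(np.mean(x ** 2))

print(rms(n1 + n2))    # ~1.414: powers add, so RMS grows by sqrt(2)
print(np.sqrt(2) * N)  # 1.414..., the predicted combined RMS
```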

However, if signals are highly correlated, then their sum is effectively the sum of their magnitudes:

S_comb = S1+S2 and for S1=S2=S we get S_comb = 2*S

So, if we add the content of two neighboring pixels, we get:

SNR_comb = S_comb/N_comb = (2*S)/(sqrt(2)*N) = sqrt(2)*(S/N)

So, the signal-to-noise ratio increases by a factor of the square root of two, an improvement of about 40%.
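To see the whole effect end to end, here is a minimal downsampling simulation (my own construction for illustration; the signal level of 100 and noise RMS of 10 are arbitrary): a perfectly flat “image” plus independent per-pixel noise, downsampled 2:1 by averaging neighboring pixel pairs. Averaging instead of summing divides both signal and noise by two, so the SNR gain is the same sqrt(2):

```python
import numpy as np

rng = np.random.default_rng(2)

S = 100.0  # constant signal level (fully correlated between neighbors)
N = 10.0   # per-pixel noise RMS (uncorrelated between neighbors)

pixels = S + rng.normal(0.0, N, 1_000_000)

# Downsample 2:1 by averaging each pair of neighboring pixels
downsampled = pixels.reshape(-1, 2).mean(axis=1)

snr_before = S / pixels.std()
snr_after = S / downsampled.std()
print(snr_after / snr_before)  # ~1.414, i.e. sqrt(2)
```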

Now, you may say that the signal in neighboring pixels is not always 100% correlated. True: the correlation between the signals depends on the image content. If the image content is smooth, the correlation is high; if the image content varies rapidly, the correlation is low. Of course, noise is most noticeable in smooth areas, so that is exactly where downsampling the image has the strongest effect.

Adaptive noise filters take into account the absolute signal-to-noise and the image content. They reduce the resolution more in areas that are smooth and have poor signal-to-noise and keep the original resolution in areas that have strongly varying image content and high signal-to-noise. You can think of it as a joint optimization of SNR and resolution.
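Here is one rough sketch of that idea (an illustration in the spirit of the classic Lee filter, not any particular camera's or editor's algorithm; the noise variance `noise_var` and window radius `k` are assumed inputs):

```python
import numpy as np

def adaptive_denoise(img, noise_var, k=2):
    """Lee-style adaptive filter: pull each pixel toward its local mean,
    strongly where the local variance looks like pure noise (smooth area)
    and weakly where it is much larger (real image detail)."""
    h, w = img.shape
    out = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            window = img[max(0, y - k):y + k + 1, max(0, x - k):x + k + 1]
            local_mean = window.mean()
            local_var = window.var()
            # Fraction of the local variance attributable to real signal
            gain = max(local_var - noise_var, 0.0) / max(local_var, 1e-12)
            out[y, x] = local_mean + gain * (img[y, x] - local_mean)
    return out
```

In smooth regions the local variance is close to the noise variance, the gain goes to zero, and the pixel is replaced by the local average (maximum noise reduction); around edges the local variance is large, the gain approaches one, and the original pixel is kept (full resolution).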

Now, we also need to look into the different sources of noise:

  1. The first source of noise is dark current, which is caused by electrons that accumulate in the individual pixel well even when no photons enter (lens cap on). Dark current becomes dominant for very long exposures; for normal exposures, the errors from these thermally generated electrons are negligible.
  2. The second source of noise is read-out noise. This is essentially generated by two sources: A) noise added by the amplifier and B) noise generated by the analog-to-digital converter. It is a fixed amount of noise that is added to each image during read-out. When you choose the ISO setting on your camera, you essentially set the read-out gain: the higher the ISO, the higher the read-out gain, and the smaller the read-out noise becomes relative to the amplified signal. Of course, if you pick an ISO that is too high, you will get signal saturation. So for low-light situations, always pick an ISO that is no higher than needed to capture the image you want.
  3. The third source of noise is called “quantization noise” and is a bit harder to understand. It has to do with the fact that (in low-light conditions) we don’t sample a smooth, continuous flow of photons, but rather discrete bunches of photons. The problem is that a light source does not produce a stream of photons that are spaced equally in time. So, if you image a dim light source that sends out (on average) 100 photons per second, you may receive 90 photons in the first second, 105 in the next, and so on. These arrivals follow Poisson statistics: the average error is on the order of the square root of the number of photons (or electrons in the pixel sensor well).

    A typical sensor well contains between 20,000 and 60,000 electrons when fully charged; the maximum amount depends on the pixel size. A well holding 20,000 electrons has an error of approximately +/-141 electrons when fully charged, or +/-0.7%. A well holding 60,000 electrons has an error of approximately +/-245 electrons, or +/-0.4%. While we may be able to reduce dark current and read-out noise by cooling the sensor, there is essentially nothing we can do about this kind of noise. And if we keep on shrinking the pixels, we will have smaller and smaller electron wells and fewer and fewer electrons trapped.

    The above errors of 0.7% or 0.4% appear rather small, and we would not be able to notice them. However, in low-light situations, sensor wells are only partially filled. If we only manage to trap 1,000 electrons, the error becomes about 3%. If we only trap 100 electrons, the error becomes 10% (see the short simulation after this list).

    Notice that the term “quantization noise” has nothing to do with the signal quantization performed by the analog-to-digital converter. It has to do with the fact that the signal actually arrives in quanta of energy (photons), which is why this effect is also commonly called “shot noise” or “photon noise”.
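Mike's percentages fall straight out of a Poisson model of photon arrival. Here is a short simulation (my sketch, using his example well capacities plus the two partially filled cases) that reproduces them:

```python
import numpy as np

rng = np.random.default_rng(3)

# Mean electron counts: full wells of 20k and 60k, plus partially filled wells
for mean_electrons in (20_000, 60_000, 1_000, 100):
    counts = rng.poisson(mean_electrons, 100_000)
    error = counts.std()  # ~sqrt(mean): the Poisson standard deviation
    print(f"{mean_electrons:>6} electrons: +/-{error:6.1f} ({error / mean_electrons:.1%})")
```

This prints roughly +/-141 (0.7%) and +/-245 (0.4%) for the full wells, and about 3.2% and 10% for the 1,000- and 100-electron cases, matching the numbers above.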

What do you guys think? Does anyone want to challenge Mike’s analysis? :)
