Mean subtraction and contrast normalization (MSCN) is one of several image preprocessing methods, alongside Difference of Gaussian (DoG), Laplacian of Gaussian (LoG), and gradient magnitude (GM). What the eye sees and what the brain actually processes are clearly not the same, which is why image processing needs a preprocessing stage like this. Methods such as DoG, LoG, and GM make the edges of an image stand out, while visually less important regions are attenuated. MSCN likewise removes the redundantly repeated parts of an image while preserving the important parts in a nonlinear way.
To understand MSCN more deeply, I read several papers about it. Below I have transcribed the passages I found most important.
1) BRISQUE
Much recent work has focused on modeling the statistics of responses of natural images using multiscale transforms (e.g., Gabor filters, wavelets, etc.). Given that neuronal responses in area V1 of visual cortex perform scale-space-orientation decompositions of visual data, transform domain models seem like natural approaches, particularly in view of the energy compaction (sparsity) and decorrelating properties of these transforms when combined with divisive normalization strategies. However, successful models of spatial luminance statistics have also received attention from vision researchers.
Given an image, first compute locally normalized luminances via local mean subtraction and divisive normalization. Ruderman observed that applying a local non-linear operation to log-contrast luminances to remove local mean displacements from zero log-contrast and to normalize the local variance of the log-contrast has a decorrelating effect.
2) NRSL
The contrast normalization scheme is applied to the image to remove redundancy in the visual input.
Local contrast normalization has been used as a preprocessing stage to emulate the nonlinear masking of visual perception in many image processing applications. Generally, each coefficient is divided by the square root of Gaussian weighted combination of the squared amplitudes of its neighbors.
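The coefficient-wise operation described above can be sketched in a few lines of NumPy/SciPy. This is my own illustrative sketch, not code from the paper; the Gaussian window width `sigma` and the stabilizing constant `eps` are assumed values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def divisive_normalize(coeffs, sigma=1.5, eps=1e-6):
    """Divide each coefficient by the square root of a Gaussian-weighted
    combination of the squared amplitudes of its neighbors (itself included)."""
    local_energy = gaussian_filter(coeffs ** 2, sigma=sigma)
    return coeffs / (np.sqrt(local_energy) + eps)
```

Because the denominator tracks local signal energy, strong coefficients in busy regions are scaled down while weak coefficients in quiet regions are scaled up, which is the masking-like behavior the excerpt describes.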
Process each feature map by performing perceptually significant debiasing and divisive normalization operations on it.
4) IL-NIQE
Ruderman pointed out that the locally normalized luminances of a natural gray-scale photographic image conform to a Gaussian distribution.
5) The statistics of natural images
The early stages of vision, such as those in the retina, are constrained to process images locally; no neuron has access to the entire image. The neurons which convey these signals will have output statistics which are determined by the images. According to various efficiency criteria the responses of these channels should have certain statistical properties.
For instance, channels with signal variance constraints are optimized for information transfer by sending Gaussian signals. Neurons have an analogous constraint in their function since their firing rates can saturate at high levels and cannot go negative. The optimal encoding statistics thus depend on the imposed constraints.
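The variance-constraint claim above rests on a standard information-theoretic fact that the excerpt leaves implicit: among all real-valued signals with a fixed variance $\sigma^2$, the Gaussian maximizes differential entropy,

```latex
h(X) = -\int p(x)\,\log p(x)\,dx \;\le\; \frac{1}{2}\log\!\left(2\pi e \sigma^2\right),
```

with equality exactly when $X \sim \mathcal{N}(\mu, \sigma^2)$. A channel constrained only in signal variance therefore transfers the most information when its output statistics are Gaussian.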
To search for a likely candidate we should think about the possible causes of the excess histogram tails. Consider an analogy with music, which is an ensemble with similar properties to images. The amplitudes of musical sound pressure also have exponential tails. The source of the long tails is the dynamics of the musical score; some sections are loud and some are quiet for an interval of time. If the quiet passages were amplified and the loud ones attenuated then the excesses at the tails and the peak of the distribution would move to more 'typical' values, thus diminishing the peak and tails. This would give the distribution a more 'rounded' or Gaussian character. Maybe a similar dynamic occurs in natural scenes, where locally correlated regions are either flat in texture (i.e. quiet) or very dynamic (loud). This suggests an origin for long exponential tails. The histogram is a superposition of many distributions of different variance.
Dividing a sound waveform by its recent loudness is a local nonlinear operation. We can try an analogous procedure on images by normalizing log-contrast fluctuations relative to their local standard deviation.
This procedure has the effect of removing mean displacements from zero log-contrast and normalizing the local variance of the log-contrast. Patches of small local contrast will be expanded, and high contrast areas will be toned down.
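Ruderman's procedure can be tried numerically. The sketch below is my own illustration (the 7x7 averaging window is an assumed choice): it removes the local mean and divides by the local standard deviation, and the excess-kurtosis helper quantifies how far a histogram is from Gaussian.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def variance_normalize(log_contrast, size=7, eps=1e-6):
    """Subtract the local mean, then divide by the local standard deviation."""
    mu = uniform_filter(log_contrast, size=size)
    centered = log_contrast - mu
    sigma = np.sqrt(uniform_filter(centered ** 2, size=size))
    return centered / (sigma + eps)

def excess_kurtosis(x):
    """0 for a Gaussian; positive for peaky, long-tailed distributions."""
    x = x - x.mean()
    return (x ** 4).mean() / ((x ** 2).mean() ** 2) - 3.0
```

Feeding in an image assembled from "quiet" and "loud" patches (a superposition of distributions with different variances, as in the music analogy) gives a long-tailed histogram whose excess kurtosis drops sharply after normalization.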
The variance normalized image is much more homogeneous than the original.
6) Perceptual Quality Prediction on Authentically Distorted Images Using a Bag of Features Approach
This normalization process reduces spatial dependencies in natural images.
Wainwright et al., building on Ruderman's work, empirically determined that bandpass natural images exhibit striking non-linear statistical dependencies. Applying a nonlinear divisive normalization operation, similar to the non-linear response behavior of certain cortical neurons, wherein the rectified linear neuronal responses are divided by a weighted sum of rectified neighboring responses, greatly reduces such observed statistical dependencies and tends to gaussianize the processed picture data.
Divisive normalization by neighboring coefficient energies in a wavelet or other bandpass transform domain similarly reduces statistical dependencies and gaussianizes the data. Divisive normalization or contrast-gain-control accounts for specific measured nonlinear interactions between neighboring neurons. It models the response of a neuron as governed by the responses of a pool of neurons surrounding it. Further, divisive normalization models account for the contrast masking phenomena, and hence are important ingredients in models of distorted image perception.
7) Nonlinear Image Representation Using Divisive Normalization
In this paper, we describe a nonlinear image representation based on divisive normalization that is designed to match the statistical properties of photographic images, as well as the perceptual sensitivity of biological visual systems.
Most recent efforts in finding image representations focus on linear transforms optimized to minimize statistical dependencies.
Nevertheless, linear transforms do not completely eliminate higher-order statistical dependencies in photographic images. It is thus natural to develop invertible nonlinear transforms that can reduce such higher-order statistical dependencies.
Several recent image representations include spatially varying divisive normalization as a simple nonlinear map, where each component in a cluster of coefficients is divided by the square root of a linear combination of the squared amplitudes of its neighbors. Divisive normalization was originally motivated by observed properties of biological vision, where it was used to explain nonlinearities in the responses of cortical neurons and the nonlinear masking phenomenon in visual perception, and has also been empirically shown to reduce statistical dependencies of the original linear representation.
8) A model of visual contrast gain control and pattern masking
Contrast gain control is a mechanism that serves to keep neural responses within their permissible dynamic range while retaining the information conveyed by the pattern of activity over the neural ensemble.
I wanted to understand why MSCN is applied as a preprocessing step for images, what physiological basis it rests on, and what effect it produces. Having read the papers above, the conclusion I reached is as follows.
In most 2D images, the value of each pixel is strongly correlated with its neighboring pixels; that is, each pixel's value is similar to, or dependent on, the values around it. An image therefore contains a great deal of unnecessarily duplicated information. Our eyes do not send spatially and temporally redundant information to the brain, because if everything were forwarded, the amount the brain had to process would be overwhelming. A step is needed to remove this redundancy, and MSCN is a model of exactly that step. First, at each pixel, subtract the mean of the surrounding pixels (including the pixel itself); then divide by the standard deviation of those surrounding pixels. In this way each pixel value is standardized into a form the brain can handle easily. Interestingly, after this process the image histogram, which previously had no particular shape, takes on a Gaussian form. Because MSCN models the physiological phenomenon well and is computationally simple, it is widely used as a preprocessing step across many areas of image processing.
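The two-step procedure just summarized (subtract the local mean, divide by the local standard deviation) can be written down directly. This is a minimal sketch assuming a Gaussian weighting window; the width `sigma` and the stabilizing constant `C` (which prevents division by zero in flat regions) are illustrative choices, not values prescribed by any one paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7 / 6, C=1.0):
    """Compute MSCN coefficients of a grayscale image."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                  # local weighted mean
    var = gaussian_filter(image ** 2, sigma) - mu ** 2  # local weighted variance
    std = np.sqrt(np.maximum(var, 0.0))                 # local standard deviation
    return (image - mu) / (std + C)                     # standardized coefficients
```

On a natural photograph, a histogram of the returned coefficients is close to Gaussian, which is exactly the regularity the papers above exploit.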