Receptive field profiles registered by cell recordings have shown that mammalian vision has developed receptive fields tuned to different sizes and orientations in the image domain as well as to different image velocities in space-time. Corresponding cell recordings in the auditory system have shown that mammals have developed receptive fields tuned to different frequencies as well as temporal transients. This article describes normative theories that have been developed to explain these properties of sensory receptive fields based on structural properties of the environment. Beyond theoretical explanation of biological phenomena, these theories can also be used for computational modelling of biological receptive fields and for building algorithms for artificial perception based on sensory data.
Idealized models of visual receptive fields similar to those found in the retina, the lateral geniculate nucleus and the primary visual cortex of higher mammals can be derived in an axiomatic way from structural requirements on the first stages of visual processing that reflect symmetry properties of the surrounding world in combination with additional assumptions to ensure internally consistent image representations at multiple spatial and temporal scales. Specifically, idealized functional models for linear spatio-temporal receptive fields can be derived in a principled manner to constitute a combination of Gaussian derivatives over the spatial domain and either non-causal Gaussian derivatives or truly time-causal temporal scale-space kernels over the temporal domain:
$$T(x_1, x_2, t;\; s, \tau;\; v, \Sigma) = \partial_{\varphi}^{m_1} \, \partial_{\bot\varphi}^{m_2} \, \partial_{t}^{n} \left( g(x_1 - v_1 t,\; x_2 - v_2 t;\; s \Sigma) \, h(t;\; \tau) \right)$$
where
$\Sigma$ denotes a spatial covariance matrix determining the spatial shape of an affine Gaussian kernel,
$m_1$ and $m_2$ denote orders of spatial differentiation,
$n$ denotes the order of temporal differentiation,
$\partial_{\varphi}$ and $\partial_{\bot\varphi}$ denote spatial directional derivative operators in two orthogonal directions $\varphi$ and $\bot\varphi$,
$g(x;\; s \Sigma)$ is an affine Gaussian kernel with its size determined by the spatial scale parameter $s$ and its shape by the spatial covariance matrix $\Sigma$,
$g(x - v t;\; s \Sigma)$ denotes a spatial affine Gaussian kernel that moves with image velocity $v = (v_1, v_2)$ in space-time and
$h(t;\; \tau)$ is a temporal smoothing kernel over time with temporal scale parameter $\tau$, corresponding to a Gaussian kernel in the case of non-causal time or a cascade of first-order integrators or equivalently truncated exponential kernels coupled in cascade over a time-causal temporal domain.
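The time-causal case above can be illustrated with a minimal NumPy sketch. It assumes one common discretisation of a first-order integrator (a recursive filter with time constant `mu`) and, for brevity, a single spatial dimension. The function names, the parameter values and the choice of discretisation are illustrative assumptions, not part of the theory as stated here.

```python
import numpy as np

def first_order_integrator(signal, mu):
    """Causal recursive filter: out[n] = out[n-1] + (in[n] - out[n-1]) / (1 + mu)."""
    out = np.empty(len(signal), dtype=float)
    prev = 0.0
    for n, value in enumerate(signal):
        prev += (value - prev) / (1.0 + mu)
        out[n] = prev
    return out

def time_causal_kernel(mus, length):
    """Impulse response h(t) of first-order integrators coupled in cascade.

    Each stage has a truncated-exponential (geometric) impulse response,
    so the cascade is zero for t < 0 by construction.
    """
    h = np.zeros(length)
    h[0] = 1.0  # unit impulse at t = 0
    for mu in mus:
        h = first_order_integrator(h, mu)
    return h

def velocity_adapted_kernel(x, h, v, s):
    """Sample g(x - v t; s) h(t) on a (time, space) grid (1-D space for brevity)."""
    t = np.arange(len(h))
    X, T = np.meshgrid(x, t)  # both of shape (len(t), len(x))
    g = np.exp(-(X - v * T) ** 2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)
    return g * h[:, None]
```

In this discretisation each stage contributes mean `mu` and variance `mu**2 + mu` to the temporal kernel, so the time constants jointly determine the temporal scale. Spatial and temporal derivatives of the sampled kernel could then be approximated by finite differences, for example with `np.gradient`.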
Correspondingly, and with similar notation, idealized functional models for spatial receptive fields can be expressed in the form
$$T(x_1, x_2;\; s, \Sigma) = \partial_{\varphi}^{m_1} \, \partial_{\bot\varphi}^{m_2} \, g(x_1, x_2;\; s \Sigma)$$
This model specifically generalizes the receptive field model in terms of Gaussian derivatives from directional derivatives of rotationally symmetric Gaussian kernels to directional derivatives of affine Gaussian kernels. Idealized functional models of receptive fields of these forms have been shown to reproduce quite well the shape of spatial and spatio-temporal receptive fields measured by cell recordings of neurons in the LGN and of simple cells in the primary visual cortex. Theoretical arguments have been presented for preferring this generalized Gaussian model of receptive fields over a Gabor model of receptive fields, because of the better theoretical properties of the generalized Gaussian model under natural image transformations. Specifically, these generalized Gaussian receptive fields can be shown to enable computation of invariant visual representations under natural image transformations. By these results, the different shapes of receptive field profiles found in biological vision, which are tuned to different sizes and orientations in the image domain as well as to different image velocities in space-time, can be seen as well adapted to the structure of the physical world and be explained from the requirement that the visual system should have the possibility of being invariant to the natural types of image transformations that occur in its environment.
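As a rough illustration of the spatial model, the following NumPy sketch samples an affine Gaussian kernel and its first- and second-order directional derivatives in a direction given by an angle `phi`. It uses the closed-form derivatives of a Gaussian. The function names, the restriction to orders 0 to 2 along a single direction, and the sampling on an integer grid are simplifying assumptions made for this example.

```python
import numpy as np

def affine_gaussian(x1, x2, s, Sigma):
    """Sample the affine Gaussian kernel g(x; s * Sigma) at the points (x1, x2)."""
    C = s * np.asarray(Sigma, dtype=float)
    Cinv = np.linalg.inv(C)
    # Quadratic form x^T C^{-1} x for a symmetric 2x2 matrix C
    q = Cinv[0, 0] * x1**2 + 2.0 * Cinv[0, 1] * x1 * x2 + Cinv[1, 1] * x2**2
    return np.exp(-0.5 * q) / (2.0 * np.pi * np.sqrt(np.linalg.det(C)))

def directional_derivative_kernel(x1, x2, s, Sigma, phi, order):
    """Directional derivative of order 0, 1 or 2 of g(x; s * Sigma) along angle phi."""
    C = s * np.asarray(Sigma, dtype=float)
    Cinv = np.linalg.inv(C)
    e = np.array([np.cos(phi), np.sin(phi)])  # unit vector in direction phi
    a = Cinv @ e
    lin = a[0] * x1 + a[1] * x2               # e^T C^{-1} x
    g = affine_gaussian(x1, x2, s, Sigma)
    if order == 0:
        return g
    if order == 1:
        return -lin * g                       # from grad g = -C^{-1} x g
    if order == 2:
        return (lin**2 - e @ a) * g
    raise ValueError("only orders 0, 1 and 2 are implemented in this sketch")
```

Varying `s` changes the size of the kernel, `Sigma` its elongation, and `phi` its orientation, which corresponds to the tuning to different sizes and orientations described above.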
A computational theory for auditory receptive fields can be expressed in a structurally similar way, permitting the derivation of auditory receptive fields in two stages:
a first stage of temporal receptive fields corresponding to an idealized cochlea model modeled as a windowed Fourier transform
$$S(t, \omega;\; \tau) = \int_{t'=-\infty}^{\infty} f(t') \, w(t - t';\; \tau) \, e^{-i \omega t'} \, dt'$$
where $f$ denotes the sound signal, $t$ denotes time, $\omega$ denotes the angular frequency, $\tau$ denotes the temporal scale of the window function $w$, which can be chosen as either Gabor functions in the case of non-causal time or Gammatone functions, alternatively generalized Gammatone functions, for a truly time-causal model in which the future cannot be accessed,
a second layer of spectro-temporal receptive fields
$$A(t, \nu;\; \Sigma) = \partial_{t}^{\alpha} \, \partial_{\nu}^{\beta} \left( g(\nu - v t;\; s) \, T(t;\; \tau_a) \right)$$
applied to the magnitude of the logarithmically transformed spectrogram where
$\nu$ denotes the logarithmic frequency,
$\Sigma$ is a spectro-temporal covariance matrix determining the shape of the second-layer receptive field over the spectro-temporal domain,
$\alpha$ is the order of temporal differentiation,
$\beta$ is the order of logspectral differentiation,
the smoothing over the logspectral domain is modeled as a Gaussian function $g(\nu - v t;\; s)$ at logspectral scale $s$ extended with glissando adaptation with
$v$ a glissando parameter to account for frequency variations over time
and with the temporal smoothing kernels $T(t;\; \tau_a)$ chosen as either Gaussian kernels over time in the case of non-causal time or first-order integrators coupled in cascade in the case of truly time-causal operations. The shapes of the receptive field functions in these models can be determined by necessity from structural properties of the environment combined with requirements on the internal structure of the auditory system to enable theoretically well-founded processing of sound signals at different temporal and log-spectral scales. Specifically, the resulting spectro-temporal receptive fields in this model obey invariance or covariance properties over natural sound transformations including: temporal shifts, variations in sound pressure, the distance between the sound source and the observer, a shift in the frequencies of auditory stimuli and glissando transformations. Idealized receptive fields of this form can be shown to model well the qualitative shape of spectro-temporal receptive fields as measured by cell recordings in the inferior colliculus as well as the linear component of some receptive fields measured in the primary auditory cortex.
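The two-stage auditory model can be sketched along the same lines. The NumPy example below assumes the non-causal case throughout: a Gaussian window for the first-stage windowed Fourier transform, and Gaussian smoothing over both time and log-frequency in the second stage, with derivatives approximated by finite differences. The function names, the window truncation at four standard deviations and the parameter names are assumptions made for illustration.

```python
import numpy as np

def gaussian(u, variance):
    """1-D Gaussian with the given variance, evaluated at u."""
    return np.exp(-u**2 / (2.0 * variance)) / np.sqrt(2.0 * np.pi * variance)

def gabor_spectrogram(f, omegas, tau, dt=1.0):
    """Windowed Fourier transform S(t, omega; tau) with a Gaussian window.

    For each angular frequency the signal is demodulated by exp(-i omega t')
    and then smoothed over time with the window w(t - t'; tau).
    """
    n = len(f)
    t = np.arange(n) * dt
    half = int(np.ceil(4.0 * np.sqrt(tau) / dt))  # truncate the window at 4 std
    w = gaussian(np.arange(-half, half + 1) * dt, tau) * dt
    S = np.empty((len(omegas), n), dtype=complex)
    for k, omega in enumerate(omegas):
        S[k] = np.convolve(f * np.exp(-1j * omega * t), w, mode="same")
    return S

def second_layer_kernel(t, nu, s, tau_a, v=0.0, alpha=0, beta=0):
    """Sample the derivative of order (alpha, beta) of g(nu - v t; s) T(t; tau_a).

    Rows index log-frequency nu, columns index time t. T is taken as a
    Gaussian over time (non-causal case); v is the glissando parameter.
    """
    T, NU = np.meshgrid(t, nu)  # both of shape (len(nu), len(t))
    k = gaussian(NU - v * T, s) * gaussian(T, tau_a)
    for _ in range(alpha):
        k = np.gradient(k, t, axis=1)   # finite-difference temporal derivative
    for _ in range(beta):
        k = np.gradient(k, nu, axis=0)  # finite-difference logspectral derivative
    return k
```

With logarithmically spaced `omegas` (for example from `np.geomspace`), the rows of `np.log(np.abs(S))` are sampled over log-frequency, and a second-layer response could then be obtained by 2-D convolution of that array with the sampled kernel.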