Volume 2, Number 1, Article 2, Pages 12-24
doi:10.1167/2.1.2
http://journalofvision.org/2/1/2/
ISSN 1534-7362
Receptive field structure of neurons in monkey primary visual cortex revealed by stimulation with natural image sequences
Dario L. Ringach
Departments of Psychology and Neurobiology and Brain Research Institute, UCLA, Los Angeles, CA, USA

Michael J. Hawken
Center for Neural Science, New York University, New York, NY, USA

Robert Shapley
Center for Neural Science, New York University, New York, NY, USA

Abstract
Probing the visual system with the ensemble of signals that occur in the natural environment may reveal aspects of processing that are not evident in the neural responses to artificial stimulus sets, such as conventional bars and sinusoidal gratings. However, the question of how to use complex natural stimulation, many aspects of which the experimenter cannot completely specify, to study neural processing remains unsolved. Here a method is presented to investigate the structure of a neuron's receptive field based on its response to movie clips and other stimulus ensembles. As a particular case, the technique provides an estimate of the conventional first-order receptive field of a neuron, similar to what can be obtained with other reverse-correlation schemes. This is demonstrated experimentally and with computer simulations. Our analysis also revealed that the receptive fields of both simple and complex cells had regions where image boundaries, independent of their contrast sign, would enhance or suppress the cell's response. In some cases, these signals were tuned for the orientation of the boundary. This demonstrates for the first time that it might be feasible to investigate the receptive field structure of visual neurons from their responses to natural image sequences.
History
Received June 18, 2001; published January 2, 2002
Citation
Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey primary visual cortex revealed by stimulation with natural image sequences.
Journal of Vision, 2(1):2, 12-24,
http://journalofvision.org/2/1/2/,
doi:10.1167/2.1.2.
Keywords
reverse correlation, triggered correlation, primate, recursive least squares, linear-nonlinear model, system identification
The goal of this project is to investigate how visual
cortical cells respond to natural stimulation and to study what sort of signal
processing occurs within primary visual cortex. We postulate that the use of
natural image sequences may reveal aspects of cortical processing that are not
evident when using simpler stimuli such as bars and luminance modulated
gratings. Because the cortex is a nonlinear network, it may not be feasible to
use the neural responses to simple stimuli to predict and understand cortical
cells’ activity under natural stimulus conditions. Furthermore,
accumulating evidence suggests that the surround of the
“classical receptive field” of a neuron can
modulate its response in very specific ways (see
Sillito, Grieve, Jones, Cudeiro, & Davis, 1995;
Zipser, Lamme, & Schiller, 1996;
Levitt & Lund, 1997;
Walker, Ohzawa, & Freeman, 1999;
Sceniak, Ringach, Hawken, & Shapley, 1999;
Walker, Ohzawa, & Freeman, 2000;
Kapadia, Westheimer, & Gilbert, 2000).
It has been proposed that contextual modulation is the basis for figure and
ground segregation
(Knierim & van Essen, 1992;
Zipser et al., 1996;
Sillito et al., 1995), as well as
grouping and segmentation
(Kapadia et al., 2000;
Chen, Kasamatsu, Polat, & Norcia, 2001).
It has also been argued that the pattern of contextual interactions observed in
V1 is what one would expect for a grouping network processing natural scenes
(Sigman, Cecchi, Gilbert, & Magnasco, 2001).
Thus, it is becoming increasingly important to understand how V1 neurons respond
when their “classical receptive fields” are
embedded in a natural surround.
A first step necessary to attack these questions is to
develop methods to study the structure of a neuron's receptive field from its
response to natural image sequences. Here we present a technique that allows
one to investigate the input-output relationship of the cell by estimating a
family of receptive-field “kernels”
associated with particular “features”
of the stimulus. As a particular case, the proposed method recovers the
first-order kernel of a neuron with respect to the luminance of the visual
stimulus
(Marmarelis & Marmarelis, 1978).
The data set collected in the present study consists of
several movie segments that have been digitized and stored in the computer (the
stimulus) and the corresponding responses of neurons in V1 to these movie clips.
Our goal is to understand how neural activity in each case is influenced by the
physical properties of the image sequences. The complexity of a natural
stimulus introduces several challenges in the analysis. Most aspects of the
stimulus are no longer under experimental control. Instead of varying one
parameter at a time, as is customary in most experimental designs, a large
number of physical properties are changing simultaneously. Explaining the full
response of the cell to the movie sequences might be a very difficult task;
instead, it is shown that one can readily test if a particular property of the
stimulus influences the firing rate of a cell and to what extent. In this way,
the richness of natural scenes can be exploited to explore which
“features”
of the image induce a cell to fire.
We begin by considering cortical simple cells
(Hubel & Wiesel, 1968;
Movshon, Thompson, & Tolhurst, 1978a;
Skottun et al., 1991). Simple cells are
thought to provide oriented, spatially bandpass filtering that is one of the
essential early stages in visual processing
(DeValois & DeValois, 1988). A
common mathematical model describing the function of simple cells, under a
constant level of contrast gain control, is a linear operator followed by
rectification (Movshon et al., 1978a;
Tolhurst & Dean, 1990;
Tolhurst & Heeger, 1997;
Carandini, Heeger, & Movshon, 1997).
Numerous studies of V1 simple cells using bars, spots or gratings have
established that their receptive field structure can be approximated by
spatially discrete antagonistic subregions
(Hubel & Wiesel, 1968;
Movshon et al., 1978;
Andrews & Pollen, 1979;
Jones & Palmer, 1987;
Parker & Hawken, 1988). If spatial
summation of neural signals by simple cells is linear, then different stimulus
ensembles could be used to determine the first-order linear receptive field or
kernel of these neurons (Victor, 1992).
Furthermore, invariance of the resulting kernel with respect to the stimulus
ensemble provides one way in which the linearity of simple cells could be tested
(Ringach, Sapiro, & Shapley, 1997b).
Using this method, we show that the first-order kernel of simple cells can
readily be recovered from their responses to movie clips.
One must consider also that visual cortical cells could be responding to some
nonlinear feature of the visual stimulus. Certainly complex cells respond in a
nonlinear manner to luminance contrast
(Hubel & Wiesel, 1968;
Movshon et al., 1978b;
Spitzer & Hochstein, 1985;
DeValois et al., 1982;
Szulborski & Palmer, 1990). To
what other nonlinear features of the stimulus are V1 cells responding? The
concept of a “feature map” of a stimulus is
introduced in this paper as a way of studying responses of visual neurons to
different attributes in natural image sequences. In essence, the method
estimates the best linear predictor of the cell's response given a particular
feature map of the stimulus. The initial results indicate that V1 neurons have
a variety of complex responses to natural images, and that sophisticated image
processing might be occurring in the V1
cortex.
Physiology, Optics and Visual Stimulation
Acute experiments were performed on adult Old-World
monkeys (Macaca fascicularis) in
compliance with National Institutes of Health guidelines as described elsewhere
(Ringach et al., 1997a). Natural image
sequences were generated by digitally sampling commercially available videotapes
in VHS/NTSC format. A Silicon Graphics R10000 Solid Impact was used to sample
frames at a spatial resolution of 320 × 240 pixels
(6 deg × 4.5 deg of visual angle) and
at a temporal rate of 15 Hz. The selected movies included both man-made and
natural landscape scenes. Six segments of 30-s duration were sampled from eight
different movies, making a total of 24 minutes of video. The movies were
compressed using Silicon Graphics’ MVC2 compression scheme (proprietary)
and stored on a disk. The compressed data fitted in 480 megabytes of memory. A
Silicon Graphics O2 R5000 computer played back the images during the experiment
on a computer screen that measured 34.3 cm wide by 27.4 cm high. The refresh
rate of the monitor was 100 Hz and each movie image was presented for six
consecutive frames. Thus, the effective playback rate was 16.6
Hz, slightly faster than the sampling rate. The mean luminance of the
display was 56 cd/m².
Stimulation was monocular to the dominant eye (the other eye was occluded).
Movie clips were effective in evoking responses from V1 cells; the mean spike
rate to natural image stimulation was
≈10 spikes/s. We think these movie
clips have statistics similar to those used in other studies of natural image
sequences, such as those employed by
van Hateren and van der Schaaf (1998),
who sampled videos from Dutch, British and German TV
broadcasts.
Each cell was stimulated monocularly via the dominant
eye and characterized by measuring its steady-state response to conventional
black/white drifting sinusoidal gratings (the nondominant eye was occluded).
With this method we measured basic attributes of the cell, including spatial and
temporal frequency tuning, orientation tuning, contrast, and color sensitivity
(Johnson, Hawken, & Shapley, 2001),
as well as area, length, and width tuning curves. Experiments using natural
image sequences were performed following these standard measurements.
Steady-state orientation tuning curves were obtained using angular steps of 15
deg or 20 deg. In a few very sharply tuned cells, we used steps of 10 deg.
Simple cells are defined as those neurons whose responses had ratios of first
harmonic to mean response larger than one when stimulated with a drifting
grating having optimal spatio-temporal parameters. All other cells are defined
as being complex. Receptive fields were located at eccentricities between 1 and
6 deg.
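
The simple/complex criterion above (ratio of first harmonic to mean response larger than one) can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions, not the authors' analysis code: the function names, the 1 ms binning, the 4 Hz grating, and the two idealized response waveforms are hypothetical, and the response is assumed to span a whole number of stimulus cycles.

```python
import numpy as np

def f1_f0_ratio(rate, bin_s, stim_hz):
    """Ratio of first-harmonic amplitude (F1) to mean response (F0).

    rate    : binned firing rate covering a whole number of stimulus cycles
    bin_s   : bin width in seconds
    stim_hz : temporal frequency of the drifting grating in Hz
    """
    rate = np.asarray(rate, dtype=float)
    t = np.arange(rate.size) * bin_s
    f0 = rate.mean()
    # Amplitude of the Fourier component at the stimulus temporal frequency.
    f1 = 2.0 * np.abs(np.mean(rate * np.exp(-2j * np.pi * stim_hz * t)))
    return f1 / f0

def classify(rate, bin_s, stim_hz):
    """Label a cell 'simple' if F1/F0 > 1, otherwise 'complex'."""
    return "simple" if f1_f0_ratio(rate, bin_s, stim_hz) > 1.0 else "complex"

# Two idealized responses to a 4 Hz grating, sampled for 1 s in 1 ms bins.
t = np.arange(1000) * 0.001
simple_like = np.maximum(0.0, 20.0 * np.sin(2 * np.pi * 4 * t))  # half-wave rectified
complex_like = 20.0 + 4.0 * np.sin(2 * np.pi * 4 * t)            # elevated mean, weak modulation

ratio_simple = f1_f0_ratio(simple_like, 0.001, 4.0)
ratio_complex = f1_f0_ratio(complex_like, 0.001, 4.0)
```

A half-wave rectified sinusoid has a ratio close to π/2, so it falls on the simple side of the criterion, whereas a response with a large mean and weak modulation falls on the complex side.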
The question that we want to address is how the
response of a neuron depends on the recent history of the movie sequence. We
propose a method that is an extension of one recently introduced by
DiCarlo, Johnson, and Hsiao (1998) to
analyze receptive fields in area 3b of primary somatosensory cortex in response
to random dot patterns (see also the recent work of
Theunissen et al. [2001]). The
following terminology will be used. Let $I(x,y,t)$
denote the value of a pixel at location $(x,y)$ and time $t$. This is normally a
three-dimensional vector representing the values of the red, green and blue
components of the pixel. For the response of the cell we consider the total
number of spikes occurring within a time window centered at time $t$.
This value is denoted by $r(t)$.
Formally, the general problem is to determine how the
response, $r(t)$,
depends on the recent history of the visual stimulus
$s = \{ I(x,y,t') : t - T \le t' \le t \}$, where
$T$ is the width of the analysis window.
This relationship is fully characterized by the joint probability of the
stimulus and the response, $P(s,r)$
(Rieke, Warland, van Steveninck, & Bialek, 1997).
Due to the high dimensionality of the stimulus space, however, estimating this
probability distribution is not possible in the given experimental time.
Instead, methods that make specific assumptions about the relationship between
stimulus and response are required. Here we consider a general class of models
described by

$$ r(t) - r_0 = \sum_{x,y} \sum_{\tau = 0}^{T} w(x,y,\tau)\, f(x,y,t-\tau) \qquad (1) $$

where $r_0$ is the mean
response rate, $f(x,y,t)$ represents a
feature map sequence, and $w(x,y,\tau)$
are weights representing a spatio-temporal kernel of the receptive field. The
feature map is a function (linear or nonlinear) of the input image sequence,
$I(x,y,t)$.
Therefore, the cell's modulation away from its mean response,
$r(t) - r_0$, is modeled as a
linear spatio-temporal filter acting on the feature map sequence. The choice of
$f(x,y,t)$ is limited only
by our intuition about what “features”
of the image sequence the cell at hand may be representing.
For example, one of the feature maps considered below
is the luminance contrast map. The “luminance contrast map” is defined by

$$ c(x,y,t) = \frac{L(x,y,t) - \bar{L}(t)}{\bar{L}(t)} $$

where $L(x,y,t) = \mathbf{w}^T I(x,y,t)$
is the luminance of the pixel at location $(x,y)$ at time $t$, and
$\bar{L}(t)$ is the mean
luminance of the frame at time $t$. The
luminance of a pixel is obtained by weighting the values of the red, green, and
blue guns appropriately (which is achieved by multiplying with a vector
$\mathbf{w}$ obtained from the calibration of the
display). As an example of this calculation, we present original color frames
from the stimulus in Figure 1a, and their
associated luminance contrast maps in Figure 1b.
With this definition of the feature map, the modulation of the cell's response
is modeled as a linear function of the luminance contrast values within its
receptive field, a commonly used model for simple cells.

Figure 1. Examples of feature maps obtained from the original
frames in the movie. (a) Three still frames taken from the movie
“Sleeper.” (b) The luminance-contrast maps associated with the
original images. Regions in white indicate positive values of contrast, whereas
regions in black indicate negative values. (c) The edge map associated with the
original images. Locations where large gradients in the luminance contrast map
are located are emphasized. (d) Oriented edge maps associated with the original
images when $\theta = 0$; this choice accentuates oriented
boundaries that are near vertical. (e) Oriented edge maps associated with the
original images when $\theta = 90$ deg; this emphasizes
oriented edges that are near horizontal.

A second feature map of interest is given by

$$ e(x,y,t) = \left| \nabla c(x,y,t) \right| \qquad (2) $$

where $c(x,y,t)$ is the luminance contrast map and
$\nabla = (\partial/\partial x, \partial/\partial y)$. This value
represents the absolute value of the luminance contrast gradient, which is large
in those regions where boundaries are present in the image. Thus, the result of
this computation may be considered an “edge map”
(Pratt, 1991). It can be seen that the
edge map (Figure 1c) associated with the
original images (Figure 1a) emphasizes local
changes in contrast. This definition of the
“edge map” is insensitive to the
local contrast sign of the contour or its orientation. Clearly, the edge map is
a nonlinear operator on the luminance of the images.
In some situations it is of interest to separate
the contributions of edges at different orientations to the cell's response.
This may be done by defining an oriented edge map as follows,

$$ e_\theta(x,y,t) = \left| \mathbf{u}_\theta \cdot \nabla c(x,y,t) \right|, \quad \mathbf{u}_\theta = (\cos\theta, \sin\theta) \qquad (3) $$

where $c(x,y,t)$ is the luminance contrast map. This feature map emphasizes
edge boundaries whose orientations are normal to the selected orientation. For
example, the selection $\theta = 0$ accentuates
vertical boundaries in the image (Figure 1d).
Similarly, the oriented edge map where $\theta = 90$ deg emphasizes
horizontal boundaries (Figure 1e). This
measure is also insensitive to contrast sign.
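
The three feature maps described above (luminance contrast, edge, and oriented edge) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the function names, the example `gun_weights` values, the division by the frame's mean luminance in `contrast_map`, and the use of `np.gradient` (central differences) as the spatial gradient are all assumptions made for the example.

```python
import numpy as np

def luminance(frame_rgb, gun_weights):
    """Weight the R, G, B values of each pixel: (H, W, 3) @ (3,) -> (H, W)."""
    return frame_rgb @ gun_weights

def contrast_map(lum):
    """Luminance contrast relative to the mean luminance of the frame.
    The normalization by the mean is an assumption of this sketch."""
    mean = lum.mean()
    return (lum - mean) / mean

def edge_map(contrast):
    """Magnitude of the spatial gradient of the contrast map."""
    gy, gx = np.gradient(contrast)  # axis 0 is y (rows), axis 1 is x (columns)
    return np.hypot(gx, gy)

def oriented_edge_map(contrast, theta):
    """Absolute value of the gradient component along direction theta (radians)."""
    gy, gx = np.gradient(contrast)
    return np.abs(gx * np.cos(theta) + gy * np.sin(theta))

# Hypothetical test frame: dark left half, bright right half (a vertical boundary).
lum_values = np.where(np.arange(16) < 8, 1.0, 3.0) * np.ones((16, 1))
frame = np.repeat(lum_values[:, :, None], 3, axis=2)
gun_weights = np.array([0.2, 0.7, 0.1])  # illustrative values, not a real calibration

c = contrast_map(luminance(frame, gun_weights))
e = edge_map(c)
e_vertical = oriented_edge_map(c, 0.0)          # gradient along x: vertical boundaries
e_horizontal = oriented_edge_map(c, np.pi / 2)  # gradient along y: horizontal boundaries
```

For this frame the edge map and the `theta = 0` map both peak at the boundary, the `theta = π/2` map is essentially zero, and flipping the sign of the contrast leaves the edge map unchanged, which is the sense in which these maps are insensitive to contrast sign.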
Once a feature map $f(x,y,t)$ is selected, we
want to find the optimal spatio-temporal weighting function, or kernel,
$w(x,y,t)$,
that predicts the response of the cell in the least squares sense according to
the model in Equation (1). When the input
is a white-noise stimulus, one computes this kernel by calculating the mean input
before a spike (Lee & Schetzen, 1965;
deBoer & Kuyper, 1968). This
computation does not apply to natural images because there are strong
spatio-temporal correlations in the input sequence and resulting feature maps.
The autocorrelation of the input must be taken into account. To do this, we used
a standard technique, recursive least-squares (RLS), to calculate the optimal
kernel. The input to the algorithm is the feature map sequence and the response
of the cell. The output is the optimal kernel that transforms the feature map
into the response. This is done via a recursive procedure that refines our
guess of the kernel as more and more data are added to the calculation. At the
first step of the calculation, the weights (kernel values) are all set to zero.
At the $n$th step of the
calculation, the old estimate of the
kernel, at step $n - 1$, is used to
predict the response of the cell. The error between the predicted and true
response is used to make a correction to the weighting function and generate a
new estimate. The correction in the RLS algorithm is computationally complex but
basically it is the present input image filtered so as to correct for image
correlation, and is weighted by the magnitude of the error. It can be shown that
the expected value of the RLS algorithm's estimate is equal to the true value of
the kernel ( Haykin, 1991).
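
The recursion described above can be sketched with the textbook recursive least-squares update (no forgetting factor). This is a generic illustration, not a transcription of the algorithm in the paper's Appendix: the function name `rls_kernel`, the initialization constant `delta`, and the synthetic correlated input used to exercise it are assumptions of the example. The feature vectors and the response are assumed to have had their means subtracted.

```python
import numpy as np

def rls_kernel(features, response, delta=1e-2):
    """Estimate linear weights by recursive least squares.

    features : (n_steps, n_weights) array, one flattened feature map per step
    response : (n_steps,) array of mean-subtracted responses
    delta    : small constant used to initialize the inverse correlation estimate
    """
    n_steps, n_w = features.shape
    w = np.zeros(n_w)            # weights start at zero, as in the text
    P = np.eye(n_w) / delta      # running estimate of the inverse correlation matrix
    for x, d in zip(features, response):
        Px = P @ x
        gain = Px / (1.0 + x @ Px)   # input filtered to correct for correlations
        err = d - w @ x              # prediction error using the old estimate
        w = w + gain * err           # correction weighted by the error
        P = P - np.outer(gain, Px)   # update without inverting any matrix
    return w

# Synthetic check: correlated input, known kernel, additive noise.
rng = np.random.default_rng(0)
n_steps, n_w = 4000, 12
idx = np.arange(n_w)
mix = 0.6 ** np.abs(idx[:, None] - idx[None, :])        # induces input correlations
features = rng.standard_normal((n_steps, n_w)) @ mix
true_w = np.zeros(n_w)
true_w[4], true_w[5] = 1.0, -1.0
response = features @ true_w + 0.5 * rng.standard_normal(n_steps)

w_rls = rls_kernel(features, response)
w_ls, *_ = np.linalg.lstsq(features, response, rcond=None)  # batch solution for comparison
```

On this toy problem the recursive estimate agrees closely with the batch least-squares solution, while never forming or inverting the full correlation matrix explicitly.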
Some of the advantages of the recursive least squares
technique, over standard least squares, are as follows. First, in contrast to
the standard least-squares technique, there is no need to invert the (very
large) correlation matrix of the input data at any stage in the algorithm.
Instead, a recursive estimate of the inverse of the correlation matrix is
updated as new data arrive ( Haykin, 1991).
This is important when the condition number of the correlation matrix is high,
as is the case for the application at hand. A high condition number implies
that inverting the matrix is not a numerically stable process
(Golub & van Loan, 1989). Second, the
technique is recursive, so estimates can be updated as new data are collected.
This could help us decide when sufficient data have been gathered on a
particular cell as we run the experiment. Third, slow trends in excitability,
due to variations in anesthetic levels and the physiology of the animal can be
factored out by a recursive estimation of the mean values. Fourth, such a
technique could be used in principle to follow changes of the receptive field
with time when the cell is presented with nonstationary input. Thus, in
principle, the technique could allow the study of adaptation to changes in the
statistics of natural images. A detailed description of the algorithm is
provided in the “Appendix.”
To test the performance of the algorithm, we first
determined whether the method could recover the classical first-order kernel in
a model V1 cell consisting of a cascade of a linear receptive field (acting on
the luminance contrast of the input) and a threshold nonlinearity (see
Figure 2).
Figure 2. Performance of method on a simulated simple cell. The system consists of a
cascade of a spatial linear filter (acting on the luminance contrast of the
input) followed by a hard-step threshold nonlinearity. The signal $z(n)$
represents Gaussian additive noise. The simulated receptive field had two
subfields, one excitatory (in red) and one inhibitory (in blue). The algorithm
had to estimate this receptive field given the input image sequence and the
response of the cell, $r(n)$.
The result of applying the method is shown below the image of the simulated
receptive field.

At each time step, the dot product between the
simulated receptive field and the input image was computed first (the receptive
field was centered on the movie frame). This value is denoted by $y(n)$
(Figure 2). Next, in an attempt to make the
simulation realistic, $y(n)$
was perturbed by a large amount of additive Gaussian noise, $z(n)$.
The standard deviations of $y(n)$ and $z(n)$
were equal, i.e., the signal-to-noise ratio was one. Finally, the resulting
signal $w(n) = y(n) + z(n)$
was passed through a hard rectifier
(Figure 2, right). The threshold was set at
a value that caused the model cell to “fire”
(i.e., generate a nonzero output) only 12% of the time. This is equivalent to a
mean response rate of ≈ 2 spikes/s.
The output variance was 2.1 (spikes/s)². These
numbers are close to the median values for our data: median response 2.4
spikes/s and variance (2.3 spikes/s)². The movies
used in the simulation, and the length of the data record, were identical to
those in the actual experiment. The simulated receptive field had two symmetric
subfields, one excitatory (indicated in red) and one inhibitory (indicated in
blue), and was defined on a square grid of 17 × 17 pixels representing 0.65
deg × 0.65 deg of visual
angle. These parameters were selected to test the proposed method under
stringent conditions: the algorithm had to estimate 289 parameters from very
noisy thresholded data in the presence of highly correlated input signals (the
condition number
[Golub & van Loan, 1989] of the
luminance covariance matrix was
 ). The resulting
estimate of the receptive field is very good
(Figure 2, lower receptive field): the correlation
coefficient between the true and estimated weights equals 0.88. Thus, the
algorithm can perform very well even in the presence of strong output
nonlinearity and large additive noise levels.

Figure 3. Analysis
of receptive field structure using natural image stimulation. Each panel in
this figure shows the estimated luminance contrast kernel (on the left) and the
edge kernel (on the right) for several V1 cells. Additional information is
displayed on top of each panel: the cell's laminar location, the ratio between
the first-harmonic component and the mean of the response (F1/F0)
for the optimal sinusoidal grating stimulus followed by the classification of
the cell as simple (F1/F0 > 1) or complex (F1/F0 ≤ 1),
the angular size represented by one side of the 17 × 17 grid, and
the preferred orientation of the cell as measured with conventional drifting
gratings. The orientation of the bar corresponds to the orientation of the
grating that generated the best response. In
(l) the bar was omitted because the
cell was not well tuned. In most cases we observe that the preferred
orientation of the cell closely matches the axis of elongation of the estimated
receptive fields. Each kernel was normalized independently so that its maximum
absolute value was one. This makes optimal use of the pseudo-color map which
ranges from –1 (blue) to +1 (red).
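
The simulated cell of Figure 2 (a linear receptive field, additive Gaussian noise at a signal-to-noise ratio of one, and a hard threshold set so that the output is nonzero about 12% of the time) can be sketched as follows. This is an illustrative toy, not a reproduction of the reported simulation: it uses Gaussian white-noise frames instead of the movie clips, a hypothetical 9 × 9 grid instead of 17 × 17, made-up Gaussian subfields, and batch least squares instead of the recursive algorithm, so its numbers are not comparable to those in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
size, n_frames = 9, 20000          # hypothetical grid size and record length

# Receptive field with one excitatory and one inhibitory subfield.
yy, xx = np.mgrid[0:size, 0:size]

def blob(cx, cy, s=1.2):
    """Isotropic Gaussian bump centered at (cx, cy)."""
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * s ** 2))

true_rf = blob(2.5, 4) - blob(5.5, 4)

# Stand-in stimulus: white-noise frames (the study itself used movie clips).
frames = rng.standard_normal((n_frames, size * size))

y = frames @ true_rf.ravel()                  # linear stage
z = rng.standard_normal(n_frames) * y.std()   # noise with the same s.d. as y (SNR = 1)
drive = y + z
threshold = np.quantile(drive, 0.88)          # nonzero output about 12% of the time
r = np.maximum(0.0, drive - threshold)        # hard rectifier

# Least-squares estimate of the kernel from the thresholded, noisy output.
est, *_ = np.linalg.lstsq(frames, r - r.mean(), rcond=None)
corr = np.corrcoef(est, true_rf.ravel())[0, 1]
fraction_active = np.mean(r > 0)
```

Even though the output is thresholded and noisy, the least-squares estimate remains strongly correlated with the true receptive field in this toy setting, which is the qualitative point made by the simulation in the text.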
The analysis was then applied to study the structure of
receptive fields in 22 cells of macaque V1. As described in
“Methods,” this was done by having the model predict the response of
the cell at time  given the feature
map at time  . A fixed delay
of  ms, which
corresponds to the median time-to-peak in our V1 population
( Ringach et al., 1997a), was used for
all cells. Representative results are shown in
Figure 3. Each panel in the figure
corresponds to a different cell and depicts the luminance contrast kernel on the
left, and edge kernel on the right. Regions in red correspond to positive
values of the kernel; those in blue represent negative values. For comparison,
the optimal stimulus orientation obtained with drifting sinusoidal gratings is
shown by the orientation of the bar on top of the kernels for each
cell.

Figure 4. (a,b)
Comparison of receptive fields mapped with natural image sequences (right panel)
and with subspace reverse correlation (left panel) for two V1 cells. (c)
Scatter plot of the cell's preferred orientation as measured with steady-state
drifting gratings (x-axis) versus the
angle of elongation of the strongest subfield in the kernels for the cells in
Figure 3 (a-k). Open squares represent the elongation of subfields in the
luminance contrast kernel and open circles represent the elongation for the edge
map kernels. Cases where the ‘aspect ratio’ of the subfield, defined
by the ratio between the largest and smallest eigenvalues of the (centered)
second order moment matrix, was less than 1.2 were ignored. A small aspect
ratio could result because of noise in the kernels (such as Figure 3i, left
panel) or because the subfield was round (such as Figure 3l, right panel).

To check even more rigorously whether or not the
kernels recovered with the RLS algorithm characterize the visual function of the
V1 neurons, we mapped the luminance-contrast kernel in a few V1 simple cells
using both natural images and a more conventional reverse correlation technique
(Ringach et al., 1997b).
Figures 4a and 4b illustrate the results in
two V1 neurons. The receptive field on the left panel corresponds to the
estimate obtained using standard reverse correlation, and the panel on the right
shows the kernel estimated from stimulation with natural image sequences. Both
methods provide similar estimates. In addition, the luminance contrast kernel
of V1 cells obtained from their responses to the movie clips often has elongated
excitatory and inhibitory subfields
(Figure 3). We compared the axis of
elongation in the kernels with the preferred orientation of the cell estimated
from the response tuning as a function of orientation for drifting sinusoidal
gratings (Figure 4c). The axis of
elongation of the strongest subfield was determined by calculating the
eigenvalues and eigenvectors of the (centered) second order moment matrix of
absolute values of the kernel for that subfield. The direction of the
eigenvector with the largest eigenvalue provides the axis of elongation, and the
ratio between the largest
and the smallest eigenvalue gives the aspect ratio of the subfield.
Figure 4c shows that the axis of elongation
in the kernels matches the preferred orientation estimated from the steady-state
orientation-tuning curve.
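
The elongation measure described above (eigen-decomposition of the centered second order moment matrix of the absolute kernel values) can be sketched as follows. The function name `subfield_axis`, the convention that pixels outside the subfield are set to zero beforehand, and the Gaussian test subfields are assumptions of this example; angles are expressed in array coordinates (x along columns, y along rows), which need not match the figure's orientation convention.

```python
import numpy as np

def subfield_axis(subfield):
    """Axis of elongation (degrees, 0-180) and aspect ratio of one subfield.

    subfield : 2D array of kernel values, zero outside the subfield of interest.
    """
    mass = np.abs(subfield)
    ys, xs = np.indices(mass.shape)
    total = mass.sum()
    cx = (mass * xs).sum() / total          # centroid of the subfield
    cy = (mass * ys).sum() / total
    dx, dy = xs - cx, ys - cy
    # Centered second order moment matrix, weighted by absolute kernel values.
    m = np.array([[(mass * dx * dx).sum(), (mass * dx * dy).sum()],
                  [(mass * dx * dy).sum(), (mass * dy * dy).sum()]]) / total
    evals, evecs = np.linalg.eigh(m)        # eigenvalues in ascending order
    major = evecs[:, 1]                     # eigenvector of the largest eigenvalue
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    aspect = evals[1] / evals[0]            # largest over smallest eigenvalue
    return angle, aspect

# Hypothetical subfield elongated along x, and its transpose elongated along y.
ys, xs = np.indices((17, 17))
horizontal = np.exp(-((xs - 8) ** 2 / 18.0 + (ys - 8) ** 2 / 2.0))
angle_h, aspect_h = subfield_axis(horizontal)
angle_v, aspect_v = subfield_axis(horizontal.T)
```

Subfields whose aspect ratio falls below a cutoff (1.2 in the Figure 4 caption) would be excluded, since the axis of a nearly round or noisy subfield is poorly defined.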
The statistical significance of the kernels was
evaluated as follows. First, to obtain an estimate of the noise in the
measurement we calculated the standard deviation of the kernel values in pixels
located away from the receptive field. Then, the kernel was normalized by the
standard deviation of the noise. The result of this calculation is a
z-transformed kernel ( Zar, 1996).
Figure 5 replots the z-transformed values of
some of the kernels in Figure 3. Here the
color map ranges from a z value of –10 (blue) to +10 (red). Thus, the
maximum value in this scale corresponds to a kernel amplitude that is 10 times
the standard deviation of the noise. All kernel features discussed below, both
in the luminance and edge maps, had peak absolute z values larger than 4. This
implies a significance level of
 .

Figure 5. Evaluating the statistical
significance of kernel features. The figure shows the z-transformed values of
some of the kernels depicted in Figure 3. All features described in the text
had peak absolute z values larger than 4.
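
The z-transformation of a kernel can be sketched as below. The function names, the boolean `noise_mask` marking pixels away from the receptive field, and the synthetic kernel are hypothetical. The tail probability is the standard two-sided value for a Gaussian noise model at a single pixel, with no correction for the number of pixels examined, and is offered only as a rough guide.

```python
import math
import numpy as np

def z_transform(kernel, noise_mask):
    """Divide the kernel by the s.d. of its values in pixels away from the receptive field."""
    noise_sd = kernel[noise_mask].std(ddof=1)
    return kernel / noise_sd

def two_sided_p(z):
    """Two-sided tail probability of a standard normal variable at |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Synthetic kernel: background noise plus a single strong central pixel.
rng = np.random.default_rng(1)
kernel = 0.05 * rng.standard_normal((17, 17))
kernel[8, 8] += 1.0

noise_mask = np.ones((17, 17), dtype=bool)
noise_mask[4:13, 4:13] = False     # treat the central region as the receptive field

z = z_transform(kernel, noise_mask)
p_at_4 = two_sided_p(4.0)
```

By construction the z-transformed values in the masked noise region have unit standard deviation, so a peak with |z| > 4 lies more than four noise standard deviations from zero.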
In the kernels mapped using natural image sequences, we
observed simple cells that had structure in both the luminance contrast kernel and
the edge kernel
(Figure 3a, 3c, 3e, and 3f). Similarly, some
complex cells also exhibited structure in their luminance contrast kernels
(Figure 3b, 3d, 3g, and 3h, left panel),
whereas others did not (Figure 3i-3l, left
panel). Another salient feature of the data is that all cells, both simple and
complex, showed spatial structure in their edge kernels. The structure of the
kernels for a direction selective simple cell in layer 6 illustrates how this
method of analysis can reveal complexities in the organization of a receptive
field (Figure 3a). It can be seen that the
luminance contrast kernel shows two elongated subfields: one excitatory and one
inhibitory (Figure 3a, left panel). The
preferred orientation of the cell, as measured with drifting sinusoidal
gratings, is shown at the top right of
Figure 3a and closely matches the
luminance-contrast kernel's orientation.
The spatial structure observed in the kernel associated
with the edge map was unexpected
( Figure 3a, right panel). This kernel
displays primarily two slightly elongated subfields. One field is excitatory,
indicating that high values of luminance contrast gradients in that region
induced the cell to respond more. The second subfield is inhibitory; it
indicates that image boundaries in that region, independent of their contrast
sign, suppressed the response of the cell. The cell's preferred direction of
motion, as determined with drifting gratings, was from the excitatory toward the
inhibitory subfield. Notice also that the edge kernel appears to be slightly
displaced with respect to the center of the luminance-contrast kernel and has a
somewhat greater spatial extent. A weaker suppressive region, located to the
left of the excitatory region, may also be seen. The spatial structure seen in
the edge kernel reveals the presence of a contrast independent (nonlinear)
signal that modulates the response of this simple cell.
The result obtained in a complex cell from layer
4C  is shown in
Figure 3b. A luminance contrast kernel with
two parallel subfields of opposite sign
was detected (Figure 3b, left panel). The
preferred orientation of the cell as determined with drifting sinusoidal
gratings matches the axis of elongation of the subfields. The edge kernel has a
single excitatory subfield centered at the same location as the excitatory
subfield of the luminance-contrast kernel but extending further in space
( Figure 3b, right panel). We observed
several cases in which cells (both simple and complex) showed two subfields of
opposite signs in the luminance-contrast kernel and a single excitatory subfield
in the edge kernel
( Figure 3c,3e, and 3f).
In some cases, it appears that selectivity for
orientation is conferred on the neuron by a contrast-insensitive signal. The
luminance contrast kernel obtained from an “on”-center cell in
layer 6 appears isotropic in space
(Figure 3g, left panel). The edge kernel, on
the other hand, is slightly elongated
(Figure 3g, right panel). The axis of
elongation corresponds well with the preferred orientation as measured with
drifting gratings. The neuron was well tuned for orientation; its tuning curve
had a half-bandwidth at half-height of 25 deg. One would not expect this cell
to be orientation tuned based on the measurement of the luminance-contrast
kernel alone.
In Figure 3h we
present the result from an “off-center”
cell in layer 4B. Notice that the excitatory field in the edge kernel is
slightly displaced in space with respect to the luminance contrast kernel.
The structure of the edge kernel reveals information
about the organization of the receptive field in complex cells that do not have
measurable luminance contrast kernels. A subset of complex cells showed weak or
no spatial structure in their luminance contrast kernels
( Figure 3i-3l). The kernels estimated with
respect to the edge map, on the other hand, have obvious spatial structure in
them. Figure 3i illustrates the analysis of
a direction selective complex cell in layer 6. The preferred direction of
movement, as determined with drifting gratings, was from the excitatory towards
the smaller inhibitory subfield
( Figure 3i, right panel). Other cells in
this group had a single excitatory field in their edge kernels. In some cases,
the field was elongated and matched the preferred orientation of the cell
( Figure 3j and 3k, right panel); other cells
had nearly circular fields
( Figure 3l, right panel).
Finally, the effect of oriented image boundaries on the
response of the cell can be studied by estimating spatial kernels with respect
to the “oriented edge map.”
Figure 6 shows examples from three complex
cells. In each case four different kernels are depicted. In left to right
order they are the luminance contrast kernel, the edge kernel, the oriented-edge
kernel when the angle $\theta$ was selected to
emphasize edges with orientations similar to the preferred orientation of the cell,
and the oriented-edge kernel when the angle $\theta$ was selected to
accentuate boundaries at the orthogonal orientation.
The edge kernel in
Figure 6a indicates the presence of three
subfields: a small central excitatory subfield flanked by two elongated
inhibitory subfields. The kernels estimated with respect to the oriented edge
maps show that the central excitatory mechanism arises from boundaries having
the same orientation as the one preferred by the cell. The inhibitory
subfields, in contrast, result from boundaries orthogonal to the preferred
orientation. This means that in these regions of space, edges perpendicular to
the optimal orientation for the cell suppress its response. This could be a
manifestation of cross-orientation inhibition
( Morrone, Burr, & Maffei, 1982).
Thus, such analysis of responses to natural images may provide a way to
understand how the spatial arrangements of oriented edge segments influence the
response of the cell.
Figure 6. The use of oriented edge maps
to analyze the contribution of different orientations to the cell's response.
Each panel in this figure shows four kernels. In left to right order they
represent: the luminance contrast kernel, the edge kernel, the oriented-edge
kernel when the angle
 was selected to
emphasize edges with orientations similar to the preferred orientation of the cell,
and the oriented-edge kernel when the angle
 was selected to
accentuate boundaries at the orthogonal orientation. These are all complex
cells, and have weak structure in their luminance contrast kernels.
In other complex cells, the kernels calculated with
respect to the edge map and the oriented-edge map for the preferred orientation
are similar and show a single excitatory region
( Figure 6b and 6c). In contrast, the
orthogonal edge maps appear to be more diffuse and peak in different spatial
locations. One may conjecture that such receptive field structures may underlie
the enhanced responses of cells to orientation contrast
( Knierim & van Essen, 1992;
Sillito et al., 1995).
The experimental results indicate that natural images
can be used successfully to probe the visual properties of neurons. This method
proved to be successful in obtaining the two-dimensional first-order kernel, or
luminance-contrast feature map, in all simple cells
( Figure 3). The orientation of the luminance
feature map was consistent with the orientation tuning measured with grating
stimuli ( Fig 4c), cells showed parallel
antagonistic subregions, and different spatial scales were evident among the
population. In a couple of simple cells we compared the kernels obtained via
stimulation with natural image sequences with those measured using subspace
reverse correlation
( Ringach et al., 1997b) and the results
were similar. These findings, together with the simulation results, suggest
that the proposed method works as expected.
Some, but not all, complex cells showed
luminance-contrast kernels, indicating that such neurons did receive excitatory
input from
“first-order”
neural mechanisms. Thus, the analysis of the responses to the movie sequences
verifies that they can be used to give us a two-dimensional spatial map of the
receptive field. Of course, the results shown here are only a snapshot at a
single time frame. The method can be extended to provide the full
spatio-temporal kernel by having the input to the algorithm represent a recent
spatio-temporal volume of the feature map sequence. This would require the
estimates of more parameters and, as a consequence, more data to obtain a
reliable answer.
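As a rough illustration of this extension, the sketch below stacks the most recent frames of a feature-map sequence into a single input vector per time step. The function name `stack_lags` and the array layout are assumptions made for illustration and are not part of the original analysis; the point is only that the number of parameters grows in proportion to the number of lags.

```python
import numpy as np

def stack_lags(U, n_lags):
    """Return rows [U[t], U[t-1], ..., U[t-n_lags+1]] for every t with a full history.

    U is a (frames x pixels) array holding one flattened feature map per frame.
    """
    T, N = U.shape
    rows = [np.concatenate([U[t - j] for j in range(n_lags)])
            for t in range(n_lags - 1, T)]
    return np.asarray(rows)

U = np.arange(12.0).reshape(4, 3)   # hypothetical: 4 frames, 3 pixels each
X = stack_lags(U, n_lags=2)         # 3 usable time steps, 6 parameters per step
```

For a map with 289 entries per frame, ten lags would already mean 2890 parameters, which is why more data would be needed.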
Using the new method, we were able to demonstrate
contrast invariant edge kernels in both simple and complex cells. Contrast
invariant edge kernels in simple cells have not been previously described. The
model cell described in Figure 2 did not show
suppressive regions in the edge kernel in response to stimulation with the movie
sequences. In some conditions, the model receptive field in
Figure 2 did predict a single excitatory
region in the edge map kernel, centered between the two subfields of the
luminance contrast kernel. Therefore, we can conclude that an oriented linear
filter with a threshold predicts most of the luminance kernel and part of the edge
kernel in V1 simple cells, but predicts neither the position of some
excitatory regions nor the suppressive regions. The decomposition of complex cell
kernels into orientation-specific edge responses indicates that while the
excitatory region arises from the preferred orientation of the cell (as measured
by gratings), some of the antagonistic regions arise from orthogonal
orientations ( Figure 4a). We believe that
this nonlinear suppression represents a novel feature of the receptive field
organization whose spatial extent and orientation tuning have not been
previously characterized.
The method proposed here allows for the calculation of
the neuron’s kernels with respect to different feature maps, such as a
luminance map and the “edge”
map (Figure 3). We noted a number of cases
where the neural response was correlated with
both maps. In principle, such a result
could be simply due to the fact that the maps themselves are correlated. A
principled way to deal with correlated feature maps is dictated by linear
regression theory. If “main effects” are found with
respect to two different feature maps, a next step
would be to build a compound model by defining a new feature map that represents
the concatenation of the two, and run the same
algorithm, which will compute new kernels with respect to these two maps taking
into account any possible cross-correlations. If the feature maps are
approximately orthogonal (i.e., uncorrelated), the resulting kernels are clearly
the same as those obtained by doing a regression on each feature map
individually. This is the case in this study, as the maximal cross-covariance
between the luminance-contrast and “edge”
maps was very small, 0.04, meaning that the maps are nearly orthogonal. As a
consequence, estimating the kernels for each map separately is justified in our case. We note
that the responses of cells to image attributes other than luminance contrast are
consistent with previous data showing that cortical cells may respond to image
boundaries that are not defined by luminance cues alone, such as illusory
contours
( Grosof, Hawken, & Shapley, 1993) and
second order motion
( Mareschal & Baker, 1999). Thus,
we do not believe this phenomenon arises only when using natural image
stimulation.
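The argument about orthogonal feature maps can be checked numerically. The following is a minimal sketch using ordinary least squares on synthetic data rather than the recursive algorithm used in this study; the map sizes, kernels, and noise level are all hypothetical. It compares kernels from a compound (concatenated) model with kernels fitted to each map alone.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 5000, 4                      # hypothetical: T frames, N pixels per map

# Two hypothetical feature maps, uncorrelated with each other by construction.
lum = rng.standard_normal((T, N))
edge = rng.standard_normal((T, N))

# Hypothetical "true" kernels and a noisy linear response.
k_lum = np.array([1.0, -1.0, 0.5, 0.0])
k_edge = np.array([0.0, 0.8, 0.0, -0.3])
r = lum @ k_lum + edge @ k_edge + 0.1 * rng.standard_normal(T)

def fit(X, y):
    """Ordinary least-squares kernel estimate."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Compound model: concatenate the two maps and regress once.
joint = fit(np.hstack([lum, edge]), r)
# Separate models: regress on each map alone.
sep = np.concatenate([fit(lum, r), fit(edge, r)])

max_xcov = np.abs(lum.T @ edge / T).max()   # largest sample cross-covariance
max_diff = np.abs(joint - sep).max()        # disagreement between the two fits
```

When `max_xcov` is small, `max_diff` is small as well; if the two maps were strongly correlated, only the compound fit would recover the kernels correctly.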
We envision techniques similar
to the “feature map” approach proposed here
as potential tools in psychophysical research. In the response classification
images method ( Beard & Ahumada, 1998),
the noise is uncorrelated in space. With correlated noise (such as
bandpass-filtered white noise), simply averaging the noise samples is not the right
calculation to estimate the kernel (or classification image). Instead, the
average classification images should be premultiplied by the inverse of the
noise cross-correlation matrix, as is effectively done by the algorithm in this
study. Also, multiple features (besides the luminance of the images) may
mediate performance in a particular psychophysical task. The technique
described here could allow the investigator to explore such dependencies.
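The correction for correlated noise can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the observer is a hypothetical linear template with a graded (not binary) response, and the noise is white noise passed through an arbitrary two-tap filter. It shows that the plain response-weighted average is a blurred version of the template, while premultiplying by the inverse of the noise correlation matrix recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 20000, 8                     # hypothetical: T trials, N noise pixels

# Spatially correlated noise: each pixel mixes two neighbouring white samples.
white = rng.standard_normal((T, N + 1))
noise = (white[:, :-1] + 0.5 * white[:, 1:]) / np.sqrt(1.25)

# Hypothetical observer template and graded response with internal noise.
template = np.zeros(N)
template[3], template[4] = 1.0, -1.0
resp = noise @ template + 0.5 * rng.standard_normal(T)

raw = noise.T @ resp / T              # plain response-weighted average
C = noise.T @ noise / T               # sample noise correlation matrix
corrected = np.linalg.solve(C, raw)   # same as premultiplying raw by inv(C)
```

Here `raw` approximates the product of `C` and the template, so it has spurious lobes at neighbouring pixels, whereas `corrected` stays close to the template itself.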
It is unknown at present if the kernels obtained using
natural image stimulation are identical to or different from kernels derived
from other stimulus ensembles, such as bars, spots of light, or sinusoidal
gratings ( Ringach et al., 1997b). The
data we have collected so far indicate that mapping receptive fields with
subspace reverse correlation (which uses spatial grating stimuli) and with
natural image sequences yields similar results
( Fig 4). A detailed comparison, however,
requires a larger data set than the one we now have. It is also unknown if
other feature maps, involving more elaborate two-dimensional features, such as
corners and junctions, would be better correlated with the responses of some
neurons. We plan to exploit the method to address these interesting questions
in future work.
It will also be important in the future to address some
of the weaknesses of the technique. In the present experiments we presented
only one trial per movie segment. In part, this was because it was
unknown, in these initial experiments using natural image stimulation, how much
data would be required to estimate the receptive fields. Because the firing
rates are relatively low, it is difficult to obtain from such data sets reliable
estimates of the instantaneous firing rate of the neuron as well as the noise in
its response. This is unfortunate, as these numbers are required to calculate
the amount of response variance explained by each of the kernels. To do so, it
will be necessary to measure a number of repeats for each trial. Another
weakness of the present approach is that the linear model in
equation (1) is not entirely satisfactory
as it ignores nonlinear operations that we know are present in simple cells of
V1, such as cortical gain control
( Carandini et al., 1997). Identifying
nonlinear models from the responses of neurons to natural stimulation is one
area for future research.
Theoretical studies have argued that a critical
component in understanding how the brain processes sensory information is to
investigate the statistical properties of the signals encountered in the natural
environment ( Field, 1987;
Tolhurst, Tadmor, & Chao, 1992;
Olshausen & Field, 1996;
Olshausen & Field, 1997;
Dong & Atick, 1996;
Bell & Sejnowski, 1997;
van Hateren, 1998). A complementary
line of research is to explore how the cortex processes this particular ensemble
of signals. The use of natural stimuli to study the physiology of the visual
system has up to now been limited
( van Hateren, 1987;
Dan, Atick, & Reid, 1996;
Baddeley et al., 1997;
Gallant, Connor, & van Essen, 1998).
Here we showed, for the first time, that it is experimentally feasible to measure
the receptive field structure of visual neurons from their responses to natural
image sequences. This methodology may pave the way to evaluating the
similarities and differences in visual cortical processing when the cortex is
faced with stimulus ensembles of varying complexity. The method may also
generalize the classification image technique so that correlated noise and
multiple feature maps can be used in the study of human psychophysical
performance.
In this paper we restrict the analysis and attempt to
predict the response at time
 from the feature
map at time  . The response at
time  was defined as
the total number of spikes in the segment. We picked a window width of
 ms and used a
fixed delay  ms (this is the
average delay in our population of V1 cells;
Ringach et al., 1997a). The central
portion of the feature map was subsampled on a square grid of 17
× 17. The visual area
represented by this grid was varied from cell to cell to make sure it covered
their receptive fields. These data were arranged in a column data vector,
u( t),
having 289 entries.
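The construction of the data vector can be sketched as follows. The frame size, the width of the central window, and the use of nearest-pixel sampling are assumptions for illustration; the text specifies only that the central portion was subsampled on a 17 × 17 grid and arranged as a 289-entry column.

```python
import numpy as np

def feature_vector(feature_map, extent, grid=17):
    """Sample the central extent x extent pixels on a grid x grid lattice and flatten."""
    h, w = feature_map.shape
    r0, c0 = (h - extent) // 2, (w - extent) // 2     # top-left corner of the window
    # Evenly spaced sample positions across the window (nearest pixel).
    idx = np.round(np.linspace(0, extent - 1, grid)).astype(int)
    patch = feature_map[r0 + idx][:, c0 + idx]
    return patch.reshape(-1)                          # 289 entries when grid == 17

frame = np.arange(128 * 128, dtype=float).reshape(128, 128)  # hypothetical feature map
u = feature_vector(frame, extent=64)
```

Changing `extent` from cell to cell plays the role of varying the visual area covered by the grid.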
A variant of the recursive least-squares (RLS)
algorithm was implemented in Matlab (Mathworks, Natick, MA) to process the data.
The analysis was run on an SGI Onyx 2. The algorithm is described in Table 13.2
in Haykin (1991). Essentially, it
consists of two main steps: forward prediction and adaptation. The forward
prediction stage is when the present estimate of the kernel is used to predict
the neuron's response and errors in prediction are computed. The adaptation
step is when the kernel estimate is updated with a correction factor to bring
the estimate closer to the true kernel. It is in the computation of the
correction factor that the correlations in the image statistics enter the
algorithm. The mathematical derivation of the algorithm can be found in
Haykin (1991). Pseudo code follows:
Figure 7.
Pseudo-code of a modified RLS algorithm used to compute the optimal linear
kernels in this study.
Here, the variables have been discretized in space and
time:
I( i,j,n)
represents the image at location ( i,j)
for the n-th stimulus frame in the
movie sequence and similarly for the other variables. The variable
w( n)
is an N × 1 vector
representing the estimate of the weights at time step
n. When we begin the process we have
no data, so we set the initial value of
w to zero (line 1).
N is the total number of parameters to
be estimated. In our case we have N =
289 parameters.
P( n)
is an N × N matrix
representing a recursive estimate of the inverse of the correlation matrix, and
 is a small
number. Two modifications were done to the standard RLS algorithm. First, we
added a recursive estimate of the mean response of the cell µ that is
subtracted from the response
r( n)
at each step (lines 8 and 9). This is done to factor out slow trends in the
excitability of the neuron, as we are only interested in explaining departures
in the response of the cell from its mean. The forgetting factor
 corresponds to a
time constant of ≈ 6 s. A second
modification is the spatial smoothing of the estimated coefficients in lines 18
and 19. The standard RLS algorithm does not include any knowledge about the
spatial relationship between the different coordinates. Here, we chose to
smooth the estimates with a
 pixel Gaussian
kernel every Q = 450 frames (equivalent
to 30 s of video). The smoothing kernel had a time-varying width given by
 pixels. Our
simulations indicated that adding this sort of “annealing”
smoothing step increases the convergence rate of the algorithm.
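The two stages described above (forward prediction, then adaptation) can be sketched in a few lines. This is a minimal illustration of a standard exponentially weighted RLS update with a running-mean subtraction and periodic spatial smoothing, and it is not a reproduction of Figure 7: the function names, the values of `delta`, `lam`, and `mean_rate`, the shrinking smoothing schedule, and the toy 5 × 5 stimulus are all assumptions.

```python
import numpy as np

def gaussian_smooth(img, sigma):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    g /= g.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), 0, rows)

def rls_kernel(U, r, grid, delta=1e-2, lam=1.0, mean_rate=0.01, Q=450, sigma0=1.0):
    """Estimate a linear kernel from inputs U (frames x pixels) and responses r."""
    T, N = U.shape
    w = np.zeros(N)              # no data yet, so start from a zero kernel
    P = np.eye(N) / delta        # recursive estimate of the inverse correlation matrix
    mu = 0.0                     # running estimate of the mean response
    for n in range(T):
        u = U[n]
        mu += mean_rate * (r[n] - mu)    # track slow trends in excitability
        d = r[n] - mu                    # response relative to its running mean
        err = d - w @ u                  # forward prediction error
        Pu = P @ u
        k = Pu / (lam + u @ Pu)          # gain vector; input correlations enter here
        w = w + k * err                  # adaptation step
        P = (P - np.outer(k, Pu)) / lam
        if (n + 1) % Q == 0:             # periodic spatial smoothing of the estimate
            sigma = sigma0 / (1 + (n + 1) / Q)   # hypothetical shrinking width
            w = gaussian_smooth(w.reshape(grid, grid), sigma).reshape(-1)
    return w, mu

# Toy check with a hypothetical 5 x 5 kernel and spatially correlated input.
rng = np.random.default_rng(0)
grid, T = 5, 3000
white = rng.standard_normal((T, grid * grid))
U = white + 0.5 * np.roll(white, 1, axis=1)
k_true = np.zeros((grid, grid))
k_true[:, 1], k_true[:, 3] = 1.0, -1.0
k_true = k_true.reshape(-1)
r = U @ k_true + 2.0 + 0.5 * rng.standard_normal(T)
w_hat, mu_hat = rls_kernel(U, r, grid)
corr = np.corrcoef(w_hat, k_true)[0, 1]   # similarity of estimated and true kernel
```

Because the smoothing width shrinks over time in this sketch, early estimates are strongly regularized while later ones are left almost untouched.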
There are important convergence results of the RLS
algorithm that are worth mentioning here
(Haykin, 1991). First, the estimate of
w converges in the mean to the true kernel. In other words, the estimate of the
receptive field is unbiased. Second, the variance of the prediction error
converges to the variance of the noise in the system, i.e., under the
assumption that the response of the neuron is contaminated by independent noise,
the variance of the response prediction and the true response are equal. Thus,
we are guaranteed that the model in
Equation (1) will match both the mean and
variance of the neural response.
One way to experimentally investigate the convergence
of the algorithm when the true value of
w is unknown (such as when we apply the
algorithm to real data) consists in calculating the relative change in the norm
of w after
Q steps of the RLS
algorithm (Equation 4).
The magnitude of changes in
w will never decrease beyond a lower
bound set by the noise in the system. Thus, we expect
 to decrease and
asymptote at some finite value. At this point we considered the algorithm to
have converged on the mean. After this time the values of
w( n)
may be averaged to yield more accurate estimates. In the population of cells
studied the algorithm converged, on average, after 15 minutes of video.
Finally, in those cases where the calculation of the feature map required an
estimate of the gradient, Sobel operators were used
( Pratt, 1991).
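The convergence check described above can be sketched as follows. The exact form of Equation (4) is not reproduced here, so the ratio used below (norm of the change in w over one block of Q steps, divided by the norm of the current estimate) is an assumption, as are the function names, the plateau tolerance, and the example numbers.

```python
import numpy as np

def relative_change(w_prev, w_curr):
    """Norm of the change in w over one block, relative to the current norm."""
    return np.linalg.norm(w_curr - w_prev) / np.linalg.norm(w_curr)

def first_plateau(changes, tol=0.1):
    """Index of the first block whose relative change differs from the
    previous block's by at most a fraction tol, or None if there is none."""
    for i in range(1, len(changes)):
        if abs(changes[i] - changes[i - 1]) <= tol * changes[i - 1]:
            return i
    return None

# Hypothetical sequence of relative changes, one value per block of Q frames.
changes = [0.8, 0.4, 0.2, 0.11, 0.105, 0.1]
plateau = first_plateau(changes)
```

Estimates of w taken after the plateau index would then be the ones to average.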
This research was supported by National Institutes of
Health Grants EY-12816 (D.L.R.), EY-08300 (M.J.H.), and EY-01472 (R.S.), and a
Sloan Foundation grant to New York University in support of their Theoretical
Neuroscience Program. Commercial relationships: None.
Andrews, B. W., &
Pollen, D. A. (1979). Relationship between spatial frequency selectivity and
receptive field profile of simple cells.
Journal of Physiology,
287, 163-176.
Beard, B. L., & Ahumada,
A. J. (1998). A technique to extract relevant image features for visual tasks.
Human Vision and Electronic Imaging III, SPIE
Proceedings, 3299, 79-85.
Baddeley, R., Abbott, L.
F., Booth, M. C., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T.
(1997). Responses of neurons in primary and inferior temporal visual cortices to
natural scenes. Proceedings of the Royal
Society of London. Series B: Biological Sciences,
264, 1775-1783.
Bell, A. J., & Sejnowski,
T. J. (1997). The "independent components" of natural scenes are edge filters.
Vision Research,
37, 3327-3338.
Carandini, M., Heeger, D.
J., & Movshon, J. A. (1997). Linearity and normalization in simple cells of
the macaque primary visual cortex. Journal of
Neuroscience, 17, 8621-8644.
Chen, C. C., Kasamatsu, T.,
Polat, U., & Norcia, A. M. (2001). Contrast response characteristics of
long-range interactions in cat striate cortex.
Neuroreport, 12, 655-661.
Dan, Y., Atick, J. J., &
Reid, R. C. (1996). Efficient coding of natural scenes in the lateral geniculate
nucleus: Experimental test of a computational theory.
Journal of Neuroscience,
16, 3351-3356.
DeBoer, E., & Kuyper, P.
(1968). Triggered correlation. IEEE
Transactions on Biomedical Engineering,
15, 169-179.
DeValois, R., &
DeValois, K. K. (1988). Spatial Vision. New York: Oxford University Press.
DeValois, R. L., Albrecht,
D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in
macaque visual cortex. Vision Research,
22, 545-559.
DiCarlo, J. J., Johnson, K.
O., & Hsiao, S. S. (1998). Structure of receptive fields in area 3b of
primary somatosensory cortex in the alert monkey.
Journal of Neuroscience, 18, 2626-2645.
Dong, D. W., & Atick, J. J.
(1996). Statistics of natural time-varying image.
Network Computation in Neural Systems,
6, 345-358.
Field, D. J. (1987). Relations
between the statistics of natural images and the response properties of cortical
cells. Journal of the Optical Society of
America A, 4, 2379-2394.
Gallant, J. L., Connor, C.
E., & van Essen, D. C. (1998). Neural activity in areas V1, V2 and V4 during
free viewing of natural scenes compared to controlled viewing.
Neuroreport,
9, 2153-2158.
Golub, G. H., & van Loan,
C. F. (1989). Matrix computations. Baltimore: Johns Hopkins University
Press.
Grosof, D. H., Hawken, M. J.,
& Shapley, R. M. (1993). Macaque V1 neurons can signal
‘illusory’ contours.
Nature,
365, 550-552.
Haykin, S. (1991). Adaptive
Filter Theory (2nd ed.). Prentice-Hall.
Hubel, D. H., & Wiesel, T.
N. (1968). Receptive fields and functional architecture of monkey striate
cortex. Journal of Physiology (London),
195, 215-245.
Johnson, E. N., Hawken, M.
J., & Shapley, R. M. (2001). The spatial transformation of color in the
primary visual cortex of the macaque monkey.
Nature Neuroscience, 4, 409-416.
Jones, J. P., & Palmer, L.
A. (1987). The two-dimensional spatial structure of simple receptive fields in
the cat striate cortex. Journal of
Neurophysiology, 58, 1187-1258.
Kapadia, M. K., Westheimer,
G., & Gilbert, C. D. (2000). Spatial distribution of contextual
interactions in primary visual cortex and in visual perception.
Journal of Neurophysiology, 84,
2048-2062.
Knierim, J. J., & van
Essen, D. C. (1992). Neuronal responses to static texture patterns in area V1 of
the alert macaque monkey. Journal of
Neurophysiology, 67, 961-980.
Lee, Y.W., & Schetzen, M.
(1965). Measurement of the Wiener kernels of a nonlinear system by
cross-correlation. International Journal of
Control, 2, 237-254.
Levitt, J. B., & Lund, J.
S. (1997). Contrast dependence of contextual effects in primate visual cortex.
Nature, 387, 73-76.
Mareschal, I., &
Baker, C. L., Jr. (1999). Cortical processing of second-order motion.
Visual Neuroscience, 16, 527-540.
Marmarelis, P. N., &
Marmarelis, V. Z. (1978). Analysis of Physiological Systems: The White Noise
Approach. New York: Plenum Press.
Morrone, M. C., Burr, D. C.,
& Maffei, L. (1982). Functional implications of cross-orientation inhibition
of cortical visual cells. I. Neurophysiological evidence.
Proceedings of the Royal Society of London.
Series B: Biological Sciences, 216, 335-354.
Movshon, J. A., Thompson,
I. D., & Tolhurst, D. J. (1978a). Spatial summation in the receptive fields
of simple cells in the cat's striate cortex.
Journal of Physiology (London), 283,
53-77.
Movshon, J. A., Thompson,
I. D., & Tolhurst, D. J. (1978b). Receptive field organization of complex
cells in the cat's striate cortex. Journal of
Physiology (London), 283, 79-99.
Olshausen, B. A., &
Field, D. J. (1996). Emergence of simple-cell receptive field properties by
learning a sparse code for natural images.
Nature, 381, 607-609.
Olshausen, B. A., &
Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy
employed by V1. Vision Research, 37,
3311-3325.
Parker, A. J., & Hawken,
M. J. (1988). Two-dimensional spatial structure of receptive fields in monkey
striate cortex. Journal of the Optical Society
of America A, 5, 598-605.
Pratt, W. K. (1991). Digital
Image Processing (2nd. ed.). New York: John Wiley & Sons.
Rieke, F., Warland, D., van
Steveninck, R., & Bialek, W. (1997). Spikes. Boston: Massachusetts Institute
of Technology Press.
Ringach, D. L., Hawken, M.
J., & Shapley, R. (1997a). Dynamics of orientation tuning in macaque primary
visual cortex. Nature, 387, 281-284.
Ringach, D. L., Sapiro, G.,
& Shapley, R. (1997b). A subspace reverse correlation technique for the
study of visual neurons. Vision
Research, 37, 2455-2464.
Sceniak, M. P., Ringach, D.
L., Hawken, M. J., & Shapley, R. (1999). Contrast's effect on spatial
summation by V1 neurons. Nature Neuroscience,
2, 733-739.
Sigman, M., Cecchi, G. A.,
Gilbert, C. D., & Magnasco, M. O. (2001). On a common circle: Natural
scenes and Gestalt rules. Proceedings of the
National Academy of Sciences of the United States of America, 98,
1935-1940.
Sillito, A. M., Grieve, K.
L., Jones, H. E., Cudeiro, J., & Davis, J. (1995). Visual cortical
mechanisms detecting focal orientation discontinuities.
Nature, 378, 439-440.
Skottun, B. C., DeValois, R.
L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991).
Classifying simple and complex cells on the basis of response modulation.
Vision Research, 31, 1079-1086.
Spitzer, H., &
Hochstein, S. (1985). A complex-cell receptive-field model.
Journal of Neurophysiology, 53,
1266-1286.
Szulborski, R. G., &
Palmer, L. A. (1990). The two-dimensional spatial structure of nonlinear
subunits in the receptive fields of complex cells.
Vision Research, 30, 249-254.
Theunissen, F. E., David,
S. V., Singh, N. C., Hsu, A., Vinje, W. E., & Gallant, J. L. (2001).
Estimating spatio-temporal receptive fields of auditory and visual neurons from
their responses to natural stimuli. Network,
12, 289-316.
Tolhurst, D. J., &
Dean, A. F. (1990). The effects of contrast on the linearity of the spatial
summation of simple cells in the cat's striate cortex.
Experimental Brain Research, 79,
582-588.
Tolhurst, D. J., &
Heeger, D. J. (1997). Comparison of contrast-normalization and threshold models
of the responses of simple cells in cat striate cortex.
Visual Neuroscience, 14, 293-309.
Tolhurst, D. J., Tadmor,
Y., & Chao, T. (1992). Amplitude spectra of natural images.
Ophthalmic and Physiological Optics, 12,
229-232.
van Hateren, J. H.
(1987). Processing of natural time series of intensities by the visual system of
the blowfly. Journal of the Optical Society of
America A, 37, 3407-3416.
van Hateren, J. H., &
van der Schaaf, A. (1998). Independent component filters of natural images
compared with simple cells in primary visual cortex.
Proceedings of the Royal Society of London B,
265, 359-366.
Victor, J. D. (1992).
Nonlinear systems analysis in vision: Overview of kernel methods. In R. Pinter,
B. Nabet (Eds.), Nonlinear Vision: Determination of Neural Receptive Fields,
Function and Networks, (Vol. 1, pp. 1-37). Cleveland: CRC Press.
Walker, G. A., Ohzawa, I.,
& Freeman, R. D. (1999). Asymmetric suppression outside the classical
receptive field of the visual cortex. Journal
of Neuroscience, 19, 10536-10553.
Walker, G. A., Ohzawa, I.,
& Freeman, R. D. (2000). Suppression outside the classical cortical
receptive field. Visual Neuroscience, 17,
369-379.
Zar, J. H. (1996).
Biostatistical Analysis (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Zipser, K., Lamme, V. A.,
& Schiller, P. H. (1996). Contextual modulation in primary visual cortex.
Journal of Neuroscience, 16, 7376-7389.