Investigating Visual Perception Using Computational Analysis
Perceiving the world around us, including objects and our loved ones, may seem like a straightforward and ordinary task for most people. However, is it truly that simple? When we gaze at a person or an object, our eyes receive only rays of reflected light. So, how does our brain transform this light into meaningful scenes? Upon pondering this question, it becomes evident that the visual perception process is more intricate than we might imagine.
Link to Publication: Population spatial frequency tuning in human early visual cortex
My Research Question

To explain the problem I tackled in my research, let's first take a quick look at the visual perception process and what we know about it. As shown in the figure above, the light reflected from our surroundings reaches our eyes, is transformed into electrical signals at the back of the retina, and then travels to a nucleus in the thalamus called the lateral geniculate nucleus (LGN). From there, the signal is sent to the occipital cortex, near the back of the brain, to start its journey through the visual processing pathways, with some cortical regions downstream along this pathway (the light pink oval in the figure above) and others upstream (the bright pink circles).
According to the popular hypothesis, what happens from there on is a hierarchical processing of the information: the lower cortical areas process the signal in search of simple visual features, and the higher cortical areas build upon their responses to recognize what an object is and where it is located.
To put it another way, neurons in the lower stages (i.e., the occipital cortex) act as filters, and their level of activity depends on the basic visual features of what is in sight. For example, some neurons become highly active when encountering a horizontally oriented edge in the scene, while others respond most strongly to vertical features. This happens because of their selectivity for line orientation. Another basic image feature they are selective for is spatial frequency*.

*Spatial Frequency (SF): in vision research, this refers to the rate of change of intensity across space.
High SF content carries fine details with rapid changes, while low SF content captures broader structure, like overall shapes and contours.
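To make this concrete, here is a minimal sketch (not taken from the study) that generates sinusoidal gratings at a low and a high spatial frequency; the function name and the cycles_per_image parameter are illustrative choices, not anything from the experiment.

```python
import numpy as np

def make_grating(size=256, cycles_per_image=4, orientation_deg=0):
    """Sinusoidal luminance grating; more cycles per image means a higher SF."""
    y, x = np.mgrid[0:size, 0:size] / size             # pixel coordinates scaled to [0, 1)
    theta = np.deg2rad(orientation_deg)
    ramp = x * np.cos(theta) + y * np.sin(theta)        # position along the grating axis
    return np.sin(2 * np.pi * cycles_per_image * ramp)

low_sf = make_grating(cycles_per_image=2)    # broad bars: coarse shapes and contours
high_sf = make_grating(cycles_per_image=32)  # narrow bars: fine, rapidly changing detail
```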
Although the responses of these lower-stage filters have been extensively characterized in animals, not all of their aspects have been fully characterized in humans. Direct recordings from neurons in animals enable the mapping of their preferences, but due to the invasive nature of such recordings, they are far from ideal for human investigations. An alternative approach in humans involves using brain scans, with the caveat that instead of obtaining responses from a single neuron, we record responses from a population of neurons contained in what we call a "voxel."
In my PhD projects, I concentrated on characterizing these voxel-based filters for the basic feature of spatial frequency. I captured brain images using functional magnetic resonance imaging (fMRI) and estimated the characteristics of these filters in the lower visual areas of the human brain. Then, by designing new experiments, I examined how attention during a visual task affects these characteristics.
Each voxel doesn't exclusively respond to a single frequency; instead, it is responsive to a spectrum of frequencies. Characterizing the entire profile of these responses for all the voxels within early visual areas (called V1-V3 regions) using conventional fMRI analysis methods proved inefficient. Hence, we introduced a novel technique grounded in modeling and computational analysis. This more sophisticated approach facilitated a quick and efficient estimation of these characteristics, resulting in the successful creation of comprehensive maps that illustrate the selectivity profiles across the early visual areas.
Our analysis method was built upon the widely accepted assumption that the relationship between the neural response to a stimulus and the recorded fMRI signal, known as the BOLD signal, is linear. From linear systems theory, we know that the output of such a system is the convolution of the input with the system's impulse response function. Therefore, the BOLD response time series can be considered the convolution of the voxel's response to an input stimulus with our system's impulse response function, called the hemodynamic impulse response function (HIRF).
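As a rough illustration of this assumption (a sketch, not the study's actual pipeline), the snippet below convolves a hypothetical neural response time course with a canonical double-gamma HIRF to produce a predicted BOLD time series; the TR and HIRF parameters are assumed values.

```python
import numpy as np
from scipy.stats import gamma

TR = 1.0                                        # sampling interval in seconds (assumed)
t = np.arange(0, 30, TR)
hirf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6   # canonical double-gamma shape (assumed)
hirf /= hirf.sum()                              # normalize to unit area

neural = np.zeros(200)
neural[20:30] = 1.0                             # a brief block of neural activity

# Linearity assumption: the measured BOLD signal is the neural response convolved with the HIRF
bold = np.convolve(neural, hirf)[:len(neural)]
```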
Computational Modeling Approach

We didn't need to measure the HIRF, as it has been measured and modeled in previous studies. The stimuli were also known because they were essentially the spatial frequencies shown to the participants during the experiment. In this equation, the unknown factor was the voxel’s response to different spatial frequencies, which was our primary focus:
(Recorded BOLD Signal) = HIRF*(voxel's response to SFs), where * denotes convolution
Given that a voxel represents a population of neurons, we termed this response profile as the population spatial frequency tuning (pSFT):
(Recorded BOLD Signal) = HIRF*(pSFT)
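In code, this forward model might look like the sketch below (assumed function and parameter names, building on the HIRF sketch above): a candidate pSFT, a Gaussian over log spatial frequency, is evaluated at the SF shown at each time point and convolved with the HIRF to predict the BOLD signal.

```python
import numpy as np

def log_gaussian_psft(sf, mu, sigma):
    """Band-pass tuning curve: a Gaussian over the log of spatial frequency."""
    return np.exp(-(np.log(sf) - np.log(mu)) ** 2 / (2 * sigma ** 2))

def predict_bold(stim_sf, mu, sigma, hirf):
    """Predicted BOLD: the pSFT evaluated at the presented SFs, convolved with the HIRF.

    stim_sf -- array of the spatial frequency shown at each time point (0 when no stimulus is on screen)
    """
    neural = np.zeros_like(stim_sf, dtype=float)
    shown = stim_sf > 0
    neural[shown] = log_gaussian_psft(stim_sf[shown], mu, sigma)
    return np.convolve(neural, hirf)[:len(stim_sf)]
```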
Drawing inspiration from animal studies, we modeled the pSFT as a band-pass filter in the form of a log-Gaussian function. The parameters of this function were unknown for all the voxels within the visual areas, and our specific aim was to determine them and then investigate how they change across these areas. For each voxel, this model with unknown parameters was convolved with the HIRF and then compared with the recorded BOLD signal for the same voxel. Through an optimization procedure using the coefficient of determination (r-squared) and a grid search, we found the most plausible parameters (the mean and standard deviation) of the log-Gaussian function that represented the pSFT. Computing these two parameters for all the voxels then allowed us to create maps of these preferences for each of the visual areas. An example figure is shown below.

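A toy version of that grid search, reusing predict_bold from the sketch above, might look like this; the grid ranges and the amplitude-scaling step are assumptions made for illustration, not the study's exact settings.

```python
import numpy as np

def r_squared(measured, predicted):
    """Coefficient of determination between the data and the model prediction."""
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return 1 - ss_res / ss_tot

def fit_psft(measured_bold, stim_sf, hirf,
             mu_grid=np.logspace(-1, 1.5, 40),        # candidate preferred SFs (assumed range)
             sigma_grid=np.linspace(0.1, 3.0, 30)):   # candidate tuning widths (assumed range)
    """Grid search over (mu, sigma) for the best-fitting pSFT of a single voxel."""
    best_mu, best_sigma, best_r2 = None, None, -np.inf
    for mu in mu_grid:
        for sigma in sigma_grid:
            pred = predict_bold(stim_sf, mu, sigma, hirf)   # forward model from the sketch above
            # Fit an amplitude scale before scoring (treated here as a nuisance parameter)
            scale = np.dot(pred, measured_bold) / np.dot(pred, pred)
            r2 = r_squared(measured_bold, scale * pred)
            if r2 > best_r2:
                best_mu, best_sigma, best_r2 = mu, sigma, r2
    return best_mu, best_sigma, best_r2   # preferred SF, tuning width, goodness of fit
```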