Yagmur Gucluturk’s research while affiliated with Radboud University and other places


Publications (15)


Figure 4: Reconstruction results. The upper and lower blocks show ten arbitrary yet representative examples from the B2G dataset (GAN-synthesized stimuli) and the GOD dataset (natural stimuli), respectively. The top rows display the originally perceived stimuli, the middle rows the reconstructions by PAM (P), and the bottom rows the reconstructions by the linear decoder baseline (L).
Figure 6: Attention-weighted values by PAM. The graphs visualize the distribution of 512-dimensional attention weights across the visual areas (V1, V4, and IT for B2G; V1, V2, V3, V4, LOC, and FFA for GOD) for two stimulus examples ('stim'; on the right of the graph). The black line plot denotes the mean attention per neural area. Note the gradual increase in attention from upstream to downstream visual areas (more subtle for GOD). Below each label on the x-axis, we visualized the visual information in the corresponding values by feeding them to the generator of the GAN. We then took a weighted combination of the values and the attention weights to obtain the final latent corresponding to the final reconstruction ('recon'; displayed on the right, below the stimulus). For the B2G example, V4's visualized value in particular seems to resemble the stimulus. For the GOD example, the warm colors and the dotted pattern of the panther seem to be reflected in the visualized value of V4, but not necessarily in the final reconstruction itself.
PAM: Predictive Attention Mechanism for Neural Decoding of Visual Perception
  • Preprint
  • File available

June 2024 · 388 Reads · [...] · Umut Guclu

Attention mechanisms enhance deep learning models by focusing on the most relevant parts of the input data. We introduce predictive attention mechanisms (PAMs) -- a novel approach that dynamically derives queries during training, which is beneficial when predefined queries are unavailable. We applied PAMs to neural decoding, a field challenged by the inherent complexity of neural data that prevents access to queries. Concretely, we designed a PAM to reconstruct perceived images from brain activity via the latent space of a generative adversarial network (GAN). We processed stimulus-evoked brain activity from various visual areas with separate attention heads, transforming it into a latent vector, which was then fed to the GAN's generator to reconstruct the visual stimulus. Driven by prediction-target discrepancies during training, PAMs optimized their queries to identify and prioritize the most relevant neural patterns that required focused attention. We validated our PAM with two datasets: the first dataset (B2G) with GAN-synthesized images, their original latents and multi-unit activity data; the second dataset (GOD) with real photographs, their inverted latents and functional magnetic resonance imaging data. Our findings demonstrate state-of-the-art reconstructions of perception and show that attention weights increasingly favor downstream visual areas. Moreover, visualizing the values from different brain areas enhanced interpretability in terms of their contribution to the final image reconstruction. Interestingly, the values from downstream areas (IT for B2G; LOC for GOD) appeared visually distinct from the stimuli despite receiving the most attention. This suggests that these values help guide the model to important latent regions, integrating information necessary for high-quality reconstructions. Taken together, this work advances visual neuroscience and sets a new standard for machine learning applications in interpreting complex data.
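To make the mechanism more concrete, the sketch below shows one way a learned-query ("predictive") attention module over per-area brain responses could be organized in PyTorch. All names, layer choices, and unit counts are illustrative assumptions rather than the authors' implementation; the sketch only mirrors the idea of per-dimension attention weights distributed across visual areas, whose weighted values form the GAN latent.

```python
import torch
import torch.nn as nn

class PredictiveAttention(nn.Module):
    """Attention with learned queries over per-area neural responses (illustrative sketch)."""
    def __init__(self, areas, area_dims, latent_dim=512):
        super().__init__()
        self.areas = areas
        # One learned query per visual area; in a PAM these would be optimized
        # from prediction-target discrepancies during training.
        self.queries = nn.ParameterDict({a: nn.Parameter(torch.randn(latent_dim)) for a in areas})
        self.key_proj = nn.ModuleDict({a: nn.Linear(area_dims[a], latent_dim) for a in areas})
        self.val_proj = nn.ModuleDict({a: nn.Linear(area_dims[a], latent_dim) for a in areas})

    def forward(self, responses):
        # responses: dict mapping area name -> tensor of shape (batch, area_dim)
        keys = torch.stack([self.key_proj[a](responses[a]) for a in self.areas], dim=1)
        vals = torch.stack([self.val_proj[a](responses[a]) for a in self.areas], dim=1)
        q = torch.stack([self.queries[a] for a in self.areas], dim=0)   # (areas, latent)
        # Per-dimension attention weights across areas (cf. the 512-d weights in Fig. 6).
        attn = (q.unsqueeze(0) * keys).softmax(dim=1)                   # (batch, areas, latent)
        latent = (attn * vals).sum(dim=1)                               # (batch, latent)
        return latent, attn   # latent would be fed to a frozen GAN generator

dims = {"V1": 300, "V4": 200, "IT": 150}          # made-up unit counts per area
pam = PredictiveAttention(list(dims), dims)
latent, attn = pam({a: torch.randn(4, d) for a, d in dims.items()})
```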


Fig. 5. Primary results of the mobility task in Experiment 1. A) Average number of collisions per trial (N=23, Paired-samples t-test). B) Average trial duration in seconds (N=23, Wilcoxon signed-rank test). *p < 0.0167; **p < 0.003; ***p < 0.0003.
Fig. 7. Example eye-gaze trajectories (raw data) in three trials of a representative participant in Experiment 2. Colors indicate the elapsed time. Each of the visualized trials had a different study condition (from left to right: gaze-locked, gaze-contingent, and gaze-ignored).
Fig. 8. Analysis of the angular eye- and head velocity in Experiment 2. A) Average eye velocity. B) Average head velocity. *p < 0.0167; **p < 0.003; ***p < 0.0003 (N=19, Wilcoxon signed-rank test).
Average age and height (± standard deviation) of the study participants.
Gaze-contingent processing improves mobility performance and visual orientation in simulated head-steered prosthetic vision

September 2023 · 52 Reads

The enabling technology of visual prosthetics for the blind is making rapid progress. However, there are still uncertainties regarding the functional outcomes, which can depend on many design choices in the development. In visual prostheses with a head-mounted camera, a particularly challenging question is how to deal with the gaze-locked visual percept associated with spatial updating conflicts in the brain. A recently proposed compensation strategy is gaze-contingent image processing with eye-tracking, which enables natural visual scanning and re-established spatial updating based on eye movements. The current study evaluates the benefits of gaze-contingent processing versus gaze-locked and gaze-ignored simulations in the context of mobility and orientation, using a simulated prosthetic vision paradigm with sighted subjects. Compared to gaze-locked vision, gaze-contingent processing was found to improve the speed in all experimental tasks, as well as the subjective quality of vision. Similar or further improvements were found in a control condition that ignores gaze-dependent effects, a simulation that is unattainable in clinical reality. Our results suggest that gaze-locked vision and spatial updating conflicts can be debilitating for complex visually guided activities of daily living such as mobility and orientation. Therefore, for prospective users of head-steered prostheses with an unimpaired oculomotor system, the inclusion of a compensatory eye-tracking system is strongly endorsed.
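As a rough illustration of the gaze-contingent idea (not the study's actual image-processing pipeline), the snippet below samples the region of a head-mounted camera frame around the tracked gaze position, so that eye movements shift the simulated percept; in the gaze-locked condition the sampled window would instead stay fixed relative to the camera. The window size and clamping behaviour are assumptions.

```python
import numpy as np

def sample_window(frame, gaze_xy, window=128):
    """Crop a square region of `frame` centred on the gaze position (in pixels)."""
    h, w = frame.shape[:2]
    half = window // 2
    cx = int(np.clip(gaze_xy[0], half, w - half))
    cy = int(np.clip(gaze_xy[1], half, h - half))
    return frame[cy - half:cy + half, cx - half:cx + half]

frame = np.random.rand(480, 640)                              # stand-in camera frame
gaze_contingent = sample_window(frame, gaze_xy=(420, 180))    # window follows the eyes
gaze_locked     = sample_window(frame, gaze_xy=(320, 240))    # window stays at frame centre
```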


Fig. 2. Performance as a function of resolution and number of phosphenes. The data are based on 5 runs of 1540 frames per condition, with a batch size of 1 frame. The simulation was run on an NVIDIA A30 GPU. Crosses indicate missing conditions.
Fig. 3. Estimate of the relative phosphene brightness for different stimulation amplitudes. The simulator was provided with a stimulation train of 166 ms with a pulse width of 170 µs at a frequency of 300 Hz (see equations 7-10). Left: the predicted peak brightness levels reproduced by our model (red) and psychometric data reported by (6) (light blue). Note that for stimulation amplitudes of 20.0 µA and lower, the simulator generated no phosphenes, as the threshold for activation was not reached. Right: the modeled tissue activation and brightness response over time. Values below the 50% threshold for tissue activation and the corresponding brightness values are displayed with dashed lines.
Fig. 5. Relative brightness of a phosphene in response to repeated stimulation, overlaid on experimental results by (23). The stimulation sequence consisted of 50 pulse trains at a four-second stimulation interval, followed by five pulse trains at an interval of four minutes to test recovery. Note the split x-axis with variable scaling.
Fig. 6. Schematic illustration of the end-to-end machine-learning pipeline adapted from (7). A convolutional neural network encoder is trained to convert input images or video frames into a suitable electrical stimulation protocol. In the training procedure, our simulator generates a simulation of the expected prosthetic percept, which is evaluated by a second convolutional neural network that decodes a reconstruction of the input image. The quality of the encoding is iteratively optimized by updating the network parameters using back-propagation. Different loss terms can be used to constrain the phosphene encoding, such as the reconstruction error between the reconstruction and the input, a regularization loss between the phosphenes and the input, or a supervised loss term between the reconstructions and some ground-truth labeled data (not depicted here). Note that the internal parameters of the simulator (e.g. the estimated tissue activation) can also be used as loss terms.
Fig. 7. Results of training the end-to-end pipeline on video sequences from the moving MNIST dataset (58). Columns indicate different frames. Top row: the input frames; middle row: the simulated phosphene; bottom row: the decoded reconstruction of the input. This figure is best viewed in digital format.
Biologically plausible phosphene simulation for the differentiable optimization of visual cortical prostheses

December 2022 · 152 Reads · 1 Citation

Blindness affects millions of people around the world, and is expected to become increasingly prevalent in the years to come. For some blind individuals, a promising solution to restore a form of vision is a cortical visual prosthesis, which converts camera input into electrical stimulation of the cortex to bypass part of the impaired visual system. Due to the constrained number of electrodes that can be implanted, the artificially induced visual percept (a pattern of localized light flashes, or 'phosphenes') is of limited resolution, and a great portion of the field's research attention is devoted to optimizing the efficacy, efficiency, and practical usefulness of the encoding of visual information. A commonly exploited method is the non-invasive functional evaluation in sighted subjects or with computational models by making use of simulated prosthetic vision (SPV) pipelines. Although the SPV literature has provided us with some fundamental insights, an important drawback that researchers and clinicians may encounter is the lack of realism in the simulation of cortical prosthetic vision, which limits the validity for real-life applications. Moreover, none of the existing simulators address the specific practical requirements for the electrical stimulation parameters. In this study, we developed a PyTorch-based, fast and fully differentiable phosphene simulator. Our simulator transforms specific electrode stimulation patterns into biologically plausible representations of the artificial visual percepts that the prosthesis wearer is expected to see. The simulator integrates a wide range of both classical and recent clinical results with neurophysiological evidence in humans and non-human primates. The implemented pipeline includes a model of the retinotopic organisation and cortical magnification of the visual cortex. Moreover, the quantitative effects of stimulation strength, duration, and frequency on phosphene size and brightness, as well as the temporal characteristics of phosphenes, are incorporated in the simulator. Our results demonstrate the suitability of the simulator both for computational applications such as end-to-end deep learning-based prosthetic vision optimization and for behavioural experiments. The modular approach of our work makes it ideal for further integrating new insights in artificial vision as well as for hypothesis testing. In summary, we present an open-source, fully differentiable, biologically plausible phosphene simulator as a tool for computational, clinical and behavioural neuroscientists working on visual neuroprosthetics.
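For readers who want a feel for what "fully differentiable phosphene simulation" means in practice, here is a deliberately minimal PyTorch toy (not the published simulator): each electrode renders a Gaussian blob whose brightness follows a soft threshold on stimulation amplitude, and gradients flow back to the stimulation parameters. The threshold, blob size, and the omission of cortical magnification and temporal dynamics are simplifying assumptions.

```python
import torch

def render_phosphenes(amplitude_ua, locations, sigma=0.02, threshold_ua=20.0, res=128):
    """amplitude_ua: (n,) stimulation amplitudes in microamperes.
    locations: (n, 2) phosphene centres in normalized [0, 1] image coordinates."""
    ys, xs = torch.meshgrid(torch.linspace(0, 1, res), torch.linspace(0, 1, res), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)                          # (res, res, 2)
    d2 = ((grid[None] - locations[:, None, None]) ** 2).sum(-1)   # (n, res, res)
    blobs = torch.exp(-d2 / (2 * sigma ** 2))                     # one Gaussian blob per electrode
    brightness = torch.sigmoid((amplitude_ua - threshold_ua) / 5.0)  # soft activation threshold
    return (brightness[:, None, None] * blobs).sum(0).clamp(0, 1)

stim = torch.tensor([10.0, 40.0, 80.0], requires_grad=True)      # gradients flow to stimulation
image = render_phosphenes(stim, torch.rand(3, 2))
image.sum().backward()                                           # usable inside end-to-end training
```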


Hyperrealistic neural decoding: Linear reconstruction of face stimuli from fMRI measurements via the GAN latent space

July 2020 · 171 Reads · 1 Citation

Abstract: We introduce a new framework for hyperrealistic reconstruction of perceived naturalistic stimuli from brain recordings. To this end, we embrace the use of generative adversarial networks (GANs) at the earliest step of our neural decoding pipeline by acquiring functional magnetic resonance imaging data as subjects perceived face images created by the generator network of a GAN. Subsequently, we used a decoding approach to predict the latent state of the GAN from brain data. Hence, latent representations for stimulus (re-)generation are obtained, leading to state-of-the-art image reconstructions. Altogether, we have developed a highly promising approach for decoding sensory perception from brain activity and systematically analyzing neural information processing in the human brain. Disclaimer: This manuscript contains no real face images; all faces are artificially generated by a generative adversarial network.
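The decoding step described above is, at its core, a regularized linear mapping from voxel responses to GAN latents. The sketch below illustrates that idea with scikit-learn ridge regression on stand-in data; the shapes, the choice of ridge regression, and the gan_generator call are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge

n_train, n_voxels, latent_dim = 800, 4000, 512
X_train = np.random.randn(n_train, n_voxels)      # stand-in fMRI responses
Z_train = np.random.randn(n_train, latent_dim)    # latents that generated the perceived faces

decoder = Ridge(alpha=1.0).fit(X_train, Z_train)              # linear map: voxels -> latent
z_hat = decoder.predict(np.random.randn(1, n_voxels))         # predicted latent for a test trial
# reconstruction = gan_generator(z_hat)   # hypothetical call to the pretrained generator
```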


Guest Editorial: Image and Video Inpainting and Denoising

May 2020 · 259 Reads

IEEE Transactions on Pattern Analysis and Machine Intelligence

The papers in this special issue cover all aspects of computer vision and pattern recognition devoted to image and video inpainting, including related tasks like denoising, deblurring, sampling, super-resolution enhancement, restoration, hallucination, etc. The special issue was associated with the 2018 ChaLearn Looking at People Satellite ECCV Workshop and the 2018 ChaLearn Challenges on Image and Video Inpainting.



DeepRF: Ultrafast population receptive field mapping with deep learning

August 2019 · 92 Reads · 7 Citations

Population receptive field (pRF) mapping is an important asset for cognitive neuroscience. The pRF model is used for estimating retinotopy, defining functional localizers, and studying a wide range of cognitive tasks. In a classic pRF, the Cartesian location and receptive field size are modeled as a 2D Gaussian kernel in visual space and are estimated by optimizing the fit between observed responses and predicted responses. In the standard framework this is achieved using an iterative gradient descent algorithm. This optimization is time consuming because the number of pRFs to fit (e.g., fMRI voxels) is typically large. The computation time increases further with the complexity of the pRF model (e.g., adding HRF parameters, surround suppression, and uncertainty measures). Here, we introduce DeepRF, which uses deep convolutional neural networks to estimate pRFs. We compare the performance of DeepRF with that of the conventional method using a synthetic dataset for which the ground truth is known, as well as an empirical dataset. We show that DeepRF achieves state-of-the-art performance while being more than three orders of magnitude faster than the conventional method. This enables easier and faster modeling of more complex pRF models, resolving an important limitation of the conventional approach.
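For context, the conventional forward model that DeepRF learns to invert can be written in a few lines: the predicted response at each timepoint is the overlap between the binary stimulus aperture and a 2D Gaussian receptive field with parameters (x0, y0, sigma). The snippet below shows only this forward model, with grid size and units as assumptions; DeepRF itself trains a convolutional network to map observed timecourses directly to these parameters instead of fitting them iteratively.

```python
import numpy as np

def prf_prediction(stimulus, x0, y0, sigma, extent=10.0):
    """stimulus: (T, H, W) binary apertures covering +/- `extent` degrees of visual angle."""
    t, h, w = stimulus.shape
    ys, xs = np.meshgrid(np.linspace(-extent, extent, h),
                         np.linspace(-extent, extent, w), indexing="ij")
    rf = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))   # 2D Gaussian pRF
    return stimulus.reshape(t, -1) @ rf.ravel()                          # (T,) predicted timecourse

apertures = (np.random.rand(120, 64, 64) > 0.5).astype(float)           # stand-in stimulus
pred = prf_prediction(apertures, x0=2.0, y0=-1.5, sigma=1.2)
```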


Fig. 1. Representative face images for the highest and lowest levels of each Big-Five trait, obtained from [8]. Images were created by aligning and averaging the faces of 100 unique individuals that had the highest and lowest evaluations for each trait on the ChaLearn First Impression database [9] (we inverted the images for the Neuroticism trait, i.e., low → high, as they were drawn in [8] from the perspective of Emotional Stability).
Challenges on apparent personality analysis.
First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis

July 2019 · 2,598 Reads · 92 Citations

IEEE Transactions on Affective Computing

Personality analysis has been widely studied in psychology, neuropsychology, and signal processing, among other fields. Over the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have by far been the most widely considered cues of information for analyzing personality. However, there has recently been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures, and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of method could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push research in the field, are reviewed.


Wasserstein Variational Gradient Descent: From Semi-Discrete Optimal Transport to Ensemble Variational Inference

November 2018 · 79 Reads

Particle-based variational inference offers a flexible way of approximating complex posterior distributions with a set of particles. In this paper we introduce a new particle-based variational inference method based on the theory of semi-discrete optimal transport. Instead of minimizing the KL divergence between the posterior and the variational approximation, we minimize a semi-discrete optimal transport divergence. The solution of the resulting optimal transport problem provides both a particle approximation and a set of optimal transportation densities that map each particle to a segment of the posterior distribution. We approximate these transportation densities by minimizing the KL divergence between a truncated distribution and the optimal transport solution. The resulting algorithm can be interpreted as a form of ensemble variational inference where each particle is associated with a local variational approximation.
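As background, the toy below implements the simplest member of the particle-based variational inference family (Stein variational gradient descent) on a Gaussian target, purely to illustrate what "updating a set of particles to approximate a posterior" looks like; the paper's semi-discrete optimal-transport objective is not implemented here, and the step size and kernel bandwidth are arbitrary assumptions.

```python
import torch

def svgd_step(particles, log_prob, step=0.05, bandwidth=1.0):
    """One Stein variational gradient descent update of a set of particles."""
    particles = particles.detach().requires_grad_(True)
    grad_logp = torch.autograd.grad(log_prob(particles).sum(), particles)[0]
    d2 = torch.cdist(particles, particles) ** 2
    k = torch.exp(-d2 / (2 * bandwidth ** 2))                        # RBF kernel between particles
    grad_k = -(particles[:, None] - particles[None]) / bandwidth ** 2 * k[..., None]
    phi = (k @ grad_logp + grad_k.sum(0)) / particles.shape[0]       # attraction + repulsion terms
    return (particles + step * phi).detach()

log_prob = lambda x: -(x ** 2).sum(-1) / 2      # toy target: standard 2D Gaussian posterior
particles = torch.randn(50, 2)
for _ in range(200):
    particles = svgd_step(particles, log_prob)
```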


Explaining First Impressions: Modeling, Recognizing, and Explaining Apparent Personality from Videos

February 2018 · 2 Reads

Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are critical in certain tasks related to human behavior analysis, such as health care applications. Despite their importance, it is only recently that researchers have started to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of computer vision, with an emphasis on looking-at-people tasks. Specifically, we review and study those mechanisms in the context of first impressions analysis. To the best of our knowledge, this is the first effort in this direction. Additionally, we describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set and the evaluation protocol, and summarize the results of the challenge. Finally, derived from our study, we outline research opportunities that we foresee will be decisive in the near future for the development of the explainable computer vision field.


Citations (8)


... One more rigorous approach is to focus on subjective interpretability: through having normally sighted individuals perform perceptual tasks using simulated prosthetic vision [100][101][102] . Another approach is to use simulations as input images for a decoder that is trained to generate a reconstruction of the original input image, as has been done recently using a cortical simulator that approximates some of the same phenomena as our more elaborated model 83 . ...

Reference:

A virtual patient simulation modeling the neural and perceptual effects of human visual cortical stimulation, from pulse trains to percepts
Biologically plausible phosphene simulation for the differentiable optimization of visual cortical prostheses

... A wide range of previous studies has employed SPV with sighted subjects to non-invasively investigate the usefulness of prosthetic vision in everyday tasks, such as mobility (13,14,16,32,33), hand-eye coordination (34), reading (16,35) or face recognition (36,37). Several studies have examined the effect of the number of phosphenes, spacing between phosphenes and the visual angle over which the phosphenes are spread (e.g., (14, 34, 38-40)). ...

Simulating neuroprosthetic vision for emotion recognition
  • Citing Conference Paper
  • September 2019

... These packages might encode coordinates differently, so the exercise becomes translating those coordinate systems to the coordinate system of the scanner. Likewise, approaches such as DeepRF (Thielen et al., 2019) or fast, real-time pRF mapping (Bhat et al., 2021) can reconstruct pRFs in near real-time. These approaches demand significant resources from the software as well as skills from the experimenter. ...

DeepRF: Ultrafast population receptive field mapping with deep learning

... Personality is a psychological construct that describes human behavior in terms of habitual and fairly stable patterns of emotions, thoughts, and attributes [1,2]. Personality is typically characterized by the OCEAN traits typified by the big-five model [3]: Openness (creative vs conservative), Conscientiousness (diligent vs disorganized), Extraversion (social vs aloof), Agreeableness (empathetic vs distant) and Neuroticism (anxious vs emotionally stable). ...

First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis

IEEE Transactions on Affective Computing

... Recent studies have already shown the effectiveness of kineme patterns for emotional trait prediction [10,11], while acoustic features and facial expressions have been successfully employed for estimating personality attributes [1,12,13] and candidate hireability (suitability to hire/interview later) [14,15]. Examining various LSTM architectures for classification and regression on the diverse FICS [16] and MIT interview [17] datasets, we make the following observations: (i) Both kinemes and AUs achieve explanative trait prediction. (ii) Multimodal approaches leverage cue-complementarity to better predict interview and personality attributes than unimodal ones. ...

Explaining First Impressions: Modeling, Recognizing, and Explaining Apparent Personality from Videos

IEEE Transactions on Affective Computing

... The required multimodal features have to be extracted from the time-synchronised sensor data. For extraction of visual features, state-of-the-art techniques utilize deep learning architectures like VGG-Face [19], FaceNet-1 [20], etc. ...

Visualizing Apparent Personality Analysis with Deep Residual Networks

... • Building upon our initial results [18], we novelly employ kinemes, action units and speech features for the estimation of personality and interview traits. Given the strong correlations among personality and interview traits [16,19], we show that the three behavioral modalities are both predictive and explanative of these traits. We explore distinct strategies for temporally fusing behavioral features. ...

Multimodal First Impression Analysis with Deep Residual Networks
  • Citing Article
  • September 2017

IEEE Transactions on Affective Computing

... Earlier works already used XAI on several subjective tasks. For example, Escalante et al. (2017) developed a challenge to test different explainable systems used for first impression analysis in job applications. Weitz et al. (2019) investigated different XAI methods on facial pain and emotion recognition models. ...

Design of an explainable machine learning challenge for video interviews
  • Citing Conference Paper
  • May 2017