Philipp Thölke’s research while affiliated with Pompeu Fabra University and other places

What is this page?

This page lists works of an author who doesn't have a ResearchGate profile or hasn't yet added these works to their profile. It is automatically generated from public data for comprehensive and accurate scientific recordkeeping.

Publications (11)


Fig 1. Topographical plots showing the spatial distribution of T-values (caffeine-placebo) and classification decoding accuracy (SVM and LDA) based on EEG power spectral density features across different frequency bands, compensating for changes in the aperiodic component (see Section 3.5). The T-values and the decoding accuracy values were tested for statistical significance using permutation tests. Grey dots represent significance at p<0.05 and white dots at p<0.01, all corrected for multiple comparisons using the maximum statistics.
Fig 4. Visualization of feature importance in the random forests trained on a feature space which combines all feature types across all channels (11 features × 20 channels = 220 total). The bar plots show a ranking of all input dimensions, colored according to the corresponding feature. Warm colors (red to yellow) show spectral power (corrected) in five frequency bands, while cold colors (blue and green) correspond to complexity and criticality measures. The topographical maps depict the spatial distribution of feature importance. 1000 random forests were trained per sleep stage (NREM and REM); shown here is the average feature importance across all models from one sleep stage. Higher values indicate higher importance of a feature at a specific channel.
Caffeine induces age-dependent increases in brain complexity and criticality during sleep
  • Preprint
  • File available

June 2024 · 63 Reads

Philipp Thölke · Maxine Arcand-Lavigne · [...]
Caffeine is the most widely consumed psychoactive stimulant worldwide. Yet important gaps persist in understanding its effects on the brain, especially during sleep. We analyzed sleep EEG in 40 subjects, contrasting 200 mg of caffeine against a placebo condition using inferential statistics and machine learning. We found that caffeine ingestion led to an increase in brain complexity, a widespread flattening of the power spectrum's 1/f-like slope, and a reduction in long-range temporal correlations. These effects were most prominent during non-REM sleep, suggesting that caffeine shifts the brain towards a critical regime and more diverse neural dynamics. Interestingly, the changes were more pronounced in younger adults (20-27 years) than in middle-aged participants (41-58 years), whose sleep brain dynamics were less affected by caffeine. Interpreting these data in the light of modeling and empirical work on EEG-derived measures of excitation-inhibition balance provides novel insights into the effects caffeine has on the sleeping brain.
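The 1/f-like slope mentioned above is the exponent of the aperiodic component of the power spectrum; a flatter (less negative) slope is the "flattening" the abstract reports. As a minimal sketch (not the authors' pipeline, which additionally compensates for aperiodic changes; the function name and frequency range here are illustrative), the slope can be estimated by a linear fit in log-log space:

```python
import numpy as np

def aperiodic_slope(freqs, psd, fmin=1.0, fmax=40.0):
    """Estimate the 1/f-like slope of a power spectrum by a linear fit
    in log-log space. This is a common simplification; dedicated tools
    such as specparam/FOOOF also model oscillatory peaks explicitly."""
    mask = (freqs >= fmin) & (freqs <= fmax)
    log_f = np.log10(freqs[mask])
    log_p = np.log10(psd[mask])
    slope, _intercept = np.polyfit(log_f, log_p, 1)
    return slope

# Synthetic 1/f^2 spectrum: the recovered slope should be close to -2.
freqs = np.linspace(1, 40, 200)
psd = freqs ** -2.0
print(aperiodic_slope(freqs, psd))  # ≈ -2.0
```

A flattening under caffeine would show up as this value moving towards zero relative to placebo.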



Divergent Creativity in Humans and Large Language Models

May 2024 · 8,236 Reads · 2 Citations

The recent surge in the capabilities of Large Language Models (LLMs) has led to claims that they are approaching a level of creativity akin to human capabilities. This idea has sparked a blend of excitement and apprehension. However, a critical piece that has been missing in this discourse is a systematic evaluation of LLM creativity, particularly in comparison to human divergent thinking. To bridge this gap, we leverage recent advances in creativity science to build a framework for in-depth analysis of divergent creativity in both state-of-the-art LLMs and a substantial dataset of 100,000 humans. We found evidence suggesting that LLMs can indeed surpass human capabilities in specific creative tasks such as divergent association and creative writing. Our quantitative benchmarking framework opens up new paths for the development of more creative LLMs, but it also encourages more granular inquiries into the distinctive elements that constitute human inventive thought processes, compared to those that can be artificially generated.


TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations

February 2024 · 32 Reads · 24 Citations

Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency: energy and force computations for TensorNet models are accelerated by 2- to 10-fold over previous iterations. Other enhancements include highly optimized neighbor-search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.



Comparison of simulated and experimental protein structures
Structures obtained from CG simulations of the protein-specific model (orange) and the multi-protein model (blue), compared to their respective experimental structures (gray). Structures were sampled from the native macrostate, which was identified as the macrostate containing the conformation with the minimum RMSD with respect to the experimental crystal structure. Ten conformations were sampled from each conformational state (visualized as transparent shadows) and the lowest-RMSD conformation of each macrostate is displayed in cartoon representation, reconstructing the backbone structure from α-carbon atoms. The native conformation of each protein, extracted from its corresponding crystal structure, is shown in opaque gray. The text indicates the protein name and PDB ID for the experimental structure. WW-Domain and NTL9 results for the multi-protein model are not shown, as the model failed to recover the experimental structures. The statistics of native macrostates are included in Table 2.
Trajectory analysis of protein dynamics
Three individual CG trajectories selected from validation MD of Trp-Cage, WW-Domain, and Protein G. Each visualized simulation, colored from purple to yellow, explores the free energy surface, accesses multiple major basins and transitions among conformations. Top panels: 100 states sampled uniformly from the trajectory plotted over CG free energy surface, projected over the first two time-lagged independent components (TICs) for Trp-Cage (a), WW-Domain (b), and Protein G (c). The red line indicates the all-atom equilibrium density by showing the energy level above the free energy minimum with the value of 7.5 kcal/mol. The experimental structure is marked as a red star. Bottom panels: Cα-RMSD of the trajectory with reference to the experimental structure for Trp-Cage (d), WW-Domain (e), and Protein G (f). Source data are provided as a Source data file.
Free energy surface comparison across all-atom reference and coarse-grained models
Comparison of the free energy surfaces across the first two TICA dimensions for each protein, between the reference MD (left), protein-specific model (center), and multi-protein model (right) coarse-grained simulations. The free energy surface for each simulation set was obtained by binning over the first two TICA dimensions, dividing them into an 80 × 80 grid, and averaging the weights of the equilibrium probability in each bin computed by the Markov state model. The red triangles indicate the experimental structures. The red line indicates the all-atom equilibrium density by showing the energy level above the free energy minimum, with values of 9 kcal/mol for Villin and α3D, 6 kcal/mol for NTL9, and 7.5 kcal/mol for the remaining proteins. Source data are provided as a Source data file.
Free energy surface and structural analysis of Protein G simulations
a Free energy surface of Protein G over the first two TICs for the all-atom MD simulations (top) and the coarse-grained simulations (bottom) using the protein-specific model. The circles identify different relevant minima (yellow—native, magenta—misfolded, cyan—partially folded, red—random coil). b The propensity of all the secondary structural elements of Protein G across the different macrostates, estimated using an RMSD threshold of 2 Å for each structural element shown in the x-axis. c Sampled conformations from the macrostates of coarse-grained simulations corresponding to the marked minima in the free energy surfaces in (a). Sampled structure colors correspond to the minima colors in the free energy surface plot, with blurry lines of the same color showing additional conformations from the same state. Arrows represent the main pathways leading from the random coil to the native structure with the corresponding percentages of the total flux of each pathway. Source data are provided as a Source data file.
Machine learning coarse-grained potentials of protein thermodynamics

September 2023 · 460 Reads · 69 Citations

A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.
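The free energy surfaces compared in the figures above are built by binning conformations over the first two TICA coordinates and weighting them by Markov-state-model equilibrium probabilities, with F = -kT ln p. A minimal sketch of that construction (function and variable names are ours, and the per-bin averaging described in the caption is simplified here to a weighted histogram):

```python
import numpy as np

def free_energy_surface(tica_xy, weights, bins=80, kT=0.6):
    """Free energy over two TICA coordinates: bin the samples on an
    80 x 80 grid (as in the figures), accumulate their equilibrium
    weights per bin, and take F = -kT * ln(p). kT is in kcal/mol
    (~0.6 kcal/mol near 300 K); empty bins come out as +inf."""
    p, _xe, _ye = np.histogram2d(
        tica_xy[:, 0], tica_xy[:, 1], bins=bins, weights=weights)
    p /= p.sum()
    with np.errstate(divide="ignore"):
        F = -kT * np.log(p)
    # Shift so the global free energy minimum sits at zero.
    return F - np.min(F[np.isfinite(F)])

# Toy example: a Gaussian cloud of "conformations" with uniform weights.
rng = np.random.default_rng(0)
xy = rng.normal(size=(10_000, 2))
F = free_energy_surface(xy, np.ones(10_000) / 10_000)
```

With MSM-derived weights in place of the uniform ones, this reproduces the kind of surface shown in the comparison figures.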


Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data

June 2023 · 99 Reads · 81 Citations

NeuroImage

Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performances as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards the performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric, defined as the arithmetic mean of sensitivity and specificity, provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance.
Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
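The gap between Acc and BAcc on imbalanced data can be reproduced in a few lines. This sketch is ours, not the paper's released code; it shows a degenerate majority-class classifier scoring 90% Acc while BAcc correctly reports chance level:

```python
import numpy as np

def acc_and_bacc(y_true, y_pred):
    """Accuracy vs. balanced accuracy, the latter defined as the
    arithmetic mean of sensitivity and specificity."""
    acc = np.mean(y_true == y_pred)
    sens = np.mean(y_pred[y_true == 1] == 1)  # recall on the positive class
    spec = np.mean(y_pred[y_true == 0] == 0)  # recall on the negative class
    return acc, (sens + spec) / 2

# 90/10 imbalance; a degenerate model that always predicts the majority class.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)
print(acc_and_bacc(y_true, y_pred))  # (0.9, 0.5): high Acc, chance-level BAcc
```

On perfectly balanced data the two metrics coincide, which is why the authors recommend BAcc as a drop-in default.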


Machine Learning Coarse-Grained Potentials of Protein Thermodynamics

December 2022 · 112 Reads

A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.


Class imbalance should not throw you off balance: Choosing classifiers and performance metrics for brain decoding with imbalanced data

July 2022 · 50 Reads · 3 Citations

Machine learning (ML) is becoming a standard tool in neuroscience and neuroimaging research. Yet, because it is such a powerful tool, the appropriate application of ML requires a sound understanding of its subtleties and limitations. In particular, applying ML to datasets with imbalanced classes, which are very common in neuroscience, can have severe consequences if not adequately addressed. With the neuroscience machine-learning user in mind, this technical note provides a didactic overview of the class imbalance problem and illustrates its impact through systematic manipulation of class imbalance ratios in both simulated data and real electroencephalography (EEG) and magnetoencephalography (MEG) brain data. Our results illustrate how in highly imbalanced data, the commonly used Accuracy (Acc) metric yields misleadingly high performances by preferentially predicting the majority class, while other evaluation metrics (e.g. Balanced Accuracy (BAcc) and the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC)) may still provide reliable performance evaluations. In terms of classifiers and cross-validation schemes, our data highlight the higher robustness of Random Forest (RF) and Stratified K-Fold cross-validation, compared to the other approaches tested. Critically, for neuroscience ML applications that seek to minimize overall classification error (not preferentially that of a single class), we recommend the routine use of BAcc, rather than the simple and more commonly used Acc metric. Importantly, we provide a best-practices list of recommendations for dealing with imbalanced data, and open-source code to allow the neuroscience community to replicate our observations and further explore the best practices in handling imbalanced data.
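The stratified K-fold cross-validation recommended above keeps each class's proportion constant across folds, so the minority class is never absent from a test split. A minimal illustration (our simplified implementation, not the paper's code; scikit-learn's StratifiedKFold is the standard tool):

```python
import numpy as np

def stratified_folds(y, n_splits=5, seed=0):
    """Minimal stratified K-fold: shuffle each class's indices
    separately, then split them evenly across folds so every fold
    keeps the overall class ratio."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for i, chunk in enumerate(np.array_split(idx, n_splits)):
            folds[i].extend(chunk.tolist())
    return [np.array(f) for f in folds]

# 90/10 imbalance: each of the five folds keeps the 10% minority rate.
y = np.array([0] * 90 + [1] * 10)
for fold in stratified_folds(y):
    print(y[fold].mean())  # 0.1 per fold
```

A plain (unstratified) split of the same data can leave folds with very few, or zero, minority samples, which is exactly the failure mode the note warns about.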


TorchMD-NET: Equivariant Transformers for Neural Network based Molecular Potentials

February 2022 · 96 Reads · 9 Citations

The prediction of quantum mechanical properties is historically plagued by a trade-off between accuracy and speed. Machine learning potentials have previously shown great success in this domain, reaching increasingly better accuracy while maintaining computational efficiency comparable with classical force fields. In this work we propose TorchMD-NET, a novel equivariant transformer (ET) architecture, outperforming the state of the art on MD17, ANI-1, and many QM9 targets in both accuracy and computational efficiency. Through an extensive attention weight analysis, we gain valuable insights into the black-box predictor and show differences in the learned representation of conformers versus conformations sampled from molecular dynamics or normal modes. Furthermore, we highlight the importance of datasets including off-equilibrium conformations for the evaluation of molecular potentials.


Citations (6)


... The MEEGNet library provides a robust data-loading functionality through a hierarchical class structure featuring two primary classes: EpochedDataset and ContinuousDataset. The EpochedDataset class is optimized for event-based data, where recordings are segmented around specific events, such as stimulus presentations. In contrast, the ContinuousDataset class handles continuous data without event markers, such as resting-state recordings. ...

Reference:

MEEGNet: An open source python library for the application of convolutional neural networks to MEG
Neuro-GPT: Towards A Foundation Model For EEG
  • Citing Conference Paper
  • May 2024

... Advancements in neural networks are illustrated by multiple innovative architectures and frameworks designed for various applications. For instance, the evolution of TorchMD-Net has led to a more versatile framework that enhances molecular simulations through the adoption of neural network-based potentials [11]. In cryptocurrency markets, a transformative approach utilizing transformer neural networks and technical indicators significantly enhances price prediction capabilities [12]. ...

TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations
  • Citing Article
  • February 2024

... Furthermore, coarse-grained NNPs can also be trained and used to run coarse-grained molecular dynamics simulations that ignore irrelevant parts of a system, enabling additional orders of magnitude of acceleration [22][23][24][25][26]. The simplest example of a coarse-grained NNP is a continuum solvent NNP. ...

Machine learning coarse-grained potentials of protein thermodynamics

... The final decoding performance was obtained by averaging the results across these five iterations. We used balanced accuracy as the performance metric, defined as the arithmetic mean of sensitivity and specificity, as it provides a more robust evaluation for potentially imbalanced data [53]. To ensure that decoding results were not biased by a specific pseudo-trial selection, we repeated the entire procedure 25 times per participant and averaged the balanced accuracy over these repetitions for statistical testing. ...

Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data
  • Citing Article
  • June 2023

NeuroImage

... This is a big problem. To overcome it, the stratified 10-fold cross-validation technique, which preserves the sample percentages of each class of the COVID-19 S protein datasets, has also been preferred (Thölke et al., 2022; Mbow et al., 2021). Table 6 shows the total amounts of the training and testing datasets for the holdout technique by year. ...

Class imbalance should not throw you off balance: Choosing classifiers and performance metrics for brain decoding with imbalanced data
  • Citing Preprint
  • July 2022

... The fifth MLIP is the Gaussian Approximation Potential, which addresses the gap between models that explicitly treat electrons and those that do not, aiming to accurately model the Born-Oppenheimer potential energy surface (PES) without simulating electrons, such as FCHL19, GMNN, and so forth. Finally, this study introduces TorchMD-NET, a novel equivariant Transformer (ET) architecture for MLIP, notably overcoming the traditional trade-off between accuracy and computational speed. The model sets a standard in accuracy and efficiency, outperforming existing methods on the MD17, ANI-1, and QM9 datasets. ...

TorchMD-NET: Equivariant Transformers for Neural Network based Molecular Potentials
  • Citing Preprint
  • February 2022