Anton Thielmann’s research while affiliated with Technische Universität Clausthal and other places


Publications (16)


Mambular: A Sequential Model for Tabular Deep Learning
  • Preprint

August 2024 · 15 Reads

Anton Frederik Thielmann · [...] · Soheila Samiee

The analysis of tabular data has traditionally been dominated by gradient-boosted decision trees (GBDTs), known for their proficiency with mixed categorical and numerical features. However, recent deep learning innovations are challenging this dominance. We introduce Mambular, an adaptation of the Mamba architecture optimized for tabular data. We extensively benchmark Mambular against state-of-the-art models, including neural networks and tree-based methods, and demonstrate its competitive performance across diverse datasets. Additionally, we explore various adaptations of Mambular to understand its effectiveness for tabular data. We investigate different pooling strategies, feature interaction mechanisms, and bi-directional processing. Our analysis shows that interpreting features as a sequence and passing them through Mamba layers results in surprisingly performant models. The results highlight Mambular's potential as a versatile and powerful architecture for tabular data analysis, expanding the scope of deep learning applications in this domain. The source code is available at https://github.com/basf/mamba-tabular.
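The core idea, treating the columns of a tabular row as a token sequence, passing it through a sequence layer, and pooling the hidden states, can be sketched as follows. This is a toy stand-in that uses a simple gated recurrence in place of an actual Mamba/SSM block; all names, dimensions, and functions are illustrative, not the paper's implementation.

```python
import math
import random

random.seed(0)

D = 4  # embedding dimension per feature token (illustrative)

def embed_features(row, embeddings):
    # One token per tabular feature: scale the feature's embedding
    # vector by its (assumed standardized) value.
    return [[v * e for e in emb] for v, emb in zip(row, embeddings)]

def sequence_layer(tokens):
    # Toy stand-in for a Mamba/SSM block: a simple gated recurrence
    # over the feature sequence.
    h, outputs = [0.0] * D, []
    for tok in tokens:
        h = [math.tanh(0.5 * hi + ti) for hi, ti in zip(h, tok)]
        outputs.append(h)
    return outputs

def predict(row, embeddings, head):
    hidden = sequence_layer(embed_features(row, embeddings))
    # Average pooling over the sequence of hidden states.
    pooled = [sum(h[i] for h in hidden) / len(hidden) for i in range(D)]
    return sum(p * w for p, w in zip(pooled, head))

n_features = 3
embeddings = [[random.gauss(0, 1) for _ in range(D)] for _ in range(n_features)]
head = [random.gauss(0, 1) for _ in range(D)]

y_hat = predict([0.2, -1.3, 0.7], embeddings, head)
print(round(y_hat, 4))
```

Swapping the recurrence for other pooling or bi-directional variants corresponds to the adaptations the abstract mentions.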


Figures: a view on the conducted analysis on two dimensions of the framework; daily post frequency by source; overview of the time series cross-validation approach; test set losses of different LSTM stock prediction models over time.

One-way ticket to the moon? An NLP-based insight on the phenomenon of small-scale neo-broker trading
  • Article
  • Full-text available

June 2024 · 47 Reads · Social Network Analysis and Mining

We present a Natural Language Processing-based analysis of the phenomenon of “Meme Stocks”, which has emerged as a result of the proliferation of neo-brokers like Robinhood and the massive increase in the number of small-scale stock investors. Such investors often use specific Social Media channels to share short-term investment decisions and strategies, resulting in partial collusion and planning of investment decisions. The impact of online communities on the stock prices of affected companies has been considerable in the short term. This paper has two objectives. Firstly, we chronologically model the discourse on the most prominent platforms. Secondly, we examine the potential for using collaboratively made investment decisions as a means to assist in the selection of potential investments. To understand the investment decision-making processes of small-scale investors, we analyze data from Social Media platforms like Reddit, Stocktwits and Seeking Alpha. Our methodology combines Sentiment Analysis and Topic Modelling. Sentiment Analysis is conducted using VADER and a fine-tuned BERT model. For Topic Modelling, we utilize LDA, NMF and the state-of-the-art BERTopic. We identify the topics and shapes of discussions over time and evaluate the potential for leveraging information from the decision-making process of investors for trading choices. We utilize Random Forest and Neural Network models to show that latent information in discussions can be exploited for trend prediction of stocks affected by Social Network driven herd behavior. Our findings provide valuable insights into the content and sentiment of discussions and are a vehicle to improve efficient trading decisions for stocks affected by short-term herd behavior.
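The sentiment side of such a pipeline can be illustrated with a minimal lexicon-based polarity scorer, a toy stand-in for VADER; the word lists and scoring rule below are invented for illustration, not VADER's actual lexicon or heuristics.

```python
# Minimal lexicon-based polarity scorer; the word lists and scoring
# rule are invented toy stand-ins for VADER's lexicon and heuristics.
POSITIVE = {"moon", "buy", "gain", "bullish", "rocket"}
NEGATIVE = {"sell", "loss", "bearish", "crash", "dump"}

def polarity(text):
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    # Score in [-1, 1]; 0.0 when no lexicon words are present.
    return 0.0 if total == 0 else (pos - neg) / total

print(polarity("GME to the moon, buy buy buy!"))  # → 1.0
print(polarity("Bearish signal, time to sell."))  # → -1.0
```

Per-post scores like these, aggregated over time, are the kind of feature a trend-prediction model can consume.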


Selected topics (Sports and Space) for the best-performing topic models across all metrics, as well as for a badly performing model.
Topics in the Haystack: Enhancing Topic Quality through Corpus Expansion

January 2024 · 43 Reads · 4 Citations · Computational Linguistics

Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach of topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence and document themes, and goes beyond simply analyzing word frequencies in the data. Through simple corpus expansion, our model can detect latent topics that may include uncommon words or neologisms, as well as words not present in the documents themselves. Additionally, we propose several new evaluation metrics based on intruder words and similarity measures in the semantic space. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task. We demonstrate the competitive performance of our method with a large benchmark study, and achieve superior results compared to state-of-the-art topic modeling and document clustering models.
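The intruder-word idea, identifying the word least similar to a topic's other top words in the semantic space, can be sketched as follows. The 2-d "embeddings" are invented stand-ins for real word vectors.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def find_intruder(words, vectors):
    # The intruder is the word with the lowest mean cosine similarity
    # to the other top words of the topic.
    def mean_sim(i):
        sims = [cosine(vectors[i], vectors[j])
                for j in range(len(words)) if j != i]
        return sum(sims) / len(sims)
    return words[min(range(len(words)), key=mean_sim)]

# Invented 2-d vectors: three "sports" words and one "space" word.
words = ["goal", "team", "match", "galaxy"]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.1, 0.9]]
print(find_intruder(words, vectors))  # → galaxy
```

A metric of this kind can then be correlated with human intruder identifications, as the abstract describes.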



Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

March 2023 · 20 Reads

Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach of topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence and document themes, and goes beyond simply analyzing word frequencies in the data. This allows our model to detect latent topics that may include uncommon words or neologisms, as well as words not present in the documents themselves. Additionally, we propose several new evaluation metrics based on intruder words and similarity measures in the semantic space. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task. We demonstrate the competitive performance of our method with a large benchmark study, and achieve superior results compared to state-of-the-art topic modeling and document clustering models.


Figure 2. The figure sketches how the different types of approximations could be implemented as neural networks.
Figure 4. Turning the knot positions into learnable model parameters enables the SNAMs to fit jagged functions. The black marks on the x-axis indicate the knot positions after training.
Figure 5. Results of the SplineNAM on the CA Housing dataset, displayed as a heatmap of the spatial effect.
Structural Neural Additive Models: Enhanced Interpretable Machine Learning

February 2023 · 152 Reads

Deep neural networks (DNNs) have shown exceptional performance in a wide range of tasks and have become the go-to method for problems requiring high-level predictive power. There has been extensive research on how DNNs arrive at their decisions; however, the inherently opaque networks remain, to this day, mostly unexplainable "black boxes". In recent years, the field has seen a push towards interpretable neural networks, such as the visually interpretable Neural Additive Models (NAMs). We propose a further step toward intelligibility beyond the mere visualization of feature effects: Structural Neural Additive Models (SNAMs), a modeling framework that combines classical, clearly interpretable statistical methods with the predictive power of neural applications. Our experiments validate the predictive performance of SNAMs. The proposed framework performs comparably to state-of-the-art fully connected DNNs, and we show that SNAMs can even outperform NAMs while remaining inherently more interpretable.
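The additive structure that makes such models interpretable can be sketched as follows. The shape functions, feature names, and coefficients are hypothetical stand-ins for the learned spline-based feature networks.

```python
import math

# Hypothetical shape functions f_j: one interpretable function per
# feature, standing in for the learned spline-based feature networks.
def f_income(x):
    return 0.8 * math.tanh(x)    # saturating effect of median income

def f_rooms(x):
    return 0.3 * x               # linear effect of average rooms

def snam_predict(income, rooms, bias=2.0):
    # Additive structure: the prediction is a sum of per-feature
    # contributions, so each effect can be plotted and read off directly.
    return bias + f_income(income) + f_rooms(rooms)

# Per-feature contributions are directly inspectable:
print({"income": round(f_income(1.5), 3), "rooms": round(f_rooms(2.0), 3)})
print(round(snam_predict(1.5, 2.0), 3))
```

Because the model is a sum of one-dimensional functions, each feature's effect can be visualized on its own, which is the interpretability property NAM-style models trade on.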



Figure 1. The network structure of a simple NAMLSS model. Each input variable and each distributional parameter is handled by a separate neural network. The h^(k) are activation functions that depend on the distributional parameter being modelled, e.g. a quadratic transformation for the variance of a normally distributed variable to ensure the non-negativity constraint. The structure shown models a distribution with two parameters, e.g. a normal distribution.
Figure 3. California Housing: graphs for median income and population learned by the NAMLSS model. Housing prices increase with median income; additionally, the variance in housing prices is larger in less densely populated areas.
Figure 4. California Housing: graphs for longitude and latitude learned by the NAMLSS model. The house-price jumps around the location of Los Angeles are clearly visible; additionally, variance decreases for areas farther from the large cities.
Statistics of the benchmarking datasets.
Hyperparameters for the neural models for the California Housing dataset.
Neural Additive Models for Location Scale and Shape: A Framework for Interpretable Neural Regression Beyond the Mean

January 2023 · 142 Reads

Deep neural networks (DNNs) have proven to be highly effective in a variety of tasks, making them the go-to method for problems requiring high-level predictive power. Despite this success, the inner workings of DNNs are often not transparent, making them difficult to interpret or understand. This lack of interpretability has led to increased research on inherently interpretable neural networks in recent years. Models such as Neural Additive Models (NAMs) achieve visual interpretability through the combination of classical statistical methods with DNNs. However, these approaches only concentrate on mean response predictions, leaving out other properties of the response distribution of the underlying data. We propose Neural Additive Models for Location Scale and Shape (NAMLSS), a modelling framework that combines the predictive power of classical deep learning models with the inherent advantages of distributional regression while maintaining the interpretability of additive models.
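A minimal sketch of the distributional-regression idea, assuming a two-parameter normal response: each feature contributes additively to both the mean and the positively constrained scale, and training would minimize the Gaussian negative log-likelihood. The per-feature terms below are toy closed-form functions standing in for the learned subnetworks.

```python
import math

def softplus(x):
    # Maps any real number to a positive value, keeping the scale valid.
    return math.log1p(math.exp(x))

# Toy closed-form per-feature terms standing in for the learned
# subnetworks: each feature contributes to BOTH mean and (log-)scale.
def mu_terms(x1, x2):
    return 0.5 * x1, -0.2 * x2

def scale_terms(x1, x2):
    return 0.1 * x1, 0.3 * abs(x2)

def namlss_predict(x1, x2):
    mu = sum(mu_terms(x1, x2))
    sigma = softplus(sum(scale_terms(x1, x2)))  # enforce sigma > 0
    return mu, sigma

def gaussian_nll(y, mu, sigma):
    # Training objective: negative log-likelihood of a normal response.
    return 0.5 * math.log(2 * math.pi * sigma ** 2) \
        + (y - mu) ** 2 / (2 * sigma ** 2)

mu, sigma = namlss_predict(1.0, 2.0)
print(round(mu, 3), round(sigma, 3), round(gaussian_nll(0.5, mu, sigma), 3))
```

The additive per-parameter terms keep the NAM-style interpretability while the model describes more of the response distribution than the mean alone.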


Penalized Regression Splines in Mixture Density Networks

December 2022 · 91 Reads · 1 Citation

Mixture Density Networks (MDNs) belong to a class of models for data that cannot be sufficiently described by a single distribution, since it originates from different components of the main unit and therefore needs to be described by a mixture of densities. In some situations, however, MDNs seem to have problems with the proper identification of the latent components. While these identification issues can to some extent be contained by using custom initialization strategies for the network weights, this solution is still less than ideal since it involves subjective choices. We therefore suggest replacing the hidden layers between the model input and the output parameter vector of MDNs and estimating the respective distributional parameters with penalized cubic regression splines. Applying this approach to data from Gaussian mixture distributions as well as gamma mixture distributions proved successful, with the identification issues no longer playing a role and the splines reliably converging to the true parameter values.
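A penalized cubic regression spline can be sketched with a truncated power basis and a ridge-style penalty on the knot coefficients. The knots, penalty weight, toy data, and plain gradient-descent fit below are illustrative choices, not the paper's setup.

```python
import math
import random

random.seed(1)

def cubic_basis(x, knots):
    # Truncated power basis for a cubic regression spline.
    return [1.0, x, x * x, x ** 3] + [max(0.0, x - k) ** 3 for k in knots]

knots = [0.25, 0.5, 0.75]        # illustrative interior knots
xs = [i / 49 for i in range(50)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0, 0.1) for x in xs]

X = [cubic_basis(x, knots) for x in xs]
p = len(X[0])
beta = [0.0] * p
lam, lr = 1e-4, 0.05             # penalty weight and step size

baseline = sum(y * y for y in ys) / len(ys)   # MSE of the zero model

# Penalized least squares via plain gradient descent; the penalty
# shrinks only the knot coefficients, pulling the fit toward a
# smooth cubic polynomial.
for _ in range(2000):
    grad = [0.0] * p
    for row, y in zip(X, ys):
        r = sum(b * w for b, w in zip(row, beta)) - y
        for j in range(p):
            grad[j] += 2 * r * row[j] / len(xs)
    for j in range(4, p):        # penalize the truncated-power terms only
        grad[j] += 2 * lam * beta[j]
    beta = [w - lr * g for w, g in zip(beta, grad)]

mse = sum((sum(b * w for b, w in zip(row, beta)) - y) ** 2
          for row, y in zip(X, ys)) / len(xs)
print(mse < baseline)  # the spline fit improves on the zero model
```

In the paper's setting, a basis like this replaces the hidden layers of the MDN for each distributional parameter, which is what removes the initialization-dependent identification issues.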


Process of the Document Simulation and Analysis
Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data

July 2022 · 269 Reads · 25 Citations · Computational Statistics

Topic models are a useful and popular method to find latent topics of documents. However, the short and sparse texts in social media micro-blogs such as Twitter are challenging for the most commonly used Latent Dirichlet Allocation (LDA) topic model. We compare the performance of the standard LDA topic model with the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data. To compare the performance of the three models, we propose the simulation of pseudo-documents as a novel evaluation method. In a case study with short and sparse text, the models are evaluated on tweets filtered by keywords relating to the Covid-19 pandemic. We find that standard coherence scores that are often used for the evaluation of topic models perform poorly as an evaluation metric. The results of our simulation-based approach suggest that the GSDMM and GPM topic models may generate better topics than the standard LDA model.
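The pseudo-document idea, sampling short documents from known topic-word distributions so recovered topics can be checked against ground truth, can be sketched as follows. The vocabulary and topic distributions are invented toy values; in the paper's setting they would come from topics estimated on real tweets.

```python
import random

random.seed(42)

# Invented toy vocabulary and topic-word distributions.
vocab = ["vaccine", "mask", "lockdown", "stock", "market", "crash"]
topics = {
    "pandemic": [0.40, 0.35, 0.20, 0.02, 0.02, 0.01],
    "finance":  [0.02, 0.02, 0.01, 0.40, 0.35, 0.20],
}

def pseudo_document(weights, length=10):
    # Draw `length` tokens from a known topic-word distribution,
    # yielding a short, sparse document with a ground-truth topic label.
    return random.choices(vocab, weights=weights, k=length)

doc = pseudo_document(topics["pandemic"])
print(doc)
```

Because each simulated document has a known generating topic, a model's recovered topics can be evaluated directly instead of relying on coherence scores.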


Citations (9)


... The ZeroShot Topic Model (Bianchi et al. 2021) and the Contextualized Topic Model (Bianchi et al. 2020) additionally integrate embedded documents (Reimers and Gurevych 2019) into their modelling. These embedded documents also allow for using simple clustering techniques such as HDBSCAN or Gaussian Mixture models (Grootendorst 2022; Angelov 2020; Thielmann et al. 2024) and create topics that are on par with state-of-the-art probabilistic models (Thielmann et al. 2024, 2022). While the underlying clustering techniques create only clustered documents, the topics can be easily extracted from them by utilizing either distances in the embedding space (Angelov 2020; Thielmann et al. 2024) or class-based term frequency-inverse document frequency scores (Grootendorst 2022). ...

Reference:

One-way ticket to the moon? An NLP-based insight on the phenomenon of small-scale neo-broker trading
Topics in the Haystack: Enhancing Topic Quality through Corpus Expansion

Computational Linguistics

... Secondly, BERTopic is easy to use and does not require extensive hyperparameter tuning, making it a user-friendly tool for topic modeling. Finally, BERTopic is highly interpretable, as it provides not only the topics but also the words and phrases that are most representative of each topic, making it a valuable tool for understanding the underlying structure of the text [23]. To conclude, BERTopic is a powerful tool for topic modeling that leverages the strengths of BERT to provide improved performance over traditional topic modeling methods. ...

Coherence based Document Clustering
  • Citing Conference Paper
  • February 2023

... Then the documents are traversed for several iterations, in each of which a cluster is re-assigned to each document according to a conditional distribution where these rules are followed: 1) choose a cluster with more documents, 2) choose a cluster where the documents therein have higher similarity. GSDMM has a much better performance on short texts compared with LDA [23]; therefore, it is the better option for this study. ...

Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data

Computational Statistics

... Note that methods that use web-scraping to generate labeled data could be a potential remedy to generate labeled out-of-domain data to train a classifier [27][28][29]. However, finding conspiracy-related out-of-domain data for tweets via web-scraping is very challenging and more time-intensive than our approach. ...

AuDoLab: Automatic document labelling and classification for extremely unbalanced data

The Journal of Open Source Software

... For evaluation, we hence propose new measures that are not based on word co-occurrence, and use existing measures leveraging word embeddings (Terragni, Fersini, and Messina 2021). We validate the intruder-based metrics by computing correlations with human annotations. ...

One-Class Support Vector Machine and LDA Topic Model Integration—Evidence for AI Patents
  • Citing Chapter
  • June 2021

Studies in Computational Intelligence

... Unstructured text contents, in contrast, need to be represented as a set of features using text mining (Munzert et al., 2014) or topic modeling (Thielmann et al., 2021). While text mining focuses on extracting features, such as keywords or item frequencies from text, topic modeling can be useful for identifying semantic clusters that may relate to perceptions or discourses referring to places or spatial entities. ...

Unsupervised Document Classification integrating Web Scraping, One-Class SVM and LDA Topic Modelling
  • Citing Article
  • March 2021

... For example, Kant, Weisser, and Säfken (2020) offer an approach for aggregating tweets based on common hashtags, a strategy also utilized by Luber et al. (2021). On the other hand, Thielmann et al. (2021) and Thielmann, Weisser, and Krenz (2021) use expansion corpora before the actual topic modeling to combat severe imbalances in their corpora. Bicalho et al. (2017) present a framework for extending short documents in topic modeling, although it only employs words already present in the main vocabulary. ...

Unsupervised Document Classification integrating Web Scraping, One-Class SVM and LDA Topic Modelling

... Additionally, we introduce a novel evaluation metric, based upon the centroid cluster of stopwords in the embedding space. Given the approach of enhancing the reference corpus, the described model might be especially useful when evaluating short texts or identifying sparsely represented topics in a corpus [48,49]. Through the inherent sparsity of the data, the words best describing a topic might not be included in the reference corpus and an enhancement could thus greatly improve the creation of topics. ...

One-Class Support Vector Machine and LDA Topic Model Integration - Evidence for AI Patents