About
37 Publications · 4,780 Reads
67 Citations
Introduction
I currently work in the fields of Explainable Artificial Intelligence (XAI) and Critical AI Studies. In this work, I blend qualitative and quantitative approaches to advance a human-oriented perspective on these fields. Additionally, I work with dimensionality reduction techniques to establish a visual baseline for these methods. In my opinion, visualization is the go-to means of communication between AI and humans. If you are interested, contact tcech@uni-potsdam.de for details.
Publications (37)
The semantic similarity between documents of a text corpus can be visualized using map-like metaphors based on two-dimensional scatterplot layouts. These layouts result from a dimensionality reduction on the document-term matrix or a representation within a latent embedding, including topic models. Thereby, the resulting layout depends on the input...
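As a minimal, hypothetical sketch of the kind of input such layouts start from, the following builds a document-term matrix and computes cosine similarities between documents; a dimensionality reduction (e.g., MDS or t-SNE) would then turn these similarities into two-dimensional positions. The toy corpus and function names are illustrative only, not from the paper.

```python
from collections import Counter
import math

def doc_term_matrix(docs):
    """Build a document-term matrix (rows: documents, columns: vocabulary).
    A minimal sketch; real pipelines add stemming, stop words, and tf-idf."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    counts = [Counter(d.lower().split()) for d in docs]
    return [[c[w] for w in vocab] for c in counts], vocab

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

A dimensionality reduction then places documents with high cosine similarity close together in the resulting scatterplot.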
Standard datasets are frequently used to train and evaluate Machine Learning models. However, the assumed standardness of these datasets leads to a lack of in-depth discussion on how their labels match the derived categories for the respective use case, which we demonstrate by reviewing recent literature that employs standard datasets. We find that...
Machine Learning models underlie a trade-off between accuracy and explainability. Given a trained, complex model, we contribute a dashboard that supports the process to derive more explainable models, here: Fast-and-Frugal Trees, with further introspection using feature importances and spurious correlation analyses. The dashboard further allows to...
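To illustrate why Fast-and-Frugal Trees are considered explainable, here is a minimal sketch of such a classifier: an ordered list of cues where each cue either exits immediately with a label or passes the decision on, and the last cue decides both ways. The cue names and thresholds are hypothetical, not from the dashboard described above.

```python
def fft_classify(x, cues):
    """Classify with a fast-and-frugal tree.

    `cues` is a list of (feature, threshold, exit_label) checks, except
    the last entry, which is (feature, threshold, label_hi, label_lo)
    and decides in both directions."""
    for feature, threshold, exit_label in cues[:-1]:
        if x[feature] > threshold:
            return exit_label
    feature, threshold, label_hi, label_lo = cues[-1]
    return label_hi if x[feature] > threshold else label_lo
```

Because every decision follows at most a handful of one-feature checks, the path to each prediction can be read off directly.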
Dimensionality Reduction Techniques (DRs) are used for projecting high-dimensional data onto a two-dimensional plane. One subclass of DRs comprises techniques that utilize landmarks. Landmarks are a subset of the original data space that are projected by a slow and more precise technique. The other data points are then placed in relation to these la...
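A simplified sketch of the second stage of such a pipeline: given landmarks that a precise technique has already projected to 2D, the remaining points are placed by inverse-distance weighting over their nearest landmarks. This is an illustrative assumption about the placement step, not a specific published method.

```python
import math

def place_points(points, landmarks_hd, landmarks_2d, k=3):
    """Place high-dimensional points relative to already-projected landmarks
    using inverse-distance weighting over the k nearest landmarks."""
    out = []
    for p in points:
        # k nearest landmarks in the high-dimensional space
        nearest = sorted(
            (math.dist(p, lm), i) for i, lm in enumerate(landmarks_hd)
        )[:k]
        if nearest[0][0] == 0:  # point coincides with a landmark
            out.append(list(landmarks_2d[nearest[0][1]]))
            continue
        weights = [(1.0 / d, i) for d, i in nearest]
        total = sum(w for w, _ in weights)
        out.append([
            sum(w * landmarks_2d[i][axis] for w, i in weights) / total
            for axis in (0, 1)
        ])
    return out
```

Since only the landmarks go through the slow projection, the overall layout scales to much larger datasets.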
Text spatializations for text corpora often rely on two-dimensional scatter plots generated from topic models and dimensionality reductions. Topic models are unsupervised learning algorithms that identify clusters, so-called topics, within a corpus, representing the underlying concepts. Furthermore, topic models transform documents into vectors, ca...
Deep learning models achieve high accuracy in the semantic segmentation of 3D point clouds; however, it is challenging to discern which patterns a model has learned and how it derives its output from the input. Recently, the Integrated Gradients method has been adopted to explain semantic segmentation models for 3D point clouds. This method can be...
As machine learning models are becoming more widespread and see use in high-stake decisions, the explainability of these decisions is getting more relevant. One approach for explainability are counterfactual explanations, which are defined as changes to a data point such that it appears as a different class. Their close connection to the original d...
Text synthesis tools are becoming increasingly popular and better at mimicking human language. In trust-sensitive decisions, such as plagiarism and fraud detection, identifying AI-generated texts poses particular difficulties: decisions need to be made explainable to ensure trust and accountability. To support users in identifying AI-generated texts, w...
Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for text corpora as two-dimensional scatter plots, reflecting semantic similarity between the documents and suppo...
Quali-quantitative methods provide ways for interrogating Convolutional Neural Networks (CNN). To this end, we propose a dashboard using a quali-quantitative method based on quantitative metrics and saliency maps. By those means, a user can discover patterns during the training of a CNN. With this, they can adapt the training hyperparameters of the mode...
Using software metrics as a method of quantification of software, various approaches were proposed for locating defect-prone source code units within software projects. Most of these approaches rely on supervised learning algorithms, which require labeled data for adjusting their parameters during the learning phase. Usually, such labeled training...
Continuous Integration and Continuous Delivery are best practices used in the context of DevOps. By using automated pipelines for building and testing small software changes, possible risks are intended to be detected early. Those pipelines continuously generate log events that are collected in semi-structured log files. In practice, these log fil...
Self-Supervised Network Projections (SSNP) are dimensionality reduction algorithms that produce low-dimensional layouts from high-dimensional data. By combining an autoencoder architecture with neighborhood information from a clustering algorithm, SSNPs intend to learn an embedding that generates visually separated clusters. In this work, we extend...
For various program comprehension tasks, software visualization techniques can be beneficial by displaying aspects related to the behavior, structure, or evolution of software. In many cases, the question is related to the semantics of the source code files, e.g., the localization of files that implement specific features or the detection of files...
Presentation of research paper "CodeCV: Mining Expertise of GitHub Users from Coding Activities" at the 22nd IEEE International Working Conference on Source Code Analysis and Manipulation in Limassol, Cyprus.
The number of software projects developed collaboratively on social coding platforms is steadily increasing. One of the motivations for developers to participate in open-source software development is to make their development activities more easily accessible to potential employers, e.g., in the form of a resume for their interests and skills. However,...
Based on the assumption that semantic relatedness between documents is reflected in the distribution of the vocabulary, topic models are a widely used class of techniques for text analysis tasks. The application of topic models results in concepts, the so-called topics, and a high-dimensional description of the documents. For visualization tasks, t...
Presentation of research paper "A Benchmark for the Use of Topic Models for Text Visualization Tasks" at the 15th International Symposium on Visual Information Communication and Interaction in Chur, Switzerland.
The number of publicly accessible software repositories on online platforms is growing rapidly. With more than 128 million public repositories (as of March 2020), GitHub is the world’s largest platform for hosting and managing software projects. Where it used to be necessary to merge various data sources, it is now possible to access a wealth of da...
Presentation of the research paper "Efficient GitHub Crawling using the GraphQL API" at the 22nd International Conference on Computational Science and Its Applications in Malaga, Spain.
In order to detect software risks at an early stage, various software visualization techniques have been developed for monitoring the structure, behaviour, or the underlying development process of software. One of the greatest risks for any IT organization consists in an inappropriate distribution of knowledge among its developers, as a project's succe...
Presentation of research paper "Visualization of Knowledge Distribution across Development Teams using 2.5D Semantic Software Maps".
Software visualization techniques provide effective means for program comprehension tasks as they allow developers to interactively explore large code bases. A frequently encountered task during software development is the detection of source code files with similar semantics. To assist in this task we present Software Forest, a novel 2.5D software visua...