Michaela Geierhos · University of the Bundeswehr Munich · Fakultät für Informatik
Professor
About
90 Publications
17,982 Reads
252 Citations
Additional affiliations
April 2020 - present
October 2017 - March 2020
January 2013 - September 2017
Publications (90)
With the proliferation of social media, more personal information is being shared online than ever before, raising significant privacy concerns. This paper presents a novel approach to identify and mitigate privacy risks by generating digital twins from social media data. We propose a comprehensive framework that includes data collection, processin...
In recent years, many studies have focused on correlating the profiles of real users across different social media. On the one hand, this provides a better overview of the user’s social behavior; on the other hand, it can be used to warn of possible abuse through identity theft or cyberbullying. We try to make the threat on the Web predictable for...
The rapid growth of digital information and the increasing complexity of user queries have made traditional search methods less effective in the context of business-related websites. This paper presents an innovative approach to improve the search experience across a variety of domains, particularly in the industrial sector, by integrating semantic...
Semantic image editing allows users to selectively change entire image attributes in a controlled manner with just a few clicks. Most approaches use a generative adversarial network (GAN) for this task to learn an appropriate latent space representation and attribute-specific transformations. Attribute entanglement has been a limiting factor for pr...
We present a concept for quantifying evaluative phrases to later compare rating texts numerically instead of just relying on stars or grades. We achieve this by combining deep learning models in an aspect-based sentiment analysis pipeline along with sentiment weighting, polarity, and correlation analyses that combine deep learning results with meta...
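The quantification idea above can be illustrated with a toy computation. The aspect names, polarity values, and weights below are invented for this sketch and are not taken from the paper:

```python
# Hypothetical illustration: combining aspect-level sentiment into one numeric
# score, as an alternative to stars or grades. All names and values are invented.

def review_score(aspects, weights=None):
    """Combine per-aspect polarities (each in [-1, 1]) into one weighted score in [-1, 1]."""
    if weights is None:
        weights = {a: 1.0 for a in aspects}
    total = sum(weights[a] for a in aspects)
    return sum(aspects[a] * weights[a] for a in aspects) / total

# A review praising service but criticizing waiting time:
aspects = {"friendliness": 0.9, "competence": 0.6, "waiting_time": -0.4}
score = review_score(aspects, weights={"friendliness": 1.0, "competence": 2.0, "waiting_time": 1.0})
print(round(score, 3))
```

A real pipeline would obtain the polarity values from trained sentiment models rather than hand-set numbers; the weighted average is only one plausible aggregation choice.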
This paper elaborates on the notion of uncertainty in the context of annotation in large text corpora, specifically focusing on (but not limited to) historical languages. Such uncertainty might be due to inherent properties of the language, for example, linguistic ambiguity and overlapping categories of linguistic description, but could also be cau...
With the increase of user-generated content on social media, the detection of abusive language has become crucial and is therefore reflected in several shared tasks that have been performed in recent years. The development of automatic detection systems is desirable, and the classification of abusive social media content can be solved with the help...
Abusive language detection has become an integral part of the research, as reflected in numerous publications and several shared tasks conducted in recent years. It has been shown that the obtained models perform well on the datasets on which they were trained, but have difficulty generalizing to other datasets. This work also focuses on model gene...
The following system description presents our approach to the detection of fake news in texts. The given task has been framed as a multi-class classification problem. In a multi-class classification problem, each input chunk is assigned one of several class labels. To dissect content patterns in the training data, we made use of topic modeling. Top...
The following system description presents our approach for detecting fake news in texts. The given task was formulated as a multi-class classification problem. Our approach is based on the combination of two BERT-based classification models: One model determines whether the textual content is relevant to the task; the second model assigns it a trut...
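The two-stage setup described above can be sketched as a simple cascade. The keyword rules below are trivial stand-ins for the BERT-based models, purely to show the control flow:

```python
# Sketch of a two-stage cascade: first filter for task relevance, then assign
# a truth label. The keyword "models" are invented placeholders, not the
# actual classifiers from the system description.

def is_relevant(text):
    return "claim" in text.lower()  # stand-in for the relevance model

def truth_label(text):
    return "false" if "hoax" in text.lower() else "true"  # stand-in for the veracity model

def classify(text):
    if not is_relevant(text):
        return "not_relevant"
    return truth_label(text)

print(classify("Weather report for Monday"))       # not_relevant
print(classify("Viral claim: moon landing hoax"))  # false
```

The cascade structure means the second model only ever sees inputs the first model judged relevant, which keeps the veracity classifier's training and inference focused on in-domain text.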
Current question answering systems often focus on providing a simple entity or short sentence as an answer. By gaining confidence in information retrieval systems, users start to ask more complex questions that require sophisticated answers, such as reasoning chains. However, no research has been carried out yet to determine how exhaustive an answe...
This work addresses the automatic resolution of software requirements. In the vision of On-The-Fly Computing, software services should be composed on demand, based solely on natural language input from human users. To enable this, we build a chatbot solution that works with human-in-the-loop support to receive, analyze, correct, and complete their...
We investigated manifestations of ethnic and gender-based prejudice in a rather understudied high-status environment, that is we studied biased ratings of physicians with a migration background and female physicians. In a preregistered, archival study, we analyzed ratings of more than 140,000 physicians on a German rating website for medical profes...
In this study, we describe a text processing pipeline that transforms user-generated text into structured data. To do this, we train neural and transformer-based models for aspect-based sentiment analysis. As most research deals with explicit aspects from product or service data, we extract and classify implicit and explicit aspect phrases from Ger...
Semantic tagging in technical documentation is an important but error-prone process, with the objective to produce highly structured content for automated processing and standardized information delivery. Benefits thereof are consistent and didactically optimized documents, supported by professional and automatic styling for multiple target media....
This paper elaborates on the notion of uncertainty in the context of annotation in large text corpora, specifically focusing on (but not limited to) historical languages. Such uncertainty might be due to inherent properties of the language, for example, linguistic ambiguity and overlapping categories of linguistic description, but could also be cau...
This chapter concentrates on aspect-based sentiment analysis, a form of opinion mining where algorithms detect sentiments expressed about features of products, services, etc. We especially focus on novel approaches for aspect phrase extraction and classification trained on feature-rich datasets. Here, we present two new datasets, which we gathered...
Peer-to-Peer news portals allow Internet users to write news articles and make them available online to interested readers. Despite the fact that authors are free in their choice of topics, there are a number of quality characteristics that an article must meet before it is published. In addition to meaningful titles, comprehensibly written texts a...
The vision of On-the-Fly (OTF) Computing is to compose and provide software services ad hoc, based on requirement descriptions in natural language. Since non-technical users write their software requirements themselves and in unrestricted natural language, deficits occur such as inaccuracy and incompleteness. These deficits are usually met by natur...
On-The-Fly Computing is the vision of covering software needs of end users by fully-automatic compositions of existing software services. End users will receive so-called service compositions tailored to their very individual needs, based on natural language software descriptions. This everyday language may contain inaccuracies and incompleteness,...
Physician review websites are known around the world. Patients review the subjectively experienced quality of medical services supplied to them and publish an overall rating on the Internet, where quantitative grades and qualitative texts come together. On the one hand, these new possibilities reduce the imbalance of power between health care provi...
Physician Review Websites allow users to evaluate their experiences with health services. As these evaluations are regularly contextualized with facts from users’ private lives, they often accidentally disclose personal information on the Web. This poses a serious threat to users’ privacy. In this paper, we report on early work in progress on “Text...
Annotation tools typically use the common text analysis pipeline where (i) tokenization takes place, (ii) sentence boundaries (End-of-Sentence) are detected, (iii) Part-of-Speech (POS) tags are assigned, and (iv) syntactic annotations are applied. But this does not work for non-standard data where rules or pre-trained models are not yet available for all steps, and boun...
Consulting a physician was long regarded as an intimate and private matter. The physician-patient relationship was perceived as sensitive and trustful. Nowadays, this is changing, as medical procedures and physician consultations are reviewed like other services on the Internet. To allay users' privacy doubts, physician review websites assure ano...
In this paper, we present a search solution that makes local news information easily accessible. In the era of fake news, we provide an approach for accessing news information through opinion mining. This enables users to view news on the same topics from different web sources. By applying sentiment analysis on social media posts, users can better...
Users prefer natural language software requirements because of their usability and accessibility. Many approaches exist to elaborate these requirements and to support the users during the elicitation process. But there is a lack of adequate resources, which are needed to train and evaluate approaches for requirement refinement. We are trying to clo...
While requirements focus on how the user interacts with the system, user stories concentrate on the purpose of software features. But in practice, functional requirements are also described in user stories. For this reason, requirements clarification is needed, especially when they are written in natural language and do not stick to any templates (...
One purpose of requirement refinement is that higher-level requirements have to be translated to something usable by developers. Since customer requirements are often written in natural language by end users, they lack precision, completeness and consistency. Although user stories are often used in the requirement elicitation process in order to de...
What information about corporate mergers is conveyed in newspaper reports, and how can this information be extracted automatically? This is examined using the example of shareholder behavior during a merger. To this end, the key statements about the shareholders' vote with regard to a...
Patients 2.0 increasingly inform themselves about the quality of medical services on physician rating websites. However, little is known about whether the reviews and ratings on these websites truly reflect the quality of services or whether the ratings on these websites are rather influenced by patients' individual rating behavior. Therefore, we i...
The contacts a health care provider (HCP), like a physician, has to other HCPs is perceived as a quality characteristic by patients. So far, only the German physician rating website jameda.de gives information about the inter-connectedness of HCPs in business networks. However, this network has to be maintained manually and is thus incomplete. We t...
The individual search for information about physicians on Web 2.0 platforms can affect almost all aspects of our lives. People can directly access physician rating websites via web browsers or use any search engine to find physician reviews and ratings filtered by location or specialty. However, sometimes keyword search does not meet user needs...
Patients increasingly exchange their experiences via physician rating websites. The anonymity of the Web enables largely honest complaint behavior that leaves the sensitive physician-patient relationship of trust undamaged. In the present contribution, anonymous physician reviews on the Web 2.0 were automa...
Opinion mining from physician rating websites depends on the quality of the extracted information. Sometimes reviews are user-error prone and the assigned stars or grades contradict the associated content. We therefore aim at detecting random individual error within reviews. Such errors comprise the disagreement in polarity of review texts and the...
Received medical services are increasingly discussed and recommended on physician rating websites (PRWs). The reviews and ratings on these platforms are valuable sources of information for patient opinion mining. In this paper, we have tackled three issues that come along with inconsistency analysis on PRWs: (1) Natural language processing of user-...
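A polarity/rating inconsistency check of the kind described in the two abstracts above can be sketched minimally. The sentiment lexicon and the 1-5 star mapping here are invented assumptions, not the method from the papers:

```python
# Flag reviews whose star rating contradicts the polarity of the text.
# Lexicon and thresholds are invented for this illustration.

POSITIVE = {"friendly", "competent", "excellent"}
NEGATIVE = {"rude", "incompetent", "terrible"}

def text_polarity(text):
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)  # >0 positive, <0 negative

def inconsistent(text, stars):
    polarity = text_polarity(text)
    return (polarity > 0 and stars <= 2) or (polarity < 0 and stars >= 4)

print(inconsistent("rude and incompetent staff", stars=5))  # True: text contradicts rating
print(inconsistent("friendly and competent", stars=5))      # False
```

In practice the text polarity would come from a trained sentiment model rather than a word list, but the flagging logic stays the same: compare the sign of the text polarity against the numeric grade.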
Existing approaches towards service composition demand requirements of the customers in terms of service templates, service query profiles, or partial process models. However, the addressed non-expert customers may be unable to fill in the slots of service templates as requested or to describe, for example, pre- and postconditions, or even have difficu...
In this paper, we describe our system developed for the GErman SenTiment AnaLysis shared Task (GESTALT) for participation in the Maintask 2: Subjective Phrase and Aspect Extraction from Product Reviews. We present a tool, which identifies subjective and aspect phrases in German product reviews. For the recognition of subjective phrases, we pursue a...
In this paper, we present a system which makes scientific data available following the linked open data principle using standards like RDF and URI as well as the popular D2R server (D2R) and the customizable D2RQ mapping language. Our scientific data sets include acronym data and expansions, as well as researcher data such as author name, affiliati...
This paper focuses on the first step in combining prescriptive analytics with scenario techniques in order to provide strategic development after the use of InSciTe, a data prescriptive analytics application. InSciTe supports the improvement of researchers' individual performance by recommending new research directions. Standardized influential fac...
In this paper, we focus on acronym representation, the concept of abbreviating major terminology. To this end, we try to find the most efficient method to disambiguate the sense of an acronym. Comparing the various feature types, we found that the single-noun (NN) base overwhelmingly outperformed the noun-phrase (NP) base. Moreover, the result als...
Finding information about people in the World Wide Web is one of the most common activities of Internet users. It is now impossible to manually analyze all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. The Wikipedia community still puts...
Our purpose is to perform data record extraction from online event calendars exploiting sublanguage and domain characteristics. We therefore use so-called domain-dependent data (D3) completely based on language-specific key expressions and HTML patterns to recognize every single event given on the investigated web page. One of the most remarkable a...
Within this chapter, we will describe a novel technical service dealing with the integration of social networking channels into existing business processes. Since many businesses are moving to online communities as a means of communicating directly with their customers, social media has to be explored as an additional communication channel between...
The conceptual condensability of technical terms permits us to use them as effective queries to search scientific databases. However, authors often employ alternative expressions, i.e., Terminological Paraphrases (TPs), to represent the meanings of specific terms in the literature for certain reasons. In this paper, we propose an effectiv...
Since customers first share their problems with a social networking community before directly addressing a company, social networking sites such as Facebook, Twitter, MySpace or Foursquare will be the interface between customer and company. For this reason, it is assumed that social networks will evolve into a common communication channel – not onl...
Finding information about people in the World Wide Web is one of the most common activities of Internet users. It is now impossible to manually analyze all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. The Wikipedia community still puts...
This paper presents a novel linguistic information extraction approach exploiting analysts' stock ratings for statistical decision making. Over a period of one year, we gathered German stock analyst reports in order to determine market trends. Our goal is to provide business statistics over time to illustrate market trends for a user-selected compa...
Within this paper, we will describe a new approach to customer interaction management by integrating social networking channels into existing business processes. Until now, contact center agents still read these messages and forward them to the persons in charge of customer service in the company. But with the introduction of Web 2.0 and social networkin...
Within this paper, we describe the special requirements of a semantic annotation scheme used for biographical event extraction in the framework of the European collaborative research project Biographe. This annotation scheme supports interlingual search for people due to its multilingual support covering four languages: English, German, Fren...
SCM is a simple, modular and flexible system for web monitoring and customer interaction management. In our view, its main advantages are the following: It is completely web based. It combines all technologies, data, software agents and human agents involved in the monitoring and customer interaction process. It can be used for messages written in...
This extended abstract describes a poster. Its focus is on describing a new approach to customer interaction management by integrating social networking channels into existing business processes. Until now, contact center agents still read these messages and forward them to the persons in charge of customer service and support in the company. But w...
This paper presents an approach to extract data records from websites, particularly ones with event calendars. We therefore use language-specific key expressions and HTML patterns to recognize every single event given on the investigated web page. One of the most remarkable advantages of our method is that it does not require any additional cl...
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifies business specific information. We therefore concentrate on the extraction of characteristic vocabulary like company names, addresses, contact details, CEOs, etc. Above...
The automatic extraction of biographical information from business news is a complex task. This approach deals with the characterization of the so-called biographical relations by means of local grammars. In order to provide a well-founded procedure, it is necessary to give a complete and accurate definition of the notion “relation” seen here as a...
This paper presents the linguistic context and the modeling of our system iBeCOOL (Informations Biographiques Extraites à l'aide de COntextes Observés Linguistiquement), dedicated to the extraction of biographical information from English-language financial press texts. The notion of a biographical event (such as birth, mar...
The standard approach of job search engines disregards the structural aspect of job announcements in the Web. Bag-of-words indexing leads to a high amount of noise. In this paper we describe a method that uses local grammars to transform unstructured Web pages into structured forms. Evaluation experiments show high efficiency of information access...
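In the spirit of the local grammars mentioned above, a toy pattern-based extractor that turns a job announcement into a structured record might look as follows. The field patterns and the sample announcement are invented for this illustration:

```python
# Toy structured extraction from a job announcement using hand-written
# patterns (a very rough stand-in for local grammars). Fields and sample
# text are invented for this sketch.
import re

PATTERNS = {
    "title":    re.compile(r"Position:\s*(.+)"),
    "location": re.compile(r"Location:\s*(.+)"),
    "salary":   re.compile(r"Salary:\s*(.+)"),
}

def extract(announcement):
    """Return a dict with every field whose pattern matches the text."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(announcement)
        if match:
            record[field] = match.group(1).strip()
    return record

ad = "Position: Data Engineer\nLocation: Munich\nSalary: negotiable"
print(extract(ad))  # → {'title': 'Data Engineer', 'location': 'Munich', 'salary': 'negotiable'}
```

Real local grammars are finite-state descriptions considerably richer than these regular expressions, but the payoff is the same: structured fields instead of a noisy bag-of-words index.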
This contribution deals with information extraction from job announcements on the French-speaking Web. The aim of this work is to transform unstructured documents into representation vectors using local grammars. In this way, it becomes possible to make the job market more transparent for job search engines, in that only au...