Maria Skeppstedt's research while affiliated with Linnaeus University and other places

Publications (43)

Article
Full-text available
The Topics2Themes tool, which enables text analysis on the output of topic modelling, was originally developed for the English language. In this study, we explored and evaluated adaptations required for applying the tool to Japanese texts. That is, we adapted Topics2Themes to a language that is very different from the one for which the tool was ori...
Conference Paper
Full-text available
A tool that enables the use of active learning, as well as the incorporation of word embeddings, was evaluated for its ability to decrease the training data set size required for a named entity recognition model. Uncertainty-based active learning and the use of word embeddings led to very large performance improvements on small data sets for the e...
Conference Paper
We explored adaptations required for applying the topic modelling tool Topics2Themes to a language that is very different from the one for which the tool was originally developed. Topics2Themes, which enables text analysis on the output of topic modelling, was developed for English, and we here applied it to Japanese texts. As white space is not used...
Article
Computer-assisted text coding can facilitate the analysis of large text collections. To evaluate the functionality of providing an analyst with a ranked list of suggestions for suitable text codes, we used a data set of discussion posts, which had been manually coded for reasons given for taking a stance on the topic of vaccination. We trained a lo...
Conference Paper
The analysis of various opinions and arguments in textual data can be facilitated by automatic topic modeling methods; however, the exploration and interpretation of the resulting topics and terms may prove to be difficult to the analysts. Opinions, stances, arguments, topics, terms, and text documents are usually connected with many-to-many relati...
Article
Arguments used when vaccination is debated on Internet discussion forums might give us valuable insights into reasons behind vaccine hesitancy. In this study, we applied automatic topic modelling on a collection of 943 discussion posts in which vaccine was debated, and six distinct discussion topics were detected by the algorithm. When manually cod...
Article
Full-text available
The aim of this study is to explore the possibility of identifying speaker stance in discourse, provide an analytical resource for it and an evaluation of the level of agreement across speakers. We also explore to what extent language users agree about what kind of stances are expressed in natural language use or whether their interpretations diver...
Conference Paper
The automatic detection of seven types of modifiers was studied: Certainty, Uncertainty, Hypotheticality, Prediction, Recommendation, Concession/Contrast and Source. A classifier aimed at detecting local cue words that signal the categories was the most successful method for five of the categories. For Prediction and Hypotheticality, however, bett...
Conference Paper
Full-text available
Automatic detection of five language components, which are all relevant for expressing opinions and for stance taking, was studied: positive sentiment, negative sentiment, speculation, contrast and condition. A resource-aware approach was taken, which included manual annotation of 500 training samples and the use of limited lexical resources. Activ...
Article
Full-text available
Background: Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to a limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, essential also for text mining from corpora written in other languages than English an...
Article
Full-text available
Mining and automating analysis of information from health documents holds great potential for improving health care in many aspects. Health documents include text sources such as medical records, scientific publications, and user-generated texts in e.g. social media. Research in the area of health text mining has grown and matured in recent years....
Conference Paper
Full-text available
A support vector classifier was compared to a lexicon-based approach for the task of detecting the stance categories speculation, contrast and conditional in English consumer reviews. Around 3,000 training instances were required to achieve a stable performance of an F-score of 90 for speculation. This outperformed the lexicon-based approach, for w...
Conference Paper
Full-text available
Random indexing has previously been successfully used for medical vocabulary expansion for Germanic languages. In this study, we used this approach to extract medical terms from a Japanese patient blog corpus. The corpus was segmented into semantic units by a semantic role labeller, and different pre-processing and parameter settings were then eval...
Article
Full-text available
Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language...
Article
Full-text available
The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. We integrat...
Article
Full-text available
Objective: The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. M...
Article
Full-text available
We translated an existing English negation lexicon (NegEx) to Swedish, French, and German and compared the lexicon on corpora from each language. We observed Zipf's law for all languages, i.e., a few phrases occur a large number of times, and a large number of phrases occur fewer times. Negation triggers "no" and "not" were common for all languages...
Article
Text prediction has the potential for facilitating and speeding up the documentation work within health care, making it possible for health personnel to allocate less time to documentation and more time to patient care. It also offers a way to produce clinical text with fewer misspellings and abbreviations, increasing readability. We have explored...
Article
Full-text available
Named entity recognition of the clinical entities disorders, findings and body structures is needed for information extraction from unstructured text in health records. Clinical notes from a Swedish emergency unit were annotated and used for evaluating a rule- and terminology-based entity recognition system. This system used different preprocessing...
Article
Full-text available
Free text is helpful for entering information into electronic health records, but reusing it is a challenge. The need for language technology for processing Finnish and Swedish healthcare text is therefore evident; however, Finnish and Swedish are linguistically very dissimilar. In this paper we present a comparison of characteristics in Finnish an...
Article
Full-text available
Most methods for negation detection in clinical text have been developed for English text, and there is a need for evaluating the feasibility of adapting these methods to other languages. A Swedish adaption of the English rule-based negation detection system NegEx, which detects negations through the use of trigger phrases, was therefore evaluated....
Article
In this paper we describe the creation of a consensus corpus that was obtained through combining three individual annotations of the same clinical corpus in Swedish. We used a few basic rules that were executed automatically to create the consensus. The corpus contains negation words, speculative words, uncertain expressions and certain expressions...
Conference Paper
NegEx, a rule-based algorithm that detects negations in English clinical text, was translated into Swedish and evaluated on clinical text written in Swedish. The NegEx algorithm detects negations through the use of trigger phrases, which indicate that a preceding or following concept is negated. A list of English trigger phrases was translated into...
Article
Full-text available
We present a comparative study of Finnish and Swedish free-text nursing narratives from intensive care. Although the two languages are linguistically very dissimilar, our hypothesis is that there are similarities that are important and interesting from a language technology point of view. This may have implications when building tools to support pr...
Article
Access to reliable data from electronic health records is of high importance in several key areas in patient care, biomedical research, and education. However, many of the clinical entities are negated in the patient record text. Detecting what is a negation and what is not is therefore a key to high quality text mining. In this study we used the N...
Article
A common problem when combining two bilingual dictionaries to make a third, using one common language as a pivot language, is the emergence of false translations due to lexical ambiguity between words in the languages involved. This paper examines if the translation accuracy improves when using part-of-speech filtering of translation candidate...
Article
Full-text available
Physicians are daily demanded to read, understand and reach a summarized comprehension of earlier documentation for the patient at hand. This documentation includes medical procedures and clinical findings such as symptoms, observations, and diagnoses but also reasoning and speculation by previous physicians and nurses. The information is sometimes...

Citations

... As regards argumentation mining for the Russian language, there are not so many studies and datasets on the topic. (Fishcheva and Kotelnikov, 2019) translated into Russian and researched the English language Argumentative Microtext Corpus (ArgMicro) (Peldszus and Stede, 2015; Skeppstedt et al., 2018). In (Fishcheva et al., 2021) this corpus was expanded with machine translation of the Persuasive Essays Corpus (PersEssays) (Stab and Gurevych, 2014). ...
... Frequently occurring words and phrases related to a debate reveal the focus of an ODE [11, 123-125]. Natural language processing methods such as topic modelling [126,127] or content analysis [128,129] have also been used to identify the focus of ODEs. The focus is important to quickly make sense of the content shared by an ODE or a group of ODEs. ...
... The stance detection task overlaps with, and is closely related to, different classification tasks such as sentiment analysis (Chakraborty et al., 2020), troll detection (Tomaiuolo et al., 2020), rumour and fake news detection (Rani et al., 2020; Collins et al., 2020), and argument mining (Lawrence & Reed, 2020). In addition, stance can be impacted by the discursive and dynamic nature of the task (Mohammad et al., 2017; Somasundaran & Wiebe, 2009; Simaki et al., 2017). ...
... One task closely related to sentiment analysis is stance analysis/classification, which is often defined as the problem of deciding whether a person is in favor or against a given target (topic, entity, etc.) of interest [38]. The definition and operationalization of the concept of stance can be much more broad, as it can involve further aspects of (inter-)subjectivity beyond agreement/disagreement and sentiment/emotions, for instance, uncertainty or rudeness [39], [40]; however, we do not follow that broader definition in this work and focus on stance as sentiment/attitude expressed towards a specific topic. ...
... The existing computational approaches usually focus on AGREEMENT or DISAGREEMENT on a certain topic (Chen and Ku 2016; Mohammad et al. 2016, 2017), and only a few works take a wider view of stance aspects/categories into account, such as NECESSITY and VOLITION. In this work, we follow the approach taken in our interdisciplinary project, where the researchers in linguistics defined the stance categories of interest (see Table 1) and the experts in computational linguistics implemented a custom stance classifier (Skeppstedt et al. 2016b). Classification is carried out at the utterance level in a multi-label fashion, i.e., one utterance can be labeled with multiple stance categories simultaneously. ...
... Although also less extensive vocabularies have been shown useful for medical text mining [17], limitations in the vocabularies used can lead to decreased performance. Previous studies on applying Swedish UMLS resources (which are less extensive than resources for English) for detecting entities in Swedish clinical text showed, for instance, a recall of 55 % for detecting Disorders, 33 % for detecting Findings, 80 % for detecting Body Parts [18], and a recall of 74 % for detecting Pharmaceutical Drugs [19]. Controlled medical vocabularies are, moreover, typically focused on terms from the professional medical language, despite the important applications of text mining from social media, such as syndromic surveillance [20,21] or detection of adverse drug reactions [22]. ...
... According to the Guide to Teaching English in College 2020, "College English," as a compulsory course in higher education, is both humanistic and instrumental, and plays an important role in improving the overall quality of college students (Ahltorp et al., 2016; Wang, 2016). The English language course should be teacher-oriented, mobilize students' motivation to learn foreign languages, and meet the future development needs of the country and students (Lee and Park, 2008). ...
... The medium for arguing for human beings is natural language, whereas the input for ML algorithms and techniques should be distinct, structured and composed with well-established rules. A wide range of methodologies have been applied for modeling natural language, such as explicit distinctive components [1], argumentative zoning [21], tree structures, dialog-oriented diagrams [22], serial structure of arguments [23,24] and modifications to simpler structures of existing schemes [25,26]. However, the claim or other parts of an argument might be implicit [27,8,28,29,9] and tacit assumptions or premises (enthymemes) take place related to commonsense reasoning. ...
... In the literature, Narayanan et al, 19 Skeppstedt et al, 20 Mausam et al, 16 Mausam, 21 Chikersal et al, 17 and Nakayama and Fujii 18 have worked on related proposals. Narayanan et al 19 highlighted the problems of not dealing with conditions in the field of opinion mining. ...
... If such sub-clusters are positioned at large distances from each other in the semantic space, this might have the effect that words that are not part of these sub-clusters, but close to two or more clusters, will incorrectly receive a higher ranking than words that are close to the centroids of the sub-clusters. This was shown to be the case in a study using distributional semantics for expanding a Swedish vocabulary of cue terms for uncertainty and negation [63]. The strategy of first clustering the seed terms used into more distributionally similar subsets and thereafter using similarity to the centroids of these subsets as the criterion for ranking unknown words outperformed the strategy of treating the seed terms used as one single distributionally similar category of terms. ...