
Agnieszka Mykowiecka
- Doctor of Engineering
- Professor (Associate) at Polish Academy of Sciences
Publications: 87
Reads: 7,747
Citations: 559
Publications (87)
We describe new functionalities that have been added to an existing and widely used software solution in Polish public healthcare. The system automatically evaluates a number of medical scores. It provides epidemiological monitoring of infectious diseases, assigns each patient to a specific risk group, and detects anomalies. Moreov...
The paper addresses an experiment in detecting metaphorical usage of adjectives and nouns in Polish data. First, we describe the data developed for the experiment. The corpus consists of 1833 excerpts containing adjective-noun phrases which can have both metaphorical and literal senses. Annotators assign literal or metaphorical senses to all adject...
Medical free-text records store a lot of useful information that can be exploited in developing computer-supported medicine. However, extracting the knowledge from the unstructured text is difficult and depends on the language. In the paper, we apply Natural Language Processing methods to process raw medical texts in Polish and propose a new method...
In the paper, we test two different approaches to the unsupervised word sense disambiguation task for Polish. In both methods, we use neural language models to predict words similar to those being disambiguated and, on the basis of these words, we predict the partition of word senses in different ways. In the first method, we cluster selected sim...
The paper addresses the problem of automatic identification of phrases to be included in back-of-book indexes. We analyzed books in Polish and English published with subject indexes compiled by their authors. We checked what kinds of phrases are placed in those indexes and how often they actually occur in the corresponding books. In the experiments...
The paper addresses the problem of extending an existing and widely used program for Polish public healthcare with a function for detecting possible occurrences of drug side effects. The task is performed in two steps. First, we extract information that binds names of drugs with side effects and their frequency. In the next step, we look for simila...
Is it true that patients with similar conditions get similar diagnoses? In this paper we present a natural language processing (NLP) method that can be used to validate this claim. We (1) introduce a method for representation of medical visits based on free-text descriptions recorded by doctors, (2) introduce a new method for segmentation of patien...
Is it true that patients with similar conditions get similar diagnoses? In this paper we show NLP methods and a unique corpus of documents to validate this claim. We (1) introduce a method for representation of medical visits based on free-text descriptions recorded by doctors, (2) introduce a new method for clustering of patients' visits and (3) p...
In our paper, we address the problem of recognition of irrelevant phrases in terminology lists obtained with an automatic term extraction tool. We focus on identification of multi-word phrases that are general terms or discourse expressions. We defined several methods based on comparison of domain corpora and a method based on contexts of phrases i...
The paper addresses the Polish version of SimLex-999 which we extended to contain not only measurement of similarity but also relatedness. The data was translated by three independent linguists; discrepancies in translation were resolved by a fourth person. The agreement rates between the translators were counted and an analysis of problems was per...
Testing word embeddings for Polish
Distributional Semantics postulates the representation of word meaning in the form of numeric vectors which represent words which occur in context in large text data. This paper addresses the problem of constructing such models for the Polish language. The paper compares the effectiveness of models based on lemma...
Every knowledge domain or form of communication has its own characteristic vocabulary. In the traditional approach, dictionaries containing words and multi-word terms identifying important concepts and their lexical equivalents were created by specialists in a subject area. This method, however, is very time-consuming and therefore inadequate, espe...
The poster presents a tool for terminology extraction described in the paper published in the LREC 2016 conference proceedings.
The purpose of this paper is to introduce the TermoPL tool created to extract terminology from domain corpora in Polish. The program extracts noun phrases, term candidates, with the help of a simple grammar that can be adapted for user's needs. It applies the C-value method to rank term candidates being either the longest identified nominal phrases...
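As a rough illustration of the C-value ranking mentioned in the abstract above, the sketch below applies the classic C-value formula (Frantzi et al.) to a dictionary of candidate phrases. It is not TermoPL's implementation: the function names `contains` and `c_value`, the tuple-of-words input format, and the example phrases are assumptions made for illustration, and the sketch uses the raw frequencies of the containing phrases rather than any refined nested-frequency bookkeeping.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score candidate terms with the classic C-value formula.

    `freq` maps each candidate (a tuple of words) to its corpus frequency.
    Hypothetical helper for illustration only.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a` as a nested phrase.
        containers = [b for b in freq if len(b) > len(a) and contains(b, a)]
        weight = math.log2(len(a))  # classic variant: single words score 0
        if containers:
            nested_mass = sum(freq[b] for b in containers) / len(containers)
            scores[a] = weight * (f_a - nested_mass)
        else:
            scores[a] = weight * f_a
    return scores


# Made-up frequencies, used only to exercise the formula.
example = {
    ("blood", "pressure"): 10,
    ("high", "blood", "pressure"): 4,
}
ranked = sorted(c_value(example).items(), key=lambda kv: kv[1], reverse=True)
```

Because the classic formula multiplies by the base-2 logarithm of the phrase length, one-word candidates always receive a score of zero; tools that rank single words need a modified weight.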
Domain corpora are often not very voluminous and even important terms can occur in them not as isolated maximal phrases but only within more complex constructions. Appropriate recognition of nested terms can thus influence the content of the extracted candidate term list and its order. We propose a new method for identifying nested terms based on a...
In the paper, we examine the idea of supporting domain ontology creation by an automatic clustering of selected terms identified using a terminology extraction method. We discuss the problem of introducing a structure into a set of similar concepts. We extract terminology from economic articles in Polish Wikipedia, then we select several sets of s...
In the paper we analyse Polish descriptive adjectives which occur in domain related texts. The experiments were done on data obtained from hospital discharge records. Prenominal adjectives selected from these texts were filtered out of presumably relative adjectives and clustered on the basis of a set of context related features and interword relat...
Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. T...
In the paper, we propose a new method of identifying terms nested within candidates for the terms extracted from domain texts. The list of all terms is then ranked by the process of automatic term recognition. Our method of identifying nested terms is based on two aspects: grammatical correctness and normalised pointwise mutual information (NPMI) c...
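For readers unfamiliar with the measure named in the abstract above, the following is a minimal sketch of normalised pointwise mutual information computed from co-occurrence counts. The abstract is truncated, so how the authors apply NPMI to nested terms is not shown here; the function name `npmi` and its count-based signature are assumptions made for illustration.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Normalised PMI of two events from raw counts (hypothetical helper).

    Returns a value in [-1, 1]: -1 when the events never co-occur,
    about 0 when they are independent, 1 when they always co-occur.
    Assumes 0 < count_x, count_y and count_xy < total.
    """
    if count_xy == 0:
        return -1.0  # limiting value for events that never co-occur
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)
```

With counts of 10 for each word and 10 joint occurrences out of 100, the two words always appear together and the score is 1 (up to floating-point rounding); with a single joint occurrence, the words behave as independent and the score is about 0.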
Conference Proceedings of the International Conference on Intelligent Information Systems, IIS 2013
This book constitutes the refereed proceedings of the International Conference on Intelligent Information Systems, IIS 2013, held in Warsaw, Poland in June 2013. The 28 full papers included in this volume were carefully reviewed and selected from 53 submissions. The contributions are organized in topical sections named: Natural language processing,...
The paper presents a two-step method of extracting terminology from Polish texts. The first step identifies candidates for terms and is supported by linguistic knowledge; a shallow grammar used for extracting phrases is given. The second step is based on statistics, consisting in ranking and filtering candidates for domain terms wi...
The paper presents results of clustering terms extracted from economic articles in Polish Wikipedia. First, we describe the method of automatic term extraction supported by linguistic knowledge. Then, we define different types of term similarities used in the clustering experiment. Term similarities are based on Polish Wordnet and morphosyntactic a...
In this paper we present a process of converting an existing dictionary of Old Polish into LMF (Lexical Markup Framework) format. We discuss problems related to the transformation of a resource built to be used as a paper book into its electronic version. We describe the subsequent stages of the process consisting in scanning the paper source follo...
The paper presents the first results of clustering terms extracted from hospital discharge documents written in Polish. The aim of the task is to prepare data for an ontology reflecting the domain of documents. To begin, the characteristics of the language of the texts, which differs significantly from general Polish, are given. Then, we describe the met...
This paper presents the results of testing two approaches in the automatic semantic labeling of medical data. For a chosen domain (diabetic patients’ discharge records) a set of domain related concepts was identified. The annotated resource is the result of a rule-based application that relies on the results of two related rule-based information e...
The paper presents a method of automatic construction of a semantically annotated corpus using the results of a rule-based information extraction (IE) application. Construction of the corpus is based on using existing programs for text tokenization and morphological analysis and combining their results with domain related correction rules. We reuse...
The paper presents an ontology of natural language temporal expressions which occur in dialogues led by users of a public transport call center. It was elaborated on the basis of analysis of 500 transliterated dialogues and contains a multihierarchy of concepts representing semantics of time-referring fragments of interlocutors’ utterances. The pap...
In this paper we present a corpus of Polish spoken dialogues annotated on several levels, from transcription of dialogues and their morphosyntactic analysis, to semantic annotation. The corpus is one of the results of the LUNA project. The description concentrates on the semantic annotation on the levels of concepts (attribute-value) and predicates...
In this paper we present arguments that elaborating a rule-based information extraction system is a good starting point for obtaining a semantically annotated corpus of medical data. Our claim is supported by evaluation results of the automatic annotation of a corpus containing hospital discharge reports of diabetic patients.
The article presents results of an experiment consisting in automatic concept annotation of transliterated spontaneous human-human dialogues in the city transportation domain. The data source was a corpus of dialogues collected at a Warsaw call center and annotated with about 200 concept types. The machine learning technique we used is the li...
The paper describes a rule-based information extraction (IE) system developed for Polish medical texts. We present two applications designed to select data from medical documentation in Polish: mammography reports and hospital records of diabetic patients. First, we have designed a special ontology that subsequently had its concepts translated into...
The paper describes the creation of a domain model for an Information Extraction (IE) application in the medical domain. First, we present the texts for which IE systems were created: mammography reports and diabetology patients’ discharge documents. The methodology and results of terminology extraction for both domains are described. Next, the main f...
In the paper we present a method of automatic annotation of transliterated spontaneous human-human dialogues on the level of domain attributes. It has been used for the preparation of an annotated corpus of dialogs within the LUNA project. We describe the domain ontology, the process of manual creation of rules, the annotation schema and the evaluation.
presentation at NLP seminar, ICS PAS, Warsaw, Poland
In the paper we present the method of automatic recognition and annotation of proper names which occur in dialogs gathered at the Warsaw city transportation information center. We describe different types of proper names and how people use them in dialogs. We present rules of automatic recognition and lemmatization of proper names in the transporta...
The paper concerns construction of the Polish spontaneous spoken dialogs corpus built within the LUNA project. It elaborates on the process of collecting conversations, their transcription and annotation at morpho-syntactic and concept levels. Corpus annotation is performed using a mixture of manual and automated techniques.
The paper presents a corpus of Polish spoken dialogues being a result of the LUNA (spoken Language UNderstanding in multilinguAl communication systems) project. We describe the process of collecting the corpus and its annotation on several levels, from transcription of dialogues and their morphosyntactic analysis, to semantic annotation on concepts...
The paper presents rule-based information extraction (IE) from two types of patients' documentation in Polish. For both document types, values of sets of attributes were assigned using specially designed grammars.
The paper presents a program for automatic spelling correction of texts from a very specific domain, which has been applied to mammography reports. We describe different types of errors and present the program of correction based on the Levenshtein distance and probability of bigrams.
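The combination of edit distance and bigram probability described above can be sketched roughly as follows. This is an illustrative sketch only, not the program from the paper: the function names `levenshtein` and `correct`, the `max_dist` threshold, the lexicon, and the bigram probabilities are all hypothetical, and the English example words stand in for Polish mammography vocabulary.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct(word, prev_word, lexicon, bigram_prob, max_dist=2):
    """Pick a replacement for `word` (hypothetical helper).

    Candidates are lexicon entries within `max_dist` edits; the closest
    one wins, and ties are broken by the probability of the bigram
    (prev_word, candidate). Unknown bigrams are treated as probability 0.
    """
    if word in lexicon:
        return word
    candidates = []
    for entry in lexicon:
        dist = levenshtein(word, entry)
        if dist <= max_dist:
            prob = bigram_prob.get((prev_word, entry), 0.0)
            candidates.append((dist, -prob, entry))
    return min(candidates)[2] if candidates else word


# Made-up lexicon and bigram probabilities, for illustration only.
lexicon = {"breast", "lesion", "lesions"}
bigrams = {("benign", "lesion"): 0.3, ("benign", "lesions"): 0.1,
           ("multiple", "lesions"): 0.4, ("multiple", "lesion"): 0.01}
```

In this toy setup the misspelling "lesionz" is one edit away from both "lesion" and "lesions", so the preceding word decides: after "benign" the singular is chosen, after "multiple" the plural.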
The paper presents a collection of resources developed for Information Extraction (IE) from Polish texts. In particular, we mention two IE platforms adapted to Polish and several IE applications built on top of one of them: named entity recognition, creation of terminology lexicons, and data extraction from medical texts.
The paper describes the ontology development for an IE (Information Extraction) application for Polish mammography reports, experiences and lessons learned, and the evaluation of the system. Information extraction requires prior knowledge on data structures we would like to identify. When information being searched for is as complicated as this c...
The paper presents a rule-based information extraction (IE) system for Polish medical texts. We select the most important information from diabetic patients' records. Most data being processed are free-form texts, only a part is in table form. The work has three goals: to test classical IE methods on texts in Polish, to create relational database c...
We present the final version of the system for automatic content extraction from Polish medical data. The system combines general IE techniques with an external post-processing. The obtained data is normalized and linked to a simplified ontology. Then, it is automatically grouped to form more complex structures representing medical reports.
The paper focuses on resolving natural language issues which have been affecting performance of our system processing Polish medical data. In particular, we address phenomena such as ellipsis, anaphora, comparisons, coordination and negation occurring in mammogram reports. We propose practical data-driven solutions which allow us to improve the sys...
In this paper, we present an environment designed for extraction of medical data from mammogram reports. We process data collected from various Polish health care providers and transform them into attribute-value structures, according to a simplified mammographic ontology. We use a general purpose information extraction (IE) platform, SProUT, enrich...
The paper presents a method for intelligent automatic processing of medical reports. First, we extract single pieces of information using SProUT (a general-purpose Information Extraction platform), and then, externally merge the results in order to obtain a detailed formalised description of the reports.
The aim of this article is to present the initial results of adapting SProUT, a multi-lingual Natural Language Processing platform developed at DFKI, Germany, to the processing of Polish. The article describes some of the problems posed by the integration of Morfeusz, an external morphological analyzer for Polish, and various solutions to the probl...
In this paper, we present an environment designed for extraction of medical data from mammographic reports. We process data collected from various Polish health care providers and transform them into attribute-value structures, according to a simplified mammographic ontology. We use a general purpose information extraction (IE) platform, SProUT,...
The paper presents both conceptual and technical issues related to the construction of an HPSG test-suite for Polish. The test-suite consists of sentences of written Polish, both grammatical and ungrammatical. Each sentence is annotated with a list of linguistic phenomena it illustrates. Additionally, grammatical sentences are encoded in HPSG-st...
Overview of systems for implementing HPSG grammars
This report is an overview of the most popular systems that enable the implementation of formal grammars of natural language. The systems were selected on the basis of their suitability for implementing HPSG grammars. For each of the systems included (TFS, CUF,...
Computer-based generation of natural-language utterances is a complex task the aim of which is to enable computer systems to communicate with their users in natural-language. The focus of this paper is on the problem of how to generate coherent and well-structured texts. After an overview of the features which influence a text's shape, various meth...
This paper presents an overview of the problems connected with natural-language (NL) generation, as well as some existing systems. The realization of NL interfaces to computer systems has recently become a very promising goal of AI. Although a lot of research has already been done, no satisfying solution exists as yet. This overview i...
Statistical relationships between the duration of survival and selected prognostic factors in a group of 192 patients with Hodgkin's disease irradiated in the years 1965-1975 in the Institute of Oncology in Warsaw are discussed. A multiple regression analysis has been performed and Pearson correlation coefficients calculated. It has been...
A randomized controlled clinical trial has been carried out in the Oncology Center in Warsaw and the Medical Academy in Łódź to assess the effectiveness of metronidazole as a radiosensitizer of hypoxic cells in patients with squamous cell laryngeal cancer treated by cobalt-60 irradiation in the years 1982-1986. Clinical estimation has been ob...
In this paper we present general assumptions and goals of the LUNA (spoken Language UNderstanding in multilinguAl communication systems) project. We describe the process of collecting a Polish corpus of spoken dialogs and the accepted annotation schema of this corpus at several levels, from transcription of dialogs and morphosyntactic analysis, to...
Abstract: This article presents the conceptual and technical aspects of constructing an annotated corpus for Polish. The corpus contains sentences of written Polish represented as AVM structures within the HPSG formalism. In addition, each sentence is annotated with the types of linguistic phenomena it illustrates....
The paper discusses a program for removing patient identification information from hospital discharge documents in order to make them available for scientific research, e.g. for designing information extraction systems. The presented method allows de-anonymization of documents using a key-code file that is created on the basis of a patient's surname, for...