Rafał Jaworski

  • PhD
  • Professor (Assistant) at Adam Mickiewicz University in Poznań

About

Publications: 45
Reads: 4,619
Citations: 90
Introduction
Rafał Jaworski currently works at the Faculty of Mathematics and Computer Science, Adam Mickiewicz University. Research areas include Computing in Social Science, Arts and Humanities; Artificial Intelligence; and Algorithms. The current project is 'Concordia'.
Current institution
Adam Mickiewicz University in Poznań
Current position
  • Professor (Assistant)
Additional affiliations
September 2009 - present
Adam Mickiewicz University in Poznań
Position
  • Professor (Assistant)

Publications (45)
Article
Full-text available
Parallel corpora have been widely used in the fields of natural language processing and translation as they provide crucial multilingual information. They are used to train machine translation systems, compile dictionaries, or generate inter-language word embeddings. There are many corpora available publicly; however, support for some languages is...
Conference Paper
Full-text available
This paper reports on the implementation and deployment of an MT system in the Polish branch of EY Global Limited. The system supports standard CAT and MT functionalities such as translation memory fuzzy search, document translation and post-editing, and meets less common, customer-specific expectations. The deployment began in August 2018 with a...
Chapter
Full-text available
This paper describes a novel tool for concordance searching, named Concordia. It combines the capabilities of standard concordance searchers with the usability of a translation memory. The tool is described in detail with regard to main applied methods and differences when compared to already existing CAT tools. Concordia uses three data structures...
Chapter
This paper presents work on automatic converb detection in Old Braj poetry from the 15th–17th centuries. It is part of research on non-finite verbal forms in early New Indo-Aryan (NIA) language corpora comprising data from Old Rajasthani, Awadhi, Braj, Dakkhini and Pahari [8]. The goal of the detection mechanism is to successfully identify a pl...
Article
Full-text available
Computer-assisted tools used to seem as though they had not been designed from the point of view of their target users [O’Brien, 2012:15]. However, their usability has been improving. In Translation Studies there is a gap in research on process-oriented usability involving data triangulation. In our study, based on the assumption that translation is a situated...
Chapter
Full-text available
This paper describes experiments in applying statistical classification algorithms to the detection of converbs – rare word forms found in historical texts in New Indo-Aryan languages. The digitized texts were first manually tagged with the help of a custom-made tool called IA Tagger, which enables semi-automatic tagging of the texts. One of the feature...
Article
This paper discusses how personal epistemological tendencies in native speakers of Polish influence their perception of non-standard Polish. We argue in favor of taking into consideration interpersonal differences in the way one conceptualizes reality through the lens of language as valid factors in of language...
Conference Paper
Full-text available
This paper presents work on automatic converb detection in Early Braj prose and poetry from the 15th–17th centuries. This is a continuation of research on non-finite verbs in early New Indo-Aryan (NIA) languages (Jaworski and Stroński, 2016). The goal of the detection mechanism is to successfully identify a plaintext word as a converb or no...
Conference Paper
Full-text available
This paper presents an idea to bring crowdsourcing to a higher level, for the purpose of acquiring valuable machine translation and natural language processing resources. In the proposed scenario, students are being educated in order to improve the quality and effectiveness of their natural language processing (NLP) related work. Their motivation i...
Conference Paper
Full-text available
This paper presents an experiment on automatic classification of words, conducted on textual data coming from the Polish Digital Libraries. The main goal was to implement an algorithm which would aid manual extraction of domain-specific vocabulary from raw texts. In this scenario, the electronic format of input texts was a result of optical charact...
Poster
Full-text available
The poster presents the three basic steps of TMrepository, a system for collecting parallel data through crowdsourcing:
  • collecting and uploading parallel corpora
  • review and quality check
  • getting a top ranking (gamification)
Conference Paper
Full-text available
This paper presents the Multiservice platform and its integration with the CLARIN Language Resources Switchboard. Multiservice combines a set of offline natural language processing tools for the Polish language. It features, among others, disambiguating tagging, dependency parsing and coreference resolution. A demonstration version of the platform,...
Conference Paper
Full-text available
End user satisfaction is essential for developing better CAT tools. This is why we conducted a pilot experiment for a larger usability evaluation of Concordia, a translation memory (TM) tool (Jaworski, 2015; online demo: http://concordia.vm.wmi.amu.edu.pl/cat/jrc_plen/). Situated Translation constituted the theoretical framework for the study with...
Conference Paper
Full-text available
This article describes research in automatic content-based temporal classification of texts. Experiments are carried out on a set of texts coming from Polish digital libraries, dating between the years 1814 and 2013. Following successful research in the field of temporal classification, this work aims at creating an automatic dating mechanism to be...
Conference Paper
This article describes a series of experiments on gender attribution of Polish texts. The research was conducted on the publicly available corpus called “He Said She Said”, consisting of a large number of short texts from the Polish version of Common Crawl. As opposed to other experiments on gender attribution, this research takes on a task of clas...
Conference Paper
Full-text available
This article describes a series of experiments on gender attribution of Polish texts. The research was conducted on the publicly available corpus called “He Said She Said”, consisting of a large number of short texts from the Polish version of Common Crawl. As opposed to other experiments on gender attribution, this research takes on a task of...
Article
This article describes a series of experiments on gender attribution of Polish texts. The research was conducted on the publicly available corpus called “He Said She Said”, consisting of a large number of short texts from the Polish version of Common Crawl. As opposed to other experiments on gender attribution, this research takes on a task of...
Conference Paper
Full-text available
This paper presents the idea of applying an open source, web-based platform – Gonito.net – for hosting challenges for researchers in the field of natural language processing. Researchers are encouraged to compete in well-defined tasks by developing tools and running them on provided test data. The researcher who submits the best results becomes the...
Chapter
Full-text available
This article presents the technique of approximate sentence matching. It is known to perform well as an aid in linguistics-related tasks, including translation. The article presents two different applications of this technique in the author’s algorithms for concordance searching and sentence clustering. The technical part of this article is succeeded...
Chapter
Full-text available
This article describes a novel technique in Computer-Aided Translation (CAT) which is meant to be a new generation of translation memory lookup. It combines the benefits of a regular translation memory search (sometimes referred to as fuzzy sentence matching) with the power of a concordance searcher. Input to the search algorithm is a whole sentenc...
Chapter
Full-text available
The paper reports on PSI-Toolkit, an extensible set of NLP tools developed at Adam Mickiewicz University in Poznań. All processors of the toolkit operate on a common data structure called PSI-lattice. This feature allows the seamless incorporation of NLP tools created by other researchers. PSI-Toolkit is licensed under LGPL, which allows fo...
Chapter
Full-text available
An annotation tool called IA Tagger, used for semi-automatic tagging of New Indo-Aryan texts, is presented. One of the features of the system is the generation of statistical data on occurrences of words and phrases in various contexts, which helps perform historical linguistic analysis at the levels of morphosyntax, semantics and pragmatics. The p...
Conference Paper
Full-text available
One of the essential steps in the analysis of large document collections is the thematic classification of those documents. As technical capabilities for document acquisition and storage have increased significantly, text corpora have grown to sizes which make manual analysis by humans infeasible. For that reason, it is necessary to design processe...
Conference Paper
translaide.pl is a CAT system developed by the Polish company PolEng Sp. z o.o. that supports multiple input and output languages. The main idea of the system is to enable the sharing of resources among translators. A demo version of the system is available on the internet (http://translaide.pl), yet it is primarily intended for exclusive use in a...
Thesis
Full-text available
The aim of this thesis is to develop new algorithms in the fields of natural language processing, approximate searching and cluster analysis that can be applied to support translation from one natural language into another.
Chapter
Full-text available
In this paper, the idea of Computer-Aided Translation is first introduced and a modern approach to CAT is then presented. Next, we provide a more detailed description of one of the state-of-the-art CAT systems, memoQ. Then, the author’s approach to the idea, the Anubis system, is described and evaluated. While Anubis is comparable to memoQ in terms...
Article
Full-text available
In the era of the Internet, parallel Translation Memories (TM) are relatively easy to acquire. Online translation databases, published translation memories or even multilingual web pages provide an abundance of translations. One of the major fields in which translation memories are of use is Computer-Aided Translation (CAT). For a given sentence,...
Conference Paper
Full-text available
This paper presents an idea in Example-Based Machine Translation: computing the transfer score for each produced translation. When an EBMT system finds an example in the translation memory, it tries to modify the sentence in order to produce the best possible translation of the input sentence. The user of the system, however, is unable to judge th...
Article
Full-text available
The paper presents two ideas for overcoming the problem of translation memory data sparseness. The first is specialization in a narrow domain. The second idea is a novel method of preparing a specialized translation memory for a given purpose. It is based on the assumption that the most useful sentences that might appear in a domain-restricted tran...
