Marian Simko

Kempelen Institute of Intelligent Technologies · Natural Language Processing

Assoc. professor

About

60 Publications
7,544 Reads
676 Citations
Additional affiliations
July 2012 - June 2020
Slovak University of Technology in Bratislava
Position
  • Professor (Assistant)
Description
  • Lecturer: Management in Software Engineering, Management in Information Systems; Teaching assistant: Web Publishing, Procedural Programming, Principles of Software Engineering; Supervisor: bachelor and master degree theses, team project
  • Research interests: domain modeling, user modeling, semantic web, adaptation; ontology engineering, semantics extraction, natural language processing; information processing, information filtering, recommendation, technology enhanced learning
Education
October 2008 - March 2012
Slovak University of Technology in Bratislava
Field of study
  • Software Engineering
September 2006 - June 2008
Slovak University of Technology in Bratislava
Field of study
  • Software Engineering
September 2002 - June 2006

Publications (60)
Preprint
Full-text available
Cross-lingual knowledge transfer, especially between high- and low-resource languages, remains a challenge in natural language processing (NLP). This study offers insights for improving cross-lingual NLP applications through the combination of parameter-efficient fine-tuning methods. We systematically explore strategies for enhancing this cross-lin...
Preprint
Full-text available
The dissemination of false information across online platforms poses a serious societal challenge, necessitating robust measures for information verification. While manual fact-checking efforts are still instrumental, the growing volume of false information requires automated methods. Large language models (LLMs) offer promising opportunities to as...
Article
Full-text available
The dominant probing approaches rely on the zero-shot performance of image-text matching tasks to gain a finer-grained understanding of the representations learned by recent multimodal image-language transformer models. The evaluation is carried out on carefully curated datasets focusing on counting, relations, attributes, and others. This work int...
Preprint
Full-text available
This position paper discusses the problem of multilingual evaluation. Using simple statistics, such as average language performance, might inject linguistic biases in favor of dominant language families into evaluation methodology. We argue that a qualitative analysis informed by comparative linguistics is needed for multilingual results to detect...
Article
Aspect-based sentiment analysis (ABSA) deals with the determination of sentiments for opinion targets. While historically this research task has been addressed with pipeline approaches, more recent works use neural networks to jointly deal with the aspect term and opinion term extraction, as well as the polarity classification. Although learned tog...
Article
Full-text available
Hate speech should be tackled and prosecuted based on how it is operationalized. However, the existing theoretical definitions of hate speech are not sufficiently fleshed out or easily operable. To overcome this inadequacy, and with the help of interdisciplinary experts, we propose an empirical definition of hate speech by providing a list of 10 ha...
Preprint
Full-text available
We introduce a new Slovak masked language model called SlovakBERT in this paper. It is the first Slovak-only transformers-based model trained on a sizeable corpus. We evaluate the model on several NLP tasks and achieve state-of-the-art results. We publish the masked language model, as well as the subsequently fine-tuned models for part-of-speech ta...
Chapter
While classic aspect-based sentiment analysis typically includes three sub-tasks (aspect extraction, opinion extraction, and aspect-level sentiment classification), recent studies focus on exploring possibilities of knowledge sharing from different tasks, such as document-level sentiment analysis or document-level domain classification that are les...
Chapter
While digital space is a place where users increasingly communicate, the recent threat of COVID-19 infection has further emphasised the need for an effective and well-organised online environment. It is therefore more important now than ever before to deal with various unhealthy phenomena that prohibit effective communication and knowle...
Chapter
In this work we combine cross-lingual and cross-task supervision for zero-shot learning. Our main contribution is that we discovered that coupling models, i.e. models that share neither a task nor a language with the zero-shot target model, can improve the results significantly. Coupling models serve as a regularization for the other auxiliary mode...
Chapter
Many languages still lack the annotated training data needed for supervised learning. This issue is often addressed by using auxiliary supervision and so-called transfer learning. In this work we focus on the problem of combining two types of auxiliary supervision – cross-lingual and cross-task. Previous work has shown promising results for thi...
Article
Many intelligent systems in business, government or academia process natural language as an input for their inference, or they might even communicate with users in natural language. The natural language processing within them is currently often done using machine learning models. However, machine learning needs training data and such data are oft...
Chapter
Automatic text generation can significantly ease human effort in many everyday tasks. Recent advancements in neural networks have supported further research in this area and also brought significant improvement in the quality of text generation. Unfortunately, most of the research deals with the English language and possibilities of text generation of...
Preprint
In this paper, we present the neural model architecture submitted to the SemEval-2019 Task 9 competition: "Suggestion Mining from Online Reviews and Forums". We participated in both subtasks, for domain-specific and cross-domain suggestion mining. We proposed a recurrent neural network architecture that employs Bi-LSTM layers and also self-attentio...
Article
Full-text available
Support for adaptive learning with respect to increased interaction and collaboration over the educational content in state-of-the-art models of web-based educational systems is limited. Explicit formalization of such models is necessary to facilitate extendibility, reusability and interoperability. Domain models are the most fundamental parts of a...
Chapter
Machine learning is an increasingly important approach to Natural Language Processing. Most languages, however, do not possess enough data to fully utilize it. When dealing with such languages, it is important to use as much auxiliary data as possible. In this work we propose a combination of multitask and multilingual learning. When learning a new ta...
Conference Paper
In this paper, we present neural models submitted to the Shared Task on Implicit Emotion Recognition, organized as part of WASSA 2018. We propose a Bi-LSTM architecture with regularization through dropout and Gaussian noise. Our models use three different embedding layers: GloVe word embeddings trained on a Twitter dataset, ELMo embeddings and also sente...
Preprint
The growing number of comments makes online discussions difficult for human moderators alone to moderate. Antisocial behavior is a common occurrence that often discourages other users from participating in discussion. We propose a neural-network-based method that partially automates the moderation process. It consists of two steps. First, we detect inappr...
Conference Paper
When accessing or manipulating large document corpora, semantics is crucial for enabling machines to understand document content and deliver advanced functionality such as recommendation or intelligent search. Manual creation of semantics is a tedious task not sufficiently supported in state-of-the-art tools. In our work we focus on supporting efficie...
Article
The domain model is an essential part of an adaptive learning system. For each educational course, it involves educational content and semantics, which is also viewed as a form of conceptual metadata about educational content. Due to the size of a domain model, manual domain model creation is a challenging and demanding task for teachers or content...
Conference Paper
In our work we aim at keyword extraction from movie subtitles. Keywords and key phrases, although lacking context, can be very helpful in finding, understanding, organizing and recommending media content. Generally, they are used by search engines to help find relevant information. Movies and video content are becoming massively ava...
Conference Paper
In this paper we tackle the problem of lemmatization of inflectional languages. We introduce a new algorithm which utilizes vector models of words. Current approaches in this area are limited to knowing either full grammar rules or the translation matrix between the word and its basic form. However, this information is encoded in natural text. Our...
Article
Full-text available
In the constantly growing blogosphere with no restrictions on form or topic, a number of writing styles and genres have emerged. Recognition and classification of these styles has become significant for information processing with an aim to improve blog search or sentiment mining. One of the main issues in this field is detection of informative and...
Conference Paper
Full-text available
In this demo paper we present the adaptive educational system ALEF, which addresses several drawbacks of existing systems that support learning programming, such as limited support for specific adaptation, collaboration, and motivation. These drawbacks result from the complexity of learning programming, which must involve practicing and active experimentation. ALEF constit...
Conference Paper
Creating notes and annotations in educational content during learning helps students better organize learning materials. In addition, the provided content represents an interesting source of information for further processing, which can result in enrichment of the educational content or metadata. In this paper we report on a new type of a...
Article
People and companies selling goods or providing services have always desired to know what people think about their products. The number of opinions on the Web has significantly increased with the emergence of microblogs. In this paper we present a novel method for sentiment analysis of a text that allows the recognition of opinions in microblogs wh...
Chapter
Full-text available
Web 2.0 has had a tremendous impact on education. It facilitates access and availability of learning content in a variety of new formats, content creation, learning tailored to students’ individual preferences, and collaboration. The range of Web 2.0 tools and features is constantly evolving, with focus on users and ways that enable users to socializ...
Conference Paper
People spend a large amount of time browsing the Web while fulfilling various needs, but they find it difficult to spare some time for education. We believe that the time spent browsing can be used more efficiently. We proposed a method for web augmentation during casual web browsing, which facilitates foreign language vocabulary learning. Our met...
Conference Paper
Full-text available
The Web 2.0 principles are reflected in the learning domain and provide means for interactivity and collaboration. Student activities during learning in this environment can be utilized to gather data usable for learning corpora enrichment. It is now a research issue to examine to what extent the student crowd is reliable in delivering useful artifacts an...
Conference Paper
Automated acquisition of relevant domain terms from educational documents available in social educational systems can benefit from processing a growing number of user-created annotations assigned to the content. Annotations provide us potentially useful information about documents and can improve the results of base Automatic Term Recognition (ATR)...
Conference Paper
We introduce a tool aimed to facilitate the management of content, metadata and social annotations assigned to documents in semantic web-based applications. The COME2T (COllaboration- and MEtadata-oriented COntent Management EnvironmenT) allows easy administration of lightweight semantics for the provided content and user-created annotations, which...
Conference Paper
Full-text available
To allow advanced processing of information available on the Web, web content requires semantic descriptions (metadata) processable by machines. Manual creation of metadata, even in a lightweight form such as (web page) relevant terms, is a demanding and almost impossible task for humans, especially when considering open information space...
Conference Paper
Information growth is faster than ever before. We need to provide advanced services facilitating information “consumption” (e.g., recommendation, personalized navigation). At least lightweight semantics is necessary for such services. Nowadays the keyword paradigm is widely used and seems to achieve satisfactory results in fields such as social bookm...
Conference Paper
Full-text available
Microblogs are a phenomenon of modern social media. As they contain much real-time social information, they are candidates to be used as a source for mining important information that enhances user experience in a variety of web applications, especially those related to content adaptation and recommendation. In this paper we deal with microblog-ba...
Conference Paper
Full-text available
Adaptive educational hypermedia necessitate semantic description of a domain, which is used by an adaptive engine to perform adaptation to a learner. The bottleneck of adaptive hypermedia is manual authoring of such semantic description performed by a domain expert mainly due to the amount of descriptions to be created. In this paper we present a m...
Conference Paper
In order to compute page rankings, search algorithms primarily utilize information related to page content and link structure. Microblogs, as a phenomenon of today, provide additional, potentially relevant information: short messages often containing hypertext links to web resources. Such a source is particularly valuable when considering a tempora...
Conference Paper
Full-text available
In recent years we have witnessed the expansion of Web 2.0. Its main feature is allowing users' collaboration in content creation using various means, e.g. annotations, discussions, wikis, blogs or tags. This approach has also influenced web-based learning, for which the term "Learning 2.0" has been introduced. In this paper we explore using tags in su...
Conference Paper
Full-text available
We focus on mining relevant information from web pages. Unlike plain text documents, web pages contain another source of potentially relevant information - easily processable mark-up. We propose an approach to keyword extraction that enhances Automatic Term Recognition (ATR) algorithms intended for processing plain text documents with an analysis o...
Article
Full-text available
The current Web has many aspects. It is no longer only a place for content presentation. The Web is more and more a place where we actually spend time performing various tasks, a place where we look for interesting information based on discussions, opinions of others, as well as a place where we spend part of our recreation and leisure time. In add...
Article
Full-text available
In this paper we present a method for adaptive selection of test questions according to the individual needs of students within a web-based educational system. It functions as a combination of three particular methods. The first method is based on the course structure and focuses on the selection of the most appropriate topic for learning. The seco...
Chapter
Full-text available
State-of-the-art learning management systems provide their stakeholders with many features coming from the Web 2.0 paradigm, but often ignore the need for personalization and adaptation during learning. Moreover, learning activities are often fragmented – a student needs to make a decision whether he or she wants to take questions or read explanatory...
Conference Paper
Full-text available
In this paper we present an approach to user modeling based on the domain model that we generate automatically by resource (text) content processing and analysis of associated tags from a social annotation service. User’s interests are modeled by overlaying the domain model – via keywords extracted from resource’s (text) content, and tags assigned...
Conference Paper
Full-text available
Current educational systems use advanced mechanisms for adaptation by utilizing available knowledge about the domain. However, describing a domain area in sufficient detail to allow accurate personalization is a tedious and time-consuming task. Only a few works are related to the support of teachers by discovering the knowledge from educational mater...
Conference Paper
Full-text available
To make the learning process more effective, educational systems deliver content adapted to specific user needs. Adequate personalization requires the domain of learning to be described explicitly in particular detail, involving relationships between knowledge elements referred to as concepts. Manual creation of necessary annotations is in the c...
Article
In order to make the learning process more effective, educational systems tailor learning material according to user needs. Adequate adaptation requires rich domain description to enable adaptation engines to make at least basic reasoning. State-of-the-art approaches rely mostly on domain experts or teachers who supply an adaptive system with neces-sa...
Article
Full-text available
Adaptive educational systems tailor learning material to user goals, needs and characteristics. While supporting more effective learning, they require semantic descriptions enabling adaptation engines to make at least basic reasoning. However, creating such descriptions manually is an extremely demanding task. The situation is even more complicated w...
Article
To satisfy a user's information needs, the most accurate results for the entered search query need to be returned. Traditional approaches based on comparing Bag-of-Words models of the query and resources have been surpassed. In order to yield better search results, the role of semantic search is increasing. However, the presence of semantic data is not common as much a...
