Tamás Váradi

Hungarian Academy of Sciences | HAS

PhD

About

53
Publications
6,795
Reads
511
Citations
Citations since 2016
5 Research Items
204 Citations

Publications (53)
Conference Paper
Full-text available
Over the course of the last few years, lexicography has witnessed the burgeoning of increasingly reliable automatic approaches supporting the creation of lexicographic resources such as dictionaries, lexical knowledge bases and annotated datasets. In fact, recent achievements in the field of Natural Language Processing and particularly in Word Sens...
Conference Paper
Full-text available
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover,...
Conference Paper
Full-text available
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade h...
Preprint
Full-text available
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade h...
Conference Paper
This paper presents the methods and results of a project that collects and analyses public comments written in response to political posts on Facebook using natural language processing and social psychological methods in order to explore emotional attitudes and social behavior.
Conference Paper
Full-text available
This paper presents the methodology and results of a project for the large-scale analysis of public messages in political discourse on Facebook, the dominant social media site in Hungary. We propose several novel social psychology-motivated dimensions for natural language processing-based text analysis that go beyond the standard sentiment-based an...
Conference Paper
In this paper the first preliminary results of the analysis of marks collected within the tables of META-NET series of Language White Papers of CESAR project languages are demonstrated. Although they are preliminary results, we can consider them useful for showing us where real gaps in language resources and tools can be detected.
Conference Paper
Full-text available
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative's work throughout Europe in order to boost progress a...
Article
Full-text available
The present article describes a newly launched portal (helyesírás.mta.hu) that employs innovative language technology to provide spelling advice. The article discusses the changing social context in the age of digital communication where, thanks to the Internet and social media, masses of people now have the means to express themselves publicly but...
Conference Paper
Full-text available
The paper reports on the development of the Hungarian Gigaword Corpus, an extended new edition of the Hungarian National Corpus, with upgraded and redesigned linguistic annotation and an increased size of 1.5 billion tokens. Issues concerning the standard steps of corpus collection and preparation are discussed with special emphasis on linguistic a...
Conference Paper
Full-text available
The purpose of this demo is to introduce the linguistic development tool NooJ. The tool has been in development for a number of years and it has a solid community of computational linguists developing grammars in two dozen languages ranging from Arabic to Vietnamese. Despite its manifest capabilities and reputation, its appeal within the wider H...
Conference Paper
Full-text available
The purpose of this demo is to introduce the Language Resources, Tools and Services (LRTS) that are being prepared within the Central and South-East European Resources (CESAR) project. To the computational linguistic community the languages covered by CESAR (Bulgarian, Croatian, Hungarian, Polish, Serbian and Slovakian) were so far considered under...
Article
Full-text available
The paper intends to give a brief summary of one of the most recent efforts on building the pan-European language technology infrastructure: META-NET (a Network of Excellence consisting of 54 research centres from 33 countries) and specifically, its Central and South-European participating project: CESAR. One of the major activities of the project i...
Article
The paper investigates the lexical and syntactic properties of the conversational modules of the Budapest Sociolinguistic Interview using basic language technology methods and tries to capture the differences between spoken and written language use with quantitative measures. The analysis compares the spoken language corpus with automatically annot...
Conference Paper
Full-text available
In this paper the first preliminary results of the analysis of marks collected within the tables of META-NET series of Language White Papers of CESAR project languages are demonstrated. Although they are preliminary results, we can consider them useful for showing us where real gaps in language resources and tools can be detected. Keywords...
Conference Paper
Full-text available
Currently, research infrastructures are being designed and established in many disciplines, all partly to address the problem that they all suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools the CLARIN initiative has been funded since 2008 to overcome many of the integration an...
Article
Full-text available
In the present paper we intend to investigate to what extent use of parallel corpora can help to eliminate some of the difficulties noted with bilingual dictionaries. The particular issues addressed are the bidirectionality of translation equivalence, the coverage of multiword units, and the amount of implicit knowledge presupposed on the part of t...
Conference Paper
Full-text available
This paper gives an overview of the CLARIN project [1], which aims to create a research infrastructure that makes language resources and technology (LRT) available and readily usable to scholars of all disciplines, in particular the humanities and social sciences (HSS).
Article
Full-text available
This paper presents a complete outline of the results of the Hungarian WordNet (HuWN) project: the construction process of the general vocabulary Hungarian WordNet ontology, its validation and evaluation, the construction of a domain ontology of financial terms built on top of the general ontology, and two practical applications demonstrating t...
Conference Paper
The paper argues for the viability and utility of partial machine translation (MT) in multilingual information systems. The notion of partial MT is modelled on partial parsing and involves a bottom-up pattern matching approach where the finite-state transducers assign translation equivalents locally. The article focuses on the linguistic underpinnin...
Article
The present paper reports on an attempt to annotate noun phrases in Hungarian using cascaded regular grammars. Hungarian presents several difficulties to shallow parsing such as discourse-oriented constituent order as well as left-branching recursive possessive and participle structure inside noun phrases. The approach uses cascaded regular grammar...
Article
Full-text available
The present paper shows how an aligned parallel corpus can be used to investigate the consistency of translation equivalence across the two languages in a parallel corpus. The particular issues addressed are the bidirectionality of translation equivalence, the coverage of multiword units, and the amount of implicit knowledge presupposed on the part...
Conference Paper
Full-text available
In the scope of the TELRI concerted action a working group is investigating the formation of a tool catalogue and repository. The idea is similar to that of the ACL Natural Language Software Registry, but the contents should be mostly limited to corpus processing tools available free of cost for research use. The catalogue should also offer a help-...
Article
Bilingual parallel corpora offer a treasure house of human translators' knowledge of the correspondences between the two languages. Extracting by automatic means the translation equivalents deemed accurate and contextually appropriate by a human translator is of great practical importance for various fields such as example-based machine translation...
Article
The purpose of this paper is to describe a new version of the Spoken English Corpus which will be of interest to phoneticians and other speech scientists. The Spoken English Corpus is a well-known collection of spoken-language texts that was collected and transcribed in the 1980s in a joint project involving IBM UK and the University of Lancaster...
Article
Full-text available
The paper reports on the development of the Hungarian National Corpus, which was completed at the end of 2001 after four years' effort. The HNC is designed to be a balanced reference corpus of current written Hungarian consisting of 150 million words. The paper first discusses basic design issues concerning the composition of the corpus. The HNC ad...
