Lluís Padró
Universitat Politècnica de Catalunya | UPC · Department of Computer Science

PhD in Artificial Intelligence

About

Publications: 133
Reads: 16,974
Citations: 2,271

Publications (133)
Article
In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word to its number of meanings. Althoug...
Article
Textual descriptions of processes are ubiquitous in organizations, so that documentation of the important processes can be accessible to anyone involved. Unfortunately, the value of this rich data source is hampered by the challenge of analyzing unstructured information. In this paper we propose a framework to overcome the current limitations on de...
Chapter
Decision models are strategic for formalizing how data influences the main decisions in an organization. Due to their importance, standard notations like DMN have appeared in recent years to serve as a central resource for synchronizing people and systems with respect to decisions. However, the modeling of DMN specifications can be tedious and er...
Chapter
The automatic extraction of formal process information from textual descriptions of processes is a challenging problem, but one worth exploring, since it enables organizations to align complementary information about their processes. In this paper we continue our previous work in this area, based on defining hierarchical/tree patterns on the depen...
Article
A fundamental problem in conformance checking is aligning event data with process models. Unfortunately, existing techniques for this task are either complex or only applicable to restricted classes of models. In practice, this means that for large inputs, current techniques often fail to produce a result. In this paper we propose a method t...
Chapter
Computing a mapping between two process models is a crucial technique, since it enables reasoning and operating across processes, like providing a similarity score between two processes, or merging different process variants to generate a consolidated process model. In this paper we present a new flexible technique for process model mapping, based...
Conference Paper
Organizations often use textual descriptions to document their main processes. These descriptions are primarily used by the company’s personnel to understand the processes, especially by those who cannot interpret formal descriptions like BPMN or Petri nets. In this paper we present a technique based on Natural Language Processing a...
Preprint
Text Spotting in the wild consists of detecting and recognizing text appearing in images (e.g. signboards, traffic signals or brands in clothing or objects). This is a challenging problem due to the complexity of the context where texts appear (uneven backgrounds, shading, occlusions, perspective distortions, etc.). Only a few approaches try to exp...
Article
The creation of a process model faces the challenge of constructing a syntactically correct entity which accurately reflects the semantics of reality and is understandable. This paper proposes a framework called Model Judge, focused on the two main actors in the process of learning process model creation: novice modellers and instr...
Preprint
Applications such as textual entailment, plagiarism detection or document clustering rely on the notion of semantic similarity, and are usually approached with dimension reduction techniques like LDA or with embedding-based neural approaches. We present a scenario where semantic similarity is not enough, and we devise a neural approach to learn sem...
Chapter
A fundamental problem in conformance checking is aligning event data with process models. Unfortunately, existing techniques for this task are either complex or only applicable to restricted classes of models. In practice, this means that for large inputs, current techniques often fail to produce a result. In this paper we propose a method t...
Chapter
The existence of unstructured information that describes processes represents a challenge in organizations, mainly because this data cannot be directly referred into process-aware ecosystems due to ambiguities. Still, this information is important, since it encompasses aspects of a process that are left out when formalizing it on a particular model...
Chapter
Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from a natural language processing perspective to fill the gap between language and vision. We propose a post-processing approach to improve scene text recognition accuracy by usi...
Chapter
The effect of digital transformation in organizations needs to go beyond automation, so that human capabilities are also augmented. One possibility in this direction is to make formal representations of processes more accessible to the actors involved. Along these lines, this paper presents a methodology to transform a formal process description into a c...
Article
Is it possible to identify or measure prepositional meaning? In our article we review a particular semantic universe, the verbs of movement in Spanish. In this context, we try to answer the initial question affirmatively and to validate a method. From the selection of a corpus of 71,206 prepositional phrases in Spanish, where three prepositions-a...
Chapter
Many current state-of-the-art methods for text recognition are based on purely local information and ignore the semantic correlation between text and its surrounding visual context. In this paper, we propose a post-processing approach to improve the accuracy of text spotting by using the semantic relation between the text and the scene. We initiall...
Preprint
Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from a natural language processing perspective to fill the gap between language and vision. We propose a post-processing approach to improve scene text recognition accuracy by usi...
Preprint
Many current state-of-the-art methods for text recognition are based on purely local information and ignore the semantic correlation between text and its surrounding visual context. In this paper, we propose a post-processing approach to improve the accuracy of text spotting by using the semantic relation between the text and the scene. We initiall...
Preprint
Many current state-of-the-art methods for text recognition are based on purely local information and ignore the semantic correlation between text and its surrounding visual context. In this paper, we propose a post-processing approach to improve the accuracy of text spotting by using the semantic relation between the text and the scene. We initiall...
Article
Process model descriptions are a ubiquitous source of information that exists in any organization. To reach different types of stakeholders, distinct descriptions are often kept, so that process understandability is boosted with respect to individual capabilities. While the use of distinct representations allows more stakeholders to interpret proc...
Conference Paper
The Business Process Management (BPM) field focuses on the coordination of labor so that organizational processes are executed smoothly and products and services are properly delivered. At the same time, NLP has reached a maturity level that enables its widespread application in many contexts, thanks to publicly available frameworks. In t...
Conference Paper
With the aim of having individuals from different backgrounds and expertise levels examine the operations in an organization, different representations of business processes are maintained. To have these different representations aligned is not only a desired feature, but also a real challenge due to the contrasting nature of each process represent...
Conference Paper
In this paper an automatic morphology learning system for complex and agglutinative languages is presented. We process complex agglutinative morphology of Indian languages using Adaptor Grammars and linguistic rules of morphology. Adaptor Grammars are a compositional Bayesian framework for grammatical inference, where we define a morphological gram...
Conference Paper
AETAS is an online tool for converting text into RDF linked data with resolution of temporal expressions. AETAS follows a fully service-oriented architecture (SOA) and is accessible via a web service. It implements a novel approach for semantic representation and linked temporal graphs built from natural language sentences. In this paper, we present a demonstration tool,...
Article
The language used in social media is often characterized by the abundance of informal and non-standard writing. The normalization of this non-standard language can be crucial to facilitate the subsequent textual processing and to consequently help boost the performance of natural language processing tools applied to social media text. In this paper...
Article
Parsers have evolved significantly in recent decades, but large improvements are still needed to enhance their performance. ParTes, a test suite in Spanish and Catalan for parsing evaluation, aims to contribute to this effort by pointing to the main factors that can decisively improve parser performance.
Article
Despite recent advances in parsing, significant efforts are needed to improve current parsers' performance, such as improving argument/adjunct recognition. There is evidence that verb subcategorization frames can contribute to parser accuracy, but a number of issues remain open. The main aim of this paper is to show how subcateg...
Article
This paper presents ParTes, the first test suite in Spanish and Catalan for qualitative parsing evaluation. This resource is a hierarchical test suite of representative syntactic structure and argument order phenomena. ParTes proposes a simplification of qualitative evaluation by contributing to the automation of this task.
Article
This article presents an ensemble parse approach to detecting and selecting high-quality linguistic analyses output by a hand-crafted HPSG grammar of Spanish implemented in the LKB system. The approach uses full agreement (i.e., exact syntactic match) along with a MaxEnt parse selection model and a statistical dependency parser trained on the same da...
Conference Paper
In this paper we introduce TweetNorm_es, an annotated corpus of Spanish-language tweets, which we make publicly available under the terms of the CC-BY license. This corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of...
Article
This work is focused on research in machine learning for coreference resolution. Coreference resolution is a natural language processing task that consists of determining the expressions in a discourse that refer to the same entity. The main contributions of this article are (i) a new approach to coreference resolution based on constraint satisfact...
Article
An overview of the shared task is presented: description, corpora, annotation, preprocessing, participant systems and results.
Data
Spanish-language tweets, annotated for lexical normalization purposes. Created for the tweet normalization challenge at Tweet-Norm 2013.
Article
An overview of the shared task is presented: description, corpora, annotation, preprocessing, participant systems and results.
Article
This paper describes research on the effects of PoS tagging as a preprocessing step for HPSG-based deep parsing in the context of the development of an open-source Spanish treebank in the DELPH-IN framework. The treebank annotation is performed by manually selecting the proper decisions among the choices proposed by the system and ranked by a statistical module. The...
Conference Paper
This paper describes the participation of RelaxCor in the CoNLL-2011 shared task: "Modeling Unrestricted Coreference in OntoNotes". RelaxCor is a constraint-based graph partitioning approach to coreference resolution solved by relaxation labeling. The approach combines the strengths of groupwise classifiers and chain formation methods in one global...
Conference Paper
We present a general and simple method to adapt an existing NLP tool so that it can deal with historical varieties of languages. This approach basically consists of expanding the dictionary with old word variants and retraining the tagger with a small training corpus. We implement this approach for Old Spanish. The results of a tho...
Article
FreeLing is an open-source library for automatic multilingual processing that provides a wide range of linguistic analysis services for several languages. FreeLing offers Natural Language Processing application developers functions for the linguistic analysis and annotation of texts, with the consequent...
Article
FreeLing is an open-source multilingual language processing library providing a wide range of language analyzers for several languages. It offers text processing and language annotation facilities to natural language processing application developers, simplifying the task of building those applications. FreeLing is customizable and exte...
Article
KNOW2: Language understanding technologies for multilingual, domain-oriented information access. Eneko Agirre. Abstract: The goal of KNOW2 is to advance the development of an integrated environment that enables the low-cost deployment of vertical information-access portals for specific domains. The project has a dur...
Conference Paper
This paper presents a constraint-based graph partitioning approach to coreference resolution solved by relaxation labeling. The approach combines the strengths of groupwise classifiers and chain formation methods in one global method. Experiments show that our approach significantly outperforms systems based on separate classification and chain for...
Article
This article presents a new freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of the Wikipedia and has been automatically enriched with linguistic information. To our knowledge, this is the largest such corpus that is freely available to the community: In its present version, it contains over 750 million wo...
Article
FreeLing is an open-source multilingual language processing library providing a wide range of language analyzers for several languages. It offers text processing and language annotation facilities to natural language processing application developers, simplifying the task of building those applications. FreeLing is customizable and extensible. Deve...
Conference Paper
This paper presents the development of an open-source Spanish Dependency Grammar implemented in the FreeLing environment. This grammar was designed as a resource for NLP applications that require going a step further in automatic natural language analysis, as is the case of Spanish-to-Basque translation. The development of wide-coverage rule-based grammars u...
Conference Paper
This article presents a new freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of the Wikipedia and has been automatically enriched with linguistic information. To our knowledge, this is the largest such corpus that is freely available to the community: In its present version, it contains over 750 million wo...
Article
FreeLing is an open-source library providing a wide range of language analysis utilities for several different languages. It is intended to provide NLP application developers with any text processing and language annotation tools they may need in order to simplify their development task. Moreover, FreeLing is customizable and extensible. Developers...
Article
Computational language processing encompasses any activity related to the creation, management and use of language technology and resources. On the scientific level, this activity is central to disciplines such as corpus linguistics, language engineering, and written or spoken natural language processing. E...
Article
The article takes stock of the Jornada del Processament Computacional del Català. It sets out the objectives, presents a view of computational processing from the perspectives of resources and research along with the profile of the participants, outlines the main lines of argument of the debate held there, and draws some conclusions.
Article
We present the conclusions of the first "Jornada del Processament Computacional del Català", held in Barcelona in March 2009.
Article
The KNOW project aims to add meaning, knowledge and reasoning to current Natural Language Processing technologies.
Article
This paper presents an extension of an integrated architecture designed for Semantic Parsing so that it also performs Word Sense Disambiguation. In the proposed collaborative framework, both tasks are addressed simultaneously. The feasibility and robustness of the proposed architecture for Semantic Parsing have been tested against a well-defined task on Word Se...
Conference Paper
This paper presents a method for Entity Disambiguation in Information Extraction from different sources on the web. Once entities and the relations between them are extracted, it is necessary to determine which ones refer to the same real-world entity. We model the problem as a graph partitioning problem in order to combine the available informati...
Article
This paper presents a general method for the alias assignment task in information extraction. We compared two approaches to address the problem and learn a classifier. The first one quantifies a global similarity between the alias and all the possible entities, weighting several features of each alias-entity pair. The second is a classical classifier where...
Article
This paper describes UPC's participation in the SemEval-2007 task 9 (Màrquez et al., 2007). We addressed all four subtasks using supervised learning. The paper introduces several novel issues: (a) for the SRL task, we propose a novel reranking algorithm based on the re-ranking Perceptron of Collins and Duffy (2002); and (b) for the same task we int...
Conference Paper
In this work an extension of the CSSR algorithm using Maximum Entropy Models is introduced. Preliminary experiments to perform Named Entity Recognition with this new system are presented. The Causal State Splitting Reconstruction (CSSR) algorithm (1) infers the causal states of a process from data, building a deterministic automaton that is expected...
Article
The CSSR algorithm learns automata representing the patterns of a process from sequential data. This paper studies the applicability of CSSR to Noun Phrase detection. The ability of the algorithm to capture the patterns behind this task and the conditions under which it performs better are studied. Also, an approach to use the acquired models to...
Article
OpenTrad is an operational, open-source, transfer-based machine translation system for Spanish, Galician, Catalan and Basque. It can be accessed at www.opentrad.org for translating text, documents or web pages. Programs and data can be downloaded from SourceForge.
Article
This paper describes version 1.3 of the FreeLing suite of NLP tools. FreeLing was first released in February 2004 providing morphological analysis and PoS tagging for Catalan, Spanish, and English. From then on, the package has been improved and enlarged to cover more languages (i.e. Italian and Galician) and offer more services: Named entity reco...
Article
The main goal of this work is to compare different methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever [Cuadros et al., 2004] and Infomap [Dorow and Widdows, 2003], for acquiring Topic Signatures from corpus. Using these tools, we retrieve sense e...
Conference Paper
In this work, the Causal-State Splitting Reconstruction algorithm, originally conceived to model stationary processes by learning finite state automata from data sequences, is applied for the first time to NLP tasks, namely Named Entity Recognition. The obtained results are slightly below those of the best systems presented in the CoNLL 2002 shared task, though giv...
Article
The main goal of this work is to compare two methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever and Infomap, for acquiring Topic Signatures from corpus. Using these tools, we retrieve sense examples from large text collections. Both systems constr...
Article
We present the current status of development of an open architecture for the translation from Spanish into Basque. The machine translation architecture uses an open source analyser for Spanish and new modules mainly based on finite-state transducers. The project is integrated in the OpenTrad initiative, a larger government-funded project shared amon...
Article
In this article we present a new system for proper name recognition in Spanish. This system is based on the CSSR (Causal-States Splitting Reconstruction) algorithm (Shalizi and Shalizi, 2004), which learns a finite-state automaton from sequential data. The results obtained are slightly worse than those of the be...
Article
In this paper a comparative study of Automated Text Summarization (TS) Systems is presented. It describes the factors to be taken into account for evaluating those systems and outlines three alternative classifications. The paper provides extensive examples of working TS systems according to their characterizing features, performance, and obtained...
Conference Paper
This paper presents a study aiming to find the best strategy to develop a fast and accurate HMM tagger when only a limited amount of training material is available. This is a crucial factor when dealing with languages for which even a small amount of annotated material is not easily available. First, we develop some experiments in English, using the WSJ corpus as...
Conference Paper
Word Sense Disambiguation (WSD) systems are usually evaluated by comparing their absolute performance, in a fixed experimental setting, to other alternative algorithms and methods. However, little attention has been paid to analyzing the lexical resources and the corpora defining the experimental settings and their possible interactions with the over...