Rafael C. Carrasco

  • Computer Science & Physics
  • University of Alicante

About

106
Publications
12,327
Reads
2,398
Citations
Current institution
University of Alicante
Additional affiliations
January 2005 - December 2010
University of Alicante
January 1988 - December 1991
University of Valencia

Publications

Article
Full-text available
Diversity indices have been traditionally used to capture the biodiversity of ecosystems by measuring the effective number of species or groups of species. In contrast to abundance, which grows with the amount of data available and is sensitive to the appearance of small groups, diversity indices provide a more robust indicator on the variability o...
Preprint
Full-text available
Diversity indices have been traditionally used to capture the biodiversity of ecosystems by measuring the effective number of species or groups of species. In contrast to abundance, which is correlated with the amount of data available, diversity indices provide a more robust indicator on the variability of individuals. These types of indices can b...
Article
Full-text available
For some decades now, galleries, libraries, archives, and museums (GLAM) institutions have provided access to information resources in digital format. Although some datasets are openly available, they are often not used to their full potential. Recently, approaches such as the so‐called Labs within GLAM institutions promote the reuse of digital col...
Article
Cultural heritage institutions have recently started to share their metadata as Linked Open Data (LOD) in order to disseminate and enrich them. The publication of large bibliographic data sets as LOD is a challenge that requires the design and implementation of custom methods for the transformation, management, querying and enrichment of the data....
Preprint
Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have shown better performance than their non-hierarchical phrase-based counterparts for some language pairs. The standard approach to HSMT learns and applies a synchronous context-free grammar with a single non-terminal. The hypothesis behind the grammar refin...
Conference Paper
Full-text available
The normalization of edit sequences sorts the operations according to the document position they modify instead of the instant when they were generated. The stress on the spatial distribution of operations simplifies the analysis of conflicting types of operations and provides an alternative formalism, also within the operational transformation sche...
Article
Cultural heritage institutions have recently begun to consider the benefits of sharing their collections using linked open data to disseminate and enrich their metadata. As datasets become very large, challenges appear, such as ingestion, management, querying and enrichment. Furthermore, each institution has particular features related to important...
Article
Full-text available
This paper presents a new method with which to assist individuals with no background in linguistics to create monolingual dictionaries such as those used by the morphological analysers of many natural language processing applications. The involvement of non-expert users is especially critical for under-resourced languages which either lack or canno...
Article
The catalogue of the Biblioteca Virtual Miguel de Cervantes contains about 200,000 records which were originally created in compliance with the MARC21 standard. The entries in the catalogue have been recently migrated to a new relational database whose data model adheres to the conceptual models promoted by the International Federation of Library A...
Article
Bibliographic collections in traditional libraries often compile records from distributed sources where variable criteria have been applied to the normalization of the data. Furthermore, the source records often follow classical standards, such as MARC21, where a strict normalization of author names is not enforced. The identification of equivalent...
Conference Paper
The 200,000 records in the catalogue of the Biblioteca Virtual Miguel de Cervantes have been migrated to a new relational database whose data model adheres to the FRBR and FRAD specifications. The database content has been later mapped to RDF triples which employ the RDA vocabulary to describe the entities, as well as their properties and relations...
Conference Paper
The BVC section of the impact-es diachronic corpus of historical Spanish compiles 86 books —containing approximately 2 million words. About 27% of the words —providing a representative coverage of the most frequent word forms— have been annotated with their lemma, part of speech, and modern equivalent following the Text Encoding Initiative guidelin...
Article
This paper describes an open-source tool which computes statistics of the differences between a reference text and the output of an OCR engine. It also facilitates the spotting of mismatches by generating an aligned bitext where the differences are highlighted and cross-linked. The tool accepts a variety of input formats (both for the reference and...
Article
Full-text available
Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be more efficiently compressed if the fact that the two texts are mutual translations is exploited. For example, a bitext can be seen as a sequence of biwords, pairs of parallel words with a high probability of...
Article
Full-text available
The impact-es diachronic corpus of historical Spanish compiles over one hundred books —containing approximately 8 million words— in addition to a complementary lexicon which links more than 10 thousand lemmas with attestations of the different variants found in the documents. This textual corpus and the accompanying lexicon have been released under...
Article
Full-text available
The impact-es diachronic corpus of historical Spanish compiles over one hundred books—containing approximately 8 million words—in addition to a complementary lexicon which links more than 10,000 lemmas with attestations of the different variants found in the documents. This textual corpus and the accompanying lexicon have been released under an ope...
Article
Full-text available
We compare different strategies to apply statistical machine translation techniques in order to retrieve documents that are a plausible translation of a given source document. Finding the translated version of a document is a relevant task; for example, when building a corpus of parallel texts that can help to create and evaluate new machine transl...
Article
Full-text available
It often occurs that local copies of a text are modified by users but that the local modifications are not synchronized (thus allowing the merged text to become the source for later editions) until later when, for instance, the network connection is reestablished. Since text editions usually affect a small fraction of the whole content, the history...
Article
Although optimal staff scheduling often requires elaborate computational methods, those cases which are not highly constrained can be efficiently solved using simpler approaches. This paper describes how a simple procedure, combining random and greedy strategies with heuristics, has been successfully applied in a Spanish hospital to assign guard sh...
Article
We describe a technique that maps unranked trees to arbitrary hash codes using a bottom-up Deterministic Tree Automaton (DTA). In contrast to other hashing techniques based on automata, our procedure builds a pseudo-minimal DTA for this purpose. A pseudo-minimal automaton may be larger than the minimal one accepting the same language but, in turn,...
Article
Full-text available
We describe an algorithm that allows the incremental addition or removal of unranked ordered trees to a minimal frontier-to-root deterministic finite-state tree automaton (DTA). The algorithm takes a tree t and a minimal DTA A as input; it outputs a minimal DTA A′ which accepts the language L(A) accepted by A incremented (or decremented) with the t...
Conference Paper
Full-text available
The amount of information that is stored in digital form in more than one language is growing very fast as a consequence of globalization. Furthermore, there are countries and supra-national entities whose legislation enforces the translation (and storage) of all the official texts into all their official languages. Two texts that are mutual tr...
Article
Full-text available
Introducción al estándar XML (Introduction to the XML standard)
Conference Paper
Full-text available
A frontier-to-root deterministic finite-state tree automaton (DTA) can be used as a compact data structure to store collections of unranked ordered trees. DTAs are usually sparser than string automata, as most transitions are undefined, and therefore special care must be taken in order to minimize them efficiently. However, it is difficult to find simple...
Conference Paper
Full-text available
This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the (European) Portuguese ↔ Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech t...
Chapter
Recent work has shown that the extraction of symbolic rules improves the generalization performance of recurrent neural networks trained with complete (positive and negative) samples of regular languages. This paper explores the possibility of inferring the rules of the language when the network is trained instead with stochastic, positive-only dat...
Chapter
Starting from basic couplings of the photons to mesons, nucleons and isobars, a microscopic many-body theory is developed which allows one to evaluate different photonuclear reactions at intermediate energies. The theory is applied to obtain the total photonuclear cross section and the separation between absorption and (γ, π) reaction channels.
Conference Paper
Full-text available
The increase in the amount of data available in digital libraries calls for the development of search engines that allow users to find quickly and effectively what they are looking for. XML tagging makes it possible to add structural information to digitized content. These metadata offer new opportunities to a wide variety of new serv...
Article
In this paper, we describe some techniques to learn probabilistic k-testable tree models, a generalization of the well-known k-gram models, that can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, backing-off schemes can be defined to solve data...
Article
Full-text available
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition, and machine translation are some of them. In Part I of this paper, we sur...
Article
Full-text available
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition or in fields to which pattern recognition is linked. In Part I of this paper, we surveyed these objects and studied their properties. In this Part II, we study the relations between probabilistic finite-state automata and other well-known devices that ge...
Article
Full-text available
Probabilistic k-testable models (usually known as k-gram models in the case of strings) can be easily identified from samples and allow for smoothing techniques to deal with unseen events during pattern classification. In this paper, we introduce the family of stochastic k-testable tree languages and describe how these models can approximate any st...
Article
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked. In part I of this paper, we surveyed these objects and studied their properties. In this part II, we study the relations between probabilistic finite-state automata and other well known devices that g...
Conference Paper
Full-text available
In this paper, we present a natural generalization of k-gram models for tree stochastic languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated in an incremental fashio...
Conference Paper
Full-text available
A simple, robust sliding-window part-of-speech tagger is presented and a method is given to estimate its parameters from an untagged corpus. Its performance is compared to a standard Baum-Welch-trained hidden-Markov-model part-of-speech tagger. Transformation into a finite-state machine (behaving exactly as the tagger itself) is demonstrated...
Conference Paper
This paper describes the application of a new model to learn probabilistic context-free grammars (PCFGs) from a tree bank corpus. The model estimates the probabilities according to a generalized k-gram scheme for trees. It allows for faster parsing, decreases considerably the perplexity of the test samples and tends to give more structured and refine...
Article
We describe a general approach to compute a similarity measure between distributions generated by probabilistic tree automata that may be used in a number of applications in the pattern recognition field. In particular, we show how this similarity can be computed for families of structured (XML) documents. In such a case, the use of regular expressio...
Article
Full-text available
Probabilistic k-testable models (usually known as k-gram models in the case of strings) can be easily identified from samples and allow for smoothing techniques to deal with unseen events. In this paper we introduce the family of stochastic k-testable tree languages and describe how these models can approximate any stochastic rational tree language...
Conference Paper
Full-text available
In a previous work, a new probabilistic context-free grammar (PCFG) model for natural language parsing derived from a tree bank corpus has been introduced. The model estimates the probabilities according to a generalized k-gram scheme for trees. It allows for faster parsing, decreases considerably the perplexity of the test samples and tends...
Article
Full-text available
In this paper, we compare three different approaches to build a probabilistic context-free grammar for natural language parsing from a tree bank corpus: 1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule 2) a model that also stores information about the parent node's category and, 3)...
Conference Paper
Full-text available
We propose a new algorithm which allows for the identification of any stochastic deterministic regular language as well as the determination of the probabilities of the strings in the language. The algorithm builds the prefix tree acceptor from the sample set and merges systematically equivalent states. Experimentally, it proves very fast and the t...
Conference Paper
Full-text available
In this paper, we describe a generalization for tree stochastic languages of the k-gram models. These models are based on the k-testable class, a subclass of the languages recognizable by ascending tree automata. One of the advantages of this approach is that the probabilistic model can be updated in an incremental fashion. Another feature is that...
Conference Paper
Full-text available
In this paper, we compare three different approaches to build a probabilistic context-free grammar for natural language parsing from a tree bank corpus: (1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule; (2) a model that also stores information about the parent node’s category, and...
Article
Full-text available
Daciuk et al. [Computational Linguistics 26(1):3–16 (2000)] describe a method for constructing incrementally minimal, deterministic, acyclic finite-state automata (dictionaries) from sets of strings. But acyclic finite-state automata have limitations: For instance, if one wants a linguistic application to accept all possible integer numbers or Inte...
Article
Full-text available
We define deterministic augmented letter transducers (DALTs), a class of finite-state transducers which provide an efficient way of implementing morphological analysers which tokenize their input (i.e., divide texts into tokens or words) as they analyse it, and show how these morphological analysers may be maintained (i.e., how surface form--lexical for...
Article
Full-text available
Regular tree automata (RTA) or, equivalently, forest regular grammars (FRG) have been recently proposed for use as XML (extensible markup language) schemata. They are more powerful than usual XML DTDs (document-type definitions), make the implementation, optimization and pruning of XML queries easier and allow for the implementation of context-sensi...
Article
Recently, a number of authors have explored the use of recursive neural nets (RNN) for the adaptive processing of trees or tree-like structures. One of the most important language-theoretical formalizations of the processing of tree-structured data is that of finite-state tree automata (FSTA). In many cases, the number of states of a nond...
Article
Full-text available
Recently, a number of authors have explored the use of recursive neural nets (RNN) for the adaptive processing of trees or tree-like structures. One of the most important language-theoretical formalizations of the processing of tree-structured data is that of deterministic finite-state tree automata (DFSTA). DFSTA may easily be realized a...
Article
Full-text available
Finite-state machines are the most pervasive models of computation, not only in theoretical computer science, but also in all of its applications to real-life problems, and constitute the best characterized computational model. On the other hand, neural networks, proposed almost sixty years ago by McCulloch and Pitts as a simplified model of nerv...
Conference Paper
Full-text available
Recently, a number of authors have explored the use of recursive neural nets (RNN) for the adaptive processing of trees or tree-like structures. One of the most important language-theoretical formalizations of the processing of tree-structured data is that of finite-state tree automata (FSTA). In many cases, the number of states of a nond...
Article
In this paper, we present a natural generalization of k-gram models for tree stochastic languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated in an incremental fash...
Article
Full-text available
In this paper, we compare three different approaches to build a probabilistic context-free grammar for natural language parsing from a tree bank corpus: 1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule 2) a model that also stores information about the parent node's category and, 3)...
Article
Full-text available
There has been a lot of interest in the use of discrete-time recurrent neural nets (DTRNN) to learn finite-state tasks, with interesting results regarding the induction of simple finite-state machines from input–output strings. Parallel work has studied the computational power of DTRNN in connection with finite-state computation. This article descr...
Conference Paper
Full-text available
In many applications, objects are represented by a collection of unorganized points that scan the surface of the object. In such cases, an efficient way of storing this information is of interest. In this paper we present an arithmetic compression scheme that uses a tree representation of the data set and allows for better compression rates than ge...
Article
Full-text available
We generalize a former algorithm for regular language identification from stochastic samples to the case of tree languages. It can also be used to identify context-free languages when structural information about the strings is available. The procedure identifies equivalent subtrees in the sample and outputs the hypothesis in linear time with the n...
Article
Full-text available
Recently, a number of authors have explored the use of tree-walking (also called recursive) neural nets (TWNN) for the adaptive processing of data which present themselves as trees or tree-like structures such as directed acyclic graphs. On the other hand, one of the most important language-theoretical formalizations of the processing of treestruct...
Article
Full-text available
In recent years, there has been a lot of interest in the use of discrete-time recurrent neural nets (DTRNN) to learn finite-state tasks, and in the computational power of DTRNN, particularly in connection with finite-state computation. This paper describes a simple strategy to devise stable encodings of sequential finite-state translators (SFST) in a s...
Article
Full-text available
In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly with the size of the sample set and a measure of t...
Conference Paper
Full-text available
A number of researchers have used discrete-time recurrent neural nets (DTRNN) to learn finite-state machines (FSM) from samples of input and output strings. Trained DTRNN usually show FSM behaviour for strings up to a certain length, but not beyond; this is usually called instability. Other authors have shown that DTRNN may actually behave as FSM f...
Article
Full-text available
In this paper, we explore the applicability to compression tasks of the algorithms for regular language inference from stochastic samples. We compare two arithmetic encoders based upon two different kinds of formal languages: string languages and tree languages. The experiments show that tree-based methods outperform the predictive capability of stri...
Article
Stochastic grammars provide a formal background in order to deal with tasks where a random source of structured data is involved. In particular, stochastic tree grammars can be useful if hierarchical relations are established among the elementary components of the data. Grammatical inference methods are often checked with training samples generated...
Article
Full-text available
We generalize a former algorithm for regular language identification from stochastic samples to the case of tree languages or, equivalently, string languages where structural information is available. We also describe a method to compute efficiently the relative entropy between the target grammar and the inferred one, useful for the evaluation of...
Article
Full-text available
Works dealing with grammatical inference of stochastic grammars often evaluate the relative entropy between the model and the true grammar by means of large test sets generated with the true distribution. In this paper, an iterative procedure to compute the relative entropy between two stochastic deterministic regular grammars is proposed. Résumé...
Article
Full-text available
We consider the problem of learning context-free grammars from stochastic structural data. For this purpose, we have developed an algorithm (tlips) which identifies any rational tree set from stochastic samples and approximates the probability distribution of the trees in the language. The procedure identifies equivalent subtrees in the sample and...
Article
Recent work has shown that the extraction of symbolic rules improves the generalization performance of recurrent neural networks trained with complete (positive and negative) samples of regular languages. This paper explores the possibility of inferring the rules of the language when the network is trained instead with stochastic, positive-only da...
Article
Full-text available
Recent work has shown that second-order recurrent neural networks (2ORNNs) may be used to infer deterministic finite automata (DFA) when trained with positive and negative string examples. This paper shows that 2ORNN can also learn DFA from samples consisting of pairs (W, μ(W)) where W is a noisy string of input vectors describing the degree of rese...
Article
Full-text available
Recent work has shown that the extraction of symbolic rules improves the generalization power of recurrent neural networks trained with complete samples of regular languages. This paper explores the possibility of learning rules when the network is trained with stochastic data. For this purpose, a network with two layers is used. If an automaton is...
Article
The recently introduced algorithm LAESA finds the nearest neighbour prototype in a metric space. The average number of distances computed in the algorithm does not depend on the number of prototypes but it shows linear space and time complexities. In this paper, a new algorithm (TLAESA) is proposed which has a sublinear time complexity and keeps th...
Article
Full-text available
Recent work has shown that second-order recurrent neural networks (2ORNNs) may be used to infer regular languages. This paper presents a modified version of the real-time recurrent learning (RTRL) algorithm used to train 2ORNNs, that learns the initial state in addition to the weights. The results of this modification, which adds extra flexibility...
Article
A symmetrized version of the Nagendraprasad-Wang-Gupta thinning algorithm (Digital Signal Processing 3 (1993) 97) is presented, which produces simpler and more elegant skeletons of handwritten characters at zero extra computational cost.
Conference Paper
Recent work has shown that second-order recurrent neural networks (2ORNNs) may be used to infer deterministic finite automata (DFA) when trained with positive and negative string examples. This paper shows that 2ORNN can also learn DFA from samples consisting of pairs (W, μ(W)) where W is a noisy string of input vectors describing the degree of res...
Article
Differential cross sections of nucleons excited in photonuclear reactions in medium and heavy nuclei are studied by considering all relevant reaction mechanisms leading to the excitation of protons or neutrons. We take advantage of previous microscopic studies for the absorption and scattering of photons and photoproduced pions, and implement a sim...
Book
This volume presents the proceedings of the Second International Colloquium on Grammatical Inference (ICGI-94), held in Alicante, Spain in September 1994. Besides 25 research papers carefully selected and refereed by the program committee, the book contains a survey by E. Vidal. The book is devoted to all those aspects of automatic learning that ex...
Article
A local model for the nuclear medium modifications to the photoproduction of η mesons through the N*(1535) resonance is applied to the study of the inclusive reaction in medium and heavy nuclei. The use of effective Lagrangians and many body quantum theory allows one to incorporate the nuclear decay channels of the N*...
Article
We develop a local approximation to the Δh model for coherent π⁰ photoproduction in nuclei which allows one to perform reliable calculations in heavy nuclei where the traditional Δh approach is technically unfeasible. We evaluate the cross section in different nuclei and compare our results with available data in ¹²C.
Article
We study the contribution to ordinary Compton nuclear scattering of the resonant channel γ + A → (A′π⁻) → γ + A with the π⁻ bound in the nucleus. We show that the interference of this resonant channel with background amplitudes produces significant peaks in the elastic backward differential cross section as a function of the incoming...
Article
The double-differential cross section for inclusive (γ, π) reactions in nuclei is studied by combining elements of microscopic many-body theories previously developed. Pion production in nuclei changes with respect to the impulse approximation not only because of nuclear modifications to the primary (γ, π) reaction compared to the free case, but mo...
Article
We use a recently developed microscopic approach to photonuclear reactions at intermediate energies in order to calculate the enhancement κ of the nuclear dipole sum rule over its classical Thomas-Reiche-Kuhn value (60 mb MeV)NZ/A. The difficulties in comparing κ, evaluated with the double commutator, with the observable photonuclear cross...
Article
Starting from the basic interactions between photons, pions, nucleons and isobars we reconstruct a standard model providing an adequate description of the γN→πN reaction. With this, and the ph, Δh effective interactions used with success in the pion-nuclear reactions, we develop a systematic many-body expansion in the number of ph excitations in th...
Chapter
Using a microscopic many-body approach to pion and photonuclear reactions, we study the mechanisms of pion and photon absorption with emphasis on the number of nucleons involved in the genuine absorption process.
Conference Paper
Results of computer simulation of pion production in photonuclear reactions are presented. Differential cross sections and photon absorption cross sections are shown.
Article
The problem of inclusive radiative pion capture is reanalyzed from a many-body point of view which allows one to investigate effects like the Pauli blocking and the polarization of the medium by the spin-isospin interaction in a systematic way. Standard approximations are improved by means of this method, which is, however, much simpler technically than...
Chapter
In this work we apply a microscopic many-body approach to photonuclear reactions [1] in order to improve the understanding of the dipole sum rule. At the same time, the sum rule provides us with a test of the consistency of the underlying theory at low photon energies.
Article
Full-text available
A new, robust sliding-window part-of-speech tagger is presented, which itself is an approximation of an existing model, and a method is described to estimate its parameters from an untagged corpus. The approximation reduces the memory requirements without a significant loss in accuracy. Its performance is compared to that of the original sliding...
Article
Full-text available
We describe a technique that maps unranked trees to their hash codes using a bottom-up deterministic tree automaton (DTA). In contrast to techniques implemented with minimal tree automata, our procedure builds a pseudo-minimal DTA. Pseudo-minimal automata are larger than the minimal ones but in turn the mapping can be arbitrary, so it can be dete...
Article
Full-text available
This paper describes a set of tools and Java classes that allow the Lucene text search engine to use morphological information to index and search; in particular, it describes the use of the linguistic resources developed for the Apertium open-source machine translation platform to extract morphological information while indexing. We describe which...
