Article

Word Division in Spanish.

Abstract

Spanish is a language with very precise and regular orthographic rules. A syllabication algorithm strictly based on syntactic analysis, not requiring any semantic knowledge, is presented and further extended to include hyphenation. Algorithms are presented as pattern matching schemata, and efficient implementations are considered.
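
The abstract does not reproduce the paper's pattern-matching schemata, so the following is only a minimal Python sketch of the textbook Spanish syllabication rules, not the author's algorithm. The names `syllabify` and `hyphenate`, the `CLUSTERS` list, and the simplifications are all assumptions made for illustration: `h` and `y` are treated as plain consonants, prefix-based exceptions are ignored, and `hyphenate` applies only the common convention of not stranding a single letter at either end of a word.

```python
import re

STRONG = set("aeoáéó")
WEAK = set("iuü")
ACCENTED_WEAK = set("íú")
VOWELS = STRONG | WEAK | ACCENTED_WEAK
# Consonant pairs assumed to be inseparable at the start of a syllable.
CLUSTERS = {"pr", "br", "tr", "dr", "cr", "gr", "fr",
            "pl", "bl", "cl", "gl", "fl"}
# The digraphs ch, ll and rr are matched first so they stay single units.
TOKEN = re.compile(r"ch|ll|rr|[a-záéíóúüñ]")


def is_hiatus(a, b):
    """Return True when adjacent vowels a and b fall in different syllables."""
    if a in STRONG and b in STRONG:
        return True
    # An accented weak vowel next to a strong vowel breaks the diphthong.
    return ((a in ACCENTED_WEAK and b in STRONG)
            or (a in STRONG and b in ACCENTED_WEAK))


def syllabify(word):
    """Split a single Spanish word into a list of (lowercased) syllables."""
    units = TOKEN.findall(word.lower())
    if not units:
        return []
    boundaries = []      # indices in `units` where a new syllable starts
    prev_vowel = None
    for i, unit in enumerate(units):
        if unit not in VOWELS:
            continue
        if prev_vowel is not None:
            gap = i - prev_vowel - 1   # consonant units between the vowels
            if gap == 0:
                if is_hiatus(units[prev_vowel], unit):
                    boundaries.append(i)
            elif gap == 1:
                boundaries.append(i - 1)   # lone consonant opens next syllable
            else:
                pair = units[i - 2] + units[i - 1]
                boundaries.append(i - 2 if pair in CLUSTERS else i - 1)
        prev_vowel = i
    starts = [0] + boundaries
    ends = boundaries + [len(units)]
    return ["".join(units[s:e]) for s, e in zip(starts, ends)]


def hyphenate(word, mark="-"):
    """Mark syllable breaks, skipping those that would strand one letter."""
    syllables = syllabify(word)
    out = ""
    for k, syllable in enumerate(syllables):
        if k > 0:
            lone_first = k == 1 and len(syllables[0]) == 1
            lone_last = k == len(syllables) - 1 and len(syllable) == 1
            if not (lone_first or lone_last):
                out += mark
        out += syllable
    return out
```

Under these assumptions, `syllabify("transporte")` gives `["trans", "por", "te"]` and `hyphenate("amigo")` gives `"ami-go"`, because a break after the initial `a` would leave a single letter on its own. Only the spelling is inspected, in line with the abstract's claim that no semantic knowledge is needed.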

... The literature on word hyphenation in different languages, such as Spanish [5], Greek [6], Turkish [7], Czechoslovak [2], and Norwegian [8,9], supports this view. Differences in typical word lengths across languages are also known to cause challenges with general layout templates [10]. ...
... Clearly, hyphenation is language specific, as its rules are tied to languages (see, for instance, approaches documented for Spanish [5], Greek [6], Turkish [7], Czechoslovak [2], and Norwegian [8,9]). Typical approaches involve rules and patterns [12] and hyphenation dictionaries [13]. ...
Article
Full-text available
Many low-vision users adjust the browser zoom level to make text more comfortable to read. Responsive websites attempt to fit the content within the viewport width, but several types of problems can occur: long words may be wider than the viewport and thus partially hidden, they may cause large vertical gaps, and words may be split unnecessarily or incorrectly. This study set out to gain insight into such word wrapping problems on responsive web pages. To help identify word wrapping issues, the tool HYPHERSPACE was developed. Experiments run on 91 websites suggest that hyphenation-related problems are prevalent. Excessive wrapping of words and overflowing words were the most notable problems, occurring on about 90% of the websites. One implication of this study is that web designers need to explicitly design for narrow viewports. The proposed tool can help web designers identify hyphenation problems on responsive web pages viewed with high magnification.
... The work of Mañas [2] is representative of the Spanish language. Their study presents requirements for syllabifying in Spanish in terms of letters or groups of letters. ...
... tsort is the standard UNIX utility written (as in the original program) and 12 times faster when pipes are used. An algorithm for hyphenating Spanish words which can be implemented using just Lex [Ma87] is about half the size and only a factor of two slower when programmed in Convert. A filter for a library application originally programmed in Awk runs six times faster when recast in Convert. ...
Article
String-processing languages have typically been implemented as interpreters, macro generators or program generators. Convert, a language whose main appeal is the ease with which string transformations may be expressed as sets of pattern-matching and substitution rules, was developed initially as an interpreter under a symbol manipulation version of REC. This paper describes the implementation of a machine-code compiler of Convert for the Motorola MC680x0 family of microprocessors. The execution speed of programs produced with this compiler compares favorably with that of equivalent programs written in other string-processing languages.
Article
In this article an automatic scansion model for fixed-metre Spanish poetry is presented. It is a hybrid model that combines hand-made rules with probabilistic information. Through the set of rules, the model is able to extract the syllabic structure of each word, to classify them as stressed or unstressed and to resolve metrical phenomena such as synaloephas or diaereses. The article is mainly focused on the metrical ambiguities produced by synaloephas: verse lines from which it is possible to derive two or more metrical patterns. This metrical ambiguity is resolved through probabilities, assuming a relation between high probabilities and metricality. The system has been evaluated through more than 1,000 lines extracted from a corpus of Golden-Age Spanish sonnets. An accuracy of 95% has been achieved, resulting in not only considerable progress if we compare it to previous proposals, but also in an adequate way of performing the task when compared to human performance.
Article
When typed text does not fit on a line, the last word on the line must be divided correctly. Where this function is controlled by a computer program, the program must be supplied with a set of rules for the particular language. Afrikaans linguistic rules for syllable division, such as 'divide between two meaningful components', cannot be used by computers without encoding the pronunciation and meaning of words. Where running text is processed, for example in word processing or typesetting, such an approach is complicated and unnecessary. We need rules that relate to the orthography, for example 'always divide between double consonants', but here too there are unavoidable exceptions. One approach to supplying syllable facts to computer programs is to store an entire Afrikaans dictionary with the syllable divisions marked in it. Although this approach ensures that all ordinary words can be accommodated, it cannot serve as a general solution, because newly created compounds will cause problems. Such a dictionary is currently too expensive an approach to undertake for personal computers. An algorithmic solution, with a dictionary containing exceptions and special cases such as loanwords and proper names, is preferable. The orthographic forms of syllables in Afrikaans are described, and syllables are analysed in order to recognise syllable boundaries and multiple prefixes and suffixes. The latter three elements are combined in a computer procedure to determine the positions of syllable divisions in a word. Words that do not follow these rules are encoded in internal lists. Examples of applications to both a word list and a running text are given.
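
The combination this abstract favours, an orthographic rule backed by a list of exceptions, can be sketched roughly as follows. This is an illustrative assumption rather than the procedure from the paper: the function name `divide`, the `EXCEPTIONS` table, and its single entry are hypothetical, and only the one rule quoted in the abstract (divide between double consonants) is implemented.

```python
import re

# Hypothetical exception list: words whose division does not follow the rule.
# The single entry is a placeholder loanword chosen for illustration.
EXCEPTIONS = {"jazz": "jazz"}

# A consonant immediately followed by the same consonant.
DOUBLE_CONSONANT = re.compile(r"([bcdfghjklmnpqrstvwxz])\1")


def divide(word, exceptions=EXCEPTIONS):
    """Return the lowercased word with '-' at the division points found."""
    key = word.lower()
    # Exceptions take priority over the general orthographic rule.
    if key in exceptions:
        return exceptions[key]
    return DOUBLE_CONSONANT.sub(r"\1-\1", key)
```

With this sketch, `divide("appel")` yields `"ap-pel"`, while `divide("jazz")` is answered from the exception list and left undivided. Checking the list first keeps the rule set small, which matches the abstract's argument against storing a complete dictionary.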
Article
Full-text available
This article presents a computerized database of words for use in experimental research in cognitive psychology and psycholinguistics. The data are based on the oral vocabulary of 200 Spanish-speaking children aged from 11.16 to 49.16 months. The database includes 15,428 Spanish words (tokens) and comprises 1,259 different words (types). It provides information about age of acquisition, orthography, grammar, semantics, and frequency.