Table 4 - uploaded by Yoshiki Mikami
Selected countries with the richest language diversity in the Asian region

Source publication
The paper gives an overview and evaluation of language resources for Asian languages, in particular the Indonesian official and local languages currently used on the Internet. We collected over 100 million Asian web pages from 43 Asian country domains and analyzed their language properties. The presence of a language is...

Context in source publication

Context 1
... 2600, only around 51 languages are recognized by Asian governments as official or national languages of their countries; the other languages are recognized as languages of home use. The official and national languages of selected Asian countries are summarized in Table 4. [2] CIA World Factbook, as of July 2006. The survey found the richest diversity of written pages in Indonesia, the country with the richest language diversity in the region. ...

Similar publications

The World Wide Web (Web) is the largest information repository, containing billions of interconnected documents (web pages) authored by billions of people and organizations. The Web's huge size, diversity, unstructured or semi-structured and dynamic contents, and multilingual nature make effectively and efficiently searching informa...
Abstract: Within the Papillon project, which aims to build multilingual lexical databases organized by acceptions, we have defined strategies for populating a pivot dictionary of interlingual links from a monolingual vector base. There can be a large number of senses per entry, and therefore the identification of accept...
This paper is concerned with the ways ideologies relating to multilingualism and linguistic diversity influence action and interaction, emphasising the relevance of performance style as an analytic criterion. It particularly deals with social contexts in which language becomes something that needs to be ‘managed’ or ‘planned’, both con...
The paper for the CHiC pilot lab describes the motivation, tasks, Europeana collections and topics, and evaluation measures, as well as the submitted and analyzed information retrieval runs. In its first year, CHiC offered three tasks: ad-hoc, which measured retrieval effectiveness according to the relevance of the ranked retrieval results (standard 1000 do...
Most published studies on multilingualism concentrate on education, while only a few describe the phenomenon outside the educational environment. The current descriptive study uses Makkah City as an example to describe two multilingual phenomena: Umrah and Hajj as examples of permanent and temporary multilingualism pheno...


... (Lauder, 2004: 3-4). Of these 13 languages, only 7 have a presence on the Internet (Riza, 2006). ...
In this paper, we report a survey of language resources (LRs) in Indonesia, primarily of indigenous languages. We look at the official Indonesian language (Bahasa Indonesia) and 726 regional languages of Indonesia (Bahasa Nusantara) and list all the available LRs that we could gather. This paper suggests that the smaller regional languages may remain relatively unstudied and unknown, but they are still worthy of our attention. Various LRs for these endangered languages are being built and collected by regional language centers for study and preservation. We also briefly report their presence on the Internet.
An ontology can be seen as a representation of concepts in a specific domain. Accordingly, ontology construction can be regarded as the process of organizing these concepts. If the terms used to label the concepts are classified before building an ontology, the work of ontology construction can proceed much more easily. Part-of-speech (PoS) tags usually carry some linguistic information about terms, so PoS tagging can be seen as a kind of preliminary classification that helps construct concept nodes in an ontology, because the features or attributes related to concepts of different PoS types may differ. This paper presents a simple approach to tagging domain terms for the convenience of ontology construction, referred to as Term PoS (TPoS) Tagging. The proposed approach uses the segmentation and tagging results from general PoS tagging software to predict tags for extracted domain-specific terms. This approach needs no training and no context information. The experimental results show that the proposed approach achieves a precision of 95.41% for extracted terms and can be easily applied to different domains. Compared with some existing approaches, our approach shows that for some specific tasks, a simple method can obtain very good performance and is thus a better choice.
Conference Paper
Various intangible cultural expressions in Indonesia, such as oral traditions and literature, are fragile and easily lost. Of the 726 languages, 146 are currently endangered. Although several projects have been initiated for cultural preservation, technology that could support communication within indigenous communities, as well as with people outside them, is still very rare in Indonesia. Speech-to-speech translation is a technology that enables communication among people speaking different languages, and it is therefore significant for indigenous communities seeking to preserve their cultural languages and overcome language barriers. This paper presents an early step in the long-term development of a speech-to-speech translation system from Indonesian ethnic languages to other languages (i.e., English/Indonesian): the design and collection of graphemically balanced, parallel speech corpora for four major Indonesian ethnic languages (Javanese, Sundanese, Balinese, and Bataks).