Keyboard layouts: Lessons from the Meꞌphaa and Sochiapam Chinantec designs

To read the full-text of this research, you can request a copy directly from the author.


Introduction: Codification represents a major challenge for writers of endangered languages. New technologies render the process of typing on a keyboard more accessible and less expensive than at any previous point in time. In the twenty-first century, widely used writing systems depend on electronic input methods for producing printed or electronic materials. This chapter explores keyboard layout design considerations as they were addressed in the creation of two keyboard layouts for the Latin script-based writing systems serving four languages in the Meꞌphaa language family and Sochiapam Chinantec [cso]. In designing the typing experience for endangered language writers, it was necessary to account for: (a) technical differences encountered across major computer operating systems (OS X and Windows); (b) computing culture issues such as the keyboard layout of the dominant language; (c) keystroke frequency of language-specific segments; and (d) Unicode compatibility and input issues related to composite characters. The creation and use of a Unicode keyboard for data input facilitated the involvement of speakers of Meꞌphaa during the data-collection stage of a language documentation project by allowing the speakers themselves to generate Unicode-encoded text documents. Early adoption of digital input methods may better meet the needs of both the speech community and researchers. Given a keyboard for their orthography, speakers gained the opportunity to use their language in new technological media and in the language domains associated with communicating in those media.
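Points (c) and (d) of the abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function name `grapheme_frequencies` and the sample string are assumptions made for illustration only. It shows that an accented letter such as á can be encoded either as one precomposed code point or as a base letter plus a combining mark, and that counting keystroke-relevant segments should treat both encodings as the same unit.

```python
import unicodedata
from collections import Counter


def grapheme_frequencies(text: str) -> Counter:
    """Count each base letter together with its combining marks as one unit.

    Hypothetical helper for illustration; a real layout study would use a
    proper corpus and possibly full grapheme-cluster segmentation.
    """
    # Decompose first so precomposed and composite spellings look the same.
    decomposed = unicodedata.normalize("NFD", text)
    units = []
    for ch in decomposed:
        if unicodedata.combining(ch) and units:
            units[-1] += ch          # attach the mark to the preceding base
        elif not ch.isspace():
            units.append(ch)         # start a new unit, skipping whitespace
    # Recompose each unit so the keys are in one canonical form (NFC).
    return Counter(unicodedata.normalize("NFC", u) for u in units)


precomposed = "\u00e1"    # á as a single code point
composite = "a\u0301"     # a followed by COMBINING ACUTE ACCENT

# The two spellings differ as raw strings but are canonically equivalent.
assert precomposed != composite
assert unicodedata.normalize("NFC", composite) == precomposed

# Made-up sample text (not real language data) mixing both spellings.
sample = "ba\u0301 b\u00e1 ba"
print(grapheme_frequencies(sample).most_common())
```

Frequencies computed this way could inform which segments deserve a direct key rather than a dead-key sequence, although the chapter's own counting method is not described on this page.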


... In general, however, support for these language varieties without a longstanding widespread written tradition remains rare within language technology products, despite early efforts to address this problem, as described in e.g. Paterson (2015). This means there are significant opportunities for language technology products to make a positive difference for their users by adding in support for many more language varieties around the world. ...
... It is also worth pointing out that language communities can, and increasingly do, develop their own third-party Android keyboard applications or other types of language technology, frequently working with fieldwork linguists or other academics collaborating closely with these communities, using the kinds of approaches described in Paterson (2015). Thus, even if a language is not yet included in Gboard, it is still possible for an Android keyboard to be created and distributed for use among the community. ...
... After we've designed a layout and built a language model for a given language variety, we typically run a user study with a number of speakers of the target variety. This helps us make sure that the keyboard that's been built meets the needs of the community, which is critically important, as highlighted by Paterson (2015). These speakers are asked to install a beta version of the keyboard, and answer a survey with a variety of quantitative and qualitative questions to gauge their typing experience. ...
This technical report describes our deep internationalization program for Gboard, the Google Keyboard. Today, Gboard supports 900+ language varieties across 70+ writing systems, and this report describes how and why we have been adding support for hundreds of language varieties from around the globe. Many languages of the world are increasingly used in writing on an everyday basis, and we describe the trends we see. We cover technological and logistical challenges in scaling up a language technology product like Gboard to hundreds of language varieties, and describe how we built systems and processes to operate at scale. Finally, we summarize the key take-aways from user studies we ran with speakers of hundreds of languages from around the world.
... For example, difficulty finding a desired character or having words autocorrected to the wrong language can discourage users from typing in the unsupported language. Paterson (2015) provides an example of a speaker of Me'phaa, a language indigenous to Mexico, who is using a standard Spanish or QWERTY keyboard and needs to spend extra effort to type the letter 'á', which appears in Me'phaa about 17 times more frequently than in Spanish: this user is likely to stick to typing in Spanish. An important step to increasing the amount of content available in a low-resource language is to give users access to specific keyboards for their own language. ...
... Creating mobile keyboards for low-resource languages can empower users to generate more web content in these languages. As Paterson (2015) rightly cautions, "technology in and of itself is not the saviour of an endangered language." However, we have shown that it is relatively simple to scale this technology to many languages, and we hope that users can benefit from these keyboards. ...
We present our approach to automatically designing and implementing keyboard layouts on mobile devices for typing low-resource languages written in the Latin script. For many speakers, one of the barriers in accessing and creating text content on the web is the absence of input tools for their language. Ease in typing in these languages would lower technological barriers to online communication and collaboration, likely leading to the creation of more web content. Unfortunately, it can be time-consuming to develop layouts manually even for language communities that use a keyboard layout very similar to English; starting from scratch requires many configuration files to describe multiple possible behaviors for each key. With our approach, we only need a small amount of data in each language to generate keyboard layouts with very little human effort. This process can help serve speakers of low-resource languages in a scalable way, allowing us to develop input tools for more languages. Having input tools that reflect the linguistic diversity of the world will let as many people as possible use technology to learn, communicate, and express themselves in their own native languages.
... Many researchers [8], [9] discussed how existing keyboard layouts (especially those for mobile devices) were hand-designed with certain usages and behaviors in mind that do not necessarily cover speakers of endangered languages. ...
Conference Paper
In this work, we provide a genetic-based algorithm that is used to quickly find a placement for a set of objects within a given layout such that access to these objects is optimized. The given layout describes the free locations of the objects and the object handles, and access is driven by a corpus of object requests. The proposed algorithm optimizes the placement of the objects by searching through a small fraction of the search space. As a case study, we use the algorithm to find a better placement for the keyboard characters than the QWERTY and Dvorak Simplified character placements. The algorithm finds a placement that is better than both QWERTY and Dvorak Simplified by 32.68% and 15.79% respectively on the training set, and 32.71% and 15.84% respectively on the testing set. This result is achieved after searching through only 500K possible solutions, which is only about 1.23 × 10^−19 percent of the total search space. Both training and testing sets are extracted randomly from the TED2013 v1.1 English corpus. Moreover, we release the dataset, code and experimental results on our GitHub repository.
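The general idea of a genetic search over character placements can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm or cost function from the paper above: the cost here is simply the total Manhattan distance a single finger travels between consecutive characters on a small grid, and all names (`travel_cost`, `order_crossover`, `evolve`) and parameter values are hypothetical.

```python
import random
from collections import Counter


def bigram_counts(corpus, chars):
    """Count consecutive character pairs, restricted to the placed characters."""
    allowed = set(chars)
    pairs = zip(corpus, corpus[1:])
    return Counter((a, b) for a, b in pairs if a in allowed and b in allowed)


def travel_cost(layout, bigrams, columns):
    """Assumed cost: frequency-weighted Manhattan distance between key slots.

    layout[i] is the character placed in slot i of a grid with `columns` columns.
    """
    pos = {ch: divmod(i, columns) for i, ch in enumerate(layout)}
    return sum(
        n * (abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1]))
        for (a, b), n in bigrams.items()
    )


def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j + 1]
    rest = [ch for ch in p2 if ch not in middle]
    return rest[:i] + middle + rest[i:]


def mutate(layout, rng):
    """Swap two randomly chosen slots."""
    child = list(layout)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def evolve(chars, corpus, columns=3, pop_size=30, generations=60, seed=0):
    """Search for a low-cost placement; `chars` must contain unique characters."""
    rng = random.Random(seed)
    bigrams = bigram_counts(corpus, chars)

    def cost(layout):
        return travel_cost(layout, bigrams, columns)

    # Seed the population with the given order plus random permutations.
    population = [list(chars)]
    population += [rng.sample(list(chars), len(chars)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]   # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(order_crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=cost)


# Toy corpus and character set, chosen only for illustration.
toy_corpus = "the rain in the north isntense near the shore at east inn"
toy_chars = "etaoinshr"
best = evolve(toy_chars, toy_corpus)
print("".join(best))
```

Because the best half of each generation is carried over unchanged, the returned layout is never worse under the assumed cost than the starting order. A realistic study would need a cost model for multiple fingers or thumbs and a much larger corpus.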
... This process was based on Paterson (2015), who explains that it is increasingly common for endangered language speech communities to play an active role in the documentation, preservation and development of their language. Members of the Mapuche communities collaborated in this research, which enabled them, as stated by Paterson (2015), to contribute their knowledge, experiences and worldviews to the language revitalization effort. The Internet enabled Mapuche speakers, speakers of other endangered languages, and academics to engage with one another globally rather than operating independently, as before, in different social circles. ...
The Mapudungun language spoken by the Mapuche people in southern Chile is one of the thousands of severely endangered languages in the world today. Despite efforts by the Mapuche people to support it, the language remains threatened. The study reported here is part of an effort to more effectively support the revitalization of Mapudungun aimed at developing a community- and technology-based model for the teaching of the language. The study was divided into two stages: first, the design and development of the model; second, the evaluation of the model based on quasi-experimental research with a pre-test and a post-test. Then, the participants were surveyed to obtain information regarding their beliefs toward the use of technology in the teaching–learning process of Mapudungun. The study’s results indicate that technology can facilitate Mapuche language learning and revitalization.
The use of computing technologies in less commonly taught language (LCTL) and endangered language (EL) learning is different from mainstream computer-assisted language learning (CALL), where several languages, most noticeably English, dominate the literature. Many most commonly taught language (MCTL) learners learn a language for a variety of reasons, including potential benefit to their career or because it is compulsory in school. In the case of LCTLs and ELs, there may be different motivating factors, including cultural, heritage, and language preservation reasons (Dörnyei & Schmidt, 2001). As the motivation and learning goals of LCTL and EL learners are often different to those of MCTL learners, it is reasonable to use different evaluation approaches. This paper looks at the role of qualitative research for Finnish, Runyakitara, Ojibwe, and Ndjébbana and reflects on how it can be useful for understanding CALL outcomes for other LCTLs and ELs.