Francis Bond

Nanyang Technological University | ntu · Division of Linguistics and Multilingual Studies

PhD

About

178
Publications
29,687
Reads
2,760
Citations
Citations since 2016
26 Research Items
1108 Citations
Additional affiliations
June 2009 - present
Nanyang Technological University
Position
  • Professor (Associate)
April 1991 - December 2006
NTT Communication Science Laboratories
Position
  • Senior Researcher


Publications (178)
Conference Paper
The Princeton WordNet for the English language has been used worldwide in NLP projects for many years. With the OMW initiative, wordnets for different languages of the world are being linked via identifiers. The parallel development and linking allows new multilingual application perspectives. The development of a wordnet for the German language is...
Conference Paper
Full-text available
In this paper we compare Oxford Lexico and Merriam Webster dictionaries with Princeton WordNet with respect to the description of semantic (dis)similarity between polysemous and homonymous senses that could be inferred from them. WordNet lacks any explicit description of polysemy or homonymy, but as a network of linked senses it may be used to comp...
Conference Paper
Full-text available
The Princeton WordNet, while one of the most widely used resources for NLP, has not been updated for a long time, and as such a new project English WordNet has arisen to continue the development of the model under an open-source paradigm. In this paper, we detail the second release of this resource entitled "English WordNet 2020". The work has focu...
Article
Full-text available
Though the interest in use of wordnets for lexicography is (gradually) growing, no research has been conducted so far on equivalence between lexical units (or senses) in inter-linked wordnets. In this paper, we present and validate a procedure of sense-linking between plWordNet and Princeton WordNet. The proposed procedure employs a continuum of th...
Conference Paper
Full-text available
According to George K. Zipf, more frequent words have more senses. We have tested this law using corpora and wordnets of English, Spanish, Portuguese, French, Polish, Japanese, Indonesian and Chinese. We have proved that the law works pretty well for all of these languages if we take - as Zipf did - mean values of meaning count and averaged ranks....
Article
Full-text available
Lexical platform – the first step towards user-centred integration of lexical resources. The paper describes the Lexical Platform - a means for lightweight integration of independent lexical resources. Lexical resources (LRs) are represented as web components th...
Article
Full-text available
This study aims to analyze and develop a detailed model of syntax and semantics of passive sentences in standard Indonesian in the framework of Head-Driven Phrase Structure Grammar (HPSG) (Pollard & Sag, 1994; Sag et al., 2003) and Minimal Recursion Semantics (MRS) (Copestake et al., 2005), explicit enough to be interpreted by a computer, focusing...
Conference Paper
We aim to support digital humanities work related to the study of sacred texts. To do this, we propose to build a cross-lingual wordnet within the domain of theology. We target the Collaborative Interlingual Index (CILI) directly instead of each individual wordnet. The paper presents background for this proposal: (1) an overview of concepts relevan...
Conference Paper
We present a database of epigraphs collected with the goal of revealing literary influence as a set of connections between authors over time. We have collected epigraphs from over 12,000 literary works and are in the process of identifying their provenance. The database is released under an open license.
Article
Full-text available
A semantic-based search engine for clinical data would be a substantial aid for hospitals to provide support for clinical practitioners. Since electronic medical records of patients contain a variety of information, there is a need to extract meaningful patterns from the Patient Medical Records (PMR). The proposed work matches patients to relevant...
Research
Full-text available
A semantic-based search engine for clinical data for hospitals to provide support for clinical practitioners.
Conference Paper
In this paper, we combine methods to estimate sense rankings from raw text with recent work on word embeddings to provide sense ranking estimates for the entries in the Open Multilingual Wordnet (OMW). The existing Word2Vec Polyglot2 pre-trained models are only built for single word entries; we therefore re-train them with multiword expressions fr...
Conference Paper
Full-text available
The paper presents a feature-based model of equivalence targeted at (manual) sense linking between Princeton WordNet and plWordNet. The model incorporates insights from lexicographic and translation theories on bilingual equivalence and draws on the results of earlier synset level mapping of nouns between Princeton WordNet and plWordNet. It takes i...
Article
Full-text available
This paper explores inter-lingual equivalence from the perspective of linking two large lexicosemantic databases, namely the Princeton WordNet of English and the plWordNet (pl. Słowosieć) of Polish. Wordnets are built as networks of lexico-semantic relations between words and their meanings, and constitute a type of monolingual dictionary cum thesa...
Article
The paper focuses on the issue of creating equivalence links in the domain of bilingual computational lexicography. The existing interlingual links between plWordNet and Princeton WordNet synsets (sets of synonymous lexical units – lemma and sense pairs) are re-analysed from the perspective of equivalence types as defined in traditional lexicograph...
Book
This book constitutes the proceedings of the First International Conference on Language, Data and Knowledge, LDK 2017, held in Galway, Ireland, in June 2017. The 14 full papers and 19 short papers included in this volume were carefully reviewed and selected from 68 initial submissions. They deal with language data; knowledge graphs; applications in...
Article
Full-text available
We want to show how basic copula clauses in Indonesian can be dealt with within the framework of Head Driven Phrase Structure Grammar (HPSG) (Pollard & Sag, 1994). We analyzed three types of basic copula clauses in Indonesian: copula clauses with noun phrase complements (NP) expressing the notions of 'proper inclusion' and 'equation', adjective phr...
Article
Full-text available
In this paper, Chinese curricula in the contexts of China and Singapore at the primary level are compared and contrasted by both quantitative (Word Segmenter and Text Analyzer) and qualitative methods (in-depth thematic analysis). The research shows challenges for educational administrators, teachers and other professional staff in Chinese education on...
Book
This book describes the fundamentals of Jacy, an implementation of a Japanese head‐driven phrase structure grammar (HPSG) with many useful linguistic implications. Jacy presents sound information about the Japanese language (syntax, semantics, and pragmatics) based on implementation and tested on large quantities of data. As the grammar development...
Conference Paper
Full-text available
The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Link...
Conference Paper
Full-text available
This paper describes our attempts to add Indonesian definitions to synsets in the Wordnet Bahasa (Nurril Hirfana Mohamed Noor et al., 2011; Bond et al., 2014), to extract semantic relations between lemmas and definitions for nouns and verbs, such as synonym, hyponym, hypernym and instance hypernym, and to generally improve Wordnet. The original, so...
Article
Full-text available
This paper describes some of our attempts in extending Zhong, a Chinese HPSG shared-grammar. New analyses for two Chinese specific phenomena, reduplication and the SUO-DE structure, are introduced. The analysis of reduplication uses lexical rules to capture both the syntactic and semantic properties (amplification in adjectives and diminishing in v...
Article
Full-text available
The A-NOT-A structure is one way to express polar questions in Mandarin Chinese. The present study provides a constraint-based analysis of A-NOT-A questions in Mandarin Chinese within the framework of HPSG (Pollard & Sag, 1994) and MRS (Copestake et al., 2005). We propose two possible approaches to analysing the A-NOT-A structure — a morphological/...
Article
Full-text available
This paper describes an analysis for possessive idioms in English (e.g. 'I twiddle my thumbs' "I am idle"). The analysis relies on matching at the semantic level, to allow for syntactic variation. It has been implemented in the English Resource Grammar, and tested by parsing a subset of the British National Corpus. In addition to the syntactic an...
Conference Paper
Full-text available
Semantic annotated parallel corpora, though rare, play an increasingly important role in natural language processing. These corpora provide valuable data for computational tasks like sense-based machine translation and word sense disambiguation, but also to contrastive linguistics and translation studies. In this paper we present the ongoing develo...
Article
Full-text available
Semantic annotated parallel corpora, though rare, play an increasingly important role in natural language processing. These corpora provide valuable data for computational tasks like sense-based machine translation and word sense disambiguation, but also to contrastive linguistics and translation studies. In this paper we present the ongoing develo...
Article
Full-text available
Wordnets play a central role in many natural language processing tasks. This paper introduces a multilingual editing system for the Open Multilingual Wordnet (OMW: Bond and Foster, 2013). Wordnet development, like most lexicographic tasks, is slow and expensive. Moving away from the original Princeton Wordnet (Fellbaum, 1998) development workflow,...
Conference Paper
This paper introduces our attempts to model the Chinese language using HPSG and MRS. Chinese refers to a family of various languages including Mandarin Chinese, Cantonese, Min, etc. These languages share a large amount of structure, though they may differ in orthography, lexicon, and syntax. To model these, we are building a family of grammars: ZHO...
Conference Paper
Full-text available
This paper presents the creation and the initial stage development of a broad-coverage Indonesian Resource Grammar (INDRA) within the framework of Head Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994) and Minimal Recursion Semantics (MRS) (Copestake et al., 2005). At the present stage, INDRA focuses on verbal constructions and subcateg...
Conference Paper
Full-text available
There are two primary approaches to the use of a bilingual dictionary in statistical machine translation: (i) the passive approach of appending the parallel training data with a bilingual dictionary and (ii) the pervasive approach of enforcing translation as per the dictionary entries when decoding. Previous studies have shown that both approaches provi...
Conference Paper
The A-NOT-A structure is one way to express polar questions in Mandarin Chinese. The present study provides a constraint-based analysis of A-NOT-A questions in Mandarin Chinese within the framework of HPSG (Pollard & Sag, 1994) and MRS (Copestake et al., 2005). We propose two possible approaches to analysing the A-NOT-A structure — a morphological/...
Article
Full-text available
This paper outlines the creation of an open combined semantic lexicon as a resource for the study of lexical semantics in the Malay languages (Malaysian and Indonesian). It is created by combining three earlier wordnets, each built using different resources and approaches: the Malay Wordnet (Lim & Hussein 2006), the Indonesian Wordnet (Riza, Budion...
Chapter
We discuss the development of a multilingual lexicon linked to the Suggested Upper Merged Ontology (SUMO) formal ontology. The ontology as well as the lexicon have been expressed in Web Ontology Language (OWL), as well as their original formats, for use on the semantic web and in linked data. We describe the Open Multilingual Wordnet (OMW), a multi...
Chapter
Full-text available
Optimally, a translated text should preserve information while maintaining the writing style of the original. When this is not possible, as is often the case with figurative speech, a common practice is to simplify and make explicit the implications. However, in our investigations of translations from English to another language, English-to-Chinese...
Chapter
Full-text available
We discuss some of the issues in producing sense-tagged parallel corpora: including pre-processing, adding new entries and linking. We have preliminary results for three genres: stories, essays and tourism web pages, in both Chinese and English.
Article
In this paper, we investigate which features are useful for ranking semantic representations of text. We show that two methods of generalization improved results: extended grand-parenting and super-types. The models are tested on a subset of SemCor that has been annotated with both Dependency Minimal Recursion Semantic representations and WordNet s...
Article
We have created an open-source mapping between the SIL's semantic domains (used for rapid lexicon building and organization for under-resourced languages) and WordNet, the standard resource for lexical semantics in natural language processing. We show that the resources complement each other, and suggest ways in which the mapping can be improved ev...
Article
This paper surveys the current state of wordnet sense-annotated corpora. We look at corpora in any language, and describe them in terms of accessibility and usefulness. We finally discuss possibilities for increasing the interoperability of the corpora, especially across languages.
Article
Full-text available
The Optional Omission of Past Tense (OPT) is prevalent in the colloquial register of Singapore English (SCE). This paper describes the investigation of the OPT phenomenon based on time annotated corpora. The Singapore and Hong Kong version of the International Corpus of English were extended with time annotation for this study. In Singapore English...
Chapter
Full-text available
Princeton WordNet (PWN) is one of the most influential resources for semantic descriptions, and is extensively used in natural language processing. Based on PWN, three Chinese wordnets have been developed: Sinica Bilingual Ontological Wordnet (BOW), Southeast University WordNet (SEW), and Taiwan University WordNet (CWN). We used SEW to sense-tag a...
Chapter
Full-text available
Semantically annotated corpora play an important role in natural language processing. This paper presents the results of a pilot study on building a sense-tagged parallel corpus, part of ongoing construction of aligned corpora for four languages (English, Chinese, Japanese, and Indonesian) in four domains (story, essay, news, and tourism) from the...
Conference Paper
We create an open multilingual wordnet with large wordnets for over 26 languages and smaller ones for 57 languages. It is made by combining wordnets with open licences, data from Wiktionary and the Unicode Common Locale Data Repository. Overall there are over 2 million senses for over 100 thousand concepts, linking over 1.4 million words in hundred...
Article
Full-text available
The noun compound – a sequence of nouns which functions as a single noun – is very common in English texts. No language processing system should ignore expressions like steel soup pot cover if it wants to be serious about such high-end applications of computational linguistics as question answering, information extraction, text summarization, machi...
Conference Paper
This paper presents two procedures for extracting transfer rules from parallel corpora for use in a rule-based Japanese-English MT system. First a "shallow" method where the parallel corpus is lemmatized before it is aligned by a phrase aligner, and then a "deep" method where the parallel corpus is parsed by deep parsers before the resulting predic...
Conference Paper
Full-text available
This paper presents an approach to improving performance of statistical machine translation by automatically creating new training data for difficult to translate phenomena. In particular this contribution is targeted towards tackling the poor performance of a state-of-the-art system on negated sentences. The corpus expansion is achieved by high qu...
Conference Paper
Full-text available
We present a system for cross-lingual parse disambiguation, exploiting the assumption that the meaning of a sentence remains unchanged during translation and the fact that different languages have different ambiguities. We simultaneously reduce ambiguity in multiple languages in a fully automatic way. Evaluation shows that the system reliably disca...
Article
Full-text available
Most nouns must be modified by a numeral-classifier combination when quantified in classifier languages like Chinese and Japanese. In this paper, we present a method to generate numeral classifiers using Chinese and Japanese WordNets. We assign synsets from WordNet to each classifier by hand and use a modified algorithm to generate sortal classifi...
Article
Full-text available
This paper surveys currently available wordnets. We measure the effect that license choice has on their usage, measured by the number of citations. Finally, we discuss methods to make wordnets more generally accessible, starting with a shared online server for freely distributable wordnets.
Conference Paper
This paper presents a procedure for extracting transfer rules for multiword expressions from parallel corpora for use in a rule based Japanese-English MT system. We show that adding the multi-word rules improves translation quality and sketch ideas for learning more such rules.
Article
Full-text available
This paper summarizes ongoing efforts to provide software infrastructure (and methodology) for open-source machine translation that combines a deep semantic transfer approach with advanced stochastic models. The resulting infrastructure combines precise grammars for parsing and generation, a semantic-transfer based translation engine and stochastic...
Article
An update summary should provide a fluent summarization of new information on a time-evolving topic, assuming that the reader has already reviewed older documents or summaries. In 2007 and 2008, an annual summarization evaluation included an update summarization ...
Article
Full-text available
KYOTO is an Asian-European project developing a community platform for modeling knowledge and finding facts across languages and cultures. The platform operates as a Wiki system that multilingual and multi-cultural communities can use to agree on the meaning of terms in specific domains. The Wiki is fed with terms that are automatically extracted f...
Article
The NTU-MC compilation taps into the linguistic diversity of multilingual texts available within Singapore. The current version of NTU-MC contains 375,000 words (15,000 sentences) in 6 languages (English, Chinese, Japanese, Korean, Indonesian and Vietnamese) from 6 language families (Indo-European, Sino-Tibetan, Japonic, Korean as a language isolate,...
Conference Paper
Form only given. In this talk, the speaker will measure the reduction in ambiguity that can be gained by using translated text to constrain meanings. Instead of using the translation itself to determine senses, they use a shared hierarchy of word senses: WordNet. Experiments with aligned Chinese, English and Japanese text show a substantial reducti...
Conference Paper
We released Japanese WordNet Version 1.0 in March 2010, and are continuing to enrich the Japanese WordNet in several directions. The current version of the Japanese WordNet is a kind of translation of Princeton WordNet 3.0 and we used WordNets of multiple languages in order to disambiguate Japanese translations. Although the structure is based on P...
Article
Full-text available
This paper reconsiders the task of MRD-based word sense disambiguation, in extending the basic Lesk algorithm to investigate the impact on WSD performance of different tokenisation schemes and methods of definition extension. In experimentation over the Hinoki Sensebank and the Japanese Senseval-2 dictionary task, we demonstrate that sense-sensitiv...
Article
Full-text available
In this article, we investigate the use of semantic information in parse selection. We show that fully disambiguated sense-based semantic features smoothed using ontological information are effective for parse selection. Training and testing was undertaken using definition and example sentences taken from a Japanese dictionary corpus (Hinoki), whic...
Article
Full-text available
This paper describes the compilation of hypernym hierarchies from the Japanese Wikipedia (Sumida et al., 2008). It then compares the Wikipedia-derived hypernyms and the lemmas from the Japanese WordNet (Bond et al., 2008; Bond et al., 2009) by determining how many matches there are at which levels. The results show that the two data sources contain...
Article
Full-text available
Large amounts of data are essential for training statistical machine translation systems. In this paper we show how training data can be expanded by paraphrasing one side of a parallel corpus. The new data is made by parsing then generating using an open-source, precise HPSG-based grammar. This gives sentences with the same meaning, but with minor...
Article
Full-text available
The Japanese WordNet currently has 51,000 synsets with Japanese entries. In this paper, we discuss three methods of extending it: increasing the cover, linking it to examples in corpora and linking it to other resources (SUMO and GoiTaikei). In addition, we outline our plans to make it more useful by adding Japanese definition sentences to each sy...
Article
Full-text available
We present a method for combining two bilingual lexicons to make a third, using one language as a pivot. In this case we combine a Japanese-English lexicon with a Malay-English lexicon, to produce a Japanese-Malay lexicon suitable for use in a machine translation system. Our method differs from previous methods in its use of semantic classes to rank...