Robert Dale

Language Technology Group

Doctor of Philosophy

About

243 Publications
69,623 Reads
7,022 Citations
Introduction
I work as an independent consultant, providing expert and unbiased advice in the selection, development and deployment of natural language processing technologies. I write 'Industry Watch', a semi-regular column in the Journal of Natural Language Engineering that explores what's happening in the commercial NLP world. I also produce a 'This Week in NLP' newsletter, which you can sign up for at https://www.language-technology.com/blog.

Publications (243)
Article
Full-text available
Funding for AI start-ups in general is booming, and natural language processing as a subfield has not missed out. We take a closer look at early-stage funding over the last year—just over US$1B in total—for companies that offer solutions that are based on or make significant use of NLP, providing a picture of what funders think is innovative and ba...
Article
Full-text available
Automated writing assistance – a category that encompasses a variety of computer-based tools that help with writing – has been around in one form or another for 60 years, although it’s always been a relatively minor part of the NLP landscape. But the category has been given a substantial boost from recent advances in deep learning. We review some h...
Article
Full-text available
GPT-3 made the mainstream media headlines this year, generating far more interest than we’d normally expect of a technical advance in NLP. People are fascinated by its ability to produce apparently novel text that reads as if it was written by a human. But what kind of practical applications can we expect to see, and can they be trusted?
Article
Full-text available
It took a while, but natural language generation is now an established commercial software category. It’s commented upon frequently in both industry media and the mainstream press, and businesses are willing to pay hard cash to take advantage of the technology. We look at who’s active in the space, the nature of the technology that’s available toda...
Article
Full-text available
The end of the calendar year always seems like a good time to pause for breath and reflect on what’s been happening over the last 12 months, and that’s as true in the world of commercial NLP as it is in any other domain. In particular, 2019 has been a busy year for voice assistance, thanks to the focus placed on this area by all the major technolog...
Article
Full-text available
It’s now remarkably easy to release to the world a cloud-based application programming interface (API) that provides some software function as a service. As a consequence, the cloud API space has become very densely populated, so that even if a particular API offers a service whose potential value is considerable, there are many other factors that...
Article
Full-text available
The Journal of Natural Language Engineering is now in its 25th year. The editorial preface to the first issue emphasised that the focus of the journal was to be on the practical application of natural language processing (NLP) technologies: the time was ripe for a serious publication that helped encourage research ideas to find their way into real...
Article
Full-text available
The law has language at its heart, so it’s not surprising that software that operates on natural language has played a role in some areas of the legal profession for a long time. But the last few years have seen an increased interest in applying modern techniques to a wider range of problems, so I look here at how natural language processing is bei...
Article
Full-text available
It seems like there’s yet another cloud-based text analytics Application Programming Interface (API) on the market every few weeks. If you’re interested in building an application using these kinds of services, how do you decide which API to go for? In the previous Industry Watch post, we looked at the text analytics APIs from the behemoths in the...
Article
Full-text available
If you’re in the market for an off-the-shelf text analytics API, you have a lot of options. You can choose to go with a major player in the software world, for whom each AI-related service is just another entry in their vast catalogues of tools, or you can go for a smaller provider that focusses on text analytics as their core business. In this fir...
Article
Full-text available
Vastly improved speech recognition, backed by a more slowly improving ability to make sense of the recognized speech, has brought state-of-the-art NLP into our homes in the form of smart speakers and other devices that listen. There’s no doubt these devices can be incredibly useful, but they may also support incursions into our privacy. We loo...
Article
Full-text available
The commercialisation of natural language processing began over 35 years ago, but it’s only in the last year or two that it’s become substantially more visible, largely because of the intense popular interest in artificial intelligence. So what’s the state of commercial NLP today? We survey the main industry categories of relevance, and offer comme...
Article
Full-text available
We live in a post-truth world. It now matters more whether people think something is true than whether something really is true. This is dangerous, and technology is at least partly to blame. So, as technologists, how can we help to fix this?
Article
Full-text available
By all accounts, 2016 is the year of the chatbot. Some commentators take the view that chatbot technology will be so disruptive that it will eliminate the need for websites and apps. But chatbots have a long history. So what's new, and what's different this time? And is there an opportunity here to improve how our industry does technology transfer?
Article
Full-text available
Ten years ago, Microsoft Word's grammar checker was really the only game in town. The software world, and the world of natural language processing, have changed a lot in that time, so what does the grammar checker marketplace have to offer today?
Article
Full-text available
Machine Translation research suffered a major blow in the 1960s, but it came back with a vengeance. From a commercial point of view, it’s now a mature technology that many Internet users take for granted. We look at where we are now, and consider the scope for new entrants into the market.
Article
Full-text available
With NLP services now widely available via cloud APIs, tasks like named entity recognition and sentiment analysis are virtually commodities. We look at what's on offer, and make some suggestions for how to get rich.
Article
Full-text available
In almost every science fiction movie you’ll see people conversing with machines. Of course, the rise of intelligent personal assistants means you probably do this yourself already. This posting asks: what’s the difference? Also, recent news on Facebook acquisitions, spoken language translation, and sentiment analysis.
Conference Paper
In this paper we present a previously unexplored approach to recognizing the textual extent of temporal expressions. Based on the observation that temporal expressions are syntactic constituents, we use functional dependency relations between tokens in a sentence to determine which words in addition to a trigger word belong to the extent of the exp...
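A minimal sketch of the general idea, under the assumption that the extent is simply the trigger word plus every token in its dependency subtree (the paper's actual rules are not given in the truncated abstract). The function name `temporal_extent`, the head-index encoding, and the example parse are hypothetical and hand-assigned, not parser output.

```python
from typing import Dict, List, Set


def temporal_extent(heads: Dict[int, int], trigger: int) -> List[int]:
    """Return the sorted token indices of the dependency subtree rooted at `trigger`.

    `heads` maps each token index to the index of its syntactic head; the
    sentence root maps to -1. Assumption: the extent of the temporal
    expression is the trigger plus all of its direct and indirect dependents.
    """
    # Invert the head map so we can walk from a head down to its dependents.
    children: Dict[int, List[int]] = {}
    for token, head in heads.items():
        children.setdefault(head, []).append(token)

    # Depth-first traversal collecting every token below the trigger.
    extent: Set[int] = set()
    stack = [trigger]
    while stack:
        token = stack.pop()
        if token in extent:
            continue
        extent.add(token)
        stack.extend(children.get(token, []))
    return sorted(extent)


# "We met last Monday morning": hypothetical, hand-assigned heads.
tokens = ["We", "met", "last", "Monday", "morning"]
heads = {0: 1, 1: -1, 2: 3, 3: 4, 4: 1}
print([tokens[i] for i in temporal_extent(heads, trigger=4)])
```

Here the trigger "morning" pulls in "Monday" and "last" because they sit below it in the tree, while "We" and "met" are excluded. A real system would also need to prune dependents (such as prepositions) that annotation guidelines leave outside the extent.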
Article
Human speakers generally find it easy to refer to entities in such a way that their hearers can determine who or what is being talked about. In an attempt to model this behaviour, researchers in computational linguistics have explored the development of algorithms that operate in a deliberate manner, choosing attributes of an intended referent on t...
Article
Full-text available
As one of the most well-defined subtasks in Natural Language Generation (NLG), the generation of referring expressions looks like a strong candidate for piloting shared evaluation tasks. Different to other areas of Natural Language Processing, it is still unclear what benefit the introduction of such tasks might have for the field of NLG. Based on...
Conference Paper
Incorrect usage of prepositions and determiners accounts for the most common types of errors made by non-native speakers of English. It is not surprising, then, that there has been a significant amount of work directed towards the automated detection and correction of such errors. However, to date, the use of different data sets and different task de...
Conference Paper
Full-text available
Using the example of Murrinh-Patha, Seiss (2011) illustrates how Australian Aboriginal languages can shed light on the morphology-syntax interface: one aspect of their polysynthetic nature is that information often encoded in phrases and clauses in other languages is instead found in a single morphological word. In this paper, we look at another...
Article
The dissemination of knowledge derived from research and scholarship has a fundamental impact on the ways in which society develops and progresses, and at the same time it feeds back to improve subsequent research and scholarship. Here, as in so many other areas of human activity, the internet is changing the way things work; two decades of emergen...
Conference Paper
Generation Challenges 2011 (GenChal'11) was the fifth round of shared-task evaluation competitions (STECs) involving the generation of natural language. It followed four previous events: the Pilot Attribute Selection for Generating Referring Expressions (ASGRE) Challenge in 2007 which had its results meeting at UCNLG+MT in Copenhagen, Denmark; Refe...
Conference Paper
Full-text available
Hand-crafted approaches to content determination are expensive to port to new domains. Machine-learned approaches, on the other hand, tend to be limited to relatively simple selection of items from data sets. We observe that in time series domains, textual descriptions often aggregate a series of events into a compact description. We present a simp...
Article
Full-text available
Recent years have seen a trend towards empirically motivated and more data-driven approaches in the field of referring expression generation (REG). Much of this work has focussed on initial reference to objects in visual scenes. While this scenario of use is one of the strongest contenders for real-world applications of referring expression gen...
Conference Paper
Full-text available
Semantic information retrieval requires that we have a means of capturing the semantics of documents; and a potentially useful feature of the semantics of many documents is the temporal information they contain. In particular, the temporal expressions contained in documents provide important information about the time course of the events those doc...
Conference Paper
Full-text available
Traditional computational approaches to referring expression generation operate in a deliberate manner, choosing the attributes to be included on the basis of their ability to distinguish the intended referent from its distractors. However, work in psycholinguistics suggests that speakers align their referring expressions with those used previously...
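The deliberate, distractor-driven style of attribute selection that this abstract contrasts with alignment can be sketched as follows. This is a simplified variant loosely modelled on the Incremental Algorithm of Dale and Reiter (1995); unlike that algorithm, it does not force inclusion of the type attribute. The function `select_attributes`, the preference order and the example entities are hypothetical.

```python
from typing import Dict, List, Optional

Entity = Dict[str, str]


def select_attributes(referent: Entity, distractors: List[Entity],
                      preferred: List[str]) -> Optional[Dict[str, str]]:
    """Walk through attributes in preference order, keeping each one that
    rules out at least one remaining distractor. Return None if the given
    attributes cannot distinguish the referent from every distractor."""
    description: Dict[str, str] = {}
    remaining = list(distractors)
    for attr in preferred:
        if not remaining:
            break  # the referent is already uniquely identified
        value = referent.get(attr)
        if value is None:
            continue
        survivors = [d for d in remaining if d.get(attr) == value]
        if len(survivors) < len(remaining):  # attribute excludes someone
            description[attr] = value
            remaining = survivors
    return description if not remaining else None


target = {"type": "cup", "colour": "red", "size": "small"}
others = [{"type": "cup", "colour": "blue", "size": "small"},
          {"type": "bowl", "colour": "red", "size": "large"}]
print(select_attributes(target, others, ["type", "colour", "size"]))
```

The selection never backtracks: once an attribute is added it stays, even if a later attribute would have sufficed on its own. That greediness is one of the properties that makes such algorithms cheap, and it is also what corpus-based and alignment-based work has examined critically.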
Conference Paper
Full-text available
The Big Australian Speech Corpus project incorporates the strategic goals of 30 Chief Investigators from various speech science areas. Speech from 1000 geographically and socially diverse speakers is being recorded using a uniform and automated protocol plus standardized hardware and software to produce a widely applicable and extensible database -...
Conference Paper
Full-text available
In a collocation, the choice of one lexical item depends on the choice made for another. This poses a problem for simple approaches to lexicalisation in natural language generation systems. In the Meaning-Text framework, recurrent patterns of collocations have been characterised by lexical functions, which offer an elegant way of describing the...
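To make the notion of a lexical function concrete, here is a toy sketch in which each function maps a keyword to its collocate: Magn (an intensifier) gives Magn(rain) = heavy, and Oper1 (a support verb) gives Oper1(attention) = pay. The lexicon, the helper names `apply_lf` and `realise`, and the naive word ordering are all hypothetical illustrations, not the representation used in the paper.

```python
from typing import Dict, Tuple

# Hypothetical mini-lexicon: (lexical function, keyword) -> collocate.
# Magn = intensifier; Oper1 = support verb taking the first actant as subject.
LEXICON: Dict[Tuple[str, str], str] = {
    ("Magn", "rain"): "heavy",
    ("Magn", "tea"): "strong",
    ("Magn", "smoker"): "heavy",
    ("Oper1", "attention"): "pay",
}


def apply_lf(lf: str, keyword: str) -> str:
    """Return the collocate that the lexicon records for lf(keyword)."""
    if (lf, keyword) not in LEXICON:
        raise KeyError(f"no value recorded for {lf}({keyword})")
    return LEXICON[(lf, keyword)]


def realise(lf: str, keyword: str) -> str:
    """Naive realisation: place the collocate directly before the keyword.
    This ordering happens to suit the entries above; a real generator would
    need syntactic information about where each collocate goes."""
    return f"{apply_lf(lf, keyword)} {keyword}"


print(realise("Magn", "rain"))
print(realise("Magn", "tea"))
print(realise("Oper1", "attention"))
```

The point of the table is that "heavy" and "strong" both realise the same abstract meaning (Magn), so a generator can reason about intensification once and leave the keyword-specific choice to the lexicon.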
Article
Full-text available
The aim of the Helping Our Own (HOO) Shared Task is to promote the development of automated tools and techniques that can assist authors in the writing task, with a specific focus on writing within the natural language processing community. This paper reports on the results of a pilot run of the shared task, in which six teams participated. We des...
Article
Full-text available
Traditional approaches to referring expression generation (REG) have taken as a fundamental requirement the need to distinguish the intended referent from other entities in the context. It seems obvious that this should be a necessary condition for successful reference; but we suggest that a number of recent investigations cast doubt on the signifi...
Conference Paper
Full-text available
We describe the second installment of the Challenge on Generating Instructions in Virtual Environments (GIVE-2), a shared task for the NLG community which took place in 2009–10. We evaluated seven NLG systems by connecting them to 1825 users over the Internet, and report the results of this evaluation in terms of objective and subjective measures.
Conference Paper
Full-text available
A central purpose of referring expressions is to distinguish intended referents from other entities that are in the context; but how is this context determined? This paper draws a distinction between discourse context (other entities that have been mentioned in the dialogue) and visual context (visually available objects near the intended referent)...
Conference Paper
Full-text available
Automatically finding email messages that contain requests for action can provide valuable assistance to users who otherwise struggle to give appropriate attention to the actionable tasks in their inbox. As a speech act classification task, however, automatically recognising requests in free text is particularly challenging. The problem is compound...
Conference Paper
Full-text available
In this paper, we propose a new shared task called HOO: Helping Our Own. The aim is to use tools and techniques developed in computational linguistics to help people writing about computational linguistics. We describe a text-to-text generation scenario that poses challenging research questions, and delivers practical outcomes that are useful in th...
Conference Paper
Full-text available
The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data sets that support the development of techniques for event time-stamping and tracking the progression of time through a narrative. In this paper, we present a new corpus of...
Conference Paper
Full-text available
Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, our particular focus here is the identification and correction of speech repairs using a noisy chann...
Chapter
Full-text available
This paper describes the First Challenge on Generating Instructions in Virtual Environments (GIVE-1). GIVE is a shared task for generation systems which give real-time natural-language instructions to users in a virtual 3D world. These systems are evaluated by connecting users and NLG systems over the Internet. We describe the design and results of...
Conference Paper
Full-text available
Different representational systems permit differing degrees and forms of ambiguity and underspecification in the content they represent. Independently of this observation, a notable feature of natural language as a representational system is that it allows the same content to be expressed in different ways. In this paper, we examine the interaction...
Article
Practitioners and researchers need to stay up-to-date with the latest advances in their fields, but the continual growth in the amount of literature available makes this task increasingly difficult. In this article, we describe the Citation-Sensitive In-Browser Summariser (CSIBS), a new research tool to help manage the literature browsing task. The...
Conference Paper
Full-text available
In this chapter, we take the view that much of the existing work on the generation of referring expressions has focused on aspects of the problem that appear to be somewhat artificial when we look more closely at human-produced referring expressions. In particular, we argue that an over-emphasis on the extent to which each property in a description...
Conference Paper
In abstractive summarisation, summaries can include novel sentences that are generated automatically. In order to improve the grammaticality of the generated sentences, we model a global (sentence) level syntactic structure. We couch statistical sentence generation as a spanning tree problem in order to search for the best dependency tree spanning...
Conference Paper
Full-text available
The GIVE Challenge is a new Internet-based evaluation effort for natural language generation systems. In this paper, we motivate and describe the software infrastructure that we developed to support this challenge.
Article
Full-text available
Unrehearsed spoken language often contains many disfluencies. If we want to correctly interpret the content of spoken language, we need to be able to detect these disfluencies and deal with them appropriately. In the work described here, we use a statistical noisy channel model to detect disfluencies in transcripts of spoken language. Like all sta...
Conference Paper
Full-text available
Under an ARC Linkage Infrastructure, Equipment and Facilities (LIEF) grant, speech science and technology experts from across Australia have joined forces to organise the recording of audio-visual (AV) speech data from representative speakers of Australian English in all capital cities and some regional centres. The Big Australian Speech Corpus (th...
Article
Full-text available
As the complexity and sophistication of document processing tools increases, we can expect to see techniques that go beyond the syntactic and semantic features of documents to consider the more nuanced, context-sensitive aspects of language use that generally fall within the realm of pragmatics. The development of such techniques requires data that...
Conference Paper
Full-text available
In this paper we present the DANTE system, a tagger for temporal expressions in English documents. DANTE performs both recognition and normalization of these expressions in accordance with the TIMEX2 annotation standard. The system is built on modular principles, with a clear separation between the recognition and normalisation components. The inte...
Article
Full-text available
In this paper, we explore a corpus of human-produced referring expressions to see to what extent we can learn the referential behaviour the corpus represents. Despite a wide variation in the way subjects refer across a set of ten stimuli, we demonstrate that component elements of the referring expression generation process appear to generalise...
Article
Full-text available
We describe the first installment of the Challenge on Generating Instructions in Virtual Environments (GIVE), a new shared task for the NLG community. We motivate the design of the challenge, describe how we carried it out, and discuss the results of the system evaluation.
Conference Paper
Full-text available
Modern digital libraries offer all the hyperlinking possibilities of the World Wide Web: when a reader finds a citation of interest, in many cases she can now click on a link to be taken to the cited work. This paper presents work aimed at providing the same ease of navigation for legacy PDF document collections that were created before the possibility...
Conference Paper
Full-text available
The GIVE Challenge is a recent shared task in which NLG systems are evaluated over the Internet. In this paper, we validate this novel NLG evaluation methodology by comparing the Internet-based results with results we collected in a lab experiment. We find that the results delivered by both methods are consistent, but the Internet-based approach o...
Conference Paper
Full-text available
The amount of scientific material available electronically is forever increasing. This makes reading the published literature, whether to stay up-to-date on a topic or to get up to speed on a new topic, a difficult task. Yet, this is an activity in which all researchers must be engaged on a regular basis. Based on a user requirements analysis, we deve...
Conference Paper
Full-text available
Abstract-like text summarisation requires a means of producing novel summary sentences. In order to improve the grammaticality of the generated sentence, we model a global (sentence) level syntactic structure. We couch statistical sentence generation as a spanning tree problem in order to search for the best dependency tree spanning a set...
Conference Paper
Full-text available
In the early days of email, widely-used conventions for indicating quoted reply content and email signatures made it easy to segment email messages into their functional parts. Today, the explosion of different email formats and styles, coupled with the ad hoc ways in which people vary the structure and layout of their messages, means that simp...
Conference Paper
Full-text available
In this paper, we present a study of a large corpus of student logic exercises in which we explore the relationship between two distinct measures of difficulty: the proportion of students whose initial attempt at a given natural language to first-order logic translation is incorrect, and the average number of attempts that are required in order to...
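The two difficulty measures named here can be computed directly from attempt logs. The sketch below assumes a hypothetical record layout of (student, exercise, attempt number, correct) and excludes students who never succeed from the average; neither detail is taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One record per attempt: (student_id, exercise_id, attempt_number, correct).
Attempt = Tuple[str, str, int, bool]


def difficulty_measures(attempts: List[Attempt]) -> Dict[str, Tuple[float, float]]:
    """Per exercise, return (share of students whose first attempt was wrong,
    mean number of attempts needed to reach a correct answer).

    Assumption: students who never succeed are excluded from the mean.
    """
    # Group each student's attempts at each exercise.
    tries: Dict[Tuple[str, str], List[Tuple[int, bool]]] = defaultdict(list)
    for student, exercise, number, correct in attempts:
        tries[(student, exercise)].append((number, correct))

    first_wrong: Dict[str, List[bool]] = defaultdict(list)
    needed: Dict[str, List[int]] = defaultdict(list)
    for (_, exercise), history in tries.items():
        history.sort()  # chronological order by attempt number
        first_wrong[exercise].append(not history[0][1])
        for position, (_, correct) in enumerate(history, start=1):
            if correct:
                needed[exercise].append(position)
                break

    result: Dict[str, Tuple[float, float]] = {}
    for exercise, flags in first_wrong.items():
        counts = needed[exercise]
        mean = sum(counts) / len(counts) if counts else float("nan")
        result[exercise] = (sum(flags) / len(flags), mean)
    return result


log = [("s1", "e1", 1, False), ("s1", "e1", 2, True), ("s2", "e1", 1, True)]
print(difficulty_measures(log))
```

The two numbers can diverge: an exercise where many students slip once but recover immediately scores high on the first measure and low on the second, which is the kind of relationship the study sets out to explore.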
Article
Full-text available
In this paper we describe a six-way parallel public-domain corpus consisting of 2100 United Nations General Assembly Resolutions with translations in the six official languages of the United Nations, with an average of around 3 million tokens per language. The corpus is available in a pre-processed, formatting-normalized TMX format with para...
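For readers unfamiliar with TMX, the sketch below reads translation units into one dictionary per unit using only the standard library. It assumes the generic TMX layout of `tu`, `tuv` and `seg` elements with `xml:lang` (or the older `lang`) attributes; the exact structure of this corpus's files is not described in the truncated abstract, and `read_tmx` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# ElementTree exposes the xml:lang attribute under its namespaced name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(source: str) -> List[Dict[str, str]]:
    """Parse TMX text into one {language: segment} dict per translation unit."""
    root = ET.fromstring(source)
    units: List[Dict[str, str]] = []
    for tu in root.iter("tu"):
        unit: Dict[str, str] = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        units.append(unit)
    return units


sample = """<tmx version="1.4"><header srclang="en" datatype="plaintext"/>
<body><tu>
  <tuv xml:lang="en"><seg>The General Assembly</seg></tuv>
  <tuv xml:lang="fr"><seg>L'Assemblée générale</seg></tuv>
</tu></body></tmx>"""
print(read_tmx(sample))
```

For a six-way parallel corpus, each dictionary would simply carry six language keys, which makes extracting any bilingual pair a matter of selecting two keys.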
Article
Full-text available
This paper describes the DSTO/Macquarie University System for Entity Linking (DAMSEL), which competed in the 2009 Text Analysis Conference Knowledge Base Population task. The system achieves 73.5% accuracy. For a given named entity mention, the system selects a set of candidate entities from the knowledge base and selects the most likely candida...
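The two-stage shape described here (generate candidates from the knowledge base, then pick the most likely one) can be illustrated with a deliberately simple sketch. The alias table, the bag-of-words overlap score, the `min_overlap` threshold and the function names are all hypothetical; the abstract does not say which features DAMSEL actually used.

```python
from typing import Dict, List, Optional, Set


def bag_of_words(text: str) -> Set[str]:
    """Lower-cased tokens with surrounding punctuation stripped."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def link(mention: str, context: str, aliases: Dict[str, List[str]],
         descriptions: Dict[str, str], min_overlap: int = 1) -> Optional[str]:
    """Return the KB id of the best candidate for `mention`, or None (NIL)."""
    # Stage 1: candidate generation via an exact alias lookup.
    candidates = aliases.get(mention.lower(), [])
    # Stage 2: rank candidates by word overlap with the mention's context.
    ctx = bag_of_words(context)
    best_id, best_score = None, 0
    for kb_id in candidates:
        score = len(ctx & bag_of_words(descriptions.get(kb_id, "")))
        if score > best_score:
            best_id, best_score = kb_id, score
    return best_id if best_score >= min_overlap else None


aliases = {"paris": ["E1", "E2"]}
descriptions = {"E1": "capital city of France",
                "E2": "city in Texas, United States"}
print(link("Paris", "He flew to Paris, the French capital city.", aliases, descriptions))
print(link("Paris", "Paris, Texas is a small city in the United States.", aliases, descriptions))
```

Returning None when no candidate clears the threshold mirrors the NIL answer that knowledge base population tasks require for mentions whose entity is absent from the knowledge base.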
Article
Full-text available
In this paper, we reflect on what we can learn about the processes involved in the generation of referring expressions by looking at a corpus of human-produced data. We find that the data vastly underspecifies what might be involved algorithmically, but it does rule out a number of popular algorithms for referring expression generation as candidate...
Article
Full-text available
Practitioners and researchers need to stay up-to-date with the latest advances in their fields, but the constant growth in the amount of literature available makes this task increasingly difficult. We investigated the literature browsing task via a user requirements analysis, and identified the information needs that biomedical researchers com...
Conference Paper
Full-text available
Abstract-like text summarisation requires a means of producing novel summary sentences. In order to improve the grammaticality of the generated sentence, we model a global (sentence) level syntactic structure. We couch statistical sentence generation as a spanning tree problem in order to search for the best dependency tree spanning a set of chosen...
Conference Paper
Full-text available
Practitioners and researchers need to stay up-to-date with the latest advances in their fields, but the constant growth in the amount of literature available makes this task increasingly difficult. We investigated the literature browsing task via a user requirements analysis, and identified the information needs that biomedical researchers commonly...
Conference Paper
Full-text available
Contemporary speech science is driven by the availability of large, diverse speech corpora. Such infrastructure underpins research and technological advances in various practical, socially beneficial and economically fruitful endeavours, from ASR to hearing prostheses. Unfortunately, speech corpora are not easy to come by because they are both expe...
Article
Full-text available
Large auditory-visual (AV) speech corpora are the grist of modern research in speech science, but no such corpus exists for Australian English. This is unfortunate, for speech science is the brains behind speech technology and applications such as text-to-speech (TTS) synthesis, automatic speech recognition (ASR), speaker recognition and forensic i...
Article
Full-text available
We propose a method for learning dialogue management policies from a fixed data set. The method addresses the challenges posed by Information State Update (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, ...
Conference Paper
Full-text available
We are interested in developing a better understanding of what it is that students find difficult in learning logic. We use both natural language and diagram-based methods for teaching students the formal language of first-order logic. In this paper, we present some initial results that demonstrate that, when we look at how students construct diagr...
Conference Paper
Full-text available
In this paper we present a study on the interpretation of weekday names in texts. Our algorithm for assigning a date to a weekday name achieves 95.91% accuracy on a test data set based on the ACE 2005 Training Corpus, outperforming previously reported techniques run against this same data. We also provide the first detailed comparison of various...
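The basic arithmetic behind resolving a weekday name can be shown with a simple baseline: given a reference date (for instance the document date) and a direction, step back to the most recent such day or forward to the next one. This is an illustrative assumption, not the paper's algorithm; how the direction is chosen (for example from verb tense) is exactly the hard part the study addresses, and `resolve_weekday` is a hypothetical name.

```python
from datetime import date, timedelta

# Index positions match date.weekday(), where Monday == 0.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_weekday(name: str, reference: date, direction: str) -> date:
    """Map a weekday name to a calendar date relative to `reference`.

    direction == "past":   most recent such day strictly before reference.
    direction == "future": next such day strictly after reference.
    """
    target = WEEKDAYS.index(name.lower())
    if direction == "past":
        delta = (reference.weekday() - target) % 7 or 7
        return reference - timedelta(days=delta)
    if direction == "future":
        delta = (target - reference.weekday()) % 7 or 7
        return reference + timedelta(days=delta)
    raise ValueError("direction must be 'past' or 'future'")


ref = date(2000, 1, 1)  # a Saturday
print(resolve_weekday("Monday", ref, "past"))
print(resolve_weekday("Monday", ref, "future"))
```

The `or 7` makes a mention of the reference day's own weekday (say, "Saturday" in a document dated on a Saturday) resolve to a week earlier or later rather than to the same day; whether that is the right reading is itself one of the ambiguities a real system has to handle.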
Article
Full-text available
There is a prevailing assumption in the literature on referring expression generation that relations are used in descriptions only 'as a last resort', typically on the basis that including the second entity in the relation introduces an additional cognitive load for either speaker or hearer. In this paper, we describe an experiment that attempts t...
Article
Here's a round-up of notable events in the commercial language technology space in the last quarter of 2007, organized by broad application category. A common thread that pops up throughout many of these is the integration of language technology into social networking applications and other related Web 2.0 themes. I'd put my money on this being a h...
Conference Paper
Full-text available
The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standar...