Ryan Georgi

University of Washington Seattle | UW · Department of Linguistics

PhD in Computational Linguistics

About

Publications: 12
Reads: 1,609
Citations: 67
Additional affiliations
June 2012 - present
University of Washington Seattle
  • Position: Predoctoral Research Assistant
June 2009 - September 2016
University of Washington Seattle
  • Position: Research Assistant
February 2008 - September 2008
Microsoft
  • Position: Linguistics Test Engineer

Publications (12)
Conference Paper
Extracting semi-structured text from scientific writing in PDF files is a difficult task that researchers have faced for decades. In the 1990s, this task was largely a computer vision and OCR problem, as PDF files were often the result of scanning printed documents. Today, PDFs have standardized digital typesetting without the need for OCR, but ext...
Conference Paper
The current release of the ODIN (Online Database of Interlinear Text) database contains over 150,000 linguistic examples, from nearly 1,500 languages, extracted from PDFs found on the web, representing a significant source of data for language research, particularly for low-resource languages. Errors introduced during PDF-to-text conversion or poor...
Article
The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swath of the world’s languages. In many cases this involves boot...
Conference Paper
The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swathe of the world’s languages. In many cases this involves boot...
Conference Paper
In this paper, we will demonstrate a system that shows great promise for creating Part-of-Speech taggers for languages with little to no curated resources available, and which needs no expert involvement. Interlinear Glossed Text (IGT) is a resource which is available for over 1,000 languages as part of the Online Database of INterlinear text (ODIN...
Article
Obtaining syntactic parses is an important step in many NLP pipelines. However, most of the world’s languages do not have a large amount of syntactically annotated data available for building parsers. Syntactic projection techniques attempt to address this issue by using parallel corpora consisting of resource-poor and resource-rich language pairs,...
Article
Syntactic parses can provide valuable information for many NLP tasks, such as machine translation, semantic analysis, etc. However, most of the world's languages do not have large amounts of syntactically annotated corpora available for building parsers. Syntactic projection techniques attempt to address this issue by using parallel corpora between...
Conference Paper
Recent studies have shown the potential benefits of leveraging resources for resource-rich languages to build tools for similar, but resource-poor languages. We examine what constitutes "similarity" by comparing traditional phylogenetic language groups, which are motivated largely by genetic relationships, with language groupings formed by clusteri...
Thesis
In this thesis, we propose that instances of interlinear glossed text (IGT), as found in a wide range of linguistic papers, represent enriched content similar to partially annotated corpora. With such a type of data readily available for many languages for which little to no other data is available, we attempt to create a system which utilizes this...