Julia Hockenmaier

  • University of Illinois Urbana-Champaign

About

90
Publications
15,669
Reads
12,455
Citations
Current institution
University of Illinois Urbana-Champaign

Publications (90)
Preprint
Full-text available
Sparse autoencoders (SAEs) have emerged as a workhorse of modern mechanistic interpretability, but leading SAE approaches with top-$k$ style activation functions lack theoretical grounding for selecting the hyperparameter $k$. SAEs are based on the linear representation hypothesis (LRH), which assumes that the representations of large language mode...
Preprint
Full-text available
As language model (LM) outputs get more and more natural, it is becoming more difficult than ever to evaluate their quality. Simultaneously, increasing LMs' "thinking" time through scaling test-time compute has proven an effective technique to solve challenging problems in domains such as math and code. This raises a natural question: can an LM's e...
Preprint
Recent research highlights the challenges retrieval models face in retrieving useful contexts and the limitations of generation models in effectively utilizing those contexts in retrieval-augmented generation (RAG) settings. To address these challenges, we introduce RAG-RL, the first reasoning language model (RLM) specifically trained for RAG. RAG-...
Preprint
Full-text available
First-order logic (FOL) can represent the logical entailment semantics of natural language (NL) sentences, but determining natural language entailment using FOL remains a challenge. To address this, we propose the Entailment-Preserving FOL representations (EPF) task and introduce reference-free evaluation metrics for EPF, the Entailment-Preserving...
Preprint
Full-text available
Step-by-step reasoning is widely used to enhance the reasoning ability of large language models (LLMs) in complex problems. Evaluating the quality of reasoning traces is crucial for understanding and improving LLM reasoning. However, the evaluation criteria remain highly unstandardized, leading to fragmented efforts in developing metrics and meta-e...
Preprint
Full-text available
Interactive agents capable of understanding and executing instructions in the physical world have long been a central goal in AI research. The Minecraft Collaborative Building Task (MCBT) provides one such setting to work towards this goal (Narayan-Chen, Jayannavar, and Hockenmaier 2019). It is a two-player game in which an Architect (A) instructs...
Preprint
Causal probing is an approach to interpreting foundation models, such as large language models, by training probes to recognize latent properties of interest from embeddings, intervening on probes to modify this representation, and analyzing the resulting changes in the model's behavior. While some recent works have cast doubt on the theoretical ba...
Article
Creating consistency among project schedule data, BIM, and payment applications requires activities in a construction schedule to be mapped with the most relevant ASTM Uniformat classifications. To do so, we introduce UniformatBridge, a new transformer-based natural language processing model that automatically labels activities in a project schedu...
Preprint
Transformer-based encoder-decoder models that generate outputs in a left-to-right fashion have become standard for sequence-to-sequence tasks. In this paper, we propose a framework for decoding that produces sequences from the "outside-in": at each step, the model chooses to generate a token on the left, on the right, or join the left and right seq...
Preprint
Goal-oriented generative script learning aims to generate subsequent steps based on a goal, which is an essential task to assist robots in performing stereotypical activities of daily life. We show that the performance of this task can be improved if historical states are not just captured by the linguistic instructions given to people, but are aug...
Preprint
Full-text available
We consider the problem of human-machine collaborative problem solving as a planning task coupled with natural language communication. Our framework consists of three components -- a natural language engine that parses the language utterances to a formal representation and vice-versa, a concept learner that induces generalized concepts for plans ba...
Article
In construction, schedule mistakes causing delays beyond substantial completion dates cost contractors expensive liquidated damages. Hence, several industry guidelines, such as the DCMA's 14 point assessment, define schedule quality and offer systematic methods for ensuring it. These guidelines list “logic” as an essential control metric, and they...
Preprint
Full-text available
Text-to-Graph extraction aims to automatically extract information graphs consisting of mentions and types from natural language texts. Existing approaches, such as table filling and pairwise scoring, have shown impressive performance on various information extraction tasks, but they are difficult to scale to datasets with longer input texts becaus...
Preprint
Full-text available
The ability to match pieces of code to their corresponding natural language descriptions and vice versa is fundamental for natural language search interfaces to software repositories. In this paper, we propose a novel multi-perspective cross-lingual neural framework for code--text matching, inspired in part by a previous model for monolingual text-...
Preprint
The phrase grounding task aims to ground each entity mention in a given caption of an image to a corresponding region in that image. Although there are clear dependencies between how different mentions of the same caption should be grounded, previous structured prediction methods that aim to capture such dependencies need to resort to approximate i...
Article
Full-text available
The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated boundi...
Article
This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues. We model the appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions. We pay...
Article
Full-text available
We compare the effectiveness of four different syntactic CCG parsers for a semantic slot-filling task to explore how much syntactic supervision is required for downstream semantic analysis. This extrinsic, task-based evaluation provides a unique window to explore the strengths and weaknesses of semantics captured by unsupervised grammar induction s...
Conference Paper
Current evaluation metrics for image description may be too coarse. We therefore propose a series of binary forced-choice tasks that each focus on a different aspect of the captions. We evaluate a number of different off-the-shelf image description systems. Our results indicate strengths and shortcomings of both generation and ranking based approac...
Conference Paper
Full-text available
We compare the effectiveness of four different syntactic CCG parsers for a semantic slot-filling task to explore how much syntactic supervision is required for downstream semantic analysis. This extrinsic, task-based evaluation also provides a unique window into the semantics captured (or missed) by unsupervised grammar induction systems.
Conference Paper
Full-text available
Nearly all work in unsupervised grammar induction aims to induce unlabeled dependency trees from gold part-of-speech-tagged text. These clean linguistic classes provide a very important, though unrealistic, inductive bias. Conversely, induced clusters are very noisy. We show here, for the first time, that very limited human supervision (three freq...
Conference Paper
Full-text available
Work in grammar induction should help shed light on the amount of syntactic structure that is discoverable from raw word or tag sequences. But since most current grammar induction algorithms produce unlabeled dependencies, it is difficult to analyze what types of constructions these algorithms can or cannot capture, and, therefore, to identify wher...
Article
The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains linking mentions of the same entities in images, as well as 276k manually annotated bounding boxes corresponding to each entity. Such annotat...
Article
We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation grap...
Conference Paper
This paper studies the problem of associating images with descriptive sentences by embedding them in a common latent space. We are interested in learning such embeddings from hundreds of thousands or millions of examples. Unfortunately, it is prohibitively expensive to fully annotate this many training images with ground-truth sentences. Instead, w...
Conference Paper
This paper describes and analyzes our SemEval 2014 Task 1 system. Its features are based on distributional and denotational similarities; word alignment; negation; and hypernym/hyponym, synonym, and antonym relations.
Article
We introduce a novel nonparametric Bayesian model for the induction of Combinatory Categorial Grammars from POS-tagged text. It achieves state-of-the-art performance on a number of languages, and induces linguistically plausible lexicons.
Conference Paper
We propose and study novel text representation features created from parse tree structures. Unlike the traditional parse tree features which include all the attached syntactic categories to capture linguistic properties of text, the new features are solely or primarily defined based on the tree structure, and thus better reflect the pure structural...
Conference Paper
Associating photographs with complete sentences that describe what is depicted in them is a challenging problem. This paper examines how an approach that is inspired by image tagging techniques which can scale to very large data sets performs on this much harder task, and examines some of the linguistic difficulties that this bag-of-words model fac...
Article
The ability to associate images with natural language sentences that describe what is depicted in them is a hallmark of image understanding, and a prerequisite for applications such as sentence-based image search. In analogy to image search, we propose to frame sentence-based image annotation as the task of ranking a given pool of captions. We intr...
Conference Paper
Topic models, which factor each document into different topics and represent each topic as a distribution of terms, have been widely and successfully used to better understand collections of text documents. However, documents are also associated with further information, such as the set of real-world entities mentioned in them. For example, news ar...
Conference Paper
Full-text available
We investigate how novel English-derived words (anglicisms) are used in a German-language Internet hip hop forum, and what factors contribute to their uptake.
Conference Paper
Full-text available
Our system consists of a simple, EM-based induction algorithm (Bisk and Hockenmaier, 2012), which induces a language-specific Combinatory Categorial Grammar (CCG) and lexicon based on a small number of linguistic principles, e.g. that verbs may be the roots of sentences and can take nouns as arguments.
Article
This paper presents an approach for learning to translate simple narratives, i.e., texts (sequences of sentences) describing dynamic systems, into coherent sequences of events without the need for labeled training data. Our approach incorporates domain knowledge in the form of preconditions and effects of events, and we show that it outperforms sta...
Article
Full-text available
We present a simple EM-based grammar induction algorithm for Combinatory Categorial Grammar (CCG) that achieves state-of-the-art performance by relying on a minimal number of very general linguistic principles. Unlike previous work on unsupervised parsing with CCGs, our approach has no prior language-specific knowledge, and discovers all categories...
Conference Paper
Full-text available
We study a novel shallow information extraction problem that involves extracting sentences of a given set of topic categories from medical forum data. Given a corpus of medical forum documents, our goal is to extract two related types of sentences that describe a biomedical case (i.e., medical problem descriptions and medical treatment descriptions...
Conference Paper
Full-text available
Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence....
Conference Paper
Crowd-sourcing approaches such as Amazon's Mechanical Turk (MTurk) make it possible to annotate or collect large amounts of linguistic data at a relatively low cost and high speed. However, MTurk offers only limited control over who is allowed to participate in a particular task. This is particularly problematic for tasks requiring free-form text en...
Conference Paper
Full-text available
We propose and implement a modification of the Eisner (1996) normal form to account for generalized composition of bounded degree, and an extension to deal with grammatical type-raising.
Conference Paper
Full-text available
This paper proposes a novel topic model, Citation-Author-Topic (CAT) model that addresses a semantic search task we define as expert search - given a research area as a query, it returns names of experts in this area. For example, Michael Collins would be one of the top names retrieved given the query Syntactic Parsing. Our contribution in this pap...
Article
Full-text available
Recent work in computer vision has aimed to associate image regions with keywords describing the depicted entities, but actual image 'understanding' would also require identifying their attributes, relations and activities. Since this information cannot be conveyed by simple keywords, we have collected a corpus of "action" photos each associated wi...
Article
Collections of digital pictures are now very common. Collections can range from a small set of family pictures, to the entire contents of a picture site like Flickr. Such collections differ from what one might see if one simply attached a camera to a robot and recorded everything, because the pictures have been selected by people. They are not nece...
Article
Full-text available
It is well known that standard TAG cannot deal with certain instances of long-distance scrambling in German (Rambow, 1994). That CCG can deal with many instances of non-local scrambling in languages such as Turkish has previously been observed (e.g. by Hoffman (1995a) and Baldridge (2002)). We show here that CCG can derive German scrambling ca...
Article
Full-text available
This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word-word dependencies. The resulting corpus, CCGbank, includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium, and has been used...
Article
Full-text available
Unlike homopolymers, biopolymers are composed of specific sequences of different types of monomers. In proteins and RNA molecules, one-dimensional sequence information encodes a three-dimensional fold, leading to a corresponding molecular function. Such folded structures are not treated adequately through traditional methods of polymer statistical...
Article
Full-text available
As the organizers of the ACL 2007 Deep Linguistic Processing workshop (Baldwin et al., 2007), we were asked to discuss our perspectives on the role of current trends in deep linguistic processing for parsing technology. We are particularly interested in the ways in which efficient, broad coverage parsing systems for linguistically expressive gramma...
Article
An important puzzle in structural biology is the question of how proteins are able to fold so quickly into their unique native structures. There is much evidence that protein folding is hierarchic. In that case, folding routes are not linear, but have a tree structure. Trees are commonly used to represent the grammatical structure of natural langua...
Conference Paper
Full-text available
We present an algorithm which creates a German CCGbank by translating the syntax graphs in the German Tiger corpus into CCG derivation trees. The resulting corpus contains 46,628 derivations, covering 95% of all complete sentences in Tiger. Lexicons extracted from this corpus contain correct lexical entries for 94% of all known tokens in unse...
Conference Paper
Full-text available
How can proteins fold so quickly into their unique native structures? We show here that there is a natural analogy between parsing and the protein folding problem, and demonstrate that CKY can find the native structures of a simplified lattice model of proteins with high accuracy.
Conference Paper
Full-text available
This paper presents a corpus-based account of structural priming in human sentence processing, focusing on the role that syntactic representations play in such an account. We estimate the strength of structural priming effects from a corpus of spontaneous spoken dialogue, annotated syntactically with Combinatory Categorial Grammar (CCG) der...
Article
Full-text available
We demonstrate ways to enhance the coverage of a symbolic NLP system through data-intensive and machine learning techniques, while preserving the advantages of using a principled symbolic grammar formalism. We automatically acquire a large syntactic CCG lexicon from the Penn Treebank and combine it with semantic and morphological information from a...
Article
Full-text available
This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-training, in which two parsers are iteratively re-trained on each other's output; and a semi-supervised approach, corrected co-training, in which a human corrects each parser...
Article
Full-text available
This paper shows how to construct semantic representations from the derivations produced by a wide-coverage CCG parser. Unlike the dependency structures returned by the parser itself, these can be used directly for semantic interpretation. We demonstrate that well-formed semantic representations can be produced for over 97% of the sentences in u...
Article
Full-text available
This dissertation is concerned with the creation of training data and the development of probability models for statistical parsing of English with Combinatory Categorial Grammar (CCG). Parsing, or syntactic analysis, is a prerequisite for semantic interpretation, and forms therefore an integral part of any system which requires natural language un...
Article
Full-text available
This paper serves as documentation to a unification-based implementation of a Combinatory Categorial Grammar (CCG) of the fragment of English described in Sag and Wasow (1999). The implementation of this grammar was done in the Type Description Language (TDL) using the Linguistic Knowledge Building (LKB) grammar development system, both of which we...
Article
Full-text available
We present a system for automatically identifying PropBank-style semantic roles based on the output of a statistical parser for Combinatory Categorial Grammar.
Article
Full-text available
The model used by the CCG parser of Hockenmaier and Steedman (2002b) would fail to capture the correct bilexical dependencies in a language with freer word order, such as Dutch. This paper argues that probabilistic parsers should therefore model the dependencies in the predicate-argument structure, as in the model of Clark et al. (2002), and define...
Article
Full-text available
We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping pa...
Article
Full-text available
This paper describes a wide-coverage statistical parser that uses Combinatory Categorial Grammar (CCG) to derive dependency structures. The parser differs from most existing wide-coverage treebank parsers in capturing the long-range dependencies inherent in constructions such as coordination, extraction, raising and control, as well as the standard...
Article
Full-text available
This paper compares a number of generative probability models for a wide-coverage Combinatory Categorial Grammar (CCG) parser. These models are trained and tested on a corpus obtained by translating the Penn Treebank trees into CCG normal-form derivations. According to an evaluation of unlabeled word-word dependencies, our best model achieves a perf...
Article
This paper compares three evaluation metrics for a CCG parser trained and tested on a CCG version of the Penn Treebank. The standard Parseval metrics can be applied to the output of this parser; however, these metrics are problematic for CCG, and a comparison with scores given for standard Penn Treebank parsers is uninformative. As an alternative,...
Article
Full-text available
We demonstrate ways to preserve the advantages of using a symbolic grammar formalism as the basis of an NLP system while enhancing its robustness. We automatically acquire a CCG lexicon, combine it with semantic and morphological information from another hand-built, underspecified lexicon, and integrate it with statistical preprocessing methods.
Article
Full-text available
We present an algorithm which translates the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations. To do this we have needed to make several systematic changes to the Treebank which have the effect of cleaning up a number of errors and inconsistencies. This process has yielded a cleaner treebank that can potentially be use...
Article
Full-text available
This paper presents a statistical parser for a wide-coverage Combinatory Categorial Grammar (CCG) derived from the Penn Treebank. The Treebank is translated to a corpus of canonical CCG derivations. We define a generative statistical model over CCG derivations and train it on the transformed Treebank.
Article
Palmer ([4]) demonstrated how Brill's Transformation-based Error-Driven Learning can be applied to word segmentation in various languages. We present experimental results which show that such algorithms can achieve satisfactory performance even with a very simple initial state annotator. We also present two preliminary studies, which suggest that...
