James H. Martin
University of Colorado Boulder | CUB · Department of Computer Science (CS)

University of California at Berkeley

About

119
Publications
108,955
Reads
11,888
Citations
Citations since 2017
14 Research Items
4230 Citations
Additional affiliations
January 2004 - December 2010
University of Colorado at Boulder
January 1995 - present
University of Colorado Colorado Springs

Publications (119)
Preprint
Full-text available
Transcripts of teaching episodes can be effective tools to understand discourse patterns in classroom instruction. According to most educational experts, sustained classroom discourse is a critical component of equitable, engaging, and rich learning environments for students. This paper describes the TalkMoves dataset, composed of 567 human-annotat...
Article
In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually cull and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed properly and rapidly. We describe...
Chapter
Full-text available
Inclusion in mathematics education is strongly tied to discourse-rich classrooms, where students' ideas play a central role. Talk moves are specific discursive practices that promote inclusive and equitable participation in classroom discussions. This paper describes the development of the TalkMoves application, which provides teachers with detailed...
Article
In this paper we present Uniform Meaning Representation (UMR), a meaning representation designed to annotate the semantic content of a text. UMR is primarily based on Abstract Meaning Representation (AMR), an annotation framework initially designed for English, but also draws from other meaning representations. UMR extends AMR to other languages, p...
Preprint
Full-text available
TalkMoves is an innovative application designed to support K-12 mathematics teachers as they reflect on and continuously improve their instructional practices. This application combines state-of-the-art natural language processing capabilities with automated speech recognition to automatically analyze classroom recordings and provide teachers with pers...
Preprint
Full-text available
Event coreference continues to be a challenging problem in information extraction. With the absence of any external knowledge bases for events, coreference becomes a clustering task that relies on effective representations of the context in which event mentions appear. Recent advances in contextualized language representations have proven successfu...
Presentation
Full-text available
Explanation of the ClearEarth project
Article
Full-text available
The cTAKES package (using the ClearTK Natural Language Processing toolkit, Bethard et al. 2014, http://cleartk.github.io/cleartk/) has been successfully used to automatically read clinical notes in the medical field (Albright et al. 2013, Styler et al. 2014). It is used on a daily basis to automatically process clinical notes and extract relevant inf...
Conference Paper
Full-text available
Report on Results of a Hackathon to Progress with the Training Resources for Natural Language Processing (NLP) in Ecology
Poster
Full-text available
This project is funded by NSF-Award ACI 1443085. ClearEarth aims to bring semantic technologies from the biomedical field into the earth-surface earth, ice and life sciences. The products will be applied to operations such as query and reasoning.
Article
Full-text available
Efficient learning from Web resources can depend on accurately assessing the quality of each resource. We present a methodology for developing computational models of quality that can assist users in assessing Web resources. The methodology consists of four steps: 1) a meta-analysis of previous studies to decompose quality into high-level dimension...
Article
Full-text available
Objective To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information...
Conference Paper
Full-text available
Disaster-related research in human-centered computing has typically focused on the shorter-term, emergency period of a disaster event, whereas effects of some crises are long-term, lasting years. Social media archived on the Internet provides researchers the opportunity to examine societal reactions to a disaster over time. In this paper we examine...
Conference Paper
For the first time, natural language processing approaches are applied on a large scale to psychometric methods. Psychometric methods have been applied in hundreds of thousands of published studies. This study examines an automated approach to discovering behavioral knowledge that is encoded as constructs in social and behavioral science disciplines....
Article
Full-text available
This research proposes and evaluates a linguistically motivated approach to extracting temporal structure from text. Pairs of events in a verb-clause construction were considered, where the first event is a verb and the second event is the head of a clausal argument to that verb. All pairs of events in the TimeBank that participated in verb-clause...
Conference Paper
Full-text available
Assessing the quality of online educational resources in a cost effective manner is a critical issue for educational digital libraries. This study reports on the approach for extending the Open Educational Resource Assessments (OPERA) algorithm from assessing vetted to peer-produced content. This article reports details of changes to the algorithm,...
Article
Full-text available
The Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ) is a QA pipeline that integrates a variety of information retrieval and natural language processing systems into an extensible question answering system. We present the system's architecture and an evaluation of MiPACQ on a human-annotated evaluation dataset based on the...
Article
Full-text available
We present a vision of the future of emergency management that better supports inclusion of activities and information from members of the public during disasters and mass emergency events. Such a vision relies on integration of multiple subfields of computer science, and a commitment to an understanding of the domain of application. It supports th...
Conference Paper
Full-text available
We present the software architecture for a coming community resource, the Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ). This system is designed to capitalize on state-of-the-art semantic annotation of text to answer complex clinical practice questions and to enable clinical investigators to perform pioneering data m...
Article
Full-text available
Disease progression and understanding relies on temporal concepts. Discovery of automated temporal relations and timelines from the clinical narrative allows for mining large data sets of clinical text to uncover patterns at the disease and patient level. Our overall goal is the complex task of building a system for automated temporal relation disc...
Article
Full-text available
This paper describes a new method for recognizing whether a student's response to an automated tutor's question entails that they understand the concepts being taught. We demonstrate the need for a finer-grained analysis of answers than is supported by current tutoring systems or entailment databases and describe a new representation for reference...
Conference Paper
Recent years have seen the rise of subject-themed digital libraries, such as the NSDL pathways and the Digital Library for Earth System Education (DLESE). These libraries often need to manually verify that contributed resources cover topics that fit within the theme of the library. We show that such scope judgments can be automated using a combin...
Conference Paper
Full-text available
With the rise of community-generated web content, the need for automatic characterization of resource quality has grown, particularly in the realm of educational digital libraries. We demonstrate how identifying concrete factors of quality for web-based educational resources can make machine learning approaches to automating quality characterizatio...
Conference Paper
Full-text available
With the rise of community-generated web content, the need for automatic assessment of resource quality has grown, particularly in the realm of educational digital libraries. We demonstrate how developing a concrete definition of quality for web-based resources can make machine learning approaches to automating quality assessment tractable. Usi...
Conference Paper
Full-text available
Psycholinguistic studies of metaphor processing must control their stimuli not just for word frequency but also for the frequency with which a term is used metaphorically. Thus, we consider the task of metaphor frequency estimation, which predicts how often target words will be used metaphorically. We develop metaphor classifiers which represen...
Conference Paper
Full-text available
This paper describes a personalization approach for using online resources in digital libraries to support intentional learning. Personalized resource recommendations are made based on what learners currently know and what they should know within a targeted domain to support their learning process. We use natural language processing and graph based...
Article
Full-text available
This paper describes our progress towards automating adaptive personalized instruction based on student conceptual understandings using digital libraries. The reported approach merges conversational learning theory with advances in natural language processing to enable personalized pedagogical interactions. Multi-document summarization techniques...
Conference Paper
Personal name ambiguity is common in the fast growing web resource. This paper explores robust features for web personal name disambiguation, which is totally unsupervised and is not limited to the given web corpus. The experiments show that the broad features not only can improve the performance, but also increase the robustness of a disambiguatio...
Conference Paper
Full-text available
This paper presents a process for automatically extracting a fine-grained semantic representation of a learner’s response to a tutor’s question. The representation can be extracted using available natural language processing technologies and it allows a detailed assessment of the learner’s understanding and consequently will support the evaluation...
Chapter
The need for soft computing technologies to facilitate effective automated tutoring is pervasive – from machine learning techniques to predict content significance and generate appropriate questions, to interpretation of noisy spoken responses and statistical assessment of the response quality, through user modeling and determining how best to resp...
Article
Full-text available
Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in recent years, but mostly with data in newswire domains. Here, we report on an SRL model for identifying the semantic roles of biomedical predicates describing protein t...
Conference Paper
While recent corpus annotation efforts cover a wide variety of semantic structures, work on temporal and causal relations is still in its early stages. Annotation efforts have typically considered either temporal relations or causal relations, but not both, and no corpora currently exist that allow the relation between temporals and causals to be...
Conference Paper
Full-text available
This paper summarizes the annotation of fine-grained entailment relationships in the context of student answers to science assessment questions. We annotated a corpus of 15,357 answer pairs with 145,911 fine-grained entailment relationships. We provide the rationale for such fine-grained analysis and discuss its perceived benefits to an Intelligent...
Conference Paper
Full-text available
This paper describes an extractive summarizer for educational science content called COGENT. COGENT extends MEAD based on strategies elicited from an empirical study with domain and instructional experts. COGENT implements a hybrid approach integrating both domain independent sentence scoring features and domain-aware features. Initial evaluation...
Conference Paper
Full-text available
Finding temporal and causal relations is crucial to understanding the semantic structure of a text. Since existing corpora provide no parallel temporal and causal annotations, we annotated 1000 conjoined event pairs, achieving inter-annotator agreement of 81.2% on temporal relations and 77.8% on causal relations. We trained machine learning m...
Conference Paper
Full-text available
We present a novel fine-grained semantic representation of text and an approach to constructing it. This representation is largely extractable by today's technologies and facilitates more detailed semantic analysis. We discuss the requirements driving the representation, suggest how it might be of value in the automated tutoring domain, and...
Conference Paper
Full-text available
This paper describes the design and evaluation of an extractive summarizer for educational science content called COGENT. COGENT extends MEAD based on strategies elicited from an empirical study with science domain and instructional design experts. COGENT identifies sentences containing pedagogically relevant concepts for a specific science...
Conference Paper
Full-text available
This paper analyzes the impact of several lexical and grammatical features in automated assessment of students' fine-grained understanding of tutored concepts. Truly effective dialog and pedagogy in Intelligent Tutoring Systems is only achievable when systems are able to understand the detailed relationships between a student's answer and the de...
Article
Full-text available
We present a domain-independent technique for assessing learners' constructed responses. The system exceeds the accuracy of the majority class baseline by 15.4% and a lexical baseline by 5.9%. The emphasis of this paper is to provide an error analysis of performance, describing the types of errors committed, their frequency, and some issues in thei...
Article
Full-text available
This chapter discusses an emerging theme in supporting effective multimedia learning: developing scalable, cognitively-grounded tools that customize learning interactions for individual students. We discuss the theoretical foundation for expected benefits of customization and current approaches in educational technology that leverage a learner's pr...
Chapter
This chapter discusses an emerging theme in supporting effective multimedia learning: developing scalable, cognitively-grounded tools that customize learning interactions for individual students. We discuss the theoretical foundation for expected benefits of customization and current approaches in educational technology that leverage a learner’s pr...
Conference Paper
Full-text available
We propose and evaluate a linguistically motivated approach to extracting temporal structure necessary to build a timeline. We considered pairs of events in a verb-clause construction, where the first event is a verb and the second event is the head of a clausal argument to that verb. We selected all pairs of events in the TimeBank that participate...
Article
The increasing number of web sources is exacerbating the named-entity ambiguity problem. This paper explores the use of various token-based and phrase-based features in unsupervised clustering of web pages containing personal names. From these experiments, we find that the use of rich features can significantly improve the disambiguation performa...
Conference Paper
Full-text available
Most semantic role labeling (SRL) research has been focused on training and evaluating on the same corpus. This strategy, although appropriate for initiating research, can lead to overtraining to the particular corpus. This article describes the operation of ASSERT, a state-of-the-art SRL system, and analyzes the robustness of the system when train...
Conference Paper
Full-text available
This paper describes the results of a study designed to validate the use of domain competency models to diagnose student scientific misconceptions and to generate personalized instruction plans using digital libraries. Digital library resources provided the content base for human experts to construct a domain competency model for earthquakes and pl...
Conference Paper
Full-text available
The increasing use of large open-domain document sources is exacerbating the problem of ambiguity in named entities. This paper explores the use of a range of syntactic and semantic features in unsupervised clustering of documents that result from ad hoc queries containing names. From these experiments, we find that the use of robust syntactic and...
Conference Paper
Full-text available
We define learning as the generation of meaningful knowledge representations which can be utilized in future decision making. Optimal learning entails that these knowledge representations be integrated with prior knowledge. In this paper, we introduce a knowledge representation based on an integration of a variety of shallow semantic parsing techn...
Conference Paper
Full-text available
We approached the temporal relation identification tasks of TempEval 2007 as pair-wise classification tasks. We introduced a variety of syntactically and semantically motivated features, including temporal-logic-based features derived from running our Task B system on the Task A and C data. We trained support vector machine models and achieved the...
Conference Paper
We approached the temporal relation identification tasks of TempEval 2007 as pair-wise classification tasks. We introduced a variety of syntactically and semantically motivated features, including temporal-logic-based features derived from running our Task B system on the Task A and C data. We trained support vector machine models and achieved the...
Conference Paper
The increasing number of web sources is exacerbating the named-entity ambiguity problem. This paper explores the use of various token-based and phrase-based features in unsupervised clustering of web pages containing personal names. From these experiments, we find that the use of rich features can significantly improve the disambiguation performanc...
Article
The American Association for Artificial Intelligence, in cooperation with Stanford University’s Computer Science Department, was pleased to present its 2006 Spring Symposium Series held March 27–29, 2006, at Stanford University, California. The titles of the eight symposia were (1) Argumentation for Consumers of Health Care (chaired by Nancy...
Conference Paper
Full-text available
Complex tasks like question answering need to be able to identify events in text and the relations among those events. We show that this event identification task and a related task, identifying the semantic class of these events, can both be formulated as classification problems in a word-chunking paradigm. We introduce a variety of linguistic...
Article
Full-text available
The natural language processing community has recently experienced a growth of interest in domain independent shallow semantic parsing—the process of assigning a Who did What to Whom, When, Where, Why, How etc. structure to plain text. This process entails identifying groups of words in a sentence that represent these semantic arguments and assigni...
Conference Paper
Full-text available
This paper describes a semantic role labeling system that uses features derived from different syntactic views, and combines them within a phrase-based chunking paradigm. For an input sentence, syntactic constituent structure parses are generated by a Charniak parser and a Collins parser. Semantic role labels are assigned to the constitue...
Conference Paper
Full-text available
This paper is organized as follows: In Sections 2-3, we review the details of the vector space model and LSA. In Section 4, we outline our empirical methods. In Section 5, we compare the retrieval performances of LSA and the full-rank vector space model. In Section 6, we evaluate how the performance of LSA depends on its ability to handle synonyms....
Conference Paper
Full-text available
Semantic role labeling is the process of annotating the predicate-argument structure in text with semantic labels. In this paper we present a state-of-the-art baseline semantic role labeling system based on Support Vector Machine classifiers. We show improvements on this system by: i) adding new features including features extracted from depe...
Article
Full-text available
In this paper, we present a semantic role labeler (or chunker) that groups syntactic chunks (i.e. base phrases) into the arguments of a predicate.
Article
Full-text available
In this paper, we use a machine learning framework for semantic argument parsing, and apply it to the task of parsing arguments of eventive nominalizations in the FrameNet database. We create a baseline system using a subset of features introduced by Gildea and Jurafsky (2002), which are directly applicable to nominal predicates. We then investigat...
Conference Paper
Full-text available
In this paper, we propose a machine learning algorithm for shallow semantic parsing, extending the work of Gildea and Jurafsky (2002), Surdeanu et al. (2003) and others. Our algorithm is based on Support Vector Machines which we show give an improvement in performance over earlier classifiers. We show performance improvements through a nu...
Conference Paper
Full-text available
There is an ever-growing need to add structure in the form of semantic markup to the huge amounts of unstructured text data now available. We present the technique of shallow semantic parsing, the process of assigning a simple WHO did WHAT to WHOM, etc., structure to sentences in text, as a useful tool in achieving this goal. We formulate the seman...
Article
Full-text available
There is an ever-growing need to add structure in the form of semantic markup to the huge amounts of unstructured text data now available. We present the technique of shallow semantic parsing, the process of assigning a simple WHO did WHAT to WHOM, etc., structure to sentences in text, as a useful tool in achieving this goal. We formulate the semant...
Article
This paper describes a system for representing knowledge about conventional metaphors for use by natural language analysis, generation and acquisition systems. A system of hierarchically related structured associations is used. These associations are implemented as a part of the KODIAK representation language. Particular attention is paid in this p...
Conference Paper
Full-text available
Metaphor and other forms of non-literal language are essential parts of language which have direct bearing on theories of lexical semantics. Neither narrow theories of lexical semantics, nor theories relying solely on world knowledge are sufficient to account for our ability to generate and interpret non-literal language. This paper presents an eme...