Craig A. Knoblock

University of Southern California | USC · Department of Computer Science

31.97 · Ph.D.

About

387 Publications
72,258 Reads
12,001 Citations

Publications

Chapter
Full-text available
Historical maps provide a rich source of information for researchers in the social and natural sciences. These maps contain detailed documentation of a wide variety of natural and human-made features and their changes over time, such as the changes in the transportation networks and the decline of wetlands. It can be labor-intensive for a scientist...
Preprint
Full-text available
Historical maps provide a rich source of information for researchers in the social and natural sciences. These maps contain detailed documentation of a wide variety of natural and human-made features and their changes over time, such as the changes in the transportation networks and the decline of wetlands. It can be labor-intensive for a scientist...
Chapter
Digital map processing has been an interest in the computer science and geographic information science communities since the early 1980s. With the increase of available map scans, a variety of researchers in the natural and social sciences developed a growing interest in using historical maps in their studies. The lack of an understanding of how hi...
Chapter
Historical map scans contain valuable information (e.g., historical locations of roads, buildings) enabling the analyses that require long-term historical data of the natural and built environment. Many online archives now provide public access to a large number of historical map scans, such as the historical USGS (United States Geological Survey)...
Chapter
Historical geographic data are essential for a variety of studies of cancer and environmental epidemiology, urbanization, and landscape ecology. However, existing data sources typically contain only contemporary information. Historical maps hold a great deal of detailed geographic information at various times in the past. Yet, finding relevant maps...
Chapter
This chapter summarizes the book and provides a brief outlook.
Book
This book illustrates the first connection between the map user community and the developers of digital map processing technologies by providing several applications, challenges, and best practices in working with historical maps. After the introductory chapter, Chapter 2 presents a variety of existing applications of historical maps...
Article
Full-text available
Information extraction from historical maps represents a persistent challenge due to inferior graphical quality and the large data volume of digital map archives, which can hold thousands of digitized map sheets. Traditional map processing techniques typically rely on manually collected templates of the symbol of interest, and thus are not suitable...
Article
With large amounts of digital map archives becoming available, automatically extracting information from scanned historical maps is needed for many domains that require long-term historical geographic data. Convolutional Neural Networks (CNN) are powerful techniques that can be used for extracting locations of geographic features from scanned maps...
Conference Paper
Publishing data sources to knowledge graphs is a complicated and laborious process as data sources are often heterogeneous, hierarchical and interlinked. As an example, food price datasets may contain product prices of various units at different markets and times, and different providers can have many choices of formats such as CSV, JSON or spreads...
Conference Paper
Full-text available
Scientific models often depend on complex, interrelated datasets, and finding, preparing, and cleaning these datasets often dominates the time devoted to scientific inquiry. We are addressing these problems by creating a Data Catalog that provides a central clearinghouse for metadata about scientific datasets, supports fuzzy searching for data vari...
Article
Most current software systems are not adaptable, making them less capable of achieving their objectives. We are developing a method to optimize software for new environments automatically. An independent evaluation has demonstrated that this method adapts to degraded sensors, changing environmental conditions, and a loss of power.
Conference Paper
Full-text available
Data-intensive models have become critical to understanding the world. In order to reuse or combine datasets to support modeling, scientists must select, understand, and align them manually, a laborious process that requires understanding different domains and formats. To assist the modeling process, we present an unsupervised approach that identif...
Article
Full-text available
Convolutional neural networks (CNNs) such as encoder–decoder CNNs have increasingly been employed for semantic image segmentation at the pixel level, requiring pixel-level training labels, which are rarely available in real-world scenarios. In practice, weakly annotated training data at the image patch level are often used for pixel-level segmentati...
Conference Paper
Understanding competition between businesses is essential for assessing the likely success of new ventures or products, for making decisions before investing capital in new businesses, and for understanding the impacts of regulatory policy. One important resource for analyzing competitor relationships is business webpages, which can capture the missio...
Article
Full-text available
Historical maps are unique sources of retrospective geographical information. Recently, several map archives containing map series covering large spatial and temporal extents have been systematically scanned and made available to the public. The geographical information contained in such data archives makes it possible to extend geospatial analysis...
Article
Developing scalable, semi-automatic approaches to derive insights from a domain-specific Web corpus is a longstanding research problem in the knowledge discovery community. The problem is particularly challenging in illicit fields, such as human trafficking, where traditional assumptions concerning information representation are frequently violated...
Article
Full-text available
Developing scalable, semi-automatic approaches to derive insights from a domain-specific Web corpus is a longstanding research problem in the knowledge discovery community. The problem is particularly challenging in illicit fields, such as human trafficking, where traditional assumptions concerning information representation are frequently violated...
Conference Paper
With large amounts of digital map archives becoming available, the capability to automatically extract information from historical maps is important for many domains that require long-term geographic data, such as understanding the development of the landscape and human activities. In previous work, we built a system to automatically recogni...
Conference Paper
Full-text available
Organizations are awash in data. In many cases, they do not know what data exists within the organization and much information is not available when needed, or worse, information gets recreated from other sources. In this paper, we present an automatic approach to spatio-temporal indexing of the datasets within an organization. The indexing process...
Conference Paper
Linked Data has emerged as the preferred method for publishing and sharing cultural heritage data. One of the main challenges for museums is that the de facto standard ontology (CIDOC CRM) is complex and museums lack expertise in semantic web technologies. In this paper, we describe the methodology and tools we used to create 5-star Linked Data for 1...
Conference Paper
We study the problem of improving a machine learning model by identifying and using features that are not in the training set. This is applicable to machine learning systems deployed in an open environment. For example, a prediction model built on a set of sensors may be improved when it has access to new and relevant sensors at test time. To effec...
Conference Paper
Information extraction from historical maps represents a persistent challenge due to inferior graphical quality and large data volume in digital map archives, which can hold thousands of digitized map sheets. In this paper, we describe an approach to extract human settlement symbols in United States Geological Survey (USGS) historical topographic m...
Conference Paper
Studies of market structure and product market competition are important in many disciplines, such as economics, finance, accounting and management. Reliable data for such studies is easily available for public firms (e.g., 10-K filings), but no reliable data exists for private firms. In this work we propose to mine the Internet Archive Wayback Mac...
Conference Paper
Semantic labeling is the process of mapping attributes in data sources to classes in an ontology and is a necessary step in heterogeneous data integration. Variations in data formats, attribute names and even ranges of values of data make this a very challenging task. In this paper, we present a novel domain-independent approach to automatic semant...
Conference Paper
Full-text available
Entity resolution is the task of identifying all mentions that represent the same real-world entity within a knowledge base or across multiple knowledge bases. We address the problem of performing entity resolution on RDF graphs containing multiple types of nodes, using the links between instances of different types to improve the accuracy. For exa...
Conference Paper
Mapping data to a shared domain ontology is a key step in publishing semantic content on the Web. Most of the work on automatically mapping structured and semi-structured sources to ontologies focuses on semantic labeling, i.e., annotating data fields with ontology classes and/or properties. However, a precise mapping that fully recovers the intend...
Conference Paper
We work on converting the metadata of 13 American art museums and archives into Linked Data, to be able to integrate and query the resulting data. While there are many good sources of artist data, no single source covers all artists. We thus address the challenge of building a comprehensive knowledge graph of artists that we can then use to link th...
Conference Paper
Assessing the relatedness of documents is at the core of many applications such as document retrieval and recommendation. Most similarity approaches operate on word-distribution-based document representations, fast to compute but problematic when documents differ in language, vocabulary or type, and neglecting the rich relational knowledge availab...
Article
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by sp...
Conference Paper
Full-text available
There is a huge amount of data spread across the web and stored in databases that we can use to build knowledge graphs. However, exploiting this data to build knowledge graphs is difficult due to the heterogeneity of the sources, scale of the amount of data, and noise in the data. In this paper we present an approach to building knowledge graphs by...
Conference Paper
Full-text available
Programming-by-Example approaches allow users to transform data by simply entering the target data. However, current methods do not scale well to complicated examples, where there are many examples or the examples are long. In this paper, we present an approach that exploits the fact that users iteratively provide examples. It reuses the previous s...
Conference Paper
There is a huge demand to be able to find and integrate heterogeneous data sources, which requires mapping the attributes of a source to the concepts and relationships defined in a domain ontology. In this paper, we present a new approach to find these mappings, which we call semantic labeling. Previous approaches map each data value individually,...
Conference Paper
An analyst today has a tremendous amount of data available, but each of the various data sources typically exists in its own silo, so an analyst has limited ability to see an integrated view of the data and has little or no access to contextual information that could help in understanding the data. We have developed the Domain-Insight Graph (DIG...
Article
Full-text available
There is a great deal of interest in big data, focusing mostly on data set size. An equally important dimension of big data is variety, where the focus is to process highly heterogeneous data sets. We describe how we use semantics to address the problem of big data variety. We also describe Karma, a system that implements our approach and show how...
Patent
Full-text available
A method for processing geospatial datasets corresponding to geospatial objects, the method having the steps of extracting geospatial attributes from the geospatial datasets, locating extracted geospatial attributes corresponding to a particular geospatial object at a particular point in time, and generating output indicative of the particular geos...
Conference Paper
Full-text available
Programming by example (PBE) enables users to transform data formats without coding. As data transformation often involves data with heterogeneous formats, it often requires learning a conditional statement to differentiate these different formats. However, to be practical, the method must learn the correct conditional statement efficiently and acc...
Article
Full-text available
In spite of the effectiveness of Constraint Programming languages and tools, modeling remains an art and requires significant involvement from a CP expert. Our goal is to alleviate the load of the human user, and this paper is a first step in this direction. We propose a framework that enriches a 'generic' constraint model of a domain area with a...
Conference Paper
Given the increasing popularity and availability of location tracking devices, large quantities of spatiotemporal data are available from many different sources. Quick interactive analysis of such data is important in order to understand the data, identify patterns, and eventually make a marketable product. Since the data do not necessarily follow...
Conference Paper
A significant challenge in handling geographic datasets is that the datasets can come from heterogeneous sources with various data qualities and formats. Before these datasets can be used in a Geographic Information System (GIS) for spatial analysis or to create maps, a typical task is to clean the attribute data and transform the data into a unifo...
Conference Paper
Map labels provide valuable geographic information by annotating geographic phenomena with text descriptions. However, many interesting and useful maps are only available as images and hence this information is not readily accessible in a Geographic Information System (GIS). Previous work on text recognition in maps considers maps as a special ty...
Conference Paper
Looting and theft of cultural property have been a problem for decades. While there are no exact figures, some agencies suggest it is a criminal industry grossing in the billions annually. Documentation is a key component in finding lost or stolen cultural property and in establishing ownership in a court of law. However, the data on...
Article
Full-text available
Maps depict natural and human-induced changes on earth at a fine resolution for large areas and over long periods of time. In addition, maps, especially historical maps, are often the only information source about the earth as surveyed using geodetic techniques. In order to preserve these unique documents, increasing numbers of digital map archives h...
Conference Paper
Incorporating structured data in the Linked Data cloud is still complicated, despite the numerous existing tools. In particular, hierarchical structured data (e.g., JSON) are underrepresented, due to their processing complexity. A uniform mapping formalization for data in different formats, which would enable reuse and exchange between tools and ap...
Conference Paper
Full-text available
Semantic models of data sources describe the meaning of the data in terms of the concepts and relationships defined by a domain ontology. Building such models is an important step toward integrating data from different sources, where we need to provide the user with a unified view of underlying sources. In this paper, we present a scalable approach...
Patent
Methods for locating a feature on geospatial imagery and systems for performing those methods are disclosed. An accuracy level of each of a plurality of geospatial vector datasets available in a database can be determined. Each of the plurality of geospatial vector datasets corresponds to the same spatial region as the geospatial imagery. The geosp...
Patent
A method, computer program, and system for linking content to individual image features are provided. A section of an image is identified. A plurality of features associated with the section of the image is determined. Each of the plurality of features corresponds to at least one position within the section of the image. Content associated with the...
Article
Museums around the world have built databases with metadata about millions of objects, their history, the people who created them, and the entities they represent. This data is stored in proprietary databases and is not readily available for use. Recently, museums embraced the Semantic Web as a means to make this data available to the world, but th...
Conference Paper
Full-text available
Programming by example enables users to transform data formats without coding. To be practical, the method must synthesize the correct transformation with minimal user input. We present a method that minimizes user effort by color-coding the transformation result and recommending specific records where the user should provide examples. Simulation r...
Patent
Document relevance is determined with respect to a region of interest (ROI). A set of location references may be associated with a set of documents. The system selects location references associated with an ROI and then selects documents corresponding to the selected location references. The selected documents can be reported or processed further....
Article
Full-text available
Vast amounts of text on the Web are unstructured and ungrammatical, such as classified ads, auction listings, forum postings, etc. We call such text "posts." Despite their inconsistent structure and lack of grammar, posts are full of useful information. This paper presents work on semi-automatically building tables of relational information, called...
Article
Text labels in maps provide valuable geographic information by associating place names with locations. This information from historical maps is especially important since historical maps are very often the only source of past information about the earth. Recognizing the text labels is challenging because heterogeneous raster maps have varying image...
Conference Paper
Full-text available
In this position paper, we describe a vision for the future of a so-called "Spatial-Health CyberGIS Marketplace". We first situate this proposed new computing ecosystem within the set of currently-available enabling technologies and techniques. We next provide a detailed vision of the capabilities and features of an ecosystem that will benefit indi...
Conference Paper
Semantic models of data sources and services provide support to automate many tasks such as source discovery, data integration, and service composition, but writing these semantic descriptions by hand is a tedious and time-consuming task. Most of the related work focuses on automatic annotation with classes or properties of source attributes or inp...
Conference Paper
Museums around the world have built databases with metadata about millions of objects, their history, the people who created them, and the entities they represent. This data is stored in proprietary databases and is not readily available for use. Recently, museums embraced the Semantic Web as a means to make this data available to the world, but t...
Conference Paper
Full-text available
There is a tremendous amount of geospatial data available, and there are numerous methods for extracting, processing and integrating geospatial sources. However, end-users' ability to retrieve, combine, and integrate heterogeneous geospatial data is limited. This paper presents a new semantic approach that allows users to easily extract, link, and...
Conference Paper
Recently, large amounts of data are being published using Semantic Web standards. Simultaneously, there has been a steady rise in links between objects from multiple sources. However, the ontologies behind these sources have remained largely disconnected, thereby challenging the interoperability goal of the Semantic Web. We address this problem by...
Conference Paper
The relevance of many types of data perishes or degrades over time; to support timely decision-making, data integration systems must provide access to live data and should make it easy to incorporate new sources. We outline methods, based on web architecture, that enable (near) real-time access to data sources in a variety of formats and access moda...
Article
Full-text available
Historical maps contain rich cartographic information, such as road networks, but this information is "locked" in images and inaccessible to a geographic information system (GIS). Manual map digitization requires intensive user effort and cannot handle a large number of maps. Previous approaches for automatic map processing generally require expert...
Conference Paper
Full-text available
The amount of data available in the Linked Data cloud continues to grow. Yet, few services consume and produce linked data. There is recent work that allows a user to define a linked service from an online service, which includes the specifications for consuming and producing linked data, but building such models is time consuming and requires spec...
Conference Paper
Despite the increase in the number of linked instances in the Linked Data Cloud in recent times, the absence of links at the concept level has resulted in heterogeneous schemas, challenging the interoperability goal of the Semantic Web. In this paper, we address this problem by finding alignments between concepts from multiple Linked Data sources. I...
Conference Paper
Raster maps contain rich road information, such as the topology and names of roads, but this information is “locked” in images and inaccessible in a geographic information system (GIS). Previous approaches for road extraction from raster maps typically handle this problem as raster-to-vector conversion and hence the extracted road vector data are l...
Conference Paper
Full-text available
Linked data continues to grow at a rapid rate, but a limitation of a lot of the data that is being published is the lack of a semantic description. There are tools, such as D2R, that allow a user to quickly convert a database into RDF, but these tools do not provide a way to easily map the data into an existing ontology. This paper presents a semi-...
Conference Paper
Full-text available
Despite the recent growth in the size of the Linked Data Cloud, the absence of links between the vocabularies of the sources has resulted in heterogeneous schemas. Our previous work tried to find conceptual mappings between two sources and was successful in finding alignments, such as equivalence and subset relations, using the instances that are l...
Article
Full-text available
A key problem in many data integration tasks is that data is often in the wrong format and needs to be converted into a different format. This can be a very time consuming and tedious task. In this paper we propose an approach that can learn data transformations automatically from examples. Our approach not only identifies the transformations that...
Article
Full-text available
Maps are one of the most valuable documents for gathering geospatial information about a region. Yet, finding a collection of diverse, high-quality maps is a significant challenge because there is a dearth of content-specific metadata available to identify them from among other images on the Web. For this reason, it is desirable to analyze the conte...
Article
Full-text available
Environmental cyber-observatory (ECO) planning and implementation has been ongoing for more than a decade now, and several major efforts have recently come online or will soon. Some investigators in the relevant research communities will use ECO data, traditionally by developing their own client-side services to acquire data and then manually creat...
Conference Paper
Full-text available
Using today's GIS tools, users without programming expertise are unable to fully exploit the growing amount of geospatial data becoming available because today's tools limit them to displaying data as layers for a region on a map. Fusing the data in more complex ways requires the ability to invoke processing algorithms and to combine the data these...