Larry Kerschberg, PhD
George Mason University (GMU) · Department of Computer Science
About
182 Publications · 20,756 Reads
2,633 Citations
Additional affiliations
August 1986 - present
Publications (182)
BACKGROUND
ChatGPT and other large language models (LLMs) are trained on extensive text data; they learn patterns and associations within the training text without an inherent understanding of underlying causal mechanisms. Establishing causation necessitates controlled experiments and observations, and as of May 2024, ChatGPT lacks access to experi...
Discovering causal relationships among symptoms is a topical issue in the analysis of observational patient datasets. A Causal Bayesian Network (CBN) is a popular analytical framework for causal inference. While there are many methods and algorithms capable of learning a Bayesian network, they are reliant on the complexity and thoroughness of the a...
A Bayesian Network (BN) is a popular framework for causal studies. Causal relationships and interactions can be captured in the topology of a BN, creating a Causal Bayesian Network (CBN). This framework enables us to reason under uncertainty and capture the strength of causal links as conditional probabilities. However, there currently is no quick...
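As an illustration of how a CBN captures the strength of causal links as conditional probabilities, here is a minimal two-node sketch in Python. The variables (Flu causing Fever) and all probability values are hypothetical, chosen only for illustration, and are not from the paper:

```python
# Minimal two-node Causal Bayesian Network: Flu -> Fever.
# All variable names and probabilities are hypothetical.

p_flu = {True: 0.1, False: 0.9}                       # prior P(Flu)
p_fever_given_flu = {True:  {True: 0.8, False: 0.2},  # CPT P(Fever | Flu)
                     False: {True: 0.1, False: 0.9}}

def p_fever(value):
    """Marginal P(Fever=value), summing out the parent Flu."""
    return sum(p_flu[f] * p_fever_given_flu[f][value] for f in (True, False))

def p_flu_given_fever(flu, fever):
    """Posterior P(Flu=flu | Fever=fever) via Bayes' rule."""
    return p_flu[flu] * p_fever_given_flu[flu][fever] / p_fever(fever)
```

Reasoning under uncertainty then amounts to computing the posterior of the cause given an observed symptom, e.g. `p_flu_given_fever(True, True)`.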
This paper is based on my Keynote Address at the Tools with AI conference on November 4, 2013. The talk focused on some factors used to determine a user's context — those attributes, both tacit and explicit, which help to ascertain a user's intentions for a search request in order to make a decision. I also explored how social semantic search can...
With the recent growth of the Social Web, an emerging challenge is how we can integrate information from the heterogeneity of current Social Web sites to improve semantic access to the information and knowledge across the entire World Wide Web, the Web. Interoperability across the Social Web sites makes the simplest of inferences based on data from...
A service-oriented system architecture includes a computer-implemented search method and computer-implemented agent system for enabling efficient information searches on, for example, XML databases, relational databases, and files located on intranets, the Internet, or other computer network systems. Referred to as the Knowledge Sifter architect...
In recent years, social media has expanded from a niche application with a student-focused user base to a mainstream tool used by individuals and business to maintain and expand their social and business networks. The result is a rich source of data that can help service consumers and service providers connect with each other in new and interesting...
There is a nexus between information technology and the physical world, where the developments of service science intersect with the technological innovations offered by modern communication systems. When completing a business process, users rarely consume just one type of service; most business processes are a combination of both physical and elec...
This paper introduces the concept, design, and implementation of the Personal Health Explorer (PHE), a semantic recommender system that allows an individual to perform semantic search and discovery related to conditions and diseases contained in his personal health record. The PHE system consults authoritative ontologies and reputable information s...
Composing services into executable workflows presents many challenges. One of the most difficult challenges is deciding which option among several available workflows is the best value for the user. Cost, reliability, user preferences, and other factors must be balanced to develop the best recommendation. In this paper we describe a strategy for de...
This paper proposes rethinking how ontologies are used to compose web services into business processes. Unlike handcrafted ontologies, we describe using a multi-agent system (MAS) to automatically generate semantic mappings from service interfaces. Comparing synonyms and contextual clues, we infer meanings of input and output parameters with no exp...
In this paper, we propose a secure and privacy-preserving Service Oriented Architecture (SOA) for health information integration and exchange in which patients are "part owners" of their medical records, have complete ownership of their integrated health information and decide when and how data is modified or exchanged between healthcare providers...
This chapter introduces a hybrid ontology mediation approach for deploying Semantic Web Services (SWS) using Multi-agent systems (MAS). The methodology that the authors have applied combines both syntactic and semantic matching techniques for mapping ontological schemas so as to 1) eliminate heterogeneity; 2) provide higher precision and relevance in...
This paper highlights an end-to-end framework and process methodology for developing a consistent knowledge model across enterprises. We demonstrate an improved matching algorithm, the Semantic Relatedness Score (SRS), together with the Semantic Web Rule Language (SWRL), and how they can be coupled together to achieve better reliability and precision in m...
Data heterogeneity in the public sector is a serious problem and remains a key issue as different naming conventions are used to represent similar data labels. The e-government effort in many countries has provided a platform for government entities and their business partners to exchange data through Information Communication Technologies (I...
This article introduces a hybrid ontology mediation approach for the Semantic Web. It combines both syntactic and semantic matching measures to provide better results for matching data labels. Although ontologies are meant to provide a shared conceptualization of the world, the development practices, lack of standards, and subjective naming convent...
In this paper, we present the problem of ontology evolution and change management. We provide a systematic approach to solve the problem by adopting a multi-agent system (MAS). The core of our solution is the Semantic Relatedness Score (SRS) which is an aggregate score of five well-tested semantic as well as syntactic algorithms. The focus of this...
In this paper, we present a hybrid similarity matching algorithm, the Semantic Relatedness Score (SRS), that is used to match ontological concepts and instances in the context of ontology evolution. We combine five out of thirteen well-tested semantic and syntactic algorithms to produce SRS. Specifically, we focus on the issue of ontology upgrade and...
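The snippet above does not list the five constituent algorithms, so the sketch below illustrates only the aggregation idea behind a hybrid relatedness score: combine several independent similarity measures into one value. The two measures used here (character-bigram Jaccard and normalized Levenshtein similarity) and the unweighted mean are assumptions for illustration, not the paper's actual components:

```python
def jaccard_bigrams(a, b):
    """Syntactic similarity: Jaccard overlap of character bigrams."""
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(A & B) / len(A | B) if A | B else 1.0

def levenshtein_sim(a, b):
    """Normalized edit-distance similarity in [0, 1]."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1 - prev[-1] / max(len(a), len(b), 1)

def relatedness_score(a, b, measures=(jaccard_bigrams, levenshtein_sim)):
    """Aggregate score: unweighted mean of the individual measures."""
    return sum(m(a, b) for m in measures) / len(measures)
```

For example, `relatedness_score("postalCode", "postal_code")` scores far higher than `relatedness_score("postalCode", "firstName")`, which is the behavior an ontology matcher needs.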
This paper addresses the specification of a security policy ontology framework to mediate security policies between virtual organizations (VO) and real organizations (RO). The goal is to develop a common domain model for security policy via semantic mapping. This mitigates interoperability problems that exist due to heterogeneity in security policy...
This paper addresses the role of case-based reasoning in semantic search, and in particular, as it applies to Knowledge Sifter, an agent-based ontology-driven search system based on Web services. The Knowledge Sifter architecture is extended to include a case-based methodology for collaborative semantic search, including case creation, indexing and...
Agent ontology systems interact with other agents via automated service discovery to share data and provide services on-the-fly. Resources need to be annotated with meta-data so that they can be correctly discovered and invoked. Ontologies provide shared conceptualization of a domain but it's not easy for everyone to agree on the same ontology for...
This paper addresses the various facets of emergent semantics in content retrieval systems such as Knowledge Sifter, an architecture and system based on the use of specialized agents to coordinate the search for knowledge in heterogeneous sources, including the Web, semi-structured data, relational data and the Semantic Web. The goal is to provide...
This paper describes the specification, design and development of ACTIVE, a testbed for the testing and simulation of large-scale agent-based systems. ACTIVE is being developed as part of DARPA's Coordinators program. The goal of the ACTIVE testbed is to support the simulation of large collections of distributed cooperating agents in solving constr...
This paper proposes to model and represent an object's digital persona by means of a conceptual model that uses fragments of schemata that are termed data-DNA. The object may be a person, place or thing of interest; it may be either physical or virtual; and it supports interactions between and among objects. The goal is to provide a framework in wh...
This paper presents the requirements for just in time knowledge management (JIT-KM). In order to deliver high-value information to user for decision-making, one must understand the user's preferences, biases and decision context. A JIT-KM architecture is presented consisting of user, middleware and data services to search for information from heter...
Unlike traditional assembly lines, service-oriented processes require individualized and timely information during the execution of steps or phases. This paper discusses the Multi-layered Analytical Knowledge Organization (MAKO) Just-in-Time Process Model (JITPM), which is a framework and methodology for building service-oriented processes that are...
In seeking a new knowledge management paradigm, the First Workshop on Information Just-in-Time was held as a full-day workshop associated with the 3rd Conference on Professional Knowledge Management in Kaiserslautern, Germany.
This paper describes the current state of the OmniSeer system. OmniSeer supports intelligence analysts in the handling of massive amounts of data, the construction of scenarios, and the management of hypotheses. OmniSeer models analysts with dynamic user models that capture an analyst's context, interests, and preferences, thus enabling more effici...
The use of Web services as a means of dynamically discovering, negotiating, composing, executing and managing services to materialize enterprise-scale workflow is an active research topic. Existing approaches involve many disparate concepts, frameworks and technologies. What is needed is a comprehensive and overarching framework that handles the pr...
The concept of automating Web services, specifically the brokering activities, is an active research topic. We need a comprehensive and overarching framework that seamlessly incorporates intelligent middleware agents within the context of workflow management, and addresses the issues related to virtual organizations. The goal is to add semantics to...
Knowledge Sifter is a scalable agent-based system that supports access to heterogeneous information sources such as the Web, open-source repositories, XML-databases and the emerging Semantic Web. User query specification is supported by a user agent that accesses multiple ontologies using an integrated conceptual model expressed in the Web Ontolog...
Knowledge Sifter is a scalable agent-based system that supports access to heterogeneous information sources such as the Web, open-source repositories, XML-databases and the emerging Semantic Web. User query specification is supported by a user agent that accesses multiple ontologies using an integrated conceptual model. A collection of cooperating...
The problem considered is that of decomposing a global integrity constraint in a distributed database into local constraints for every local site, such that the local constraints serve as a conservative approximation, i.e., satisfaction of the local constraints by a database instance guarantees satisfaction of the global constraint. Verifying local...
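A toy example of the conservative-approximation idea: split a global constraint sum(x_i) <= C into equal per-site budgets x_i <= C/n, so that local satisfaction at every site implies global satisfaction, at the cost of rejecting some globally valid states. The equal-split rule is assumed here purely for illustration and is not the paper's decomposition method:

```python
# Hypothetical equal-split decomposition of a global constraint
# sum(x_i) <= c_global over n_sites distributed sites.

def decompose_budget(c_global, n_sites):
    """Give each site the local constraint x_i <= c_global / n_sites."""
    return [c_global / n_sites] * n_sites

def locally_ok(values, budgets):
    """Every site satisfies its local constraint."""
    return all(v <= b for v, b in zip(values, budgets))

def globally_ok(values, c_global):
    """The original global constraint holds."""
    return sum(values) <= c_global
```

With `budgets = decompose_budget(100, 4)`, any state that is locally OK is also globally OK, but a state like `[40, 10, 10, 10]` is globally valid yet locally rejected, which is exactly the conservatism the abstract describes.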
The functional approach to computing has an important role in enabling Internet-based applications such as the Semantic Web, e-business, web services, and agents for managing the evolving distributed knowledge space. This chapter examines the research issues and trends in these areas, and focuses on how the Functional Data Model and a functiona...
The Extensible Markup Language (XML) is a standard for information representation and exchange over the Internet. One of the applications where XML can be used is electronic business (e-business). E-business requires substantially large and frequent information exchanges among e-business parties. Such information exchange is required for search and...
It is over 20 years since the functional data model and functional programming languages were first introduced to the computing community. Although developed by separate research communities, recent work, presented in this book, suggests there is powerful synergy in their integration. As database technology emerges as central to yet more complex an...
We address how the XML Topic Map (XTM) 1.0 standard can be used to develop an analytical knowledge base comprised of multiple ontologies to support intelligence assessments. Termed the MultiOntology Analytical Knowledge Organizational (MAKO) framework, it incorporates a Multidimensional Ontology Model (MOM) that organizes subjects into separate con...
The concept of automating Web services, specifically the brokering activities, is an active research topic. We need a comprehensive and overarching framework that handles the discovery, differentiation, negotiation and selection processing within the context of workflow management, and addresses the issues related to virtual organizations. The goal...
This paper addresses several problems associated with the specification of Web searches, and the retrieval, filtering, and rating of Web pages in order to improve the relevance, precision and quality of search results. A methodology and architecture for an agent-based system, WebSifter is presented, that captures the semantics of a user's search in...
The INLEN system combines database, knowledge base, and machine learning technologies to provide a user with an integrated system of tools for conceptually analyzing data and searching for interesting relationships and regularities in them. Machine learning techniques are used for determining general descriptions from facts, creating conceptual cla...
The paper presents a method for integrating two different types of data analysis: symbolic inductive learning and statistical methods. The method concerns the problem of discovering rules characterizing the dependence between a group of dependent attributes and a group of independent attributes, e.g., decision attributes and measurable attributes. The...
and CONCRETIZE, and GENERALIZE and SPECIALIZE, in addition to IMPROVE, an operator that improves knowledge by giving it new examples to learn from. ABSTRACT modifies its input knowledge segment by removing details from its description, and CONCRETIZE specifies details of an abstract concept description, while GENERALIZE and SPECIALIZE affect the se...
Among the central tasks in the development of expert systems is the formulation, debugging and implementation of a knowledge base. The knowledge encoded in the knowledge base is usually supplied by experts. There are, however, many application domains in which knowledge required by an expert system has to be extracted from facts collected in a data...
The architecture of a large-scale system, INLEN, for the discovery of knowledge from facts, is described and then illustrated by an exploratory application. INLEN combines database, knowledge base, and machine learning methods within a uniform user-oriented framework. Data and different forms of knowledge are managed in a uniform way by using knowl...
Providing highly relevant page hits to the user is a major concern in Web search. To accomplish this goal, the user must be allowed to express his intent precisely. Secondly, page hit rating mechanisms should be used that take the user’s intent into account. Finally, a learning mechanism is needed that captures a user’s preferences in his Web searc...
In this paper, we describe a new bitmap indexing technique to cluster XML documents. XML is a new standard for exchanging and representing information on the Internet. Documents can be hierarchically represented by XML-elements. XML documents are represented and indexed using a bitmap indexing technique. We define the similarity and popularity oper...
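The snippet below sketches the general idea of bitmap-indexing documents: each document becomes a bit vector over the element paths it contains, and similarity reduces to bitwise operations. The element paths and the use of Jaccard overlap are assumptions for illustration; the paper's actual encoding and similarity/popularity operators may differ:

```python
# Hypothetical bitmap index over XML element paths (illustrative only).
PATHS = ["/book/title", "/book/author", "/book/price", "/article/title"]

def to_bitmap(doc_paths):
    """Encode a document's element paths as an integer bit vector."""
    bits = 0
    for i, p in enumerate(PATHS):
        if p in doc_paths:
            bits |= 1 << i
    return bits

def similarity(a, b):
    """Jaccard similarity of two bitmaps, computed with bitwise AND/OR."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0
```

Because the index is just machine integers, pairwise similarity over a large collection needs only AND/OR and popcount, which is what makes bitmap indexes attractive for clustering.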
Caching issues in meta-search engines are considered. We propose a popularity-driven cache algorithm that utilizes both popularities and reference counters of queries to determine cache data to be purged. We show how to calculate query popularity. Empirical evaluations and performance comparisons of popularity-driven caching with the least recently...
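A minimal sketch of a popularity-driven eviction policy in the spirit of the abstract: each cached query keeps a reference counter, popularity is taken (as an assumption for illustration) to be a query's share of all references, and the least-referenced entry is purged when the cache is full. The paper's exact popularity formula is not given in the snippet above:

```python
class PopularityCache:
    """Toy query-result cache with popularity-driven eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.results = {}   # query -> cached result
        self.refs = {}      # query -> reference counter

    def popularity(self, query):
        """A query's share of all references seen so far."""
        total = sum(self.refs.values())
        return self.refs[query] / total if total else 0.0

    def get(self, query):
        """Return the cached result and bump the reference counter."""
        if query in self.results:
            self.refs[query] += 1
            return self.results[query]
        return None

    def put(self, query, result):
        """Cache a result, purging the least-referenced entry if full."""
        if query not in self.results and len(self.results) >= self.capacity:
            victim = min(self.refs, key=self.refs.get)  # least popular
            del self.results[victim], self.refs[victim]
        self.results[query] = result
        self.refs[query] = self.refs.get(query, 0) + 1
```

Unlike plain LRU, a frequently referenced query survives eviction even if it has not been touched recently, which matches the popularity-over-recency trade-off the abstract evaluates.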
This paper addresses the problem of specifying Web searches and retrieving, filtering, and rating Web pages so as to improve the relevance and quality of hits, based on the user's search intent and preferences. We present a methodology and architecture for an agent-based system, called WebSifter II, that captures the semantics of a user's decision-...
In practice, users may often want interesting rules that are also related to user goals. This paper describes a technique for mining useful rules that are both interesting and related to user goals. According to the degree of relevancy to a user goal, a database can be divided into five views: from the view positively related to the user goal to the v...
This paper addresses the problem of specifying Web searches and retrieving, filtering, and rating Web pages so as to improve the relevance and quality of hits, based on the user's search intent and preferences. We present a methodology and architecture for an agent-based system, called WebSifter II, that captures the semantics of a user's decision-...
Cluster analysis aims at identifying groups of similar objects and therefore helps to discover the distribution of patterns and interesting correlations in large data sets. It has been the subject of wide research since it arises in many application domains ...
This paper addresses issues related to Knowledge Management in the context of heterogeneous data warehouse environments. The traditional notion of data warehouse is evolving into a federated warehouse augmented by a knowledge repository, together with a set of processes and services to support enterprise knowledge creation, refinement, indexing, di...
This paper addresses several issues related to the use of conceptual modeling to support service-oriented, advanced information systems. It shows how conceptual modeling of information resources can be used to integrate information obtained from multiple data sources, including both internal and external data. The notion of an intelligent thesaurus...
XML is a new standard for exchanging and representing information on the Internet. Documents can be hierarchically represented by XML-elements. In this paper, we propose that an XML document collection be represented and indexed using a bitmap indexing technique. We define the similarity and popularity operations suitable for bitmap indexes. We als...
This paper addresses the problem of specifying, retrieving, filtering and rating Web searches so as to improve the relevance and quality of hits, based on the user's search intent and preferences. We present a methodology and architecture for an agent-based system, called WebSifter II, that captures the semantics of a user's decision-oriented searc...
The World Wide Web provides access to a great deal of information on a vast array of subjects. A user can begin a search for information by selecting a Web page and following the embedded links from page to page looking for clues to the desired information. An alternative method is to use one of the Web-based search engines to select the Web pages...
Adaptation in open, multi-agent information gathering systems is important for several reasons. These reasons include the inability to accurately predict future problem-solving workloads, future changes in existing information requests, future failures and additions of agents and data supply resources, and other future task environment characteri...
The paper presents a novel application involving two important software engineering research areas: process modeling and software reuse. The Spiral Model is a risk-driven process model, which, depending on the specific risks associated with a given project, may be tailored to create a project-specific process model. The software reuse area is that...
The World Wide Web provides access to a great deal of information on a vast array of subjects. In a typical Web search a vast amount of information is retrieved. The quantity can be overwhelming, and much of the information may be marginally relevant or completely irrelevant to the user's request. We present a methodology, architecture, and proof-o...
We describe how to express constraints in a functional (semantic) data model, which has a working implementation in an object database. We trace the development of such constraints from being integrity checks embedded in procedural code to being something declarative ...
In this presentation we review the current ongoing research within George Mason University's (GMU) Center for Information Systems Integration and Evolution (CISE). We define characteristics of advanced information systems, discuss a family of agents for such systems, and show how GMU's Domain modeling tools and techniques can be used to define a pr...
This paper presents an agent-based architecture for optimally managing resources in distributed environments. The agent-based approach allows for maximal autonomy in agent negotiation and decision-making. A three-layer architecture is proposed, consisting of the user, agency and application domain layers. Agencies correspond to enterprise functiona...
This paper presents an overview of current trends in Electronic Business (E-business), and how an enterprise can use the Electronic Marketspace for strategic advantage. The role of cooperative information agents is discussed within the context of E-business. An agency-based framework is presented for E-business in the class of logistics and supply...
The approach presented in this paper is to discover association rules based on a user’s query. Of the many issues in rule discovery, relevancy, interestingness, and supportiveness of association rules are considered in this paper. For a given user query, a database can be partitioned into three views: a positively-related-query view, a negatively-r...
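The query-based partitioning can be sketched as follows, with a hypothetical relatedness predicate (a transaction containing every query item is positively related, one with partial overlap negatively related, and one with no overlap unrelated); the paper's actual view definitions are not given in the snippet above:

```python
def partition(transactions, query_items):
    """Split transactions into three views w.r.t. a user-query item set.
    The overlap-based predicate is an illustrative assumption."""
    positive, negative, unrelated = [], [], []
    for t in transactions:
        overlap = query_items & set(t)
        if overlap == query_items:
            positive.append(t)      # contains every query item
        elif overlap:
            negative.append(t)      # partial overlap with the query
        else:
            unrelated.append(t)     # shares nothing with the query
    return positive, negative, unrelated
```

Rule discovery can then weight support counted in each view differently, so that rules grounded in the positively related view rank as most relevant to the user's query.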
Presents a methodology for data mining and knowledge discovery in large, distributed and heterogeneous databases. In order to obtain potentially interesting patterns, relationships and rules from such large and heterogeneous data collections, it is essential that a methodology be developed to take advantage of the suite of existing methods and tool...
This paper addresses the problem of semantic query reformulation in the context of object-oriented deductive databases. It extends the declarative object-oriented specifications of F-logic proposed by Kifer and Lausen using the semantic query optimization technique developed by Chakravarthy, Grant, and Minker. In general, query processing in obje...
We introduce the concept of rule schema in this paper in order to support constraints in the active object-oriented paradigm. The rule schema provides the meta-knowledge associated with constraints analogous to the way a database schema provides metadata about the database objects. It is used to compile the constraints into clauses which are then "...
Although knowledge discovery is increasingly important in databases, discovered knowledge is not always useful to users. This is mainly because the discovered knowledge does not fit the user's interests, or it may be redundant or inconsistent with a priori knowledge. Knowledge discovery in databases depends critically on how well a database is characteri...
In an active database, an update may be constrained by integrity constraints, and may also trigger rules that, in turn, may affect the database state. The general problem is to effect the update while also managing the “side-effects” of constraint enforcement and rule execution. In this paper an update calculus is proposed by which updates, constra...
The Earth Observing System (EOS) Data and Information System (EOSDIS) is perhaps one of the most important examples of large-scale, geographically distributed, and data intensive systems. Designing such systems in a way that guarantees that the resulting design will satisfy all functional and performance requirements is not a trivial task. This pap...
We present results of providing database support to biomedicine via federation of SDB Cooperation/Integration based upon the KEGG GUI for molecular biology. The federation provides a common link to three molecular biology databases. The added value of the federation is freedom from consulting multiple references to ascertain the full set of enzymat...
George Mason University began as an independent state university in 1972. Its development has been marked by rapid growth and innovative planning, resulting in an enrollment of more than 24,000 students in 1997. It is located in Fairfax, Virginia—about fifteen miles southwest of Washington, DC—near many governmental agencies and industrial firms sp...
Transportable agents are autonomous programs. They can move through a heterogeneous network of computers migrating from host to host under their own control. They can sense the state of the network, monitor software conditions, and interact with other ...
Described is a prototype database and analysis system developed to support the specific domain of forest canopy research. This effort utilized a multidisciplinary team composed of information (database systems), statistical analysis, and forest canopy scientists. Both large scale (Oracle) and smaller scale (Visual FoxPro) databases were prototyped. A...
The paper presents an information architecture consisting of the information interface, management and gathering layers. Intelligent active services are discussed for each layer, access scenarios are presented, and the role of knowledge rovers is discussed. Knowledge rovers represent a family of cooperating intelligent agents that may be configured...
This paper addresses the needs of application designers who would like to tell an automated assistant the following: "Here is a query that defines a view I want to materialize within my application. I need this view to remain approximately consistent with the state of the data sources from which the view is derived, in accordance with declaratively spe...
Scientists from widely varying communities are frequently called upon to work together, sharing their data and their expertise to investigate a common issue. Difficulties frequently arise in sharing data because each community, and sometimes each scientist, has their own conventions for structuring the data. This results in a data schema that is in...
Data Mining and knowledge Discovery in Databases (KDD) promise to play an important role in the way people interact with databases, especially scientific databases where analysis and exploration operations are essential. This is an extended abstract ...
Preface. Part I: Workflow Transactions. 1. Transactions in Transactional Workflows D. Worah, A. Sheth. 2. WFMS: The Next Generation of Distributed Processing Tools G. Alonso, C. Mohan. Part II: Tool-Kit Approaches. 3. The Reflective Transaction Framework R.S. Barga, C. Pu. 4. Flexible Commit Protocols for Advanced Transaction Processing L. Mancini,...
This paper describes a software architectural design method for large-scale distributed information systems. The method, which is part of an integrated design and performance evaluation method, addresses the design of client/server software architectures, where the servers need to cooperate with each other to service client requests. The goal of th...
This paper describes a prototype Knowledge-Based Software Engineering Environment used to demonstrate the concepts of reuse of software requirements and software architectures. The prototype environment, which is application-domain independent, is used to support the development of domain models and to generate target system specifications from the...