About
39 Publications
7,776 Reads
434 Citations
Current institution: AiLive
Publications (39)
As belief networks are used to model increasingly complex situations, the need to automatically construct them from large databases will become paramount. This paper concentrates on solving a part of the belief network induction problem: that of learning the quantitative structure (the conditional probabilities), given the qualitative structure. In...
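As a rough illustration of the quantitative-learning task this abstract describes, here is a minimal Python sketch of estimating conditional probability tables by counting, given a fixed network structure. The tabular data layout, variable names, and the Dirichlet smoothing parameter alpha are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch of estimating a belief network's conditional probability
# tables (CPTs) from data, given a fixed qualitative structure. Variable
# names, data layout, and the prior alpha are invented for illustration.
from collections import Counter
from itertools import product

def estimate_cpt(records, var, parents, domains, alpha=1.0):
    """Return P(var | parents) as {parent_values: {value: prob}}."""
    counts = Counter()
    for r in records:
        counts[(tuple(r[p] for p in parents), r[var])] += 1
    cpt = {}
    for pv in product(*(domains[p] for p in parents)):
        total = sum(counts[(pv, v)] + alpha for v in domains[var])
        cpt[pv] = {v: (counts[(pv, v)] + alpha) / total for v in domains[var]}
    return cpt

records = [
    {"rain": 1, "sprinkler": 0, "wet": 1},
    {"rain": 0, "sprinkler": 1, "wet": 1},
    {"rain": 0, "sprinkler": 0, "wet": 0},
    {"rain": 1, "sprinkler": 0, "wet": 1},
]
domains = {"rain": [0, 1], "sprinkler": [0, 1], "wet": [0, 1]}
print(estimate_cpt(records, "wet", ["rain", "sprinkler"], domains))
```

The Dirichlet smoothing keeps unseen parent configurations at a uniform distribution rather than zero, which matters when the database is sparse relative to the number of parent-value combinations.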
Online learning in commercial computer games allows computer-controlled opponents to adapt to the way the game is being played. As such it provides a mechanism to deal with weaknesses in the game AI, and to respond to changes in human player tactics. ...
The history of the interaction of machine learning and computer game-playing goes back to the earliest days of Artificial Intelligence, when Arthur Samuel worked on his famous checker-playing program, pioneering many machine-learning and game-playing techniques (Samuel, 1959, 1967). Since then, both fields have advanced considerably, and research i...
Mesh data has been a common form of data produced and searched in scientific simulations, and has been growing rapidly in size thanks to increasing computing power. Today, there are visualization tools that assist scientists in exploring and examining the data, but their query capabilities are limited to a small set of fixed visualization opera...
AQSim is a system intended to enable scientists to query and analyze a large volume of scientific simulation data. The system uses the state of the art in approximate query processing techniques to build a novel framework for progressive data analysis. These techniques are used to define a multi-resolution index, where each node contains multiple...
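Since the snippet is truncated, the following is only a speculative sketch of the general idea of a multi-resolution index whose nodes carry small aggregate models; the node contents (count and mean), the uniform-coverage assumption, and the recursion are invented for illustration and are not AQSim's actual design.

```python
# Speculative sketch: a one-dimensional multi-resolution index whose
# nodes carry simple aggregate models (count, mean) so range queries can
# be answered approximately without scanning the raw data.
class Node:
    def __init__(self, lo, hi, values):
        self.lo, self.hi = lo, hi
        self.count = len(values)
        self.mean = sum(values) / len(values)
        self.left = self.right = None
        if len(values) > 2:  # recurse until leaves are small
            mid = len(values) // 2
            self.left = Node(lo, (lo + hi) / 2, values[:mid])
            self.right = Node((lo + hi) / 2, hi, values[mid:])

    def approx_sum(self, qlo, qhi):
        if qhi <= self.lo or qlo >= self.hi:
            return 0.0
        if (qlo <= self.lo and qhi >= self.hi) or self.left is None:
            # approximation step: coverage fraction times modeled total,
            # assuming values are spread uniformly over [lo, hi)
            frac = (min(qhi, self.hi) - max(qlo, self.lo)) / (self.hi - self.lo)
            return frac * self.count * self.mean
        return self.left.approx_sum(qlo, qhi) + self.right.approx_sum(qlo, qhi)

# Example: 8 values assumed spread uniformly over [0, 1).
root = Node(0.0, 1.0, [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
print(root.approx_sum(0.0, 0.5))  # estimate without touching raw data
```

Descending only as deep as the requested accuracy demands is what makes the analysis progressive: coarse answers come from a few nodes, refined answers from deeper ones.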
We present our approach to enabling approximate ad hoc queries on terabyte-scale mesh data generated from large scientific simulations through the extension and integration of database, statistical, and data mining techniques. There are several significant barriers to overcome in achieving this objective. First, large-scale simulation data is alrea...
Bioinformatics is facing the daunting challenge of providing geneticists and biologists effective, efficient access to data currently distributed among dynamic, heterogeneous data sources. Complicating the problem is the speed at which the underlying science and technology evolve, leaving the terminology, databases and interfaces to catch up. As th...
As simulation gains popularity as an inexpensive means of experimentation in diverse fields of industry and government, attention to the data generated by scientific simulation is also increasing. Scientific simulation generates mesh data, i.e. data configured in a grid structure, in a sequence of time steps. Its model is complex: underst...
In this paper, we describe AQSim, an ongoing effort to design and implement a system to manage terabytes of scientific simulation data. The goal of this project is to reduce data storage requirements and access times while permitting ad-hoc queries using statistical and mathematical models of the data. In order to facilitate data exchange between...
Depending on who you ask, bioinformatics can refer to almost any collaborative effort between biologists or geneticists and computer scientists -- from database development, to simulating the chemical reaction between proteins, to automatically identifying tumors in MRI images. At Lawrence Livermore National Laboratory (LLNL), we have come to use a...
Data stored in a data warehouse must be kept consistent and up-to-date with respect to the underlying information sources. By providing the capability to identify, categorize and detect changes in these sources, only the modified data needs to be transferred and entered into the warehouse. Another alternative, periodically reloading from scratch, is...
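A minimal sketch of the kind of change detection the abstract mentions, assuming keyed rows compared between snapshots by hash; the key column, hashing scheme, and row layout are hypothetical, not the paper's mechanism.

```python
# Sketch of snapshot-based change detection: hash each source row by key,
# diff against the previous snapshot, and ship only inserts, updates, and
# deletes to the warehouse. All names here are illustrative assumptions.
import hashlib

def row_digest(row):
    return hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()

def diff_snapshots(old, new, key):
    """old/new: lists of dict rows; returns (inserts, updates, deletes)."""
    old_idx = {r[key]: row_digest(r) for r in old}
    new_idx = {r[key]: r for r in new}
    inserts = [r for k, r in new_idx.items() if k not in old_idx]
    updates = [r for k, r in new_idx.items()
               if k in old_idx and row_digest(r) != old_idx[k]]
    deletes = [k for k in old_idx if k not in new_idx]
    return inserts, updates, deletes

old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
new = [{"id": 1, "v": "a"}, {"id": 2, "v": "c"}, {"id": 3, "v": "d"}]
print(diff_snapshots(old, new, key="id"))
```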
Data warehouses and data marts have been successfully applied to a multitude of commercial business applications. They have proven to be invaluable tools by integrating information from distributed, heterogeneous sources and summarizing this data for use throughout the enterprise. Although the need for information dissemination is as vital in scien...
This paper proposes a simple model for a timer-driven triggering and alerting system. Such a system can be used with relational and object-relational database systems. Timer-driven trigger systems have a number of advantages over traditional trigger ...
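To make the model concrete, here is a toy polling loop in Python, assuming a SQLite source and a SQL condition; the schema, period, and alert callback are invented for illustration and are not the paper's actual system.

```python
# Toy model of a timer-driven trigger: instead of firing on each data
# modification, a condition is evaluated against the database on a fixed
# schedule. Table names and the condition are illustrative assumptions.
import sqlite3
import time

def timer_trigger(db_path, condition_sql, alert, period_s=60.0):
    """Poll the database every period_s seconds; alert on matching rows."""
    while True:
        with sqlite3.connect(db_path) as conn:
            rows = conn.execute(condition_sql).fetchall()
        if rows:
            alert(rows)
        time.sleep(period_s)

# Example: alert when any sensor reading exceeds a threshold.
# timer_trigger("plant.db",
#               "SELECT id, value FROM readings WHERE value > 100",
#               lambda rows: print("ALERT:", rows),
#               period_s=30.0)
```

The trade-off a timer-driven design makes is latency bounded by the polling period in exchange for not burdening every transaction with trigger evaluation.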
This working document has been put together by the members of the Sapphire project at LLNL. The goal of Sapphire is to apply and extend techniques from data mining and pattern recognition in order to automatically detect areas of interest in very large data sets. The intent is to help scientists address the problem of data overload by providing...
Mediators are a critical component of any data warehouse; they transform data from source formats to the warehouse representation while resolving semantic and syntactic conflicts. The close relationship between mediators and databases requires a mediator to be updated whenever an associated schema is modified. Failure to quickly perform these updat...
Discovery of association rules is an important data mining task. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the set of frequent itemsets (a subset of database items), thus incurring high I/O overhead. In t...
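For context, here is a compact Python sketch of the classic level-wise (Apriori-style) search the abstract alludes to, in which each itemset size costs one full pass over the database; this illustrates the baseline I/O pattern the paper aims to improve, not the paper's own algorithm. The transactions and support threshold are invented.

```python
# Level-wise frequent-itemset search: one full database pass per itemset
# size, which is the repeated-pass I/O cost the abstract refers to.
def frequent_itemsets(transactions, min_support):
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items
             if sum(s <= t for t in transactions) >= min_support}
    frequent, k = [], 1
    while level:
        frequent.extend(level)
        k += 1
        # candidate generation: unions of frequent sets that have size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # one more pass over the database to count each candidate
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) >= min_support}
    return frequent

txns = [frozenset("abc"), frozenset("ac"), frozenset("abd"), frozenset("bc")]
print(frequent_itemsets(txns, min_support=2))
```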
Data warehouses created for dynamic scientific environments, such as genetics, face significant challenges to their long-term feasibility. One of the most significant of these is the high frequency of schema evolution resulting from both technological advances and scientific insight. Failure to quickly incorporate these modifications will quickly r...
On September 15th and 16th, 1997 the Second IEEE Metadata Conference was held at the National Oceanic and Atmospheric Administration (NOAA) complex in Silver Spring, Maryland. The main objectives of this conference series are to provide a forum to address metadata issues faced by various communities, promote the interchange of ideas on common techn...
Data warehouses and data marts have been successfully applied to a multitude of commercial business applications as tools for integrating and providing access to data located across an enterprise. Although the need for this capability is as vital in the scientific world as in the business domain, working warehouses in our community are scarce. A pr...
Scalable High Performance Computing for Knowledge Discovery and Data Mining brings together in one place important contributions and up-to-date research results in this fast-moving area.
Scalable High Performance Computing for Knowledge Discovery and Data Mining serves as an excellent reference, providing insight into some of the most challenging r...
Large collections of images can be indexed by their projections on a few “primary” images. The optimal primary images are the eigenvectors of a large covariance matrix. We address the problem of computing primary images when access to the images is expensive. This is the case when the images cannot be kept locally, but must be accessed through slow...
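One way to make the constraint concrete: a power-iteration sketch in Python that touches each image once per pass, so the collection never has to be held locally. The streaming interface and pass count are illustrative assumptions, not the paper's algorithm.

```python
# Sketch: compute the top "primary image" (leading eigenvector of the
# covariance matrix) by power iteration over a stream of images, so
# expensive remote images are read once per pass, never stored together.
import numpy as np

def top_primary_image(stream_images, dim, n_passes=20, seed=0):
    """stream_images(): yields mean-centered images as 1-D arrays."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(n_passes):
        acc = np.zeros(dim)
        for x in stream_images():   # one expensive pass per iteration
            acc += x * (x @ v)      # accumulates (sum_i x_i x_i^T) v
        v = acc / np.linalg.norm(acc)
    return v  # converges to the leading eigenvector of the covariance

# Example with an in-memory stand-in for the slow image source.
imgs = [np.array([1.0, 2.0, 0.5]), np.array([0.9, 2.1, 0.4])]
print(top_primary_image(lambda: iter(imgs), dim=3))
```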
Data- and information-intensive industries require advanced data management capabilities coupled with large-capacity storage. Performance in this environment is, in part, a function of individual storage and data management system performance, but most importantly a function of the level of their integration. This paper focuses on integration, i...
Belief networks are a powerful tool for knowledge discovery that provide concise, understandable probabilistic models of data. There are methods grounded in probability theory to incrementally update the relationships described by the belief network when new information is seen, to perform complex inferences over any set of variables in the data, t...
Standard methods of constructing decision trees can be prohibitively expensive when induction algorithms are given very large training sets on which to compare attributes. This expense can often be avoided. By using a subsample for this calculation, we can get an approximation to the information gain used to assess each attribute. Selecting an attr...
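A minimal sketch of the subsampling idea, assuming a list-of-dicts dataset; the subsample size and random seed are illustrative, and this is not the paper's exact procedure.

```python
# Estimate an attribute's information gain from a random subsample
# instead of the full training set, trading a little accuracy for a
# large reduction in per-attribute cost. All names are illustrative.
import math
import random

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def approx_info_gain(rows, labels, attr, sample_size, seed=0):
    random.seed(seed)
    idx = random.sample(range(len(rows)), min(sample_size, len(rows)))
    sub_rows = [rows[i] for i in idx]
    sub_labels = [labels[i] for i in idx]
    gain = entropy(sub_labels)
    for v in set(r[attr] for r in sub_rows):
        part = [l for r, l in zip(sub_rows, sub_labels) if r[attr] == v]
        gain -= len(part) / len(sub_labels) * entropy(part)
    return gain
```

Because attribute selection only needs the best attribute, not exact gains, a subsample that preserves the ranking is usually enough.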
We present a method for approximating the expected number of steps required by a heuristic search algorithm to reach a goal from any initial state in a problem space. The method is based on a mapping from the original state space to an abstract space in which states are characterized only by a syntactic "distance" from the nearest goal. Modeling th...
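A small worked example of the abstraction, assuming invented transition probabilities between distance-to-goal classes: solving E = 1 + QE over the transient distances gives the expected number of steps to the goal.

```python
# Collapse the state space to distance-to-goal classes, model the search
# as a Markov chain over those classes, and solve E = 1 + Q E for the
# expected steps. The transition probabilities below are invented.
import numpy as np

# Q[i][j]: probability a step moves from distance i+1 to distance j+1;
# the remaining mass (e.g. 0.2 from d=1) is absorption into the goal.
Q = np.array([[0.5, 0.3],   # from d=1: stay at 1, slip back to 2
              [0.4, 0.6]])  # from d=2: advance to 1, stay at 2
E = np.linalg.solve(np.eye(2) - Q, np.ones(2))
print(E)  # expected number of search steps from distance 1 and 2
```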
A system and method is disclosed for integrating and accessing multiple data sources within a data warehouse architecture. The metadata formed by the present method provide a way to declaratively present domain specific knowledge, obtained by analyzing data sources, in a consistent and useable way. Four types of information are represented by the m...
Our ability to generate data far outstrips our ability to explore and understand it. The true value of this data lies not in its final size or complexity, but rather in our ability to exploit the data to achieve scientific goals. The data generated by programs such as ASCI have such a large scale that it is impractical to manually analyze, explore,...
This whitepaper briefly describes a new, aggressive effort in large-scale data mining at Lawrence Livermore National Labs. The implications of 'large-scale' will be clarified in a later section. In the short term, this effort will focus on several mission-critical questions of the Genome project. We will adapt current data mining techniques to the Genome domain, to quantify the ac...
Data warehousing is an approach for managing data from multiple sources by representing them with a single, coherent point of view. Commercial data warehousing products have been produced by companies such as Red Brick, IBM, Brio, Andyne, Ardent, NCR, Information Advantage, Informatica, and others. Other companies have chosen to develop their own in...
This dissertation describes BNI (Belief Network Inductor), a tool that automatically induces a belief network from a database. The fundamental thrust of this research program has been to provide a theoretically sound method of inducing a model from data, and performing inference over that model. Along with a solid grounding in probability theory, B...