Gregory Piatetsky-Shapiro
  • KDnuggets

About

80
Publications
18,631
Reads
21,962
Citations
Current institution
KDnuggets

Publications (80)
Article
Educational Data Mining (EDM) is an emerging multidisciplinary research area, in which methods and techniques for exploring data originating from various educational information systems have been developed. EDM is both a learning science, as well as ...
Article
In this article, we study the problem of Web user profiling, which is aimed at finding, extracting, and fusing the “semantic”-based user profile from the Web. Previously, Web user profiling was often undertaken by creating a list of keywords ...
Article
An analytical framework for using power-law theory to estimate market size for niche products and consumer groups.
Article
Full-text available
I survey the transformation of the data mining and knowledge discovery field over the last 10 years from the unique vantage point of KDnuggets as a leading chronicler of the field. Analysis of the most frequent words in KDnuggets News leads to revealing observations.
Article
Interview with Simon Funk -- a Netflix prize leader, an outstanding hacker, and an original thinker.
Article
Full-text available
We discuss what makes exciting and motivating Grand Challenge problems for Data Mining, and propose criteria for a good Grand Challenge. We then consider possible GC problems from multimedia mining, link mining, large-scale modeling, text mining, and proteomics. This report is the result of a panel held at the KDD-2006 conference.
Conference Paper
Full-text available
This panel will discuss possible exciting and motivating Grand Challenge problems for Data Mining, focusing on bioinformatics, multimedia mining, link mining, text mining, and web mining.
Chapter
This paper examines the current trends in business applications of data mining and knowledge discovery systems. The focus is on newly emerging Third Generation data mining systems, which are solution-oriented and integrate smoothly with existing business systems.
Conference Paper
We study an algorithm for feature selection that clusters attributes using a special metric and then makes use of the dendrogram of the resulting cluster hierarchy to choose the most relevant attributes. The main interest of our technique resides in the improved understanding of the structure of the analyzed data and of the relative importance of t...
Article
Full-text available
Animal models for human diseases are of crucial importance for studying gene expression and regulation. In the last decade the development of mouse models for cancer, diabetes, neurodegenerative and many other diseases has been on a steady rise. Microarray ...
Conference Paper
Analyzing gene expression data from microarray devices has many important applications in medicine and biology, but presents significant challenges to data mining. Microarray data typically has many attributes (genes) and few examples (samples), making the process of correctly analyzing such data difficult to formulate and prone to common mistakes....
Article
Full-text available
At the 2001 IEEE International Conference on Data Mining in San Jose, California, on November 29 to December 2, 2001, there was a panel discussion on how data mining research meets practical development. One of the motivations for organizing the panel discussion was to provide useful advice for industrial people to explore their directions in data...
Article
Animal models for human diseases are of crucial importance for studying gene expression and regulation. In the last decade the development of mouse models for cancer, diabetes, neurodegenerative and many other diseases has been on a steady rise. Microarray ...
Conference Paper
This section introduces knowledge discovery in databases (KDD) as a field that is driven by the need for knowledge derived from massive and varied data. We explain both the necessary role of KDD and the conditions under which KDD methods can be profitably ...
Article
The KDnuggets newsletter has a new section of interviews with leaders in the field. This article presents the interview with Usama Fayyad, President and CEO of digiMine.
Article
The KDnuggets newsletter has a new section of interviews with leaders in the field. This article presents the interview with Jesus Mena, CEO of WebMiner.
Conference Paper
Full-text available
In this paper we examine the problem of comparing real-time predictive models and propose a number of measures for selecting the best model, based on a combination of accuracy, timeliness, and cost. We apply the measure to the real-time attrition problem.
Article
This paper presents a first step towards a unifying framework for Knowledge Discovery in Databases. We describe links between data mining, knowledge discovery, and other related fields. We then define the KDD process and basic data mining algorithms, discuss application issues and conclude with an analysis of challenges facing practitioners in the...
Article
In this paper, we describe the past 10 years of KDD and outline predictions for the next 10 years. Keywords: Knowledge Discovery in Databases, Data Mining, KDD, History.
Article
The information revolution is generating mountains of data, from sources as diverse as credit card transactions, telephone calls, Web clickstreams, space science, and human genome research. At the same time, faster and cheaper storage technology lets us store greater amounts of data online, and better database management system software provides ea...
Article
We consider the use of Monte Carlo methods to obtain maximum likelihood estimates for random effects models and distinguish between the pointwise and functional approaches. We explore the relationship between the two approaches and compare them with ...
Conference Paper
In assessing the potential of data mining based marketing campaigns one needs to estimate the payoff of applying modeling to the problem of predicting behavior of some target population (e.g. attriters, people likely to buy product X, people likely to default on a loan, etc). This assessment has two components: a) the financial estimate of the cam...
Article
Full-text available
The automated discovery of knowledge in databases is becoming increasingly important as the world's wealth of data continues to grow exponentially. Knowledge-discovery systems face challenging problems from real-world databases which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. This paper addresses these problems and de...
Article
Full-text available
The Key Findings Reporter (KEFIR) is a system for discovering and explaining "key findings" in large, relational databases. This paper describes an application of KEFIR to the analysis of health-care information. The system performs an automatic analysis of data along multiple dimensions to determine the most interesting deviations of specific quan...
Article
Full-text available
One of the most promising areas in Knowledge Discovery in Databases is the automatic analysis of changes and deviations. Several systems have recently been developed for this task. Success of these systems hinges on their ability to identify a few important and relevant deviations among the multitude of potentially interesting events. In this paper...
Article
Full-text available
The Key Findings Reporter (KEFIR) is a system for discovering and explaining "key findings" in large, relational databases. This paper describes an application of KEFIR to the analysis of health-care information. The system performs an automatic analysis of data along multiple dimensions to determine the most interesting deviations of specific quan...
Article
This chapter discusses the issues of imperfect information in the fields of knowledge discovery in databases (KDD) and knowledge acquisition for expert systems (KA). My perspective on these issues is more of a practitioner motivated by pressing application needs and less of a researcher motivated by the desire to push the frontiers of science.
Article
Ad hoc techniques - no longer adequate for sifting through vast collections of data - are giving way to data mining and knowledge discovery for turning corporate data into competitive business advantage.
Article
Knowledge Discovery in Databases creates the context for developing the tools needed to control the flood of data facing organizations that depend on ever-growing databases of business, manufacturing, scientific, and personal information.
Conference Paper
The rapid and constant growth of databases in business, government, and science has far outpaced our ability to interpret and make sense of this data avalanche, creating a need for a new generation of tools and techniques for intelligent and automated database analysis. These tools and techniques are the subject of the rapidly emerging field of dat...
Article
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related field...
Conference Paper
Full-text available
In many database marketing applications the goal is to predict the customer behavior based on their previous actions. A usual approach is to develop models which maximize accuracy on the training and test sets and then apply these models on the unseen data. We show that in order to maximize business payoffs, accuracy optimization is insufficient by...
Conference Paper
This paper surveys the growing number of industrial applications of data mining and knowledge discovery. We look at the existing tools, describe some representative applications, and discuss the major issues and problems for building and deploying successful applications and their adoption by business users. Finally, we examine how to assess the...
Chapter
Full-text available
"Information by itself is a pretty thin meal, if not mixed with other ingredients." (Internet quote) One of the most promising areas in Knowledge Discovery in Databases is the automatic analysis of deviations. Success in this task hinges on the ability to identify a few important and relevant events among the multitude of potentially interesting devi...
Conference Paper
This paper presents a first step towards a unifying framework for Knowledge Discovery in Databases. We describe links between data mining, knowledge discovery, and other related fields. We then define the KDD process and basic data mining algorithms, discuss application issues and conclude with an analysis of challenges facing practitioners in...
Article
Full-text available
Over 60 researchers from 10 countries took part in the Third Knowledge Discovery in Databases (KDD) Workshop, held during the Eleventh National Conference on Artificial Intelligence in Washington, D.C. A major trend evident at the workshop was the transition to applications in the core KDD area of discovery of relatively simple patterns in re...
Article
As the number and size of very large databases continue to grow rapidly, so does the need to make sense of them. This need is addressed by the field called Knowledge Discovery in Databases (KDD), which combines approaches from machine learning, statistics, intelligent databases, and knowledge acquisition. KDD encompasses a number of different disc...
Article
Knowledge-discovery systems face challenging problems from real-world databases, which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. These problems are addressed and some techniques for handling them are described. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing n...
Conference Paper
I examine the state of the art in Knowledge Discovery in Databases and review progress in several research areas, including discovery of models, multistrategy discovery systems, and detection of changes and deviations. I describe a number of successful applications and discuss the remaining challenges for further research and application developmen...
Article
We describe the Knowledge Discovery Workbench, an interactive system for database exploration. We then illustrate KDW capabilities in data clustering, summarization, classification, and discovery of changes. We also examine extracting dependencies from data and using them to order the multitude of data patterns. © 1992 John Wiley & Sons, Inc.
Article
this article. 0738-4602/92/$4.00 1992 AAAI 58 AI MAGAZINE for the 1990s (Silberschatz, Stonebraker, and Ullman 1990)
Conference Paper
CALIDA provides integrated retrieval from multiple heterogeneous databases. Its major features include a menu guided interface with user defined macros, automatic join generation, estimation of the query result size, warning of expensive queries, automatic generation of target database queries, and transparent network access to the databases. CALID...
Article
A user or a computer system which extracts information from heterogeneous DBMSs faces the hard problem of dealing with different target database languages for each DBMS. We propose a solution for this problem based on an intermediate database language and its rule-based transformation to target languages. This solution is implemented in the databa...
Article
An intelligent database assistant called FRED is described. It uses artificial intelligence techniques and gives users substantial help in database selection, query formulation, and data interpretation. FRED provides querying in a cooperative natural language dialogue, automatic database selection, automatic query generation, and portable access to...
Conference Paper
We present a new method for estimating the number of tuples satisfying a condition of the type (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be arbitrarily reduced by incr...
Article
The problem of selecting secondary indices for a file so as to minimize the expected transaction cost was frequently analyzed before. We prove that it is NP-complete by reducing the MINIMUM SET COVER problem to it.
Article
Knowledge discovery in databases (KDD), also referred to as data mining, is an area of common interest to researchers in machine discovery, statistics, databases, knowledge acquisition, machine learning, data visualization, high performance computing, and knowledge-based systems. The rapid growth of data and information has created a need and an o...
Article
I am pleased to present an interview with Dr. Usama Fayyad, conducted in October 2005. This interview was first published in KDnuggets News (www.kdnuggets.com/news) (05:n20, 05:n21, and 05:n22). Dr. Usama Fayyad is probably familiar to most data miners and KDnuggets readers. He has many outstanding accomplishments, including publishing many signifi...
Article
Typescript. Thesis (Ph.D.)--New York University, Graduate School of Arts and Science, 1984. Includes bibliographic references.
