About
91
Publications
16,899
Reads
3,112
Citations
Introduction
Additional affiliations
June 1983 - present
Publications (91)
In the current Big Data era, systems for collecting, storing and efficiently exploiting huge amounts of data are continually introduced, such as Hadoop, Apache Spark, Dremel, etc. Druid is one of these systems, especially designed to manage such data quantities, and allows users to perform detailed real-time analysis on terabytes of data within sub-second...
Bitmap indexes are widely used in data warehouses and search engines to speed up queries. Their main advantages are their compact form and their ability to exploit bit-level parallelism in CPUs. In the current Big Data era, data collections...
ABSTRACT. Bitmap indexes are widely used in data warehouses and search engines. Their ability to efficiently execute bitwise operations between bitmaps significantly improves query response times. However, on high-cardinality attributes, they consume considerable memory. Several techn...
Bitmap indexes are widely used in data warehouses and search engines. Their ability to efficiently execute bitwise operations between bitmaps significantly improves query response times. However, on high-cardinality attributes, they consume considerable memory. Thus, several technique...
Bitmap indexes are commonly used in databases and search engines. By
exploiting bit-level parallelism, they can significantly accelerate queries.
However, they can use much memory. Thus we might prefer compressed bitmap
indexes. Following Oracle's lead, bitmaps are often compressed using run-length
encoding (RLE). In this work, we introduce a new f...
The notion of information system initially introduced by Scott provides an efficient approach to represent various kinds of domains. In this note, a new type of information systems named finitely derived information systems is introduced. For this notion, the requirement for the consistency predicate used in Scott's information systems is simplifie...
In pattern mining and association rule mining, there is a variety of algorithms for mining frequent closed itemsets (FCIs) and frequent generators (FGs), whereas a smaller part further involves the precedence relation between FCIs. The interplay of these three constructs and their joint computation have been studied within the formal concept analys...
From the 1994 CAIS Conference: The Information Industry in Transition, McGill University, Montreal, Quebec. May 25 - 27, 1994. An automatic indexing method combining syntactic analysis of the text with statistical weighting methods is evaluated. Several combinations in the choice of index term categories and of methods of...
Formal concept analysis (FCA) provides an approach to restructuring important lattice structures such as complete lattices, distributive lattices and algebraic lattices. In this paper, we focus on the theoretical aspect of FCA and study the representation of algebraic domains by a special type of formal contexts. We first propose the notion of cons...
The effective construction of many association rule bases requires the computation of both frequent closed and frequent generator itemsets (FCIs/FGs). However, these two tasks are rarely combined. Most of the existing solutions apply levelwise breadth-first traversal, though depth-first traversal, depending on data characteristics, is often superio...
Beside its central place in FCA, the task of constructing the concept lattice, i.e., concepts plus Hasse diagram, has attracted some interest within the data mining (DM) field, primarily to support the mining of association rule bases. Yet most FCA algorithms do not pass the scalability test fundamental in DM. We are interested in the iceberg part...
The effective construction of many association rule bases requires the computation of both frequent closed and frequent generator itemsets (FCIs/FGs). However, only few miners address both concerns, typically by applying levelwise breadth-first traversal. As depth-first traversal is known to be superior, we examine here the depth-first FCI/FG-min...
Formal concept analysis (FCA) is increasingly applied to data mining problems, essentially as a formal framework for mining reduced representations (bases) of target pattern families. Yet most of the FCA-based miners, closed pattern miners, would only extract the patterns themselves out of a dataset, whereas the generality order among patterns...
Increasingly, business projects are ephemeral. New Business Intelligence tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds are a popular community-driven visualization technique. Hence, we investigate tag-cloud views with support for OLAP operations such as roll-ups, slices, dices, clustering, and drill-downs. As a cas...
Recommender systems are considered as an answer to the information overload in a web environment. Such systems recommend items (movies, music, books, news, web pages, etc.) that the user should be interested in. Collaborative filtering recommender systems have a huge success in commercial applications. The sales in these applications follow a power...
Frequent closures (FCIs) and generators (FGs) as well as the precedence relation on FCIs are key components in the definition of a variety of association rule bases. Although their joint computation has been studied in concept analysis, no scalable algorithm exists for the task at present. We propose here to reverse a method from the latter field u...
Association rule mining from a transaction database (TDB) requires the detection of frequently occurring patterns, called frequent itemsets (FIs), whereby the number of FIs may be potentially huge. Recent approaches for FI mining use the closed itemset paradigm to limit the mining effort to a subset of the entire FI family, the frequent closed item...
An automated recommender system plays an essential role in e-commerce applications. Such systems try to recommend items (movies, music, books, news, etc.) which the user should be interested in. The spectrum of proposed recommendation algorithms is based on information including content of the items, ratings of the users, and demographic informati...
Automated recommender systems play an important role in e-commerce applications. Such systems recommend items (movies, music, books, news, web pages, etc.) that the user should be interested in. These systems hold the promise of delivering high quality recommendations. However, the incredible growth of users and applications poses some challenges f...
Many advanced models have been developed for information retrieval over the last years. These models are built on various artificial intelligence paradigms to improve the precision of the retrieval. Most of them exploit some form of term co-occurrences to improve retrieval quality. In this paper, we compare the retrieval performance of five of thes...
Neural networks are an important paradigm that has received little attention from the information retrieval research community, especially auto-associative neural networks. These networks are capable of discovering patterns of terms among documents. We propose an auto-associative neural network to model the classification and to perform...
In short-lived transactions, database systems ensure atomicity by either committing all of the elements of the transaction, or canceling all of them in case of an error. With long-running processes, the notion of transaction takes on a different meaning, and it is no longer possible to rely on database-managed transactions: we need so-called compen...
Minimal generators (mingens) of concept intents are valuable elements of the Formal Concept Analysis (FCA) landscape, which
are widely used in the database field, for data mining but also for database design purposes. The volatility of many real-world
datasets has motivated the study of the evolution in the concept set under various modifications o...
Minimal generators (or mingen) constitute a remarkable part of the closure space landscape since they are the antipodes of the closures, i.e., minimal sets in the underlying equivalence relation over the powerset of the ground set. As such, they appear in both theoretical and practical problem settings related to closures that stem from fields as d...
The class hierarchy is an important aspect of object-oriented software development. Design and maintenance of such a hierarchy is a difficult task that is often accomplished without any clear guidance or tool support. Formal concept analysis provides a natural theoretical framework for this problem because it can guarantee maximal factorization w...
Frequent Pattern Mining (FPM) is a very powerful paradigm for mining informative and useful patterns in massive, complex datasets. In this paper we propose the Data Mining Template Library, a collection of generic containers and algorithms for FPM, as well as persistency and database management classes. DMTL provides a systematic solution to a whol...
Information retrieval is concerned with the classification processes and the selective recovery of information. Improvements in this field are mainly sought at the core level of the engine's classification capabilities and by query enhancement processes. The latter became the prime interest of researchers since less progress has been made on the...
Data mining (DM) is the extraction of regularities from raw data, which are further transformed within the wider process of knowledge discovery in databases (KDD) into non-trivial facts intended to support decision making. Formal concept analysis (FCA) offers an appropriate framework for KDD, whereby our focus here is on its potential for DM suppor...
Galois (concept) lattice theory has been successfully applied to the resolution of the association rule problem in data mining. In particular, structural results about lattices have been used in the design of efficient procedures for mining the frequent patterns (itemsets) in a transaction database.
Our research centers around exploring methodologies for developing reusable software, and developing methods and tools for building inter-enterprise information systems with reusable components. In this paper, we focus on an experiment in which different component indexing and retrieval methods were tested. The results are surprising. Earlier work...
Concept lattices provide a theoretical framework for the efficient resolution of the association rule problem. The paper describes an extension to the underlying approach as a contribution to the issue of incremental data mining. In particular, we propose an incremental algorithm for mining frequent closed itemsets (FCIs) that is based on our most rec...
Galois (concept) lattice theory has been successfully applied to the resolution of the association rule problem in data mining. In particular, structural results about lattices have been used in the design of efficient procedures for mining the frequent patterns (itemsets) in a transaction database. As transaction databases are often dynamic, we p...
This book focuses on recent developments in representational and processing aspects of complex data-intensive applications.
Until recently, information systems have been designed around different business functions, such as accounts payable and inventory control. Object-oriented modeling, in contrast, structures systems around the data—the objects—...
The next generation of task support systems aims at providing the relevant knowledge to knowledge-dependent activities, reducing the effort required to find or infer this knowledge from the data. In order to exploit knowledge associated with (or embedded in) databases and to migrate from data to knowledge management environments, conceptual modelin...
Classification is a central concept in object-oriented approaches such as object-oriented programming, object-oriented knowledge representation systems (including description logics), object-oriented databases, software engineering and information retrieval.
Nevertheless, research works on classification have often been carried out separately withi...
In order to exploit knowledge embedded in databases and to migrate from data to knowledge management environments, conceptual modeling languages must offer more expressiveness than traditional modeling languages. This paper proposes the conceptual graph formalism as such a modeling language. It shows through an example and a comparison with Telos,...
During the evolution of object-oriented (OO) systems, the
preservation of a correct design should be a permanent quest. However,
for systems involving a large number of classes and that are subject to
frequent modifications, the detection and correction of design flaws may
be a complex and resource-consuming task. The use of automatic detection
and...
In (Godin et al., 1995a) we proposed an incremental conceptual clustering algorithm, derived from lattice theory (Godin et al., 1995b), which is fast to compute (Mineau & Godin, 1995). This algorithm is especially useful when dealing with large data or knowledge bases, making classification structures available to large size applications like thos...
Ever since the advent of the public Internet, the quantity of available information has been rising rapidly. One of the most important uses of this public network is to find information. In such a huge and unstable information collection, today’s greatest problem is to find relevant information. This paper presents the development of Intel...
During the evolution of object-oriented (OO) systems, the
preservation of correct design should be a permanent quest. However, for
systems involving a large number of classes and which are subject to
frequent modifications, the detection and correction of design flaws may
be a complex and resource-consuming task. Automating the detection and
correc...
Learning concepts and rules from structured (complex) objects is a quite challenging but very relevant problem in the area of machine learning and knowledge discovery. In order to take into account and exploit the semantic relationships that hold between atomic components of structured objects, we propose a knowledge discovery process, which starts...
A controlled experiment was conducted comparing information retrieval using a Galois lattice structure with two more conventional retrieval methods: navigating in a manually built hierarchical classification and Boolean querying with index terms. No significant performance difference was found between Boolean querying and the Galois lattice retriev...
The Galois (or concept) lattice produced from a binary relation has proved useful for many applications. Building the Galois lattice can be considered as a conceptual clustering method since it results in a concept hierarchy. This article presents incremental algorithms for updating the Galois lattice and corresponding graph, resulting in an...
An important structuring mechanism for knowledge bases is building an inheritance hierarchy of classes based on the content of their knowledge objects. The hierarchy can be used to handle several query processing tasks more efficiently. Building and maintaining this hierarchy is a difficult task for the knowledge engineer. The notion of knowledge s...
Building and maintaining the class hierarchy has been recognized as an important but one of the most difficult activities of object-oriented design. Concept (or Galois) lattices and related structures are presented as a framework for dealing with the design and maintenance of class hierarchies. Because the design of class hierarchies is inherently...
Many algorithms are proposed for building class hierarchies from the specification of their properties. Among those, the algorithms proposed in (Dicky 1994) and (Godin 1995a) preserve the Galois sub-hierarchy of the relationship between the classes and their properties. Furthermore, the algorithms are incremental and can therefore incorporate a new...
There has been a lot of interest recently in the problem of
building object oriented applications by somehow combining other
application fragments that provide their own overlapping definitions or
expectations of the same domain objects. We propose an approach based on
the split objects model of prototype languages whereby an application
object is...
This paper presents a methodology for handling an important step of database migration. The methodology is based on a set of techniques: (i) semantic clustering, (ii) metamodeling, and (iii) knowledge-based schema transformation. Semantic clustering (i.e., grouping based on semantic cohesion) is mainly used to facilitate the process of translating...
With the advent of object-oriented database systems, there is an urgent need to define a methodology for mapping a conceptual schema into an object-oriented one, and migrating from a conventional database to an object-oriented database containing complex objects. This paper deals with an important step of the migration process by describing a tec...
Our research centers around exploring methodologies for developing reusable software, and developing methods and tools for building with reusable software. In this paper, we focus on reusable software component retrieval methods that were developed and tested in the context of ClassServer, an experimental library tool developed at the University of...
An automatic indexing method based on syntactical text analysis combined with statistical analysis is proposed and evaluated. Many combinations for the choice of term categories and weighting methods are tested. The experiment conducted on a software engineering corpus shows systematic improvement in the use of syntactic term phrases compared to us...
An important structuring mechanism for knowledge bases is building
an inheritance hierarchy of classes based on the content of their
knowledge objects. This hierarchy facilitates group-related processing
tasks such as answering set queries, discriminating between objects,
finding similarities among objects, etc. Building this hierarchy is a
difficu...
Software reuse is one of the most advertised advantages of object-orientation. Inheritance, in all its forms, plays an important part in achieving greater reuse, at all stages of development. Class hierarchies start taking shape at the analysis level, where classes that share application-significant data and application-meaningful external behavior...
A tool for generating the interface hierarchy of a set of classes for the Smalltalk-80 library is described. The tool can be useful for analyzing the class library or simply for reuse purposes by providing an alternative view that is closer to the client's perspective than the inheritance hierarchy. The interface of each class to consider is extrac...
This paper describes an approach to software reuse that involves generating and retrieving abstractions from existing software systems using concept formation methods. The potential of the approach is illustrated through two important activities of the reuse process. First, the concept hierarchy generated by the concept formation methods is used fo...
this report, we discuss the methods and tools we implemented to support the various categorization (classification, or indexing)
potential. The use of these structures is illustrated by various applications: document retrieval, reuse, class hierarchy design, generation of implication rules from databases, and knowledge acquisition and organization. ABSTRACT. Several structures used for conceptual clustering are present...
Based on the Galois lattice theory, we propose a concept formation approach to discover new concepts and implication rules from data. The Galois lattice of a binary relation between a set of objects and a set of properties (descriptors) is a concept hierarchy in which each node represents a subset of objects with their common properties.
In this pa...
In addition to being a technique for classifying objects and defining concepts from data, the concept lattice may be exploited to discover relationships among descriptors or attributes. This paper addresses the problem of generating implication rules, and shows that the lattice is an interesting framework for functional dependency generation and ch...
In addition to being a technique for classifying objects and defining concepts from data, the concept lattice may be exploited to discover functional dependencies as well as exact and approximate (probabilistic) implication rules between properties (descriptors). This paper presents algorithms for rule generation and shows that the lattice is an in...
An incremental algorithm for updating the Galois lattice is
proposed where new objects may be dynamically added by modifying the
existing lattice. A large experimental application reveals that adding a
new object may be done in time proportional to the number of objects on
the average. When there is a fixed upper bound on the number of
properties r...
Addresses the role of specific types of semantic constraints in
the process of query optimization. A new type of dependency is
introduced, called inter-relational functional dependency, which is an
extension of functional dependencies to two or more relations having
common attributes. A complete axiomatization is proposed with an
O(n³) d...
The need to add an automatic learning phase to the construction
process of a knowledge base is stressed. This work introduces a new
technique based on machine learning methodologies which automatically
creates a particular knowledge representation structure called knowledge
space, from which common generalizations of knowledge objects may be
effici...
In this paper, we propose a graph theoretic approach to deal with the implication problem for inclusion dependencies. By analogy with functional dependencies, we define and present algorithms for computing the following concepts: the closure of a relation scheme R for X according to a set of inclusion dependencies and the minimal cover for inclusio...
This paper shows how automatic symbolic classification of all knowledge objects in a knowledge base can alleviate the task of knowledge acquisition. It presents a knowledge representation structure, called knowledge space, that permits such symbolic classification. Simple and efficient algorithms which create the structure are also presented.
In conventional Boolean retrieval systems, users have difficulty controlling the amount of output obtained from a given query. This paper describes the design of a user interface which permits gradual enlargement or refinement of the user's query by browsing through a graph of term and document subsets. This graph is obtained from a lattice automat...
This article describes a new approach to database access suitable for browsing. The underlying data model consists of a number of objects, easily described by a variable number of keywords (simple or qualified). Navigation is performed in terms of certain subsets of keywords and objects (called contexts), which are shown to form a lattice. The comp...
ABSTRACT. The article describes an algorithm, called JEN, for efficiently identifying generators from the (Galois) concept lattice in order to produce exact and approximate association rules. An empirical comparative analysis illustrates the superiority of JEN over three other procedures for generating rules and generators, particu...