Damien François
Catholic University of Louvain | UCLouvain
Ph.D.
49
Publications
25,611
Reads
2,302
Citations
Publications (49)
This paper proposes a method for the automatic classification of heartbeats in an ECG signal. Since this task has specific characteristics such as time dependences between observations and a strong class unbalance, a specific classifier is proposed and evaluated on real ECG signals from the MIT arrhythmia database. This classifier is a weighted var...
One of the earliest challenges a practitioner faces when using distance-based tools lies in the choice of the distance, for which there is often very little information to rely on. This chapter proposes to find a compromise between an a priori unoptimized choice (e.g. the Euclidean distance) and a fully-optimized, but computationally expensive, c...
Mode estimation is extensively studied in statistics. One of the most widely used methods of mode estimation is hill-climbing on a kernel density estimator with gradient ascent or a fixed-point approach. Within this framework, Gaussian kernels prove to be a natural and intuitive option for non-parametric density estimation. This paper shows that i...
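The fixed-point approach mentioned in this abstract can be illustrated with a minimal one-dimensional sketch. It is not taken from the paper: the function name `mean_shift_mode`, its parameters, and the toy data are all assumptions made for illustration. With a Gaussian kernel, setting the gradient of the density estimate to zero gives an update in which the current estimate is replaced by a kernel-weighted mean of the data.

```python
import math

def mean_shift_mode(data, start, bandwidth, tol=1e-8, max_iter=1000):
    """Hill-climb a 1-D Gaussian kernel density estimate by fixed-point iteration.

    Hypothetical helper for illustration only. It assumes `start` lies close
    enough to the data that the kernel weights do not all underflow to zero.
    """
    x = start
    for _ in range(max_iter):
        # Gaussian kernel weight of every sample, seen from the current estimate.
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
        # Fixed-point update: kernel-weighted mean of the samples.
        new_x = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Toy sample with two well-separated clusters (made-up values).
samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
low_mode = mean_shift_mode(samples, start=0.5, bandwidth=0.5)
high_mode = mean_shift_mode(samples, start=5.5, bandwidth=0.5)
print(low_mode, high_mode)
```

Each starting point climbs to the nearest local maximum of the density estimate, so the result depends on both the initialization and the bandwidth.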
This paper proposes a method to perform class-specific feature selection in multiclass support vector machines addressed with the one-against-all strategy. The main issue arises at the final step of the classification process, where binary classifier outputs must be compared one against another to elect the winning class. This comparison may be bia...
The diagnosis of cardiac dysfunctions requires the analysis of long-term ECG signal recordings, often containing hundreds to thousands of heart beats. In this work, automatic inter-patient classification of heart beats following AAMI guidelines is investigated. The prior of the normal class is by far larger than the other classes, and the classifi...
Supervised and interpatient classification of heart beats is essential in many applications requiring long-term monitoring of the cardiac function. Several classification models able to cope with the strong class unbalance and a large variety of feature sets have been proposed for this task. In practice, over 200 features are often considered, and...
Data denoising can be achieved by approximating the data distribution and replacing each data item with an estimate of its closest mode. This idea has already been successfully applied to image denoising. The data then consists of pixel intensities or image patches, that is, vectorized groups of pixel intensities. The latter case raises the issue o...
Long-term ECG recordings are often required for the monitoring of the cardiac function in clinical applications. Due to the high number of beats to evaluate, inter-patient computer-aided heart beat classification is of great importance for physicians. The main difficulty is the extraction of discriminative features from the heart beat time series....
The selection of features that are relevant for a prediction or classification problem is an important problem in many domains involving high-dimensional data. Selecting features helps fight the curse of dimensionality, improve the performance of prediction or classification methods, and interpret the application. In a nonlinear context, t...
Aircraft engines are designed to be used for several decades. Their maintenance is a challenging and costly task, for obvious safety reasons. The goal is to ensure a proper operation of the engines, in all conditions, with a zero probability of failure, while taking into account aging. The fact that the same engine is sometimes used on s...
Spectrometric data involve very high-dimensional observations representing sampled spectra. The correlation of the resulting spectral variables and their high number are two sources of difficulties in modeling. This paper proposes a supervised feature clustering algorithm that provides dimension reduction for this type of data in a classification c...
Prediction problems from spectra are frequently encountered in chemometrics. In addition to accurate predictions, it is often necessary to extract information about which wavelengths in the spectra contribute effectively to the quality of the prediction. This implies selecting wavelengths (or wavelength intervals), a problem associated with variable...
Many tools for data mining are complex and require skills and experience to be used successfully. Therefore, data mining is often considered as much an art as a science. This paper presents some ideas on how to move forward from art to science, through the use of methodological standards and meta learning.
In many applications, like function approximation, pattern recognition, time series prediction, and data mining, one has to build a model relating some features describing the data to some response value. Often, the features that are relevant for building the model are not known in advance. Feature selection methods allow removing irrelevant and/or...
Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and determining when to halt the forward procedure. These two choices are difficult to...
The large number of spectral variables in most data sets encountered in spectral chemometrics often makes the prediction of a dependent variable difficult. Fortunately, the number of variables can be reduced by using either projection techniques or selection methods; the latter allow for the interpretation of the selected variables. Since the optimal...
Data from spectrophotometers form vectors of a large number of exploitable variables. Building quantitative models using these variables most often requires using a smaller set of variables than the initial one. Indeed, too large a number of input variables to a model results in too large a number of parameters, leading to overfitting and poor gene...
Nearest neighbor search and many other numerical data analysis tools most often rely on the use of the Euclidean distance. When data are high dimensional, however, Euclidean distances seem to concentrate: all distances between pairs of data elements seem to be very similar. Therefore, the relevance of the Euclidean distance has been questioned...
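The concentration effect described in this abstract can be observed with a small simulation. The sketch below is an illustration under stated assumptions, not the paper's experiment: the function `relative_contrast`, the uniform distribution on the unit hypercube, and the choice of the origin as query point are all hypothetical choices. It reports how the spread between the farthest and nearest point, relative to the nearest one, changes with dimension.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min of the Euclidean distances from the origin
    to n_points drawn uniformly from the unit hypercube [0, 1]^dim.

    Hypothetical helper for illustration; distribution and query point
    are arbitrary choices.
    """
    rng = random.Random(seed)
    norms = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n_points)
    ]
    return (max(norms) - min(norms)) / min(norms)

# The relative contrast is expected to shrink as the dimension grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

A small relative contrast means the nearest and farthest neighbors are almost equally far away, which is why nearest neighbor search becomes less informative in this setting.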
Spectral data often have a large number of highly-correlated features, making feature selection both necessary and difficult. A methodology combining hierarchical constrained clustering of spectral variables and selection of clusters by mutual information is proposed. The clustering allows reducing the number of features to be selected by grouping sim...
In spectrometric problems, objects are characterized by high-resolution spectra that correspond to hundreds to thousands of variables. In this context, even fast variable selection methods lead to high computational load. However, spectra are generally smooth and can therefore be accurately approximated by splines. In this paper, we propose to use...
Selecting relevant features in mass spectra analysis is important both for classification and search for causality. In this paper, it is shown how using mutual information can help meet both objectives, in a model-free nonlinear way. A combination of ranking and forward selection makes it possible to select several feature groups that may l...
The estimation of mutual information for feature selection is often subject to inaccuracies due to noise, small sample size, bad choice of parameter for the estimator, etc. The choice of a threshold above which a feature will be considered useful is thus difficult to make. Therefore, the use of the permutation test to assess the reliability of the...
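The general idea of a permutation test applied to mutual information can be sketched as follows. This is a generic illustration, not the procedure of the paper: the plug-in estimator `discrete_mi` for discrete variables, the function `permutation_p_value`, and the toy data are assumptions. Shuffling the target breaks any dependence with the feature, so the shuffled scores approximate what the estimator returns for an irrelevant feature.

```python
import math
import random
from collections import Counter

def discrete_mi(x, y):
    """Plug-in mutual information estimate (in nats) for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def permutation_p_value(x, y, n_perm=200, seed=0):
    """Share of shuffled targets whose estimated MI reaches the observed one.

    Hypothetical helper for illustration. The +1 terms count the observed
    arrangement itself, so the returned value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = discrete_mi(x, y)
    y_perm = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if discrete_mi(x, y_perm) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Toy data: `relevant` copies the target, `irrelevant` is balanced against it.
target = [i % 2 for i in range(100)]
relevant = list(target)
irrelevant = [(i // 2) % 2 for i in range(100)]
print(permutation_p_value(relevant, target))
print(permutation_p_value(irrelevant, target))
```

A small p-value suggests the observed mutual information is unlikely under independence, which replaces a hand-picked threshold on the raw estimate.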
Nonlinear time-series prediction offers potential performance increases compared to linear models. Nevertheless, the enhanced complexity and computation time often prohibit an efficient use of nonlinear tools. In this paper, we present a simple nonlinear procedure for time-series forecasting, based on the use of vector quantization techniques; the...
Modern data analysis tools have to work on high-dimensional data, whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that have a large influence on the performance of data analysis tools. Among these properties, the concentration of the norm phenomenon results in th...
In the context of classification, the dissimilarity between data elements is often measured by a metric defined on the data space. The choice of the metric is often disregarded and the Euclidean distance is used without further inquiry. This paper illustrates the fact that when noise schemes other than white Gaussian noise are encounte...
Gaussian kernels are widely used in many data analysis tools such as Radial-Basis Function networks, Support Vector Machines and many others. Gaussian kernels are most often deemed to provide a local measure of similarity between vectors. In this paper, we show that Gaussian kernels are adequate measures of similarity when the representation dimen...
Spectrophotometric data often comprise a great number of numerical components or variables that can be used in calibration models. When a large number of such variables are incorporated into a particular model, many difficulties arise, and it is often necessary to reduce the number of spectral variables. This paper proposes an incremental (Forward–...
A business plan is a document that concisely presents the key elements describing a business creation project. It is used as one tool among others to assess the feasibility and profitability of a project. In this paper, we study the relationship between the quality of a business plan and the success of this pro...
A business plan is a document presenting in a concise form the key elements (management, finance, marketing, ...) describing a perceived business opportunity. It is used, among others, as a tool for evaluating the feasibility and profitability of a project from the entrepreneur's or investor's point of view.
Classical nonlinear models for time series prediction exhibit improved capabilities compared to linear ones. Nonlinear regression has however drawbacks, such as overfitting and local minima problems, user-adjusted parameters, higher computation times, etc. There is thus a need for simple nonlinear models with a restricted number of learning paramet...
In line with the work of Delmar and Davidsson (1998), which examines the types of distinct growth patterns that high-growth firms exhibit and how these growth patterns and corresponding firms differ from each other in terms of their demographic affiliation, this paper discusses the existence of different growth trajectories of start-ups. Using fina...
Modern data analysis often faces high-dimensional data. Nevertheless, most neural network data analysis tools are not adapted to high-dimensional spaces, because of the use of conventional concepts (such as the Euclidean distance) that scale poorly with dimension. This paper shows some limitations of such concepts and suggests some research directions...
Classical nonlinear models for time series prediction exhibit improved capabilities compared to linear ones. Nonlinear regression has however drawbacks, such as overfitting and local minima problems, user-adjusted parameters, higher computation times, etc. There is thus a need for simple nonlinear models with a restricted number of learning p...
The growth of firms created ex nihilo is a complex and barely understood process. Indeed, many firms do not evolve much after they are created. Others, a minority, do evolve and grow and hence contribute significantly to economic development, be it in terms of employment or value-added. It is therefore interesting to identify those promising firms...
Reducing the number of spectral variables in a calibration problem often allows building simpler, and more accurate, models. It furthermore brings information about the relevant frequency ranges, for which an interpretation can be sought. This paper proposes a hierarchical clustering approach, which takes into account the target variable, to merge...