Publications (36)
Article: Explaining the genetic basis of complex quantitative traits through prediction models.
ABSTRACT: The functional characterization of genes involved in many complex traits (phenotypes) of plants, animals, or humans can be studied from a computational point of view using different tools. We propose prediction, from the machine learning point of view, to search for the genetic basis of these traits. However, predicting the exact value of a phenotype can be too difficult to yield a confident model, whereas predicting an approximation, in the form of an interval of values, can be easier. We shall see that trustworthy and useful models can be obtained from this relaxed formulation. These predictors may be built as extensions of conventional classifiers or regressors. Although the prediction performance in both cases is similar, we show that, from the classification field, it is straightforward to obtain a principled and scalable method to select a reduced set of features in these genetic learning tasks. We conclude by comparing the results so achieved in a real-world data set of barley plants with those obtained with state-of-the-art methods used in the biological literature.
Journal of computational biology: a journal of computational molecular cell biology 12/2010; 17(12):1711-23. · 1.69 Impact Factor
Source available from: Elena Montañés
Chapter: Ranked Tag Recommendation Systems Based on Logistic Regression
ABSTRACT: This work proposes an approach to tag recommendation built on a logistic regression system. The goal of the method is to support users of current social network systems by providing a ranking of new meaningful tags for a resource. The system provides a ranked tag set, and it feeds on different posts depending on the resource for which the user requests the recommendation. The performance of this approach is tested according to several evaluation measures, one of them proposed in this paper ($F_1^+$). The experiments show that this learning system outperforms certain benchmark recommenders.
06/2010: pages 237-244
Source available from: Miguel Pérez-Enciso
Article: Genetical genomics: use all data.
Miguel Pérez-Enciso, José R Quevedo, Antonio Bahamonde
ABSTRACT: Genetical genomics is a very powerful tool to elucidate the basis of complex traits and disease susceptibility. Despite its relevance, however, statistical modeling of expression quantitative trait loci (eQTL) has not received the attention it deserves. Based on two reasonable assertions, (i) a good model should consider all available variables as potential effects, and (ii) gene expressions are highly interconnected, we suggest that an eQTL model should consider the rest of the expression levels as potential regressors, in addition to the markers. It is shown that power can be increased with this strategy. We also show, using classical statistical and support vector machine techniques in a reanalysis of public data, that the external transcripts, i.e., transcripts other than the one being analysed, explain on average much more variability than the markers themselves. The presence of eQTL hotspots is reassessed in the light of these results. Model choice is a critical yet neglected issue in genetical genomics studies. Although we are far from having a general strategy for model choice in this area, we can at least propose that any transcript level be scanned not only for the markers genotyped but also for the rest of the gene expression levels. Some sort of stepwise regression strategy can be used to select the final model.
BMC Genomics 02/2007; 8:69. · 4.07 Impact Factor
Conference Proceeding: Viability of an alarm predictor for coffee rust disease using interval regression
Oscar Luaces, Luiz Henrique A Rodrigues, Carlos Alberto Alves Meira, José R Quevedo, Antonio Bahamonde
Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Article: Improving the discriminatory power of a near-infrared microscopy spectral library with a support vector machine classifier.
ABSTRACT: A multi-group classifier based on the support vector machine (SVM) has been developed for use with a library of 48,456 spectra measured by near-infrared reflection microscopy (NIRM) on 227 samples representing 26 animal feed ingredients and 4 possible contaminants of animal origin. The performance of the classifier was assessed by a five-fold cross-validation, dividing at the sample level. Although the overall proportion of misclassifications was 27%, almost all of these involved the confusion of pairs of similar ingredients of vegetable origin. Such confusions are unimportant in the context of the intended use of the library, which is the detection of banned ingredients in animal feed. The error rate in discrimination between permitted and banned ingredients was just 0.17%. The performance of the SVM classifier was substantially better than that of the K-nearest-neighbors method employed in previous work with the same library, for which the comparable error rates are 36% overall and 0.39% for permitted versus banned ingredients.
Applied Spectroscopy 01/2010; 64(1):66-72. · 1.66 Impact Factor