Conference Paper

Discovering Relevancies in Very Difficult Regression Problems: Applications to Sensory Data Analysis.

Conference: Proceedings of the 16th European Conference on Artificial Intelligence, ECAI'2004, including Prestigious Applications of Intelligent Systems, PAIS 2004, Valencia, Spain, August 22-27, 2004
Source: DBLP

ABSTRACT Learning preferences is a useful tool in application fields like information retrieval or system configuration. In this paper we show a new application of this Machine Learning tool: the analysis of sensory data provided by consumer panels. These data sets collect the ratings given by a set of consumers to the quality or the acceptability of market products that are principally appreciated through sensory impressions. The aim is to improve the production processes of food industries. We show that these data sets cannot be processed in a useful way by regression methods, since these methods cannot deal with some subtleties implicit in the available knowledge. Using a collection of real-world data sets, we illustrate the benefits of our approach, showing that it is possible to obtain useful models to explain the behavior of consumers where regression methods only predict a constant reaction in all consumers, which is unacceptable.
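The abstract does not spell out how ratings become preference judgments, so the following is only a minimal sketch of one common way to do it. It assumes (this is not stated in the abstract) that scores are compared only within the same consumer, so that each consumer's personal rating scale never has to be reconciled with anyone else's. The function name `preference_pairs` and the tuple layout are hypothetical.

```python
from itertools import combinations


def preference_pairs(ratings):
    """Turn ratings into ordered preference pairs.

    ratings: iterable of (consumer_id, product_id, score) tuples.
    Returns a list of (preferred_product, other_product) pairs.
    Assumption: two scores are compared only when they come from the
    same consumer; ties produce no pair.
    """
    # Group the rated products by consumer.
    by_consumer = {}
    for consumer, product, score in ratings:
        by_consumer.setdefault(consumer, []).append((product, score))

    pairs = []
    for items in by_consumer.values():
        # Compare every two products rated by this consumer.
        for (p1, s1), (p2, s2) in combinations(items, 2):
            if s1 > s2:
                pairs.append((p1, p2))
            elif s2 > s1:
                pairs.append((p2, p1))
    return pairs
```

A preference learner would then be trained on these pairs instead of on the raw scores; which learner to use is outside the scope of this sketch.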

Available from: Jorge Díez, Mar 27, 2014
    • "Fortunately, despite the graders' biases, the ranking entailed by their assessments is coherent with the ground truth. In other words, the grades can be unreliable but the order is, in general, correctly assessed [11] [8] [9] [2]. "
    ABSTRACT: Evaluating open-response assignments in Massive Open Online Courses is a difficult task because of the huge number of students involved. Peer grading is an effective method to address this problem. There are two basic approaches in the literature: cardinal and ordinal. The cardinal approach uses the grades that student-graders assign to a set of assignments submitted by their colleagues. In the ordinal approach, the raw materials used by grading systems are the relative orders that graders perceive among the assignments that they evaluate. In this paper we present a factorization method that seeks a trade-off between cardinal and ordinal approaches. The algorithm learns from preference judgments to avoid the subjectivity of the numeric grades. But in addition to preferences expressed by student-graders, we include other preferences: those induced from assignments with significantly different average grades. The paper includes a report of the results obtained using this approach on a real-world dataset collected in three Spanish universities: A Coruña, Pablo de Olavide at Sevilla, and Oviedo at Gijón. Additionally, we studied the sensitivity of the method with respect to the number of assignments graded by each student. Our method achieves similar or better scores than staff instructors when we measure the discrepancies with other instructors' grades.
    Knowledge-Based Systems 05/2015; DOI:10.1016/j.knosys.2015.05.019 · 3.06 Impact Factor
    • "Unfortunately, this approach leads us to deal with datasets of size n^2 when the original size of S is only n. This mean that some applications become intractable, although other times this approach was successfully used [5] [6] [7] [8] [3] [9]. To alleviate the difficulties caused by the size of datasets, the main problem is that (as happens with the AUC) Herbrich's loss function can not be expressed as a sum of disagreements or errors produced by each input x_i ∈ X. "
    ABSTRACT: Learning tasks where the set Y of classes has an ordering relation arise in a number of important application fields. In this context, the loss function may be defined in different ways, ranging from multiclass classification to ordinal or metric regression. However, to consider only the ordered structure of Y, a measure of goodness of a hypothesis h has to be related to the number of pairs whose relative ordering is swapped by h. In this paper, we present a method, based on a multivariate version of Support Vector Machines (SVM), that learns to order by minimizing the number of swapped pairs. Finally, using benchmark datasets, we compare the scores so achieved with those found by other alternative approaches.
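The quantity this abstract refers to, the number of pairs whose relative ordering is swapped by a hypothesis h, can be illustrated with a short sketch. This only evaluates the measure; it is not the SVM-based learner the abstract describes. The function name is hypothetical, and two conventions are assumptions made here: pairs with equal true labels are skipped, and a tie in the predictions is not counted as a swap.

```python
from itertools import combinations


def swapped_pair_fraction(y_true, y_pred):
    """Fraction of comparable pairs that the predictions put in the wrong order.

    y_true: true ordinal labels (numbers).
    y_pred: scores produced by a hypothesis h, same length as y_true.
    Pairs with equal true labels are not comparable and are skipped.
    A tie in y_pred is not counted as a swap (assumed convention).
    """
    swapped = 0
    comparable = 0
    # Enumerating all pairs is quadratic in the number of examples.
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue
        comparable += 1
        true_diff = y_true[i] - y_true[j]
        pred_diff = y_pred[i] - y_pred[j]
        if true_diff * pred_diff < 0:
            swapped += 1
    return swapped / comparable if comparable else 0.0
```

For example, with true labels `[1, 2, 3]` and predicted scores `[0.1, 0.5, 0.3]`, only the pair formed by the last two examples is inverted, so one of the three comparable pairs is swapped.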
    • "In some applications, selection is probably too rigid, and what we really need is just a ranking of variables. Other times, ranking variables is used as a first step towards selection, as in (Guyon and Elisseeff, 2003; del Coz et al., 2005; Díez et al., 2004; Luaces et al., 2004). "
    ABSTRACT: The selection of a subset of input variables is often based on the previous construction of a ranking to order the variables according to a given criterion of relevancy. The objective is then to linearize the search, estimating the quality of subsets containing the topmost ranked variables. An algorithm devised to rank input variables according to their usefulness in the context of a learning task is presented. This algorithm is the result of a combination of simple and classical techniques, like correlation and orthogonalization, which allow the construction of a fast algorithm that also deals explicitly with redundancy. Additionally, the proposed ranker is endowed with a simple polynomial expansion of the input variables to cope with nonlinear problems. The comparison with some state-of-the-art rankers showed that this combination of simple components is able to yield high-quality rankings of input variables. The experimental validation is made on a wide range of artificial data sets and the quality of the rankings is assessed using a ROC-inspired setting, to avoid biased estimations due to any particular learning algorithm.
    Computational Statistics & Data Analysis 09/2007; 52(1):578-595. DOI:10.1016/j.csda.2007.02.003 · 1.15 Impact Factor
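To make the combination of correlation and orthogonalization mentioned in this abstract more concrete, here is a generic greedy sketch in the Gram-Schmidt style. It is not the authors' algorithm, whose details (including the polynomial expansion) the abstract does not give. The function name, the dict-of-columns input and the `1e-12` tolerance are all assumptions chosen for illustration.

```python
import math


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_variables(columns, target, tol=1e-12):
    """Greedy ranking: pick the variable most correlated with the target,
    then project it out of the remaining variables and of the target.

    columns: dict mapping variable name -> list of values.
    target:  list of target values, same length as each column.
    Returns the variable names, most relevant first.
    """
    resid = {name: _center(col) for name, col in columns.items()}
    y = _center(target)
    ranking = []

    while resid:
        y_norm = _dot(y, y)

        def score(name):
            v = resid[name]
            v_norm = _dot(v, v)
            if v_norm < tol or y_norm < tol:
                return 0.0  # nothing left to explain, or variable is redundant
            return abs(_dot(v, y)) / math.sqrt(v_norm * y_norm)

        best = max(resid, key=score)
        ranking.append(best)
        b = resid.pop(best)
        b_norm = _dot(b, b)
        if b_norm < tol:
            continue  # selected direction is numerically zero; skip projection

        # Orthogonalize the remaining variables and the target against b.
        for name in list(resid):
            coef = _dot(resid[name], b) / b_norm
            resid[name] = [x - coef * bx for x, bx in zip(resid[name], b)]
        coef_y = _dot(y, b) / b_norm
        y = [x - coef_y * bx for x, bx in zip(y, b)]

    return ranking
```

Because each selected variable is projected out of the others, a variable that merely duplicates an earlier choice ends up with a near-zero residual and drops to the bottom of the ranking, which is one simple way of dealing explicitly with redundancy.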