ABSTRACT: Cardiovascular disease risk increases when lipoprotein metabolism is dysfunctional. We have developed a computational model able to derive indicators of lipoprotein production, lipolysis, and uptake processes from a single lipoprotein profile measurement. This is the first study to investigate whether lipoprotein metabolism indicators can improve cardiovascular risk prediction and therapy management.
We calculated lipoprotein metabolism indicators for 1981 subjects (145 cases, 1836 controls) from the Framingham Heart Study offspring cohort in which NMR lipoprotein profiles were measured. We applied a statistical learning algorithm using a support vector machine to select conventional risk factors and lipoprotein metabolism indicators that contributed to predicting risk for general cardiovascular disease. Risk prediction was quantified by the change in the Area-Under-the-ROC-Curve (ΔAUC) and by risk reclassification (Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI)). Two VLDL lipoprotein metabolism indicators (VLDLE and VLDLH) improved cardiovascular risk prediction. We added these indicators to a multivariate model with the best performing conventional risk markers. Our method significantly improved both CVD prediction and risk reclassification.
Two calculated VLDL metabolism indicators significantly improved cardiovascular risk prediction. These indicators may help to reduce prescription of unnecessary cholesterol-lowering medication, reducing costs and possible side-effects. For clinical application, further validation is required.
PLoS ONE 01/2014; 9(3):e92840. · 3.53 Impact Factor
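The ΔAUC criterion used in this study can be illustrated with a small sketch. The AUC is computed here via the Mann-Whitney pair-counting identity; the risk scores are invented for illustration and are not from the study's models.

```python
def auc(case_scores, control_scores):
    """AUC via the Mann-Whitney identity: the probability that a randomly
    chosen case receives a higher risk score than a randomly chosen control."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5   # ties count half
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk scores from a baseline model and a model extended
# with extra indicators (numbers purely illustrative).
cases_base, controls_base = [0.6, 0.7, 0.4], [0.3, 0.5, 0.2, 0.6]
cases_ext,  controls_ext  = [0.8, 0.9, 0.6], [0.3, 0.5, 0.2, 0.6]

delta_auc = auc(cases_ext, controls_ext) - auc(cases_base, controls_base)
```

A positive `delta_auc` indicates improved discrimination of the extended model on this toy data.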
ABSTRACT: We propose a novel sparse preference learning/ranking algorithm. Our
algorithm approximates the true utility function by a weighted sum of basis
functions using the squared loss on pairs of data points, and is a
generalization of the kernel matching pursuit method. It can operate both in a
supervised and a semi-supervised setting and allows efficient search for
multiple, near-optimal solutions. Furthermore, we describe the extension of the
algorithm suitable for combined ranking and regression tasks. In our
experiments we demonstrate that the proposed algorithm outperforms several
state-of-the-art learning methods when taking into account unlabeled data and
performs comparably in a supervised learning scenario, while providing sparser
solutions.
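The squared loss on pairs mentioned above can be sketched as follows; the basis functions and weights are invented for illustration and are not the ones produced by the matching-pursuit procedure.

```python
def pairwise_squared_loss(scores, preferences):
    """Squared loss on pairs: for each preference (i, j), meaning item i
    should outrank item j, penalise (1 - (scores[i] - scores[j]))^2."""
    return sum((1.0 - (scores[i] - scores[j])) ** 2 for i, j in preferences)

# A utility function approximated as a weighted sum of two basis functions
# (basis choice and weights are illustrative, not from the paper).
basis = [lambda x: x, lambda x: x * x]
weights = [0.5, 1.0]

def utility(x):
    return sum(w * b(x) for w, b in zip(weights, basis))

items = [0.0, 1.0, 2.0]
scores = [utility(x) for x in items]
# We prefer larger items: item 2 over item 1, item 1 over item 0.
loss = pairwise_squared_loss(scores, [(2, 1), (1, 0)])
```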
ABSTRACT: In this paper, an overview of state-of-the-art techniques for premise selection in large-theory mathematics is provided, and new premise selection techniques are introduced. Several evaluation metrics are introduced and compared, and their appropriateness is discussed in the context of automated reasoning in large-theory mathematics. The methods are evaluated on the MPTP2078 benchmark, a subset of the Mizar library, and a 10% improvement is obtained over the best method so far.
Proceedings of the 6th international joint conference on Automated Reasoning; 06/2012
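A nearest-neighbour scheme is one standard baseline in the premise selection literature; a minimal sketch follows. The symbol-set features, Jaccard similarity, and toy library are illustrative assumptions, not the paper's actual features or methods.

```python
def knn_premise_ranking(conjecture_feats, proved_facts, k=2):
    """Rank premises for a new conjecture: find the k proved facts most
    similar to it (Jaccard similarity of symbol sets), then vote for the
    premises used in their proofs, weighted by that similarity."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    neighbours = sorted(proved_facts,
                        key=lambda f: jaccard(conjecture_feats, f["features"]),
                        reverse=True)[:k]
    votes = {}
    for f in neighbours:
        w = jaccard(conjecture_feats, f["features"])
        for premise in f["premises"]:
            votes[premise] = votes.get(premise, 0.0) + w
    return sorted(votes, key=votes.get, reverse=True)

# Toy library: symbol-set features and proof dependencies (invented names).
library = [
    {"features": {"subset", "union"},  "premises": ["union_comm", "subset_refl"]},
    {"features": {"subset", "inter"},  "premises": ["inter_comm", "subset_refl"]},
    {"features": {"group", "inverse"}, "premises": ["inv_inv"]},
]
ranking = knn_premise_ranking({"subset", "union", "inter"}, library)
```

Premises used by several similar facts (here `subset_refl`) accumulate more votes and rank first.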
ABSTRACT: Activity-regulated neurotransmission shapes the computational properties of a neuron and involves the concerted action of many proteins. Classical, intuitive working models often assign specific proteins to specific steps in such complex cellular processes, whereas modern systems theories emphasize more integrated functions of proteins. To test how often synaptic proteins participate in multiple steps in neurotransmission, we present a novel probabilistic method to analyze complex functional data from genetic perturbation studies on neuronal secretion. Our method uses a mixture of probabilistic principal component analyzers to cluster genetic perturbations on two distinct steps in synaptic secretion, vesicle priming and fusion, and accounts for the poor standardization between different studies. Clustering data from 121 perturbations revealed that different perturbations of a given protein are often assigned to different steps in the release process. Furthermore, vesicle priming and fusion are inversely correlated for most of those perturbations where a specific protein domain was mutated to create a gain-of-function variant. Finally, two different modes of vesicle release, spontaneous and action-potential-evoked release, were affected similarly by most perturbations. These data suggest that the presynaptic protein network has evolved as a highly integrated supramolecular machine, responsible for both spontaneous and activity-induced release, with a group of core proteins using different domains to act on multiple steps in the release process.
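The reported inverse correlation between priming and fusion effects is a plain Pearson correlation check; the effect sizes below are invented, not the study's data.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical normalised effect sizes for gain-of-function perturbations:
# each index is one perturbation's (priming, fusion) effect. Invented numbers.
priming = [1.2, 0.8, 1.5, 0.6]
fusion  = [0.7, 1.1, 0.5, 1.4]
r = pearson(priming, fusion)   # negative on this toy data: inversely correlated
```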
ABSTRACT: Smart premise selection is essential when using automated reasoning as a tool
for large-theory formal proof development. A good method for premise selection
in complex mathematical libraries is the application of machine learning to
large corpora of proofs. This work develops learning-based premise selection in
two ways. First, a newly available minimal dependency analysis of existing
high-level formal mathematical proofs is used to build a large knowledge base
of proof dependencies, providing precise data for ATP-based re-verification and
for training premise selection algorithms. Second, a new machine learning
algorithm for premise selection based on kernel methods is proposed and
implemented. To evaluate the impact of both techniques, a benchmark consisting
of 2078 large-theory mathematical problems is constructed, extending the older
MPTP Challenge benchmark. The combined effect of the techniques results in a
50% improvement on the benchmark over the Vampire/SInE state-of-the-art system
for automated reasoning in large theories.
Journal of Automated Reasoning 08/2011; 52(2). · 0.47 Impact Factor
ABSTRACT: In recent years, large corpora of formally expressed knowledge have become available in the fields of formal mathematics,
software verification, and real-world ontologies. The Learning2Reason project aims to develop novel machine learning methods
for computer-assisted reasoning on such corpora. Our global research goals are to provide good methods for selecting relevant
knowledge from large formal knowledge bases, and to combine them with automated reasoning methods.
Intelligent Computer Mathematics - 18th Symposium, Calculemus 2011, and 10th International Conference, MKM 2011, Bertinoro, Italy, July 18-23, 2011. Proceedings; 01/2011
ABSTRACT: Situations in which only a limited amount of labeled data and a large amount
of unlabeled data are available to the learning algorithm are typical
for many real-world problems. To make use of unlabeled data in
preference learning problems, we propose a semisupervised algorithm that
is based on the multiview approach. Our algorithm, which we call Sparse
Co-RankRLS, minimizes a least-squares approximation of the ranking error
and is formulated within the co-regularization framework. It operates by
constructing a ranker for each view and by choosing such ranking
prediction functions that minimize the disagreement among all of the
rankers on the unlabeled data. Our experiments, conducted on a real-world
dataset, show that the inclusion of unlabeled data can improve the
prediction performance significantly. Moreover, our semisupervised
preference learning algorithm has a linear complexity in the number of
unlabeled data items, making it applicable to large datasets.
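The disagreement term that the co-regularization framework penalizes can be sketched directly; the two views' scores below are hand-made, not learned rankers.

```python
def pairwise_disagreement(f1, f2, unlabeled_pairs):
    """Co-regularization term: squared difference between the two views'
    predicted score differences over unlabeled item pairs."""
    return sum(((f1[i] - f1[j]) - (f2[i] - f2[j])) ** 2
               for i, j in unlabeled_pairs)

# Scores assigned to four unlabeled items by rankers built on two views
# (numbers invented for illustration).
view1_scores = [0.1, 0.4, 0.5, 0.9]
view2_scores = [0.2, 0.3, 0.6, 0.8]
pairs = [(0, 1), (1, 2), (2, 3)]
d = pairwise_disagreement(view1_scores, view2_scores, pairs)
```

Minimizing this term over the unlabeled items pushes the per-view rankers toward agreeing pairwise predictions, which is how the unlabeled data enters the objective.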
ABSTRACT: In different fields, like decision making, psychology, game theory, and biology, it has been observed that paired-comparison data, such as preference relations defined by humans and animals, can be intransitive. Intransitive relations cannot be modeled with existing machine learning methods like ranking models, because these models exhibit strong transitivity properties. More specifically, in a stochastic context, where the reciprocity property often characterizes probabilistic relations such as choice probabilities, it has been formally shown that ranking models always satisfy the well-known strong stochastic transitivity property. Given this limitation of ranking models, we present a new kernel function that, together with the regularized least-squares algorithm, is capable of inferring intransitive reciprocal relations in problems where transitivity violations cannot be considered noise. In this approach, it is the kernel function that defines the transition from learning transitive to learning intransitive relations, and the Kronecker product is introduced for representing the latter type of relations. In addition, we empirically demonstrate on two benchmark problems, one in game theory and one in theoretical biology, that our algorithm outperforms methods not capable of learning intransitive reciprocal relations.
European Journal of Operational Research 11/2010; · 1.84 Impact Factor
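A kernel of roughly the kind described, a Kronecker-product construction antisymmetrised over the pair arguments, can be sketched as follows. The exact kernel and base kernel used in the paper may differ; this is only the general pattern.

```python
import math

def rbf(x, y, gamma=1.0):
    """Base RBF kernel on single objects (tuples of floats)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def pair_kernel(vw, vw2):
    """Antisymmetrised Kronecker-product kernel on object pairs:
    K((v,w),(v',w')) = k(v,v')k(w,w') - k(v,w')k(w,v').
    The induced hypothesis space contains only functions with
    f(v,w) = -f(w,v), the pattern behind reciprocal relations."""
    (v, w), (v2, w2) = vw, vw2
    return rbf(v, v2) * rbf(w, w2) - rbf(v, w2) * rbf(w, v2)

a, b, c, d = (0.0,), (1.0,), (0.3,), (0.9,)
# Swapping one pair's arguments flips the sign of the kernel value.
k1 = pair_kernel((a, b), (c, d))
k2 = pair_kernel((b, a), (c, d))
```

Because the kernel itself is antisymmetric in each pair argument, a least-squares learner built on it can represent cyclic (intransitive) preference structures that an ordinary ranking model cannot.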
ABSTRACT: In this paper, we introduce a framework for regularized least-squares (RLS) type of ranking cost functions and we propose
three such cost functions. Further, we propose a kernel-based preference learning algorithm, which we call RankRLS, for minimizing
these functions. It is shown that RankRLS has many computational advantages compared to the ranking algorithms that are based
on minimizing other types of costs, such as the hinge cost. In particular, we present efficient algorithms for training, parameter
selection, multiple output learning, cross-validation, and large-scale learning. Circumstances under which these computational
benefits make RankRLS preferable to RankSVM are considered. We evaluate RankRLS on four different types of ranking tasks using
RankSVM and the standard RLS regression as the baselines. RankRLS outperforms the standard RLS regression and its performance
is very similar to that of RankSVM, while RankRLS has several computational benefits over RankSVM.
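The least-squares structure is what yields the computational advantages: the pairwise cost admits a closed-form solution. A minimal sketch follows, using the Laplacian of the complete pair graph; the paper's exact formulation (weighting, pair graphs per query) may differ in detail.

```python
import numpy as np

def rank_rls(K, y, lam=1.0):
    """RankRLS-style closed form.  Minimise
    sum_{i,j} ((y_i - y_j) - (f_i - f_j))^2 + lam * a^T K a  with f = K a.
    The pairwise sum equals 2 (y-f)^T L (y-f) with L = n*I - 1 1^T,
    so the minimiser solves (K L K + lam K) a = K L y."""
    n = len(y)
    L = n * np.eye(n) - np.ones((n, n))   # Laplacian of the complete graph
    return np.linalg.solve(K @ L @ K + lam * K, K @ L @ y)

# Toy data: linear kernel on 1-D inputs, targets increasing with x.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
K = X @ X.T + 1e-6 * np.eye(4)   # tiny ridge keeps K invertible
a = rank_rls(K, y, lam=0.1)
scores = K @ a                    # predicted scores preserve the ordering of y
```

Training reduces to one linear solve; cross-validation and parameter selection can reuse the same matrix decompositions, which is the source of the efficiency claims.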
ABSTRACT: We propose the kernel principal component ranking algorithm (KPCRank) for learning preference relations. The algorithm can be considered an extension of nonlinear principal component regression applicable to the preference learning task. It is particularly suitable for learning from noisy datasets where a lower-dimensional data representation preserves the most expressive features. In many cases, near-linear dependence of regressors (multicollinearity) can notably decrease the performance of the learning algorithm; however, KPCRank can effectively deal with this situation. It does so by projecting the data onto the p principal components in the feature space defined by a positive definite kernel and subsequently learning the ranking function. Despite the fact that the number of pairwise preferences is quadratic, the training time of KPCRank scales linearly with the number of data points in the training set and is equal to that of principal component regression. We compare the algorithm to several ranking and regression methods, including probabilistic regression on pairwise comparison data. Our experiments demonstrate that the performance of KPCRank is better than that of the baseline methods when learning to rank from data corrupted by noise.
Solid State Communications 01/2009;
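The kernel principal component regression step underlying this approach can be sketched as below; the ranking-specific pairwise handling is omitted, and centering details are simplified.

```python
import numpy as np

def kpcr_fit(K, y, p):
    """Project onto the top-p principal components of the double-centered
    kernel matrix and regress the (centered) targets on those coordinates.
    Returns the fitted training scores."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Kc = C @ K @ C                        # double-centered kernel
    vals, vecs = np.linalg.eigh(Kc)       # eigenvalues in ascending order
    Z = Kc @ vecs[:, -p:]                 # coordinates on the top-p components
    w, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return Z @ w + y.mean()

# Toy data: linear kernel, targets that are a linear function of x,
# so one principal component recovers them exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
fitted = kpcr_fit(X @ X.T, y, p=1)
```

Discarding the low-variance components is what gives the robustness to multicollinearity and noise mentioned in the abstract.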
ABSTRACT: We propose a framework for constructing kernels that take advantage of local correlations in sequential data. The kernels
designed using the proposed framework measure parse similarities locally, within a small window constructed around each matching
feature. Furthermore, we propose to incorporate positional information inside the window and consider different ways to do
this. We applied the kernels together with regularized least-squares (RLS) algorithm to the task of dependency parse ranking
using the dataset containing parses obtained from a manually annotated biomedical corpus of 1100 sentences. Our experiments
show that RLS with kernels incorporating positional information performs better than RLS with the baseline kernel functions.
This performance gain is statistically significant.
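The windowed, position-weighted comparison described above can be sketched on plain strings; the geometric decay weighting is an illustrative choice, not necessarily the scheme used in the paper.

```python
def locality_kernel(s, t, half_window=1, decay=0.5):
    """For every pair of positions with matching features, add similarity
    for the surrounding window; offsets further from the match contribute
    with geometrically decaying weight (positional information)."""
    total = 0.0
    for i, a in enumerate(s):
        for j, b in enumerate(t):
            if a != b:
                continue   # only matching features anchor a window
            for d in range(-half_window, half_window + 1):
                if 0 <= i + d < len(s) and 0 <= j + d < len(t):
                    if s[i + d] == t[j + d]:
                        total += decay ** abs(d)
    return total

k_self = locality_kernel("ABC", "ABC")    # aligned contexts score highest
k_shift = locality_kernel("ABC", "BCA")   # shifted contexts score lower
```

Because each match is scored within its local window, the kernel rewards sequences whose matching features also agree on their immediate context, which is the intuition behind the parse-ranking gains.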
ABSTRACT: During the past decade, kernel methods have proved successful in a range of text analysis tasks. Several properties make kernel-based methods applicable to many real-world problems, especially in domains where data is not naturally represented in vector form. Firstly, instead of manual construction of the feature space for the learning task, kernel functions provide an alternative way to design useful features automatically, allowing very rich representations. Secondly, kernels can be designed to incorporate a priori knowledge about the domain; this property can notably improve the performance of general learning methods and simplifies their adaptation to a specific problem. Finally, kernel methods are naturally applicable in situations where the data representation is not vectorial, avoiding an extensive preprocessing step. In this chapter, we present the main ideas behind kernel methods in general and kernels for text analysis in particular, and provide an example of designing a feature space for the parse-ranking problem with different kernel functions.
ABSTRACT: We propose kernels that take advantage of local correlations in sequential data and present their application to the protein
classification problem. Our locality kernels measure protein sequence similarities within a small window constructed around
matching amino acids. The kernels incorporate positional information of the amino acids inside the window and allow a range
of position-dependent similarity evaluations. We use these kernels with the regularized least-squares (RLS) algorithm for protein
classification on the SCOP database. Our experiments demonstrate that the locality kernels perform significantly better than
the spectrum and the mismatch kernels. When used together with RLS, performance of the locality kernels is comparable with
some state-of-the-art methods of protein classification and remote homology detection.