About
Reads: 14,105
Citations: 2,030
Publications (70)
Background:
Artificial neural networks (ANNs) can be a powerful tool for spectroscopic data analysis. Their ability to detect and model complex relations in the data may lead to outstanding predictive capabilities, but the predictions themselves are difficult to interpret due to the lack of understanding of the black box ANN models. ANNs and linea...
In the present paper we prove a new theorem, resulting in an exact updating formula for linear regression model residuals to calculate the segmented cross-validation residuals for any choice of cross-validation strategy without model refitting. The required matrix inversions are limited by the cross-validation segment sizes and can be executed with...
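The idea behind such updating formulas can be illustrated for ordinary least squares. There, the leave-segment-out residuals follow from the full-data residuals and the corresponding block of the hat matrix. The Python/NumPy sketch below uses the classical identity e_(S) = (I - H_SS)^{-1} e_S. It does not reproduce the paper's theorem, and the function name `segmented_cv_residuals` is hypothetical. Only one |S| x |S| system is solved per segment, which is consistent with the abstract's remark that the inversions are limited by the segment sizes.

```python
import numpy as np

def segmented_cv_residuals(X, y, segments):
    """Leave-segment-out residuals for ordinary least squares, from one full fit.

    Uses the classical identity e_(S) = (I - H_SS)^{-1} e_S, where H is the hat
    matrix of the full-data fit and e its ordinary residuals. X must already
    contain an intercept column if one is wanted, and X with any segment
    removed must still have full column rank.
    """
    G = np.linalg.inv(X.T @ X)            # (X'X)^{-1}, computed once
    e = y - X @ (G @ (X.T @ y))           # ordinary residuals of the full fit
    cv = np.empty_like(e)
    for S in segments:
        S = np.asarray(S)
        H_SS = X[S] @ G @ X[S].T          # |S| x |S| block of the hat matrix
        cv[S] = np.linalg.solve(np.eye(len(S)) - H_SS, e[S])
    return cv


# Usage: 30 samples, 5 interleaved segments; no model is refitted.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=30)
segments = [list(range(i, 30, 5)) for i in range(5)]
cv_res = segmented_cv_residuals(X, y, segments)
```

Refitting the model once per segment gives the same residuals. That makes a convenient sanity check for this sketch.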
A novel unified covariates selection algorithm called Swiss knife covariates selection (SKCovSel) is presented. It is suitable for selecting covariates in a wide range of data scenarios such as a single two-way data block, two-way multiblock, multiway, multiway multiblock, selection of covariates along different modes for multiway data blocks, and...
Multiway datasets arise in various situations, typically from specialised measurement technologies, as a result of measuring data over varying conditions in multiple dimensions, or simply as sets of possibly multichannel images. When such measurements are intended for predicting some external properties, the number of available methods is limited....
Feature selection is an essential step in data science pipelines to reduce the complexity associated with large datasets. While much research on this topic focuses on optimizing predictive performance, few studies investigate stability in the context of the feature selection process. In this study, we present the Repeated Elastic Net Technique (REN...
Preprocessing is a mandatory step in most types of spectroscopy and spectrometry. The choice of preprocessing method depends on the data being analysed, and to get the preprocessing right, domain knowledge or trial and error is required. Given the recent success of deep learning-based methods in numerous applications and their ability to automatica...
Target volume delineation is a vital but time-consuming and challenging part of radiotherapy, where the goal is to deliver sufficient dose to the target while reducing risks of side effects. For head and neck cancer (HNC) this is complicated by the complex anatomy of the head and neck region and the proximity of target volumes to organs at risk. Th...
RENT is a feature selection method for binary classification and regression problems. At its core, RENT trains an ensemble of unique models using regularized elastic net to select features. Each model in the ensemble is trained with a unique and randomly selected subset from the full training data. From these models one can acquire weight distribut...
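The sketch below illustrates the ensemble idea under two assumptions. It fits a plain coordinate-descent elastic net, and it summarizes each feature's weight distribution by a single statistic: the fraction of models in which the feature receives a non-zero weight. The function names, the subset fraction and the penalty values are illustrative assumptions. The published RENT selection criteria are not reproduced here.

```python
import numpy as np

def elastic_net(X, y, alpha, l1_ratio, n_sweeps=100):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + (alpha*(1 - l1_ratio)/2)*||w||^2,
    assuming X and y are already centred (no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    r = y.astype(float)                   # current residual y - Xw (a copy)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * w[j]           # partial residual without feature j
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0) / (
                col_sq[j] + alpha * (1 - l1_ratio))   # soft-thresholded update
            r -= X[:, j] * w[j]
    return w

def selection_frequency(X, y, n_models=30, frac=0.7, alpha=0.1, l1_ratio=0.5, seed=0):
    """Fit an ensemble of elastic nets, each on a random subset of the rows,
    and return the fraction of models giving each feature a non-zero weight."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_models):
        rows = rng.choice(n, size=int(frac * n), replace=False)
        Xs = X[rows] - X[rows].mean(axis=0)     # centre within the subset
        ys = y[rows] - y[rows].mean()
        counts += elastic_net(Xs, ys, alpha, l1_ratio) != 0
    return counts / n_models


# Usage: only the first two of ten features carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
freq = selection_frequency(X, y)
```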
In this study we present the RENT feature selection method for binary classification and regression problems. We compare the performance of RENT to a number of other state-of-the-art feature selection methods on eight datasets (six for binary classification and two for regression) to illustrate RENT's performance with regard to prediction and reduc...
Addressing the need for high-quality, time-efficient, and easy-to-use annotation tools, we propose SAnE, a semiautomatic annotation tool for labeling point cloud data. The contributions of this paper are threefold: (1) we propose a denoising pointwise segmentation strategy enabling a fast implementation of one-click annotation, (2) we expand the mo...
A novel formulation of the wide kernel algorithm for partial least squares regression (PLSR) is proposed. We show how the elimination of redundant calculations in the traditional applications of PLSR helps in speeding up any choice of cross‐validation strategy by utilizing precalculated lookup matrices.
The proposed lookup approach is combined with...
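The lookup idea can be sketched for a single response. The n x n kernel XX^T is computed once. Each cross-validation fold reads its training block from that kernel and double-centres it, which gives the same result as centring the training rows of X directly. The sketch computes only the first PLS1 score, t = X_c X_c^T y_c. The function names are hypothetical, and the paper's full algorithm is not shown.

```python
import numpy as np

def centred_kernel(K):
    """Double-centre K = X X^T; the result equals Xc Xc^T with Xc column-centred."""
    n = K.shape[0]
    C = np.eye(n) - np.full((n, n), 1.0 / n)
    return C @ K @ C

def first_pls_score(K_full, y, rows):
    """First PLS1 score for the given training rows, read from a kernel
    K_full = X X^T that was computed once for all samples."""
    rows = np.asarray(rows)
    Kc = centred_kernel(K_full[np.ix_(rows, rows)])   # lookup, then centre
    yc = y[rows] - y[rows].mean()
    t = Kc @ yc                                       # = Xc (Xc^T yc) = Xc w
    return t / np.linalg.norm(t)


# Usage: a wide X (20 samples, 500 variables); the 20 x 20 kernel is built once.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 500))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=20)
K_full = X @ X.T
train = [i for i in range(20) if i % 4 != 0]          # one cross-validation fold
t = first_pls_score(K_full, y, train)
```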
Feature selection is a challenging combinatorial optimization problem that tends to require a large number of candidate feature subsets to be evaluated before a satisfying solution is obtained. Because of the computational cost associated with estimating the regression coefficients for each subset, feature selection can be an immensely time‐consumi...
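A common way to cut this cost is to compute the centred cross-products X^T X and X^T y once and then solve each candidate subset from the corresponding submatrices. This is offered as a sketch, not as the paper's method. The residual sum of squares then follows from the identity RSS = y^T y - beta^T X_s^T y, without another pass over the data. The class name is hypothetical, and normal equations can lose precision when X is ill-conditioned.

```python
import numpy as np

class SubsetLeastSquares:
    """Least squares for many feature subsets, reusing cross-products that are
    computed once on column-centred data."""

    def __init__(self, X, y):
        self.x_mean, self.y_mean = X.mean(axis=0), y.mean()
        Xc, yc = X - self.x_mean, y - self.y_mean
        self.XtX = Xc.T @ Xc      # p x p
        self.Xty = Xc.T @ yc      # length p
        self.yty = yc @ yc

    def fit(self, subset):
        s = np.asarray(subset)
        b = self.Xty[s]
        beta = np.linalg.solve(self.XtX[np.ix_(s, s)], b)   # |s| x |s| system only
        rss = self.yty - beta @ b                           # residual sum of squares
        intercept = self.y_mean - self.x_mean[s] @ beta
        return beta, intercept, rss


# Usage: score a few candidate subsets without touching X again.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
y = X[:, 3] - 2 * X[:, 7] + rng.normal(scale=0.5, size=100)
model = SubsetLeastSquares(X, y)
rss_by_subset = {s: model.fit(list(s))[2] for s in [(3,), (7,), (3, 7), (3, 7, 11)]}
```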
Advances in techniques for automated classification of point cloud data introduce great opportunities for many new and existing applications. However, with a limited number of labelled points, automated classification by a machine learning model is prone to overfitting and poor generalization. The present paper addresses this problem by inducing co...
An automatic segmentation algorithm for delineation of the gross tumour volume and pathologic lymph nodes of head and neck cancers in PET/CT images is described. The proposed algorithm is based on a convolutional neural network using the U-Net architecture. Several model hyperparameters were explored and the model performance in terms of the Dice s...
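For reference, the Dice coefficient commonly used to score segmentations compares a predicted mask A with a ground-truth mask B as 2|A ∩ B| / (|A| + |B|). A minimal NumPy version follows. The function name and the convention for two empty masks are assumptions.

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0            # convention (assumed): two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, truth).sum() / total


# Usage: one overlapping voxel out of two in each mask gives 0.5.
score = dice([[1, 1, 0], [0, 0, 0]], [[1, 0, 0], [0, 0, 1]])
```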
https://arxiv.org/pdf/1902.03088
Advances in techniques for automated classification of pointcloud data introduce great opportunities for many new and existing applications. However, with a limited number of labeled points, automated classification by a machine learning model is prone to overfitting and poor generalization. The present paper add...
Background:
For marker effect models and genomic animal models, computational requirements increase with the number of loci and the number of genotyped individuals, respectively. In the latter case, the inverse genomic relationship matrix (GRM) is typically needed, which is computationally demanding to compute for large datasets. Thus, there is a...
Extended multiplicative signal correction (EMSC) is a widely used framework for preprocessing spectral data. In the EMSC framework, spectra are scaled according to a given reference spectrum. Spectra that are far from collinear with the selected reference spectrum may not be scaled appropriately. An extension of the EMSC framework that allows for t...
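The following is a minimal sketch of basic EMSC, assuming a quadratic polynomial baseline. Each spectrum z is fitted as b·m + c0 + c1·v + c2·v², where m is the reference spectrum and v is the rescaled wavenumber. The corrected spectrum is (z - c0 - c1·v - c2·v²)/b. The extension introduced in the abstract, for spectra far from collinear with the reference, is not shown, and the function name is hypothetical.

```python
import numpy as np

def basic_emsc(spectra, reference, wavenumbers, poly_order=2):
    """Basic EMSC: fit z = b*reference + c0 + c1*v + c2*v^2 + ... to each spectrum
    (v = rescaled wavenumber), then return (z - polynomial) / b."""
    v = np.asarray(wavenumbers, dtype=float)
    v = 2 * (v - v.min()) / (v.max() - v.min()) - 1        # map to [-1, 1] for conditioning
    P = np.vander(v, poly_order + 1, increasing=True)       # columns 1, v, v^2, ...
    M = np.column_stack([P, reference])                     # EMSC design matrix
    Z = np.atleast_2d(spectra).T                            # one spectrum per column
    coefs, *_ = np.linalg.lstsq(M, Z, rcond=None)
    b = coefs[-1]                                           # multiplicative factor per spectrum
    return ((Z - P @ coefs[:-1]) / b).T


# Usage: two spectra built from the reference with different scalings and baselines.
wn = np.linspace(1000.0, 2000.0, 200)
ref = np.exp(-((wn - 1500.0) / 50.0) ** 2)
spectra = np.vstack([2.5 * ref + 0.3 + 0.002 * (wn - 1000.0),
                     0.8 * ref - 0.1 + 1e-6 * (wn - 1500.0) ** 2])
corrected = basic_emsc(spectra, ref, wn)
```

Because both synthetic spectra follow the assumed model exactly, the correction returns the reference spectrum for each of them.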
Inspired by the success of deep learning techniques in dense-label prediction and the increasing availability of high precision airborne light detection and ranging (LiDAR) data, we present a research process that compares a collection of well-proven semantic segmentation architectures based on the deep learning approach. Our investigation conclude...
Application of different multivariate measurement technologies to the same set of samples is an interesting challenge in many fields of applied data analysis. Our proposal is a two-stage similarity index framework for comparing two matrices in this type of situation. The first step is to identify factors (and associated subspaces) of the matrices by me...
We present a methodology for distinguishing between three types of animal movement behavior (foraging, resting, and walking) based on high-frequency tracking data. For each animal we quantify an individual movement path. A movement path is a temporal sequence consisting of the steps through space taken by an animal. By selecting a set of appropriat...
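The abstract is cut off before naming its movement features. Step length and turning angle are common descriptors of a movement path in movement ecology. The sketch below computes them from a sequence of positions; this feature set is an assumption, not the paper's.

```python
import math

def step_features(xs, ys):
    """Step lengths and turning angles (radians) along a track of positions."""
    steps = [(x1 - x0, y1 - y0) for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in steps]
    # Heading of each step; a zero-length step (e.g. resting) gets heading 0
    headings = [math.atan2(dy, dx) for dx, dy in steps]
    # Turning angle = change of heading between consecutive steps, wrapped to [-pi, pi]
    turns = [math.atan2(math.sin(h1 - h0), math.cos(h1 - h0))
             for h0, h1 in zip(headings, headings[1:])]
    return lengths, turns


# Usage: a unit square traversed counter-clockwise.
lengths, turns = step_features([0, 1, 1, 0, 0], [0, 0, 1, 1, 0])
```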
Background
Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in larg...
Spectroscopic data are usually perturbed by noise from various sources that should be removed prior to model calibration. After conducting a preprocessing step to eliminate unwanted multiplicative effects (effects that scale the pure signal in a multiplicative manner), we discuss how to correct a model for unwanted additive effects in the spectra....
Algorithms for Partial Least Squares (PLS) modelling are placed into a sound theoretical context focusing on numerical precision and computational efficiency. NIPALS and other PLS algorithms that perform deflation steps of the predictors (X) may be slow or even computationally infeasible for sparse and/or large-scale data sets. As alternatives we...
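One well-known way to avoid deflating X in PLS1 rests on the identity X_a^T y = X^T y_a. Under that identity, the loading weights can be computed from the original X and a deflated response. Each new score X w then only needs to be orthogonalized against the earlier scores. The sketch below uses hypothetical function names and checks that this approach gives the same fitted values as NIPALS with X deflation. It is not one of the paper's algorithms.

```python
import numpy as np

def pls1_fitted_no_deflation(X, y, n_comp):
    """PLS1 fitted values without deflating X: weights come from X^T y_a, and
    scores X w are orthogonalized against earlier scores (Gram-Schmidt)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    T = np.zeros((X.shape[0], 0))       # orthonormal scores so far
    r = yc.copy()                       # deflated response y_a
    for _ in range(n_comp):
        w = Xc.T @ r                    # equals X_a^T y, since X_a^T y = X^T y_a
        w /= np.linalg.norm(w)
        t = Xc @ w
        t -= T @ (T.T @ t)              # same as X_a w, without forming X_a
        t /= np.linalg.norm(t)
        T = np.column_stack([T, t])
        r -= t * (t @ r)                # deflate y only
    return y.mean() + T @ (T.T @ yc)

def pls1_fitted_nipals(X, y, n_comp):
    """Reference NIPALS PLS1 with explicit deflation of X."""
    Xa, yc = X - X.mean(axis=0), y - y.mean()
    fitted = np.full(len(y), y.mean())
    for _ in range(n_comp):
        w = Xa.T @ yc
        w /= np.linalg.norm(w)
        t = Xa @ w
        p = Xa.T @ t / (t @ t)          # X loadings
        q = yc @ t / (t @ t)            # y loading
        Xa = Xa - np.outer(t, p)        # deflate X
        fitted += q * t
    return fitted


# Usage
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=30)
fast = pls1_fitted_no_deflation(X, y, 3)
```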
The separation of predictive and nonpredictive (or orthogonal) information in linear regression problems is considered to be an important issue in chemometrics. Approaches including net analyte preprocessing methods and various orthogonal signal correction (OSC) methods have been studied in a considerable number of publications. In the present pape...
Background:
Tumour delineation is a challenging, time-consuming and complex part of radiotherapy planning. In this study, an automatic method for delineating locally advanced cervical cancers was developed using a machine learning approach.
Materials and methods:
A method for tumour segmentation based on image voxel classification using Fisher's...
We present the response-oriented sequential alternation (ROSA) method for multiblock data analysis. ROSA is a novel and transparent multiblock extension of the partial least squares regression (PLSR). According to a “winner takes all” approach, each component of the model is calculated from the block of predictors that most reduces the current resi...
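The "winner takes all" step can be sketched under the following assumptions. Each block proposes a PLS1-type score t = X_b X_b^T r from the current residual r, orthogonalized against the scores already chosen. The block whose score leaves the smallest residual sum of squares wins. Function names are hypothetical, and details of the published ROSA algorithm may differ.

```python
import numpy as np

def rosa_like(blocks, y, n_comp):
    """Winner-takes-all multiblock PLS sketch: for each component, the block whose
    candidate score most reduces the current residual supplies that component."""
    blocks = [B - B.mean(axis=0) for B in blocks]
    r = y - y.mean()                              # current residual
    T = np.zeros((len(r), 0))                     # orthonormal scores chosen so far
    winners = []
    for _ in range(n_comp):
        best = None
        for k, B in enumerate(blocks):
            t = B @ (B.T @ r)                     # PLS1-type candidate from block k
            t -= T @ (T.T @ t)                    # orthogonalize against earlier scores
            t /= np.linalg.norm(t)
            rss = np.sum((r - t * (t @ r)) ** 2)  # residual left if t is used
            if best is None or rss < best[0]:
                best = (rss, k, t)
        _, k, t = best
        winners.append(k)
        T = np.column_stack([T, t])
        r = r - t * (t @ r)                       # deflate the response only
    return winners, T


# Usage: the response depends only on the second block.
rng = np.random.default_rng(4)
X_noise = rng.normal(size=(100, 5))
X_signal = rng.normal(size=(100, 5))
y = 5 * X_signal[:, 0] + rng.normal(scale=0.1, size=100)
winners, T = rosa_like([X_noise, X_signal], y, n_comp=2)
```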
In this paper, we propose a method to find inactive periods of a trajectory and apply it to livestock tracking. In contrast to the existing methods to find inactive periods in the domain of animal movement studies, the proposed method estimates inactive periods based on the position recordings only, without involving information from activity senso...
Background:
Dairy products account for approximately 60% of the iodine intake in the Norwegian population. The iodine concentration in cow's milk varies considerably, depending on feeding practices, season, and amount of iodine and rapeseed products in cow fodder. The variation in iodine in milk affects the risk of iodine deficiency or excess in t...
Long photoperiods characteristic of summers at high latitudes can increase ozone-induced foliar injury in subterranean clover (Trifolium subterraneum). This study compared the effects of long photoperiods on ozone injury in red and white clover cultivars adapted to shorter or longer daylengths of southern or northern Fennoscandia. Plants were expos...
Two-dimensional electrophoresis (2DE) is a traditional proteomics tool still used extensively to study differences in complex protein expression profiles between related biological samples. The methods can resolve thousands of intact proteins on a gel. However, the resulting image pattern is complex. Traditionally, each individual 2DE image, represe...
It is well known that the predictions of single-response orthogonal projections to latent structures (OPLS) and single-response partial least squares regression (PLS1) are identical. The present paper presents an approach to identifying the complete y-orthogonal structure by starting from the vie...
The insights from, and conclusions of, this paper motivate efficient and numerically robust ‘new’ variants of algorithms for solving the single response partial least squares regression (PLS1) problem. Prototype MATLAB code for these variants is included in the Appendix. The analysis of and conclusions regarding PLS1 modelling are based on a rich an...
Dynamic models of biological systems often possess complex and multivariate mappings between input parameters and output state variables, posing challenges for comprehensive sensitivity analysis across the biologically relevant parameter space. In particular, more efficient and robust ways to obtain a solid understanding of how the sensitivity to e...
Data analysis at the pixel level instead of the protein spot level in the context of experiments generating two-dimensional gel electrophoresis (2DE) images requires a complete workflow description starting with an image analysis part (preprocessing and alignment), and ending with a statistical analysis. Here we describe the image analysis part of...
Background
Statistical approaches to describing the behaviour, including the complex relationships between input parameters and model outputs, of nonlinear dynamic models (referred to as metamodelling) are gaining more and more acceptance as a means for sensitivity analysis and to reduce computational demand. Understanding such input-output maps is...
Purpose: Idiopathic intracranial hypertension (IIH) is a condition of increased intracranial pressure of unknown aetiology. Patients with IIH usually suffer from headache and visual disturbances. High intracranial pressure despite normal ventricle size and negative MRI indicate perturbed water flux across cellular membranes, which is provided by th...
The technology for large-scale electrical recordings is rapidly improving, and extracellular recordings with multielectrode arrays now offer a unique window into neural activity at the population level. The high-frequency part of the recorded signal (MUA; multi-unit activity) is a measure of action-potential firing of neurons in the immediate vicin...
'Additional file.pdf' contains Appendix 1, which provides background theory on the multivariate analysis methodology used, and Appendices 2-4 with supplementary figures and tables for each of the three test cases.
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation...
Neuroplasticity can be defined as the ability of the brain to adapt to environmental impacts. These adaptations include synapse formation and elimination, cortical reorganization, and neurogenesis. In epilepsy these mechanisms may become detrimental and contribute to disease progression. It has been proposed that Matrix Metalloproteinase 9 (MMP-9),...
Many Enterococcus faecalis strains display tolerance or resistance to many antibiotics, but genes that contribute to the resistance cannot be specified. The multiresistant E. faecalis V583, for which the complete genome sequence is available, survives and grows in media containing relatively high levels of chloramphenicol. No specific genes coding...
The etiopathogenesis of temporal lobe epilepsy (TLE) and its subgroups - mesial temporal lobe epilepsy with hippocampal sclerosis (MTLE-HS) and TLE with antecedent febrile seizures (TLE-FS) - is poorly understood. It has been proposed that the water channel aquaporin-4 (AQP4) and the potassium channel Kir4.1 (KCNJ10 gene) act in concert to regulate...
We propose a new data compression method for estimating optimal latent variables in multivariate classification and regression problems where more than one response variable is available. The latent variables are found according to a common innovative principle combining PLS methodology and canonical correlation analysis (CCA). The suggested metho...
Based on a recently developed spot segmentation method, we here present a new approach to modelling of individual spots in digital images, e.g. images of DNA microarrays. From the model parameter estimates and residuals we have developed an expedient approach to automatic quality assessment and identification of corrupted spots. The suggested appro...
Based on the assumption that a microarray spot has an approximately circular shape we here introduce a robust method for detection of spots located inside defined regions ("spot boxes") of digital images. By appropriate vectorization of the spot box pixel positions, the spot detection is completed by maximization of a two-sample t-statistic. The me...
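The following is a brute-force sketch of the detection criterion, assuming integer centres and radii and Welch's form of the two-sample t-statistic. Every candidate circle splits the vectorized spot-box pixels into an inside group and an outside group, and the circle with the largest t is selected. The paper's vectorization and search strategy are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def detect_spot(img, radii):
    """Brute-force search for the circle (centre, radius) whose inside/outside
    pixel split maximizes Welch's two-sample t-statistic."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    pix = img.ravel()                              # vectorized spot-box pixels
    best_t, best_circle = -np.inf, None
    for cy in range(h):
        for cx in range(w):
            d2 = ((yy - cy) ** 2 + (xx - cx) ** 2).ravel()
            for r in radii:
                inside = d2 <= r * r
                a, b = pix[inside], pix[~inside]
                if len(a) < 2 or len(b) < 2:       # need a variance in both groups
                    continue
                t = (a.mean() - b.mean()) / np.sqrt(
                    a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
                if t > best_t:
                    best_t, best_circle = t, (cy, cx, r)
    return best_circle


# Usage: a bright disc of radius 5 centred at (10, 10) in a noisy 21 x 21 box.
rng = np.random.default_rng(3)
yy, xx = np.mgrid[0:21, 0:21]
img = 10.0 * ((yy - 10) ** 2 + (xx - 10) ** 2 <= 25) + rng.normal(size=(21, 21))
found = detect_spot(img, range(3, 8))
```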
A new method is presented for extraction of population firing-rate models for both thalamocortical and intracortical signal transfer based on stimulus-evoked data from simultaneous thalamic single-electrode and cortical recordings using linear (laminar) multielectrodes in the rat barrel system. Time-dependent population firing rates for granular (l...
From the fundamental parts of PLS-DA, Fisher's canonical discriminant analysis (FCDA) and Powered PLS (PPLS), we develop the concept of powered PLS for classification problems (PPLS-DA). By taking advantage of a sequence of data reducing linear transformations (consistent with the computation of ordinary PLS-DA components), PPLS-DA computes each co...
Five methods for finding significant changes in proteome data have been used to analyze a two-dimensional gel electrophoresis data set. We used both univariate (ANOVA) and multivariate (Partial Least Squares with jackknife, Cross Model Validation, Power-PLS and CovProc) methods. The gels were taken from a time-series experiment exploring the change...
Different published versions of partial least squares discriminant analysis (PLS-DA) are shown as special cases of an approach exploiting prior probabilities in the estimated between groups covariance matrix used for calculation of loading weights. With prior probabilities included in the calculation of both PLS components and canonical variates, a...
A novel approach for revealing patterns of proteome variation among series of 2-DE gel images is presented. The approach utilises image alignment to ensure that each pixel represents the same information across all gels. Gel images are normalised and background-corrected, followed by unfolding of the images to 1-D pixel vectors and analysing pixel...
A modification of the PLS1 algorithm is presented. Stepwise optimization over a set of candidate loading weights obtained by taking powers of the y–X correlations and X standard deviations generalizes the classical PLS1 based on y–X covariances and hence adds flexibility to the modelling. When good linear predictions can be obtained, the suggested...
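The following sketches the candidate loading weights, assuming the parameterization w_j ∝ sign(c_j)|c_j|^{γ/(1-γ)} s_j^{(1-γ)/γ}. Here c_j is the correlation between y and x_j, s_j is the standard deviation of x_j, and 0 < γ < 1. At γ = 0.5 both exponents equal 1, so w_j ∝ c_j s_j = cov(x_j, y)/s_y, which recovers the classical PLS1 weights. Choosing γ by the correlation between the first score and y is a simple stand-in for the paper's optimization criterion, and the function names are hypothetical.

```python
import numpy as np

def powered_weights(X, y, gamma):
    """Candidate loading weights from powers of y-X correlations and X standard
    deviations (assumed parameterization); gamma = 0.5 gives PLS1 weights."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    sd = Xc.std(axis=0)
    corr = (Xc.T @ yc) / (len(yc) * sd * yc.std())
    w = np.sign(corr) * np.abs(corr) ** (gamma / (1 - gamma)) * sd ** ((1 - gamma) / gamma)
    return w / np.linalg.norm(w)

def best_gamma(X, y, gammas):
    """Pick the gamma whose first score X w correlates most strongly with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    scores = [abs(np.corrcoef(Xc @ powered_weights(X, y, g), yc)[0, 1]) for g in gammas]
    i = int(np.argmax(scores))
    return gammas[i], scores[i]


# Usage: variables on different scales, with only the first one related to y.
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 6)) * np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
y = X[:, 0] + rng.normal(scale=0.5, size=50)
gamma, score = best_gamma(X, y, [0.1, 0.25, 0.5, 0.75, 0.9])
```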
In this paper we present a new variable selection method designed for classification problems where the X data are discretely sampled from continuous curves. For such data the loading weight vectors of a PLS discriminant analysis inherit the continuous behaviour, making the idea of local peaks meaningful. For successive components the local peaks a...
Estimation by analogy is, simply put, the process of finding one or more projects that are similar to the one to be estimated and then deriving the estimate from the values of these projects. If the selected projects have unusually high or low productivity, the estimates should be adjusted toward the productivity values of more average projects. The...
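The abstract does not spell out the adjustment, so the following is only a hypothetical illustration of the idea. It takes the k projects closest in size, averages their productivity (size per unit of effort), and shrinks that average toward the mean productivity of all projects by a weight `shrink`. The new project's size is then divided by the result. The function name, the size-only similarity measure and `shrink` are all assumptions.

```python
from statistics import mean

def analogy_estimate(target_size, projects, k=2, shrink=0.5):
    """Hypothetical analogy-based effort estimate.

    projects maps a name to (size, effort). The k projects closest in size are
    the analogues; their mean productivity (size / effort) is pulled toward the
    mean productivity of all projects by `shrink` (0 = no adjustment,
    1 = use the overall mean), and effort = target_size / productivity.
    """
    productivity = {name: size / effort for name, (size, effort) in projects.items()}
    analogues = sorted(projects, key=lambda name: abs(projects[name][0] - target_size))[:k]
    analogue_prod = mean(productivity[name] for name in analogues)
    adjusted_prod = (1 - shrink) * analogue_prod + shrink * mean(productivity.values())
    return target_size / adjusted_prod


# Usage: the two nearest analogues (A, B) are less productive than average,
# so the adjustment lowers the effort estimate.
projects = {"A": (100, 50), "B": (110, 55), "C": (300, 100), "D": (400, 100)}
unadjusted = analogy_estimate(105, projects, shrink=0.0)
adjusted = analogy_estimate(105, projects, shrink=0.5)
```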
The present work investigates the possibility of constructing a multivariate calibration model for predicting the composition of ground beef with respect to different meat quality types, based on intensity profiles from isoelectric focusing of water-soluble proteins. Beef mixtures containing various amounts of mechanically recovered meat, head meat...
With standard operators, the fuzzy sets defined over a universe of discourse respect the structure of a De Morgan algebra. Via an injective canonical mapping we establish an isomorphism to a De Morgan algebra of crisp subsets of an extended universe. The canonical mapping gives a natural extension of any algebraic operator acting on fuzzy sets to a...
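The first claim can be checked numerically for the standard operators: min for intersection, max for union and 1 - x for complement. De Morgan's laws and involution hold, but the law of the excluded middle fails, which is why the structure is a De Morgan algebra rather than a Boolean algebra. The sketch checks membership grades on a small grid; the canonical mapping itself is not shown.

```python
from itertools import product

def f_and(a, b):
    return min(a, b)      # standard intersection

def f_or(a, b):
    return max(a, b)      # standard union

def f_not(a):
    return 1 - a          # standard complement

# Dyadic grades keep 1 - a exact in binary floating point.
grades = [0.0, 0.25, 0.5, 0.75, 1.0]
de_morgan = all(
    f_not(f_and(a, b)) == f_or(f_not(a), f_not(b))
    and f_not(f_or(a, b)) == f_and(f_not(a), f_not(b))
    for a, b in product(grades, repeat=2)
)
involution = all(f_not(f_not(a)) == a for a in grades)
excluded_middle = all(f_or(a, f_not(a)) == 1 for a in grades)   # fails at a = 0.5
```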
The goal of the presented study is two-fold. First, we want to emphasize the power of Near Infrared Reflectance (NIR) spectroscopy for discrimination between mayonnaise samples containing different vegetable oils. Secondly, we want to use our data to compare the performances of different classification procedures. The NIR spectra with 351 variables...
Computing the classical texture measures often requires extensive calculations on the images and specialised preprocessing. Some of these texture measures are constructed to estimate specific information. Other texture measures seem to be more global in nature. The techniques presented in this paper defi...
Fast and automatic strategies for extraction of characteristic feature spectra from digital images are investigated. We present a study based on images from confocal laser scanning microscopy (CLSM) of mayonnaise. Based on principal component regression (PCR), six different methods are compared with respect to prediction of external measurements de...
In this paper a unified description of classification methods in situations with multicollinear data is proposed. It is shown that a number of the well-established methods can be derived by substituting different modified versions of the covariance matrix into either the classical Bayes method or Fisher’s linear (canonical) discriminant method. A p...
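As one example of this substitution, the sketch below plugs a shrunk covariance into the Bayes linear discriminant rule with class priors. The pooled within-class covariance S is shrunk toward a scaled identity, giving (1 - λ)S + λ(tr(S)/p)I. The shrinkage target and the function names are assumptions. The paper's specific family of modified covariance matrices is not reproduced.

```python
import numpy as np

def fit_lda(X, labels, shrink=0.0):
    """Bayes linear discriminant with a shared covariance: the pooled
    within-class covariance shrunk toward a scaled identity."""
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(labels == c) for c in classes])
    resid = X - means[np.searchsorted(classes, labels)]
    S = resid.T @ resid / (len(X) - len(classes))           # pooled covariance
    target = np.trace(S) / S.shape[0] * np.eye(S.shape[0])  # scaled identity
    P = np.linalg.inv((1 - shrink) * S + shrink * target)   # modified covariance, inverted
    coef = means @ P                                        # row c: mu_c^T P
    const = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, const

def predict_lda(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, coef, const = model
    return classes[np.argmax(X @ coef.T + const, axis=1)]


# Usage: two well-separated Gaussian classes in three dimensions.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(4.0, 1.0, (50, 3))])
labels = np.repeat([0, 1], 50)
model = fit_lda(X, labels, shrink=0.2)
accuracy = np.mean(predict_lda(model, X) == labels)
```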
Professionals and consumers want to control the origin of meat, while producers can profit by mixing minced meat from low cost species into high value meat. Near infrared (NIR) spectroscopy on dry extracts was studied as a method for speciation of minced beef, pork, mutton and mechanically recovered poultry meat. This was divided into three levels:...