Mikko J. Sillanpää's research while affiliated with University of Oulu and other places

Publications (44)

Preprint
Inverse problems are in many cases solved with optimization techniques. When the underlying model is linear, first-order gradient methods are usually sufficient. With nonlinear models, due to nonconvexity, one must often resort to second-order methods that are computationally more expensive. In this work we aim to approximate a nonlinear model with...
Preprint
Lasso is a popular and efficient approach to simultaneous estimation and variable selection in high-dimensional regression models. In this paper, a robust LAD-lasso method for multiple outcomes is presented that addresses the challenges of non-normal outcome distributions and outlying observations. Measured covariate data from space or time, or spe...
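For orientation, the LAD-lasso idea can be written as a penalized least-absolute-deviations problem. The block below is the generic single-outcome form, not the multi-outcome estimator of the paper (whose exact formulation is not given in this abstract). The symbols $y_i$, $\mathbf{x}_i$, $\boldsymbol\beta$, $\lambda$ and the weights $w_j$ are introduced here for illustration.

```latex
\hat{\boldsymbol\beta}
  = \arg\min_{\boldsymbol\beta}\;
    \sum_{i=1}^{n} \bigl| y_i - \mathbf{x}_i^{\top}\boldsymbol\beta \bigr|
    + \lambda \sum_{j=1}^{p} w_j \,\lvert \beta_j \rvert
```

With all $w_j = 1$ this reduces to the plain LAD-lasso, and data-dependent weights give an adaptive variant. The absolute loss limits the influence of outlying responses, while the $\ell_1$ penalty shrinks some coefficients exactly to zero and so performs variable selection.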
Article
Attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy coupled with machine learning-based partial least squares discriminant analysis (PLS-DA) was applied to study whether severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) could be detected from nasopharyngeal swab samples originally collected for polymerase chain reac...
Article
We propose a novel method for deconvolving incoherent scatter radar data to recover accurate reconstructions of backscattered powers. The problem is modelled as a hierarchical noise-perturbed deconvolution problem, where the lower hierarchy consists of an adaptive length-scale function that allows for a non-stationary prior and as such enables adap...
Article
Heritable variation in traits under natural selection is a prerequisite for evolutionary response. While it is recognized that trait heritability may vary spatially and temporally depending on which environmental conditions traits are expressed under, less is known about the possibility that genetic variance contributing to the expected selection r...
Article
Prediction of complex traits based on genome-wide marker information is of central importance for both animal and plant breeding. Numerous models have been proposed for the prediction of complex traits, and considerable effort is still being devoted to improving the prediction accuracy of these models, because various genetic factors such as additive, dom...
Article
Using unique administrative register data, we investigate old‐age retirement under the statutory pension scheme in Finland. The analysis is based on multi‐outcome modelling of pensions and working lives together with a range of explanatory variables. An adaptive multi‐outcome LAD‐lasso regression method is applied to obtain estimates of earnings an...
Article
We introduce a new model selection criterion for sparse complex gene network modeling where gene co-expression relationships are estimated from data. This is a novel formulation of the gap statistic and it can be used for the optimal choice of a regularization parameter in graphical models. Our criterion favors gene network structure which differs...
Preprint
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has caused the global COVID-19 pandemic since 2019, has led to an increasing amount of research on fast screening and diagnosis to efficiently detect COVID-19 positive cases, and on how to prevent spreading of the virus. Our research objective was to study whether SARS-CoV-2 could be de...
Preprint
Many interventional surgical procedures rely on medical imaging to visualise and track instruments. Such imaging methods must not only be real-time capable but also provide accurate and robust positional information. In ultrasound applications, typically only two-dimensional data from a linear array are available, and as such obtaining accurate...
Preprint
We propose a novel method for deconvolving incoherent scatter radar data to recover accurate reconstructions of backscattered powers. The problem is modelled as a hierarchical noise-perturbed deconvolution problem, where the lower hierarchy consists of an adaptive length-scale function that allows for a non-stationary prior and as such enables adap...
Preprint
Zero-inflated explanatory variables are common in fields such as ecology and finance. In this paper we address the problem of an excess of zero values in some explanatory variables that are subject to multioutcome lasso-regularized variable selection. Briefly, the problem results from the failure of lasso-type shrinkage methods to recog...
Article
Dispersal plays a crucial role in determining eco‐evolutionary dynamics through both gene flow and population size regulation. However, to study dispersal and its consequences, one must distinguish immigrants from residents. Dispersers can be identified using telemetry, capture‐mark‐recapture (CMR) methods, or genetic assignment methods. All of these me...
Article
A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological f...
Preprint
Location-allocation and partitional spatial clustering both deal with spatial data, seemingly from different viewpoints. Partitional clustering analyses data points by partitioning them into separate groups, while location-allocation places facilities in locations that best meet the needs of demand points. However, both partitional clustering and l...
Preprint
Capacitated spatial clustering, a type of unsupervised machine learning method, is often used to tackle problems in compression, classification, logistics optimization and infrastructure optimization. Depending on the application at hand, a wide set of extensions may be necessary in clustering. In this article we propose a number of novel extensions...
Article
Advanced Backcross (AB) populations have been widely used to identify and utilize beneficial alleles in various crops such as rice, tomato, wheat and barley. For the development of an AB population, a controlled crossing scheme is used and this controlled crossing along with the selection (both natural and artificial) of agronomically-adapted allel...
Article
The deployment of edge computing infrastructure requires careful placement of the edge servers, with the aim of improving application latencies and reducing data transfer load in opportunistic Internet of Things systems. In the edge server placement, it is important to consider computing capacity, available deployment budget, and hardware requirements...
Article
Spatiotemporal interpolation provides estimates of observations in unobserved locations and time slots. In smart cities, interpolation helps to provide a fine-grained contextual and situational understanding of the urban environment, in terms of both short-term (e.g., weather, air quality, traffic) and long-term (e.g., crime, demographics) spatio-te...
Preprint
Additive and dominance genetic variances underlying the expression of quantitative traits are important quantities for predicting short-term responses to selection, but they are notoriously challenging to estimate in most wild animal populations. Using estimates of genome-wide identity-by-descent (IBD) sharing from autosomal SNP loci, we estimated...
Preprint
A wide variety of parametric approaches and co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. However, little is known about the practical correspondence and synergistic potential of these different schemes. We provide a framework for parallel consideration of parametr...
Article
Motivation: Graphical lasso (Glasso) is a widely used tool for identifying gene regulatory networks in systems biology. However, its computational efficiency depends on the choice of regularization parameter (tuning parameter), and selecting this parameter can be highly time-consuming. Although fully Bayesian implementations of Glasso alleviate th...
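As background, the graphical lasso is usually stated as an $\ell_1$-penalized Gaussian log-likelihood for the precision matrix $\Theta$, given a sample covariance matrix $S$; the $\lambda$ below is the regularization (tuning) parameter the abstract refers to. This is the standard textbook form, and some implementations penalize only the off-diagonal entries of $\Theta$.

```latex
\hat{\Theta}
  = \arg\max_{\Theta \succ 0}\;
    \Bigl\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_{1} \Bigr\}
```

A zero entry $\hat\Theta_{jk}$ corresponds to a missing edge between genes $j$ and $k$ (conditional independence under the Gaussian model), so a larger $\lambda$ yields a sparser network. This is why the choice of $\lambda$ drives both the estimated structure and the computational cost of tuning.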
Preprint
Deconvolution is a fundamental inverse problem in signal processing and the prototypical model for recovering a signal from its noisy measurement. Nevertheless, the majority of model-based inversion techniques require knowledge of the convolution kernel to recover an accurate reconstruction and additionally prior assumptions on the regularity of th...
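The forward model behind this abstract can be sketched in discrete form as follows, with $x$ the unknown signal, $k$ the convolution kernel and $\varepsilon$ the measurement noise. The notation is an assumption introduced here, not taken from the paper.

```latex
y_i = \sum_{j} k_{i-j}\, x_j + \varepsilon_i,
\qquad \text{i.e.} \qquad
y = k * x + \varepsilon
```

Classical (non-blind) deconvolution recovers $x$ with $k$ treated as known; when $k$ must be estimated as well, the problem becomes blind deconvolution. The requirement that $k$ be known is the limitation of model-based inversion that the abstract points out.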
Article
Stress has become a major health concern and there is a need to study and develop new digital means for real-time stress detection. Currently, the majority of stress detection research uses population-based approaches that lack the capability to adapt to individual differences. They also use supervised learning methods, requiring extensive labeling...
Article
Although non-linear relationships between genes are acknowledged, only a few methods exist for estimating non-linear gene co-expression networks or gene regulatory networks (GCNs/GRNs), and these share common deficiencies. These methods often consider only pairwise associations between genes and are therefore poorly capable of identifying higher-order regu...
Article
Background: In the past two decades, the number of maternity hospitals in Finland has been reduced from 42 to 22. Notwithstanding the safety benefits of centralization into larger units, the closures will inevitably impair geographical accessibility of services. Methods: This study aimed to employ a set of location-allocatio...
Conference Paper
In this article, we study the scaling up of edge computing deployments. In edge computing, deployments are scaled up by adding more computational capacity atop the initial deployment, as deployment budgets allow. However, without careful consideration, adding new servers may not improve proximity to the mobile users, crucial for the Quality of Expe...
Article
Motivation: Improved DNA technology has made it practical to estimate single nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth and development related traits, it is meaningful to base SNP-heritability estimation on longitudinal data due to the time-dependency of the process. Howev...
Article
1. Dispersal, the movement of individuals between populations, is crucial in many ecological and genetic processes. However, direct identification of dispersing individuals is difficult or impossible in natural populations. By using genetic assignment methods, individuals with unknown genetic origin can be assigned to source populations. This knowle...
Article
Gaussian process (GP) regression is theoretically capable of capturing, non-exhaustively and with high accuracy, the higher-order gene-by-gene interactions important to trait variation. Unfortunately, the GP approach scales only to 100-200 genes and is thus not applicable for high... Gaussian process (GP)-based automatic relevance determination (ARD) is...
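To illustrate what automatic relevance determination means, the following is a minimal Python sketch of a squared-exponential covariance function with one length scale per input (here, per gene). It is a generic ARD kernel written for illustration, not the model or code from the paper; the function name `ard_se_kernel` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def ard_se_kernel(x: Sequence[float], z: Sequence[float],
                  length_scales: Sequence[float],
                  signal_var: float = 1.0) -> float:
    """Squared-exponential (RBF) kernel with a separate length scale per input.

    Hypothetical helper for illustration: each input dimension (gene) d has
    its own length scale ell_d, so k(x, z) = s^2 * exp(-0.5 * sum_d ((x_d - z_d) / ell_d)^2).
    """
    if not (len(x) == len(z) == len(length_scales)):
        raise ValueError("x, z and length_scales must have the same length")
    # Each squared difference is scaled by its own length scale: a very large
    # length scale makes that gene contribute almost nothing to the covariance.
    scaled = sum(((xi - zi) / ell) ** 2
                 for xi, zi, ell in zip(x, z, length_scales))
    return signal_var * math.exp(-0.5 * scaled)
```

In a fitted GP, a gene whose length scale grows very large barely affects the covariance and can be treated as irrelevant; the inverse length scale is often read as a relevance score.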
Preprint
Edge computing in the Internet of Things brings applications and content closer to the users by introducing an additional computational layer at the network infrastructure, between cloud and the resource-constrained data producing devices and user equipment. This way, the opportunistic nature of the operational environment is addressed by introduci...
Article
Norway spruce is a boreal forest tree species of significant ecological and economic importance. Hence there is a strong imperative to dissect the genetics underlying important wood quality traits in the species. We performed a functional Genome‐Wide Association Study (GWAS) of 17 wood traits in Norway spruce using 178101 single‐nucleotide polymorp...
Article
Motivation: Recent advances in high dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes the quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits utilize either parametric methods or semi-parametri...
Article
Mobile, vehicle-installed road weather sensors are becoming ubiquitous. While mobile sensors are often capable of making observations at a high frequency, their reliability and accuracy may vary. Large-scale road weather observation and forecasting are still mostly based on stationary road weather stations (RWS). Though expensive, sparsely located...
Data
Q-Q plot of a sample of 100 residuals. (TIF)
Data
Q-Q plot of the mobile sensor calibration level random effect. (TIF)
Data
Residual plot of the inference model. (TIF)
Data
Q-Q plot of the RWS sensor calibration level random effect. (TIF)
Article
Stochastic search variable selection (SSVS) is a Bayesian variable selection method that employs covariate‐specific discrete indicator variables to select which covariates (e.g., molecular markers) are included in or excluded from the model. We present a new variant of SSVS where, instead of discrete indicator variables, we use continuous‐scale wei...
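For context, the classical discrete-indicator form of SSVS places a two-component normal mixture prior on each coefficient $\beta_j$, with an indicator $\gamma_j$ choosing between a narrow "spike" and a wide "slab". The hyperparameters $\tau_j$, $c_j$ and $p_j$ follow common textbook notation rather than the paper's, and the continuous-weight variant described in the abstract, which replaces the discrete $\gamma_j$, is not shown here.

```latex
\beta_j \mid \gamma_j \sim (1-\gamma_j)\,\mathcal{N}(0,\tau_j^{2}) + \gamma_j\,\mathcal{N}(0,c_j^{2}\tau_j^{2}),
\qquad
\gamma_j \sim \operatorname{Bernoulli}(p_j),
\qquad
\tau_j \text{ small},\; c_j \gg 1
```

A covariate (e.g., a marker) is effectively included in the model when the posterior probability of $\gamma_j = 1$ is high.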
Article
Models of excess mortality with random effects were used to estimate regional variation in relative or net survival of cancer patients. Statistical inference for these models based on the Markov chain Monte Carlo (MCMC) methods is computationally intensive and, therefore, not feasible for routine analyses of cancer register data. This study assesse...
Article
In plant breeding, one of the main purposes of multi-environment trials (METs) is to assess the intensity of genotype-by-environment (G×E) interactions in order to select high-performing lines for each environment. Most models used to analyze such MET data consider only the additive genetic effects, and part of the non-additive genetic effects are confou...
Preprint
Norway spruce (Picea abies) is an important boreal forest tree species of significant ecological and economic importance. Hence there is a strong imperative to dissect the genetics controlling important wood quality traits in Norway spruce. We performed a functional genome-wide association mapping of 17 wood quality traits in Norw...

Citations

... To predict correlations among DEGs, we constructed a PPI network using the online STRING database (version 11.0, https://string-db.org/). To reduce the risk of overfitting, we utilized the least absolute shrinkage and selection operator (LASSO) Cox regression model ("glmnet" R package), and the minimum criterion was selected [13,14]. The independent variable in the LASSO regression analysis was the standardized expression matrix of candidate prognostic DEGs associated with EMT, and the dependent variable was the OS (or other survival) status of LUAD patients in The Cancer Genome Atlas cohort. ...
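The LASSO Cox model mentioned here maximizes the Cox log partial likelihood minus an $\ell_1$ penalty. The generic form is sketched below with assumed notation: $t_i$ is the observed time of patient $i$, $\delta_i = 1$ marks an observed event, $R(t_i)$ is the risk set at $t_i$, and $\mathbf{x}_i$ is the standardized expression vector. Implementations such as the `glmnet` R package optimize a rescaled version of this criterion, and "minimum criterion" most likely refers to the $\lambda$ that minimizes cross-validated error (`lambda.min` in `cv.glmnet`), although the snippet does not say so explicitly.

```latex
\hat{\boldsymbol\beta}
  = \arg\max_{\boldsymbol\beta}\;
    \sum_{i:\,\delta_i = 1}
      \Bigl[ \mathbf{x}_i^{\top}\boldsymbol\beta
        - \log \sum_{j \in R(t_i)} \exp\bigl(\mathbf{x}_j^{\top}\boldsymbol\beta\bigr) \Bigr]
    - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
```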
... Robustness against outliers in explanatory variables (i.e., leverage points) is rarely considered together with lasso. An exception is an adaptive LAD-lasso approach, which was recently proposed to handle an excess number of zeros in covariate data, see [16], and a real data example in [9]. ...
... House sparrows generally show strong site fidelity, and dispersal occurs mainly among juveniles in the autumn (i.e., natal dispersal, Altwegg et al., 2000) and over short distances (Anderson, 2006; Tufto et al., 2005). All islands surrounding Hestmannøy and Traena and the inhabited areas on the mainland shores (Figure 2) were visited regularly to identify dispersers Saatoglu et al., 2021). To reduce effects of any selective disappearance of certain phenotypes before registration of dispersal, only individuals that survived until the following spring (i.e., recruits) were included in the analyses. ...
... Finally, we place the edge nodes using the PACK capacitated clustering algorithm, which is capable of balancing the cluster allocation [20]. The PACK method minimizes an objective function that sums the distances between the base stations and the locations of the edge servers, weighted by the base stations' workloads. The minimization is conducted with constraints for workload balance (that is, the sum of the workloads of the base stations allocated to each edge node) and server homogeneity (ensured by a minimum number of workloads per edge node). ...
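The objective and constraints described in this passage can be made concrete with a small Python sketch that evaluates a given assignment: the workload-weighted sum of Euclidean distances from base stations to their assigned edge servers, plus a check that each server's total workload lies between a lower and an upper bound. It does not perform the PACK optimization itself; the function names, the Euclidean distance, and the lower/upper-bound reading of the balance and homogeneity constraints are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pack_style_objective(stations: Sequence[Point],
                         workloads: Sequence[float],
                         servers: Sequence[Point],
                         assignment: Sequence[int]) -> float:
    """Hypothetical helper: workload-weighted sum of station-to-server distances.

    assignment[i] is the index of the edge server that base station i is allocated to.
    """
    return sum(w * math.dist(stations[i], servers[assignment[i]])
               for i, w in enumerate(workloads))


def loads_within_bounds(workloads: Sequence[float],
                        assignment: Sequence[int],
                        n_servers: int,
                        lower: float,
                        upper: float) -> bool:
    """Hypothetical helper: check every server's total workload lies in [lower, upper].

    The lower bound stands in for the homogeneity constraint and the upper
    bound for the capacity/balance constraint (an assumed reading of the text).
    """
    loads = [0.0] * n_servers
    for w, k in zip(workloads, assignment):
        loads[k] += w
    return all(lower <= load <= upper for load in loads)
```

A capacitated clustering method would search over server locations and assignments to minimize the first quantity while keeping the second check true.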
... method for finding the optimal edge server placement as it has been shown to provide improved results over other state-of-the-art methods [19]. We first divide the area into a grid of 100 × 100 cells and then count the number of observations in each cell. We assign the grid cells to their nearest base station, summing the number of observations in those cells as the workload passing through the base station. ...
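The preprocessing steps described here (gridding the area, counting observations per cell, and attributing each cell's count to its nearest base station) can be sketched in Python as follows. The bounding box, the use of cell centres and Euclidean distance for the nearest-station test, and the function name `station_workloads` are assumptions for illustration; `n=100` mirrors the 100 × 100 grid in the text.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def station_workloads(observations: Sequence[Point],
                      stations: Sequence[Point],
                      bbox: Tuple[float, float, float, float],
                      n: int = 100) -> List[int]:
    """Hypothetical helper: grid the area into n x n cells, count observations
    per cell, and sum the counts of each cell into its nearest base station."""
    x_min, y_min, x_max, y_max = bbox
    cell_w = (x_max - x_min) / n
    cell_h = (y_max - y_min) / n

    # Step 1: count observations in each grid cell (points outside the box are ignored).
    counts: Counter = Counter()
    for x, y in observations:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Clamp so that points exactly on the upper edge fall into the last cell.
        col = min(int((x - x_min) / cell_w), n - 1)
        row = min(int((y - y_min) / cell_h), n - 1)
        counts[(row, col)] += 1

    # Step 2: attribute each non-empty cell to the base station nearest its centre.
    workloads = [0] * len(stations)
    for (row, col), count in counts.items():
        centre = (x_min + (col + 0.5) * cell_w, y_min + (row + 0.5) * cell_h)
        nearest = min(range(len(stations)),
                      key=lambda s: math.dist(centre, stations[s]))
        workloads[nearest] += count
    return workloads
```

The resulting per-station workloads are the weights that a capacitated placement step would then consume.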
... Here, we analysed growth trajectories in three F2 marine-freshwater crosses used in previous studies exploring the genetic architecture of various quantitative traits [8,50-52]: HEL × RYT, HEL × PYÖ and HEL × BYN. Briefly (but see electronic supplementary material and [53]), grandparental (F0) individuals were collected from the wild and mated in the laboratory to produce F1 offspring. F2 offspring were produced by mating single randomly chosen pairs of F1 in each cross (see electronic supplementary material) and in total, 274 F2 offspring were obtained for the HEL × RYT cross, 278 for the HEL × PYÖ cross and 307 for the HEL × BYN cross. ...
... In our examples the EBIC was overly conservative in selecting edges, which resulted in high mean-squared-error. Finally, we note that there are alternative approaches to choosing λ, see Kuismin and Sillanpää (2021), but they require more extensive computations that become prohibitive in our setting. ...
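For reference, the extended BIC commonly used to select $\lambda$ in Gaussian graphical models scores each candidate graph $E$ (the edge set obtained at a given $\lambda$) as below, where $\ell_n$ is the Gaussian log-likelihood of the fitted precision matrix, $n$ the sample size, $p$ the number of variables, and $\gamma \in [0,1]$ a user-chosen constant. The snippet does not say which $\gamma$ was used.

```latex
\operatorname{EBIC}_{\gamma}(E)
  = -2\,\ell_n\bigl(\hat{\Theta}(E)\bigr) + \lvert E\rvert \log n + 4\gamma\,\lvert E\rvert \log p
```

The last term grows with $\gamma$ and with the number of edges, so a larger $\gamma$ favours sparser graphs; with $\gamma = 0$ the criterion reduces to the ordinary BIC. This extra edge penalty is one way the conservativeness noted above can arise.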
... While different approaches to estimating stress have been explored, one practical way to classify such approaches is to divide them into personalised and generalised models. Personalised stress detection models adapt their predictions to specific individuals using subjective ratings from different affective stimuli and a prerecorded baseline [10,52]. These ratings are commonly collected through self-reports or questionnaires from participants [45]. ...
... The most widely used correlation method is PCC due to its simplicity [69]. Although PCC measures the strength of the linear relationship between two variables, it can be sensitive to outliers that may result in false correlations [70]. ...
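The sensitivity of PCC to outliers is easy to demonstrate. The short Python sketch below computes the sample Pearson correlation from its definition and shows that a single extreme point can turn an uncorrelated sample into an apparently strong correlation; the data are made up for illustration and the function name `pearson` is hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient computed from its definition."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data with exactly zero linear correlation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 1.0, 5.0, 5.0, 1.0]
r_clean = pearson(x, y)

# Appending one extreme point (50, 50) pushes the coefficient to roughly 0.99.
r_outlier = pearson(x + [50.0], y + [50.0])
```

Rank-based measures such as Spearman's correlation are a common, less outlier-sensitive alternative.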
... However, this study focuses on the result of public transport being unavailable, in this case, due to the COVID-19 pandemic. Research on people's opportunities for well-being and health with respect to geographical (or spatial) accessibility has focused on several themes in Finland, for example, spatial accessibility to maternity hospitals (Huotari et al. 2020). Kotavaara and colleagues (2021) recently researched geographical accessibility (synonymous with travel time) of primary health care, using the country of Finland as the case study. ...