Tapio Pahikkala

University of Turku | UTU · Department of Computing

About

219
Publications
56,500
Reads
4,434
Citations
Citations since 2016
103 Research Items
3375 Citations

Publications (219)
Article
Full-text available
Receiver Operating Characteristic (ROC) curve analysis and area under the ROC curve (AUC) are commonly used performance measures in diagnostic systems. In this work, we assume a setting where a classifier is inferred from multivariate data to predict the diagnostic outcome for new cases. Cross-validation is a resampling method for estimating the p...
Article
Full-text available
Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. In this work, we present a comprehensive review of pairwise kernels that have been proposed for incorpor...
Article
Full-text available
While the effectiveness of fog computing in Internet of Things (IoT) applications has been widely investigated in various studies, there is still a lack of techniques to efficiently utilize the computing resources in a fog platform to maximize Quality of Service (QoS) and Quality of Experience (QoE). This paper presents a resource management model...
Article
Full-text available
Currently, popular methods for prenatal risk assessment of fetal aneuploidies are based on multivariate probabilistic models that are built on decades of scientific research and large-scale multi-center clinical studies. These static models that are deployed to screening labs are rarely updated or adapted to local population characteristics. In...
Article
Full-text available
Motivation: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration, which makes the comprehensive experimental screening infeasible in practice. Machine...
Article
Full-text available
Background: Accurate detection of clinically significant prostate cancer (csPCa), Gleason Grade Group ≥ 2, remains a challenge. Prostate MRI radiomics and blood kallikreins have been proposed as tools to improve the performance of biparametric MRI (bpMRI). Purpose: To develop and validate radiomics and kallikrein models for the detection of csPCa....
Preprint
Full-text available
Motivation: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration which makes the comprehensive experimental screening infeasible in practice. Machine...
Preprint
We study the combinatorics of cross-validation based AUC estimation under the null hypothesis that the binary class labels are exchangeable, that is, the data are randomly assigned into two classes given a fixed class proportion. In particular, we study how the estimators based on leave-pair-out cross-validation (LPOCV), in which every possible pai...
Preprint
Full-text available
The goal of recommender systems is to help users find useful items from a large catalog of items by producing a list of item recommendations for every user. Data sets based on implicit data collection have a number of special characteristics. The user and item interaction matrix is often complete, i.e. every user and item pair has an interaction va...
Article
Full-text available
We present comboFM, a machine learning framework for predicting the responses of drug combinations in pre-clinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorizat...
Article
Objectively determined single-number-quantities (SNQs) describing the airborne sound insulation of a façade should correspond to the subjective perception of annoyance to road traffic sounds transmitted through a façade. The reference spectra for spectrum adaptation terms C and Ctr in standard ISO 717-7 (International Organization for Standardiza...
Article
Full-text available
Forest harvesting operations with heavy machinery can lead to significant soil rutting. Risks of rutting depend on the soil bearing capacity which has considerable spatial and temporal variability. Trafficability prediction is required in the selection of suitable operation sites for a given time window and conditions, and for on-site route optimiz...
Article
While the development of oat products often requires altered molecular weight (MW) of β-glucan, the resulting health implications are currently unclear. This 3-leg crossover trial (n = 14) investigated the effects of the consumption of oat bran with High, Medium and Low MW β-glucan (average > 1000, 524 and 82 kDa respectively) with 3 consequent mea...
Preprint
Full-text available
Game recommendation is an important application of recommender systems. Recommendations are made possible by data sets of historical player and game interactions, and sometimes the data sets include features that describe games or players. Collaborative filtering has been found to be the most accurate predictor of past interactions. However, it can...
Article
Full-text available
Objective: Minority oversampling is a standard approach used for adjusting the ratio between the classes on imbalanced data. However, established methods often provide modest improvements in classification performance when applied to data with extremely imbalanced class distribution and to mixed-type data. This is usual for vital statistics data,...
Preprint
Full-text available
We present comboFM, a machine learning framework for predicting the responses of drug combinations in preclinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorizati...
Preprint
Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. Several kernel functions have been proposed for incorporating prior knowledge about the relationship betw...
Article
Free-to-play has become one of the most popular monetization models, and as a consequence game developers need to get the players to purchase in the game instead of getting players to buy the game. Game analytics and player monetization prediction are important parts in estimating the profitability of a free-to-play game. In this paper, we concentr...
Article
In this paper, we propose a generalized wrapper-based feature selection, called GeFeS, which is based on a parallel new intelligent genetic algorithm (GA). The proposed GeFeS works properly under different numerical dataset dimensions and sizes, carefully tries to avoid overfitting and significantly enhances classification accuracy. To make the GA...
Article
Full-text available
The automatic detection of facial expressions of pain has been needed to ensure accurate pain assessment of patients who are unable to self-report pain. To overcome the challenges of automatic systems for determining pain levels based on facial expressions in clinical patient monitoring, a surface electromyography method was tested for...
Chapter
Many statistical models have been developed to understand the causes of unemployment, but predicting unemployment has received less attention. In this study, we develop a model to predict the labour market state of a person based on machine learning trained with a large administrative unemployment registry. The model specifies individuals as Markov...
Chapter
Peer-to-peer lending is a new lending approach gaining in popularity. These loans can offer high interest rates, but they are also exposed to credit risk. In fact, high default rates and low recovery rates are the norms. Potential investors want to know the expected profit in these loans, which means they need to model both defaults and recoveries....
Article
In recent years, the consumption of unpasteurised milk has increased in popularity in Western countries, despite the known risks associated with food-borne pathogens. Some people appear to experience milk-related gastrointestinal symptoms even when tested negative for lactose intolerance and milk allergy. In such cases, processing of milk,...
Article
Full-text available
The aim of this prospective single-institution clinical trial (NCT02002455) was to evaluate the potential of advanced post-processing methods for ¹⁸F-Fluciclovine PET and multisequence multiparametric MRI in the prediction of prostate cancer (PCa) aggressiveness, defined by Gleason Grade Group (GGG). 21 patients with PCa underwent PET/CT, PET/MRI a...
Preprint
Full-text available
In machine learning one often assumes the data are independent when evaluating model performance. However, this rarely holds in practice. Geographic information data sets are an example where the data points have stronger dependencies among each other the closer they are geographically. This phenomenon, known as spatial autocorrelation (SAC), causes...
Preprint
Full-text available
Resting state electroencephalographic (EEG) recording could provide cost-effective means to aid in the detection of neurological disorders such as Parkinson's disease (PD). We examined how many electrodes are needed for classification of PD based on EEG, which electrode locations provide most value for classification, and whether data recorded eyes...
Preprint
Full-text available
This paper proposes a novel method for learning highly nonlinear, multivariate functions from examples. Our method takes advantage of the property that continuous functions can be approximated by polynomials, which in turn are representable by tensors. Hence the function learning problem is transformed into a tensor reconstruction problem, an inver...
Article
Background: Multiparametric prostate magnetic resonance imaging (mpMRI) can be considered the gold standard in prostate magnetic resonance imaging (MRI). Biparametric prostate MRI (bpMRI) is faster and could be a feasible alternative to mpMRI. Objective: To determine the negative predictive value (NPV) of Improved Prostate Cancer Diagnosis (IMPROD)...
Article
Full-text available
Introduction: Predictive survival modeling offers systematic tools for clinical decision-making and individualized tailoring of treatment strategies to improve patient outcomes while reducing overall healthcare costs. In 2015, a number of machine learning and statistical models were benchmarked in the DREAM 9.5 Prostate Cancer Challenge, based on...
Article
Background: Multiparametric MRI of the prostate has been shown to improve the risk stratification of men with an elevated prostate-specific antigen (PSA). However, long acquisition time, high cost, and inter-center/reader variability of a routine prostate multiparametric MRI limit its wider adoption. Purpose: To develop and validate nomograms ba...
Article
Background: Accurate risk stratification of men with a clinical suspicion of prostate cancer (cSPCa) remains challenging despite the increasing use of MRI. Purpose: To evaluate the diagnostic accuracy of a unique biparametric MRI protocol (IMPROD bpMRI) combined with clinical and molecular markers in men with cSPCa. Study type: Prospective sin...
Article
Background: Biochemical recurrence (BCR) affects a significant proportion of patients who undergo robotic-assisted laparoscopic prostatectomy (RALP). Purpose: To evaluate the performance of a routine clinical prostate multiparametric magnetic resonance imaging (mpMRI) and Decipher genomic classifier score for prediction of biochemical recurrence...
Article
Full-text available
Background: Biparametric magnetic resonance imaging (bpMRI) combined with prostate-specific antigen density (PSAd) may be an effective strategy for selecting men for prostate biopsy. It has been shown that performing biopsy only for men with bpMRI Likert scores of 4–5 or PSAd ≥0.15 ng/ml/cm³ is the most efficient strategy. Objective: To externally v...
Conference Paper
Patient self-reporting of pain is not always possible; in those cases, automated objective pain assessment could lead to reliable pain assessment. In this context, physiological measurements have been studied and one of the promising signals is skin conductance (SC). In this study, 1 Hz SC signal acquisition is performed while gradually increasing he...
Article
Full-text available
Purpose: To develop and validate a classifier system for prediction of prostate cancer (PCa) Gleason score (GS) using radiomics and texture features of T2-weighted imaging (T2w), diffusion weighted imaging (DWI) acquired using high b values, and T2-mapping (T2). Methods: T2w, DWI (12 b values, 0–2000 s/mm²), and T2 data sets of 62 patients with...
Article
Automatic locating of weeds from fields is an active research topic in precision agriculture. A reliable and practical plant identification technique would enable the reduction of herbicide amounts and lowering of production costs, along with reducing the damage to the ecosystem. When the seeds have been sown row-wise, most weeds may be located bet...
Article
Full-text available
Machine learning based classification methods are widely used in geoscience applications, including mineral prospectivity mapping. Typical characteristics of the data, such as small number of positive instances, imbalanced class distributions and lack of verified negative instances make ROC analysis and cross-validation natural choices for classifi...
Poster
Full-text available
We evaluated repeatability and diagnostic performance of commonly used radiomic features for prostate cancer (PCa) DWI obtained using b values up to 2000 s/mm². Forty-eight men with diagnosed PCa underwent two repeated 3T MRI examinations performed on the same day. Whole-mount prostatectomy sections were manually matched with in-vivo MRI data. Fourtee...
Poster
Full-text available
Eighty men with a clinical suspicion of prostate cancer (PCa) were enrolled as a part of IMPROD trial (NCT01864135). The performance of 9 clinical parameters, 11 mRNA transcript levels and 4 IMPROD biparametric MRI (bpMRI) parameters for detection of PCa with Gleason score ≥3+4 was evaluated using GreedyRLS feature selection and nested cross-valida...
Poster
Full-text available
Nomograms for prediction of prostate biopsy outcomes incorporating qualitative and quantitative findings of IMPROD biparametric MRI (IMPROD bpMRI consists of T2 weighted imaging and three separate DWI acquisitions) were developed using data of 161 men enrolled as a part of the single-institutional IMPROD (NCT01864135) trial and externally validated...
Poster
Full-text available
In this prospective single institutional trial (NCT01864135), we evaluated the accuracy of a unique prostate MRI acquisition and reporting protocol, IMPROD biparametric MRI, in men with a clinical suspicion of prostate cancer who were subsequently diagnosed with prostate cancer and underwent prostatectomy. IMPROD biparametric MRI correctly detected...
Article
Background: Prostate MRI is increasingly being used in men with a clinical suspicion of prostate cancer (PCa). However, development and validation of methods for focal therapy planning are still lagging. Purpose: To evaluate the diagnostic accuracy on lesion, region‐of‐interest (ROI), and voxel level of IMPROD biparametric prostate MRI (bpMRI) for P...
Article
Full-text available
Remote health monitoring is an effective method to enable tracking of at-risk patients outside of conventional clinical settings, providing early-detection of diseases and preventive care as well as diminishing healthcare costs. Internet of Things (IoT) technology facilitates development of such monitoring systems although significant challenges ne...
Article
Full-text available
Background: The molecular mechanisms mediating postnatal loss of cardiac regeneration in mammals are not fully understood. We aimed to provide an integrated resource of mRNA, protein, and metabolite changes in the neonatal heart for identification of metabolism‐related mechanisms associated with cardiac regeneration. Methods and Results: Mouse ventr...
Article
Full-text available
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Rece...
Data
Appendix S1. RNA sequencing and differential expression analysis.
Data
Appendix S2. Gene set enrichment analysis.
Data
Appendix S5. Fuzzy clustering (transcripts, proteins, and metabolites in each cluster) and upstream regulator analysis (Ingenuity pathway analysis; Qiagen).
Data
Data S1. Supplemental methods. Table S1. Expression of Selected Genes Linked to Cardiac Regeneration and the Postnatal Metabolic Switch Table S2. Proteomics Figure S1. Individual factor maps of the RNA sequencing data principal component (PC) analysis shows perfect separation of sample groups. Figure S2. Top 10 down‐ and upregulated genes in th...
Article
A novel microtiter plate array for the quantification and identification of metal ions in drinking water was compared to human taste panel analysis. The array is based on nonspecific interactions between analyte metal ions and lanthanide chelates with non-antenna and antenna ligands, leading to a luminescence signal profile unique to sample compone...
Article
Full-text available
Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially...
Article
Full-text available
Digital maps of forest resources are a crucial factor in successful forestry applications. Since manual measurement of this data on large areas is infeasible, maps must be constructed using a sample field data set and a prediction model constructed from remote sensing materials, of which airborne laser scanning (ALS) data and aerial images are curr...
Article
Full-text available
Prenatal screening generates a great amount of data that is used for predicting risk of various disorders. Prenatal risk assessment is based on multiple clinical variables and overall performance is defined by how well the risk algorithm is optimized for the population in question. This article evaluates machine learning algorithms to improve perfo...
Preprint
Full-text available
Rationale: Mammals lose the ability to regenerate their hearts within one week after birth. During this regenerative window, cardiac energy metabolism shifts from glycolysis to fatty acid oxidation, and recent evidence suggests that metabolism may participate in controlling cardiomyocyte cell cycle. However, the molecular mechanisms mediating the lo...