
Tapio Pahikkala
Professor
University of Turku | UTU · Department of Computing
246 Publications
65,330 Reads
5,113 Citations
Publications (246)
The prediction of interactions between novel drugs and biological targets is a vital step in the early stage of the drug discovery pipeline. Many deep learning approaches have been proposed over the last decade, with a substantial fraction of them sharing the same underlying two-branch architecture. Their distinction is limited to the use of differ...
Early detection is vital for future neuroprotective treatments of Parkinson's disease (PD). Resting state electroencephalographic (EEG) recording has shown potential as a cost-effective means to aid in detection of neurological disorders such as PD. In this study, we investigated how the number and placement of electrodes affects classifying PD pat...
Receiver Operating Characteristic (ROC) curve analysis and area under the ROC curve (AUC) are commonly used performance measures in diagnostic systems. In this work, we assume a setting where a classifier is inferred from multivariate data to predict the diagnostic outcome for new cases. Cross-validation is a resampling method for estimating the p...
Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. In this work, we present a comprehensive review of pairwise kernels that have been proposed for incorpor...
While the effectiveness of fog computing in Internet of Things (IoT) applications has been widely investigated in various studies, there is still a lack of techniques to efficiently utilize the computing resources in a fog platform to maximize Quality of Service (QoS) and Quality of Experience (QoE). This paper presents a resource management model...
Currently, popular methods for prenatal risk assessment of fetal aneuploidies are based on multivariate probabilistic models that are built on decades of scientific research and large-scale multi-center clinical studies. These static models, which are deployed to screening labs, are rarely updated or adapted to local population characteristics. In...
Motivation
Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration, which makes the comprehensive experimental screening infeasible in practice. Machine...
Background
Accurate detection of clinically significant prostate cancer (csPCa), Gleason Grade Group ≥ 2, remains a challenge. Prostate MRI radiomics and blood kallikreins have been proposed as tools to improve the performance of biparametric MRI (bpMRI).
Purpose
To develop and validate radiomics and kallikrein models for the detection of csPCa....
A/B testing is a popular tool for guiding mobile game development. The developer releases different versions of a game to different test cohorts and observes which version has the best player retention or monetization. Correctly determining whether the differences are statistically significant is, however, challenging. Typically the analysis needs t...
Motivation: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration which makes the comprehensive experimental screening infeasible in practice. Machine...
We study the combinatorics of cross-validation based AUC estimation under the null hypothesis that the binary class labels are exchangeable, that is, the data are randomly assigned into two classes given a fixed class proportion. In particular, we study how the estimators based on leave-pair-out cross-validation (LPOCV), in which every possible pai...
The goal of recommender systems is to help users find useful items from a large catalog of items by producing a list of item recommendations for every user. Data sets based on implicit data collection have a number of special characteristics. The user and item interaction matrix is often complete, i.e. every user and item pair has an interaction va...
We present comboFM, a machine learning framework for predicting the responses of drug combinations in pre-clinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorizat...
Objectively determined single-number-quantities (SNQs) describing the airborne sound insulation of a façade should correspond to the subjective perception of annoyance to road traffic sounds transmitted through a façade. The reference spectra for spectrum adaptation terms C and Ctr in standard ISO 717-7 (International Organization for Standardiza...
Forest harvesting operations with heavy machinery can lead to significant soil rutting. Risks of rutting depend on the soil bearing capacity which has considerable spatial and temporal variability. Trafficability prediction is required in the selection of suitable operation sites for a given time window and conditions, and for on-site route optimiz...
While the development of oat products often requires altered molecular weight (MW) of β-glucan, the resulting health implications are currently unclear. This 3-leg crossover trial (n = 14) investigated the effects of the consumption of oat bran with High, Medium and Low MW β-glucan (average > 1000, 524 and 82 kDa respectively) with 3 consequent mea...
Game recommendation is an important application of recommender systems. Recommendations are made possible by data sets of historical player and game interactions, and sometimes the data sets include features that describe games or players. Collaborative filtering has been found to be the most accurate predictor of past interactions. However, it can...
Objective:
Minority oversampling is a standard approach used for adjusting the ratio between the classes on imbalanced data. However, established methods often provide modest improvements in classification performance when applied to data with extremely imbalanced class distribution and to mixed-type data. This is usual for vital statistics data,...
We present comboFM, a machine learning framework for predicting the responses of drug combinations in preclinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorizati...
Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. Several kernel functions have been proposed for incorporating prior knowledge about the relationship betw...
Free-to-play has become one of the most popular monetization models, and as a consequence game developers need to get the players to purchase in the game instead of getting players to buy the game. Game analytics and player monetization prediction are important parts in estimating the profitability of a free-to-play game. In this paper, we concentr...
In this paper, we propose a generalized wrapper-based feature selection, called GeFeS, which is based on a parallel new intelligent genetic algorithm (GA). The proposed GeFeS works properly under different numerical dataset dimensions and sizes, carefully tries to avoid overfitting and significantly enhances classification accuracy. To make the GA...
The automatic detection of facial expressions of pain has been needed to ensure accurate pain assessment of patients who are unable to self-report pain. To overcome the challenges of automatic systems for determining pain levels based on facial expressions in clinical patient monitoring, a surface electromyography method was tested for...
Many statistical models have been developed to understand the causes of unemployment, but predicting unemployment has received less attention. In this study, we develop a model to predict the labour market state of a person based on machine learning trained with a large administrative unemployment registry. The model specifies individuals as Markov...
Peer-to-peer lending is a new lending approach gaining in popularity. These loans can offer high interest rates, but they are also exposed to credit risk. In fact, high default rates and low recovery rates are the norms. Potential investors want to know the expected profit in these loans, which means they need to model both defaults and recoveries....
In recent years, the consumption of unpasteurised milk has increased in popularity in Western countries, despite the known risks associated with food-borne pathogens. Some people appear to experience milk-related gastrointestinal symptoms even after testing negative for lactose intolerance and milk allergy. In such cases, processing of milk,...
The aim of this prospective single-institution clinical trial (NCT02002455) was to evaluate the potential of advanced post-processing methods for ¹⁸F-Fluciclovine PET and multisequence multiparametric MRI in the prediction of prostate cancer (PCa) aggressiveness, defined by Gleason Grade Group (GGG). 21 patients with PCa underwent PET/CT, PET/MRI a...
In machine learning, one often assumes the data are independent when evaluating model performance. However, this rarely holds in practice. Geographic information data sets are an example where the data points have stronger dependencies among each other the closer they are geographically. This phenomenon, known as spatial autocorrelation (SAC), causes...
Resting state electroencephalographic (EEG) recording could provide a cost-effective means to aid in the detection of neurological disorders such as Parkinson's disease (PD). We examined how many electrodes are needed for classification of PD based on EEG, which electrode locations provide the most value for classification, and whether data recorded eyes...
This paper proposes a novel method for learning highly nonlinear, multivariate functions from examples. Our method takes advantage of the property that continuous functions can be approximated by polynomials, which in turn are representable by tensors. Hence the function learning problem is transformed into a tensor reconstruction problem, an inver...
Background
Multiparametric prostate magnetic resonance imaging (mpMRI) can be considered the gold standard in prostate magnetic resonance imaging (MRI). Biparametric prostate MRI (bpMRI) is faster and could be a feasible alternative to mpMRI.
Objective
To determine the negative predictive value (NPV) of Improved Prostate Cancer Diagnosis (IMPROD)...
Introduction:
Predictive survival modeling offers systematic tools for clinical decision-making and individualized tailoring of treatment strategies to improve patient outcomes while reducing overall healthcare costs. In 2015, a number of machine learning and statistical models were benchmarked in the DREAM 9.5 Prostate Cancer Challenge, based on...
Background:
Multiparametric MRI of the prostate has been shown to improve the risk stratification of men with an elevated prostate-specific antigen (PSA). However, long acquisition time, high cost, and inter-center/reader variability of a routine prostate multiparametric MRI limit its wider adoption.
Purpose:
To develop and validate nomograms ba...
Background:
Accurate risk stratification of men with a clinical suspicion of prostate cancer (cSPCa) remains challenging despite the increasing use of MRI.
Purpose:
To evaluate the diagnostic accuracy of a unique biparametric MRI protocol (IMPROD bpMRI) combined with clinical and molecular markers in men with cSPCa.
Study type:
Prospective sin...
Background:
Biochemical recurrence (BCR) affects a significant proportion of patients who undergo robotic-assisted laparoscopic prostatectomy (RALP).
Purpose:
To evaluate the performance of a routine clinical prostate multiparametric magnetic resonance imaging (mpMRI) and Decipher genomic classifier score for prediction of biochemical recurrence...
Background
Biparametric magnetic resonance imaging (bpMRI) combined with prostate-specific antigen density (PSAd) may be an effective strategy for selecting men for prostate biopsy. It has been shown that performing biopsy only for men with bpMRI Likert scores of 4–5 or PSAd ≥0.15 ng/ml/cm³ is the most efficient strategy.
Objective
To externally v...
Patient self-reporting of pain is not always possible; in those cases, automated objective pain assessment could lead to reliable pain assessment. In this context, physiological measurements have been studied, and one of the promising signals is skin conductance (SC). In this study, 1 Hz SC signal acquisition is performed while gradually increasing he...
Purpose:
To develop and validate a classifier system for prediction of prostate cancer (PCa) Gleason score (GS) using radiomics and texture features of T2-weighted imaging (T2w), diffusion weighted imaging (DWI) acquired using high b values, and T2-mapping (T2).
Methods:
T2w, DWI (12 b values, 0-2000 s/mm²), and T2 data sets of 62 patients with...
Automatic locating of weeds from fields is an active research topic in precision agriculture. A reliable and practical plant identification technique would enable the reduction of herbicide amounts and lowering of production costs, along with reducing the damage to the ecosystem. When the seeds have been sown row-wise, most weeds may be located bet...
Machine learning based classification methods are widely used in geoscience applications, including mineral prospectivity mapping. Typical characteristics of the data, such as a small number of positive instances, imbalanced class distributions and a lack of verified negative instances, make ROC analysis and cross-validation natural choices for classifi...
We evaluated repeatability and diagnostic performance of commonly used radiomic features for prostate cancer (PCa) DWI obtained using b values up to 2000 s/mm². Forty-eight men with diagnosed PCa underwent two repeated 3T MRI examinations performed on the same day. Whole-mount prostatectomy sections were manually matched with in-vivo MRI data. Fourtee...
Eighty men with a clinical suspicion of prostate cancer (PCa) were enrolled as a part of IMPROD trial (NCT01864135). The performance of 9 clinical parameters, 11 mRNA transcript levels and 4 IMPROD biparametric MRI (bpMRI) parameters for detection of PCa with Gleason score ≥3+4 was evaluated using GreedyRLS feature selection and nested cross-valida...
Nomograms for prediction of prostate biopsy outcomes incorporating qualitative and quantitative findings of IMPROD biparametric MRI (IMPROD bpMRI consists of T2 weighted imaging and three separate DWI acquisitions) were developed using data of 161 men enrolled as a part of the single-institutional IMPROD (NCT01864135) trial and externally validated...
In this prospective single institutional trial (NCT01864135), we evaluated the accuracy of a unique prostate MRI acquisition and reporting protocol, IMPROD biparametric MRI, in men with a clinical suspicion of prostate cancer who were subsequently diagnosed with prostate cancer and underwent prostatectomy. IMPROD biparametric MRI correctly detected...
Background
Prostate MRI is increasingly being used in men with a clinical suspicion of prostate cancer (PCa). However, development and validation of methods for focal therapy planning are still lagging.
Purpose
To evaluate the diagnostic accuracy on lesion, region‐of‐interest (ROI), and voxel level of IMPROD biparametric prostate MRI (bpMRI) for P...
Remote health monitoring is an effective method to enable tracking of at-risk patients outside of conventional clinical settings, providing early detection of diseases and preventive care as well as diminishing healthcare costs. Internet of Things (IoT) technology facilitates development of such monitoring systems, although significant challenges ne...
Background
The molecular mechanisms mediating postnatal loss of cardiac regeneration in mammals are not fully understood. We aimed to provide an integrated resource of mRNA, protein, and metabolite changes in the neonatal heart for identification of metabolism‐related mechanisms associated with cardiac regeneration.
Methods and Results
Mouse ventr...
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Rece...
Appendix S1. RNA sequencing and differential expression analysis.
Appendix S2. Gene set enrichment analysis.
Appendix S4. Metabolomics.
Appendix S5. Fuzzy clustering (transcripts, proteins, and metabolites in each cluster) and upstream regulator analysis (Ingenuity pathway analysis; Qiagen).
Data S1. Supplemental methods.
Table S1. Expression of Selected Genes Linked to Cardiac Regeneration and the Postnatal Metabolic Switch
Table S2. Proteomics
Figure S1. Individual factor maps of the RNA sequencing data principal component (PC) analysis show perfect separation of sample groups.
Figure S2. Top 10 down‐ and upregulated genes in th...
A novel microtiter plate array for the quantification and identification of metal ions in drinking water was compared to human taste panel analysis. The array is based on nonspecific interactions between analyte metal ions and lanthanide chelates with non-antenna and antenna ligands, leading to a luminescence signal profile unique to sample compone...
Motivation:
Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially...
Digital maps of forest resources are a crucial factor in successful forestry applications. Since manual measurement of this data on large areas is infeasible, maps must be constructed using a sample field data set and a prediction model constructed from remote sensing materials, of which airborne laser scanning (ALS) data and aerial images are curr...
Prenatal screening generates a great amount of data that is used for predicting risk of various disorders. Prenatal risk assessment is based on multiple clinical variables and overall performance is defined by how well the risk algorithm is optimized for the population in question. This article evaluates machine learning algorithms to improve perfo...
Rationale
Mammals lose the ability to regenerate their hearts within one week after birth. During this regenerative window, cardiac energy metabolism shifts from glycolysis to fatty acid oxidation, and recent evidence suggests that metabolism may participate in controlling cardiomyocyte cell cycle. However, the molecular mechanisms mediating the lo...
This is a poster presented at the Frontiers of Cardiovascular Biology conference in Vienna, April 2018. The study has been published in J Am Heart Assoc and can be accessed online at https://doi.org/10.1161/JAHA.118.010378.
Raman spectroscopy is widely used for quantitative pharmaceutical analysis, but a common obstacle to its use is sample fluorescence masking the Raman signal. Time-gating provides an instrument-based method for rejecting fluorescence through temporal resolution of the spectral signal, and allows Raman spectra of fluorescent materials to be obtained....
Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction or network inference problems. During the last decade kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive perfor...
Receiver operating characteristic (ROC) analysis is widely used for evaluating diagnostic systems. Recent studies have shown that estimating an area under ROC curve (AUC) with standard cross-validation methods suffers from a large bias. The leave-pair-out (LPO) cross-validation has been shown to correct this bias. However, while LPO produces an alm...
Motivation
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between...
Rut formation during forest operations is an undesirable phenomenon. A methodology is proposed to measure the rut depth distribution of a logging site by photogrammetric point clouds produced by unmanned aerial vehicles (UAV). The methodology includes five processing steps that aim at reducing the noise from the surrounding trees and unde...
Some people experience gastrointestinal symptoms related to cow's milk consumption even if neither lactose intolerance nor cow's milk allergy can be diagnosed. To investigate whether milk homogenization could cause gastrointestinal problems, homogenized and pasteurized milk and native milk were served to eleven volunteers who reported such sensitiv...