Martin Pavlovski, Yahoo Research
Ph.D. in Computer and Information Science
Publications: 35
Reads: 3,609
Citations: 329
Publications (35)
Disagreement among text annotators as a part of a human (expert) labeling process produces noisy labels, which affect the performance of supervised learning algorithms for natural language processing. Using only high agreement annotations introduces another challenge: the data imbalance problem. We study this challenge within the problem of relatin...
Datasets with imbalanced class distributions arise in various real-world applications. A great number of approaches have been proposed to address the class imbalance challenge, but most of these models perform poorly when datasets are characterized by high class imbalance, class overlap and low data quality. In this study, we propose an eff...
Machine learning (ML) models for analyzing medical data are critical for both accelerating development of novel diagnostic and treatment strategies and improving the accuracy of medical care delivery. Our objective was to comprehensively review supervised ML models for diagnosis or treatment prediction. Publications indexed in PubMed were reviewed...
In multi-view learning, leveraging features from various views in an optimal manner to improve the performance on predictive tasks is a challenging objective. For this purpose, a broad range of approaches have been proposed. However, existing works focus either on capturing (1) the common and complementary information across views, or (2) the under...
Anomaly detection has been a lasting yet active research area for decades. However, the existing methods are generally biased towards capturing the regularities of high-density normal instances with insufficient learning of peripheral instances. This may cause a failure in finding a representative description of the normal class, leading to high fa...
Cancer is one of the most common causes of death in the world. It is characterized by the multi-stage transformation of normal cells into tumor cells. Early cancer detection can significantly reduce its consequences, and it has been the objective of many published machine learning (ML) studies. However, most of them focused on microarray, gene expression...
Beyond their primary diagnostic purpose, radiology reports have been an invaluable source of information in medical research. Given a corpus of radiology reports, researchers are often interested in identifying a subset of reports describing a particular medical finding. Because the space of medical findings in radiology reports is vast and potenti...
An end-to-end supervised learning method is proposed for fault detection in the electric grid using Big Data from multiple Phasor Measurement Units (PMUs). The approach consists of preprocessing steps aimed at reducing data noise and dimensionality, followed by utilization of six classification models considered for detecting faults. Three of the m...
Event classification is one of the central components of automated disturbance analysis based on PMU measurements. Obtaining high-quality event labels remains a challenge for supervised learning-based classification of local and system-wide events in power grids due to its labor-intensive requirement. We present a sensitivity study considering rapi...
Event detection in electrical grids is a challenging problem for machine learning methods due to spatiotemporally nonstationary systems and the inability to automate event labeling in high-volume data such as PMU measurements. As a result, the existing historical event logs created manually do not correlate well with the corresponding PMU measureme...
Introduction
Thyroid cancer represents 3.1% of diagnosed cancers in the United States. The objective of this research was to identify comorbidities, discover additional genes potentially related to thyroid cancer, and improve current knowledge of the genetics and comorbidities associated with this cancer.
Methods
Healthcare Cost and Utilization Pro...
As the prevalence of drones increases, understanding and preparing for possible adversarial uses of drones and drone swarms is of paramount importance. Correspondingly, developing defensive mechanisms in which swarms can be used to protect against adversarial Unmanned Aerial Vehicles (UAVs) is a problem that requires further attention. Prior work o...
Background and objective:
Alzheimer's disease (AD) is the most common type of dementia that can seriously affect a person's ability to perform daily activities. Estimates indicate that AD may rank third as a cause of death for older people, after heart disease and cancer. Identification of individuals at risk for developing AD is imperative for te...
Objective:
We sought to predict if patients with type 2 diabetes mellitus (DM2) would develop 10 selected complications. Accurate prediction of complications could help with more targeted measures that would prevent or slow down their development.
Materials and methods:
Experiments were conducted on the Healthcare Cost and Utilization Project St...
Drone swarms are becoming a new tool for many tasks including surveillance, search, rescue, construction, and defense-related activities. As their usage increases, so does the possibility of adversarial attacks on their contribution to these use cases. One possible avenue, whether deliberate or not, is to deny access to the position feedback offere...
Background
Colorectal cancer (CRC) is the third most common cancer in the United States and the second leading cause of cancer death. The goal was to identify comorbidities and genes associated with CRC.
Methods
A novel social network model was developed on the Healthcare Cost and Utilization Project (HCUP) - State Inpatient Databases (SID) Califo...
Background: Colorectal cancer (CRC) is the third most common cancer in the United States, and the second leading cause of cancer death. Comorbidity network analyses of CRC can help in understanding the progression of the illness. About 10%-30% of patients have a family history of CRC that suggests a hereditary contribution, including pathogenic variants of...
Objective:
Clinical trials, prospective research studies on human participants carried out by a distributed team of clinical investigators, play a crucial role in the development of new treatments in health care. This is a complex and expensive process where investigators aim to enroll volunteers with predetermined characteristics, administer trea...
Introduction:
The objective of this study is to improve the understanding of the spatial spread of complicated cases of influenza that required hospitalization, by creating heatmaps and social networks. These will make it possible to identify critical hubs and routes of influenza spread in specific geographic locations, in order to contain infections a...
Decision trees and logistic regression are among the most popular and well-known machine learning algorithms, frequently used to solve a variety of real-world problems. Stability of learning algorithms is a powerful tool to analyze their performance and sensitivity and subsequently allow researchers to draw reliable conclusions. The stability of t...
Stacking is a general approach for combining multiple models toward greater predictive accuracy. It has found various applications across different domains, owing to its meta-learning nature. Nevertheless, our understanding of how and why stacking works remains intuitive and lacks theoretical insight. In this paper, we use the stability of...
Attaining the proper balance between underfitting and overfitting is one of the central challenges in machine learning. It has been approached mostly by deriving bounds on generalization risks of learning algorithms. Such bounds are, however, rarely controllable. In this study, a novel bias-variance balancing objective function is introduced in ord...
Ensemble generation is a natural and convenient way of achieving better generalization performance of learning algorithms by gathering their predictive capabilities. Here, we nurture the idea of ensemble-based learning by combining bagging and boosting for the purpose of binary classification. Since the former improves stability through variance re...
Efficient control of power systems is becoming increasingly difficult as they gain in complexity and size. By considering a power grid and a communication infrastructure as a multiplex network, we propose an automatic control strategy that regulates the mechanical power output of the generators based on information obtained via communication links...