Martin Pavlovski

Yahoo Research

Ph.D. in Computer and Information Science

About

Publications: 35
Reads: 3,609
Citations: 329

Publications (35)
Article
Full-text available
Disagreement among text annotators as a part of a human (expert) labeling process produces noisy labels, which affect the performance of supervised learning algorithms for natural language processing. Using only high agreement annotations introduces another challenge: the data imbalance problem. We study this challenge within the problem of relatin...
Article
Full-text available
Datasets with imbalanced class distribution are available in various real-world applications. A great number of approaches have been proposed to address the class imbalance challenge, but most of these models perform poorly when datasets are characterized by high class imbalance, class overlap, and low data quality. In this study, we propose an eff...
Preprint
Full-text available
Machine learning (ML) models for analyzing medical data are critical for both accelerating development of novel diagnostic and treatment strategies and improving the accuracy of medical care delivery. Our objective was to comprehensively review supervised ML models for diagnosis or treatment prediction. Publications indexed in PubMed were reviewed...
Chapter
In multi-view learning, leveraging features from various views in an optimal manner to improve the performance on predictive tasks is a challenging objective. For this purpose, a broad range of approaches have been proposed. However, existing works focus either on capturing (1) the common and complementary information across views, or (2) the under...
Chapter
Anomaly detection has been a lasting yet active research area for decades. However, the existing methods are generally biased towards capturing the regularities of high-density normal instances with insufficient learning of peripheral instances. This may cause a failure in finding a representative description of the normal class, leading to high fa...
Article
Cancer is one of the most common causes of death in the world. It is characterized by the multi-stage transformation of normal cells into tumor cells. Early cancer detection can significantly reduce its consequences, which has been the objective of many published machine learning (ML) studies. However, most of them focused on microarray, gene expression...
Preprint
Full-text available
Beyond their primary diagnostic purpose, radiology reports have been an invaluable source of information in medical research. Given a corpus of radiology reports, researchers are often interested in identifying a subset of reports describing a particular medical finding. Because the space of medical findings in radiology reports is vast and potenti...
Article
An end-to-end supervised learning method is proposed for fault detection in the electric grid using Big Data from multiple Phasor Measurement Units (PMUs). The approach consists of preprocessing steps aimed at reducing data noise and dimensionality, followed by utilization of six classification models considered for detecting faults. Three of the m...
Article
Event classification is one of the central components of automated disturbance analysis based on PMU measurements. Obtaining high-quality event labels remains a challenge for supervised learning-based classification of local and system-wide events in power grids due to its labor-intensive requirement. We present a sensitivity study considering rapi...
Article
Full-text available
Event detection in electrical grids is a challenging problem for machine learning methods due to spatiotemporally nonstationary systems and the inability to automate event labeling in high-volume data such as PMU measurements. As a result, the existing historical event logs created manually do not correlate well with the corresponding PMU measureme...
Article
Introduction: Thyroid cancer represents 3.1% of diagnosed cancers in the United States. The objective of this research was to identify comorbidities and discover additional genes potentially related to thyroid cancer, and to improve current knowledge of the genetics and comorbidities associated with this cancer. Methods: Healthcare Cost and Utilization Pro...
Article
As the prevalence of drones increases, understanding and preparing for possible adversarial uses of drones and drone swarms is of paramount importance. Correspondingly, developing defensive mechanisms in which swarms can be used to protect against adversarial Unmanned Aerial Vehicles (UAVs) is a problem that requires further attention. Prior work o...
Article
Background and objective: Alzheimer's disease (AD) is the most common type of dementia that can seriously affect a person's ability to perform daily activities. Estimates indicate that AD may rank third as a cause of death for older people, after heart disease and cancer. Identification of individuals at risk for developing AD is imperative for te...
Article
Objective: We sought to predict if patients with type 2 diabetes mellitus (DM2) would develop 10 selected complications. Accurate prediction of complications could help with more targeted measures that would prevent or slow down their development. Materials and methods: Experiments were conducted on the Healthcare Cost and Utilization Project St...
Chapter
Drone swarms are becoming a new tool for many tasks including surveillance, search, rescue, construction, and defense related activities. As their usage increases, so does the possibility of adversarial attacks on their contribution to these use cases. One possible avenue, whether deliberate or not, is to deny access to the position feedback offere...
Article
Full-text available
Background: Colorectal cancer (CRC) is the third most common cancer in the United States and the second leading cause of cancer death. The goal was to identify comorbidities and genes associated with CRC. Methods: A novel social network model was developed on the Healthcare Cost and Utilization Project (HCUP) - State Inpatient Databases (SID) Califo...
Preprint
Full-text available
Background: Colorectal cancer (CRC) is the third most common cancer in the United States and the second leading cause of cancer death. Comorbidity network analyses of CRC can aid understanding of the illness progression. About 10%-30% of patients have a family history of CRC that suggests a hereditary contribution, including pathogenic variants of...
Article
Objective: Clinical trials, prospective research studies on human participants carried out by a distributed team of clinical investigators, play a crucial role in the development of new treatments in health care. This is a complex and expensive process where investigators aim to enroll volunteers with predetermined characteristics, administer trea...
Article
Introduction: The objective of this study is to improve the understanding of the spatial spread of complicated influenza cases that required hospitalization, by creating heatmaps and social networks. These make it possible to identify critical hubs and routes of influenza spread in specific geographic locations, in order to contain infections a...
Preprint
Full-text available
Decision trees and logistic regression are among the most popular and well-known machine learning algorithms, frequently used to solve a variety of real-world problems. The stability of learning algorithms is a powerful tool for analyzing their performance and sensitivity, which in turn allows researchers to draw reliable conclusions. The stability of t...
Preprint
Full-text available
Stacking is a general approach for combining multiple models toward greater predictive accuracy. It has found various applications across different domains, owing to its meta-learning nature. Nevertheless, our understanding of how and why stacking works remains intuitive and lacking in theoretical insight. In this paper, we use the stability of...
Conference Paper
Full-text available
Attaining the proper balance between underfitting and overfitting is one of the central challenges in machine learning. It has been approached mostly by deriving bounds on generalization risks of learning algorithms. Such bounds are, however, rarely controllable. In this study, a novel bias-variance balancing objective function is introduced in ord...
Article
Full-text available
Ensemble generation is a natural and convenient way of achieving better generalization performance of learning algorithms by gathering their predictive capabilities. Here, we nurture the idea of ensemble-based learning by combining bagging and boosting for the purpose of binary classification. Since the former improves stability through variance re...
Article
Efficient control of power systems is becoming increasingly difficult as they gain in complexity and size. By considering a power grid and a communication infrastructure as a multiplex network, we propose an automatic control strategy that regulates the mechanical power output of the generators based on information obtained via communication links...
