Lior Rokach
Ben-Gurion University of the Negev | bgu · Department of Information Systems Engineering

Professor

About

415
Publications
266,027
Reads
22,297
Citations

Publications (415)
Article
Full-text available
The algorithm selection problem is defined as identifying the best-performing machine learning (ML) algorithm for a given combination of dataset, task, and evaluation measure. The human expertise required to evaluate the increasing number of ML algorithms available has resulted in the need to automate the algorithm selection task. Various approache...
Preprint
Full-text available
In this paper, we propose an innovative transfer learning method for time series classification. Instead of using an existing dataset from the UCR archive as the source dataset, we generated a synthetic dataset of 15,000,000 univariate time series, created using our unique synthetic time series generator algorithm, which can generate data with...
Article
Basketball is one of the most popular types of sports in the world. Recent technological developments have made it possible to collect large amounts of data on the game, analyze it, and discover new insights. We propose a novel approach for modeling basketball games using deep reinforcement learning. By analyzing multiple aspects of both the player...
Preprint
Full-text available
Discovering the existence of universal adversarial perturbations had large theoretical and practical impacts on the field of adversarial learning. In the text domain, most universal studies focused on adversarial prefixes which are added to all texts. However, unlike the vision domain, adding the same perturbation to different inputs results in not...
Article
Full-text available
In recent years, due to the complementary action of drug combinations over mono-therapy, the multiple-drugs for multiple-targets paradigm has received increased attention to treat bacterial infections and complex diseases. Although new drug combinations screening has benefited from experimental tests like automated high throughput screening, it is...
Article
Discovering the existence of universal adversarial perturbations had large theoretical and practical impacts on the field of adversarial learning. In the text domain, most universal studies focused on adversarial prefixes which are added to all texts. However, unlike the vision domain, adding the same perturbation to different inputs results in not...
Article
Predicting application usage is useful for offering personalized services, improving mobile energy consumption, and mobile system resource management optimization. Currently, however, there are many possible applications, and each user has his/her own preferences and usage patterns, which makes the application prediction task very challenging. In t...
Article
Full-text available
Driving under the influence of alcohol is a widespread phenomenon in the US where it is considered a major cause of fatal accidents. In this research, we present Virtual Breathalyzer, a novel approach for detecting intoxication from the measurements obtained by the sensors of smartphones and wrist-worn devices. We formalize the problem of intoxicat...
Article
Adversarial examples have proven to be a concerning threat to deep learning models, particularly in the image domain. While many studies have examined adversarial examples in the real world, most of them relied on 2D photos of the attack scene. As a result, the attacks proposed may have limited effectiveness when implemented in realistic environmen...
Article
Assessing the information security awareness (ISA) of users is crucial for protecting systems and organizations from social engineering attacks. Current methods do not consider the context of use when assessing users’ ISA, and therefore they cannot accurately reflect users’ actual behavior, which often depends on that context. In this study, we pro...
Article
Background: Chronic lymphocytic leukemia (CLL) is one of the most common types of leukemia in the Western world, and it mainly affects the elderly population. Progress of the disease is very heterogeneous both in terms of necessity of treatment and life expectancy. The current scoring system for prognostic evaluation of patients with CLL is called CLL...
Article
One of the most common types of crime, burglary often results in serious psychological trauma and has financial consequences. Predicting burglaries is a challenging task due to their high degree of randomness. In this study, we propose predicting burglaries based on various contextual factors and incorporating these factors in a unique deep learnin...
Preprint
Full-text available
This paper presents Deepchecks, a Python library for comprehensively validating machine learning models and data. Our goal is to provide an easy-to-use library comprising many checks related to various types of issues, such as model predictive performance, data integrity, data distribution mismatches, and more. The package is distributed under t...
Article
Full-text available
Reconstructing the circuit model presents a challenge for circuits with unknown functional specifications. The circuit is thought of as a black box that, given an input, produces an output. The model of the circuit, on the other hand, is unknown. Given a set of inputs and their corresponding outputs, the goal is then to recover the circuit specific...
Article
Improving the robustness of neural nets in regression tasks is key to their application in multiple domains. Deep learning-based approaches aim to achieve this goal either by improving their prediction of specific values (i.e., point prediction), or by producing prediction intervals (PIs) that quantify uncertainty. We present IPIV, a deep neural ne...
Chapter
Full-text available
Although imperfect, recent advances in artificial intelligence—which are mostly based on well-known principles of probability and linear algebra—help programmers build software that was previously impossible or very hard to construct. However, such software is susceptible to various classes of attacks, can leak sensitive information, succumb to bia...
Article
Full-text available
Highlights: Novel artificial nose based upon electrode-deposited carbon dots (C-dots). Significant selectivity and sensitivity determined by “polarity matching” between the C-dots and gas molecules. The C-dot artificial nose facilitates, for the first time, real-time, continuous monitoring of bacterial proliferation and discrimination among bacteria...
Article
Motivation: Teratogenic drugs can cause severe fetal malformation and therefore have a critical impact on the health of the fetus, yet the teratogenic risks are unknown for most approved drugs. This paper proposes an explainable machine learning model for classifying pregnancy drug safety based on multimodal data and suggests an orthogonal ensemble fo...
Article
Background: Patients with chronic lymphocytic leukemia (CLL) are known to have a suboptimal immune response of both humoral and cellular arms. Recently, a BNT162b2 mRNA COVID-19 vaccine was introduced with a high efficacy of 95% in immunocompetent individuals. Approximately half of the patients with CLL fail to mount a humoral response to the vacci...
Preprint
Anomaly detection is a well-known task that involves the identification of abnormal events that occur relatively infrequently. Methods for improving anomaly detection performance have been widely studied. However, no studies utilizing test-time augmentation (TTA) for anomaly detection in tabular data have been performed. TTA involves aggregating th...
Article
A context-aware recommender system (CARS) utilizes users’ context to provide personalized services. Contextual information can be derived from sensors in order to improve the accuracy of the recommendations. In this work, we focus on CARSs with high-dimensional contextual information that typically impacts the recommendation model, for example, by...
Article
The widespread adoption of machine learning (ML) techniques and the extensive expertise required to apply them have led to increased interest in automated ML solutions that reduce the need for human intervention. One of the main challenges in applying ML to previously unseen problems is algorithm selection - the identification of high-performing al...
Article
Deep learning algorithms for anomaly detection, such as autoencoders, point out the outliers, saving experts the time-consuming task of examining normal cases in order to find anomalies. Most outlier detection algorithms output a score for each instance in the database. The top-k most intense outliers are returned to the user for further inspection...
Article
Full-text available
Patients with chronic lymphocytic leukemia (CLL) have a suboptimal humoral response to vaccination. Recently, a BNT162b2 mRNA COVID-19 vaccine was introduced with a high efficacy of 95% in immunocompetent individuals. We investigated the safety and efficacy of the BNT162b2 mRNA COVID-19 vaccine in patients with CLL from nine medical centers in Israel,...
Article
Full-text available
This work presents a novel method for applying test-time augmentation (TTA) to tabular data. We used TTA along with an ensemble of 42 models to achieve higher performance on the MIT Global Open Source Severity of Illness Score dataset consisting of 131,051 ICU visits and outcomes. This method achieved an AUC of 0.915 on the private test set (19,669...
Article
The data clustering problem can be described as the task of organizing data into groups, where in each group the objects share some similar attributes. Most of the problems clustering algorithms address do not have a prior solution. This paper addresses the algorithm selection challenge for data clustering, while taking the difficulty in evaluating...
Article
Deep neural nets (DNNs) tend to outperform other machine learning (ML) approaches when the training data is abundant, high-dimensional, sparse, or consists of raw data (e.g., pixels). For datasets with other characteristics – for example, dense tabular numerical data – algorithms such as Gradient Boosting Machines and Random Forest often a...
Article
In recent years, machine learning algorithms, and more specifically deep learning algorithms, have been widely used in many fields, including cyber security. However, machine learning systems are vulnerable to adversarial attacks, and this limits the application of machine learning, especially in non-stationary, adversarial environments, such as th...
Article
The ability to predict human mobility, i.e., transitions between a user's significant locations (the home, workplace, etc.) can be helpful in a wide range of applications, including targeted advertising, personalized mobile services, and transportation planning. Most studies on human mobility prediction have focused on the algorithmic perspective r...
Article
The increasing usage of machine-learning models in critical domains has recently stressed the necessity of interpretable machine-learning models. In areas like healthcare, finance, and the judiciary, the model consumer must understand the rationale behind the model output in order to use it when making a decision. For this reason, it is impossible to...
Article
In recent years, keystroke dynamics has gained popularity as a reliable means of verifying user identity in remote systems. Due to its high performance in verification and the fact that it does not require additional effort from the user, keystroke dynamics has become one of the most preferred second factors of authentication. Despite its prominence...
Preprint
Genetic studies of Mendelian and rare diseases face the critical challenges of identifying pathogenic gene variants and their modes-of-action. Previous efforts rarely utilized the tissue-selective manifestation of these diseases for their elucidation. Here we introduce an interpretable machine learning (ML) platform that utilizes heterogeneous and...
Preprint
Full-text available
Although many studies have examined adversarial examples in the real world, most of them relied on 2D photos of the attack scene; thus, the attacks proposed cannot address realistic environments with 3D objects or varied conditions. Studies that use 3D objects are limited, and in many cases, the real-world evaluation process is not replicable by ot...
Article
Full-text available
In the original publication of the article, the equation 1 − ((number of queries)/n × m) under the section “5 Experimental Setup” was published incorrectly.
Preprint
Full-text available
The widespread adoption of machine learning (ML) techniques and the extensive expertise required to apply them have led to increased interest in automated ML solutions that reduce the need for human intervention. One of the main challenges in applying ML to previously unseen problems is algorithm selection - the identification of high-performing al...
Preprint
Full-text available
Despite continuous investments in data technologies, the latency of querying data still poses a significant challenge. Modern analytic solutions require near real-time responsiveness both to make them interactive and to support automated processing. Current technologies (Hadoop, Spark, Dataflow) scan the dataset to execute queries. They focus on pr...
Article
Objective: To use machine learning-based methods in designing a predictive model of rehabilitation outcomes for post-acute hip-fractured patients. Design: A retrospective analysis using linear models, AdaBoost, CatBoost, ExtraTrees, K-Nearest Neighbors, RandomForest, Support vector machine, XGBoost, and voting of all models to develop and valida...
Preprint
Full-text available
One of the challenges in the NLP field is training large classification models, a task that is both difficult and tedious. It is even harder when GPU hardware is unavailable. The increased availability of pre-trained and off-the-shelf word embeddings, models, and modules aims at easing the process of training large models and achieving a competitive...
Article
Motivation: High-resolution microbial strain typing is essential for various clinical purposes, including disease outbreak investigation, tracking of microbial transmission events, and epidemiological surveillance of bacterial infections. The widely used approach for multilocus sequence typing that is based on the core genome, cgMLST, has the adva...
Preprint
A context-aware recommender system (CARS) applies sensing and analysis of user context to provide personalized services. The contextual information can be derived from sensors in order to improve the accuracy of the recommendations. Yet, generating accurate recommendations is not enough to constitute a useful system from the users' perspective, sinc...
Preprint
Full-text available
Testing is an important part of tackling the COVID-19 pandemic. Availability of testing is a bottleneck due to constrained resources and effective prioritization of individuals is necessary. Here, we discuss the impact of different prioritization policies on COVID-19 patient discovery and the ability of governments and health organizations to use t...
Preprint
The click-through rate (CTR) reflects the ratio of clicks on a specific item to its total number of views. It has a significant impact on websites' advertising revenue. Learning sophisticated models to understand and predict user behavior is essential for maximizing the CTR in recommendation systems. Recent works have suggested new methods that repla...
Preprint
In recent years, machine learning algorithms, and more specifically, deep learning algorithms, have been widely used in many fields, including cyber security. However, machine learning systems are vulnerable to adversarial attacks, and this limits the application of machine learning, especially in non-stationary, adversarial environments, such as the...
Article
Image understanding heavily relies on accurate multi-label classification. In recent years, deep learning algorithms have become very successful for such tasks, and various commercial and open-source APIs have been released for public use. However, these APIs are often trained on different datasets, which, besides affecting their performance, might...
Article
Ensembles, especially ensembles of decision trees, are one of the most popular and successful techniques in machine learning. Recently, the number of ensemble-based proposals has grown steadily. Therefore, it is necessary to identify which are the appropriate algorithms for a certain problem. In this paper, we aim to help practitioners to choose th...
Article
Study question: Can a machine-learning-based model trained on clinical and biological variables support the prediction of the presence or absence of sperm in testicular biopsy in non-obstructive azoospermia (NOA) patients? Summary answer: Our machine-learning model was able to accurately predict (AUC of 0.8) the presence or absence of spermatozo...
Preprint
Full-text available
Improving the robustness of neural nets in regression tasks is key to their application in multiple domains. Deep learning-based approaches aim to achieve this goal either by improving the manner in which they produce their prediction of specific values (i.e., point prediction), or by producing prediction intervals (PIs) that quantify uncertainty....
Chapter
Database activity monitoring systems aim to protect organizational data by logging users’ activity to identify and document malicious activity. High-velocity streams and operating costs restrict these systems to examining only a sample of the activity. Current solutions use manual policies to decide which transactions to monitor. This limits the d...
Preprint
Full-text available
No effective drugs targeting COVID-19 have yet been found in clinical trials, although more than 90 antiviral drugs are approved for use in humans. Extensive efforts are being made to identify existing drugs that are effective in treating COVID-19. However, the pregnancy safety of most of these drugs is not known. According to estimations, the clinical charac...
Article
Full-text available
During the process of rehabilitation after stroke, it is important that patients know how well they perform their exercise, so they can improve their performance in future repetitions. Standard clinical rating conducted by human observation is the prevailing way today to monitor motor recovery of the patient. Therefore, a patient cannot know whethe...
Article
High throughput coherent optical transmitters are key components in future optical communication infrastructure. However, these transmitters are often distorted with the nonlinearity of their components. A potential approach for compensating nonlinearity is by applying digital pre-distortion methods based on the Volterra series or one of its deriva...
Preprint
One of the challenging aspects of applying machine learning is the need to identify the algorithms that will perform best for a given dataset. This process can be difficult, time consuming and often requires a great deal of domain knowledge. We present Sommelier, an expert system for recommending the machine learning algorithms that should be appli...
Article
Decision forests are considered the best practice in many machine learning challenges, mainly due to their superior predictive performance. However, simple models like decision trees may be preferred over decision forests in cases in which the generated predictions must be efficient or interpretable (e.g. in insurance or health-related use cases)....
Article
Full-text available
Bearing spall detection and predicting its size are great challenges. Model-based simulation is a well-known traditional approach to physically model the influence of the spall on the bearing. Building a physical model is challenging due to the bearing complexity and the expert knowledge required to build such a model. Obviously, building a partial...
Preprint
Full-text available
We present the Network Traffic Generator (NTG), a framework for perturbing recorded network traffic with the purpose of generating diverse but realistic background traffic for network simulation and what-if analysis in enterprise environments. The framework preserves many characteristics of the original traffic recorded in an enterprise, as well as...
Preprint
Full-text available
Sequential data is everywhere, and it can serve as a basis for research that will lead to improved processes. For example, road infrastructure can be improved by identifying bottlenecks in GPS data, or early diagnosis can be improved by analyzing patterns of disease progression in medical data. The main obstacle is that access and use of such data...