Lior Rokach

  • Professor
  • Ben-Gurion University of the Negev

About

Publications: 479
Reads: 409,428
Citations: 34,088
Current institution: Ben-Gurion University of the Negev

Publications (479)
Preprint
Full-text available
As new products are emerging daily, recommendation systems are required to quickly adapt to possible new domains without needing extensive retraining. This work presents "X-Cross", a novel cross-domain sequential-recommendation model that recommends products in new domains by integrating several domain-specific language models; each model is fi...
Article
Full-text available
Background One of the limiting toxicities of BTKi is the development of atrial fibrillation (AF), with an incidence of 3%–16%. Aim This study aimed to identify patients with chronic lymphocytic leukemia (CLL) starting both first‐ and second‐generation BTKis who are at high risk of developing AF using a machine learning approach. Methods The CLL c...
Article
Full-text available
Children with Autism Spectrum Disorder (ASD) often face unique risks during sports activities due to challenges such as motor coordination difficulties, sensory sensitivities, and communication impairments. This paper provides a comprehensive review of the use of wearable sensor technologies to enhance the safety and participation of children with...
Preprint
Full-text available
Large language models (LLMs) often appear to excel on public benchmarks, but these high scores may mask an overreliance on dataset-specific surface cues rather than true language understanding. We introduce the Chameleon Benchmark Overfit Detector (C-BOD), a meta-evaluation framework that systematically distorts benchmark prompts via a parametric t...
Preprint
Full-text available
Large Language Models (LLMs) have shown remarkable capabilities across various natural language processing tasks but often struggle to excel uniformly in diverse or complex domains. We propose a novel ensemble method - Diverse Fingerprint Ensemble (DFPE), which leverages the complementary strengths of multiple LLMs to achieve more robust performanc...
Preprint
Full-text available
Algorithmic decision-making has become deeply ingrained in many domains, yet biases in machine learning models can still produce discriminatory outcomes, often harming unprivileged groups. Achieving fair classification is inherently challenging, requiring a careful balance between predictive performance and ethical considerations. We present FairTT...
Preprint
As machine learning (ML) systems increasingly impact critical sectors such as hiring, financial risk assessments, and criminal justice, the imperative to ensure fairness has intensified due to potential negative implications. While much ML fairness research has focused on enhancing training data and processes, addressing the outputs of already depl...
Article
Full-text available
This study introduces BagStacking, an innovative ensemble learning framework designed to enhance the detection of freezing of gait (FOG) in Parkinson’s disease (PD) using accelerometer data. By synergistically combining bagging’s variance reduction with stacking’s sophisticated blending mechanisms, BagStacking achieves superior predictive performan...
Article
Full-text available
The widespread use of machine and deep learning algorithms for anomaly detection has created a critical need for robust explanations that can identify the features contributing to anomalies. However, effective evaluation methodologies for anomaly explanations are currently lacking, especially those that compare the explanations against the true und...
Article
Background: One of the limiting toxicities of BTKi therapy is the development of atrial fibrillation (AF), with an incidence of 3% to 16%. Aim: To identify patients with chronic lymphocytic leukemia (CLL) who are at high risk of developing AF, using a machine learning approach. Methods: The CLL cohort was based on data obtained from electronic medi...
Chapter
Full-text available
The increasing application of machine learning (ML) in critical areas such as healthcare and finance highlights the importance of fairness in ML models, challenged by biases in training data that can lead to discrimination. We introduce ‘FairUS’, a novel pre-processing method for reducing bias in ML models utilizing the Conditional Generative Adver...
Article
Introduction: Proton pump inhibitors (PPIs) are among the most widely used drugs worldwide [Gut Liver. 2017;11(1):27-37]. The use of PPIs has become common practice, and they are overprescribed for all patients with cancer, including patients with hematological malignancies. In the current study, we aimed to retrospectively explore the effect of PPIs on...
Article
Pathogenic variants underlying Mendelian diseases often disrupt the normal physiology of a few tissues and organs. However, variant effect prediction tools that aim to identify pathogenic variants are typically oblivious to tissue contexts. Here we report a machine-learning framework, denoted "Tissue Risk Assessment of Causality by Expression for v...
Article
Introduction Artificial intelligence is becoming a useful tool in clinical practice. We have previously published an explainable machine-learning algorithm for drug use in pregnancy based on multimodal data and suggested an orthogonal ensemble for modeling multimodal data. The model was trained with a set of labeled drugs and processed over 100,000 t...
Preprint
Embedding news articles is a crucial tool for multiple fields, such as media bias detection, identifying fake news, and news recommendations. However, existing news embedding methods are not optimized for capturing the latent context of news events. In many cases, news embedding methods rely on full-textual information and neglect the importance of...
Article
Full-text available
Low levels of vitamin D are associated with a shorter time to first treatment (TTFT) and inferior overall survival in patients with chronic lymphocytic leukemia (CLL). However, whether vitamin D supplementation affects the clinical course of CLL patients remains an open question. In the current study, we aimed to retrospectively explore the clinical benefit of Vi...
Article
Background/aim: The treatment for chronic lymphocytic leukemia (CLL) has changed dramatically over the last two decades. The current study aimed to investigate the impact on overall survival (OS) and time to next treatment (TTT) among CLL patients from 1998 to 2022. Patients and methods: The cohort was based on data obtained from electronic medi...
Article
Background Pancreatic ductal adenocarcinoma (PDAC) remains a serious threat to health, with limited effective therapeutic options, especially due to advanced stage at diagnosis and its inherent resistance to chemotherapy, making it one of the leading causes of cancer-related deaths worldwide. The lack of clear treatment directions underscores the u...
Article
Full-text available
Value Investing stands as one of the most time-honored strategies for long-term equity investment in financial markets, specifically in the domain of stocks. The essence of this approach lies in the estimation of a company's "intrinsic value," which serves as an investor's most refined gauge of the company's true worth. Once the investor arrives at...
Article
Full-text available
Background Aneuploidy, an abnormal number of chromosomes within a cell, is a hallmark of cancer. Patterns of aneuploidy differ across cancers, yet are similar in cancers affecting closely related tissues. The selection pressures underlying aneuploidy patterns are not fully understood, hindering our understanding of cancer development and progressio...
Article
Full-text available
Decision trees are widely used for addressing learning tasks involving tabular data. Yet, they are susceptible to adversarial attacks. In this paper, we present Tree Test Time Simulation (TTTS), a novel inference-time methodology that incorporates Monte Carlo simulations into decision trees to enhance their robustness. TTTS introduces a probabilist...
Article
Classifying medical reports written in Hebrew is challenging due to the ambiguity and complexity of the language. This study proposes Text Test Time Augmentation (TTTA), a novel method to improve the classification accuracy of cancer severity levels from PET-CT diagnostic reports in Hebrew. Hebrew, being a morphologically rich language, often leads...
Article
Full-text available
Drug-drug interactions (DDIs) are a critical component of drug safety surveillance. Laboratory studies aimed at detecting DDIs are typically difficult, expensive, and time-consuming; therefore, developing in-silico methods is critical. Machine learning-based approaches for DDI prediction have been developed; however, in many cases, their ability to...
Article
Full-text available
Fruit cracking is a preharvest physiological rind disorder in citrus, sometimes causing considerable yield loss. In recent years, reports from Israel and other countries suggest that cracking incidence has increased, which might indicate that climate change intensifies the phenomenon. The study aims to develop a machine learning (ML) model for predi...
Article
Usually, image- and radar-based data are used to perform tasks related to environmental characteristics in autonomous cars, while the use of valuable sensor data from the Controller Area Network (CAN) bus has been limited. The vehicle’s CAN bus data consist of multivariate time series data, such as velocity, RPM, and acceleration, which contain meaning...
Article
Full-text available
Motivation: The process of drug discovery is notoriously complex, costing an average of 2.6 billion dollars and taking approximately 13 years to bring a new drug to the market. The success rate for new drugs is alarmingly low (around 0.0001%), and severe adverse drug reactions (ADRs) frequently occur, some of which may even result in death. Early...
Article
Excessive drinking is a major risk factor that leads to many health complications. The diagnosis of alcoholism is challenging, especially when the standard diagnostic tests rely on blood tests and questionnaires that depend on the subjective judgment of the patient and the examiner. The study’s major goal is to find new EEG classification methods to improve past find...
Article
Full-text available
Introduction: Haemato-oncologic patients are more susceptible to severe infections with SARS-CoV-2. We aimed to assess the clinical outcomes of SARS-CoV-2 infection among patients with Mycosis Fungoides and Sezary Syndrome (MF/SS). Methods: The data was retrieved from anonymized electronic medical records of Maccabi Healthcare Services (MHS), th...
Article
In this study, we aim to explore the outcomes of Covid-19 infection in patients with hairy cell leukemia (HCL). The cohort is based on data obtained from electronic medical records. It includes 218 consecutive patients diagnosed with HCL between 16 June 1998 and 20 September 2022, of whom 85 were infected with the coronavirus during the Om...
Preprint
Full-text available
Aneuploidy, an abnormal number of chromosomes within a cell, is considered a hallmark of cancer. Patterns of aneuploidy differ across cancers, yet are similar in cancers affecting closely-related tissues. The selection pressures underlying aneuploidy patterns are not fully understood, hindering our understanding of cancer development and progressio...
Article
Background/aim: Last year was characterized by the appearance of novel SARS-CoV-2 virus variants, mainly the omicron sub-lineages BA.2.12.1, BA.4, and BA.5, which have confirmed resistance to the acquired immune response developed following first-generation mRNA vaccines. Given the ability to use mRNA technology to respond quickly to variant strai...
Article
Full-text available
Pregnancies following diagnosis of Chronic Lymphocytic Leukemia (CLL) are rare events, mainly because the disease is typically diagnosed in the elderly. Literature on the topic is based only on case reports, and limited data are available on the influence of pregnancy on CLL course. In this retrospective study, we aimed to summarize the clinical an...
Article
Full-text available
Generating novel valid molecules is often a difficult task, because the vast chemical space relies on the intuition of experienced chemists. In recent years, deep learning models have helped accelerate this process. These advanced models can also help identify suitable molecules for disease treatment. In this paper, we propose Taiga, a transformer-...
Article
How do aberrations in widely expressed genes lead to tissue-selective hereditary diseases? Previous attempts to answer this question were limited to testing a few candidate mechanisms. To answer this question at a larger scale, we developed "Tissue Risk Assessment of Causality by Expression" (TRACE), a machine learning approach to predict genes tha...
Article
Full-text available
Machine learning-based Network Intrusion Detection Systems (NIDS) are designed to protect networks by identifying anomalous behaviors or improper uses. In recent years, advanced attacks, such as those mimicking legitimate traffic, have been developed to avoid alerting such systems. Previous works mainly focused on improving the anomaly detector its...
Article
Patients with CLL, even in the omicron era and post-vaccination, suffer from persistent COVID and higher rates of complications and mortality compared to the general population. In the current study, we retrospectively evaluated the effectiveness of Nirmatrelvir plus ritonavir among 1,080 patients with CLL who were infected with SARS-CoV-2. Nirmatrelvir admin...
Preprint
Full-text available
Adverse drug interactions are largely preventable causes of medical accidents, which frequently result in physician and emergency room encounters. The detection of drug interactions in a lab, prior to a drug's use in medical practice, is essential; however, it is costly and time-consuming. Machine learning techniques can provide an efficient and acc...
Preprint
We propose a stealthy and powerful backdoor attack on neural networks based on data poisoning (DP). In contrast to previous attacks, both the poison and the trigger in our method are stealthy. We are able to change the model's classification of samples from a source class to a target class chosen by the attacker. We do so by using a small number of...
Preprint
Full-text available
Software Defect Prediction aims at predicting which software modules are most likely to contain defects. The idea behind this approach is to save time during the development process by helping find bugs early. Defect Prediction models are based on historical data. Specifically, one can use data collected from past software distributions, or V...
Article
Full-text available
Background Drug–drug interactions (DDIs) are preventable causes of medical injuries and often result in doctor and emergency room visits. Previous research demonstrates the effectiveness of using matrix completion approaches based on known drug interactions to predict unknown Drug–drug interactions. However, in the case of a new drug, where there i...
Article
Full-text available
The algorithm selection problem is defined as identifying the best-performing machine learning (ML) algorithm for a given combination of dataset, task, and evaluation measure. The human expertise required to evaluate the increasing number of ML algorithms available has resulted in the need to automate the algorithm selection task. Various approache...
Preprint
Full-text available
In this paper, we propose an innovative transfer learning method for time series classification. Instead of using an existing dataset from the UCR archive as the source dataset, we generated a synthetic dataset of 15,000,000 univariate time series using our unique synthetic time series generator algorithm, which can generate data with...
Article
Basketball is one of the most popular types of sports in the world. Recent technological developments have made it possible to collect large amounts of data on the game, analyze it, and discover new insights. We propose a novel approach for modeling basketball games using deep reinforcement learning. By analyzing multiple aspects of both the player...
Chapter
Full-text available
In this paper, we propose an innovative transfer learning method for time series classification. Instead of using an existing dataset from the UCR archive as the source dataset, we generated a synthetic dataset of 15,000,000 univariate time series using our unique synthetic time series generator algorithm, which can generate data with...
Preprint
Full-text available
The discovery of universal adversarial perturbations has had a large theoretical and practical impact on the field of adversarial learning. In the text domain, most universal studies focused on adversarial prefixes which are added to all texts. However, unlike the vision domain, adding the same perturbation to different inputs results in not...
Article
Full-text available
In recent years, due to the complementary action of drug combinations over mono-therapy, the multiple-drugs for multiple-targets paradigm has received increased attention for treating bacterial infections and complex diseases. Although new drug combinations screening has benefited from experimental tests like automated high throughput screening, it is...
Article
The discovery of universal adversarial perturbations has had a large theoretical and practical impact on the field of adversarial learning. In the text domain, most universal studies focused on adversarial prefixes which are added to all texts. However, unlike the vision domain, adding the same perturbation to different inputs results in not...
Article
Predicting application usage is useful for offering personalized services, improving mobile energy consumption, and mobile system resource management optimization. Currently, however, there are many possible applications, and each user has his/her own preferences and usage patterns, which makes the application prediction task very challenging. In t...
Article
Full-text available
Driving under the influence of alcohol is a widespread phenomenon in the US where it is considered a major cause of fatal accidents. In this research, we present Virtual Breathalyzer, a novel approach for detecting intoxication from the measurements obtained by the sensors of smartphones and wrist-worn devices. We formalize the problem of intoxicat...
Article
Adversarial examples have proven to be a concerning threat to deep learning models, particularly in the image domain. While many studies have examined adversarial examples in the real world, most of them relied on 2D photos of the attack scene. As a result, the attacks proposed may have limited effectiveness when implemented in realistic environmen...
Article
Assessing the information security awareness (ISA) of users is crucial for protecting systems and organizations from social engineering attacks. Current methods do not consider the context of use when assessing users’ ISA, and therefore they cannot accurately reflect users’ actual behavior, which often depends on that context. In this study, we pro...
Article
Background Chronic lymphocytic leukemia (CLL) is one of the most common types of leukemia in the Western world and mainly affects the elderly population. Progression of the disease is very heterogeneous, both in terms of the need for treatment and in terms of life expectancy. The current scoring system for prognostic evaluation of patients with CLL is called CLL...
Article
One of the most common types of crime, burglary often results in serious psychological trauma and has financial consequences. Predicting burglaries is a challenging task due to their high degree of randomness. In this study, we propose predicting burglaries based on various contextual factors and incorporating these factors in a unique deep learnin...
Preprint
Full-text available
This paper presents Deepchecks, a Python library for comprehensively validating machine learning models and data. Our goal is to provide an easy-to-use library comprising many checks related to various types of issues, such as model predictive performance, data integrity, data distribution mismatches, and more. The package is distributed under t...
Article
Full-text available
Reconstructing the circuit model presents a challenge for circuits with unknown functional specifications. The circuit is thought of as a black box that, given an input, produces an output. The model of the circuit, on the other hand, is unknown. Given a set of inputs and their corresponding outputs, the goal is then to recover the circuit specific...
Article
Improving the robustness of neural nets in regression tasks is key to their application in multiple domains. Deep learning-based approaches aim to achieve this goal either by improving their prediction of specific values (i.e., point prediction), or by producing prediction intervals (PIs) that quantify uncertainty. We present IPIV, a deep neural ne...
Chapter
Full-text available
Although imperfect, recent advances in artificial intelligence—which are mostly based on well-known principles of probability and linear algebra—help programmers build software that was previously impossible or very hard to construct. However, such software is susceptible to various classes of attacks, can leak sensitive information, succumb to bia...
Article
Full-text available
Highlights Novel artificial nose based upon electrode-deposited carbon dots (C-dots). Significant selectivity and sensitivity determined by “polarity matching” between the C-dots and gas molecules. The C-dot artificial nose facilitates, for the first time, real-time, continuous monitoring of bacterial proliferation and discrimination among bacteria...
Article
Motivation Teratogenic drugs can cause severe fetal malformation and therefore have a critical impact on the health of the fetus, yet the teratogenic risks are unknown for most approved drugs. This paper proposes an explainable machine learning model for classifying pregnancy drug safety based on multimodal data and suggests an orthogonal ensemble fo...
Article
Background: Patients with chronic lymphocytic leukemia (CLL) are known to have a suboptimal immune response of both humoral and cellular arms. Recently, a BNT162b2 mRNA COVID-19 vaccine was introduced with a high efficacy of 95% in immunocompetent individuals. Approximately half of the patients with CLL fail to mount a humoral response to the vacci...
Preprint
Full-text available
Anomaly detection is a well-known task that involves the identification of abnormal events that occur relatively infrequently. Methods for improving anomaly detection performance have been widely studied. However, no studies utilizing test-time augmentation (TTA) for anomaly detection in tabular data have been performed. TTA involves aggregating th...
Article
A context-aware recommender system (CARS) utilizes users’ context to provide personalized services. Contextual information can be derived from sensors in order to improve the accuracy of the recommendations. In this work, we focus on CARSs with high-dimensional contextual information that typically impacts the recommendation model, for example, by...
Article
Full-text available
Most work in heuristic search has focused on path finding problems in which the cost of a path in the state space is the sum of its edges' weights. This paper addresses a different class of path finding problems in which the cost of a path is the product of its weights. We present reductions from different classes of multiplicative path finding problem...
Article
The widespread adoption of machine learning (ML) techniques and the extensive expertise required to apply them have led to increased interest in automated ML solutions that reduce the need for human intervention. One of the main challenges in applying ML to previously unseen problems is algorithm selection - the identification of high-performing al...
Article
Deep learning algorithms for anomaly detection, such as autoencoders, point out the outliers, saving experts the time-consuming task of examining normal cases in order to find anomalies. Most outlier detection algorithms output a score for each instance in the database. The top-k most intense outliers are returned to the user for further inspection...
Article
Full-text available
Patients with chronic lymphocytic leukemia (CLL) have a suboptimal humoral response to vaccination. Recently, a BNT162b2 mRNA COVID-19 vaccine was introduced with a high efficacy of 95% in immunocompetent individuals. We investigated the safety and efficacy of BNT162b2 mRNA Covid-19 vaccine in patients with CLL from nine medical centers in Israel,...
Article
Full-text available
This work presents a novel method for applying test-time augmentation (TTA) to tabular data. We used TTA along with an ensemble of 42 models to achieve higher performance on the MIT Global Open Source Severity of Illness Score dataset consisting of 131,051 ICU visits and outcomes. This method achieved an AUC of 0.915 on the private test set (19,669...
Article
The data clustering problem can be described as the task of organizing data into groups, where in each group the objects share some similar attributes. Most of the problems clustering algorithms address do not have a prior solution. This paper addresses the algorithm selection challenge for data clustering, while taking the difficulty in evaluating...
Article
Deep neural nets (DNNs) tend to outperform other machine learning (ML) approaches when the training data is abundant, high-dimensional, sparse, or consists of raw data (e.g., pixels). For datasets with other characteristics (for example, dense tabular numerical data), algorithms such as Gradient Boosting Machines and Random Forest often a...
Article
In recent years, machine learning algorithms, and more specifically deep learning algorithms, have been widely used in many fields, including cyber security. However, machine learning systems are vulnerable to adversarial attacks, and this limits the application of machine learning, especially in non-stationary, adversarial environments, such as th...
Article
The ability to predict human mobility, i.e., transitions between a user's significant locations (the home, workplace, etc.) can be helpful in a wide range of applications, including targeted advertising, personalized mobile services, and transportation planning. Most studies on human mobility prediction have focused on the algorithmic perspective r...
