Sherif Sakr

King Saud bin Abdulaziz University for Health Sciences | KSAU-HS · College of Public Health and Health Informatics

PhD

241
Publications
153,411
Reads
5,110
Citations
Introduction
Prof. Sherif Sakr is currently a Professor of Computer Science at King Saud bin Abdulaziz University for Health Sciences. He is also affiliated with the School of Computer Science and Engineering (CSE) at the University of New South Wales (UNSW Australia) and Data61/CSIRO (formerly NICTA). He received his PhD degree in Computer and Information Science from Konstanz University, Germany, in 2007.
Additional affiliations
January 2013 - December 2013
Alcatel Lucent
Position
  • Researcher
August 2012 - present
UNSW Sydney
Position
  • Conjoint Senior Lecturer
July 2012 - present
National ICT Australia Ltd
Position
  • Senior Researcher
Education
March 2003 - July 2007
Universität Konstanz
Field of study
  • Computer Science
July 2000 - January 2003
Cairo University
Field of study
  • Computer Science
July 1996 - July 2000
Cairo University
Field of study
  • Computer Science

Publications (241)
Preprint
Full-text available
Nowadays, machine learning is playing a crucial role in harnessing the power of the massive amounts of data that we are currently producing every day in our digital world. With the booming demand for machine learning applications, it has been recognized that the number of knowledgeable data scientists cannot scale with the growing data volumes and...
Article
Full-text available
Governments pay agencies to control the activities of farmers who receive governmental support. Field visits are costly and highly time-consuming; hence remote sensing is widely used for monitoring farmers’ activities. Nowadays, a vast amount of available data from the Sentinel mission significantly boosted research in agriculture. Estonia is among...
Article
Full-text available
Deep Learning (DL) has achieved remarkable progress over the last decade on various tasks such as image recognition, speech recognition, and natural language processing. In general, three main aspects fueled this progress: the increasing availability of large amounts of digitized data, the increasing availability of affordable parallel and p...
Article
Full-text available
Ensuring the success of big graph processing for the next decade and beyond.
Article
Full-text available
Machine learning models assume that data is drawn from a stationary distribution. However, in practice, challenges are imposed on models that need to make sense of fast-evolving data streams, where the content of data is changing and evolving over time. This change between the distributions of training data seen so-far and the distribution of newly...
Conference Paper
Full-text available
With the booming demand for machine learning (ML) applications, it is recognized that the number of knowledgeable data scientists cannot scale with the growing data volumes and application needs in our digital world. Therefore, several automated machine learning (AutoML) frameworks have been developed to fill the gap of human expertise by automatin...
Preprint
Full-text available
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the...
Article
Full-text available
Although complex machine learning models (e.g., random forest, neural networks) commonly outperform the traditional and simple interpretable models (e.g., linear regression, decision tree), in the healthcare domain, clinicians find it hard to understand and trust these complex models due to the lack of intuition and explanation of their predicti...
Article
Full-text available
Nowadays, modern Big Stream Processing Solutions (e.g. Spark, Flink) are working towards being the ultimate framework for streaming analytics. In order to achieve this goal, they started to offer extensions of SQL that incorporate stream-oriented primitives such as windowing and Complex Event Processing (CEP). The former enables stateful computatio...
Poster
Full-text available
Linked Data reveals the need for big semantic data processing. The underlying literature already discusses numerous attempts at leveraging the relational engines of Big Data frameworks like Apache Spark to run SPARQL queries at scale. However, the choice of a relational schema to store RDF data may significantly impact the query performance and hen...
Chapter
Nowadays, machine learning techniques and algorithms are employed in almost every application domain (e.g., financial applications, advertising, recommendation systems, user behavior analytics). In practice, they are playing a crucial role in harnessing the power of massive amounts of data which we are currently producing every day in our digital w...
Chapter
Full-text available
Alongside the ongoing initiative of FAIR data management, the problem of handling Streaming Linked Data (SLD) is more relevant than ever. The Web is changing to tame Data Velocity and fulfill the needs of a new generation of Web applications. New protocols (e.g. WebSockets and Server-Sent Events) emerge to grant continuous and reactive data a...
Conference Paper
Process mining is no longer limited to the one-off analysis of static event logs extracted from a single enterprise system. Rather, process mining may strive for immediate insights based on streams of events that are continuously generated by diverse information systems. This requires online algorithms that, instead of keeping the whole history of...
Chapter
Graphs are recognized as a general, natural, and flexible data-abstraction that can model complex relationships, interactions, and interdependencies between objects. Graphs have been widely used to represent datasets and encode problems across an already extensive range of application domains. The ever-increasing size of graph-structured data for t...
Chapter
Big data analytics is currently representing a revolution that cannot be missed. It is significantly transforming and changing various aspects in our modern life including the way we live, socialize, think, work, do business, conduct research, and govern society. In this chapter, we provide an outlook for various applications to exploit big data te...
Chapter
With the wide availability of data and increasing capacity of computing resources, machine learning and deep learning techniques have become very popular for harnessing the power of data by achieving powerful analytical features. This chapter focuses on discussing several systems that have been developed to support computationally expensi...
Chapter
In general, the discovery process often employs analytics techniques from a variety of genres such as time-series analysis, text analytics, statistics, and machine learning. Moreover, the process might involve the analysis of structured data from conventional transactional sources, in conjunction with the analysis of multi-structured data from othe...
Chapter
In every second of every day, we are generating massive amounts of data. In general, stream computing is a new paradigm which has been necessitated by new data-generating scenarios, such as the ubiquity of mobile devices, location services, and sensor pervasiveness. In general, stream processing engines enable a large class of applications in which...
Chapter
In practice, it has been acknowledged that the Hadoop framework is not an adequate choice for supporting interactive queries which aim to achieve a response time of milliseconds or a few seconds. In addition, many programmers may be unfamiliar with the Hadoop framework and they would prefer to use SQL as a high-level declarative language to implement t...
Article
Objective To study the association between cardiorespiratory fitness (CRF) and incident stroke types. Patients and Methods We studied a retrospective cohort of patients referred for treadmill stress testing in the Henry Ford Health System (Henry Ford ExercIse Testing Project) without history of stroke. CRF was expressed by metabolic equivalents of...
Conference Paper
Full-text available
Recently, a wide range of Web applications (e.g. DBPedia, Uniprot, and Probase) are built on top of vast RDF knowledge bases and using the SPARQL query language. The continuous growth of these knowledge bases led to the investigation of new paradigms and technologies for storing, accessing, and querying RDF data. In practice, modern big data system...
Chapter
Full-text available
Web Stream Processing (WSP) is a field that studies how to identify, access, represent and process flows of data using Web technologies. One of the barriers that currently limits the adoption of WSP is the paradigm shift from Web data at-rest to Web data in-motion. This barrier is especially high when teaching undergraduate students. To quantify th...
Article
The abundance of interconnected data has fueled the design and implementation of graph generators reproducing real-world linking properties or gauging the effectiveness of graph algorithms, techniques, and applications manipulating these data. We consider graph generation across multiple subfields, such as Semantic Web, graph databases, social netw...
Article
Elasticity is one of the most important characteristics of the cloud computing paradigm, which enables deployed applications to dynamically adapt to a changing demand by acquiring and releasing shared computational resources at runtime. Thus, elasticity is a key enabler for economies of scale in the cloud that enhances utility of cloud services. In pract...
Preprint
Full-text available
The abundance of interconnected data has fueled the design and implementation of graph generators reproducing real-world linking properties, or gauging the effectiveness of graph algorithms, techniques and applications manipulating these data. We consider graph generation across multiple subfields, such as Semantic Web, graph databases, social netw...
Article
Full-text available
Recently, Big Data systems have been gaining increasing popularity in handling the massive amounts of data that are continuously generated in our digital world. While the Hadoop framework has pioneered the area of Big Data processing systems, it had clear performance limitations in processing massive amounts of str...
Article
With the enormous growth in the availability and usage of Big Data storage and processing systems, it has become essential to assess the various performance aspects of these systems so that we can carefully understand their strengths and weaknesses. In practice, currently, when an individual/enterprise aims to develop a Big Data storage and processi...
Conference Paper
Full-text available
Data Velocity reached the Web. New protocols and APIs (e.g. WebSockets, and EventSource) are emerging, and the Web of Data is also evolving to tame Velocity without neglecting Variety. The RDF Stream Processing (RSP) community is actively addressing these challenges by proposing continuous query languages and working prototypes. Nevertheless, the p...
Chapter
Full-text available
Despite outperforming humans in different supervised learning tasks, complex machine learning models are criticised for their opacity, which makes them hard to trust, especially when used in critical domains (e.g., healthcare, self-driving cars). Understanding the reasons behind the decision of a machine learning model provides insights into the model...
Article
Objective: In-hospital length of stay (LOS) is expected to increase as the complexity of cardiovascular diseases increases and the population ages. This will affect healthcare systems, especially given the current situation of decreased bed capacity and increasing costs. Therefore, accurately predicting LOS would have a positive impact on healthcare me...
Article
Full-text available
Background: Although complex machine learning models commonly outperform the traditional simple interpretable models, clinicians find it hard to understand and trust these complex models due to the lack of intuition and explanation of their predictions. The aim of this study is to demonstrate the utility of various model-agnostic explanation t...
Conference Paper
Full-text available
In the Big Data context, data streaming systems have been introduced to tame velocity and enable reactive decision making. However, approaching such systems is still too complex due to the paradigm shift they require, i.e., moving from scalable batch processing to continuous analysis and detection. Initially, modern big stream processing systems (e...
Preprint
Full-text available
With the continuous and vast increase in the amount of data in our digital world, it has been acknowledged that the number of knowledgeable data scientists cannot scale to address these challenges. Thus, there was a crucial need for automating the process of building good machine learning models. In the last few years, several techniques and frame...
Chapter
Nowadays, modern Big Stream Processing Solutions (e.g. Spark, Flink) are working towards ultimate frameworks for streaming analytics. In order to achieve this goal, they started to offer extensions of SQL that incorporate stream-oriented primitives such as windowing and Complex Event Processing (CEP). The former enables stateful computation on infi...
Article
Cardiorespiratory fitness (CRF) is inversely associated with atherosclerotic cardiovascular disease (ASCVD) risk. It is unclear whether the prognostic value of CRF differs by baseline estimated ASCVD risk. We studied a retrospective cohort of patients without known heart failure or myocardial infarction (MI) who underwent treadmill stress testing....
Conference Paper
Full-text available
Due to the increasing success of machine learning techniques, they are nowadays widely utilized in almost every domain, such as financial applications, marketing, recommender systems and user behavior analytics, just to name a few. In practice, the machine learning model creation process is a highly iterative exploratory process. In particul...
Conference Paper
Full-text available
We are witnessing a continuous growth in the size of scientific communities and the number of scientific publications. This phenomenon requires a continuous effort for ensuring the quality of publications and a healthy scientific evaluation process. Peer reviewing is the de facto mechanism to assess the quality of scientific work. For journal edito...
Conference Paper
Full-text available
Event-time based stream processing is concerned with analyzing data with respect to its generation time. In most of the cases, data gets delayed during its journey from the source(s) to the stream processing engine. This is known as late data arrival. Among the different approaches for out-of-order stream processing, low watermarks are proposed to...
Article
Stream processing can generate insights from big data in real time as it is being produced. This paper reports findings from a 2017 seminar on big stream processing, focusing on applications, systems, and languages.
Article
Even though several big data processing and analytics systems have been introduced with various design architectures, we are still lacking a deeper understanding of the performance characteristics for the various design architectures in addition to lacking comprehensive benchmarks for the various Big Data platforms. There is a crucial need to condu...
Article
This paper is a survey of recent stream processing languages, which are programming languages for writing applications that analyze data streams. Data streams, or continuous data flows, have been around for decades. But with the advent of the big-data era, the size of data streams has increased dramatically. Analyzing big data streams yields immens...
Article
Full-text available
Business processes represent a cornerstone to the operation of any enterprise. They are the operational means for such organizations to fulfill their goals. Nowadays, enterprises are able to gather massive amounts of event data. These are generated as business processes are executed and stored in transaction logs, databases, email correspondences,...
Article
Full-text available
The Resource Description Framework (RDF) represents a main ingredient and data representation format for Linked Data and the Semantic Web. It supports a generic graph-based data model and data representation format for describing things, including their relationships with other things. As the size of RDF datasets is growing fast, RDF data managemen...
Article
Full-text available
Recently, we have been witnessing huge advancements in the scale of data we routinely generate and collect in pretty much everything we do, as well as our ability to exploit modern technologies to process, analyze and understand this data. The intersection of these trends is what is nowadays called Big Data Science. Cloud computing represents...
Article
Full-text available
This study evaluates and compares the performance of different machine learning techniques in predicting individuals at risk of developing hypertension, and who are likely to benefit most from interventions, using cardiorespiratory fitness data. The dataset of this study contains information of 23,095 patients who underwent clinician-refer...
Article
Full-text available
Background Exercise capacity is associated with survival in the general population. Whether this applies to patients with treated depression is not clear. Hypothesis High exercise capacity remains associated with lower risk of all‐cause mortality (ACM) and nonfatal myocardial infarction (MI) among patients with treated depression. Methods We incl...
Chapter
Linked Data refers to links between data sources, as well as the practice of connecting data on the Web. In contrast to the isolated data silos of the conventional Web, the Semantic Web aims to interconnect these data so that all datasets contribute to a global data integration, connecting data from diverse domains and sources. In practice, all conce...
Chapter
Full-text available
Large RDF interconnected datasets, especially in the form of open as well as enterprise knowledge graphs, are constructed and consumed in several domains. Reasoning over such large knowledge graphs poses several performance challenges. In practice, although there has been some prior work on scalable approaches to RDF reasoning, the interest in this...
Chapter
Full-text available
With increasing sizes of RDF datasets, executing complex queries on a single node has become impractical, especially when the node’s main memory is dwarfed by the volume of the dataset. Therefore, there was a crucial need for distributed systems with a high degree of parallelism that can satisfy the performance demands of complex SPARQL querie...
Chapter
Standards and benchmarking have traditionally been used as the main tools to formally define and provably illustrate the level of the adequacy of systems to address the new challenges. In this chapter, we discuss benchmarks for RDF query engines and instance matching systems. In practice, benchmarks are used to inform users of the strengths and wea...
Chapter
We are witnessing a paradigm shift, where real-time, time-dependent data is becoming ubiquitous. As Linked Data facilitates the data integration process among heterogeneous data sources, RDF Stream Data has the same goal with respect to data streams. It bridges the gap between stream and more static data sources. To support the processing on RDF str...
Chapter
Full-text available
The wide adoption of the RDF data model has called for efficient and scalable RDF query processing schemes. As a response to this call, a number of centralized RDF query processing systems have been designed to tackle this challenge. In these systems, the storage and query processing of RDF datasets are managed on a single node. In this chapter, we...
Chapter
Congratulations! We have covered the technical details of storing, querying, reasoning, and provenance management of Linked Data in the previous eight chapters that we have just walked over.
Chapter
The term Provenance refers to the origin of information and is used to describe where and how the data was obtained. Provenance is versatile and could include various types of information, such as the source of the data, information on the processes that led to a certain result, date of creation or last modification, and authorship. Recording and m...
Article
Full-text available
Purpose of review: Cardiovascular diseases account for nearly one third of all deaths globally. Improving exercise capacity and cardiorespiratory fitness (CRF) has been an important target to reduce cardiovascular events. In addition, the American Heart Association defined decreased physical activity as the fourth risk factor for coronary artery d...
Article
Full-text available
Existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use post-processing deduplication to avoid the negative impact on I/O performance. However, neither of them works well in the cloud servers running multiple services for the following two reasons: Firstly, the temporal locality of duplic...
Book
This book describes efficient and effective techniques for harnessing the power of Linked Data by tackling the various aspects of managing its growing volume: storing, querying, reasoning, provenance management and benchmarking. To this end, Chapter 1 introduces the main concepts of the Semantic Web and Linked Data and provides a roadmap for the b...
Article
Full-text available
Background Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of this study is to present an evaluation and comparison of how mach...
Article
Full-text available
Since the boom in new proposals on techniques for efficient querying of XML data is now over and the research world has shifted its attention toward new types of data formats, we believe that it is crucial to review what has been done in the area to help users choose an appropriate strategy and scientists exploit the contributions in new areas of d...
Article
Previous studies have demonstrated that cardiorespiratory fitness is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of the analysis is to compare the prediction of 10 years of all-cause mortality (A...
Article
Full-text available
Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiores...
Data
Description of the final dataset. (DOCX)
Article
Full-text available
Clinical Research. Presentation type: Oral Presentation. Introduction: Frailty is a state of vulnerability and decreased physiological response to stressors. Saudi Vision 2030 stated a goal to increase Saudi life expectancy by 5 years. As the population ages, the prevalence of frailty is expected to increase. Thus, identifying tools and resources t...