
Md. Rezaul Karim
Fraunhofer FIT · Data Science & AI

Ph.D.

About

50
Publications
37,402
Reads
723
Citations
Introduction
I'm a Data Scientist at Fraunhofer FIT, Germany. I'm adept and experienced at: i) analyzing large-scale datasets, ii) developing accurate predictive models, iii) applying advanced analytical methods to deliver actionable insights, iv) implementing data-driven solutions to complex business and research problems and making them interpretable. I'm also a PhD candidate at RWTH Aachen University. My PhD is about improving algorithmic transparency and explainability of black-box machine learning model
Additional affiliations
July 2017 - November 2020
Fraunhofer Institute for Applied Information Technology FIT
Position
  • Researcher
September 2015 - December 2015
National University of Ireland, Galway
Position
  • Research Assistant
June 2015 - July 2017
National University of Ireland, Galway
Position
  • Researcher
Description
  • Biomedical Data Analytics
Education
July 2017 - September 2020
RWTH Aachen University
Field of study
  • Artificial Intelligence
June 2015 - June 2017
National University of Ireland, Galway
Field of study
  • Data Analytics
September 2010 - August 2012
Kyung Hee University
Field of study
  • Data Mining & Knowledge Discovery

Publications (50)
Conference Paper
Full-text available
The problem of finding frequent patterns has long been studied because it is essential to data mining tasks such as association rule analysis, clustering, and classification analysis. Privacy-preserving data mining is another important issue for this domain since most users do not want their private information to leak out. In this paper, we propo...
Article
Full-text available
Mining combined association rules with correlation and market basket analysis can discover customers' purchase rules along with frequently correlated, associated-correlated, and independent patterns synchronously, which are extraordinarily useful for making everyday business decisions. However, due to the main memory bottleneck in single comp...
Article
Full-text available
Current DNA sequence datasets have become extremely large, making it a great challenge for single-processor and main-memory-based computing systems to mine interesting patterns. Such limited hardware resources make the performance of most Apriori-like algorithms inefficient. However, recent implementation of a MapReduce framework has overcome these...
Chapter
Market basket analysis techniques are useful for extracting customers’ purchase behaviors or rules by discovering what items they buy together using association rules and correlation. Associated and correlated items are placed on neighboring shelves to raise their purchasing probability in a super shop. Therefore, the mining combined associat...
Conference Paper
Full-text available
Finding interesting patterns plays an important role in several data mining applications, such as market basket analysis, medical data analysis, and others. The occurrence frequency of patterns has been regarded as an important criterion for measuring interestingness of a pattern in several applications. However, temporal regularity of patterns can...
Preprint
Full-text available
Deep neural networks (DNNs) have been shown to outperform traditional machine learning algorithms in a broad variety of application domains due to their effectiveness in modeling intricate problems and handling high-dimensional datasets. Many real-life datasets, however, are of increasingly high dimensionality, where a large number of features may...
Preprint
Full-text available
Numerous works have been proposed to employ machine learning (ML) and deep learning (DL) techniques to utilize textual data from social media for anti-social behavior analysis such as cyberbullying, fake news propagation, and hate speech mainly for highly resourced languages like English. However, despite having a lot of diversity and millions of n...
Article
Artificial intelligence (AI) systems are increasingly used in health and personalized care. However, the adoption of data-driven approaches in many clinical settings has been hampered due to their inability to perform in a reliable and safe manner to leverage accurate and trustworthy diagnoses. A critical and challenging usage scenario for AI is ai...
Article
Full-text available
Osteoarthritis (OA) is a degenerative joint disease, which significantly affects middle-aged and elderly people. Although primarily identified via hyaline cartilage change based on medical images, technical bottlenecks like noise, artifacts, and modality impose an enormous challenge on high-precision, objective, and efficient early quantification o...
Preprint
Full-text available
The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices, but also enables people to express anti-social behavior like online harassment, cyberbullying, and hate speech. Numerous works have been proposed to utilize these data for social and anti-social behavi...
Article
Full-text available
An accurate diagnosis and prognosis for cancer are specific to patients with particular cancer types and molecular traits, which need to be addressed carefully. The discovery of important biomarkers is becoming an important step toward understanding the molecular mechanisms of carcinogenesis in which genomics data and clinical outcomes need to be analy...
Article
Full-text available
Background: Sharing sensitive data across organizational boundaries is often significantly limited by legal and ethical restrictions. Regulations such as the EU General Data Protection Regulation (GDPR) impose strict requirements concerning the protection of personal and privacy-sensitive data. Therefore, new approaches, such as the Personal Health Trai...
Article
Full-text available
The study of genetic variants (GVs) can help find correlating population groups to identify cohorts that are predisposed to common diseases and explain differences in disease susceptibility and how patients react to drugs. ML algorithms are increasingly being applied to identify interacting GVs to understand their complex phenotypic traits. In this...
Preprint
Full-text available
The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices but also enables people to express anti-social behaviour like online harassment, cyberbullying, and hate speech. Numerous works have been proposed to utilize these data for social and anti-social behavi...
Preprint
Full-text available
Amid the coronavirus disease (COVID-19) pandemic, humanity is experiencing a rapid increase in infection numbers across the world. A challenge hospitals face in the fight against the virus is the effective screening of incoming patients. One methodology is the assessment of chest radiography (CXR) images, which usually requires expert radiologi...
Article
Full-text available
Clustering is central to much data-driven bioinformatics research and serves as a powerful computational method. In particular, clustering helps in analyzing unstructured and high-dimensional data in the form of sequences, expressions, texts and images. Further, clustering is used to gain insights into biological processes at the genomics level, e.g....
Article
Full-text available
In recent years, as newer technologies have evolved around the healthcare ecosystem, more and more data have been generated. Advanced analytics could power the data collected from numerous sources, whether from healthcare institutions or generated by individuals themselves via apps and devices, and lead to innovations in treatment and diagnosis of di...
Article
Full-text available
Cancer is one of the deadliest diseases caused by abnormal behaviors of genes that control the cell division and growth. Genomics data and clinical outcomes from multiplatform and heterogeneous sources are used to make clinical decisions for the cancer patients, where both multimodality and heterogeneity impose significant challenges to bioinformat...
Preprint
Full-text available
The discovery of important biomarkers is a significant step towards understanding the molecular mechanisms of carcinogenesis, enabling accurate diagnosis for, and prognosis of, a certain cancer type. Before recommending any diagnosis, genomics data such as gene expressions (GE) and clinical outcomes need to be analyzed. However, complex nature, high...
Conference Paper
Interference between pharmacological substances can cause serious medical injuries. Correctly predicting so-called drug-drug interactions (DDI) does not only reduce these cases but can also result in a reduction of drug development cost. Presently, most drug-related knowledge is the result of clinical evaluations and post-marketing surveillance; re...
Preprint
Full-text available
Interference between pharmacological substances can cause serious medical injuries. Correctly predicting so-called drug-drug interactions (DDI) does not only reduce these cases but can also result in a reduction of drug development cost. Presently, most drug-related knowledge is the result of clinical evaluations and post-marketing surveillance; re...
Article
Full-text available
With the rapid advancements of ubiquitous information and communication technologies, a large number of trustworthy online systems and services have been deployed. However, cybersecurity threats are still mounting. An intrusion detection (ID) system can play a significant role in detecting such security threats. Thus, developing an intelligent and...
Article
Full-text available
Every day we experience unprecedented data growth from numerous sources, which contribute to big data in terms of volume, velocity, and variability. These datasets again impose great challenges to analytics framework and computational resources, making the overall analysis difficult for extracting meaningful information in a timely manner. Thus, to...
Preprint
Full-text available
The understanding of variations in genome sequences assists us in identifying people who are predisposed to common diseases, solving rare diseases, and finding the corresponding population group of the individuals from a larger population group. Although classical machine learning techniques allow researchers to identify groups (i.e. clusters) of r...
Book
Full-text available
Book Description: Deep learning is a branch of machine learning algorithms based on learning multiple levels of abstraction. Neural networks, which are at the core of deep learning, are being used in predictive analytics, computer vision, natural language processing, time series forecasting, and to perform a myriad of other complex tasks. This book...
Article
Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is substantially important for business intelligence. MFPs, as the smallest set of patterns, help to reveal customers’ purchase rules and market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them...
Article
Full-text available
Background: Next Generation Sequencing (NGS) is playing a key role in therapeutic decision making for cancer prognosis and treatment. NGS technologies are producing a massive amount of sequencing datasets. Often, these datasets are published from isolated and different sequencing facilities. Consequently, the process of sharing and aggre...
Conference Paper
Integration of complementary functional annotations is an important step in understanding the gene-disease association and the mechanisms underlying complex diseases. The increasing amount of biomedical datasets and their availability drive the demand for parallel and distributed computing, which then imposes a need for scalable and high throughput...
Article
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) an...
Article
Full-text available
Microarray gene expression techniques and tools have become substantially important and widely used in protein-protein interaction (PPI) and gene regulation network (GRN) research in recent years, since they can capture the expressions of thousands of genes in a single experiment. Such datasets pose a great challenge for finding associ...
Conference Paper
Full-text available
Market basket analysis techniques are substantially important to everyday business decisions because of their capability of extracting customers' purchase rules by discovering what items they are buying frequently and together. However, traditional single-processor and main-memory-based computing is not capable of handling ever increasing large tra...
Conference Paper
Full-text available
Market basket analysis is very important to everyday business decisions because it seeks to find relationships between purchased items. Undoubtedly, these techniques can extract customers’ purchase rules by discovering what items they are buying frequently and together. Therefore, to raise the probability of purchasing, the corporate manager of a...
Conference Paper
Full-text available
Contemporary web browsers do not provide customized recommendations for users; rather, they offer only some suggestions based on cookies or browsing history after content filtering. Usually, most users provide some keywords to search the contents of their preferred websites, and based on these keywords web servers provide the contents. So, it w...
Article
Contemporary web browsers do not provide customized recommendations for users; rather, they offer only some suggestions based on cookies or browsing history after content filtering. Usually, most users provide some keywords to search the contents of their preferred websites, and based on these keywords web servers provide the contents. So, it w...
Chapter
Full-text available
The problem of finding frequent patterns has long been studied because it is essential to data mining tasks such as association rule analysis, clustering, and classification analysis. Privacy-preserving data mining is another important issue for this domain since most users do not want their private information to leak out. In this paper, we propo...
Article
Full-text available
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still...
Article
Full-text available
Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in fi...
Article
Market basket analysis techniques are useful for extracting customers' purchase behaviors or rules by discovering what items they buy together using association rules and correlation. Associated and correlated items are placed on neighboring shelves to raise their purchasing probability in a super shop. Therefore, the mining combined associat...
Conference Paper
Full-text available
Biological sequences such as DNA sequences have a great number of contiguous patterns which consist of a lot of frequent items. Recently, mining contiguous frequent patterns from DNA sequences has become one of the most challenging tasks in bioinformatics. The aim of mining contiguous frequent sequences is to analyze the important biological function hidden in th...
