Hasan Kurban

Verified
Hasan verified their affiliation via an institutional email.
  • Ph.D. in Computer Science
  • Assistant Professor at Hamad bin Khalifa University

About

52
Publications
10,197
Reads
349
Citations
Current institution
Hamad bin Khalifa University
Current position
  • Assistant Professor
Additional affiliations
August 2017 - August 2018
Indiana University Bloomington
Position
  • Assistant Professor
Education
August 2012 - September 2017
Indiana University Bloomington
Field of study
  • Computer Science

Publications

Article
Machine learning (ML) has recently been used to make sense of large volumes of data, with data-driven methods identifying correlations and then examining material properties in detail. Herein, we analyze the correlations between structural and electronic properties of ZnO nanoparticles (NPs) obtained from the density-functional tight-binding method using Dat...
Article
This interdisciplinary study is conducted to answer two important questions that researchers often face in the Machine Learning (ML) and Material Science (MS) fields. In this work, we measure the performance of the most popular ML algorithms (more than a dozen) on the rare-class learning problem and determine the best learning algorithm for atom...
Article
Full-text available
Existing data mining techniques, more particularly iterative learning algorithms, become overwhelmed with big data. While parallelism is an obvious and, usually, necessary strategy, we observe that both (1) continually revisiting data and (2) visiting all data are two of the most prominent problems especially for iterative, unsupervised algorithms...
Conference Paper
Full-text available
Using a large volume of bus data in the form of GPS coordinates (over 100 million data points) and automated passenger count data (over 1 million data points), we have developed (1) a system for the analysis and prediction of future public transportation demand, (2) a new model that uses concepts specific to college campuses that maximizes passenger satis...
Article
Full-text available
Because some disease symptoms relate to many areas of medical treatment, patients have difficulty when booking appointments for treatment. For example, the complaint of a patient with abdominal pain may relate to any of the internal medicine, surgery, or infectious diseases departments. In this study, the Republic of Turkey Ministry of Health-affiliated...
Preprint
Full-text available
Large Language Models (LLMs) are increasingly used in various contexts, yet remain prone to generating non-factual content, commonly referred to as "hallucinations". The literature categorizes hallucinations into several types, including entity-level, relation-level, and sentence-level hallucinations. However, existing hallucination datasets often...
Preprint
Full-text available
Large language models (LLMs) are increasingly deployed across diverse domains, yet they are prone to generating factually incorrect outputs - commonly known as "hallucinations." Among existing mitigation strategies, uncertainty-based methods are particularly attractive due to their ease of implementation, independence from external data, and compat...
Preprint
Full-text available
Despite the state-of-the-art performance of Large Language Models (LLMs), these models often suffer from hallucinations, which can undermine their performance in critical applications. In this work, we propose SAFE, a novel method for detecting and mitigating hallucinations by leveraging Sparse Autoencoders (SAEs). While hallucination detection tec...
Article
Full-text available
This paper introduces \(p\)-ClustVal, a novel data transformation technique inspired by \(p\)-adic number theory that significantly enhances cluster discernibility in genomics data, specifically single-cell RNA sequencing (scRNASeq). By leveraging \(p\)-adic-valuation, \(p\)-ClustVal integrates with and augments widely used clustering algorithms an...
Article
Full-text available
Pediatric diabetes I is an endemic and an especially difficult disease; indeed, at this point, there does not exist a cure, but only careful management that relies on anticipating hypoglycemia. The changing physiology of children producing unique blood glucose signatures, coupled with inconsistent activities, e.g., playing, eating, napping, makes “...
Article
Full-text available
Simulating complex and large materials is a challenging task that requires extensive domain knowledge and computational expertise. This study introduces Pure2DopeNet, an innovative multimodal neural network that tackles these challenges by integrating image and text data to accurately predict the physical properties of doped compounds, specifically...
Preprint
Full-text available
This paper introduces p-ClustVal, a novel data transformation technique inspired by p-adic number theory that significantly enhances cluster discernibility in genomics data, specifically Single Cell RNA Sequencing (scRNASeq). By leveraging p-adic Valuation, p-ClustVal integrates with and augments widely used clustering algorithms and dimension redu...
Conference Paper
This paper introduces p-ClustVal, a novel data transformation technique inspired by p-adic number theory that significantly enhances cluster discernibility in genomics data, specifically Single Cell RNA Sequencing (scRNASeq). By leveraging p-adic-valuation, p-ClustVal integrates with and augments widely used clustering algorithms and dimension r...
Article
In this study, an innovative approach is explored that combines Density Functional Tight Binding (DFTB) with Computer Vision (CV) techniques to analyze the electronic structure and enhance the photocatalytic capabilities of carbon-doped titanium oxide nanoparticles (C-doped TiO2 NPs). The findings reveal that C doping, in levels ranging from 0.1% t...
Article
Full-text available
In this study, we introduce a novel de Bruijn graph (dBG) based framework for feature engineering in biological sequential data such as proteins. This framework simplifies feature extraction by dynamically generating high-quality, interpretable features for traditional AI (TAI) algorithms. Our framework accounts for amino acid substitutions by effi...
Conference Paper
Full-text available
Data-centric AI (DCAI) is an emerging paradigm that prioritizes the quality, diversity, and representation of data over model architecture and hyperparameter tuning. DCAI emphasizes upstream data operations such as cleaning, balancing, and preprocessing, rather than solely focusing on downstream model selection and optimization. This work aims to p...
Article
This paper addresses the challenge of optimizing user grouping in Index Modulation-based Orthogonal Frequency-Division Multiple Access (IM-OFDMA) systems within dynamic and stochastic noise environments. Utilizing the eXtreme Gradient Boosting (XGBoost) machine learning algorithm, we devised a framework capable of accurately predicting the optimali...
Conference Paper
Full-text available
Fantasy Sports has a current market size of $27B and is expected to grow to more than $84B in less than a decade. The intent is to create virtual teams that somehow reflect what would happen if the constituent players actually played in a team. Using individual player and team statistics, models can be trained to predict an outcome. But fans are left...
Conference Paper
Full-text available
Sports awards have become almost as popular as the sports themselves, bringing not only recognition but also increases in salary, more control over decisions usually in the hands of coaches and general managers, and other benefits. Awards are so popular that even at the start of a season pundits and amateurs alike predict or argue for athletes. It...
Conference Paper
Full-text available
The Least Squares method is the oldest ML algorithm but remains the most popular and ubiquitous across domains. Now, with powerful, cheap technology and the popular open-source language Python, users of every sort have access to various libraries, all serving ordinary least squares. In this work, we ask the question of whether users can count on th...
Article
Full-text available
Impedance spectroscopy is a powerful technique that is broadly used for battery characterization. In this study, we introduce a novel machine learning framework we call the duplex (for paired outputs) that constructs a linear ensemble of the best k models. Several impedance spectra of commercial lithium-ion battery coin cells at various states of charge and amb...
Preprint
Full-text available
Impedance spectroscopy is a powerful technique that is broadly used for battery characterization. In this study, we introduce a novel methodology that devises a system to utilize the experimental impedance data by processing each one of the parameters with the most favorably designated machine learning techniques. Several impedance spectra of commercial...
Article
Full-text available
This paper describes and provides the data on the regenerated-impedance spectra that are computed from experimental results of electrochemical impedance spectroscopy measurements taken from a commercial Li-ion battery. The empirical impedance data of secondary coin type Li-ion batteries were collected in different states of charge ranging from empty...
Data
The experimental impedance data were collected at cell potentials of 3.2 V, 3.4 V, 3.6 V, 3.8 V, 4.0 V, and 4.2 V (corresponding state-of-charge values: 0%, 3%, 8%, 40%, 78%, 100%). Impedance data are generated by a Machine Learning model.
Article
Full-text available
The use of Electrochemical Impedance Spectroscopy on rechargeable Lithium-ion battery characterization is an extensively recognized non-destructive procedure for both in-situ and ex-situ analyses. In an impedance measurement for a rechargeable battery, the oscillating current with an accompanying phase angle is the response for a potential perturba...
Article
Full-text available
Predicting material properties by solving the Kohn‐Sham (KS) equation, which is the basis of modern computational approaches to electronic structures, has provided significant improvements in materials sciences. Despite its contributions, both DFT and DFTB calculations are limited by the number of electrons and atoms that translate into increasingl...
Article
Full-text available
In recent years, the introduction of single-cell RNA sequencing (scRNAseq) has enabled the analysis of a cell’s transcriptome at an unprecedented granularity and processing speed. The experimental outcome of applying this technology is an M × N matrix containing aggregated mRNA expression counts of M genes and N cell samples. From this matrix, scien...
Preprint
Full-text available
Background: In recent years, the introduction of single-cell RNA sequencing (scRNA-seq) has enabled the analysis of a cell's transcriptome at an unprecedented granularity and processing speed. The experimental outcome of applying this technology is an $M \times N$ matrix containing aggregated mRNA expression counts of $M$ genes and $N$ cell samples....
Article
Full-text available
Clustering is intractable, so techniques exist to give a best approximation. Expectation Maximization (EM), initially used to impute missing data, is among the most popular. Parameters of a fixed number of probability distributions (PDF) together with the probability of a datum belonging to each PDF are iteratively computed. EM does not scale with...
Conference Paper
Full-text available
With the unimaginable continual growth of data and the focus on its use rather than its governance, the value of data has begun to deteriorate, as seen in the lack of reproducibility, validity, provenance, etc. In this work, we aim to simply understand what the value of data is and how this basic understanding might affect existing AI algorithms, i...
Article
In this work, we perform a theoretical analysis of structural, electronic, and optical properties of pure and Mg-doped amorphous ZnO nanoparticles (a-ZnO NPs) using the DFTB method. Our results show that Zn atoms are more preferential for Mg atoms than for O atoms because the number of Mg-Zn bonds is greater than that of Mg-O. The rise in the content o...
Article
In this study, we built a variety of Machine Learning (ML) systems over 23 different sizes of CH3NH3PbI3 perovskite nanoparticles (NPs) to predict the atoms in the NPs from their geometric locations. Our findings show that a specific type of ML algorithms, tree-based models which are Random Forest (RF), Extreme Gradient Boosting (XGBoost), Decision...
Article
Full-text available
Machine learning (ML) has recently made a major contribution to the field of Material Science (MS). In this study, ML algorithms are used to learn atom types over structural geometrical data of anatase TiO2 nanoparticles produced at different temperature levels with the density-functional tight-binding method (DFTB). Especially for this work, Ran...
Article
Structural, energetic, electronic, reactivity and stability properties of armchair (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9) and (10,10) aluminum nitride nanotubes (AlNNTs) with different diameters have been probed using density functional theory (DFT) in terms of Moreover, the chemical reactivity characteristics of AlNNTs have performed via s...
Article
Full-text available
In this work, we analyze the correlations between structural and electronic properties of anatase, brookite, and rutile phase TiO2 nanoparticles (NPs) using data science techniques. For this purpose, we use the geometries of the three TiO2 NP phases under heat treatment obtained from molecular dynamics (MD) simulations in the frame of DFTB+ code. We i...
Article
Full-text available
In this work, we used the density-functional tight-binding (DFTB) method to investigate ZnO nanoparticle (NP) properties, i.e., the structural and electronic properties. First, a ~0.9 nm ZnO NP containing 258 atoms was characterized from a 30×30×30 supercell based on the hexagonal crystal structure of ZnO. Second, HOMO, LUMO electronic properties, band...
Article
Full-text available
We carried out a thorough examination of the structural and electronic features of undoped and Nitrogen (N)-doped ZnO nanoparticles (NPs) by the density-functional tight-binding (DFTB) method. By increasing the percent of N atoms in undoped ZnO NPs, the number of bonds (n), order parameter (R) and radial distribution function (RDF) of two-body inte...
Article
Full-text available
We perform a theoretical investigation using the density functional tight binding (DFTB) approach for the structural analysis and electronic structure of copper hydride (CuH) metallic nanoparticles (NPs) of different sizes (from 0.7 to 1.6 nm). By increasing the size of CuH NPs, the number of bonds, segregation phenomena and radial distribution func...
Conference Paper
Full-text available
Without question, astronomy is about Big Data, and clustering is a very common task in the astronomy domain. The expectation-maximization algorithm is among the top 10 data mining algorithms used in scientific and industrial applications; however, we observe that the astronomical community does not make use of it as a clustering algorithm. In this work, w...
Conference Paper
Full-text available
Iterative machine learning algorithms, e.g., k-means (KM) and expectation maximization (EM), become overwhelmed with big data since all data points are being continually and indiscriminately visited while a cost is being minimized. In this work, we demonstrate (1) an optimization approach to reduce training run-time complexity of iterative machine lear...
Conference Paper
Full-text available
Only a few years ago, stellar data sets were measured in the 0.1M of objects. Now, sets are routinely 1M. With the launch of ESA’s Gaia in 2013, we expect 1000M stellar objects measured more precisely and with more measurements. Without question, astronomy is about Big Data, and clustering is a very common task in the astronomy domain. The expectation-maximizati...
Article
Full-text available
Existing data mining techniques, more particularly iterative learning algorithms, become overwhelmed with big data. While parallelism is an obvious and, usually, necessary strategy, we observe that both (1) continually revisiting data and (2) visiting all data are two of the most prominent problems especially for iterative, unsupervised algorithms...
Conference Paper
Full-text available
Existing data mining techniques, more particularly iterative learning algorithms, become overwhelmed with big data. While parallelism is an obvious and, usually, necessary strategy, we observe that both (1) continually revisiting data and (2) visiting all data are two of the most prominent problems especially for iterative, unsupervised algorithms...
Conference Paper
Full-text available
The challenges presented by data to scientific inquiry and hypothesis testing in an oceanographic setting are not new problems. Indeed, the challenges are at least a century old. The problems are not with the data itself, but rather with the attention to the management of the "data ecology" in the information systems. Data needs to be accessible as...
Conference Paper
Full-text available
Random Forests have been used as effective ensemble models for classification. We present in this paper a new type of Random Forests (RFs) called Red(uced)-RF that adopts a new dynamic data reduction principle and a new voting mechanism called Priority Vote Weighting (PV) which improve accuracy, execution time and AUC values compared to Breiman’s R...
Article
Full-text available
Random forests have been used as effective models to tackle a number of classification and regression problems. In this paper, we present a new type of Random Forests (RFs) called Red(uced)-RF that adopts a new voting mechanism called Priority Vote Weighting (PV) and a new dynamic data reduction principle which improve accuracy and execution time c...
Article
Full-text available
Dramatic increases in the amount and complexity of stellar data must be matched by new or refined algorithms that can help scientists make sense of this data and so better understand the universe. ParaHeap-k is a parallel cluster algorithm for analyzing big data that can potentially prove useful to astronomical research.
