Amir Atiya

  • B.S. Cairo University, Ph.D. Caltech
  • Professor at Cairo University

About

Publications: 226
Reads: 167,710
Citations: 11,871
Current institution
Cairo University
Current position
  • Professor
Additional affiliations
January 2001 - present
Texas A&M University
March 1993 - December 2009
Cairo University
September 1985 - September 1990
California Institute of Technology

Publications

Article
Imbalanced data is an issue that affects various applications in machine learning and data science. Synthetic minority oversampling technique (SMOTE) is a common method used to artificially balance the data. Despite the popularity of SMOTE, there is limited information about its analytical properties. In this paper, we develop a precise theoretical...
Conference Paper
Full-text available
The usefulness of the oversampling approach to class-imbalanced structured medical datasets is discussed in this paper. In this regard, we basically look into the oversampling approach’s prevailing assumption that synthesized instances do belong to the minority class. We used an off-the-shelf over-sampling validation system to test this assumption....
Article
Full-text available
Class imbalance occurs when the class distribution is not equal. Namely, one class is under-represented (minority class), and the other class has significantly more samples in the data (majority class). The class imbalance problem is prevalent in many real world applications. Generally, the under-represented minority class is the class of interest....
Preprint
Imbalanced data is a frequently encountered problem in machine learning. Despite a vast amount of literature on sampling techniques for imbalanced data, there is a limited number of studies that address the issue of the optimal sampling ratio. In this paper, we attempt to fill the gap in the literature by conducting a large scale study of the effec...
Article
Full-text available
Word embeddings map words into vectors in an N-dimensional space. ArSphere is an approach that designs word embeddings for the Arabic language. This approach overcomes one of the shortcomings of word embeddings (for the English language too), namely their inability to handle opposites (and differentiate those from unrelated word pairs)...
Article
Deep neural networks (DNNs) have achieved state-of-the-art performance in numerous fields. However, DNNs need high computation times, and people always expect better performance with lower computation. Therefore, we study the human somatosensory system and design a neural network (SpinalNet) to achieve higher accuracy with fewer computations. H...
Article
Full-text available
The price demand relation is a fundamental concept that models how price affects the sale of a product. It is critical to have an accurate estimate of its parameters, as it will impact the company’s revenue. The learning has to be performed very efficiently using a small window of a few test points, because of the rapid changes in price demand para...
Article
Full-text available
In this work, we propose a novel recommender system model based on a technology commonly used in natural language processing called word vector embedding. In this technology, a word is represented by a vector that is embedded in an n-dimensional space. The distance between two vectors expresses the level of similarity/dissimilarity of their underly...
Article
Full-text available
Dynamic pricing is a beneficial strategy for firms seeking to achieve high revenues. It has been widely applied to various domains such as the airline industry, the hotel industry, and e-services. Dynamic pricing is basically the problem of setting time-varying prices for a certain product or service for the purpose of optimizing revenue. However,...
Chapter
Full-text available
Hotel reviews are an important driving factor for hotel business. They can help guests make informed hotel selections, and help hotels tackle their deficiencies and improve their performance. In this paper, we propose an opinion mining approach that is applied to hotel reviews. The approach combines both lexical and word vector methods to clas...
Article
Scaling up the support vector machine (SVM) to large data sets remains one of its main challenges. One way to achieve this is to break down the problem into smaller ones using clustering techniques where local SVM models are constructed. Although this approach is considerably faster than the standard SVM, its performance is sometimes inferior eve...
Article
Full-text available
A variety of screening approaches have been proposed to diagnose epileptic seizures, using electroencephalography (EEG) and magnetic resonance imaging (MRI) modalities. Artificial intelligence encompasses a variety of areas, and one of its branches is deep learning (DL). Before the rise of DL, conventional machine learning algorithms involving feat...
Preprint
Understanding data and reaching valid conclusions are of paramount importance in the present era of big data. Machine learning and probability theory methods have widespread application for this purpose in different fields. One critically important yet less explored aspect is how data and model uncertainties are captured and analyzed. Proper quanti...
Preprint
Full-text available
Deep neural networks (DNNs) have achieved state-of-the-art performance in numerous fields. However, DNNs need high computation times, and people always expect better performance with lower computation. Therefore, we study the human somatosensory system and design a neural network (SpinalNet) to achieve higher accuracy with lower computation tim...
Preprint
Full-text available
A variety of screening approaches have been proposed to diagnose epileptic seizures, using Electroencephalography (EEG) and Magnetic Resonance Imaging (MRI) modalities. Artificial intelligence encompasses a variety of areas, and one of its branches is deep learning. Before the rise of deep learning, conventional machine learning algorithms involvin...
Article
Full-text available
Embedding words from a dictionary as vectors in a space has become an active research field, due to its many uses in several natural language processing applications. Distances between the vectors should reflect the relatedness between the corresponding words. The problem with existing word embedding methods is that they often fail to distinguish b...
Article
Full-text available
Recently, active learning has been considered a promising approach for data acquisition due to the significant cost of the data labeling process in many real world applications, such as natural language processing and image processing. Most active learning methods are designed merely to enhance the learning model accuracy. However, the model accuracy may...
Article
Full-text available
Forecast combinations were big winners in the M4 competition. This note reflects on and analyzes the reasons for the success of forecast combination. We illustrate graphically how and in what cases forecast combinations produce good results. We also study the effects of forecast combination on the bias and the variance of the forecast.
Article
The support vector machine (SVM) has recently been considered one of the most efficient classifiers. However, the time complexity of kernel SVM, which is quadratic in the number of training patterns, makes it impractical to apply to large data sets. In such a case, the complexity is further increased when an exhaustive grid search is used to fi...
Article
The maximum drawdown (MDD) is a well-known risk measure extensively used in financial markets. It measures the maximum loss from peak to subsequent valley for a stochastic process. In this work we consider discrete time processes, and derive the probability density of the maximum drawdown in terms of integral equation recursions. This is one of the...
Data
The code for the Exponential smoothing method with maximum likelihood estimation.
Chapter
Full-text available
The growth of social media has made Arabic sentiment analysis an active research area. The challenges lie in the fact that most users write unstructured dialect texts instead of writing in Modern Standard Arabic (MSA). In this paper we address these challenges by comparing two strategies: applying sentiment analysis algorithms directly on t...
Article
Dynamic pricing is the science of pricing a product in a time-varying way for optimising revenue. There is a slow but steady tendency over the last three decades for major businesses to move from fixed pricing to dynamic pricing. In this paper, we consider the problem of dynamic pricing for wireless broadband data. We propose a novel dynamic pricin...
Conference Paper
Full-text available
The growth of social media has made Arabic sentiment analysis an active research area. The challenges lie in the fact that most users write unstructured dialect texts instead of writing in Modern Standard Arabic (MSA). In this paper we address these challenges by comparing two strategies: applying sentiment analysis algorithms directly on...
Article
In this paper, we consider the problems of state estimation and parameter estimation. The goal is to consider the robust unscented Kalman filter and demonstrate its successful application to a coupled tank system. The traditional unscented Kalman filter is limited in estimating the state and parameters of a system with time-varying parameters due to making...
Research
This issue includes the following articles: P1121606475, M. AbdelDayem, H. Hemeda, and A. Sarhan, "Enhanced User Authentication through Keystroke Biometrics for Short-Text and Long-Text Inputs"; P1121602462, Eslam Mahmoud, Ahmed M. Elmogy, and Amany Sarhan, "Enhancing Grid Local Outlier Factor Algorithm for better O...
Article
Full-text available
This paper implements a new leukemia identification method which depends on Mel frequency cepstral coefficient (MFCC) feature extraction and wavelet transform. Leukemia identification is a measurement of blood cell features for detecting the blood cancer of a patient. Blood cell feature extraction is based on transforming the blood cell two dimensi...
Article
Text diacritic restoration is a vital problem for languages that use diacritics in their orthography systems. It plays an important role in improving the performance of many NLP tasks. In this paper, we handle the problem of Arabic text diacritization: our system diacritizes an input sequence of words both morphologically a...
Conference Paper
Full-text available
This paper compares seven greedy sparse approximation algorithms with L0-norm regularization for the purpose of time series forecasting. Sparse approximation is used as a method of memory-based learning, where a dictionary is created from the time series lagged vectors, along with their corresponding targets. This dictionary is then used fo...
Conference Paper
Full-text available
In this work, we propose using sparse coding techniques to learn models for time series forecasting. Training data are extracted from the input time series as a set of time-lagged predictors along with their corresponding targets. These time-lagged predictors are sparsely decomposed and transformed into the sparse domain. The...
Article
In this work we consider dynamic pricing for the case of continuous replenishment. An essential ingredient in such a formulation is the use of time normalized revenue or profit function, in other words revenue or profit per unit time. This provides the incentive to sell many items in the shortest time (and of course at a high price). Moreover, for...
Conference Paper
The spread of breast cancer and its high fatality have spurred a lot of research into its causes and treatments. Since the discovery of gene extraction methods, many biomarkers have been investigated and related to cancer. The large number of genes and their intertwining relations necessitates advanced machine learning models, rath...
Conference Paper
Keyphrase extraction is of considerable importance in many applications such as search engine optimization, clustering, summarization, and sentiment analysis. The importance of keyphrases comes from the semantic meaning they provide, as they can be used as descriptors for the documents. In this paper we compare four approaches for extracting keyphr...
Article
Multistep-ahead forecasts can either be produced recursively by iterating a one-step-ahead time series model or directly by estimating a separate model for each forecast horizon. In addition, there are other strategies; some of them combine aspects of both aforementioned concepts. In this paper, we present a comprehensive investigation into the bia...
Conference Paper
Full-text available
The telecommunications industry is highly competitive, and operators’ strategies usually rely on significantly reducing the minute rate in order to acquire more subscribers and thus gain a higher market share. However, in the last few years, the number of customers has increased noticeably, leading to more stress on the network, and higher congestion...
Article
This article introduces a new scheme to express a rectangular function as a linear combination of Gaussian functions. The main idea of this scheme is based on fitting samples of the rectangular function by adapting the well-known clustering algorithm, Gaussian mixture models (GMM). This method has several advantages compared to other existing fitti...
Conference Paper
Full-text available
In this paper a new time dependent pricing scheme is proposed for revenue management in mobile calls. The proposed scheme considers many essential parameters that affect pricing such as time-of-day seasonality, weekday/weekend seasonality and price demand elasticity for call arrivals and call duration. In this model, each day is partitioned int...
Conference Paper
Full-text available
In this paper we tackle the overbooking problem in hotel revenue management (RM). We propose a simulation-based approach for the overbooking problem. It is based on accurately estimating all the hotel's processes, such as reservation arrivals, cancellations, length of stay, demand seasonality, etc. Subsequently, all these processes are simulated...
Conference Paper
The presence of missing data in time series is a big impediment to the successful performance of forecasting models, as it leads to a significant reduction of useful data. In this work we propose a multiple-imputation-type framework for estimating the missing values of a time series. This framework is based on iterative and successive forward and bac...
Conference Paper
Full-text available
In this work, we address the problem of spelling correction in the Arabic language utilizing the new corpus provided by the QALB (Qatar Arabic Language Bank) project, which is an annotated corpus of sentences with errors and their corrections. The corpus contains edit, add before, split, merge, add after, move and other error types. We are concerned wit...
Article
In this article, we derive a series expansion of the multivariate normal probability integrals based on Fourier series. The basic idea is to transform the limits of each integral from h_i to ∞ to be from -∞ to ∞ by multiplying the integrand by a periodic square wave that approximates the domain of the integral. This square wave is expressed by its...
Article
This paper derives the value of the integral of the product of the error function and the normal probability density as a series of the Hermite polynomial and the normalized incomplete Gamma function. This expression is beneficial, and can be used for evaluating the bivariate normal integral as a series expansion. This expansion is a good alternati...
Article
Full-text available
In this paper, we present a method of determining the parameters of a dynamic system using a state estimation filter. State estimation filters such as the extended Kalman filter and the unscented Kalman filter are widely used to estimate the state in robot and GPS navigation systems. However, in dynamic systems, determining parameters is difficult because m...
Conference Paper
Full-text available
We introduce LABR, the largest sentiment analysis dataset to date for the Arabic language. It consists of over 63,000 book reviews, each rated on a scale of 1 to 5 stars. We investigate the properties of the dataset, and present its statistics. We explore using the dataset for two tasks: sentiment polarity classification and rating classifica...
Article
Full-text available
In this article we propose a new dynamic pricing approach for the hotel revenue management problem. The proposed approach is based on having ‘price multipliers’ that vary around ‘1’ and provide a varying discount/premium over some seasonal reference price. The price multipliers are a function of certain influencing variables (for example, hotel occ...
Article
The k-center problem arises in many applications such as facility location and data clustering. Typically, it is solved using a branch and bound tree traversed using the depth first strategy. The reason is its linear space requirement compared to the exponential space requirement of the breadth first strategy. Although the depth first strategy gain...
Article
A method for solving quasiconvex nondifferentiable unconstrained multiobjective optimization problems is proposed in this paper. This method extends the classical subgradient method for real-valued minimization to the multiobjective case. Assuming ...
Article
Gaussian process is a very promising novel technology that has been applied to both the regression problem and the classification problem. While for the regression problem it yields simple exact solutions, this is not the case for the classification problem, because we encounter intractable integrals. In this paper we develop a new derivation that...
Article
Full-text available
In this paper, a novel algorithm is proposed for sampling from discrete probability distributions using the probability proportional to size sampling method, which is a special case of the quota sampling method. The motivation for this study is to devise an efficient sampling algorithm that can be used in stochastic optimization problems --when there i...
Chapter
Quality of Service (QoS) of telecommunication networks could be enhanced by applying predictive control methods. Such controllers rely on utilizing good and fast (real-time) predictions of the network traffic and quality parameters. Accuracy and recall speed of the traditional Neural Network models are not satisfactory to support such critical real...
Chapter
Full-text available
This chapter reviews a recent HONN-like model called Symbolic Function Network (SFN). This model is designed to impart more flexibility than both traditional neural networks and HONNs. The main idea behind this scheme is the fact that different functional forms suit different applications and that no specific architecture is best for...
Conference Paper
Full-text available
In this paper we propose a new unconstraining method for demand forecasting. Since true demand forecasting is a key aspect of hotel room revenue management systems, inaccurate forecasts will significantly impact the performance of these systems. We propose a method based on a Monte Carlo simulation forecasting model and an Expectation-Maximization...
Conference Paper
Penalized likelihood is a general approach whereby an objective function is defined, consisting of the log likelihood of the data minus some term penalizing non-smooth solutions. Subsequently, this objective function is maximized, yielding a solution that achieves some sort of trade-off between the faithfulness and the smoothness of the fit. In thi...
Article
In this paper, the states and parameters in a dynamic system are estimated by applying an Unscented Kalman Filter (UKF). The UKF is widely used in various fields such as sensor fusion, trajectory estimation, and learning of Neural Network weights. These estimations are necessary and important in determining the stability of a mobile system, monitor...
Article
Stock market forecasting has been considered an extremely challenging problem, and its predictability caused a debate that lasted for years. In this paper, we present a hidden Markov model that models the up-trend behavior of stocks. This model makes use of the fact that during up-trends the market frequently pauses and undergoes a pull-back. Altho...
Article
Multi-step ahead forecasting is still an open challenge in time series forecasting. Several approaches that deal with this complex problem have been proposed in the literature but an extensive comparison on a large number of tasks is still missing. This paper aims to fill this gap by reviewing existing strategies for multi-step ahead forecasting an...
Article
Full-text available
This paper evaluates the four leading techniques proposed in the literature for construction of prediction intervals (PIs) for neural network point forecasts. The delta, Bayesian, bootstrap, and mean-variance estimation (MVE) methods are reviewed and their performance for generating high-quality PIs is compared. PI-based measures are proposed and a...
Article
Full-text available
In this work we introduce the forecasting model with which we participated in the NN5 forecasting competition (the forecasting of 111 time series representing daily cash withdrawal amounts at ATM machines). The main idea of this model is to utilize the concept of forecast combination, which has proven to be an effective methodology in the forecasti...
Article
Forecast combination is a well-established and well-tested approach for improving the forecasting accuracy. One beneficial strategy is to use constituent forecasts that have diverse information. In this paper we consider the idea of diversity being accomplished by using different time aggregations. For example, we could create a yearly time series...
Article
Full-text available
Purpose – This paper aims to present an integrated framework for hotel room revenue maximization. The revenue management (RM) model presented in this work treats the shortcomings in existing systems. In particular, it extends existing optimization techniques for hotel revenue management to address group reservations and uses “forecasted demand” arr...
Article
Prediction intervals (PIs) have been proposed in the literature to provide more information by quantifying the level of uncertainty associated with the point forecasts. Traditional methods for construction of neural network (NN) based PIs suffer from restrictive assumptions about data distribution and massive computational loads. In this paper, we pr...
Article
Full-text available
In this work we present a large scale comparison study for the major machine learning models for time series forecasting. Specifically, we apply the models on the monthly M3 time series competition data (around a thousand time series). There have been very few, if any, large scale comparison studies for machine learning models for the regression or...
Conference Paper
Penalized likelihood is a well-known, theoretically justified approach that has recently attracted attention from the machine learning community. The objective function of the penalized likelihood consists of the log likelihood of the data minus some term penalizing non-smooth solutions. Subsequently, maximizing this objective function would lead to som...
Conference Paper
Estimating the classification error rate of a classifier is a key issue in machine learning. Such estimation is needed to compare classifiers or to tune the parameters of a parameterized classifier. Several methods have been proposed to estimate error rate, most of which rely on partitioning the data set or drawing bootstrap samples from it. Error...
Conference Paper
Time-series forecasting is an important research and application area. Much effort has been devoted over the past decades to develop and improve the time series forecasting models based on statistical and machine learning techniques. Forecast combination is a well-established and well-tested approach for improving forecasting accuracy. Many time s...
Article
This paper presents a new method for solving systems of Boolean equations. The method is based on converting the equations so that we operate in the integer domain. In the integer domain better and more efficient methodologies for solving equations are available. The conversion leads us to a system of polynomial equations obeying certain characteri...
Article
This paper presents DS, an evolutionary swarm-based algorithm suitable for parallel implementation and optimization of multi-objective goals. The algorithm has been implemented and tested using three standard benchmark functions, namely Rosenbrock, Griewank and Schwefel. Next, we applied DS to solve the portfolio selection problem as a case study as...
Article
Full-text available
The nearest neighbor method is one of the most widely used pattern classification methods. However its major drawback in practice is the curse of dimensionality. In this paper, we propose a new method to alleviate this problem significantly. In this method, we attempt to cover the training patterns of each class with a number of hyperspheres. The m...
Conference Paper
Full-text available
Correct video segmentation, namely the detection of moving objects within a scene, plays a very important role in many applications in safety, surveillance, traffic monitoring and object detection. The main objective of this paper is to implement an effective background segmentation algorithm for corner sets extracted from video sequences. A dynami...
Article
Penalized likelihood is a general approach whereby an objective function is defined, consisting of the log likelihood of the data minus some term penalizing non-smooth solutions. Subsequently, this objective function is maximized, yielding a solution that achieves some sort of trade-off between the faithfulness and the smoothness of the fit. Most w...
Article
In this paper, we consider the problem of missing data, and develop an ensemble-network model for handling the missing data. The proposed method is based on utilizing the inherent uncertainty of the missing records in generating diverse training sets for the ensemble's networks. Specifically we generate the missing values using their probability di...
Conference Paper
We describe an application of the novel Support Vector Machined Kernel (SVM’ed Kernel) to the Recognition of hand-drawn shapes. The SVM’ed kernel function is itself a support vector machine classifier that is learned statistically from data using an automatically generated training set. We show that the new kernel manages to change the classical me...
Article
Full-text available
Trend Impact Analysis is a simple yet powerful forecasting approach within the Futures Studies paradigm. It utilizes experts' judgements to explicitly deal with unprecedented future events with varying degrees of severity in generating different possibilities (scenarios) of how the future might unfold. This is achieved by modifying a surprise-fre...
Conference Paper
Soft labels allow a pattern to belong to multiple classes with different degrees. In many real-world applications, associating a pattern with multiple classes is more realistic, since it describes overlap and uncertainty in class membership. The objective of this work is to develop a fuzzy Gaussian process model for classification of soft labeled...
Conference Paper
Time series forecasting is a challenging problem that has a wide variety of application domains, such as engineering, environment, finance and others. When confronted with a time series forecasting application, typically a number of different forecasting models are tested and the best one is considered. Alternatively, instead of choosing the sin...
Article
Forecasting hotel arrivals and occupancy is an important component in hotel revenue management systems. In this article, we propose a new Monte Carlo simulation approach for the arrivals and occupancy forecasting problem. In this approach, we simulate the hotel reservations process forward in time, and these future Monte Carlo paths will yield fore...
Article
The K-nearest neighbor (KNN) rule is one of the most widely used pattern classification algorithms. For large data sets, the computational demands for classifying patterns using KNN can be prohibitive. A way to alleviate this problem is through the condensing approach. This means we remove patterns that are more of a computational burden but do not...
