Ali El-Moursy

University of Sharjah | US · Department of Electrical and Computer Engineering

PhD

About

65 Publications
11,675 Reads
483 Citations
Citations since 2017
32 Research Items
241 Citations
[Chart: citations per year, 2017 to 2023; vertical axis 0 to 60]
Additional affiliations
September 2010 - present
University of Sharjah
Position
  • Professor (Assistant)
February 2007 - February 2010
IBM
Position
  • Visiting Researcher
September 2000 - August 2005
University of Rochester
Position
  • Research Assistant


Publications (65)
Article
Full-text available
In this work, we provide further evidence on the robustness of the chaotic clocking technique for driving a Correlation Power Analysis (CPA) resistant cryptographic chip running the Advanced Encryption Standard (AES). In particular, we explore the use of non-autonomous chaotic oscillators to improve the chip’s timing and power performance. We show...
Article
Full-text available
The development of Smart Home Controllers has seen rapid growth in recent years, especially for smart devices that can utilize the Internet of Things (IoT). However, a large portion of the household devices and appliances already in use are not IoT-enabled and therefore require their default control mechanisms to operate. This...
Article
Full-text available
Recognition of the modulation scheme is the intermediate step between signal detection and demodulation of the received signal in communication networks. Automatic modulation recognition (AMR) plays a central role in many applications, especially in the military and security sectors. In general, several properties of the received signal are extract...
Article
With the advent of fifth generation (5G) systems and the Internet-of-Things (IoT), the number of interconnected wireless devices is increasing significantly. Protocols that allow these devices to interconnect peer-to-peer through wireless links are becoming of interest. The major challenge is the inevitable interference among the simultaneously ac...
Article
Full-text available
In this work, we consider amplify-and-forward two-way relay networks operating under the imperfect conditions of in-phase and quadrature-phase (I/Q) imbalance. We propose novel, efficient solutions for the problem of joint channel and I/Q imbalance estimation and the related problem of optimal pilot design. Three different estimation algorithms are...
Article
Full-text available
This paper considers the problem of joint timing‐offset and channel estimation for physical‐layer network coding systems operating in frequency‐selective environments. Three different algorithms are investigated for the joint estimation of the channel coefficients and the fractional timing offset. The first algorithm is based on the maximu...
Article
Full-text available
This paper explores the problem of energy-efficient shortest path planning on off-road, natural, real-life terrain for unmanned ground vehicles (UGVs). We present a greedy path planning algorithm based on a composite metric routing approach that combines the energy consumption and distance of the path. In our work, we consider the Terramechanics be...
Article
Full-text available
Protein-protein interaction (PPI) is an important field in bioinformatics which helps in understanding diseases and devising therapy. PPI aims at estimating the similarity of protein sequences and their common regions. STRIKE was introduced as a PPI algorithm which was able to achieve reasonable improvement over existing PPI prediction methods. Alt...
Conference Paper
Intelligent Transportation Systems (ITS) are nowadays considered very important applications of smart cities. One of the most important technologies that are utilized to support ITS is Vehicular Ad-hoc Networks (VANETs). In VANETs, vehicles communicate with each other (V2V) or with the infrastructure (Roadside Units) (V2I). Roadside Units (RSUs) co...
Preprint
Full-text available
The development of Smart Home Controllers has seen rapid growth in recent years, especially for smart devices that can utilize the Internet of Things (IoT). However, a large portion of the household devices and appliances already in use are not IoT-enabled and therefore require their default control mechanisms to operate. This...
Chapter
The fifth generation of wireless networks (5G) will kick off with evolved mobile broadband services as promised by several mobile-related associations, researchers, and operators. Compared to 4G, 5G aims to provide greater data rates with lower latency and higher coverage to numerous users who stream ubiquitous multimedia services. 5G benefits the...
Article
Full-text available
Designing a tamper-resistant microchip for small embedded systems is one of the urgent demands of the computing community nowadays due to the immense security challenges arising particularly in massively connected networks. One of the major threats to secure smart card chips is the ability of Side Channel Attacks (SCA), such as Correlation Power An...
Article
Full-text available
In cellular networks, users near the edge of the cell usually suffer from low signal-to-interference-plus-noise ratio (SINR) levels as a result of being far from the base station (BS). Many factors, such as path loss and multipath fading, can severely attenuate the received signal in the cell-edge area. Increasing the BS tran...
Article
Full-text available
The cloud radio access network (C-RAN) aims at migrating the traditional base station functionality to a cloud-based centralized base band unit (BBU) pool, thereby providing a promising paradigm for fifth-generation (5G) wireless systems. This results in a novel wireless architecture in which mobile users communicate with the cloud via distributed...
Article
Modern Multi-Processor System-On-Chips (MPSOC) are widely used especially in real-time embedded systems due to their high throughput and low per unit cost. However, bounded latency is vital to guarantee fast response as well as fairness for applications running on multicore processors. In this paper, a new Priority-base Memory Controller for Embedd...
Chapter
Full-text available
The fifth generation of wireless networks (5G) will kick off with evolved mobile broadband services as promised by several mobile-related associations, researchers, and operators. Compared to 4G, 5G aims to provide greater data rates with lower latency and higher coverage to numerous users who stream ubiquitous multimedia services. 5G benefits the...
Article
Full-text available
The main goal of oil reservoir management is to provide more efficient, price-effective and environmentally more secure oil production. Oil production management includes an accurate characterization of the reservoir and strategies that involve interactions between reservoir data and human assessment. Hence, it is important to graphically visualize...
Article
Full-text available
The use of cloud computing data centers is growing rapidly to meet the tremendous increase in demand for high-performance computing (HPC), storage and networking resources for business and scientific applications. Virtual machine (VM) consolidation involves the live migration of VMs to run on fewer physical servers, and thus allowing more...
Article
Full-text available
Delays caused by congestion, faults and process variation (PV) degrade networks-on-chip (NoC) performance. A congestion aware, fault tolerant and process variation aware adaptive routing algorithm (CFPA) is introduced for congested and faulty asynchronous NoCs. The proposed routing algorithm maintains two routing tables to determine the packet path...
Article
In modern multi‐core systems, concurrently executing applications share common resources such as main memory. Memory scheduling algorithms are developed to resolve memory contention among competing applications so that throughput is high and fairness of the overall multi‐core system is guaranteed. In this paper, we present Adaptive Time‐based Le...
Article
Full-text available
Cloud computing data centers are growing rapidly in both number and capacity to meet the increasing demands for highly-responsive computing and massive storage. Such data centers consume enormous amounts of electrical energy resulting in high operating costs and carbon dioxide emissions. The reason for this extremely high energy consumption is not...
Article
Full-text available
STRIKE is an algorithm which predicts protein-protein interactions (PPIs) and determines that proteins interact if they contain similar substrings of amino acids. Unlike other methods for PPI prediction, STRIKE is able to achieve reasonable improvement over the existing PPI prediction methods. Despite its high accuracy as a PPI prediction method,...
Conference Paper
Full-text available
Low leakage power with maintained high throughput NoC is achieved. Traffic-based Virtual channel Activation (TVA) algorithm is presented to determine traffic load status at the NoC switch ports. Consequently adaptation signals are sent to activate or deactivate switch port VC groups. The algorithm is optimized to minimize power dissipation for a ta...
Article
Full-text available
A large amount of leakage power could be saved by increasing the number of idle virtual channels (VCs) in a network-on-chip (NoC). Low-leakage power switch is proposed to allow saving in power dissipation of the NoC. The proposed NoC switch employs power supply gating to reduce the power dissipation. Two power reduction techniques are exploited to...
Chapter
Full-text available
Two power-reduction techniques are exploited to design a low leakage power NoC switch. First, the adaptive virtual channel (AVC) technique is presented as an efficient way to reduce the active area using a hierarchical multiplexing tree of VC groups. Second, power gating reduces the average leakage power consumption of the switch by controlling the...
Article
Full-text available
In the modern chip-multiprocessor system, main memory is a shared resource among multiple concurrently executing threads/applications. The memory scheduling algorithms are developed to resolve memory contention by arbitrating memory access in such a way that competing threads progress at a relatively fast and even pace, resulting in high system thr...
Article
Full-text available
Multicore and multithreaded architectures are widely used in recent processors. This increases the number of contexts running in parallel. Contexts compete for the shared resources of the processor. One of these critical resources is the system main memory. During execution, each context sends requests to the main memory to serve cache misses. Thi...
Article
Full-text available
Development in VLSI design allows multi- to many-cores to be integrated on a single microprocessor chip. This increase in the core count per chip makes it more critical to design an efficient memory sub-system especially the shared last level cache (LLC). The efficient utilization of the LLC is a dominant factor to achieve the best microprocessor t...
Patent
Full-text available
Fetch operations are assigned to different threads in a multithreaded environment. There are provided a number of different sorting algorithms, from which one is periodically selected on the basis of whether the present algorithm is giving satisfactory results or not. The period is preferably a sub-context interval. The different sorting algorithms...
Article
Full-text available
Two parallel computer paradigms available today are multi-core accelerators such as the Sony, Toshiba and IBM Cell or Graphics Processing Units (GPUs), and massively parallel message-passing machines such as the IBM Blue Gene (BG). The solution of systems of linear equations is one of the most CPU-intensive steps in engineering and s...
Article
Scientific applications represent a dominant sector of compute-intensive applications. Using massively parallel processing systems increases the feasibility to automate such applications because of the cooperation among multiple processors to perform the designated task. This paper proposes a parallel hidden Markov model (HMM) algorithm for 3D magn...
Conference Paper
Full-text available
Low leakage power with maintained high throughput NoC is achieved. Traffic-based Virtual channel Activation algorithm (TVA) is proposed to activate/deactivate virtual channels in a NoC. The proposed algorithm implements Adaptive Virtual Channel technique of switching-OFF idle virtual channels in the NoC according to traffic heaviness. TVA is an eff...
Patent
Full-text available
Logical processors/hardware contexts are assigned to different jobs/threads in a multithreaded/multicore environment. There are provided a number of different sorting algorithms, from which one is periodically selected on the basis of whether the present algorithm is giving satisfactory results or not. The period is preferably a super-context inter...
Article
Full-text available
This paper proposes a hidden Markov model (HMM) algorithm for 3D MRI brain segmentation using a hierarchical/multi-level parallel implementation. The new technique is implemented using standard message passing interface (MPI). Two platforms are used to test the proposed technique namely PC-cluster system and IBM Blue Gene (BG)/L system. On PC-clust...
Conference Paper
Full-text available
With the increase in the number of the cores integrated in the single-chip microprocessor, the design of an efficient shared Last-Level-Cache (LLC) becomes more critical to the microprocessor performance. In this paper the author proposes v-set cache design for LLC for multi-core microprocessors. The proposed design has the ability to cope with the...
Conference Paper
Full-text available
In modern computer systems, long memory latency is one of the main bottlenecks micro-architects are facing for leveraging the system performance especially for memory-intensive applications. This emphasises the importance of the memory access scheduling to efficiently utilize memory bandwidth. Moreover, in recent micro-processors, multithread and m...
Conference Paper
Full-text available
The Spidergon-Donut on-chip interconnection network was proposed to interconnect the cores of a 1000+ core chip. In this paper, we derive a queueing model for the Spidergon-Donut, and analyze its throughput and latency in comparison with the commercial Spidergon network on which it is based. Results indicate that the Spidergon-Donut throughput is 3...
Article
Two image processing applications, edge detection and image resizing, are studied in this paper on two HPC platforms namely the Cell BE and the Blue Gene/L machines. In this paper we focus on the performance scalability of the studied applications. Our results show that the scale of the problem to be solved highly affects the fitness of the platfor...
Conference Paper
Full-text available
The tradeoff between molecular complexity and accuracy of results continues to be a tug of war in computational material science. One of the challenges in computational nano materials science is the ability to parallelize the tools used, so as to facilitate the solution of more computationally extensive and complex problems with better accuracy. Th...
Chapter
The ICACTE 2009 was organized to gather members of the international community of Computer Theory and Engineering scientists so that researchers from around the world could present their leading-edge work, expanding our community's knowledge and insight into the significant challenges currently being addressed in that research.
Conference Paper
Full-text available
3D transpose is an important operation in many large scale scientific applications such as seismic and medical imaging. This paper proposes a novel algorithm for fast in-place 3D transpose operation. The algorithm exploits Single Instruction Multiple Data (SIMD) multicore architecture with software managed memory hierarchy. Such architectural featu...
Conference Paper
Full-text available
The industry is rapidly moving towards the adoption of chip multi-processors (CMPs) of simultaneous multi-threaded (SMT) cores for general purpose systems. The most prominent use of such processors, at least in the near term, is as job servers running multiple independent threads on the different contexts of the various SMT cores. In such an enviro...
Conference Paper
Full-text available
Today's general-purpose processors are increasingly using multithreading in order to better leverage the additional on-chip real estate available with each technology generation. Simultaneous multi-threading (SMT) was originally proposed as a large dynamic superscalar processor with monolithic hardware structures shared among all threads. Inters hy...
Conference Paper
Full-text available
The performance and power optimization of dynamic superscalar microprocessors requires striking a careful balance between exploiting parallelism and hardware simplification. Hardware structures which are needlessly complex may exacerbate critical timing paths and dissipate extra power. One such structure requiring careful design is the issue queue....
Conference Paper
Full-text available
Microprocessor power dissipation is a growing concern, so much so that it threatens to limit future performance improvements. A major consumer of microprocessor power is the issue queue. Many microprocessors, such as the Compaq Alpha 21264 and IBM POWER4™, use a compacting latch-based issue queue design which has the advantage of simplic...
Article
Full-text available
Microprocessor power dissipation is a growing concern, so much so that it threatens to limit future performance improvements. A major consumer of microprocessor power is the issue queue. Many microprocessors, such as the Compaq Alpha 21264 and IBM POWER4™, use a compacting latch-based issue queue design which has the advantage of simplicity of d...
Article
Full-text available
The design of the memory hierarchy in a multi-core architecture is a critical component since it must meet the capacity (in terms of bandwidth and low latency) and coordination requirements of multiple threads of control. Most previous designs have assumed either a shared L1 data cache (e.g., simultaneous multithreaded architectures) or L1 caches t...
Article
Full-text available
The design of the memory hierarchy in a multi-core architecture is a critical component since it must meet the capacity (in terms of bandwidth and low latency) and coordination requirements of multiple threads of control. Most previous designs have assumed either a shared L1 data cache (e.g., simultaneous multithreaded architectures) or L1 caches t...
