FIGURE 2 | Compute unified device architecture (CUDA) threads and blocks multidimensional programming model.

Source publication
Article
Full-text available
General purpose computation using Graphic Processing Units (GPUs) is a well‐established research area focusing on high‐performance computing solutions for massively parallelizable and time‐consuming problems. Classical methodologies in machine learning and data mining cannot handle processing of massive and high‐speed volumes of information in the...

Contexts in source publication

Context 1
... GPU scheduler maximizes the occupancy and performance by switching idle warps waiting for data accesses or function results with other warps ready for computation. Figure 2 illustrates the thread and block hierarchy of the multidimensional space of computation in CUDA. Figure 3 shows an example in CUDA of the computation of the pairwise Euclidean distances in an n × m matrix, a common task in data mining for measuring similarity [20]. Threads are organized into 16 × 16 two-dimensional blocks, and the blocks are organized into an n/16 × n/16 two-dimensional grid. ...
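The CUDA code of Figure 3 is not reproduced on this page. As a minimal, hedged sketch of such a kernel under the stated 16 × 16 block layout (the names pairwiseEuclidean, BLOCK, and launchPairwiseEuclidean are illustrative, not from the source publication, and the n × m data matrix is assumed to be stored row-major on the device), each thread computes one entry of the n × n distance matrix:

// Minimal sketch (assumed names, not the source's code): pairwise Euclidean distances
// over an n x m data matrix stored row-major on the device, one output entry per thread.
#include <cuda_runtime.h>
#include <math.h>

#define BLOCK 16  // 16 x 16 two-dimensional thread blocks, as described in Context 1

__global__ void pairwiseEuclidean(const float* data, float* dist, int n, int m)
{
    // Global indices of the two instances this thread compares.
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n && j < n) {
        float sum = 0.0f;
        for (int k = 0; k < m; ++k) {
            float d = data[i * m + k] - data[j * m + k];
            sum += d * d;
        }
        dist[i * n + j] = sqrtf(sum);  // entry (i, j) of the n x n distance matrix
    }
}

// Host-side launch: blocks form an (n/16) x (n/16) grid, rounded up so that
// n does not have to be a multiple of 16.
void launchPairwiseEuclidean(const float* d_data, float* d_dist, int n, int m)
{
    dim3 block(BLOCK, BLOCK);
    dim3 grid((n + BLOCK - 1) / BLOCK, (n + BLOCK - 1) / BLOCK);
    pairwiseEuclidean<<<grid, block>>>(d_data, d_dist, n, m);
}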
Context 2
... these works illustrate the potential of GPUs for high-speed streams because they provide very fast response with minimum latency. Figure 12 summarizes the speedups collected from relevant referenced works for data mining techniques and applications on single-GPU and multi-GPU computing. Speedups depend on a number of factors including the power of the GPU and CPU architectures being compared, the parallelizability of the computational task, the computational and memory workflow and dependencies, the size of the data problem, and so on. ...

Similar publications

Chapter
Full-text available
Advances in sequencing technology have significantly contributed to shaping the area of genetics and enabled the identification of genetic variants associated with complex traits through genome-wide association studies. This has provided insights into genetic medicine, in which case, genetic factors influence variability in disease and treatment ou...

Citations

... Warm start and online methods, as utilized by [28], reduce computation by reusing previous solutions, but are more suitable for slowly varying dynamics. The authors in [29] discuss the utilization of Graphics Processing Units (GPUs) for MPC computations, offering a significant speedup but demanding access to GPU hardware and a complex implementation. Active-set methods, as explored by [30], focus on efficiently solving Quadratic Programming (QP) problems through active constraint selection and updating, thereby reducing computation time, though they may require careful tuning and implementation. ...
Article
Full-text available
Distributed electric drive vehicles offer maneuverability but face stability challenges under different driving conditions. Model Predictive Control (MPC) algorithms can improve lateral stability, but their high computational demands hinder real-time implementation. To address this, the proposed strategy combines Nonlinear Autoregressive Exogenous (NARX) neural networks with MPC in two ways, namely, Nonlinear Prediction-Nonlinear Optimization (NMPC-NO) and Nonlinear Prediction-Linearization (MPC-NPL). While NMPC-NO involves online nonlinear optimization, MPC-NPL uses local linearization, reducing the computational load significantly to about 40% of the computation time of MPC and 0.05% of that of nonlinear model predictive control (NMPC). The neural networks are trained and validated on 20 different datasets, with alternative training methods investigated. MATLAB/Simulink simulations under various standardized tests demonstrate the effectiveness of the proposed techniques, highlighting improved handling performance, reduced computation time, and real-time deployment capabilities.
... Deep learning (DL) is a subfield of artificial intelligence that focuses on developing and applying algorithms and models based on interconnected networks of simple neurons [46,47]. Deep learning has a long history, but its practical applications and remarkable impact on various fields have been most prominent in the past decade, from the 2010s onward, primarily due to advancements in computational power and especially the development of graphics processing units (GPUs) [48]. In addition to breakthroughs in deep-learning architectures and techniques, the use of the rectified linear unit (ReLU) and the ADAM optimizer significantly improved performance [49]. ...
Article
Full-text available
Identifying areas with high groundwater spring potential is crucial as it enables better decision-making concerning water supply, sustainable development, and the protection of sensitive ecosystems; therefore, it is necessary to predict the groundwater spring potential with highly accurate models. This study aims to assess and compare the effectiveness of deep neural networks (DeepNNs) and swarm-optimized random forests (SwarmRFs) in predicting groundwater spring potential. This study focuses on a case study conducted in the Gia Lai province, located in the Central Highland of Vietnam. To accomplish this objective, a comprehensive groundwater database was compiled, comprising 938 groundwater spring locations and 12 influential variables, namely land use and land cover (LULC), geology, distance to fault, distance to river, rainfall, normalized difference vegetation index (NDVI), normalized difference moisture index (NDMI), normalized difference water index (NDWI), slope, aspect, elevation, and curvature. The DeepNN model was trained and fine-tuned using the Adaptive Moment Estimation (ADAM) optimizer, while the SwarmRF model employed the Harris Hawks Optimizer (HHO) to search for optimal parameters. The results indicate that both the DeepNN model (accuracy = 77.9%, F-score = 0.783, kappa = 0.559, and AUC = 0.820) and the SwarmRF model (accuracy = 80.2%, F-score = 0.798, kappa = 0.605, and AUC = 0.854) exhibit robust predictive capabilities. The SwarmRF model displays a slight advantage over the DeepNN model in terms of performance. Among the 12 influential factors, geology emerges as the most significant determinant of groundwater spring potential. The groundwater spring potential maps generated through this research can offer valuable information for local authorities to facilitate effective water resource management and support sustainable development planning.
... In the recent decade, Graphics Processing Units (GPUs) have become very prevalent for speeding up computations in various use cases such as image processing, cryptography, data mining, medical physics, biostatistics, and many other applications. The key success factor of the GPU has been its high level of parallelism for data and computing tasks [12,13]. ...
Article
Full-text available
The power of deep learning in image classification problems has become very popular and applicable in many areas, such as the medical sciences. Some of the medical applications are real-time and may be implemented in embedded devices. In these cases, achieving the highest level of accuracy is not the only concern. Computation runtime and power consumption are also considered among the most important performance indicators. These parameters are mainly evaluated in the hardware design phase. In this research, an energy-efficient deep learning accelerator for endoscopic image classification (DLA-E) is proposed. This accelerator can be implemented in future endoscopic imaging equipment to help medical specialists make faster and more accurate decisions during endoscopy or colonoscopy. The proposed DLA-E consists of 256 processing elements with 1000 bps network-on-chip bandwidth. Based on the simulation results of this research, the best dataflow for this accelerator based on MobileNet v2 is kcp_ws from the weight stationary (WS) family. The total energy consumption and total runtime of this accelerator on the investigated dataset are 4.56 × 10⁹ MAC (multiplier–accumulator) energy and 1.73 × 10⁷ cycles, respectively, which is the best result in comparison to other combinations of CNNs and dataflows.
... Without loss of generality, this section uses Volta GPUs as an example to illustrate the essential background of modern GPUs from three main aspects: processors, memory, and programming primitives. For more details about GPUs, we refer the readers to [40], [41]. ...
Article
Full-text available
Latent Dirichlet Allocation (LDA) is a statistical approach for topic modeling with a wide range of applications. Attracted by the exceptional computing and memory throughput capabilities of GPUs, this work introduces ezLDA, which achieves efficient and scalable LDA training on GPUs with the following three contributions. First, ezLDA introduces a three-branch sampling method that takes advantage of the convergence heterogeneity of various tokens to reduce redundant sampling. Second, to enable a sparsity-aware format for both D and W on GPUs with fast sampling and updating, we introduce a hybrid format for W along with the corresponding token partition for T and inverted index designs. Third, we design a hierarchical workload balancing solution to address the extremely skewed workload imbalance problem on the GPU and scale ezLDA across multiple GPUs. Taken together, ezLDA achieves superior performance over state-of-the-art attempts with lower memory consumption.
... Although the term AI was coined in 1956 (Moor, 2006), it was only in the last 12 years or so that modern AI, based on Machine Learning (ML), came into use, following the mentioned availability of big data, as well as advances in computing, such as the development of efficient Graphics Processing Units (GPUs) for computers in the video gaming industry, and the elaboration of algorithmic techniques that could combine these to allow computer programs to recognise patterns in data, make predictions, and learn to improve these over time (Cano, 2017; Hinton & Salakhutdinov, 2006; LeCun et al., 2015). ...
Article
Full-text available
The digital (or 4th industrial) revolution has made industrialisation harder by being less consequential for structural transformation than was initially hoped. The rise of digital platform capitalism and its relation to global value chains (GVCs) is responsible for this. This paper explains why diminished expectations of the 4th industrial revolution are justified and how this is due to digital platforms as intellectual monopolies that are reconfiguring GVCs—and by this, making industrialisation harder. As such, the paper contributes to the research lacuna on the relationship between GVCs and digital platform capitalism. The implications for late industrialisation are identified, and broad recommendations for industrial policies are made.
... These have been realized using imagery information from unmanned aerial vehicles (UAVs) [20] and time-series data from various sensor systems [21]. However, the method is costly, as it relies on an extensive amount of data and expensive graphics processing units [22]. ...
Article
Full-text available
Understanding the dynamic behaviour of plants is vital for the realization of precision agriculture systems. This paper proposes a dynamic response model of root diameter to changes in temperature and relative humidity for commercial epiphytic orchids. A Vanda hybrid orchid is used for this study. Root diameter changes, temperature, and relative humidity measurements are logged for fifty-one days. The dynamic response of the root diameter changes is analysed and modelled using a non-linear autoregressive with exogenous input neural network. The best performance is obtained with the Levenberg-Marquardt algorithm, yielding the best overall regression, lowest mean squared error, and fastest training speed.
... Compared with the Central Processing Unit (CPU) [34], the parallel computing capability of the FPGA can improve the running speed of a program and reduce latency. Compared with the Graphics Processing Unit (GPU) [35], the FPGA has certain advantages in power consumption and flexibility. Compared with Application Specific Integrated Circuit (ASIC) chips [36], the FPGA has the advantages of a short development cycle and high cost-effectiveness. ...
Article
In order to improve operating efficiency, some intelligent optimization algorithms are implemented on hardware. However, existing design schemes suffer from poor versatility. Therefore, this paper proposes a general software-hardware co-design scheme for intelligent optimization algorithms. In the design scheme, the initialization module and fitness module of the algorithm are deployed on the Advanced RISC Machine (ARM) processor for execution to increase the flexibility of the program, while the update module of the algorithm is deployed on a Field Programmable Gate Array (FPGA) to realize hardware acceleration. Data between the ARM and the FPGA are transferred through the Advanced eXtensible Interface (AXI) bus. In this paper, the PSO, BA, WOA, GWO, CMAES, and EO algorithms are implemented with the proposed design scheme, and the six algorithms are tested on thirteen benchmark functions of different types. The experimental results prove the feasibility of the design scheme. In addition, comparisons with software and other implementation methods in terms of execution time, resource occupancy, and convergence demonstrate the effectiveness and superiority of the proposed scheme.
... Over the last decade, the use of GPUs as a general purpose compute resource has become prevalent across arbitrary domains [24,87,188,174]. Consequently, the demand for GPU compute resources has been steadily increasing over the last few years to the point where many use cases even require multiple GPUs to satisfy their resource demands. ...
Thesis
Full-text available
The heterogeneity of today's state-of-the-art computer architectures is confronting application developers with an immense degree of complexity which results from two major challenges. First, developers need to acquire profound knowledge about the programming models or the interaction models associated with each type of heterogeneous system resource to make efficient use thereof. Second, developers must take into account that heterogeneous system resources always need to exchange data with each other in order to work on a problem together. However, this data exchange is always associated with a certain amount of overhead, which is why the amounts of data exchanged should be kept as low as possible. This thesis proposes three programming abstractions to lessen the burdens imposed by these major challenges with the goal of making heterogeneous system resources accessible to a wider range of application developers. The lib842 compression library provides the first method for accessing the compression and decompression facilities of the NX-842 on-chip compression accelerator available in IBM Power CPUs from user space applications running on Linux. Addressing application development of scale-out GPU workloads, the CloudCL framework makes the resources of GPU clusters more accessible by hiding many aspects of distributed computing while enabling application developers to focus on the aspects of the data parallel programming model associated with GPUs. Furthermore, CloudCL is augmented with transparent data compression facilities based on the lib842 library in order to improve the efficiency of data transfers among cluster nodes. The improved data transfer efficiency provided by the integration of transparent data compression yields performance improvements ranging between 1.11x and 2.07x across four data-intensive scale-out GPU workloads. To investigate the impact of programming abstractions for data placement in NUMA systems, a comprehensive evaluation of the PGASUS framework for NUMA-aware C++ application development is conducted. On a wide range of test systems, the evaluation demonstrates that PGASUS does not only improve the developer experience across all workloads, but that it is also capable of outperforming NUMA-agnostic implementations with average performance improvements of 1.56x. Based on these programming abstractions, this thesis demonstrates that by providing a sufficient degree of abstraction, the accessibility of heterogeneous system resources can be improved for application developers without occluding performance-critical properties of the underlying hardware.
... The main feature of many-core accelerators such as GPUs is their massively parallel architecture, allowing them to speed up computations that involve matrix-based operations, which are at the heart of many deep learning implementations [78]. Manufacturers often offer the possibility to enhance a hardware configuration with many-core accelerators to improve machine/cluster performance, as well as accelerated libraries, which provide highly optimized primitives, algorithms, and functions to access the massively parallel power of GPUs. ...
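As a hedged illustration of the accelerated libraries mentioned in the excerpt above (not code from the cited textbook), the sketch below offloads a dense single-precision matrix multiplication, a typical matrix-based operation in deep learning workloads, to the GPU through a single cuBLAS call; the matrix size and variable names are illustrative assumptions:

// Hedged sketch (illustrative sizes and names): offloading C = A * B to the GPU
// through the cuBLAS accelerated library.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main()
{
    const int n = 1024;  // assumed square n x n matrices
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, n * n * sizeof(float));
    cudaMalloc((void**)&dB, n * n * sizeof(float));
    cudaMalloc((void**)&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // cuBLAS assumes column-major storage; this one library call runs the GEMM on the GPU.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}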
... The surge in the Volume of information, especially with the Variety characteristic, to be processed by data mining and machine learning algorithms demands new transformative parallel and distributed computing solutions capable of scaling computation effectively and efficiently. That is the reason for the rise of accelerated computing, with the GPU as the specialized hardware for speeding up general-purpose computation in the last decade (Section 6.3) [78], [84]. ...
Book
http://elvira.fiit.stuba.sk/ This textbook presents an introduction to Data Science in the context of the responsible development of human-centric and trustworthy Artificial Intelligence systems. It presents the recent transition from focusing on modeling to focusing on the underlying data used to train and evaluate models. In the textbook, a systematic way to examine data is described, with details about data sources, data collection, data integration, and data preparation as part of the Data Science process. The aim of the process is to provide the best quality of data for machine learning and to build an intelligent application in an organization. A separate chapter is dedicated to intelligent data modelling, covering several selected machine learning methods and deep learning architectures. The emphasis is on model evaluation and model selection, with optimizations to select the best models to deploy in production. In the context of intelligent software development, the textbook presents the most popular and most used machine learning and deep learning frameworks and libraries, with the dominance of Python open-source software at small scale as well as large scale. A note about specialized hardware for speeding up general-purpose computation in the last decade is included. The textbook also emphasises ethics in Artificial Intelligence development, with the most notable document “European Union Guideline on Ethics in Artificial Intelligence: Context and Implementation”.
... The ability of retail investors to access real-time, sustainability-based investment details of a company on a constant basis is a reality today with ML models. This was considered impossible a decade ago due to computing limitations (47). Today, ML models can filter the crucial data required by retail investors and act as a catalyst for impactful investing. ...
Article
When Machine Learning (ML) algorithms take over decision-making from human conscience and cognition, the essence of ethics gets saturated. The visible decline of ethics generally gets reflected in financial markets, as they portray human actions and sentiments in numerical terms more than any other sector. Accuracy in stock market prediction remains inefficient due to many known and unknown variables. Academia and industry have recently relied on ML at large to track the market and monetize the movements. The norms of fairness, accuracy, dependability, and transparency in financing are left unattended in ML prediction models with assumptions far from reality. The integration of the tenets proposed in this study can emphasize and reconfigure the sustainable side of investing with concepts already in place but not connected to the network of prediction models. This study focuses on the ethical dimension of Machine Learning models and generates a sustainable framework for investors. Specifically, the Sustainable Development Goals (SDGs) can enhance the prediction models in ML with improved efficiency. Along with the SDGs, this research broadens the horizon of prediction variables in ML within the computer science domain with concepts of Socially Responsible Investing (SRI), Environmental, Social, and Corporate Governance (ESG), and carbon footprints. With 115 articles reviewed, the proposed framework ensures sustainability in investments at the grassroots level. When adding the sustainability quotient backed by recognized norms, the research can be a breakthrough in rational investing based on ethical judgments aided by initiatives like the United Nations' SDGs.