FIGURE 2 | Compute unified device architecture (CUDA) threads and blocks multidimensional programming model.

Source publication
Article
Full-text available
General purpose computation using Graphic Processing Units (GPUs) is a well‐established research area focusing on high‐performance computing solutions for massively parallelizable and time‐consuming problems. Classical methodologies in machine learning and data mining cannot handle processing of massive and high‐speed volumes of information in the...

Contexts in source publication

Context 1
... GPU scheduler maximizes the occupancy and performance by switching idle warps waiting for data accesses or function results with other warps ready for computation. Figure 2 illustrates the thread and block hierarchy of the multidimensional space of computation in CUDA. Figure 3 shows an example in CUDA of the computation of the pairwise Euclidean distances in an n × m matrix, a common task in data mining for measuring similarity [20]. Threads are organized into 16 × 16 two-dimensional blocks, and the blocks are organized into an n/16 × n/16 two-dimensional grid. ...
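The CUDA code of Figure 3 is not reproduced on this page. As a minimal, hedged sketch of such a kernel under the stated 16 × 16 block layout (the names pairwiseEuclidean, BLOCK, and launchPairwiseEuclidean are illustrative, not from the source publication, and the n × m data matrix is assumed to be stored row-major on the device), each thread computes one entry of the n × n distance matrix:

// Minimal sketch (assumed names, not the source's code): pairwise Euclidean distances
// over an n x m data matrix stored row-major on the device, one output entry per thread.
#include <cuda_runtime.h>
#include <math.h>

#define BLOCK 16  // 16 x 16 two-dimensional thread blocks, as described in Context 1

__global__ void pairwiseEuclidean(const float* data, float* dist, int n, int m)
{
    // Global indices of the two instances this thread compares.
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n && j < n) {
        float sum = 0.0f;
        for (int k = 0; k < m; ++k) {
            float d = data[i * m + k] - data[j * m + k];
            sum += d * d;
        }
        dist[i * n + j] = sqrtf(sum);  // entry (i, j) of the n x n distance matrix
    }
}

// Host-side launch: blocks form an (n/16) x (n/16) grid, rounded up so that
// n does not have to be a multiple of 16.
void launchPairwiseEuclidean(const float* d_data, float* d_dist, int n, int m)
{
    dim3 block(BLOCK, BLOCK);
    dim3 grid((n + BLOCK - 1) / BLOCK, (n + BLOCK - 1) / BLOCK);
    pairwiseEuclidean<<<grid, block>>>(d_data, d_dist, n, m);
}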
Context 2
... these works illustrate the potential of GPUs for high-speed streams because they provide very fast response with minimum latency. Figure 12 summarizes the speedups collected from relevant referenced works for data mining techniques and applications on single-GPU and multi-GPU computing. Speedups depend on a number of factors including the power of the GPU and CPU architectures being compared, the parallelizability of the computational task, the computational and memory workflow and dependencies, the size of the data problem, and so on. ...

Similar publications

Chapter
Full-text available
Advances in sequencing technology have significantly contributed to shaping the area of genetics and enabled the identification of genetic variants associated with complex traits through genome-wide association studies. This has provided insights into genetic medicine, in which case, genetic factors influence variability in disease and treatment ou...

Citations

... Warm start and online methods, as utilized by [28], reduce computation by reusing previous solutions, but are more suitable for slowly varying dynamics. The authors in [29] discuss the utilization of Graphics Processing Units (GPUs) for MPC computations, offering a significant speedup but demanding access to GPU hardware and a complex implementation. Active-set methods, as explored by [30], focus on efficiently solving Quadratic Programming (QP) problems through active constraint selection and updating, thereby reducing computation time, though they may require careful tuning and implementation. ...
Article
Full-text available
Distributed electric drive vehicles offer maneuverability but face stability challenges under different driving conditions. Model Predictive Control (MPC) algorithms can improve lateral stability, but their high computational demands hinder real-time implementation. To address this, the proposed strategy combines Nonlinear Autoregressive Exogenous (NARX) neural networks with MPC in two ways, namely, Nonlinear Prediction-Nonlinear Optimization (NMPC-NO) and Nonlinear Prediction-Linearization (MPC-NPL). While NMPC-NO involves online nonlinear optimization, MPC-NPL uses local linearization, reducing the computational load significantly to about 40% of the computation time of MPC and 0.05% of that of nonlinear model predictive control (NMPC). The neural networks are trained and validated on 20 different datasets, with alternative training methods investigated. MATLAB/Simulink simulations under various standardized tests demonstrate the effectiveness of the proposed techniques, highlighting improved handling performance, reduced computation time, and real-time deployment capabilities.
... Deep learning (DL) is a subfield of artificial intelligence that focuses on developing and applying algorithms and models based on interconnected networks of simple neurons [46,47]. Deep learning has a long history, but its practical applications and remarkable impact on various fields have been most prominent in the past decade, from the 2010s onward, primarily due to advancements in computational power and especially the development of graphics processing units (GPUs) [48]. In addition to breakthroughs in deep-learning architectures and techniques, the use of the rectified linear unit (ReLU) and the ADAM optimizer significantly improved performance [49]. ...
Article
Full-text available
Identifying areas with high groundwater spring potential is crucial as it enables better decision-making concerning water supply, sustainable development, and the protection of sensitive ecosystems; therefore, it is necessary to predict the groundwater spring potential with highly accurate models. This study aims to assess and compare the effectiveness of deep neural networks (DeepNNs) and swarm-optimized random forests (SwarmRFs) in predicting groundwater spring potential. This study focuses on a case study conducted in the Gia Lai province, located in the Central Highland of Vietnam. To accomplish this objective, a comprehensive groundwater database was compiled, comprising 938 groundwater spring locations and 12 influential variables, namely land use and land cover (LULC), geology, distance to fault, distance to river, rainfall, normalized difference vegetation index (NDVI), normalized difference moisture index (NDMI), normalized difference water index (NDWI), slope, aspect, elevation, and curvature. The DeepNN model was trained and fine-tuned using the Adaptive Moment Estimation (ADAM) optimizer, while the SwarmRF model employed the Harris Hawks Optimizer (HHO) to search for optimal parameters. The results indicate that both the DeepNN model (accuracy = 77.9%, F-score = 0.783, kappa = 0.559, and AUC = 0.820) and the SwarmRF model (accuracy = 80.2%, F-score = 0.798, kappa = 0.605, and AUC = 0.854) exhibit robust predictive capabilities. The SwarmRF model displays a slight advantage over the DeepNN model in terms of performance. Among the 12 influential factors, geology emerges as the most significant determinant of groundwater spring potential. The groundwater spring potential maps generated through this research can offer valuable information for local authorities to facilitate effective water resource management and support sustainable development planning.
... In the recent decade, Graphics Processing Units (GPUs) have become very prevalent for speeding up computations in various use cases such as image processing, cryptography, data mining, medical physics, biostatistics, and many other applications. The key success factor of the GPU has been its high level of parallelism for data and computing tasks [12,13]. ...
Article
Full-text available
The power of deep learning in image classification problems has become very popular and applicable in many areas, such as the medical sciences. Some of the medical applications are real-time and may be implemented in embedded devices. In these cases, achieving the highest level of accuracy is not the only concern. Computation runtime and power consumption are also considered among the most important performance indicators. These parameters are mainly evaluated in the hardware design phase. In this research, an energy-efficient deep learning accelerator for endoscopic image classification (DLA-E) is proposed. This accelerator can be implemented in future endoscopic imaging equipment to help medical specialists make faster and more accurate decisions during endoscopy or colonoscopy. The proposed DLA-E consists of 256 processing elements with 1000 bps network-on-chip bandwidth. Based on the simulation results of this research, the best dataflow for this accelerator based on MobileNet v2 is kcp_ws from the weight stationary (WS) family. The total energy consumption and total runtime of this accelerator on the investigated dataset are 4.56 × 10⁹ MAC (multiplier–accumulator) energy and 1.73 × 10⁷ cycles, respectively, which is the best result in comparison to other combinations of CNNs and dataflows.
... Without loss of generality, this section uses Volta GPUs as an example to illustrate the essential background of modern GPUs from three main aspects: processors, memory, and programming primitives. For more details about GPUs, we refer the readers to [40], [41]. ...
Article
Full-text available
Latent Dirichlet Allocation (LDA) is a statistical approach for topic modeling with a wide range of applications. Attracted by the exceptional computing and memory throughput capabilities of GPUs, this work introduces ezLDA, which achieves efficient and scalable LDA training on GPUs with the following three contributions. First, ezLDA introduces a three-branch sampling method that takes advantage of the convergence heterogeneity of various tokens to reduce redundant sampling. Second, to enable a sparsity-aware format for both D and W on GPUs with fast sampling and updating, we introduce a hybrid format for W along with the corresponding token partition for T and inverted index designs. Third, we design a hierarchical workload balancing solution to address the extremely skewed workload imbalance problem on the GPU and scale ezLDA across multiple GPUs. Taken together, ezLDA achieves superior performance over state-of-the-art attempts with lower memory consumption.
... Although the term AI was coined in 1956 (Moor, 2006), it was only in the last 12 years or so that modern AI, based on Machine Learning (ML), came into use, following the mentioned availability of big data, as well as advances in computing, such as the development of efficient Graphics Processing Units (GPUs) for computers in the video gaming industry, and the elaboration of algorithmic techniques that could combine these to allow computer programs to recognise patterns in data, make predictions, and learn to improve these over time (Cano, 2017; Hinton & Salakhutdinov, 2006; LeCun et al., 2015). ...
Article
Full-text available
The digital (or 4th industrial) revolution has made industrialisation harder by being less consequential for structural transformation than was initially hoped. The rise of digital platform capitalism and its relation to global value chains (GVCs) is responsible for this. This paper explains why diminished expectations of the 4th industrial revolution are justified and how this is due to digital platforms as intellectual monopolies that are reconfiguring GVCs—and by this, making industrialisation harder. As such, the paper contributes to the research lacuna on the relationship between GVCs and digital platform capitalism. The implications for late industrialisation are identified, and broad recommendations for industrial policies are made.
... These have been realized using imagery information from unmanned aerial vehicles (UAVs) [20] and time-series data from various sensor systems [21]. However, the method is costly, as it relies on an extensive amount of data and expensive graphics processing units [22]. ...
Article
Full-text available
Understanding the dynamic behaviour of plants is vital for the realization of precision agriculture systems. This paper proposes a dynamic response model of root diameter to changes in temperature and relative humidity for commercial epiphytic orchids. A Vanda hybrid orchid is used for this study. Root diameter changes, temperature, and relative humidity measurements are logged for fifty-one days. The dynamic response of the root diameter changes is analysed and modelled using a non-linear autoregressive with exogenous input neural network. The best performance is obtained with the Levenberg-Marquardt algorithm, yielding the best overall regression, lowest mean squared error, and fastest training speed.
... Compared with the Central Processing Unit (CPU) [34], the parallel computing capability of the FPGA can improve the running speed of a program and reduce latency. Compared with the Graphics Processing Unit (GPU) [35], the FPGA has certain advantages in power consumption and flexibility. Compared with Application Specific Integrated Circuit (ASIC) chips [36], the FPGA has the advantages of a short development cycle and high cost-effectiveness. ...
Article
In order to improve operating efficiency, some intelligent optimization algorithms are implemented on hardware. However, existing design schemes suffer from poor versatility. Therefore, this paper proposes a general software-hardware co-design scheme for intelligent optimization algorithms. In the design scheme, the initialization module and fitness module of the algorithm are deployed on the Advanced RISC Machine (ARM) processor for execution to increase the flexibility of the program, while the update module of the algorithm is deployed on a Field Programmable Gate Array (FPGA) to realize hardware acceleration. Data between the ARM and the FPGA are transferred through the Advanced eXtensible Interface (AXI) bus. In this paper, the PSO, BA, WOA, GWO, CMAES, and EO algorithms are implemented with the proposed design scheme, and the six algorithms are tested on thirteen benchmark functions of different types. The experimental results prove the feasibility of the design scheme. In addition, comparisons with software and other implementation methods in terms of execution time, resource occupancy, and convergence demonstrate the effectiveness and superiority of the proposed scheme.
... Over the last decade, the use of GPUs as a general purpose compute resource has become prevalent across arbitrary domains [24,87,188,174]. Consequently, the demand for GPU compute resources has been steadily increasing over the last few years to the point where many use cases even require multiple GPUs to satisfy their resource demands. ...
Thesis
Full-text available
The heterogeneity of today's state-of-the-art computer architectures is confronting application developers with an immense degree of complexity which results from two major challenges. First, developers need to acquire profound knowledge about the programming models or the interaction models associated with each type of heterogeneous system resource to make efficient use thereof. Second, developers must take into account that heterogeneous system resources always need to exchange data with each other in order to work on a problem together. However, this data exchange is always associated with a certain amount of overhead, which is why the amounts of data exchanged should be kept as low as possible. This thesis proposes three programming abstractions to lessen the burdens imposed by these major challenges with the goal of making heterogeneous system resources accessible to a wider range of application developers. The lib842 compression library provides the first method for accessing the compression and decompression facilities of the NX-842 on-chip compression accelerator available in IBM Power CPUs from user space applications running on Linux. Addressing application development of scale-out GPU workloads, the CloudCL framework makes the resources of GPU clusters more accessible by hiding many aspects of distributed computing while enabling application developers to focus on the aspects of the data parallel programming model associated with GPUs. Furthermore, CloudCL is augmented with transparent data compression facilities based on the lib842 library in order to improve the efficiency of data transfers among cluster nodes. The improved data transfer efficiency provided by the integration of transparent data compression yields performance improvements ranging between 1.11x and 2.07x across four data-intensive scale-out GPU workloads. To investigate the impact of programming abstractions for data placement in NUMA systems, a comprehensive evaluation of the PGASUS framework for NUMA-aware C++ application development is conducted. On a wide range of test systems, the evaluation demonstrates that PGASUS does not only improve the developer experience across all workloads, but that it is also capable of outperforming NUMA-agnostic implementations with average performance improvements of 1.56x. Based on these programming abstractions, this thesis demonstrates that by providing a sufficient degree of abstraction, the accessibility of heterogeneous system resources can be improved for application developers without occluding performance-critical properties of the underlying hardware.
... The main feature of many-core accelerators such as GPUs is their massively parallel architecture, allowing them to speed up computations that involve matrix-based operations, which are at the heart of many deep learning implementations [78]. Manufacturers often offer the possibility to enhance a hardware configuration with many-core accelerators to improve machine/cluster performance, as well as accelerated libraries, which provide highly optimized primitives, algorithms, and functions to access the massively parallel power of GPUs. ...
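As a hedged illustration of the accelerated libraries mentioned in the excerpt above (not code from the cited textbook), the sketch below offloads a dense single-precision matrix multiplication, a typical matrix-based operation in deep learning workloads, to the GPU through a single cuBLAS call; the matrix size and variable names are illustrative assumptions:

// Hedged sketch (illustrative sizes and names): offloading C = A * B to the GPU
// through the cuBLAS accelerated library.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main()
{
    const int n = 1024;  // assumed square n x n matrices
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, n * n * sizeof(float));
    cudaMalloc((void**)&dB, n * n * sizeof(float));
    cudaMalloc((void**)&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // cuBLAS assumes column-major storage; this one library call runs the GEMM on the GPU.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}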
... The surge in the Volume of information, especially with the Variety characteristic, to be processed by data mining and machine learning algorithms demands new transformative parallel and distributed computing solutions capable of scaling computation effectively and efficiently. That is the reason for the rise of accelerated computing, with the GPU as the specialized hardware for speeding up general-purpose computation in the last decade (Section 6.3) [78], [84]. ...
Book
http://elvira.fiit.stuba.sk/ This textbook presents an introduction to Data Science in the context of the responsible development of human-centric and trustworthy Artificial Intelligence systems. It presents the recent transition from focusing on modeling to focusing on the underlying data used to train and evaluate models. In the textbook, a systematic way to examine data is described, with details about data sources, data collection, data integration, and data preparation as part of the Data Science process. The aim of the process is to provide the best quality of data for machine learning and to build an intelligent application in an organization. A separate chapter is dedicated to intelligent data modelling, covering several selected machine learning methods and deep learning architectures. The emphasis is on model evaluation and model selection, with optimizations to select the best models to deploy in production. In the context of intelligent software development, the textbook presents the most popular and most used machine learning and deep learning frameworks and libraries, with the dominance of Python open-source software at small scale as well as large scale. A note about specialized hardware for speeding up general-purpose computation in the last decade is included. The textbook also emphasises ethics in Artificial Intelligence development, with the most notable document “European Union Guideline on Ethics in Artificial Intelligence: Context and Implementation”.
... The ability of retail investors to access real-time, sustainability-based investment details of a company on a constant basis is a reality today with ML models. This was considered impossible a decade ago due to computing limitations (47). Today, ML models can filter the crucial data required by retail investors and act as a catalyst for impactful investing. ...
Article
When Machine Learning (ML) algorithms take over decision-making from human conscience and cognition, the essence of ethics gets saturated. The visible decline of ethics generally gets reflected in financial markets, as they portray human actions and sentiments in numerical terms more than any other sector. Accuracy in stock market prediction remains inefficient due to many known and unknown variables. Academia and industry have recently relied on ML at large to track the market and monetize the movements. The norms of fairness, accuracy, dependability, and transparency in financing are left unattended in ML prediction models with assumptions far from reality. The integration of the tenets proposed in this study can emphasize and reconfigure the sustainable side of investing with concepts already in place but not connected to the network of prediction models. This study focuses on the ethical dimension of Machine Learning models and generates a sustainable framework for investors. Specifically, the Sustainable Development Goals (SDGs) can enhance the prediction models in ML with improved efficiency. Along with the SDGs, this research broadens the horizon of prediction variables in ML within the computer science domain with concepts of Socially Responsible Investing (SRI), Environmental, Social, and Corporate Governance (ESG), and carbon footprints. With 115 articles reviewed, the proposed framework ensures sustainability in investments at the grassroots level. When adding the sustainability quotient backed by recognized norms, the research can be a breakthrough in rational investing based on ethical judgments aided by initiatives like the United Nations' SDGs.