# Cluster Computing

Online ISSN: 1573-7543
Recent publications
Article
• Abdul Razzaq
The Internet of Things (IoT) is an integrated network of heterogeneous objects that enables seamless integration between systems, humans, devices, and various other things to support pervasive computing for smart systems. IoT-driven systems and sensors continuously ingest data, resulting in an increased volume and velocity of information, which can raise critical concerns such as the security of the data and the scalability of the system. The Internet of Underwater Things (IoUT) is a specific genre of IoT in which data related to oceanic ecosystems is continuously sensed through underwater sensors. IoUT has emerged as an innovative paradigm to support smart oceans. However, there are several critical challenges that IoUT system designers must consider, such as (1) scalability of the system to handle large volumes of oceanic data and (2) security of data that is transmitted from IoT sensors deployed underwater. Blockchain, as a newly emerged technology and an enabling platform, allows decentralized and secure transmission of data among a wide group of untrustworthy parties. This research aims to exploit blockchain technology to secure IoUT data transmission by using the InterPlanetary File System (IPFS). Additionally, this study addresses the system in two aspects: (1) scalability and (2) security. We used a case study-based approach and performed experiments to evaluate the proposed solution’s usability and efficiency in terms of query response (i.e., performance) and algorithmic execution (i.e., efficiency). The proposed solution unifies blockchain technologies to secure IoT-driven systems and provides guidelines to engineer and develop the next generation of robust and secure blockchain-aided distributed IoT systems.

Article
• Tarek Kanan
• Ala Mughaid
• [...]
Satisfaction detection is one of the most common issues that impact the business world. This study therefore proposes an application that detects the satisfaction tone, which leads to customer happiness, in Big Data generated by online businesses on social media, in particular Facebook and Twitter, using two well-known families of methods: machine learning and deep learning (DL) techniques. There is a lack of datasets on this topic, so we collected a dataset from social media. We simplified the concept of Big Data analytics for business on social media using three of the most widely used Natural Language Processing tools: stemming, normalization, and stop word removal. To evaluate the performance of the classifiers, we calculated the F1-measure, Recall, and Precision. The results showed the superiority of the Random Forest classifier, which achieved the highest F1-measure (99.1%). The best result without applying pre-processing techniques was achieved by the Support Vector Machine, with an F1-measure of 93.4%. In addition, we applied DL techniques together with feature extraction methods, including Word Embedding and Bag of Words, on the dataset. These results showed the superiority of the Deep Neural Network (DNN) algorithm.
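
For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of how Precision, Recall, and the F1-measure are computed for a binary classifier. The function name and the sample labels are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = satisfied, 0 = not satisfied.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(precision_recall_f1(y_true, y_pred))
```

With these sample labels there are three true positives, one false positive, and one false negative, so all three metrics come out to 0.75.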

Article
A blockchain-based scheme is proposed in this work for performing registration, mutual authentication, data sharing, and nonrepudiation in the Internet of Wireless Sensor Things. The nodes are divided into three types in the proposed scheme: sensor nodes, cluster heads, and coordinators. A consortium blockchain, deployed on the coordinators, is employed for storing the legitimate nodes' identities. The coordinators also help in the execution of smart contracts, which support the sensor nodes in the authentication, data sharing, and nonrepudiation processes. Additionally, an artificial intelligence based InterPlanetary File System (IPFS) is used for storing the nodes' ambient data. Furthermore, to increase the transaction throughput and efficiency of the network, the Stellar consensus protocol is used. In the simulation results, the transaction latency of the proposed model is approximately 81.82% lower than that of the proof-of-work based model. Moreover, the gas consumed by data request and provisioning costs 0.10 US dollars.

Article
Many existing methods for the streaming environment are sensitive to extensive and high-dimensional data. The distribution of streaming data may change over time, which is known as concept drift. Several drift detectors have been built to identify the drift near its occurrence point. Still, they pay little attention to how feature relevance changes over time, which is known as feature drift. Over time, a change in the distribution of the relevant feature subset, or a change in the relevant feature subset itself, may cause feature drift in the data stream. The paper proposes an adaptive principal component analysis based feature drift detection method (PCA-FDD) that uses a statistical measure to determine the feature drift. The proposed work presents a framework for identifying the most important feature subset, detecting feature drift, and incrementally adapting the prediction model. The proposed method finds the relevant feature subset by using incremental PCA and detects feature drift by observing the change over time in the percentage similarity among the most important feature subsets. It also helps to forecast the prediction error of the base learning model. The proposed method is compared with state-of-the-art methods using synthetic and real-time datasets. The evaluation results show that the proposed work performs better than the compared methods in terms of classification accuracy.

Article
Cloud technology is a modern data storage technique that offers opportunities for outsourcing storage and computation. Because storing sensitive data (such as medical records) on the cloud side can violate personal privacy, Homomorphic Encryption (HE) was introduced as a special type of encryption that protects users' privacy by allowing computation over ciphertexts on the cloud side. In our prior work, we developed and tested a new additive HE scheme (SAVHO) that proved to be a good competitor to the Paillier scheme. The aim of this paper is to build a new secure and efficient multiplicative HE scheme to compete with the well-known multiplicative HE scheme ElGamal. The proposed scheme is called the Logarithm Operation for Randomization and Multiplicative Homomorphic Encryption scheme (LORMHE). Security and performance analyses demonstrate its high level of security, its efficiency in comparison with the ElGamal scheme, and its efficiency for real-world applications.
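
To illustrate what "multiplicative homomorphic" means for the ElGamal baseline mentioned in this abstract, here is a minimal sketch of textbook ElGamal over a toy prime. It is not LORMHE, whose construction is not described in the abstract. The parameters `P` and `G` and all function names are illustrative assumptions, and the key size is far too small for real use.

```python
import secrets

P = 467  # toy prime modulus (assumption for illustration; insecure)
G = 2    # base element (assumption for illustration)


def keygen():
    """Return (private key x, public key h = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1  # x in [1, P-2]
    return x, pow(G, x, P)


def encrypt(h, m):
    """Encrypt m in [1, P-1] as the pair (G^r, m * h^r) mod P."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P


def decrypt(x, c):
    """Recover m = c2 * c1^(-x) mod P, using c1^(P-1-x) as the inverse of c1^x."""
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - x, P)) % P


def multiply(ca, cb):
    """Component-wise product of two ciphertexts encrypts the product of the plaintexts."""
    return (ca[0] * cb[0]) % P, (ca[1] * cb[1]) % P


x, h = keygen()
product_cipher = multiply(encrypt(h, 7), encrypt(h, 11))
print(decrypt(x, product_cipher))  # 77, i.e. 7 * 11 mod P
```

The party that multiplies the ciphertexts never sees 7 or 11, which is the property that lets a cloud server compute on encrypted data.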

Article
Nowadays, the research in deoxyribonucleic acid (DNA) cryptography seeks to implement data transmission techniques to ensure secure data transmission across the world. As data transmission techniques are not secured due to the presence of hackers and attackers, a DNA-based cryptosystem can be suitable to secure data transmission, where confidential information (plaintext) is encoded in an unreadable form (ciphertext) prior to its transmission. This paper proposes a novel cryptosystem based on DNA cryptography and finite state machines. Here, finite state machines perform substitution operations on the DNA sequence and make the system more secure. Moreover, a DNA character conversion table is proposed in this paper to increase the randomness of the ciphertext. The efficiency of the proposed scheme is tested in terms of the randomness of the ciphertext. The randomness of the ciphertext determines the security of a cryptosystem, and here, randomness tests mentioned in the National Institute of Standards and Technology (NIST) test suite assess the randomness of the ciphertext. The experimental results show that the proposed scheme yields an average P-value of 0.95, which outperforms the existing systems. The proposed scheme guarantees a highly secured cryptosystem as an average avalanche effect of 75.65% is achieved. As a result, the proposed scheme is more secure than the existing DNA-based cryptosystems.
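
The avalanche effect reported in this abstract is commonly measured as the percentage of ciphertext bits that flip when a single input bit changes. The sketch below shows that measurement only; the function name and the sample byte strings are hypothetical and do not come from the proposed cryptosystem.

```python
def avalanche_effect(cipher_a: bytes, cipher_b: bytes) -> float:
    """Percentage of bit positions that differ between two equal-length ciphertexts."""
    if len(cipher_a) != len(cipher_b):
        raise ValueError("ciphertexts must have equal length")
    if not cipher_a:
        raise ValueError("ciphertexts must not be empty")
    # XOR each byte pair and count the set bits, i.e. the flipped positions.
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(cipher_a, cipher_b))
    return 100.0 * flipped / (8 * len(cipher_a))


# Hypothetical ciphertexts produced from two plaintexts that differ in one bit.
print(avalanche_effect(b"\x00\xff", b"\x0f\xff"))  # 25.0 (4 of 16 bits differ)
```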

Article
In a mobile edge computing environment, because of the resource constraints of edge devices, the network may be delayed or interrupted as user locations keep changing, which affects the quality of users’ service access. Previous studies have shown that deploying multiple microservice instances with the same function on multiple edge servers through container technology can solve this problem. However, how to choose the optimal microservice instance from multiple servers in a cloud-edge hybrid environment needs further investigation. This paper studies the microservice selection problem based on the dynamic and heterogeneous characteristics of the cloud-edge collaborative environment, and defines it as a microservice selection and scheduling optimization problem (MSSP) that minimizes users’ service access delay. To cope with the complexity of the cloud-edge collaborative environment and improve learning efficiency, MSSP is modeled as a Markov decision process. A Deep Deterministic Policy Gradient algorithm for microservice selection, called MS_DDPG, is then proposed to solve this problem, and a microservice selection strategy experience pool is established in MS_DDPG. Performance evaluations of MS_DDPG on a real dataset and several synthetic datasets show that MS_DDPG outperforms three baseline algorithms, reducing the average access delay by 23.82%. We also validate the performance of MS_DDPG by increasing the number of user requests, and the results show that MS_DDPG also scales better.

Article
Nowadays, attackers are constantly targeting the modern aspects of technology and attempting to abuse these technologies using different attacks types such as the distributed denial of service attack (DDoS). Therefore, protecting web services is not an easy task. There is a critical demand to detect and prevent DDoS attacks. This paper introduces a fuzzy inference-based anomaly-based intrusion detection (IDS) system to detect DDoS attacks. The aim of using the fuzzy inference system is to avoid binary decisions and, meanwhile, to avoid the issues associated with the deficiencies of IDS alert system awareness. This benefit could improve the IDS alert system’s robustness and effectively produce more readable and understandable IDS alerts. The proposed detection model was applied to a recent open-source DDoS dataset. At the early stage of designing the proposed detection model, the DDoS dataset was preprocessed using the Info-gain features selection algorithm to deal with the relevant features only and reduce the complexity of the fuzzy inference system. The proposed detection model was tested, evaluated, and obtained a 96.25% accuracy rate and a false-positive rate of 0.006%. Moreover, it effectively smoothes the boundaries between normal and DDoS traffic. In addition, the results obtained from the proposed detection model were compared with other literature results. The results indicated that the detection accuracy of this work is competitive with other methods. In addition to this, this work offers more elements of trust in DDoS attack detection by following the strategy to avoid the binary decision and offering the required extension of the binary decision to the continuous space; hence, the attack level could be easily measured.

Article
The use of the Internet of Things (IoT) has raised genuine concerns regarding resources and their Quality of Service (QoS) and Quality of Experience (QoE) when data is collected, exchanged, and used. In this article, a novel QoE-aware mechanism based on task allocation in IoT edge computing systems is proposed. We developed a task processing controller based on fuzzy logic that computes the QoS requirements of each task request and decides its allocation by considering multiple QoE context parameters, such as data, network congestion, resource affordability, and quality. Furthermore, we illustrate how the design and realization of the fuzzy mechanism leads to refinements of the IoT-Edge architectural styles used for processing at the edge of the IoT network. To demonstrate the effectiveness of the fuzzy approach, we instantiate the task allocation mechanism in the edge architectures and evaluate their performance using iFogSim. The simulation results suggest that the fuzzy task allocation mechanism improves network usage and latency and reduces energy consumption.

Article
Software-defined networks (SDN) offer a centralized administration programming interface to govern the network infrastructure. It overtook conventional networks by creating a configurable link between the control and data planes. As the logic of the SDN environment completely depends on the control plane, the controller is vulnerable to many security attacks. To degrade the network’s performance, attackers will saturate the control plane resources. TCP flooding is a serious threat in which attackers restrict legitimate users from accessing the network resources. To handle this problem, we propose a TCP Flooding Attack Detection (TFAD) technique using proxy-based and Machine-Learning-based mechanisms (ML-TFAD). The TFAD technique contains two proxies, SYN and ACK: the former defends against TCP SYN flood attacks and the latter against TCP ACK flood attacks. The ML-TFAD module uses the C4.5 decision tree algorithm, which detects SYN flood attacks before reaching the targeted server. The CAIDA 2007 DDoS dataset is involved in training the proposed model. The proposed mechanisms help remove half-opened connections from the server queue at the earliest to accommodate TCP connection requests from legitimate users.

Article
With the rise of 5G/6G and cloud computing, cluster management has become increasingly popular. Elastic cluster resources allow cloud clients to dynamically scale their resource requirements over time. Existing research on cluster schedulers focuses on improving resource scheduling speed, increasing cluster utilization, reducing the number of active physical machines (PMs), and improving the time satisfaction function (TSF) within a cluster. The TSF is applied as a time measure for the parallel-VM scheduling problem. However, the completion time (makespan) of task requests is often neglected, which results in inaccurate scheduling and unreasonable total cost computation. The total cost involves PM cost, migration cost, and balance cost. To solve the problem of inaccurate scheduling of task requests and total cost billing in cluster management, we propose an innovative heuristic algorithm, namely multi-objective two-stage variable neighborhood searching (MO_STVNS), which aims at minimizing total cost while also considering the TSF for active PMs. Moreover, we design a Multi-Objective FreeVM (MO-FreeVM) scheduler based on resource prediction, in which a variety of algorithms work in collaboration to provide near-optimal resource management for clusters. We evaluate MO_STVNS on different real traces through extensive experiments. The experimental results show that, compared with state-of-the-art methods, MO_STVNS reduces the average total cost and the average TSF by 33.75% and 60.67%, respectively.

Article
Frequent itemset mining (FIM) is one of the prominent techniques for extracting knowledge from transactional databases. Finding frequent itemsets becomes tiresome when dealing with large-scale datasets because of the enormous demand for computational resources. Cluster-based versions of FIM algorithms are a natural choice for making FIM efficient and scalable for large-scale datasets and for meeting ever-growing data needs. A variety of MapReduce-based algorithms on Hadoop clusters have been developed to meet these expectations. However, because of the iterative nature of FIM algorithms, they still do not perform adequately. Bottlenecks associated with MapReduce-based FIM include the challenges of adapting FIM algorithms to parallel and distributed contexts, as well as reading and writing intermediate results to HDFS, which requires significant disk and communication I/O. Many FIM algorithms have been redesigned on Spark to achieve efficiency for large-scale datasets by using its in-memory processing capabilities. However, Spark-based FIM adaptations still face the same challenges, apart from achieving memory efficiency. The limitations associated with basic FIM algorithms, such as repeated scanning of input data, massive candidate generation, and maintaining large conditional trees in memory, still need to be addressed. In addition, tuning Spark’s shuffle behavior becomes important for controlling the communication overhead. In this paper, we propose a Spark-based algorithm, namely PartEclat. It uses the Eclat method in combination with a partitioning technique to gain efficiency and scalability. The vertical data format helps to avoid repeated scanning of input data when calculating individual support values. The partitioning technique limits the movement of key-value pairs across the cluster nodes (shuffle overhead) during iterations. It also helps to deal efficiently with the memory required to handle large Transaction ID (TID) sets and with the computational overhead of intersecting these large TID sets. In-depth experiments with benchmark datasets have been conducted to gain insight into the efficiency and scalability of the algorithm. Experimental results show that the proposed scheme outperforms other Spark-based Eclat variants by reducing network and computing loads by approximately 25–40%.
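
The vertical data format and TID-set intersection described in this abstract can be sketched with a plain, single-machine Eclat. This is not PartEclat: it omits the Spark execution and the partitioning technique, and the function name and sample transactions are hypothetical.

```python
def eclat(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    # Vertical data format: map each item to the set of transaction IDs (TIDs)
    # that contain it, so the input is scanned only once.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)  # support = size of the TID set
            longer = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids  # TID-set intersection
                if len(common) >= min_support:
                    longer.append((other, common))
            if longer:
                extend(itemset, longer)

    singles = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), singles)
    return frequent


# Hypothetical transactional database.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
print(eclat(transactions, min_support=2))
```

Support values come from set sizes rather than from rescanning the transactions, which is the property the abstract attributes to the vertical format. The intersections are also where the memory and compute costs of large TID sets arise.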

Article
The Routing Protocol for Low power and Lossy networks (RPL) uses the Objective Function (OF) to form a Destination Oriented Directed Acyclic Graph (DODAG) and reach the destination by selecting the best path. Many works in the literature have explored this domain for Internet of Things (IoT) applications. However, applying the RPL protocol from IoT to the Internet of Vehicles (IoV) in the smart city remains a major challenge. Because this gap has not been widely explored, we were motivated to present our findings on it. This paper realises the transition of the RPL protocol from IoT to IoV for the first time. The network performance of RPL has been analysed in static and mobile environments based on three configurations: Quality of Service (QoS) parameters, network scalability, and mobility models. A comprehensive analysis of RPL performance in both environments is also presented. Finally, we summarise our contributions and state potential future directions for researchers. The experiments were performed using the Contiki OS/Cooja Simulator, the BonnMotion tool, and Wireshark. Simulation results show that Self-similar Least Action Walk (SLAW) outperforms the Random Way-Point (RWP) and Nomadic mobility models. A higher Packet Delivery Ratio (PDR) is achieved in the mobile/dynamic environment than in the static one. These findings can be directly applied to IoV and IoT applications that use the RPL protocol in the smart city, such as Traffic Monitoring Systems (TMS), smart corridors, and Electronic Toll Collection (ETC). Moreover, this article will help researchers gain better insight into the RPL protocol in static and mobile environments for future work.

Article
In this paper, a multi-agent recommender system is designed and developed for extracting user interests. The system consists of eight agents, such as age, identity, personality, social, financial, location, and needs. The agents work with each other collaboratively to make recommendations to users according to their interests. The relations between the agents and the users are controlled by a well-developed protocol and pre-defined senses. The information exchanged between the users and the agents is collected in an information center agent (ICA). The data collected in the ICA can be used to rearrange the videos so that they are more relevant to the user's interests. These interests can be extracted from the information that the user initially provides to the system, which can then be analyzed by the multi-agent system to decide whether the user is interested in a video or not. This is done by creating a video-important term matrix, a user-important term matrix, and an agent-feature matrix. These matrices are then used by the multi-agent system to obtain a video-agent effective matrix for the users, which yields the ordered list of videos to be presented to the user. The proposed model was verified by intensive simulations with eight agents on the JADE platform. The results show that the accuracy of the system is 87% for 50 videos arranged for 40 users.

Article
Cloud computing provides users with different types of on-demand resources, which are hosted in cloud data centers. These services are provided at the expense of large energy consumption. Energy consumption increases the expenditure budget, greenhouse gases, and CO2 emissions. To handle this issue, researchers have come up with various server-level energy-efficient techniques. Although the proposed techniques attempt to reduce energy consumption, they only consider the energy consumption of the CPU during the task placement process. However, researchers have recently noted that memory is also one of the higher energy-consuming components and should be considered in task placement. Moreover, existing techniques ignore the SLA violations that are encountered due to workload. To address these issues, we propose two novel nature-inspired techniques that consider the energy consumption of both CPU and memory during the VM placement process. The proposed techniques are based on artificial bee colony and particle swarm optimization, which have not previously been used to place VMs while considering the energy consumption of both CPU and memory. Moreover, to handle the resulting SLA violations, we also provide SLA-aware variants of the proposed energy-efficient techniques, which try to lower the SLA violations caused by excessive task consolidation. The results show that the proposed energy-efficient techniques perform better than the existing state-of-the-art techniques, and that the proposed SLA-aware variants also reduce SLA violations.

Article
The sixth-generation (6G) wireless communication networks are expected to support heterogeneous services and decentralized infrastructure with resource-aware smart self-organization for Internet of Things (IoT) applications. Large-scale IoT applications face challenges like load balancing and scalability within the network due to the inherent vulnerability of ad-hoc structures. This paper proposes a multiHop constant-time complexity clustering algorithm (MultiHopFast) for IoT networks to address these challenges. The proposed MultiHopFast algorithm reduces the computing burden from IoT nodes with smart load balancing to ensure IoT network scalability. The algorithm addresses the network load, scalability, and time efficiency challenges. Using neighbourhood heuristics, the MultiHopFast algorithm builds appropriate size (i.e., up to 5 hops) of clusters with participating IoT nodes. Each cluster is associated with a cluster head (CH) (or a coordinator). The MultiHopFast algorithm probabilistically selects CH for each cluster. When compared with state-of-the-art counterparts, MultiHopFast algorithm: (i) operates with constant-time complexity in a large scale network as well as in small-scale networks, (ii) runs without any impact on network scalability, and (iii) creates 12% fewer CHs to save precious resources such as energy. Better use of heuristics and resource-aware self-organization, constant-time computational complexity, and network operation with fewer CHs demonstrate that the performance of the MultiHopFast algorithm surpasses the compared algorithms in the literature. The MultiHopFast algorithm is envisioned as a better candidate to match the standard and expectations set by the 6G wireless communications.

Article
Massive internet of things (IoT) data generated by IoT edge devices are shaping the data economy. Monetizing the stream of IoT data has enabled the development of IoT data trading systems, which allow individuals to sell and exchange data. This article presents a blockchain-based system for IoT data trading using fog computing. We propose crowdsourcing fog nodes on the edge network to communicate and collect data from IoT edge device owners. This paper focuses on developing a secure and dependable blockchain-based system that allows data providers and consumers to engage in data trading process. Through experimental results, we evaluate the performance of the proposed model with respect to transaction throughput, latency, and resource consumption metrics under varied scenarios and parameters using Hyperledger blockchain.

Article
In recent years, the exponential growth of malware has posed a significant security threat to intelligent systems. Earlier static and dynamic analysis methods fail to achieve an effective recognition rate and incur high computational complexity. Recently developed machine learning (ML) and deep learning (DL) models can be employed to detect and classify cyberattacks and malware efficiently. This paper presents a fusion of deep learning based cyberattack detection and classification model for intelligent systems, named the FDL-CADIS technique. The proposed FDL-CADIS technique transforms malware binary files into two-dimensional images, which are then classified by the fusion model. The technique feeds the binary input images into the MobileNetv2 model for feature extraction, and hyperparameter tuning is performed using the black widow optimization technique. The MobileNetv2 model derives features from the malware dataset, and the model is trained using the derived features. Finally, an ensemble of voting-based classifiers, including gated recurrent unit and long short-term memory techniques, is developed for malware cyberattack detection and classification. A comprehensive experimental analysis is performed on a benchmark dataset to demonstrate the FDL-CADIS technique’s promising performance. According to the comparative analysis of the results, the FDL-CADIS technique outperformed current approaches.

Article
The agricultural crop productivity can be affected and reduced due to many factors such as weeds, pests, and diseases. Traditional methods that are based on terrestrial engines, devices, and farmers' naked eyes are facing many limitations in terms of accuracy and the required time to cover large fields. Currently, precision agriculture that is based on the use of deep learning algorithms and Unmanned Aerial Vehicles (UAVs) provides an effective solution to achieve agriculture applications, including plant disease identification and treatment. In the last few years, plant disease monitoring using UAV platforms is one of the most important agriculture applications that have gained increasing interest by researchers. Accurate detection and treatment of plant diseases at early stages is crucial to improving agricultural production. To this end, in this review, we analyze the recent advances in the use of computer vision techniques that are based on deep learning algorithms and UAV technologies to identify and treat crop diseases.

Article
Software-defined networks (SDN) have gained a lot of attention in recent years as a technique for developing smart systems with the help of the Internet of Things (IoT). Their powerful, centralized architecture contributes to the management of sustainable applications through efficient processes. These networks also systematically keep track of mobile devices and decrease the extra overhead in communication cost. Many solutions have been proposed to cope with data transfer for critical systems; however, mobile devices require long-distance communication links with minimal retransmissions. Furthermore, the mobile network is highly exposed to security attacks that compromise the IoT architecture for both the intermediate layers and end-users. Therefore, this paper presents an adaptive route migration model for sustainable applications that collaborates with the SDN architecture and limits the disconnection time in data transport while efficiently managing network services. Moreover, its centralized controller fetches updated information from low-level smart devices and supervises their monitoring efficiently. The proposed model also secures the cloud of things (CoT) from network threats and protects private data. It provides three levels of security algorithms and supports adaptive computing systems. The proposed model was tested using simulations, and the findings showed that it outperformed existing studies in terms of packet delivery ratio by 13%, packet loss rate by 15%, transmission error by 22%, computing cost by 17%, and latency by 18%.

Article
To improve the accuracy of malware detection on the Internet of Battlefield Things (IoBT), a class of malware detection techniques transforms benign and malware files into control flow graphs (CFGs). In the CFG construction process, the binary code of a file is transformed into opcodes using disassemblers. Probability CFGs are then generated, in which vertices represent the opcodes and the edges between the opcodes represent the probability of occurrence of those opcodes in the file. The probability CFGs are fed to a deep learning model for training and testing. The accuracy of the deep learning model depends on the probability CFGs: if the graph generation technique reflects the binary file more accurately, the result of the deep learning malware detection model is likely to be more accurate. In this research, we identify the limitations of the existing probability CFG techniques, propose a new probability CFG generation technique called HeuCrip, which combines crisp and heuristic approaches, and compare the proposed technique with existing state-of-the-art schemes. The experimental results show that HeuCrip achieved 99.93% accuracy, a significant improvement in performance over the existing state-of-the-art schemes.
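
As a rough illustration of a graph whose vertices are opcodes and whose edges carry probabilities, the sketch below builds transition probabilities from a linear opcode sequence. This is a simplifying assumption: it ignores real control flow (branches and calls) and is not the HeuCrip technique, which the abstract does not specify. The function name and opcode list are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(opcodes):
    """Map each opcode to {next opcode: probability} from consecutive pairs."""
    counts = defaultdict(Counter)
    for src, dst in zip(opcodes, opcodes[1:]):
        counts[src][dst] += 1  # count how often dst directly follows src
    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        # Normalize so the outgoing edge weights of each vertex sum to 1.
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


# Hypothetical opcode sequence, as a disassembler might emit it.
ops = ["push", "mov", "call", "mov", "ret"]
print(transition_probabilities(ops))
```

Here `mov` is followed once by `call` and once by `ret`, so each of those edges gets probability 0.5. A matrix built from such weights is the kind of input a downstream learning model could consume.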

Article
The evolution of parallel architectures points to dynamic environments where the number of available resources or configurations may vary during the execution of applications. This can be easily observed in grids and clouds, but can also be explored in clusters and multiprocessor architectures. Over more than two decades, several research initiatives have explored this characteristic in parallel applications, enabling the development of adaptive applications that can reconfigure the number of processes/threads and their allocation to processors to cope with varying workloads and changes in the availability of resources in the system. Despite the long history of development of solutions for adaptability in parallel architectures, there is no literature reviewing these efforts. In this context, the goal of this paper is to present the state of the art on adaptability from the resource and application perspectives over the last twenty years (2002-2022), ranging from shared memory architectures, clusters, and grids to virtualized resources in cloud and fog computing. A comprehensive analysis of the leading research initiatives in the field of adaptive parallel applications can provide the reader with an understanding of the essential concepts of development in this area.

Article
Understanding flow traffic patterns in networks, such as the Internet or service provider networks, is crucial to improving their design and building them robustly. However, as networks grow and become more complex, it is increasingly cumbersome and challenging to study how the many flow patterns, sizes and the continually changing source-destination pairs in the network evolve with time. We present Netostat, a visualization-based network analysis tool that uses visual representation and a mathematics framework to study and capture flow patterns, using graph theoretical methods such as clustering, similarity and difference measures. Netostat generates an interactive graph of all traffic patterns in the network, to isolate key elements that can provide insights for traffic engineering. We present results for U.S. and European research networks, ESnet and GEANT, demonstrating network state changes, to identify major flow trends, potential points of failure, and bottlenecks.

Article
To collect data effectively while balancing energy consumption, wireless sensor networks (WSNs) need to be divided into clusters. Clustering gives the network a hierarchical organizational structure, which balances the network load and prolongs the life cycle of the system. In a clustering routing algorithm, the quality of the clustering algorithm directly affects the result of cluster division. In this paper, an algorithm that selects cluster heads based on node distribution density and then allocates the remaining nodes is proposed to address the random cluster head election and uneven clustering of the traditional LEACH protocol in WSNs. Experiments show that the algorithm can rapidly select cluster heads and divide clusters, which is effective for node clustering and conducive to equalizing energy consumption.
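The general idea of density-based cluster head selection can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' algorithm: we assume density means the number of neighbours within a fixed radius, that heads are chosen greedily in order of density while keeping them at least one radius apart, and that each remaining node joins its nearest head. The function name, parameters, and coordinates are all hypothetical.

```python
import math


def density_cluster(nodes, radius, num_heads):
    """Pick up to `num_heads` cluster heads by local density, then assign nodes."""

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Density of a node: number of other nodes within `radius`.
    density = [
        sum(1 for j, q in enumerate(nodes) if j != i and dist(p, q) <= radius)
        for i, p in enumerate(nodes)
    ]
    # Visit nodes from densest to sparsest (ties broken by index).
    order = sorted(range(len(nodes)), key=lambda i: (-density[i], i))
    heads = []
    for i in order:
        if len(heads) == num_heads:
            break
        # Keep heads spread out: skip candidates too close to an existing head.
        if all(dist(nodes[i], nodes[h]) > radius for h in heads):
            heads.append(i)
    # Every node (heads included) joins its nearest head.
    assignment = {
        i: min(heads, key=lambda h: dist(nodes[i], nodes[h]))
        for i in range(len(nodes))
    }
    return heads, assignment


# Hypothetical sensor coordinates: two dense groups and one isolated node.
nodes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (4, 4)]
heads, assignment = density_cluster(nodes, radius=1.5, num_heads=2)
print(heads, assignment)
```

Unlike random election, this deterministic choice places heads inside dense regions, which is the property the abstract attributes to density-based selection.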

Article
Cloud computing is a computing service provided on demand through the Internet, which can provide sufficient computing resources for applications such as big data and the Internet of Things. Before using cloud computing resources, users need to lease the corresponding virtual machines in the data centers and then submit their tasks to the leased virtual machines for execution. In these two stages, choosing the optimal data center for resource deployment and task submission is particularly important for cloud users. To tackle the difficulty of selecting the optimal data center in which to lease and use virtual machines across distributed data centers, we propose data center selection algorithms based on deep reinforcement learning. In the virtual machine lease stage, aiming to minimize user costs, we consider both the cost of computing resources and the cost of network communication. We then use a deep reinforcement learning algorithm to obtain the shortest communication path between the users and the data centers, which addresses the problem that network communication costs are difficult to calculate. We achieve an optimal selection of data centers and effectively reduce the overall user cost. In the virtual machine use stage, to improve quality of service for the users, we use a deep reinforcement learning algorithm to obtain an optimal task scheduling strategy. This effectively solves the problem that user tasks cannot be scheduled effectively due to dynamic changes in the type and size of user tasks and in the state of the virtual machines. Moreover, the proposed scheme reduces the overall task completion time.

Article
In recent decades, mobile apps have been increasingly used in many application fields and for many purposes, involving a large number of human activities. Unfortunately, the number of cyber-attacks related to mobile platforms is also increasing day by day. Although advances in Artificial Intelligence have made it possible to address many aspects of the problem, malware classification tasks are still challenging. For this reason, this paper proposes new special features, called permission maps (Perm-Maps), which combine information related to the Android permissions and their corresponding severity levels. Such features have proven to be very effective in classifying different malware families through the usage of a convolutional neural network. Also, the advantages introduced by the Perm-Maps have been enhanced by a training process based on a federated logic. Experimental results show that the proposed approach achieves up to a 3% improvement in average accuracy with respect to J48 trees and the Naive Bayes classifier, and up to 16% compared to a multi-layer perceptron classifier. Furthermore, the combined use of Perm-Maps and federated logic makes it possible to deal with unbalanced training datasets with low computational effort.

Article
A continuing trend in many scientific disciplines is the growth in the volume of data collected by scientific instruments and the desire to rapidly and efficiently distribute this data to the scientific community. As both the data volume and number of subscribers grows, a reliable network multicast is a promising approach to alleviate the demand for the bandwidth needed to support efficient data distribution to multiple, geographically-distributed, research communities. In prior work, we identified the need for a reliable network multicast: scientists engaged in atmospheric research subscribing to meteorological file-streams. An application called Local Data Manager (LDM) is used to disseminate meteorological data to hundreds of subscribers. This paper presents a high-performance, reliable network multicast solution, Dynamic Reliable File-Stream Multicast Service (DRFSM), and describes a trial deployment comprising eight university campuses connected via Research-and-Education Networks (RENs) and Internet2 and a DRFSM-enabled LDM (LDM7). Using this deployment, we evaluated the DRFSM architecture, which uses network multicast with a reliable transport protocol, and leverages Layer-2 (L2) multipoint Virtual LAN (VLAN/MPLS). A performance monitoring system was developed to collect the real-time performance of LDM7. The measurements showed that our proof-of-concept prototype worked significantly better than the current production LDM (LDM6) in two ways. First, LDM7 distributes data faster than LDM6. With six subscribers and a 100 Mbps bandwidth limit setting, an almost 22-fold improvement in delivery time was observed with LDM7. Second, LDM7 significantly reduces the bandwidth requirement needed to deliver data to subscribers. LDM7 needed 90% less bandwidth than LDM6 to achieve a 20 Mbps average throughput across four subscribers.

Article
The advent of inexpensive data storage has resulted in larger and larger datasets, as pruning data becomes more expensive than storing it for future insights. This decreasing cost of storage has also led to the practice of storing data in multiple locations for redundancy. However, without any uniform method of determining link costs to different storage sites, a dataset is not always retrieved from the most cost-effective site. Distributed dataset DNS, or DDD, solves this problem in two key ways. The first allows “local” servers to provide meaningful information to a user in order to ensure that they target the location that offers the most advantageous network connection. The second allows other trusted servers to easily gain access to this information in a distributed way. These combined approaches aim to both lower aggregate network bandwidth usage and prevent single points of failure when retrieving dataset pointers.

Article
The device-to-device (D2D) communication-empowered Cloud Radio Access Network (CRAN), which is regarded as a promising system model, offers energy efficiency and a high data rate. In this research study, we formulate a mode selection (MS) and user-association (UA) problem in D2D-empowered, Single Carrier Frequency Division Multiple Access (SCFDMA)-based CRAN in the forthcoming link. The resulting combined-integer non-linear problem cannot be solved in its current state. To address this, an iterative algorithm is proposed that solves the problem in two stages: a Mode Selection Stage and a User-Association Stage. D2D communication can be made possible using single carrier frequency division multiple access-based C-RAN. Cloud radio access network is a revolution for cellular networks. For next round selection, a link-based algorithm is presented, while for user-association, an iterative algorithm is presented. The base algorithms, the optimal technique, and the proposed solution are compared in simulation for different cell radii. The simulation results verify the efficiency of the proposed algorithm and show that it performs efficiently compared with the other algorithms.

Article
Write-optimized data structures (WODS) offer the potential to keep up with cyberstream event rates and give sub-second query response for key items like IP addresses. These data structures organize logs as the events are observed. To work in a real-world environment and not fill up the disk, WODS must efficiently expire older events. As the basis for our research into organizing security monitoring data, we implemented a tool, called Diventi, to index IP addresses in connection logs using RocksDB (a write-optimized LSM tree). We extended Diventi to automatically expire data as part of the data structures’ normal operations. We guarantee that Diventi always tracks the N most recent events and tracks no more than N+k events for a parameter k
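The stated guarantee (always keep the N most recent events, never hold more than N+k) can be illustrated with a small in-memory sketch. This is not how Diventi or RocksDB implement expiry; the class `ExpiringLog` and its batch-eviction policy are assumptions made purely to show the invariant.

```python
from collections import deque


class ExpiringLog:
    """Keeps at least the `n` newest events and at most `n + k` events."""

    def __init__(self, n, k):
        self.n = n
        self.k = k
        self.events = deque()

    def add(self, event):
        self.events.append(event)
        # Evict lazily, in one batch, only when the upper bound is exceeded.
        if len(self.events) > self.n + self.k:
            while len(self.events) > self.n:
                self.events.popleft()


log = ExpiringLog(n=3, k=2)
for event_id in range(7):
    log.add(event_id)
print(list(log.events))
```

Allowing a slack of k events means old entries can be dropped in bulk rather than one at a time, which is one plausible reason to state the bound as N+k instead of exactly N.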

Article
Many organizations are interested in sharing data with others, and they can do this only if a multi-domain secure platform is available. Such platforms, often referred to as Digital Data Marketplaces (DDMs), require that all transactions follow the pre-defined policies established by the participating parties, i.e., domains. However, building a multi-domain network infrastructure in which each domain can manage its own connectivity while all transactions still follow the sharing agreements remains a challenge. In this paper, we introduce a multi-domain containerized DDM that is built upon a P4-based network. It can handle the communication of multiple domains and guarantee that transactions operate according to the pre-defined policies. We also studied the setup performance by defining a model, which we show agrees with the real measurements and which can be used for decision making. The results also show the low overhead of using a P4 switch in network setup time. In addition, we conducted a security evaluation which showed that our P4-based network setup is secure against most types of attacks.

Article
Most traditional Public-Key Encryption with Keyword Search (PEKS) schemes face a serious threat from the rise of quantum computing, since these schemes are derived from bilinear pairings. To preserve the security of data outsourced by the Industrial Internet of Things (IIoT), a novel and efficient PEKS scheme based on lattice assumptions is proposed, which achieves security against quantum computing attacks. It also supports both multi-user and conjunctive keyword search. In addition, we adopt broadcast encryption technology to address the enormous storage cost of keyword ciphertexts in a multi-user setting. Our scheme only needs to generate one ciphertext for all data users, thus significantly reducing the storage cost. Lastly, its performance is analyzed theoretically and experimentally. Experimental simulation results demonstrate the superiority of our algorithms in multi-user and multi-keyword scenarios. Keyword encryption with 100 keywords for 10–100 users consistently costs about 0.0204 s, and the storage cost remains at 81.7 KB–84.7 KB.

Article
The article presents the design and methodology of a novel benchmark suite named IMB-ASYNC. The presented suite and method are aimed at measuring and comparing practical communication-computation overlap levels for Message Passing Interface (MPI) standard implementations, with special emphasis on some applicable use cases. Some typical MPI communication patterns implying communication-computation overlap are analyzed, and their reflection in a benchmark structure is proposed. We also analyze previous work on overlap benchmarking and its best practices. We present a new benchmarking approach for non-blocking neighborhood collectives overlap and clarify the overlap estimation methodology. After a short overview of some technical details of currently available MPI asynchronous progress implementations, two benchmarking case studies are presented to illustrate the relevance of the methodology.

Article
The rapid deployment of Internet of Things (IoT) devices has led to the development of innovative information services that were unavailable a few years ago. To provide these services, IoT devices connect and communicate using networks like Bluetooth, Wi-Fi, and Ethernet. This full-stack connection of IoT devices has introduced a grand security challenge. This paper presents an IoT security framework to protect smart infrastructures from cyber attacks. The framework is applied to the Bluetooth protocol and to IoT sensor networks. For the Bluetooth protocol, the intrusion detection system (IDS) uses n-grams to extract temporal and spatial features of Bluetooth communication. The Bluetooth IDS has a precision of 99.6% and a recall of 99.6% using classification techniques such as the Ripper algorithm and Decision Tree (C4.5). We also used AdaBoost, support vector machine (SVM), Naive Bayes, and Bagging algorithms for intrusion detection. The Sensor IDS uses the discrete wavelet transform (DWT) to extract spatial and temporal features of the observed signal. Using the detailed coefficients of Biorthogonal DWT, Daubechies DWT, Coiflets DWT, Discrete Meyer DWT, Reverse Biorthogonal DWT, and Symlets DWT, we present the results for detecting attacks with One-Class SVM, Local Outlier Factor, and Elliptic Envelope. The attacks used in our evaluation include Denial of Service Attacks, Impersonation Attacks, Random Signal Attacks, and Replay Attacks on temperature sensors. The One-Class SVM performed the best when compared with the results of other machine learning techniques.
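To make the n-gram feature extraction step concrete, the sketch below counts n-grams over a sequence of packet labels. It assumes that each observed packet has already been reduced to a symbolic label; the labels and the function name `ngram_counts` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every run of `n` consecutive items in `sequence`."""
    return Counter(
        tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)
    )


# Hypothetical packet-type labels observed on a Bluetooth link.
packets = ["INQ", "PAGE", "DATA", "DATA", "PAGE", "DATA", "DATA"]
features = ngram_counts(packets, n=2)
for gram, count in features.most_common():
    print(gram, count)
```

The resulting counts can be turned into a fixed-length feature vector over a chosen n-gram vocabulary before being passed to a classifier.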

Article
Sensor network infrastructures are widely used in smart cities to monitor and analyze urban traffic flow. Starting from punctual information coming from traffic sensor data, traffic simulation tools are used to create the “digital twin” mobility data model that helps local authorities to better understand urban mobility. However, sensors can be faulty, and errors in sensor data can be propagated to the traffic simulations, leading to erroneous analysis of the traffic scenarios. Providing real-time anomaly detection for time series data streams is highly valuable, since it makes it possible to automatically recognize and discard or repair sensor faults in time-sensitive processes. In this paper, we implement a data cleaning process that detects and classifies traffic anomalies, distinguishing between sensor faults and unusual traffic conditions, and removes sensor faults from the input of the traffic simulation model, improving its performance. Experiments conducted on a real scenario for 30 days have demonstrated that anomaly detection coupled with anomaly classification boosts the performance of the traffic model in emulating real urban traffic.

Article
Deep learning-based video anomaly detection methods have drawn significant attention in the past few years due to their superior performance. However, almost all the leading methods for video anomaly detection rely on large-scale training datasets with long training times. As a result, they are still not suitable for fast deployment in many real-world video analysis tasks. Moreover, the leading methods cannot provide interpretability, because their uninterpretable feature representations hide the decision-making process when anomaly detection models are treated as a black box. Yet interpretability is crucial for anomaly detection, since the appropriate response to anomalies in a video is determined by their severity and nature. To tackle these problems, this paper proposes an efficient deep learning framework for video anomaly detection that also provides explanations. The proposed framework uses pre-trained deep models to extract high-level concept and context features for training a denoising autoencoder (DAE), requiring little training time (i.e., within 10 s on the UCSD Pedestrian datasets) while achieving detection performance comparable to the leading methods. Furthermore, this framework presents the first use in video anomaly detection of combining an autoencoder with SHapley Additive exPlanations (SHAP) for model interpretability. The framework can explain each anomaly detection result in surveillance videos. In the experiments, we evaluate the proposed framework's effectiveness and efficiency while also explaining the anomalies behind the autoencoder’s predictions. On the UCSD Pedestrian datasets, the DAE achieved 85.9% AUC with a training time of 5 s on UCSD Ped1 and 92.4% AUC with a training time of 2.9 s on UCSD Ped2.

Article
Estimating software enhancement effort has become a challenging task in software project management. Recent research has focused on identifying the best machine learning algorithms for software maintenance effort estimation. Most research publications have investigated the use of ensemble learning for improving software effort estimation. Intending to increase the estimation accuracy over individual models, this paper investigates the use of the stacking ensemble method for estimating the enhancement maintenance effort (EME) of software projects. This paper compares two machine learning-based approaches for estimating software EME using the ISBSG dataset: M5P (as an individual model) and stacking as an ensemble method combining different regression models (GBRegr, LinearSVR, and RFR). A correlation-based feature selection (CFS) algorithm is used to achieve efficient data reduction. The selected ML-based approaches were trained and tested on a dataset with relevant features, leading to improved estimation accuracy. Results show that software EME estimation using CFS and the stacking ensemble method is improved, with a mean absolute error (MAE) of 0.0383 and a root mean square error (RMSE) of 0.1973.
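For readers unfamiliar with the two reported error measures, the following sketch computes MAE and RMSE using their standard definitions. The sample values are invented for illustration and are unrelated to the ISBSG dataset or the paper's results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical normalized effort values and model estimates.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.5, 2.0, 4.5]
print(mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, so RMSE is never smaller than MAE on the same data.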

Article
Fault injection attacks (FIA), which cause information leakage by injecting intentional faults into the data or operations of devices, are one of the most powerful methods of compromising the security of confidential data stored on these devices. Previous studies related to FIA report that attackers can skip instructions running on many devices through many means of fault injection. Most existing software-based anti-FIA countermeasures are designed to secure against instruction skip (IS). On the other hand, recent studies report that attackers can use laser fault injection to manipulate instructions running on devices as they want. Although previous studies have shown that instruction manipulation (IM) can defeat the existing countermeasures against IS, no effective countermeasures against IM have been proposed. This paper is the first work tackling this problem, aiming to construct software-based countermeasures against IM faults. Designing such countermeasures requires evaluating program vulnerabilities to IM faults. We propose three IM simulation environments for that aim and compare them to reveal their performance differences. The GDB (GNU debugger)-based simulator that we newly propose in this paper outperforms the QEMU-based simulator that we previously presented (AICCSA:1–8, 2020), running up to 400× faster in terms of evaluation time. Evaluating a target program using the proposed IM simulators reveals that the IM faults leading to successful attacks fall into four classes. We propose secure coding techniques as countermeasures against IMs of each of the four classes and show the effectiveness of the countermeasures using the IM simulators.

Article
Computational science depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist’s workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity, but applications are not taking advantage of them. In our previous work, we introduced DyNamo, which enabled CASA scientists to improve the efficiency of their operations and effortlessly leverage capabilities of the cloud resources available to them that previously remained underutilized. However, the provided workflow automation did not satisfy all the operational requirements of CASA. Custom scripts were still in production to manage workflow triggering, while multiple layer 2 connections would have to be allocated to maintain network QoS requirements. To address these issues, we enhance the DyNamo system with advanced network manipulation mechanisms, end-to-end infrastructure monitoring and ensemble workflow management capabilities. DyNamo’s Virtual Software Defined Exchange (vSDX) capabilities have been extended, enabling link adaptation, flow prioritization and traffic control between endpoints. These new features allow us to enforce network QoS requirements for each workflow ensemble and can lead to fairer network sharing. Additionally, to accommodate CASA’s operational needs, we have extended the newly integrated Pegasus Ensemble Manager with event-based triggering functionality that improves the management of CASA’s workflow ensembles. The Pegasus Ensemble Manager, apart from managing the workflow ensembles, can also create conditions for fairer resource usage by employing throttling techniques to reduce compute and network resource contention.
We evaluate the effects of the DyNamo’s vSDX policies by using two CASA workflow ensembles competing for network resources, and we show that traffic shaping of the ensembles can lead to a fairer sharing of the network links. Finally, we study how changing the Pegasus Ensemble Manager’s throttling for each of the two workflow ensembles affects their performance while they compete for the same network resources, and we assess if this approach is a viable alternative compared to the vSDX policies.

Article
Due to the complexities of indoor WiFi signal propagation, it is challenging to improve the performance of indoor fingerprint-based positioning techniques, a major research topic in the Internet of Things. Most existing methods have limited positioning accuracy, since they do not take full advantage of the information available, i.e., the timing information attached to the Received Signal Strength Indicator (RSSI) vector, and adopt inappropriate training methods. This paper proposes an indoor localization method based on a Convolutional Neural Network (CNN) using time-series RSSI, termed CTSLoc, which takes into account the correlation among RSSI values in time and space. A CNN model is used to extract the temporal fluctuation patterns of RSSI and learn the nonlinear mappings from the signal features in time and space to position coordinates. Finally, the trained model is used to predict the user’s location. An extensive experiment has been carried out in a space of nearly 1000 square meters, and a comprehensive comparison with several existing methods indicates that CTSLoc attains a lower average localization error (i.e., 4.23 m) and more stable performance than those methods. CTSLoc is also relatively less dependent on the amount of data, eliminates spatial ambiguity, and reduces the effect of noise on localization.

Article
The Coronavirus pandemic and the work-from-anywhere trend have created a shift toward cloud-based services. The pandemic is causing an explosion in cloud migration, and it is expected that by 2025, 95% of workloads will live in the cloud. One of the challenges of the cloud is data security. It is the responsibility of cloud service providers to protect user data from unauthorized access. Historically, a third-party auditor (TPA) has been used to provide security services over the cloud. With the tremendous growth in demand for cloud-based services and in regulatory requirements, there is a need for a semi- to fully automated self-sovereign identity (SSI) implementation to reduce cost. It is critical to manage cloud data strategically and extend the required protection. Each stage of the data migration process, such as data discovery, classification, and cataloguing of access to mission-critical data, needs to be secured. Cloud storage services are centralized, which requires users to place trust in a TPA. With SSI, this can become decentralized, reducing the dependency and cost. Our current work involves replacing the TPA with SSI. A cryptographic technique for secure data migration to and from the cloud using SSI is implemented. SSI facilitates peer-to-peer transactions, meaning that the TPA no longer needs to be involved as an intermediary. The C2C migration performance is recorded, and the background or foreground replication scenario is found to be achievable. The mathematically computed encrypted and decrypted ASCII values for a word matched the output of the algorithm. The keys generated by the algorithm are validated with an online validator to ensure their correctness. An RSA-based mutual TLS algorithm is a good option for SSI-based C2C migration. SSI is beneficial because of its low maintenance cost, and users are increasingly using cloud platforms.
The result of the implemented algorithm shows that the SSI based implementation can provide a 13.32 Kbps encryption/decryption rate which is significantly higher than the TPA method of 1 Kbps.
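The kind of check described above, comparing hand-computed encrypted and decrypted ASCII values for a word, can be illustrated with textbook RSA on tiny primes. This is a teaching sketch only: the primes, exponent, and sample word are assumptions, there is no padding, and it is neither secure nor the paper's implementation.

```python
# Textbook RSA with deliberately tiny, insecure parameters (assumed values).
p, q = 61, 53
n = p * q                      # modulus
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (requires Python 3.8+)


def encrypt_word(word):
    """Encrypt each character's ASCII code separately."""
    return [pow(ord(ch), e, n) for ch in word]


def decrypt_word(cipher):
    """Recover the original characters from the encrypted codes."""
    return "".join(chr(pow(c, d, n)) for c in cipher)


word = "CLOUD"
cipher = encrypt_word(word)
print([ord(ch) for ch in word], cipher, decrypt_word(cipher))
```

Each ASCII value can be verified by hand against the printed ciphertext, and decrypting the ciphertext should reproduce the original word.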

Article
This paper introduces and tests a novel machine learning approach to detect Android malware. The proposed approach is composed of a Support Vector Machine (SVM) classifier and the Harris Hawks Optimization (HHO) algorithm. More specifically, the role of the HHO algorithm is to optimize the SVM classifier's hyperparameters, while the SVM classifies malware based on the best-chosen model and produces the optimal solution for weighting the features. The effectiveness of the proposed approach and its ability to increase detection performance are demonstrated by testing on sampled CICMalAnal2017 datasets. We test our method and its robustness on five sampled datasets and achieve the best results in most datasets and measures when compared with other approaches. We also illustrate the ability of the proposed approach to measure the significance of each feature. In addition, we provide a deep analysis of possible relationships between weighted features and the type of malware attack. The results show that the proposed approach outperforms the other metaheuristic algorithms and state-of-the-art classifiers.

Article
Data sharing is required for research collaborations, but effective data transfer performance continues to be difficult to achieve. The NetSage Measurement and Analysis Framework can assist in understanding research data movement. It collects a broad set of monitoring data and builds performance Dashboards to visualize the data. Each Dashboard is specifically designed to address a well-defined analysis need of the stakeholders. This paper describes the design methodology, the resulting architecture, the development approach and lessons learned, and a set of discoveries that NetSage Dashboards made possible.

Article
Recently, IoT technology has been widely used in fields such as smart cities, smart banking, and smart transportation systems. Various sensors can be installed in an open environment, which allows users to collect information easily using the Internet of Things. However, due to the open environment, it is difficult to secure the communicated information. In this paper, a secure authentication technique is presented for connecting different IoT devices in the smart city infrastructure. This technique proves the legitimacy of IoT sensors (IOi) to the reader (Ri) and the Authentication Entity. A set of primitives and a cubic equation are used to construct the technique. In the proposed technique, an authentication entity is introduced between the IoT sensor and the receiver, and it is responsible for performing the authentication process. The proposed authentication technique includes a strong and highly complex mathematical procedure. The computational cost and energy cost have also been analyzed and compared with previous techniques. The technique provides a secure authentication process for the different IoT sensors, which is useful for starting secure communication with the receiver. The proposed technique offers the best computation and energy cost in comparison with the previous techniques.

Article
In this paper, we investigate intelligent ubiquitous computing for future unmanned aerial vehicle (UAV)-enabled mobile edge computing network (MEC) systems, where multiple users process some computational tasks with the help of one computational access point (CAP), under the jamming from a UAV attack. Taking into account that the system may operate in a dynamic varying scenario, we optimize the system performance by using the reinforcement learning and transfer learning algorithms in order to reduce the latency and energy consumption. Specifically, we firstly use the reinforcement learning to devise the offloading strategy that meets the latency and energy consumption constraints as well as to alleviate the effect caused by jamming attack. We then propose to use the transfer learning to speed up the training process and improve the performance of reinforcement learning. Simulation results are provided to reveal that the proposed offloading strategy can outperform the conventional ones, and using transfer learning can achieve a better system performance while reducing the training time significantly.

Article
Supporting transfers of science big data over Wide Area Networks (WANs) with Data Transfer Nodes (DTNs) requires optimizing multiple parameters within the underlying infrastructure. New solutions for such data movement require new paradigms and technologies, such as NVMe over Fabrics, which provides high-performance data movement with direct remote NVMe device access over traditional fabrics. However, recent NVMe over Fabrics studies have been limited to local storage fabrics. To support increasing demands for the large volume of science data movement during Supercomputing (SC) conferences, we proposed a SCinet DTN-as-a-Service framework orchestrating the desired optimization to meet users, applications, and providers’ requirements. Furthermore, we extend the SCinet DTN-as-a-Service framework to incorporate new techniques, solve optimization issues in data-intensive science and evaluate NVMe over Fabrics with multiple WAN testbeds to examine its performance and discover new opportunities for optimization.

Article