Annual Size of the Global Datasphere (7).

Source publication
Conference Paper
The rapid pace of technological progress has led to increased growth in the volume of digital data on servers and circulating on the web, which contributed to the birth of the concept of Big Data. This concept, in addition to the huge amount of information, indicates the heterogeneity and complexity of the data. Therefore, analyzing this data, espe...

Context in source publication

Context 1
... we are witnessing remarkable technological development in digitization through the Internet of Things (4), cloud computing technology (5), and others. This is an important factor in facilitating online transactions, which has contributed to the continual expansion of data on the World Wide Web, spread across large web pages. According to the website (6), its volume is estimated at 80 ZB in 2022, and according to IDC (7), this big data is expected to grow to 175 ZB by 2025, as shown in Figures 1 and 2. ...

Citations

... The author of [17] correctly forecast the election's winner by analyzing Twitter user reactions. An innovative method for predicting the number of comments posted on political blogs was identified in [18] [19]. The Internet is one of the most important venues for evaluating the sentiment of the general populace in today's society. ...
Article
The main purpose of Sentiment Analysis (SA) is to derive useful insights from large amounts of unstructured data compiled from various sources. This analysis helps to interpret and classify textual data using different techniques applied in machine learning (ML) models. In this paper, we compared simple and ensemble ML methods as classifiers for SA: Random Forest (RF), K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), Gradient Boosting (GB), Support Vector Machine (SVM), AdaBoost, Extreme Gradient Boosting (XGBoost), Decision Tree (DT), Light GBM, Stochastic Gradient Descent (SGD), and Bagging. For this, we considered a test database of 50,000 movie reviews, of which 25,000 were rated positive and 25,000 negative. We selected 20,000 words that influence the sentiment of the documents. This work aims to propose a new rating prediction approach based on textual customer reviews. We use term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features in large-scale, serial trials to compare the results obtained by the various classifiers. For the decision phase, we applied the Fuzzy Decision by Opinion Score Method (FDOSM), one of the most recent methods for multi-criteria decision-making (MCDM). To evaluate and quantify the performance of the different ML methods, we applied six standard measures: precision, accuracy, recall, F-score, AUC, and Kappa-measure. The results of our experimental work indicated that the SVM classifier is the best, with a precision rate of 88.333%, followed by the FDOSM method, with 0.800 for the same measurement.
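As a rough illustration of the TF and TF-IDF features named in this abstract, the following minimal sketch computes both for a toy corpus. It is not the authors' pipeline: the whitespace tokenizer, the unsmoothed log(N/df) weighting, the function name tf_idf, and the two sample reviews are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace; real pipelines tokenize more carefully.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # TF is the relative frequency in the document;
            # IDF is log(N / df) with no smoothing (an assumption).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample reviews, for illustration only.
reviews = ["great movie great cast", "boring movie weak plot"]
for doc_weights in tf_idf(reviews):
    print(doc_weights)
```

A term that occurs in every document, such as "movie" here, gets weight zero under this weighting, while a term repeated within a single review is weighted up. Vectors of this kind are what a classifier such as SVM would take as input.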
... The task of understanding the implications of big data [10] is difficult for humans. In this context, many research studies have used machine learning and big data analysis frameworks [11,12,13] for the process of classifying data into different poles [14]. ...
Article
Nowadays, cloud computing technology plays an important role in storing structured and unstructured data. This has contributed to very significant growth of data on web servers, known as Big Data. This technology is adopted by many applications, the most important of which are social networking applications, email, and others, which represent an important source of data in communication between Internet users. The messages exchanged express views and opinions on various topics in all areas, related to companies and to public and private institutions. To analyze such data, several methods have been proposed. Recently, big data frameworks have been used as data analysis tools due to their high performance in storing high-volume data and then analyzing it to extract predictions for various domains. This article introduces a new approach to reducing latency in big data classification based on Hadoop through the MapReduce model, where we designed an additional part of the MapReduce model that performs data classification using features. Our experiments showed that our approach gave the best performance, with the fastest response time compared to machine learning and deep learning techniques. After the classification process, we also recorded a 35.14% decrease in data, which can help researchers improve the accuracy of data processing and of classification at later stages.
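To make the MapReduce model mentioned in this abstract concrete, here is a toy in-memory sketch of the map, shuffle, and reduce phases with a feature-based labeling step inside the mapper. It does not use Hadoop and does not reproduce the article's design: the keyword sets, the three labels, and the names map_phase, shuffle, reduce_phase, and run_job are all hypothetical.

```python
from collections import defaultdict

# Hypothetical feature words; the article's actual features are not given here.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def map_phase(record):
    """Classify one record by keyword features and emit a (label, 1) pair."""
    words = set(record.lower().split())
    if words & POSITIVE:
        label = "positive"
    elif words & NEGATIVE:
        label = "negative"
    else:
        label = "neutral"
    yield label, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one label."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially in a single process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Made-up input records, for illustration only.
print(run_job(["great service", "terrible delay", "good price", "no comment"]))
```

In a real Hadoop job the mappers and reducers run in parallel across nodes and the framework handles the shuffle; the single-process version only shows how the data flows between the phases.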