Conference Paper

Walmart's Sales Data Analysis - A Big Data Analytics Perspective

Manpreet Singh∗§, Bhawick Ghutla, Reuben Lilo Jnr, Aesaan F S Mohammed and Mahmood A Rashid†‡§
National Training and Productivity Centre, Fiji National University, Samabula, Suva, Fiji
School of Computing, Information and Mathematical Sciences, The University of the South Pacific, Suva, Fiji
Institute for Integrated and Intelligent Systems, Griffith University, QLD, Australia
§Corresponding Authors: manpreet.singh@fnu.ac.fj OR mahmood.rashid@usp.ac.fj
Abstract: Information technology in the 21st century must process and study data at a scale where traditional approaches are no longer effective. Retailers now need a 360-degree view of their consumers, without which they can lose their competitive edge in the market. Retailers have to create effective promotions and offers to meet their sales and marketing goals; otherwise they will forgo the major opportunities that the current market offers. It is often hard for retailers to comprehend market conditions because their stores are spread across various geographical locations. Big Data applications enable these retail organizations to use prior years’ data to better forecast the coming year’s sales. They also give retailers valuable analytical insights, especially in matching customers with the desired products at the desired time in a particular store at different geographical locations. In this paper, we analysed the data sets of the world’s largest retailer, Walmart, to determine the business drivers and predict which departments are affected by different scenarios (such as temperature, fuel price and holidays) and their impact on sales at stores in different locations. We used the Scala and Python APIs of the Spark framework to gain new insights into consumer behaviour and to comprehend Walmart’s marketing efforts and data-driven strategies through visual representation of the analysed data.
Keywords: Big Data Analytics; Hadoop Distributed File Systems; Apache Spark; MapReduce
I. INTRODUCTION
We are all constantly thinking about the future and what is expected to happen in the coming weeks, months and even years, and to do so, a look at the past is mandatory. Businesses need to be able to see their progress and the factors affecting their sales [1]. In this technological era of large-scale data, businesses need to rethink their approaches to better understand customers and gain a competitive edge in the market. Data is worthless if it cannot be analysed, interpreted and applied in context [2]. In this work, we have used Walmart’s sales data to create business value by understanding customer intent (sentiment analysis) and through business analytics. A picture speaks a thousand words, and business analytics helps paint a picture through visualization of data to give retailers insights into their business. With these insights, businesses can make relevant changes to their strategy to maximize profits and success. Most raw data, particularly large-scale datasets, do not offer value in their unprocessed state. By applying the right set of tools [3], we can pull powerful insights from this stockpile of bits.
The main focus here is to read and analyse Walmart’s available datasets to produce insights and an overall overview of the company. Retail stores sell products and profit from them. The store network has many subsidiaries scattered across various geographical locations. Because the network of stores is huge and geographically dispersed, the company may not fully understand customer needs and market potential at each location. In this work, we used the gathered store sales datasets of Walmart to understand the factors affecting sales, for example the unemployment rate, fuel prices, temperature and holidays, in the different stores at different geographical locations so that resources can be managed wisely to maximize returns. These insights can help retailers comprehend the market conditions created by the various factors affecting sales. For example, the Easter holiday would induce a spike in sales, so retailers can better allocate resources (supply of goods and human resources). Customer demand can thus be anticipated based on the above factors.
Moreover, Big Data applications enable retailers to use historical datasets to better observe the supply chain, so that a clear picture can be obtained of whether a particular store is making a profit or running at a loss. When data is properly analysed, patterns, insights and the big picture of the company start to emerge, and suitable actions can then be taken. This helps optimize operations and maximize sales and profit. Additionally, these datasets are used to forecast sales for the coming weeks so that retailers have a fair picture of what the company’s future will be like, which can act as a warning if return on investment is going downhill [4].
This work uses Apache data science platforms, libraries and tools, testing and implementing software development tools and environments that deal with Big Data technology. Tools such as the Hadoop Distributed File System (HDFS) [5], the Hadoop MapReduce framework [6] and Apache Spark, along with the Scala, Java and Python high-level programming environments, are used to analyse and visualize the data.
II. RELATED WORK
In 2015, Harsoor & Patil [4] worked on forecasting the sales of Walmart stores using the Big Data applications Hadoop, MapReduce and Hive so that resources are managed efficiently. Their paper used the same sales dataset that we analysed; however, they forecasted sales for the upcoming 39 weeks. Their strategy was to collect the large sales dataset, transfer it to HDFS [5] and run MapReduce on it, but the enormous data size made it difficult to draw conclusions. Hive processing was therefore used to calculate the average sales for all 45 stores and 99 departments. The R programming language was used for statistical computing. The Holt-Winters method [4] was then trained on the dataset provided by Walmart to predict sales. The predicted sales were subsequently presented graphically using the Tableau interactive data visualization tool.
In 2013, Katal, Wazid, & Goudar [7] performed a thorough study of Big Data handling, covering its issues, challenges, tools and good practices. Technical challenges such as scalability, fault tolerance, data quality and heterogeneous data processing were also discussed. They proposed parallel programming models such as distributed file systems, MapReduce [6] and Spark as good tools for Big Data [7].
In 2015, Riyaz & Surekha [8] used MapReduce on Hadoop to build a data analytics engine for weather and temperature analysis of National Climate Data Centre data. Their paper gives the details and results of the MapReduce program execution. They concluded that MapReduce with Hadoop [6] is well suited to weather data analysis and that temperature, which matters to many industries, can be analysed efficiently [8].
In 2017, Chouksey & Chauhan [9] performed weather forecasting using MapReduce and Spark in order to issue earlier weather warnings so that people and businesses are prepared for undesirable weather conditions. Weather has a great influence on agriculture, sports, tourism and government planning. Various weather parameters such as wind speed, temperature, humidity and pressure were analysed, together with a benchmark comparison of Hadoop MapReduce and Spark. Spark was found to perform better for weather analytics.
In 2013, Zaslavsky, Perera, & Georgakopoulos [10] recommended the use of Hadoop, Apache Spark and NoSQL technology to process data from billions of sensing devices. They explained sensing as a service and Big Data, where the storage as well as the processing of this huge volume of data is becoming a challenge. These sensing devices are connected to computer networks and thus generate enormous amounts of data on a daily basis.
In 2016, Sharma, Chauhan, & Kishore [11] performed a comparative study of Hadoop MapReduce and Spark. The paper included a chart comparing the two tools and their advantages and disadvantages in the context of Big Data analysis. Through this comparative study, they concluded that Spark is much better [12], [13] than MapReduce; however, this also depends on the area of analysis [11].
In 2017, Inoubli, Aridhi, Mezni, & Jung [14] carried out an experimental evaluation and comparative study of healthcare scientific applications that determine health status using interconnected sensors over the human body. The measurements included breath, insulin, cardiovascular, glucose, blood and body temperature data. They recommended Spark because streaming health data, which must be sent and processed iteratively, cannot be handled or supported by the MapReduce model.
In [7], [9]–[14], the authors recommended Apache Spark as the better option because it is faster and processes data intelligently in memory (memory caching), rather than reading it back from disk again and again.
III. BACKGROUND
Retailers plan to ensure success or maximum profit by learning which factors affect their sales and to what extent. Big organizations and retailers around the world, such as Walmart Stores, Inc., on which this work is based, try to maximize profit by providing maximum customer satisfaction in all geographical locations to maintain the standards of their stores.
Walmart sales data is considered for this work because most of the challenges the company faces are universal: all other big retailers face similar problems in maintaining, managing and organizing their retail shop data so that it provides useful insights into the company as an overall retailer, into individual shops or into the departments within the shops. Retailers have to overcome many similar challenges to stay on top of a competitive market [15].
Retailers have to manage resources wisely to maximize profit while minimizing cost. Retailers often fail to gauge market potential at the right time. When there is a sudden spike in sales and retailers are caught off guard, there might not be enough stock or staff to meet customer needs, and potential sales are lost.
With insight into the causes of a spike in sales and the factors affecting it, retailers can allocate resources better, for example by assigning more employees to the store with more customers or transferring more stock to that store.
Store planning can be smarter, providing better human resource management and better supply management [16]. Observing the past gives an idea of sales in stores and their separate departments, from which predictions of future sales can be made. These predictions can be used as a guideline or to mark a trajectory for the future, allowing retailers to make relevant changes to the objectives of the stores for better success in the future.
A. Problem studied
A retailer’s first priority is usually to understand its customers well enough to satisfy their needs, so that these customers return to the store for future needs, increasing product demand and adding to business value. Businesses want this information to plan where and when to invest profitably.
B. Tools and techniques applied
The tools and techniques used for this work include the collection of large Walmart sales datasets stored in CSV format. We used Apache Spark with a build version of Hadoop, leveraging HDFS [5] as the data storage option. Apache Spark is a framework capable of handling both batch and stream processing in the same application at the same time [7], [9]–[14], [17]. Our development tools include IntelliJ IDEA Community Edition [18] and IPython Notebook [19], [20]. IntelliJ IDEA was integrated with Spark instead of using the traditional Spark shell. After configuring our environment, our first task was to load the files as Spark DataFrames. A DataFrame is a distributed collection of data organized into named columns, equivalent to a table in an RDBMS [21]. The Spark DataFrame API was designed to make Big Data processing simple for a wider audience, and it supports distributed data processing in general-purpose programming languages such as Scala, Python and Java. Spark supports reading data from popular sources such as JSON files, Parquet files, Hive tables, HDFS, cloud storage (S3) and external RDBMSs [22]; however, the CSV file format was not natively supported in the Spark version we used. We therefore used a separate library, Spark-CSV, developed by Databricks [23], to load the datasets. Once the files are stored in DataFrames, we query the data using the Spark SQL component. We then apply MapReduce functions to the datasets using Spark SQL. After applying operations such as grouping and sorting to the data, we save the files to HDFS as CSV. We then use IPython Notebook [19], [20] as a PySpark shell to read the processed data for graphing. We use the Pandas library [20] to visualize the datasets.
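The load, group, sort and save steps described above can be sketched in a few lines. This is a minimal single-machine stand-in written with the Python standard library rather than Spark, so it illustrates only the shape of the pipeline and not the distributed execution. The column names (Store, Dept, Date, Weekly_Sales), the sample rows and the helper names are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the layout of a weekly-sales CSV file; the column
# names and values are assumptions, not taken from the paper.
SAMPLE_CSV = """Store,Dept,Date,Weekly_Sales
1,1,2010-02-05,100.0
1,2,2010-02-05,50.0
2,1,2010-02-05,300.0
2,1,2010-02-12,25.5
"""


def total_sales_by_store(csv_text):
    """Group rows by store, sum weekly sales, and sort by total (descending)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[int(row["Store"])] += float(row["Weekly_Sales"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def to_csv(rows):
    """Serialize (store, total) pairs back to CSV text for later graphing."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Store", "Total_Sales"])
    writer.writerows(rows)
    return out.getvalue()


result = total_sales_by_store(SAMPLE_CSV)
print(to_csv(result))
```

In a Spark setting the same steps would be expressed as DataFrame operations and the output written to HDFS; the in-memory strings here only keep the sketch self-contained.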
IV. TECHNOLOGY IMPLEMENTATION
The dataset covers 45 Walmart stores in geographically diverse locations, each store having 99 departments. The three-year dataset contains the weekly sales and the factors affecting sales (temperature, fuel price, unemployment rate and holidays) for each store location. To analyse the dataset and find relationships between sales and the affecting factors, Apache Spark and its various libraries were chosen, as shown in Figure 1.
Figure 1: Apache Spark components [13].
With Apache Spark, all the essentials are bundled in a single system as libraries, and only the libraries that are needed have to be called.
Spark is not the only solution available. Hadoop MapReduce is also a good choice, but it focuses more on batch processing, as it was designed for contexts where size, scope and data completeness matter more than speed. Spark is reported to be up to 100 times faster [7], [9]–[14], [17] than Hadoop MapReduce and is one of the best solutions out of the box. The comparisons are presented in Figure 2.
Spark SQL was used to map and reduce the dataset to a key-value format for comparison. The key is a concatenated value in the format STORE DATE, and the values are the sales, temperature, fuel price, holiday and unemployment rate.
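The key-value layout described above can be illustrated with a small map and reduce over in-memory rows. This is a sketch in plain Python, not the Spark SQL code used in this work. The underscore separator in the key, the field order of the rows, the sample values and the function names are all assumptions.

```python
from functools import reduce

# Hypothetical rows: (store, dept, date, weekly_sales, temperature,
# fuel_price, is_holiday, unemployment). Values are made up for illustration.
ROWS = [
    (1, 1, "2010-02-05", 100.0, 42.3, 2.57, False, 8.1),
    (1, 2, "2010-02-05", 50.0, 42.3, 2.57, False, 8.1),
    (2, 1, "2010-02-05", 300.0, 40.2, 2.57, False, 8.3),
]


def to_pair(row):
    """Map step: emit (key, value) with a concatenated store/date key."""
    store, _dept, date, sales, temp, fuel, holiday, unemp = row
    key = f"{store}_{date}"  # assumed separator between store and date
    value = {
        "sales": sales,
        "temperature": temp,
        "fuel_price": fuel,
        "holiday": holiday,
        "unemployment": unemp,
    }
    return key, value


def merge(acc, pair):
    """Reduce step: sum department sales that share the same store/date key."""
    key, value = pair
    if key in acc:
        acc[key] = {**acc[key], "sales": acc[key]["sales"] + value["sales"]}
    else:
        acc[key] = dict(value)
    return acc


by_store_date = reduce(merge, map(to_pair, ROWS), {})
print(by_store_date["1_2010-02-05"]["sales"])
```

Only the sales figure is summed in the reduce step; the sketch assumes the other factors are identical for every department of a store in a given week, so the first value seen is kept.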
Once the analysed data is in key-value form, it is easier to graph and to see relationships between values by date and store location using the GraphX library provided by Apache Spark through its Python API, as seen in Figures 4, 5 and 6.
The machine learning library is employed with a simple regression model to predict future sales. The regression model finds relations between variables to reveal trends. Predictions can be made more accurate by using the correlation between multiple variables, namely temperature, fuel price, holidays, unemployment rate and store sales (see Figure 3).
Figure 3: Forecasts of the future sales given by the simple
regression model.
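As a rough illustration of what a simple regression forecast involves, the sketch below fits a straight line to weekly totals by ordinary least squares and extrapolates one week ahead. It is plain Python rather than the Spark machine learning library, it uses a single predictor (the week index), and the data points and function names are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Made-up weekly totals that lie exactly on sales = 2 * week + 8.
weeks = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

model = fit_line(weeks, sales)
forecast = predict(model, 6)
print(model, forecast)
```

A multi-variable model over temperature, fuel price, holidays and unemployment rate would follow the same idea with one coefficient per factor.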
V. RESULTS AND DISCUSSION
The following are the results of our paper:
1) Retailers need to plan and evaluate according to market driving factors, which include but are not limited to temperature, unemployment rate, fuel prices, holidays, human resources and geographical location.
2) Effective and efficient supply chain, inventory and human resource management is needed to avoid losing the competitive edge in the market, especially when planning sales at different locations.
3) We analysed the sales dataset of Walmart, the largest retailer, to gain valuable analytical insights, especially in determining customer behaviour at a desired time in a particular store at different geographical locations.
4) The dataset¹ covers 45 Walmart stores located in different regions, each with approximately 99 departments, and includes weekly sales, temperature and other factors.
5) We used Big Data technology, namely MapReduce with Hadoop and Apache Spark, which combines Big Data fundamentals in high-level APIs for Scala, Python and Java, to analyse this very large weekly sales dataset and outline its patterns and meaning.
6) Hadoop MapReduce is good for batch processing [4], [24], but intermediate data between the Mapper and the Reducer is stored on disk, which introduces latency and causes slower performance and result generation.
7) Spark does in-memory computing [7], [9]–[14], [17], whereby intermediate data is kept in memory to avoid the latency seen in MapReduce. Spark is mostly used for stream data processing, graphing, machine learning and iterative computing.
8) Hence, in our experiment Spark had much better performance and job execution for showing patterns and analysing sales in terms of market conditions such as weather, temperature, fuel pricing and holidays.
9) Finally, we used Spark with its Python API, together with Pandas (a Python library used here for graphing), for graphical visualization.
¹ The dataset can be retrieved from the following link [22]: https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data
Figure 2: Comparing Spark’s performance with several widely used specialized systems [5].
The analysis of the 421,571 tuples of Walmart’s data needed to be visualized to provide better insights and understanding, improve decision making and gain an advantage in resource allocation.
In Figures 4, 5 and 6, the data is visualized to show the pattern of weekly sales for all 45 stores across different locations against the years 2010 to 2012, fuel price and temperature, respectively.
According to the data visualization in Figure 4, we observed sales from the beginning of each of the three years. In the first quarter of each year (January–March), sales are low (decreasing) across all Walmart stores at the different locations. However, in the second quarter (April–June), sales rise in 2010, 2011 and 2012. In the third quarter (July–September), sales decline across all 45 stores. Finally, in the last quarter (October–December), we noticed a spike in sales across all 45 stores as the end of the year approaches in 2010, 2011 and 2012.
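The quarterly view discussed above amounts to rolling weekly figures up by calendar quarter. The following is a minimal sketch in plain Python, assuming each record is a (date, weekly sales) pair; the sample values and function names are invented and do not come from the dataset.

```python
from collections import defaultdict
from datetime import date

# Made-up (week date, total weekly sales) records for illustration only.
WEEKLY = [
    (date(2010, 2, 5), 100.0),
    (date(2010, 3, 12), 50.0),
    (date(2010, 5, 7), 200.0),
    (date(2010, 11, 26), 400.0),
    (date(2011, 1, 7), 80.0),
]


def quarter_of(d):
    """Calendar quarter (1-4) of a date: Jan-Mar is 1, Oct-Dec is 4."""
    return (d.month - 1) // 3 + 1


def sales_by_quarter(weekly):
    """Sum weekly sales per (year, quarter), ordered chronologically."""
    totals = defaultdict(float)
    for d, sales in weekly:
        totals[(d.year, quarter_of(d))] += sales
    return dict(sorted(totals.items()))


quarterly = sales_by_quarter(WEEKLY)
for (year, quarter), total in quarterly.items():
    print(year, f"Q{quarter}", total)
```

Plotting the resulting totals per year is then enough to compare the same quarter across 2010, 2011 and 2012.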
The observations from Figure 5 are organized in Table I: more sales occur when the fuel price is in the moderate range of $2.90 to $3.80 per gallon. The observations from Figure 6 are organized in Table II: more sales occur when the temperature is in the moderate range of 21°F to 60°F, which is neither too cold nor too hot.
Figure 4: Quarterly Sales Graph from year 2010–2012.
Figure 5: Fuel price effect on all weekly sales; the information in the figure is summarized in Table I.
Table I: Fuel price effect on all weekly sales.
Fuel ($/Gal)    Total Sales
2.5 – 2.8       Sales ranging from $500000 – $3M
2.9 – 3.8       Sales ranging from $500000 – $4M
3.9 – 4.5       Sales ranging from $500000 – $25M
Figure 6: Temperature effect on total weekly sales; the information in the figure is summarized in Table II.
Table II: Temperature effect on total weekly sales.
Temp (°F)       Total Sales
0 – 20          Sales ranging from $100000 – $2M
21 – 60         Sales ranging from $100000 – $4M
61 – 100        Sales ranging from $500000 – $3M
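Summaries like Tables I and II can be produced by binning each weekly record on the factor of interest and recording the lowest and highest sales seen in each bin. The sketch below does this in plain Python for temperature. The bin edges approximate the ranges of Table II, while the records and function names are invented for the example.

```python
def bin_label(value, edges):
    """Label of the half-open bin [lo, hi) that contains value, or None."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{lo}-{hi}"
    return None


def sales_range_by_bin(records, edges):
    """Map each bin label to the (min, max) weekly sales observed in it."""
    ranges = {}
    for factor, sales in records:
        label = bin_label(factor, edges)
        if label is None:
            continue  # value falls outside every bin
        lo, hi = ranges.get(label, (sales, sales))
        ranges[label] = (min(lo, sales), max(hi, sales))
    return ranges


# Assumed bin edges in degrees Fahrenheit, approximating Table II.
TEMP_EDGES = [0, 21, 61, 101]

# Made-up (temperature, total weekly sales) records for illustration only.
RECORDS = [
    (15.0, 1_200_000.0),
    (35.0, 900_000.0),
    (45.0, 3_100_000.0),
    (75.0, 2_000_000.0),
    (18.0, 400_000.0),
]

temp_summary = sales_range_by_bin(RECORDS, TEMP_EDGES)
print(temp_summary)
```

The same function can be applied to fuel price by passing a different list of edges.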
VI. CONCLUSION
In conclusion, Walmart is the number one retailer in the USA; it also operates in many other countries around the world and is moving into new countries as the years pass. Other companies are constantly rising as well and will give Walmart tough competition in the future if Walmart does not stay on top of its game. To do so, it will need to understand its business trends and customer needs and manage its resources wisely. In an era when technologies are reaching new levels, Big Data is taking over from traditional methods of managing and analysing data. These technologies are routinely used to understand complex datasets quickly, with visual representations. By observing the company’s historical datasets, we obtained a clearer picture of sales in previous years, which will be very helpful to the company itself. Additionally, seasonality, trend, randomness and future forecasts will help in analysing sales drops, which companies can mitigate by using more focused and efficient tactics to minimize the drop, maximize profit and remain competitive.
REFERENCES
[1] M. Franco-Santos and M. Bourne, “The impact of performance targets on behaviour: a close look at sales force contexts,” Research Executive Summaries Series, vol. 5, 2009.
[2] D. Silverman, Interpreting Qualitative Data: Methods for Analyzing Talk, Text and Interaction, 3rd ed. Sage Publications Ltd, 2006.
[3] UBM. (2003) Big Data analytics: Descriptive vs. predictive vs. prescriptive. [Accessed 17 September 2017]. [Online]. Available: www.informationweek.com/about-us/d/d-id/705542
[4] A. S. Harsoor and A. Patil, “Forecast of sales of Walmart store using Big Data application,” International Journal of Research in Engineering and Technology, vol. 4, p. 6, June 2015.
[5] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. Mccauley, M. J. Franklin, S. Shenker, and I. Stoica, “Fast and interactive analytics over Hadoop data with Spark,” Usenix - The Advanced Computing Systems Association, 2012.
[6] J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters. Association for Computing Machinery, 2008.
[7] A. Katal, M. Wazid, and R. H. Goudar, Big Data: Issues, Challenges, Tools and Good Practices, 2013.
[8] P. A. Riyaz and V. Surekha, Leveraging MapReduce With Hadoop for Weather Data Analytics. IOSR Journal of Computer Engineering, 2015.
[9] P. Chouksey and A. S. Chauhan, A Review of Weather Data Analytics using Big Data. International Journal of Advanced Research in Computer and Communication Engineering, 2017.
[10] A. Zaslavsky, C. Perera, and D. Georgakopoulos, “Sensing as a service and Big Data,” in Proceedings of the International Conference on Advances in Cloud Computing (ACC), Bangalore, India, July 2012.
[11] M. Sharma, V. Chauhan, and K. Kishore, “A review: MapReduce and Spark for Big Data analysis,” in 5th International Conference on Recent Innovations in Science, Engineering and Management, June 2016.
[12] H. Pandey, “Is Spark really 100 times faster on stream or its hype?” Sept 2016. [Accessed 2017]. [Online]. Available: www.quora.com/Is-Spark-really-100-times-faster-on-stream-or-its-hype
[13] The Apache Software Foundation, “Lightning-fast cluster computing.” [Accessed Sept 2017]. [Online]. Available: spark.apache.org/
[14] W. Inoubli, S. Aridhi, H. Mezni, and A. Jung, An Experimental Survey on Big Data Frameworks, 2017.
[15] W. C. Kim and R. A. Mauborgne, Blue Ocean Strategy, Expanded Edition: How to Create Uncontested Market Space and Make the Competition Irrelevant, 2015.
[16] D. Läubli, G. Schlögl, and P. Silén, McKinsey & Company. [Accessed Sept 2017]. [Online]. Available: www.mckinsey.com/industries/retail/our-insights/smarter-schedules-better-budgets-how-to-improve-store-operations
[17] J. Ellingwood, “Hadoop, Storm, Samza, Spark, and Flink: Big Data frameworks compared.” [Accessed Sept 2017]. [Online]. Available: www.digitalocean.com/community/tutorials/hadoop-storm-samza-spark-and-flink-big-data-frameworks-compared
[18] JetBrains. (2011) IntelliJ IDEA, the most intelligent Java IDE. [Online]. Available: www.resources.jetbrains.com/storage/products/intellij-idea/docs/Comparisons IntelliJIDEA.pdf
[19] S. Duvvuri and B. Singhal, Spark for Data Science, ser. Analyze your data and delve deep into the world of machine learning with the latest Spark version. Packt Publishing Ltd, 2016.
[20] W. McKinney, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly, 2012.
[21] Apache Spark, “Spark SQL and DataFrame guide.” [Accessed Sept 2017]. [Online]. Available: www.spark.apache.org/docs/1.5.2/sql-programming-guide.html
[22] R. Xin, M. Armbrust, and D. Liu, “Introducing DataFrames in Apache Spark for large scale data science,” February 2015. [Accessed August 2017]. [Online]. Available: https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
[23] S. Venkataraman, Z. Yang, D. Liu, E. Liang, H. Falaki, X. Meng, R. Xin, A. Ghodsi, M. Franklin, I. Stoica, and M. Zaharia. (2016) SparkR: Scaling R programs with Spark.
[24] V. Nivargi, “Big Data: From batch processing to interactive analysis,” 2013. [Accessed Sept 2017]. [Online]. Available: www.clearstorydata.com/2013/01/evolving big data/
... The problem is that Walmart Stores encounter problems gathering and handling relevant insights using suitable data analytics and business intelligence tools for their operations for consumers satisfaction and stock management; hence, they seek a 360-degree holistic view of their consumers to compete and make a profit [9]. The company employed big data tools such as Hadoop MapReduce, Apache Spark, and other appropriate tools for data analysis and visualization to examine historical data and boost business forecasts for subsequent years to gain invaluable insights and comprehend what consumers want at different contact points [9]. ...
... The problem is that Walmart Stores encounter problems gathering and handling relevant insights using suitable data analytics and business intelligence tools for their operations for consumers satisfaction and stock management; hence, they seek a 360-degree holistic view of their consumers to compete and make a profit [9]. The company employed big data tools such as Hadoop MapReduce, Apache Spark, and other appropriate tools for data analysis and visualization to examine historical data and boost business forecasts for subsequent years to gain invaluable insights and comprehend what consumers want at different contact points [9]. The research question was not specified; however, the methodology quantitatively analyzes a three-year dataset from 45 outlets in various locations containing weekly sales data and other variables like temperature, fuel price, unemployment rate, and holidays that impact sales [9]. ...
... The company employed big data tools such as Hadoop MapReduce, Apache Spark, and other appropriate tools for data analysis and visualization to examine historical data and boost business forecasts for subsequent years to gain invaluable insights and comprehend what consumers want at different contact points [9]. The research question was not specified; however, the methodology quantitatively analyzes a three-year dataset from 45 outlets in various locations containing weekly sales data and other variables like temperature, fuel price, unemployment rate, and holidays that impact sales [9]. In the data analysis, Apache Spark and its diverse libraries served as a tool to analyze the dataset and pinpoint correlations between sales and the influencing factors specified. ...
Article
Full-text available
The present study evaluates Walmart’s existing big data analytics with business intelligence techniques, accentuating their strengths and weaknesses, and suggests improvements for implementation and maintenance through the literature review of the scholarly journals addressing similar topics. Big data analytics is receiving loads of attention globally in the business environment within every sector of the economy. Incorporating the job plan as an additional input component in their models would be beneficial for Walmart to improve the precision and appropriateness of their data analysis and decision-making procedures. Walmart is a company that heavily invests in utilizing big data to improve its operations; this includes optimizing in-store experiences and predicting product trends. Scholarly articles emphasize the importance of advanced data analytics tools like MapReduce and Apache Spark for effective big data strategies. Social media content influences engagement and sentiment. Social network data aids sales forecasting but presents challenges. Big data analytics with business intelligence enhances performance and decision-making. Walmart's success in big data analytics relies on a data-driven culture but faces security challenges. Many Fortune 1000 companies adopt innovative solutions to improve performance and customer experiences but require significant resources. Embracing big data analytics with business intelligence remains a compelling investment for sustaining a competitive edge. Walmart's success in big data analytics is due to a data-driven culture and advanced infrastructure, including the Data Café.
... Singh et al.'s research on Walmart's sales data analysis using Big Data Analytics employs Apache Spark, Scala, and Python, highlighting factors like temperature and holidays, and underscoring the significance of big data insights for retail strategies. However, in [11] limitations include a lack of algorithm details, insufficient optimization information, and limited discussion on scalability and data quality issues. This study in [12] examines Big Data's impact on retail, emphasizing data source, tools, financial outcomes, and security. ...
Article
In the realm of modern business, crafting a data-driven growth plan for retail excellence is a crucial endeavour. This project draws inspiration from the retail giant Walmart, utilizing extensive datasets to unravel intricate interrelationships that significantly influence sales. By applying data analysis techniques, including exploratory data analysis, the aim is to provide practical insights into the impact of factors such as temperature, fuel prices, and holidays on weekly retail store sales. The ultimate goal is to contribute valuable insights to the ongoing conversation about revenue enhancement strategies, with Walmart serving as a benchmark. The project emphasizes the compilation and evaluation of large datasets to decipher complex interrelationships, allowing for the formulation of more efficacious revenue-oriented programs and tactics. By focusing on exploratory data analysis, the analysis sheds light on how various elements impact retail store sales, providing practical insights that contribute to the evolving discourse on revenue in the retail sector
... Singh et al. [9] introduced a MapReduce (MR) to analyze Walmart's sales data. To ascertain the business drivers, use the Walmart Shop. ...
Article
Today, a group of supermarkets requires a consistent ridge of their yearly sales. This primarily results from a need for knowledge, resources, and the capability to estimate sales. Conventional statistical methods for supermarket sales are important and often lead to predictive models. In the age of big data and powerful computers, machine learning is the standard for sales forecasting. This comprehensive literature review examines superstore sales prediction (SSP) models that use machine learning (ML) and deep learning (DL) in data mining. Finally, DL is the best SSP for results. DL models market movements well. Automatic feature extraction models and forecasting strategies have been tested with various inputs. DL algorithms process large real-time datasets better. DL research found the best hybrid processing methods for real-time stock market data. DL and ML methods predict the client's response and identify its factors. DL and ML algorithms are evaluated using Rodolfo Saldanha's marketing campaign data. Four metrics (precision, recall, F-measure, and accuracy) are used to compare the ML and DL algorithms. The methods were tested in MATLAB. LSTM, CNN, LR, and RF algorithms were used as well-known ML and DL baselines. The proposed Artificial Convolutional Neural Network (ACNN) is compared to RF, LR, CNN, and LSTM. The proposed superstore sales prediction algorithm outperformed the others, predicting superstore sales with a validation accuracy of 93.90 percent and outperforming current and suitable baselines.
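The four metrics named in the abstract above can be illustrated with a minimal Python sketch. It assumes binary labels with 1 as the positive class; the function name `classification_metrics` and the sample labels are hypothetical and are not taken from the cited paper, which reports using MATLAB.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F-measure (F1) and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical labels: 1 = customer responded to the campaign, 0 = did not.
example = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Reporting all four together is useful because accuracy alone can look high on imbalanced response data even when few actual responders are found.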
... II. LITERATURE SURVEY [1] 'Walmart's Sales Data Analysis -A Big Data Analytics Perspective' in this study, an inspection of the data collected from a retail store and a prediction of future strategies related to store management is executed. The effect of various sequences of events such as the climatic conditions, holidays, etc. can actually modify the state of different departments so it also studies these effects and examines their influence on sales. ...
Article
In today’s highly competitive environment and ever-changing consumer landscape, accurate and timely forecasting of future revenue, also known as sales forecasting, can offer valuable insight to companies engaged in the manufacture, distribution, or retail of goods. Earlier, companies used to produce goods without considering sales volume and demand. For any manufacturer to determine whether to increase or decrease the production of several units, data regarding the demand for products on the market is required.
... For the Walmart Recruiting (i.e., Store Sales data studied in this article), many articles have carried out relevant research on this data, not only limited to the prediction of sales data. Through simple visual data analysis, consumers' consumption behavior can be understood to evaluate sales data [6]. When it comes to the sales forecast in terms of uncertain competition, the M5 forecast uncertainty competition solution is proposed. ...
Article
Sales forecasting is a very important research direction in the business and academic fields, and sales forecasting methods are flourishing, including time series models, machine learning models, and deep neural network models. This paper uses three machine learning models, Decision Tree Regressor, Random Forest Regressor, and K Neighbors Regressor, to predict the Walmart Recruiting - Store Sales data. Using correlation, mean absolute error, and mean square error to evaluate the predictions of these three models, it is found that Random Forest Regressor performs best. The R² value between the sales volume predicted by Random Forest Regressor and the sales volume of the test set is 0.937, the mean absolute error is 1937.810, and the mean square error is 32993323.634. Therefore, Walmart can use Random Forest Regressor when forecasting the weekly sales of its own stores. At the same time, this paper provides a useful model reference (especially Random Forest Regressor) for other industries researching sales forecasts, as well as methods for evaluating different model predictions. Overall, these results shed light on guiding further exploration of sales forecasts for supermarkets.
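For readers unfamiliar with the three scores reported in the abstract above, the following is a minimal Python sketch of how they are commonly computed. It assumes that the reported R² is the coefficient of determination; the function name `regression_metrics` and the sample values are hypothetical and do not reproduce the cited paper's data or code.

```python
def regression_metrics(y_true, y_pred):
    """Return mean absolute error, mean square error and R^2 for two equal-length lists."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread around the mean of y_true.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0

    return {"mae": mae, "mse": mse, "r2": r2}


# Hypothetical weekly sales (actual vs. predicted), for illustration only.
scores = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Because the mean square error squares each residual, it is expressed in squared sales units, which is why it can be orders of magnitude larger than the mean absolute error on the same predictions.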
... After then, when the big data sector was still relatively unknown, Walmart started using big data analysis technology to offer a better user experience. For instance, the Walmart map software HDFS (Hadoop distributed file system) uses Hadoop to track the most recent locations of more than 1000 Walmart stores throughout the world [4]. It can even provide the precise position of an item in a Walmart store. ...
Article
With the advent of the era of big data, social production and lifestyle have undergone tremendous changes. The traditional supply chain management system has high storage costs and poor timeliness. In contrast, applying big data technology to the supply chain management system provides customers with more personalized services, so that the supply chain can achieve lean production and lean management and respond more rapidly to supply and demand. This thesis analyzes the big data technology used by three companies: Walmart, Toyota, and Amazon. The comparison shows that big data technology promotes production efficiency and plays an increasingly important role in enterprise management. Although enterprises experience many unknown difficulties in the early stage of adopting big data technology, such as loss of confidence and data-processing tools that are not efficient enough, a data center built on big data technology lets them better explore the hidden value of various data and provides a stable and efficient platform for enterprise development. These results shed light on guiding further exploration of the implementation of big data analysis in supply chain management.
Chapter
This paper develops a prediction model that forecasts product sales at a particular shop using numerous datasets. The method employed to create a comprehensive model yields findings with the required degree of accuracy. Additionally, this information can be used to take decisions to additionally foster arrangements. The paper addresses the problem of predicting Big Mart sales using five regression approaches: XGBoost regressor, linear regression, decision trees, random forest regressor, and grid search CV. These algorithms were employed to build models, compare the correctness of those models, and prepare them to fit the board’s assumptions so that cautious actions could be taken to accomplish the affiliation’s purpose. These models may be applied in many districts. The paper also addresses how to anticipate how different kinds of items will be arranged and how different conditions will affect those arrangements. The research notes many implementation challenges in practical models. Various models are created, and after comparing their R2-scores, the best is that of random forest with grid search, which equals 0.5588858290914282. The R2-score is somewhat low, which is attributed to the variability and lack of data, but it is considered good enough to predict the output.
Conference Paper
In this paper we discuss the various challenges of Big Data and the problems that arise from the continuous explosion of data from social media and other online sources, as organizations seek deeper analysis of their data. The paper compares Hadoop MapReduce and the more recently introduced Apache Spark, both of which provide a processing model for analyzing big data. Although both options are built for Big Data, their performance varies significantly with the use case being implemented. Data is growing at very high speed and in very large volume. Advances in storage technology and data collection now make it possible for any organization to assemble large datasets at lower cost.
Conference Paper
Internet of Things (IoT) will comprise billions of devices that can sense, communicate, compute and potentially actuate. Data streams coming from these devices will challenge the traditional approaches to data management and contribute to the emerging paradigm of big data. This paper discusses emerging Internet of Things (IoT) architecture, large scale sensor network applications, federating sensor networks, sensor data and related context capturing techniques, challenges in cloud-based management, storing, archiving and processing of sensor data.
Conference Paper
MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
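The map and reduce functions described above can be sketched on a single machine in Python. This is only an in-memory illustration of the programming model, not Google's or Hadoop's implementation: there is no parallelism, fault handling, or scheduling, and the helper names (`map_phase`, `shuffle`, `reduce_phase`, `sales_mapper`, `sum_reducer`) and the sample records are hypothetical.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply the user-supplied map function to every record, yielding (key, value) pairs."""
    for record in records:
        yield from map_fn(record)


def shuffle(pairs):
    """Group intermediate values by key, as the runtime does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Apply the user-supplied reduce function to each key and its list of values."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def sales_mapper(record):
    """Hypothetical mapper: emit (department, weekly_sales) for one sales record."""
    _store, department, weekly_sales = record
    yield department, weekly_sales


def sum_reducer(_key, values):
    """Hypothetical reducer: total sales for one department."""
    return sum(values)


# Hypothetical (store, department, weekly_sales) records, for illustration only.
records = [(1, "grocery", 100.0), (1, "toys", 40.0), (2, "grocery", 60.0)]
totals = reduce_phase(shuffle(map_phase(records, sales_mapper)), sum_reducer)
```

The user writes only the mapper and the reducer; in a real MapReduce runtime the grouping step and the distribution of work across machines are handled by the framework.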
W. Inoubli, S. Aridhi, H. Mezni, and A. Jung, "An Experimental Survey on Big Data Frameworks," 2017.
W. C. Kim and R. A. Mauborgne, "Blue ocean strategy, expanded edition: How to create uncontested market space and make the competition irrelevant," 2015.