International Journal of Computer Trends and Technology Volume 71 Issue 9, 1-6, September 2023
ISSN: 2231-2803 / https://doi.org/10.14445/22312803/IJCTT-V71I9P101 © 2023 Seventh Sense Research Group®
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Original Article
Evolution of Enterprise Data Warehouse: Past Trends and Future Prospects
Sivakumar Ponnusamy
Senior Data Engineer, Cognizant Technology Solutions, Richmond, VA, USA.
Received: 03 July 2023 Revised: 18 August 2023 Accepted: 01 September 2023 Published: 15 September 2023
Abstract - Data warehousing has evolved over the past few decades, driven primarily by the exponential growth of data that traditional systems are unable to handle, and secondarily by technological advancements that make real-time data feasible and by cloud technology that provides virtually unlimited storage and scalability. The journey began with MIS (Management Information Systems), when data integration from various IT systems first became possible. In later stages, data repositories came into demand, and warehousing was modernized with the assistance of data mart mechanisms. The emergence of new tools and software has also given rise to modern cloud-based SaaS data processing systems. Data lakes and data lakehouses have transformed these systems, providing greater autonomy and enabling the processing of larger volumes of data to generate insights for decision-making. The future of the data warehouse will be based on AI and machine learning, which will help with infrastructure scalability, cost savings, and agility while increasing the reliability and usability of the data.
Keywords - Data warehouse, ETL, Data lakehouse, Big data, Machine learning.
1. Introduction
Data warehousing has been subject to alteration and innovation as the world of data has changed in the volume and variety of the information collected. Technological advancement is the key driver, urging users and developers to work on new data models. A common example is the widespread use of cloud computing, which has transformed the entire IT infrastructure. In the same way, data warehousing techniques have changed along with the methodologies used to manage them. According to the research presented by Nambiar and Mundra (2022), advancements in cloud computing, IoT, and data analytics have driven changes in data warehousing over time [1]. The needs expressed by businesses can also be addressed with the help of smart tools and techniques. Businesses require data warehousing solutions that can scale to meet their performance needs: as data volumes increase, systems must handle larger query workloads and deliver results promptly. According to the findings of D. Subotić (2015), multiple factors have led to the evolution of data warehousing methodologies and techniques; chief among them are the volume and variety of the data to be stored. Data warehousing systems must evolve to accommodate larger and more diversified datasets, including structured, semi-structured, and unstructured data, as the amount of data generated by businesses and individuals continues to expand dramatically [2].
Dhaouadi et al. (2022) note that the need for real-time insights has prompted data warehousing to evolve to support real-time data processing and analytics. This requires technologies that process and analyze data as it is generated, enabling businesses to make faster decisions. Alongside these other contributing factors, the adoption of cloud computing has changed data warehousing by offering scalable, affordable, and adaptable solutions. Cloud-based data warehouses eliminate the need for up-front infrastructure investments by enabling organizations to scale their resources up or down in response to demand. Data warehouses can now handle and analyze data more effectively thanks to advancements in hardware, including faster processors, greater memory capacity, and high-speed storage devices [3].
2. Literature Review
2.1. MIS
MIS was a major early intervention on the road to data warehousing and IT integration. Businesses relied on MIS to generate basic reports and gain insights into their data. These systems were typically file-based and lacked the capabilities to handle large volumes of data. Mishra et al. (2015) described in their research how MIS laid the groundwork for structured data management practices. It introduced the idea of arranging data into clearly defined fields and tables, consistent with data warehousing's structured nature, which made it simpler to gather, store, and retrieve data from numerous sources. An MIS pulled data from various departments within an organization to generate reports and provide insights. This need for data integration and consolidation paved the way for the data warehousing concept, where data from different sources is brought together into a central repository for analysis [4].
Varajão et al. (2022) explain the rise of MIS and how it, in turn, fostered the rise of data warehousing. They state that MIS systems emphasized the importance of historical data for decision-making. Data warehouses built upon this principle by storing historical data over time, enabling trend analysis, historical comparisons, and predictive modeling. MIS systems were primarily focused on generating predefined reports and basic analytics. Data warehousing expanded on this concept by providing more advanced reporting and analytics capabilities, allowing users to perform complex queries, drill down into data, and create custom reports. More sophisticated Business Intelligence (BI) tools were developed as data warehousing evolved; these tools allow users to create interactive dashboards and visualizations and to perform ad-hoc queries for deeper insights. MIS usage also highlighted the need for accurate and consistent data for effective decision-making. Data warehousing solutions adopted data quality practices to ensure data accuracy, integrity, and consistency, improving overall data reliability [5].
2.2. Data Warehouses
After MIS, computing technologies continued to modernize in ways that greatly benefited data warehousing processes. Data warehousing emerged as a new concept in the late 1980s and early 1990s. A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data used to support management decision-making processes. Data warehouses are centralized, integrated repositories that store structured data from various sources. They are designed to support analytical processing and provide a historical view of the data. Technologies like Online Analytical Processing (OLAP) and Extract, Transform, Load (ETL) became essential components of data warehousing solutions. OLAP allows data to be analyzed from different viewpoints.
OLAP provides the benefits of faster decision-making and an integrated view of data. OLAP systems come in three main types: MOLAP (Multidimensional OLAP), ROLAP (Relational OLAP), and HOLAP (Hybrid OLAP). Data modeling is the representation of data in a data warehouse or OLAP cube, which stores multidimensional data in a star or snowflake schema. A star schema consists of a fact table and dimension tables. The fact table contains numerical facts related to business processes and references the dimension tables via foreign keys. A snowflake schema is an extension of the star schema in which some dimension tables lead to one or more secondary dimension tables [6]. Business users perform basic analytical operations on OLAP cubes, such as slice, dice, roll-up, drill-down, and pivot, as sketched below.
2.3. Data Marts
Data marts are subsets of data warehouses that focus on
specific business departments or user groups. They are
designed to provide faster access to relevant data for
particular use cases, making it easier for end-users to obtain
the information they need without querying the entire data
warehouse. As David Loshin stated in his book on business intelligence, data marts and data warehouses differ mainly because they serve different purposes.
Data warehouses are generally used for exploratory
analysis, while data marts are for formalized reporting and
specific drill-down investigations. As data marts focus on the
goals and needs of a specific department, they contain
smaller amounts of data, but that data is highly relevant to
the department's operation. Different departments might
require different data mart structures due to their unique
analytical or reporting needs [9,10].
According to the research presented by Edward M. Leonard (2011), data marts build on traditional data warehouse techniques. Data marts are specialized structures developed from a data warehouse and designed to organize data for specific business purposes. This customization makes data marts a vital tool for addressing the unique data needs of different departments or business units.
There are three types of data marts, distinguished by their relationship to the data warehouse and to the source systems that feed them. Dependent data marts are subsets loaded from the enterprise data warehouse (the top-down approach, or Bill Inmon model). Independent data marts are standalone marts that serve specific business domains and are combined to form the enterprise data warehouse (the bottom-up approach, or Ralph Kimball model). Hybrid data marts combine data from existing data warehouses and other operational data stores (ODS), as shown in the sketch below.
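As an illustration of the dependent (top-down) variety, a data mart can be exposed as a simple view over warehouse tables. This hedged sketch assumes the hypothetical fact_sales and dim_product tables from the star-schema example above already exist in a warehouse.db file.

```python
import sqlite3

# Dependent data mart sketched as a view over the enterprise warehouse
# (top-down / Inmon style): a department-specific slice that stays in
# sync with the source tables. Assumes fact_sales and dim_product exist.
conn = sqlite3.connect("warehouse.db")  # hypothetical warehouse database
conn.execute("""
CREATE VIEW IF NOT EXISTS mart_electronics AS
SELECT f.sale_date, d.product_name, f.quantity, f.revenue
FROM fact_sales f
JOIN dim_product d ON f.product_id = d.product_id
WHERE d.category = 'Electronics'
""")
conn.commit()
```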
Moreover, the availability of numerous reporting tools
makes data warehouses user-friendly. These tools empower
individuals to extract data by themselves instead of waiting
for others to distribute it, enhancing the efficiency of data
analysis and decision-making processes. The author further
emphasized the critical role of data warehouses and data
marts in enabling Business Intelligence by facilitating data
access, organization, and analysis. The ability to create data
marts from a data warehouse and to use diverse reporting/BI
tools makes data warehouses invaluable for the Business and
management community to make strategic business decisions
[11].
2.4. Big Data
As time passed, data volumes exploded, and traditional data warehouses faced challenges in handling the sheer scale and diversity of data. Santoso and Yulia (2017) describe how combining big data technology with data warehouses can aid the decision-making process for university management by turning raw data into actionable insights. Big data is defined by the three Vs: volume, velocity, and variety. Data generated by social media, IoT sensors, and weblogs are a few examples, and such data can be structured, semi-structured, or unstructured. Valuable insights, such as customer sentiment and market trends, can be derived from big data but are impossible to obtain without implementing a big data solution. Apache Hadoop is used mainly for storage, while MapReduce, Spark, and other technologies are used for processing. Apache Hive is a distributed, fault-tolerant data warehouse system that enables big data analytics. The paper concludes by pointing out the need for future developments and the implementation of institutional projects involving big data [7,8].
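As a hedged illustration of this processing layer, the following PySpark sketch derives a simple insight from semi-structured weblogs; the input path and the field names (timestamp, page) are hypothetical.

```python
# Minimal PySpark sketch: aggregate semi-structured JSON weblogs into a
# warehouse-style daily summary. Requires a Spark installation (pyspark).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("weblog-insights").getOrCreate()

# Semi-structured input: one JSON record per line.
logs = spark.read.json("hdfs:///data/raw/weblogs/")  # hypothetical path

# Derive a simple insight: page views per day, ordered by volume.
daily_views = (
    logs.withColumn("day", F.to_date("timestamp"))
        .groupBy("day", "page")
        .agg(F.count("*").alias("views"))
        .orderBy(F.desc("views"))
)
daily_views.show(10)
```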
2.5. NoSQL Database
The range of data types and fields to be managed grew, requiring major changes in the data management domain. NoSQL databases became more popular as big data increased and more adaptable data models were required. These databases, like MongoDB, Cassandra, AWS DynamoDB, Couchbase, and HBase, provide horizontal scaling and schema flexibility, making them well suited to some large data applications. The research offered by Sokolova et al. (2019) presents a redesign of a database management system for a retail business. Originally based on a traditional data model, the database system is migrated to a hybrid model that combines SQL and NoSQL databases. Adding the NoSQL database enhances the system's flexibility, scalability, and efficiency. NoSQL databases store data in formats such as JSON documents rather than in the traditional table structure of a relational database. NoSQL databases are also distributed, meaning information is copied and stored on various servers, remote or local, which provides data availability and reliability. The main types of NoSQL databases are key-value, document, graph, in-memory, wide-column, and search. NoSQL databases excel at horizontal scaling with high performance and availability. The paper also discusses the architecture of the redesigned system and its functionality, emphasizing the benefits of the hybrid approach combining SQL and NoSQL databases [12].
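As a hedged illustration of the document model and its schema flexibility, the following sketch uses pymongo against a local MongoDB; the connection string, collection, and field names are hypothetical.

```python
# Minimal document-store sketch with MongoDB via pymongo.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical server
orders = client["retail"]["orders"]

# Schema flexibility: each document is self-describing JSON, so new
# records can carry different fields without a table migration.
orders.insert_one({
    "order_id": "A-1001",
    "customer": {"name": "J. Doe", "segment": "online"},
    "items": [{"sku": "LAPTOP-15", "qty": 1, "price": 1200.0}],
    "channel": "web",
})

# Query on a nested field, which relational rows would need joins for.
for doc in orders.find({"customer.segment": "online"}):
    print(doc["order_id"])
```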
2.6. Modern Cloud Data System
Cloud computing revolutionized data warehousing by
offering scalable, flexible, and cost-effective solutions.
Cloud-based data warehouses like Snowflake, Amazon
Redshift and Azure Synapse Analytics became popular
choices for organizations seeking to offload infrastructure
management and scale their data processing based on
demand. Bhatti and Rad (2017) describe the significant shift in the information technology industry from traditional relational databases to cloud databases over the last 40 years. The cloud is particularly suitable for data-intensive applications, such as storing and mining large datasets and commercial data [13].
Applications supported by cloud databases are diverse and adaptable, with many transactional data management applications, such as banking, online reservation, e-commerce, and inventory management, being developed. However, while these databases support key properties like Atomicity, Consistency, Isolation, and Durability (ACID), their use in the cloud is not straightforward. The paper's objective was to investigate the pros and cons of databases commonly used in cloud systems and to examine the challenges associated with developing cloud databases. Data security is the main issue when moving to a cloud-based data warehouse: NPI (non-public information) data cannot be stored in plain text, so added complexity, such as tokenization, is required before storing confidential data in the cloud. Other challenges noted include security risks, the reliability of the cloud system for operations, and higher costs [14,15].
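As a hedged illustration of the tokenization step mentioned above, the sketch below swaps a sensitive field for a random token before the row leaves for the cloud; the vault, field names, and values are hypothetical placeholders, not a production design.

```python
# Minimal tokenization sketch: replace NPI values with opaque tokens so
# only tokens, never plain sensitive data, are loaded into the cloud.
import secrets

token_vault = {}  # in practice, a secured on-premises mapping store


def tokenize(value: str) -> str:
    """Replace a sensitive value with a random token and remember the mapping."""
    token = "tok_" + secrets.token_hex(8)
    token_vault[token] = value
    return token


row = {"customer_id": 42, "ssn": "123-45-6789", "balance": 1520.75}
safe_row = {**row, "ssn": tokenize(row["ssn"])}

print(safe_row)  # the token, not the plain SSN, is what gets loaded
```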
2.7. Data Lake and Data Lakehouse
Data lakes emerged as an alternative to traditional data warehousing. A data lake is a centralized repository that can store both structured and unstructured data in its raw form. The data lakehouse combines data lakes with elements of data warehousing, aiming to bridge the gap between data engineering and data analytics by providing features like ACID transactions, data indexing, and support for SQL queries directly on raw data. This modern data architecture, a hybrid of the data warehouse and the data lake, seeks to break down silos between data engineers and data scientists and foster a collaborative environment, ultimately enabling more effective data analysis [16].
The data lakehouse is designed to handle both structured and unstructured data, providing a unified platform for data engineers and data scientists to work together. Previously, these two roles tended to operate in separate domains, with data engineers mostly working with structured data in data warehouses and data scientists preferring data lakes for their versatility in handling both structured and unstructured data. The data lakehouse merges these domains, eliminating duplication of effort and speeding up the process of finding value in the data [17].
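As a hedged illustration of the "SQL directly on raw data" idea, the following PySpark sketch registers raw Parquet files from a lake as a queryable table. The path is hypothetical; ACID table formats such as Delta Lake or Apache Iceberg would layer transactions on top of this pattern.

```python
# Minimal lakehouse-style sketch: run SQL over raw files in the lake
# instead of first loading them into a separate warehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()

# Register raw Parquet files in the data lake as a queryable table.
events = spark.read.parquet("s3a://company-lake/raw/events/")  # hypothetical
events.createOrReplaceTempView("events")

# Analysts and data scientists query the same single copy of the data.
spark.sql("""
    SELECT event_type, COUNT(*) AS n
    FROM events
    GROUP BY event_type
    ORDER BY n DESC
""").show()
```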
Such a mechanism also brings improvements in data
management. It is capable of handling diverse types of data,
including structured, semi-structured, and unstructured data,
all in a cost-effective manner. This ability, combined with
data diversity, reduces the risk of data loss and enhances data
recovery and availability. The lakehouse paradigm fosters an
environment conducive to not only descriptive and predictive
reporting but also prescriptive reporting, which provides
advice on potential outcomes and next steps. Faster access to
shared, secure, and connected data enables businesses to
align with modern analytics and gain insights more quickly.
Moreover, it supports the need for faster development and productization, essential for businesses seeking to extract value from their data. With data scientists spending
less time on data preparation, they can focus more on
modeling data and deriving insights from it. This agility and
speed are key for organizations wishing to mature their
business reporting and analytics practices. The lakehouse
provides a conducive environment for machine learning and
AI operations. With data's increasing volume and diversity,
organizations leverage machine learning and AI to analyze
and interpret data effectively. The lakehouse offers a "data
playground" for data scientists, allowing them to build
advanced analytics models using large quantities of
structured and unstructured data [18].
2.8. Real-time Data Processing
The need for real-time data processing technologies has grown with the rise of online businesses and booming enterprise usage. With the demand for real-time analytics, modern data systems have focused on providing real-time data processing capabilities. Technologies like Apache Kafka, Apache Flink, and Apache Spark Streaming allow for real-time data ingestion, processing, and analytics [19].
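As a hedged illustration of this style of pipeline, the sketch below uses Spark Structured Streaming to ingest a Kafka topic and maintain per-minute counts; the broker address and topic name are hypothetical, and the Spark-Kafka connector package must be on the classpath.

```python
# Minimal real-time ingestion sketch: Spark Structured Streaming over Kafka.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("realtime-sketch").getOrCreate()

# Continuously ingest events as they are produced to the topic.
stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
         .option("subscribe", "clickstream")                # hypothetical
         .load()
)

# Count events per minute as they arrive, instead of in a nightly batch;
# the Kafka source supplies a built-in event "timestamp" column.
counts = stream.groupBy(F.window(F.col("timestamp"), "1 minute")).count()

# Print rolling results to the console; runs until interrupted.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```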
2.9. Data Governance
As data becomes more critical, the need for proper data governance and security measures has grown. Organizations now focus on implementing robust data governance frameworks to ensure data privacy, compliance, and data quality. The introduction of data governance concepts has strengthened data warehousing techniques, minimizing data-related risks and clarifying how governance should be applied and what utility can be gained from it. Research presented by Abraham et al. (2019) states that data governance refers to managing data to enhance its value and minimize data-related costs and risks. In this context, a data warehouse can be an essential tool, since it offers a structured repository for storing and analyzing data, which is crucial for implementing effective data governance strategies. This enables organizations to maintain data quality, consistency, security, and privacy, which are significant components of data governance [20].
2.10. AI and Machine Learning Integration
The development and deployment of modern tools and technologies are part of the daily agenda of large firms. With plenty of data available, not only commercial enterprises but also non-profit organizations have come to depend upon it. Nowadays, the decision-making process depends heavily on the available data, which may be qualitative or quantitative. In recent years, there has been a push to integrate AI and machine learning capabilities into data warehousing solutions. This integration enables businesses to leverage advanced analytics, predictive modeling, and machine learning algorithms to gain deeper insights from their data. A research paper by Madonsela et al. (2015) examines the role of knowledge engineering in enhancing organizational capabilities and adapting to unpredictable market environments. It stresses the importance of transforming collected data into real-time information to support successful decision-making and deliver timely results [21]. The authors emphasize the necessity of integrating artificial intelligence into data warehousing and data mining: AI can help analyze and interpret vast amounts of data, which is otherwise a complex and challenging process. The research aims to explore suitable techniques, technologies, and trends to facilitate this integration, providing an insightful overview of data warehousing and data mining, and to highlight the techniques and limitations of analyzing and interpreting large amounts of data. In this context, AI and machine learning can be crucial tools for making sense of vast, complex datasets and extracting meaningful insights [22].
3. Future Prospects
The future of data warehousing relies on implementing advanced analytics. Data warehouses will leverage machine learning and AI algorithms to provide more advanced analytics and predictive insights. ML models can be trained on historical data to make predictions and recommendations, enabling organizations to anticipate trends and make informed decisions. Machine learning algorithms can also be used to automate data preparation, including data cleaning, transformation, and integration, reducing the manual effort required to structure and format data for analysis.
The spread of and ease of access to AI have made many tasks easier, and data warehousing is no exception: AI-powered query optimization is expected to become more sophisticated, automatically selecting the most efficient query execution plans based on data distribution, workload patterns, and system performance. AI and machine learning applications in data warehousing will help optimize resource allocation and usage, thereby reducing operational costs. Data warehouses will also integrate NLP capabilities so that users can query and interact with data using natural language. This will make data analysis accessible to a wider range of users, including business users without much technical expertise, thus reducing the learning curve for querying data [23].
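To make the first of these ideas concrete, here is a minimal, hedged sketch of training an ML model on historical warehouse data to forecast a measure; the monthly figures are invented for illustration and stand in for an export from the warehouse.

```python
# Minimal forecasting sketch: fit a regression on historical monthly
# revenue from the warehouse and predict the next quarter.
import numpy as np
from sklearn.linear_model import LinearRegression

# Twelve months of historical revenue (illustrative values, in $M).
months = np.arange(1, 13).reshape(-1, 1)
revenue = np.array([10.2, 10.8, 11.1, 11.9, 12.4, 12.9,
                    13.1, 13.8, 14.2, 14.9, 15.3, 15.9])

model = LinearRegression().fit(months, revenue)

# Predict the next quarter so planners can act on the trend.
future = np.array([[13], [14], [15]])
print(model.predict(future))
```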
4. Conclusion
The evolution of data warehousing has been a dynamic
journey driven by the increasing volume, velocity, and
variety of data. Technological advancements, including cloud
computing, big data solutions, and AI integration, have
played pivotal roles in shaping the modern data warehousing
landscape. This evolution has empowered businesses to
make more informed decisions, gain competitive advantages,
and enhance their overall operations. Research has
highlighted the extensive factors influencing data
warehousing changes over time, reflecting the growing needs
of businesses that are now effectively addressed through
smart tools and techniques. The scalability imperative is ever
more relevant, with data volumes continually on the rise. In
response, data warehousing solutions have adapted to handle
larger query workloads efficiently, incorporating AI and machine learning capabilities along the way. This has
marked a transformative phase, enabling deeper insights and
automation of various processes. The future of data
warehousing holds promise as machine learning and AI
algorithms drive advanced analytics, automated data
preparation, intelligent query optimization, and even more
accessible user interactions through natural language
processing. As technology continues to advance, data
warehousing's evolution is poised to remain aligned with
business needs, enabling organizations to harness data's full
potential for strategic decision-making and innovation.
References
[1] Athira Nambiar, and Divyansh Mundra, “An Overview of Data Warehouse and Data Lake in Modern Enterprise Data Management,”
Big Data and Cognitive Computing, vol. 6, no. 4, 2022. [CrossRef] [Google Scholar] [Publisher Link]
[2] Danijela Subotić, “Data Warehouse Schema Evolution Perspectives,” New Trends in Database and Information Systems II, pp. 333-338,
2015. [CrossRef] [Google Scholar] [Publisher Link]
[3] Asma Dhaouadi et al., “Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and
Comparisons,” Data, vol. 7, no. 8, 2022. [CrossRef] [Google Scholar] [Publisher Link]
[4] Lahar Mishra, Ratna Kendhe, and Janhavi Bhalerao, “Review on Management Information Systems (MIS) and its Role in Decision
Making,” International Journal of Scientific and Research Publications, vol. 10, no. 5, pp. 1-5, 2015. [Google Scholar] [Publisher Link]
[5] João Varajão, João Carlos Lourenço, and João Gomes, “Models and Methods for Information Systems Project Success Evaluation - A Review and Directions for Research,” Heliyon, vol. 8, no. 12, 2022. [CrossRef] [Google Scholar] [Publisher Link]
[6] Karwan Jameel, Abdulmajeed Adil, and Maiwan Bahjat, “Analyses the Performance of Data Warehouse Architecture Types,” Journal of Soft Computing and Data Mining, vol. 3, no. 1, pp. 45-57, 2022. [Google Scholar] [Publisher Link]
[7] Leo Willyanto Santoso, and Yulia, “Data Warehouse with Big Data Technology for Higher Education,” Procedia Computer Science,
vol. 124, pp. 93-99, 2017. [CrossRef] [Google Scholar] [Publisher Link]
[8] V. Rathika, and L. Arockiam, “General Aspect of (Big) Data Migration Methodologies,” SSRG International Journal of Computer
Science and Engineering, vol. 1, no. 9, pp. 1-5, 2014. [CrossRef] [Google Scholar] [Publisher Link]
[9] David Loshin, Business Intelligence the Savvy Manager's Guide, 2nd ed., Elsevier, 2012. [Google Scholar] [Publisher Link]
[10] Muhammad Khalid, “Challenges of Dimensional Modeling in Business Intelligence Systems,” International Journal of Computer &
Organization Trends, vol. 5, no. 3, pp. 30-31, 2015. [CrossRef] [Publisher Link]
[11] Edward M. Leonard, “Design and Implementation of an Enterprise Data Warehouse,” Thesis, Marquette University, 2011. [Google Scholar] [Publisher Link]
[12] Marina V. Sokolova, Francisco J. Gómez, and Larisa N. Borisoglebskaya, “Migration from an SQL to a Hybrid SQL/NoSQL Data
Model,” Journal of Management Analytics, vol. 7, pp. 1-11, 2020. [CrossRef] [Google Scholar] [Publisher Link]
[13] Junaid Hassan et al., “The Rise of Cloud Computing: Data Protection, Privacy, and Open Research Challenges - A Systematic Literature Review (SLR),” Computational Intelligence and Neuroscience, pp. 1-26, 2022. [CrossRef] [Google Scholar] [Publisher Link]
[14] Harrison John Bhatti, and Babak Bashari Rad, “Databases in Cloud Computing: A Literature Review,” International Journal of
Information Technology and Computer Science, vol. 9, no. 4, pp. 9-17, 2017. [CrossRef] [Google Scholar] [Publisher Link]
[15] Soukaina Ait Errami et al., “Spatial Big Data Architecture: From Data Warehouses and Data Lakes to the LakeHouse,” Journal of
Parallel and Distributed Computing, vol. 176, pp. 70-79, 2023. [CrossRef] [Google Scholar] [Publisher Link]
[16] Mitesh Athwani, “A Novel Approach to Version XML Data Warehouse,” SSRG International Journal of Computer Science and
Engineering, vol. 8, no. 9, pp. 5-11, 2021. [CrossRef] [Google Scholar] [Publisher Link]
[17] Philipp Wieder, and Hendrik Nolte, “Toward Data Lakes as Central Building Blocks for Data Management and Analysis,” Front Big
Data, vol. 5, pp. 1-18, 2022. [CrossRef] [Google Scholar] [Publisher Link]
[18] Dave Langton, The New Data Lakehouse: An Overdue Paradigm Shift for Data, Database Trends and Applications, 2022. [Online]. Available: https://www.dbta.com/BigDataQuarterly/Articles/The-New-Data-Lakehouse-An-Overdue-Paradigm-Shift-for-Data-151318.aspx
[19] Abdul Jabbar, Pervaiz Akhtar, and Samir Dani, “Real-Time Big Data Processing for Instantaneous Marketing Decisions: A
Problematization Approach,” Industrial Marketing Management, vol. 90, pp. 558-569, 2020. [CrossRef] [Google Scholar] [Publisher
Link]
[20] Rene Abraham, Johannes Schneider, and Jan vom Brocke, “Data Governance: A Conceptual Framework, Structured Review, and
Research Agenda,” International Journal of Information Management, vol. 49, pp. 424-438, 2019. [CrossRef] [Google Scholar]
[Publisher Link]
[21] Nelson Sizwe. Madonsela, Paulin. Mbecke, and Charles Mbohwa, “Integrating Artificial Intelligence into Data Warehousing and Data
Mining,” Proceedings of the World Congress on Engineering and Computer Science, vol. 2, pp. 1-5, 2015. [Google Scholar] [Publisher
Link]
[22] Maria F. Chan, Alon Witztum, and Gilmer Valdes, “Integration of AI and Machine Learning in Radiotherapy QA,” Frontiers in Artificial Intelligence, vol. 3, pp. 1-8, 2020. [CrossRef] [Google Scholar] [Publisher Link]
[23] Gizem Turcan, and Serhat Peker, “A Multidimensional Data Warehouse Design to Combat the Health Pandemics,” Journal of Data,
Information and Management, vol. 4, pp. 371-386, 2022. [CrossRef] [Google Scholar] [Publisher Link]
