Proceedings of Informing Science & IT Education Conference (InSITE) 2015
Cite as: Mason, R. T. (2015). NoSQL databases and data modeling techniques for a document-oriented NoSQL database. Proceedings of Informing Science & IT Education Conference (InSITE) 2015, 259-268. Retrieved from
http://Proceedings.InformingScience.org/InSITE2015/InSITE15p259-268Mason1569.pdf
Editor: Eli Cohen
NoSQL Databases and Data Modeling Techniques
for a Document-oriented NoSQL Database
Robert T. Mason
College of Computer & Information Sciences,
Regis University, Denver, CO, USA
RMASON@REGIS.EDU
Abstract
NoSQL databases are an important component of Big Data for storing and retrieving large volumes of data. Traditional Relational Database Management Systems (RDBMS) rely on the ACID properties for data consistency, whereas NoSQL databases use a non-transactional approach called BASE. RDBMS scale vertically, whereas NoSQL databases can scale both horizontally (sharding) and vertically. The four types of NoSQL databases are Document-oriented, Key-Value Pairs, Column-oriented and Graph. Data modeling for Document-oriented databases is similar to data modeling for traditional RDBMS during the conceptual and logical modeling phases. However, for a physical data model, entities can be combined (denormalized) by using embedding. What is called a foreign key in a traditional RDBMS is called a reference in a Document-oriented NoSQL database.
Keywords: NoSQL Databases, NoSQL Data Modeling, Database Technologies.
Introduction
The increase in data volume (Big Data) during the last decade is attributed to a variety of data sources, such as social media, GPS data, sensor data, surveillance data, text documents and e-mails. For example, the Internet of Things adds urgency for companies to be able to handle vast amounts of data (Devlin, 2014). Data that was once considered too expensive to store can now be captured, stored and processed. Large data stores that were measured in Terabytes a decade ago are now measured in Petabytes (1,000 Terabytes). According to the media “hoopla”, we are living in a brave new (improved) world that has been created by the ingenuity of Web 2.0 companies, such as Yahoo, Google, Amazon and Facebook (Mohan, 2013). The term NoSQL has come to mean databases that are alternatives to the conventional RDBMS (e.g. Oracle, MS SQL Server and IBM DB2). However, scholars who have watched the evolution of database technology over the last 30 years are cautious and skeptical (Mohan, 2013). Many of the lessons learned during the evolution of RDBMS are being ignored or discounted by the venture capitalist companies at the center of the NoSQL database movement. Many of the technical problems that have been resolved and automated by RDBMS database vendors are now the responsibility of NoSQL database administrators and application developers. Mohan (2013) cautioned that this type of ad hoc development approach for NoSQL databases can lead to disastrous long-term results for end users.

Material published as part of this publication, either on-line or in print, is copyrighted by the Informing Science Institute. Permission to make digital or paper copy of part or all of these works for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage AND that copies 1) bear this notice in full and 2) give the full citation on the first page. It is permissible to abstract these works so long as credit is given. To copy in all other cases or to republish or to post on a server or to redistribute to lists requires specific permission and payment of a fee. Contact Publisher@InformingScience.org to request redistribution permission.
NoSQL Databases
Although traditional Relational Database Management Systems (RDBMS) have existed for decades and are constantly being improved by the database vendors, RDBMS struggle to handle large volumes of data (Mohan, 2013). In contrast, a newer category of database technology called NoSQL databases is able to support larger volumes of data by providing faster data access and cost savings. The cost savings and improved performance of NoSQL databases result from a physical architecture that uses inexpensive commodity servers and leverages distributed processing. For example, according to Cloudera (2014), traditional RDBMS SAN storage costs average $30,000+ per terabyte, whereas storage for NoSQL databases averages $1,000 per terabyte. This dramatic reduction in storage cost has made it possible to store data that was previously considered too expensive to keep.
In addition to distributed processing and inexpensive hardware, NoSQL databases differ significantly in their approach to maintaining data integrity and consistency (Roe, 2012). A more relaxed approach to data consistency helps NoSQL databases improve the performance of data storage. Because RDBMS highly value data integrity, they rely on the ACID properties for data consistency, which build on the transaction concept presented in the early 1980s by Jim Gray. ACID is an acronym for Atomicity, Consistency, Isolation and Durability, and it supports the concept of a transaction.
Atomicity ensures that all tasks within a transaction are performed completely (all or nothing).
Consistency ensures that a transaction leaves the database in a consistent state at the end of the transaction.
Isolation ensures that transactions are isolated and do not interfere with other transactions.
Durability ensures that once the transaction is complete, the data will persist permanently, even in the event of a system failure.
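The all-or-nothing behavior of atomicity can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for an RDBMS; the account table, the names and the balances are hypothetical and do not come from the paper.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit leaves work that must be undone.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, so the rollback has to undo the credit; after it, both balances are the same as before the attempt.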
In contrast, NoSQL databases are guided by the CAP theorem (Consistency, Availability and Partition Tolerance), which was presented in 2000 by Eric Brewer at an ACM symposium (Roe, 2012). The CAP theorem states that, of the three CAP properties, only two can be guaranteed at a given point in time. In other words, NoSQL databases can have partition tolerance (in a distributed environment) and consistency, or partition tolerance and availability, but not all three properties at the same time (Abramova, Bernardino, & Furtado, 2014). The CAP theorem has evolved into what is now called BASE (Basically Available, Soft state and Eventual consistency).
Basically Available means that data will be available; however, the response from the database can be a failure to retrieve the data, or the data retrieved may be in an inconsistent state.
Soft state means that the data can change over time as the database seeks consistency.
Eventual consistency means that, sooner or later, the database will become consistent.
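The three BASE properties can be made concrete with a toy simulation. The sketch below is not how any particular NoSQL product replicates data; the class names and the explicit propagate step are assumptions chosen only to show a stale read followed by convergence.

```python
from collections import deque

class Replica:
    """One node holding its own copy of the data."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and reaches the others later."""
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = deque()  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        # The first replica accepts the write immediately.
        self.replicas[0].data[key] = value
        # The other replicas only receive it when propagate() runs.
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica_index):
        # Always answers (basically available), possibly with stale data.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Apply queued updates; afterwards every replica agrees.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i].data[key] = value

store = EventuallyConsistentStore()
store.write("edition", "10th")
stale = store.read("edition", replica_index=2)  # update has not arrived yet
store.propagate()
fresh = store.read("edition", replica_index=2)  # replicas have converged
print(stale, fresh)
```

Between the write and the propagate call, the state of the cluster is "soft": different replicas can return different answers for the same key.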
Mohan (2013) argues that although NoSQL databases do not guarantee ACID transactions, they must support some smaller form of transaction to promote data consistency within the database. Early RDBMS supported the concept of uncommitted reads and different levels of locking. Therefore, the concept of BASE is not a new idea in the area of database technologies and distributed processing. ACID was applied to RDBMS as customers demanded better data integrity and reliability from the RDBMS vendors. Leavitt (2010) cautions users that the lack of ACID requires additional programming to guarantee data consistency and therefore makes NoSQL databases less reliable. Since NoSQL databases don’t support SQL, complex query programming can be time-consuming and challenging.
NoSQL databases also differ from traditional RDBMS in the areas of structure and scaling. RDBMS require that the database structure (schema) be defined before the database is loaded and accessed. This predefinition requirement is often called Schema Write (schema-on-write) by industry experts and can be a burden when data is in an unstructured format (Moniruzzaman & Hossain, 2013). In contrast, NoSQL column-family databases, such as HBase and Cassandra, provide a flexible schema structure that facilitates changes to the schema definition when new types of data are encountered. This type of flexible schema structure is called Schema Read (schema-on-read). New data sources can be incorporated into a read schema within minutes, thus altering the structure and data content dynamically. However, Sadalage and Fowler (2013) caution that although a NoSQL database schema can be dynamically altered, existing applications that use the database will still have to be altered to use the new data structures. Thus, the expense of maintaining existing code should be considered when making structural changes to existing NoSQL databases.
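A short sketch may help show what a read schema means in practice, and why application code still has to change when the data changes. The records are plain Python dictionaries rather than rows in any specific NoSQL product, and the field names and the read_book helper are hypothetical.

```python
def read_book(doc):
    """Apply a schema at read time; fields a record lacks get a default."""
    return {
        "title": doc.get("title", "(untitled)"),
        "authors": doc.get("authors", []),
        "edition": doc.get("edition"),  # None for records stored before this field existed
    }

# Two records stored with different shapes; nothing was declared in advance.
stored = [
    {"title": "NoSQL Distilled", "authors": ["P. Sadalage", "M. Fowler"]},
    {
        "title": "Business Intelligence and Analytics",
        "authors": ["Ramesh Sharda", "Dursun Delen", "Efraim Turban"],
        "edition": "10th",       # a field the first record never had
        "publisher": "Pearson",  # ignored until read_book is updated to use it
    },
]

views = [read_book(doc) for doc in stored]
for view in views:
    print(view)
```

The second record was accepted without any schema change, but its publisher field stays invisible to the application until read_book is edited, which is the maintenance cost that Sadalage and Fowler (2013) describe.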
Scaling for RDBMS is done vertically and is usually accomplished by adding memory and/or CPU to one or more servers. However, NoSQL databases can scale both horizontally and vertically (Abramova, Bernardino, & Furtado, 2014). Most commonly, additional nodes (commodity servers) are added to a NoSQL cluster (group of servers) to scale horizontally (called sharding). Because distributed processing is used for a NoSQL database, after new nodes are added to the cluster, the existing data is automatically distributed evenly across the cluster by the NoSQL database management system. To avoid data loss, complicated data replication methods are applied across commodity servers in preparation for an eventual server failure. The failed server can be replaced within minutes, and the replicated data is used to populate the new server with the original data.
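The redistribution that follows the addition of a node can be sketched as follows. The placement rule used here (an MD5 hash of the key, modulo the number of nodes) is an assumption made for brevity; production systems typically use range-based or consistent-hashing schemes so that fewer keys move when the cluster grows. The key names are hypothetical.

```python
import hashlib

def shard_for(key, n_nodes):
    """Pick a node for a key from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_nodes

def distribute(keys, n_nodes):
    """Group keys by the node that would store them."""
    shards = {node: [] for node in range(n_nodes)}
    for key in keys:
        shards[shard_for(key, n_nodes)].append(key)
    return shards

keys = [f"customer-{i}" for i in range(1000)]
before = distribute(keys, 3)  # three-node cluster
after = distribute(keys, 4)   # one commodity server added

# Keys whose home node changed and therefore had to be copied.
moved = sum(1 for k in keys if shard_for(k, 3) != shard_for(k, 4))

print({node: len(ks) for node, ks in before.items()})
print({node: len(ks) for node, ks in after.items()})
print("keys moved:", moved)
```

Each node ends up with roughly the same share of the keys in both layouts, which is the even distribution described above.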
Types of NoSQL Databases
There are four types of NoSQL databases: Document-oriented, Key-Value Pairs, Wide Column
(or Column Family) and Graph (Abramova, Bernardino, & Furtado, 2014). IT organizations
will use one or more of the NoSQL database types based upon the characteristics of the data that
must be processed.
Document-oriented, as the name implies, stores related information in the form of a document. In a third normal form data model, data is normalized (separated) into different entities with relationships to reduce redundancy and to avoid update anomalies. Within a document in a document-oriented NoSQL database, the data is denormalized, semi-structured and stored hierarchically (Moniruzzaman & Hossain, 2013). For example, each book in a library can be stored in a collection of documents called BOOK. Not only will the book title be stored in a BOOK document, but details about the book, such as a list of one or more authors, publication date, edition, publisher, publisher location and ISBN, will also be embedded in the same document.
In addition to embedding information within a particular document, it is possible to provide a reference to another collection of documents (Moniruzzaman & Hossain, 2013). A reference (link) is similar to the concept of a foreign key that is used by RDBMS. A Document-oriented NoSQL database has a structure similar to that of an XML document, which is hierarchical. MongoDB is an example of a NoSQL document-oriented database. An example of a particular BOOK document is shown in Figure 1.
Figure 1. An example of a NoSQL document for a particular book.
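As a rough illustration of the same idea in code, the BOOK document can be written as a nested structure. The values come from the book used in Figure 1; the field names, the `_id` value and the SUBJECT collection with its `subject_id` reference are hypothetical additions used only to show a reference next to embedded data.

```python
import json

# Hypothetical separate collection that BOOK documents can point to.
subjects = {
    "subj-1": {"name": "Decision support systems"},
}

book = {
    "_id": "book-1",
    "title": "Business Intelligence and Analytics: Systems for Decision Support",
    # Embedded list: the authors live inside the book document.
    "authors": ["Ramesh Sharda", "Dursun Delen", "Efraim Turban"],
    "publication_date": 2015,
    "edition": "10th",
    # Embedded sub-document: publisher details are stored hierarchically.
    "publisher": {"name": "Pearson", "location": "Upper Saddle River, NJ"},
    "isbn_13": "978-0-13-305090-5",
    # Reference: only the key of a document in another collection is stored.
    "subject_id": "subj-1",
}

# Embedded data is read directly; referenced data needs a second lookup.
publisher_name = book["publisher"]["name"]
subject_name = subjects[book["subject_id"]]["name"]

print(json.dumps(book, indent=2))
print(publisher_name, "|", subject_name)
```

Reading the publisher requires no second lookup because it is embedded, whereas the subject has to be fetched from its own collection, much like following a foreign key.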
Key-Value Pairs stores information in the form of matched pairs with only two columns permitted: the key (hashed key) and the value (Moniruzzaman & Hossain, 2013). The values can be simple text or complex data types such as sets of data. Data must be retrieved via an exact match on the key. The advantage of this type of NoSQL database is that new types of data about a book can easily be added to the database as new key-value pairs. Examples of NoSQL databases that use Key-Value Pairs are Project Voldemort, Cache and Dynamo. In the prior book example, the book information from Figure 1 would be stored as shown in Table 1.
Table 1. An example of how key value pairs are stored in a NoSQL database.
Key               | Value
Book Title        | Business Intelligence and Analytics: Systems for Decision Support
Author (set)      | Ramesh Sharda; Dursun Delen; Efraim Turban
Publication Date  | 2015
Edition           | 10th
Publisher         | Pearson
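The pairs in Table 1 map naturally onto a hash table. The sketch below uses a Python dictionary as a stand-in for a key-value store; the put and get helpers and the extra "Format" key are hypothetical and only illustrate exact-match retrieval and the ease of adding new pairs.

```python
store = {}

def put(key, value):
    """Store or overwrite the value held under an exact key."""
    store[key] = value

def get(key):
    """Return the value for an exact key match, or None if there is none."""
    return store.get(key)

put("Book Title", "Business Intelligence and Analytics: Systems for Decision Support")
put("Author", {"Ramesh Sharda", "Dursun Delen", "Efraim Turban"})  # a set as the value
put("Publication Date", "2015")
put("Edition", "10th")
put("Publisher", "Pearson")

# A new type of data needs no schema change, only a new pair.
put("Format", "Hardcover")  # hypothetical value

exact = get("Edition")
near_miss = get("edition")  # keys must match exactly, including case
print(exact, near_miss, len(store))
```

There is no way to ask this store for "all books published by Pearson" without scanning every value, which is the usual trade-off of the key-value model.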
Wide Column (Column Family) has a data storage format that is very similar to that of an RDBMS (Abramova, Bernardino, & Furtado, 2014). Whereas RDBMS tend to have simple data types and a predefined schema (structure), Column-oriented NoSQL databases provide much more flexibility. They can support complex data types, unstructured text and graphics (e.g. jpeg, gif, bmp). In the example shown in Table 2, author, publication date, edition and publisher are all included in a complex data type called book details. Cassandra is an example of a Column Family NoSQL database.
[Figure 1 content: BOOK document]
Book Title: Business Intelligence and Analytics: Systems for Decision Support
By Ramesh Sharda, Dursun Delen, Efraim Turban
Publication Date: 2015
Edition: 10th
Publisher: Pearson
Publisher Location: Upper Saddle River, NJ
ISBN-13: 978-0-13-305090-5
Table 2. An example of how data is stored in a column-oriented NoSQL database.
Business Intelligence and Analytics: Systems for Decision Support
  Book Details (includes authors, year, edition, publisher, etc.): Ramesh Sharda; Dursun Delen; Efraim Turban; 2015; 10th; Pearson
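One way to picture the layout in Table 2 is as a nested map from row key to column family to individual columns. The sketch below is an in-memory illustration only, not Cassandra's storage format; the column names inside book_details and the second row are assumptions.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

row = "Business Intelligence and Analytics: Systems for Decision Support"
table[row]["book_details"]["authors"] = ["Ramesh Sharda", "Dursun Delen", "Efraim Turban"]
table[row]["book_details"]["year"] = 2015
table[row]["book_details"]["edition"] = "10th"
table[row]["book_details"]["publisher"] = "Pearson"

# Another row in the same column family may carry a different set of columns.
other = "NoSQL distilled"
table[other]["book_details"]["year"] = 2013

columns_first = sorted(table[row]["book_details"])
columns_other = sorted(table[other]["book_details"])
print(columns_first)
print(columns_other)
```

Both rows belong to the same column family, yet each stores only the columns it actually has, which is the flexibility the paragraph above contrasts with a predefined relational schema.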
Graph supports data that has an undefined number of network connections (Abramova, Bernardino, & Furtado, 2014). This type of database supports map data, bus transportation links and relationships found in social media. For example, traversing a graph to find the shortest distance between cities is a daunting task using a conventional RDBMS. However, a Graph NoSQL database can facilitate this type of processing. AllegroGraph, Neo4j and Virtuoso are examples of Graph Databases. Figure 2 is an example of a graph database for the distance between major cities in Colorado.
[Figure 2 edge labels: 64.6 miles; 68.5 miles]
Figure 2. An example of a Graph NoSQL Database for the distance between cities.
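The shortest-distance traversal mentioned above can be sketched with Dijkstra's algorithm over an adjacency map. The two distances are the edge labels of Figure 2, but which pair of cities each distance connects is an assumption, as is the name "Colorado Springs"; a graph database would expose this kind of traversal through its own query language rather than application code.

```python
import heapq

# Undirected weighted graph: city -> {neighboring city: distance in miles}.
# The pairing of distances with city pairs is assumed, not taken from the paper.
graph = {
    "Fort Collins": {"Denver": 64.6},
    "Denver": {"Fort Collins": 64.6, "Colorado Springs": 68.5},
    "Colorado Springs": {"Denver": 68.5},
}

def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm; returns the total distance or None if unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None

total = shortest_distance(graph, "Fort Collins", "Colorado Springs")
print(round(total, 1))
```

The relational equivalent would need a self-join per hop or a recursive query, which is why the paragraph above calls the task daunting for a conventional RDBMS.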
Data Modeling Design Techniques for a MongoDB NoSQL Database
MongoDB is an open source Document-oriented NoSQL database that was initially developed in 2007 by a company called 10gen (Medina, 2014). Data modeling at the conceptual level (CDM) and the logical level (LDM) is very similar to what is done for an RDBMS (Hoberman, 2014). Either normalized data models (3rd normal form) or dimensional models (Star Schemas) are acceptable modeling approaches. The major changes to the data model occur when it is transformed from the LDM to a physical data model (PDM). Figure 3 shows a very simple example of a 3rd normal form data model (a.k.a. Entity Relationship Diagram, ERD) for a fictitious Rental Car company that includes entities and the associated relationships.
[Figure 2 node labels: Denver, CO; Fort Collins, CO; Springs,]
Figure 3. An example of a Data Model for a fictitious Rental Car company.
In this example, an occurrence of a rental is created when a vehicle is rented by a customer from a
particular branch. Customers must have a physical address and a Branch must have a physical
address.
Hoberman (2014) shows how to conduct a grouping process to convert an LDM into a PDM, which can later be used to create the Document-oriented NoSQL database. There are two options within MongoDB for modeling relationships: embedding data or referencing documents. Embedding is the process of merging two or more entities into one entity using a hierarchy. Referencing is similar to creating a foreign key in an RDBMS that serves as a pointer from one entity (collection) to another entity (collection).
There are five heuristics that Hoberman (2014) suggests for deciding whether to embed or reference:
1. Data that is frequently queried from multiple entities at the same time can be embedded into one document.
2. Entities that are considered dependent entities can be embedded into one entity.
3. If there is a one-to-one relationship between two entities, we embed one of the entities into the other entity.
4. Entities that experience similar volatility (inserts, updates and deletes) at the same rate can be embedded together.
5. Entities that are not key entities, but have relationships with key entities, can be referenced and not embedded.
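The five heuristics above can be read as a small decision procedure. The sketch below is one possible encoding, not Hoberman's method: the Relationship fields, the suggest function and in particular the order in which the heuristics are checked are assumptions, and the flags set for each Rental Car entity are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Relationship:
    """Facts about how a candidate entity relates to the entity it might join."""
    name: str
    queried_together: bool = False       # heuristic 1
    dependent_entity: bool = False       # heuristic 2
    one_to_one: bool = False             # heuristic 3
    similar_volatility: bool = False     # heuristic 4
    non_key_linked_to_key: bool = False  # heuristic 5

def suggest(rel):
    """Return 'embed' or 'reference'; checking heuristic 5 first is an assumption."""
    if rel.non_key_linked_to_key:
        return "reference"
    if (rel.queried_together or rel.dependent_entity
            or rel.one_to_one or rel.similar_volatility):
        return "embed"
    return "reference"  # nothing argues for embedding

candidates = [
    Relationship("Customer Address", dependent_entity=True),
    Relationship("Rental", queried_together=True, similar_volatility=True),
    Relationship("Vehicle"),  # independent entity with no embedding signal
]

decisions = {rel.name: suggest(rel) for rel in candidates}
print(decisions)
```

In practice the heuristics can pull in different directions for the same pair of entities, so a real grouping exercise still depends on the modeler's judgment about query patterns.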
Using the LDM provided for the Rental Car Company and the heuristics listed above, entities are
grouped together as shown in Figure 4.
Figure 4. An example of groupings for a Rental Car LDM.
The logic behind these entity groupings for the PDM is as follows:
Branch Address (black circle) and Customer Address (red circle) are both dependent entities of the Address entity. Therefore, two embedded entities for Customer Address and Branch Address will be created.
Rental, Customer and Branch (blue circles) will be frequently queried together and will have the same volatility. Although Rental is not a dependent entity since it has a Rental_SID, the volatility makes it a good candidate to be embedded in a new entity called CustomerRental. This new entity will include the embedded attributes of Branch SID and Name and will have a reference to Vehicle, which is an independent entity.
Figure 5 shows the PDM after the grouping process has been completed. Notice that CustomerRental has references to Branch Address and Customer Address.
Figure 5. The PDM after the grouping process has been completed.
The PDM can then be used as a template for a collection of documents. Below are samples of MongoDB documents for the Vehicle entity and the CustomerRental entity. Partitioning and additional secondary indexes can be added to improve performance during physical implementation (MongoDB, 2014).
Vehicle:
{ Vehicle_SID: "2134567",
  VIN: "1234ZC33456XYZ2",
  Available_Indicator: "Y",
  Odometer: "3507",
  Make: "Toyota",
  Model: "Camry",
  Year: "2014",
  Condition_Desc: "Scratches to paint on the left back door and scratches on the rear bumper"
}
In the example shown below, John R. Smith has rented two cars over the last year. The first car is the Toyota Camry that was picked up at the Aurora location and returned to the Broadway location. Notice the reference to the Toyota Camry via the Vehicle_SID. The second rental was for a different car that was rented at the end of August and returned in September. Notice how the two rentals are embedded in the document underneath Rentals.
Customer Rental:
{ CustomerSID: "987765",
  LastName: "Smith",
  FirstName: "John",
  MiddleName: "Ricardo",
  Rentals: [
    { Rental_SID: "2300453",
      // pickup and drop-off branches use distinct field names so neither overwrites the other
      Pickup_Date: ISODate("2014-07-13"),
      Pickup_Branch_SID: "3374",
      Pickup_Branch_Name: "Aurora",
      Dropoff_Date: ISODate("2014-07-19"),
      Dropoff_Branch_SID: "3370",
      Dropoff_Branch_Name: "Broadway",
      Vehicle_SID: "2134567" },
    { Rental_SID: "2307111",
      Pickup_Date: ISODate("2014-08-25"),
      Pickup_Branch_SID: "3374",
      Pickup_Branch_Name: "Aurora",
      Dropoff_Date: ISODate("2014-09-02"),
      Dropoff_Branch_SID: "3374",
      Dropoff_Branch_Name: "Aurora",
      Vehicle_SID: "2134977" }
  ] }
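To show how an application might use the two documents together, the sketch below keeps the rentals embedded in the customer document and resolves each Vehicle_SID reference with a separate lookup. Plain dictionaries stand in for the MongoDB collections, the rentals_with_vehicles helper is hypothetical, and the Pickup_/Dropoff_ field prefixes are an assumption used here to keep the two branches of a rental distinct.

```python
# Stand-in for the Vehicle collection, keyed by Vehicle_SID.
vehicles = {
    "2134567": {"Vehicle_SID": "2134567", "Make": "Toyota", "Model": "Camry", "Year": "2014"},
}

# Stand-in for one CustomerRental document with two embedded rentals.
customer_rental = {
    "CustomerSID": "987765",
    "LastName": "Smith",
    "FirstName": "John",
    "Rentals": [
        {"Rental_SID": "2300453", "Pickup_Branch_Name": "Aurora",
         "Dropoff_Branch_Name": "Broadway", "Vehicle_SID": "2134567"},
        {"Rental_SID": "2307111", "Pickup_Branch_Name": "Aurora",
         "Dropoff_Branch_Name": "Aurora", "Vehicle_SID": "2134977"},
    ],
}

def rentals_with_vehicles(customer, vehicle_collection):
    """Attach the referenced vehicle (or None if it is missing) to each rental."""
    return [
        {**rental, "Vehicle": vehicle_collection.get(rental["Vehicle_SID"])}
        for rental in customer["Rentals"]
    ]

resolved = rentals_with_vehicles(customer_rental, vehicles)
for rental in resolved:
    vehicle = rental["Vehicle"]
    label = f'{vehicle["Make"]} {vehicle["Model"]}' if vehicle else "unknown vehicle"
    print(rental["Rental_SID"], label)
```

The embedded rentals arrive with the customer in a single read, while the vehicle details cost one extra lookup per rental. The second rental also shows that nothing in this sketch guarantees a reference points to an existing document, so the application has to handle the missing case itself.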
Abramova, Bernardino, & Furtado (2014) conducted performance testing on five different NoSQL databases, including MongoDB. They found that MongoDB had the largest increase in run time (slower performance) when the number of updates increased. MongoDB uses a data locking mechanism that has the direct effect of slowing update performance. However, with regard to read performance, MongoDB performed very well and is considered a database that is optimized for reads. The authors did not provide details about how the test data was modeled (e.g. embedded or referenced).
Conclusion
NoSQL databases are an important component of Big Data for storing and retrieving large amounts of data. RDBMS rely on the ACID properties for data consistency, whereas NoSQL databases use BASE. RDBMS scale vertically, whereas NoSQL databases can scale both horizontally (sharding) and vertically. Although NoSQL databases provide performance gains, some researchers are cautious and skeptical about data consistency. This paper describes the four types of NoSQL databases: Document-oriented, Key-Value Pairs, Column-oriented and Graph. Data modeling for Document-oriented databases is similar to data modeling for traditional RDBMS during the conceptual and logical modeling phases. However, as was demonstrated in this paper, physical data modeling for a document-oriented database is different. Separate entities can be merged (denormalized) into one document by using embedding, and the concept of a foreign key is supported by a reference.
References
Abramova, V., Bernardino, J., & Furtado, P. (2014). Which NoSQL database? A performance overview. Open Journal of Databases (OJDB), 1(2), 17-24.
Cloudera. (2014). Cloudera Big Data and Hadoop Sessions, May 21, 2014. Location: Hyatt Hotel, Denver, CO.
Devlin, B. (2014). Business un-intelligence: The marriage of BI and Big Data. ACM Webinar on June 17, 2014.
Gartner. (2014). Gartner's 2014 hype cycle for emerging technologies maps the journey to digital business. Retrieved from http://www.gartner.com/technology/home.js
Hoberman, S. (2014). Data modeling for MongoDB: Building well-designed and supportable MongoDB databases (1st ed.). Basking Ridge, NJ: Technics Publications. ISBN: 978-1-935504-70-2.
Leavitt, N. (2010). Will NoSQL databases live up to their promise? Computer. Published by the IEEE Computer Society. p. 12-14.
Medina, J. (2014). NoSQL 133 success secrets – 133 most asked questions on NoSQL – What you need to know (1st ed.). Rock Hill, SC: Emereo Publishing. ASIN: B00QE3WDDO
Mohan, C. (2013). History repeats itself: Sensible and NonsenSQL aspects of the NoSQL hoopla. EDBT/ICDT 2013 Joint Conference, March 18–22, 2013, Genoa, Italy. ISBN: 978-1-4503-1597-5.
Moniruzzaman, A. B., & Hossain, S. A. (2013). NoSQL database: New era of databases for big data analytics - Classification, characteristics and comparison. International Journal of Database Theory and Application, 6(4), 1-14.
MongoDB. (2014). The MongoDB 2.6 manual. Retrieved from http://docs.mongodb.org/manual/core/introduction/.
Ohlhorst, F. (2013). Big data analytics: Turning big data into big money. Hoboken, NJ: John Wiley and Sons. ISBN: 978-1-118-14759-7.
Roe, C. (2012). ACID vs. BASE: The shifting pH of database transaction processing. Retrieved from http://www.dataversity.net/acid-vs-base-the-shifting-ph-of-database-transaction-processing/
Sadalage, P., & Fowler, M. (2013). NoSQL distilled: A brief guide to the emerging world of polyglot persistence. Upper Saddle River, NJ: Pearson. ISBN: 978-0-321-82662-6.
Sharda, R., Delen, D., & Turban, E. (2015). Business intelligence and analytics: Systems for decision support (10th ed.). Upper Saddle River, NJ: Pearson. ISBN-13: 978-0-13-305090-5.
Biography
Bob Mason joined Regis University as a full-time faculty member in January of 2011 after completing his Ph.D. at Nova Southeastern University located in Davie, FL. Bob is the program coordinator for two programs: MS in Database Technologies and MS in Software Engineering and Database Technologies. Prior to accepting this position with Regis, Bob was employed by various Fortune 500 companies for 25 years as a Data Architect, DBA and Software Engineer. He also was an affiliate faculty member at Regis in the area of Database Technologies for 10 years. Bob has just completed a new course for the Regis University CC&IS MS in Database Technologies degree program called Introduction to NoSQL Databases that covers the four types of NoSQL databases with hands-on lab exercises. Courses on specific NoSQL databases, such as Cassandra, MongoDB and Neo4J, will be available soon within CC&IS.
...  ACID Free: The acronym ACID stands for atomicity, consistency, isolation, and durability and supports the SQL transaction notion [6,7]. NoSQL, a distributed database, provides improved data storage by relying on consistency but does not guarantee ACID properties. ...
... It assures a high degree of availability through replication (no requirement to have identical copies in all nodes for all the time). In order to prioritize availability above consistency, eBay proposed this database design behavior, and NoSQL adopts it [6][7][8][9]. ...
... The Document Model stores and manages semi-structured data or documents instead of atomic data [41]. For instance, the background educational data gathered from students' activities using a document model proved to be helpful to create adaptable educational documents [42]. ...
... Hierarchical Data Model [34], Network Data Model [36], Object-oriented Data Model [66], Relational Data Model [39], and NoSQL Data Models [41][42][43][44]50,51]. ...
Article
Full-text available
Numerous studies have established a correlation between creativity and intrinsic motivation to learn, with creativity defined as the process of generating original and valuable ideas, often by integrating perspectives from different fields. The field of educational technology has shown a growing interest in leveraging technology to promote creativity in the classroom, with several studies demonstrating the positive impact of creativity on learning outcomes. However, mining creative thinking patterns from educational data remains a challenging task, even with the proliferation of research on adaptive technology for education. This paper presents an initial effort towards formalizing educational knowledge by developing a domain-specific Knowledge Base that identifies key concepts, facts, and assumptions essential for identifying creativity patterns. Our proposed pipeline involves modeling raw educational data, such as assessments and class activities, as a graph to facilitate the contextualization of knowledge. We then leverage a rule-based approach to enable the mining of creative thinking patterns from the contextualized data and knowledge graph. To validate our approach, we evaluate it on real-world datasets and demonstrate how the proposed pipeline can enable instructors to gain insights into students’ creative thinking patterns from their activities and assessment tasks.
... Hacking and various attacks to cloud infrastructure would affect multiple clients even if only one site is attacked. These risks can be mitigated by using security applications, encrypted file systems, data loss software, and buying security hardware to track unusual behavior across servers [21]. ...
Thesis
Full-text available
Big data has long been the topic of fascination for computer science enthusiasts around the world, and has gained even more prominence in recent times with the continuous explosion of data resulting from the likes of social media and the quest for tech giants to gain access to deeper analysis. This paper discusses various tools in big data technology and conducts a comparison among them. Different tools namely Sqoop, Apache Flume, Apache Kafka, Hive, Spark and many more are included. Various datasets are used for the experiment and a comparative study is made to figure out which tool works faster and more efficiently over the others, and explains the reason behind this.
... To znači da se svi podaci za dokumenta skladište su samom dokumentu. Posebne karakteristike koje prete izbacivanju SQL-a su (Mason, 2015): ...
Conference Paper
Full-text available
The purpose of collecting and organizing data in a Data warehouse is to make it easily accessible so that it can be used efficiently and easily for business analysis. Data Warehouse and the use of business intelligence in public administration aim to improve public administration competencies. The impact of business intelligence in the information system for assessing the quality of project implementation in institutions and its adequate assessment in B&H institutions is a key driver for improving the quality of project implementation. Considerations and conducted research prove that there is a possibility to provide resources within the public administration’s own resources and to create a business intelligence solution for the needs of public administration. The main goal of introducing business intelligence in public administration is its digitalization and interoperability, which means not only the realization but also the improvement of public administration capacity and indirectly encouraging employment and achieving sustainable development. The development of business intelligence should have the status of a priority and include, in addition to investment in research and development and investment in human capital, institutions and practice.
... Although the relational database is easier to understand and apply, it is subject to too many restrictions. The data stored in the NoSQL database is some semi-structured data or unstructured data, and the performance of the database is higher, the scalability is also better, and it is more convenient to read and store some unstructured data (Mason 2015). From the literature, we can see that some scholars in our country have analyzed and sorted out related literature, and made suggestions on the development and dynamic characteristics of classroom teaching functions (Peterson 1988). ...
Article
Full-text available
With the continuous development of science and technology, we have fully entered the information age, people's entertainment life is becoming more and more abundant, and the Internet has also provided people with a lot of convenience. The advent of the Internet age means that more and more various kinds of data are appearing, and the situation is becoming more and more abundant. The use of traditional relational databases can no longer store these data, nor can it be queried. With the continuous development of voice technology, database technology based on NoSQL has become a research hotspot. The availability of NoSQL databases is very high, the scalability is also very high, and the efficiency is very high when processing data. Based on the development of 5G networks and the development of big data technology, this research proposes a brand-new business architecture. This architecture can use the network to store data on the basis of massive data. At the same time, we also described the business scenario, providing a new idea for more intelligent education services. We use this model for teaching in schools, and we can transmit some spoken language resources to the school through the Internet for students to use for learning. Nowadays, the application range of AI technology and intelligent technology has become more and more extensive, and these technologies have also been applied in the education field. We can apply this brand-new technology in teaching to promote the development of teaching and improve students’ enthusiasm and learning effect.
... Although the relational database is easier to understand and apply, it is subject to too many restrictions. The data stored in the NoSQL database is some semi-structured data or unstructured data, and the performance of the database is higher, the scalability is also better, and it is more convenient to read and store some unstructured data [13]. From the literature, we can see that some scholars in our country have analyzed and sorted out related literature, and made suggestions on the development and dynamic characteristics of classroom teaching functions [14]. ...
Article
At the heart of every ERP is a single database that allows employees of the organization to rely on the same consistent set of information. Data migration is an important component of ERP upgrade, implementation and integration projects. At the same time, the migration scenario can be complex and lengthy, requiring a large amount of resources and high competencies from the management staff. Underestimating the required time and effort can lead to a significant increase in costs and a delay in the commissioning of the ERP. The accuracy and completeness of the transmitted data are also of great importance, since many aspects of the business – customer satisfaction, decision-making, supply chain and relationships with partners – will depend on the quality of the data. Despite this, the complexity of data migration scenarios is traditionally underestimated. In most existing studies, data migration is considered mainly from the technical side. Aspects related to the conceptual content of data migration, its relationship with business processes and company management, as well as the specific role of data migration in projects of updating, implementing and integrating ERPs, remain insufficiently developed. The aim of the study. The aim of this study is to supplement theoretical ideas about the content, diversity, problems and strategies of data migration in the context of ERPs. Materials and methods. The article summarizes and systematizes the types, stages of the project, key strategies and the most significant problems of data migration. For the purposes of this study, the material from 23 sources on a similar topic was reviewed, revised and supplemented. Results. The paper describes classifications of types of data migration and provides examples related to ERP. The stages of the data migration project are described in detail and supplemented.
A comparison of two key data migration strategies is given, their advantages and disadvantages are highlighted, and recommendations for the application of a particular strategy are formed. The main problems of data migration in the context of ERPs and the consequences of these problems for the entire migration project are considered. Conclusion. The results obtained suggest that data migration is a complex and time-consuming process that requires serious competencies from management and performers. The migration strategy should be developed in an effective way and take into account all the variety of influencing factors.
Article
Data management systems rely on a correct design of data representation and software components. The data representation scheme plays a vital role in how the data are stored, which influences the efficiency of their processing and retrieval. The design of the system components applies software engineering concepts to achieve qualities such as scalability, efficiency, flexibility, maintainability, and extendibility. This paper presents a data management system that uses a graph-based data representation scheme to achieve efficient data retrieval when using graph-based databases. Input data are transformed into vertices, edges, and labels as they are inserted into the database. The proposed system consists of three layers: the system beans layer, the data access layer, and the database engine. Healthcare data are used to evaluate the system in comparison with resource description framework (RDF) semantics. Extensive experiments are conducted to compare different scenarios of data storage and retrieval using Neo4J, OrientDB, and RDF4J. Experimental results show that the proposed graph-based approach outperforms the RDF4J framework in terms of insertion and retrieval time.
Article
Testing the graphical user interface (GUI) of a software product is important to ensure the quality of the system and therefore to improve user satisfaction with the software. Using tools to support the testing process addresses the problems of manual testing, which is tedious and time-consuming. Capture and replay tools are commonly used in GUI testing. In this paper we compare five open source capture and replay tools, namely Abbot, Jacareto, JFCUnit, Marathon and Pounder, in terms of ease of use and capture and replay capabilities. In order to compare the tools, we defined comparison characteristics and, after evaluating each tool, selected the one that showed the best results in almost all criteria. The results of our study may serve as guidance for any novice tester or company that intends to automate the GUI testing process using open source capture and replay tools.
Article
The digital world is growing very fast and becoming more complex in volume (terabytes to petabytes), variety (structured, unstructured, and hybrid), and velocity (high speed of growth). This is referred to as Big Data, a global phenomenon. It is typically considered to be a data collection that has grown so large that it cannot be effectively managed or exploited using conventional data management tools, e.g., classic relational database management systems (RDBMS) or conventional search engines. To handle this problem, traditional RDBMS are complemented by a rich set of specifically designed alternative DBMS, such as NoSQL, NewSQL, and search-based systems. The motivation of this paper is to provide a classification, characteristics, and evaluation of NoSQL databases in Big Data analytics. This report is intended to help users, especially organizations, obtain an independent understanding of the strengths and weaknesses of various NoSQL database approaches to supporting applications that process huge volumes of data.
Conference Paper
In this paper, I describe some of the recent developments in the database management area, in particular the NoSQL phenomenon and the hoopla associated with it. The goal of the paper is not to do an exhaustive survey of NoSQL systems. The aim is to do a broad brush analysis of what these developments mean - the good and the bad aspects! Based on my more than three decades of database systems work in the research and product arenas, I will outline what are many of the pitfalls to avoid since there is currently a mad rush to develop and adopt a plethora of NoSQL systems in a segment of the IT population, including the research community. In rushing to develop these systems to overcome some of the shortcomings of the relational systems, many good principles of the latter, which go beyond the relational model and the SQL language, have been left by the wayside. Now many of the features that were initially discarded as unnecessary in the NoSQL systems are being brought in, but unfortunately in ad hoc ways. Hopefully, the lessons learnt over three decades with relational and other systems would not go to waste and we wouldn't let history repeat itself with respect to simple minded approaches leading to enormous pain later on for developers as well as users of the NoSQL systems! Caveat: What I express in this paper are my personal opinions and they do not necessarily reflect the opinions of my employer.
Article
Many organizations collect vast amounts of customer, scientific, sales, and other data for future analysis. Traditionally, most of these organizations have stored structured data in relational databases for subsequent access and analysis. However, a growing number of developers and users have begun turning to various types of nonrelational databases, now frequently called NoSQL databases. Nonrelational databases, including hierarchical, graph, and object-oriented databases, have been around since the late 1960s. However, new types of NoSQL databases are being developed, and only now are they beginning to gain market traction. Different NoSQL databases take different approaches. What they have in common is that they're not relational. Their primary advantage is that, unlike relational databases, they handle unstructured data such as word-processing files, e-mail, multimedia, and social media efficiently. This paper discusses issues such as limitations, advantages, concerns and doubts regarding NoSQL databases.
Cloudera. (2014). Cloudera Big Data and Hadoop Sessions, May 21, 2014. Location: Hyatt Hotel, Denver, CO.
Devlin, B. (2014). Business un-intelligence: The marriage of BI and Big Data. ACM Webinar on June 17, 2014.
Gartner. (2014). Gartner's 2014 hype cycle for emerging technologies maps the journey to digital business. Retrieved from http://www.gartner.com/technology/home.js
Hoberman, S. (2014). Data modeling for MongoDB: Building well-designed and supportable MongoDB databases (1st ed.) Basking Ridge, NJ: Technics Publications. ISBN: 978-1-935504-70-2.