Copyright by Technical University - Sofia, Plovdiv branch, Bulgaria
OVERCOMING THE SECURITY
ISSUES OF NOSQL DATABASES
TONY KARAVASILEV, ELENA SOMOVA
Abstract: With the current escalating popularity and use of NoSQL databases, the amount
of sensitive data stored in these types of systems is increasing significantly, which exposes a
lot of security vulnerabilities, threats and risks. This paper presents effective ways to
mitigate or even completely overcome them. The purpose of the developed practical tests
using MongoDB is to evaluate how applying those security measures can affect the overall
system performance. The results of this experimental research are presented in this article.
Key words: security, encryption, databases, NoSQL, MongoDB, RESTful API, firewall
1. Introduction
The spread of modern cloud computing solutions
and ever larger data volumes has driven the
adoption of a new class of non-relational databases,
known as NoSQL or “Not Only SQL”. Databases of
this kind existed even before relational and
object-oriented database management systems, but
they were revived and developed in recent years by
information technology companies as solutions to
their own problems with growing distributed web
applications with millions of users. The main
advantage of these databases is that they process
and store unstructured data much better than
standard relational SQL solutions. [1]
The simplicity of their design, their schema-less
models and their primitive query languages allow
them to perform, scale and distribute much better in
certain situations. These databases store some types
of data more efficiently than relational systems and
support advanced clustering solutions, including
load balancing and transparent backup features.
Whether to use a NoSQL database depends
entirely on the problem it must solve and the type
of data it is going to store. Depending on the type of
unstructured data, there are four main subclasses of
non-relational databases to choose from:
• Document-oriented: document string formats;
• Column-oriented: column or tabular nesting;
• Graph-based: graph structures with nodes;
• Key-value store: unique associative arrays.
Some solutions provide more than one model
and are called multi-model systems; these may or
may not include relational functionality. Some
traditional SQL solutions have adopted many of
the NoSQL capabilities (clustering, sharding,
XML/JSON document objects and linear structures
like the array type) and are forming a new database
class called NewSQL, unfortunately inheriting many
of the non-relational security and data integrity
problems. Clearly, there are no set standards and
these databases allow more flexibility. This is the
main reason why they are typically used for caching
purposes, search engine implementations, file
storage and activity logging.
Unlike relational database management
systems, these software products do not have
complex mechanisms to guarantee data consistency
and have almost no support for security features at
the database level. This makes them vulnerable to
both security threats and irreversible data loss, a
serious problem that has been building up over the
years. [2]
The main purpose of this article is to point
out the security risks introduced by the use of
modern non-relational databases and effective ways
to mitigate or completely overcome them. Multiple
issues are identified, together with a detailed
explanation of how they can be eliminated through
end-to-end encryption and a security-oriented
configuration of all services. The paper includes a
practical implementation of the described security
precautions and extended tests showing how they
affect the overall system performance and the data
storage volume size.
This research is also a part of ongoing work
in developing an advanced PHP object-oriented
software framework for cryptographic services.
2. NoSQL security issues and their remedies
When NoSQL is used as a solution for
handling sensitive and top secret data in the real
world, several problems may occur. Even with paid
non-relational database systems, paid vendor
support and well-paid professional database
administrators, you can introduce big security holes
in your system and compromise the overall data
privacy, putting your company at imminent risk.
The next sections describe the most frequent
issues that come with these types of systems and
effective ways of overcoming them completely.
2.1. Lack of authorization features
In general, most professional NoSQL
solutions have either only basic access control
mechanisms or none at all. This poses a huge
problem for the overall application security and
leaves open the possibility of hostile access without
any credentials or restrictions. [1]
A common mistake is to use the default
product installation credentials and configuration. It
is a must to enable and use strong authentication
credentials. When the chosen NoSQL system
severely lacks authorization functionality, access
control restrictions, user roles and auditing features,
the only way to mitigate the problem is to build a
RESTful API (Application Programming Interface)
around the database solution. Many in-memory
key-value solutions and search engines suffer from
this problem by design. Building an application
layer of access restriction on top of an unprotected
system is a common professional security approach.
Implementing access tokens as credentials for the
Web API clients further strengthens security.
Overcoming this security issue is obligatory
and must not be underestimated. It is important to
note that message queue broker systems also suffer
from the same authentication security problems,
although they are not technically database solutions.
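The access layer described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role names, the permission table and the `AccessLayer` class are hypothetical, tokens are kept in memory, and a real RESTful API would sit behind HTTPS and persist its credentials.

```python
import hmac
import secrets

# Hypothetical role table: which database operations each API role may perform.
ROLE_PERMISSIONS = {
    "reader": {"find"},
    "writer": {"find", "insert"},
}


class AccessLayer:
    """Checks an access token and its role before an operation reaches the database."""

    def __init__(self):
        self._tokens = {}  # token -> role

    def issue_token(self, role):
        if role not in ROLE_PERMISSIONS:
            raise ValueError("unknown role: " + role)
        token = secrets.token_urlsafe(32)  # 32 random bytes from the OS CSPRNG
        self._tokens[token] = role
        return token

    def is_allowed(self, token, operation):
        if not isinstance(token, str):
            return False
        role = None
        # Compare with every known token using a timing-safe comparison.
        for known, known_role in self._tokens.items():
            if hmac.compare_digest(known.encode(), token.encode()):
                role = known_role
        if role is None:
            return False  # no valid credentials: request is rejected
        return operation in ROLE_PERMISSIONS[role]
```

A client holding a `reader` token can call `find` but not `insert`, and a request without a known token is rejected before the unprotected database is ever contacted.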
2.2. Transport encryption and client drivers
Sadly, many modern NoSQL products do
not provide network transport layer encryption over
TLS/SSL. This security feature is missing on both
the server and the client side. Client driver support
for different programming languages is often
inadequate and may introduce the risk of data
corruption, theft or even loss. Faulty drivers may
also seriously harm system stability and
performance. [2]
If your chosen solution supports transport
encryption, always use TLS/SSL for client-server
communications and try to avoid self-signed
certificates. When your NoSQL product has no such
capabilities, your best option is to block
non-encrypted connections with firewalls and
develop a RESTful API on top of your database
that accepts only HTTPS (HTTP over TLS/SSL)
encrypted traffic, using the best supported client
programming language and up-to-date drivers.
Mitigating this security threat extends the
solution proposed in the previous section on
authorization by enforcing encrypted communication
protocols and well-supported client connection
drivers.
2.3. Missing database encryption features
Most relational database systems have
built-in encrypted storage engines, encryption
functions and data integrity features. In contrast,
most modern NoSQL solutions lack such at-rest
encryption functionality and store data as plain
text, which poses serious security risks.
Only a few paid cloud or enterprise NoSQL
products support native storage encryption engines
and embedded recovery functionality. This problem
can be mitigated by developing a transparent
encryption application layer. Such a software layer
encrypts data before sending it to the database and
decrypts it before returning it to your software. The
transparent middleware can be implemented either
directly in your application’s database client library
or on another server acting as a communication
proxy between the database and the application. The
main limitation of this approach is that you cannot
search directly inside the encrypted fields via the
database query language. [3]
A good approach is to include this feature
when building a RESTful API. It is also advisable
to combine it with native storage encryption
functions when they are available.
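The following Python sketch shows the shape of such a transparent layer: chosen fields are encrypted on write and decrypted on read, and the rest of the document passes through untouched. It rests on loud assumptions. The `EncryptingCollection` wrapper, the field names and the list used as a stand-in collection are hypothetical, and because the Python standard library has no AES, the keystream below is an HMAC-SHA256 counter construction used purely as a stand-in. It is not the AES-256 CTR cipher used in the experiments of this paper and should not be used for real data.

```python
import base64
import hashlib
import hmac
import secrets

NONCE_SIZE = 16  # random bytes stored in front of every ciphertext


def _keystream(key, nonce, length):
    """Stand-in keystream (HMAC-SHA256 over nonce and counter), NOT AES."""
    blocks = []
    counter = 0
    while len(blocks) * 32 < length:
        message = nonce + counter.to_bytes(8, "big")
        blocks.append(hmac.new(key, message, hashlib.sha256).digest())
        counter += 1
    return b"".join(blocks)[:length]


def encrypt_field(key, plaintext):
    nonce = secrets.token_bytes(NONCE_SIZE)
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    cipher = bytes(a ^ b for a, b in zip(data, stream))
    # Base64 gives a storage-friendly string for the document field.
    return base64.b64encode(nonce + cipher).decode("ascii")


def decrypt_field(key, stored):
    raw = base64.b64decode(stored)
    nonce, cipher = raw[:NONCE_SIZE], raw[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(cipher))
    return bytes(a ^ b for a, b in zip(cipher, stream)).decode("utf-8")


class EncryptingCollection:
    """Encrypts the listed fields before storing and decrypts them after fetching."""

    def __init__(self, store, key, sensitive_fields):
        self._store = store  # list used as a stand-in for a database collection
        self._key = key
        self._fields = set(sensitive_fields)

    def insert(self, document):
        stored = dict(document)
        for name in stored.keys() & self._fields:
            stored[name] = encrypt_field(self._key, stored[name])
        self._store.append(stored)

    def find_all(self):
        for stored in self._store:
            document = dict(stored)
            for name in document.keys() & self._fields:
                document[name] = decrypt_field(self._key, document[name])
            yield document
```

Because every value is encrypted under a fresh random nonce, two documents with the same plaintext are stored as different strings. That is exactly why an equality query on an encrypted field cannot be answered by the database query language.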
2.4. NoSQL Injections and CSRF attacks
With the emergence of new query formats
and languages, most of the old SQL injection
techniques no longer apply, but this does not make
NoSQL immune to query injection. Every
non-relational product can be attacked or exploited,
depending on its query language, the message
formats used and its available native application
programming interfaces. [4]
The main cause of the rise of NoSQL
injection attacks is that many non-relational
database systems either provide an embedded
RESTful API or load one via third-party extensions
that use JSON or XML formats. This can enable an
attacker to execute valid malicious code, such as
native JavaScript code, via the request’s payload
data. As mentioned before, building your own
custom RESTful API on top of the database is a
common practice, but if done incorrectly it may
easily be exploited via NoSQL injection. It also
becomes possible to carry out cross-site request
forgery (CSRF) attacks and user session hijacking.
The best way of avoiding these kinds of
attacks is input sanitization, especially when
creating your own RESTful API. Cleansing the
received data includes validation, filtering,
whitelisting, blacklisting, regular expressions and
escaping of special characters. Another good
practice is adding a per-client pseudo-randomly
generated token with a time-to-live (TTL) expiration
or regeneration period to prevent forged user
requests. Every request must contain a valid token
or it will not be handled, and the user may be
suspended after a given number of retries. Disabling
or firewall-blocking unused native APIs and
extensions is always recommended.
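As a minimal sketch of the input sanitization described above, the following Python function accepts a JSON-style filter only if every field is whitelisted and every value is a plain string. The field names, the username pattern and the `sanitize_filter` helper are hypothetical and chosen for illustration; a real API would apply rules that match its own schema.

```python
import re

ALLOWED_FIELDS = {"username", "email"}  # hypothetical whitelist of queryable fields
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,32}")


def sanitize_filter(payload):
    """Return a safe equality filter or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("filter must be a JSON object")
    clean = {}
    for field, value in payload.items():
        # Whitelisting also rejects operator keys such as "$where".
        if field not in ALLOWED_FIELDS:
            raise ValueError("field not allowed: %r" % (field,))
        # A nested object such as {"$ne": ""} would be read as a query operator.
        if not isinstance(value, str):
            raise ValueError("value for %r must be a string" % (field,))
        if field == "username" and not USERNAME_PATTERN.fullmatch(value):
            raise ValueError("malformed username")
        clean[field] = value
    return clean
```

With this check in place, a payload like `{"username": {"$ne": ""}}`, which would otherwise match every user, is rejected before any query is built.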
Mitigating this security risk extends the
previously proposed solution with extra input
validation and further request integrity verification
enhancements.
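The per-client token with a time-to-live and suspension after repeated failures, as described in this section, might be sketched as follows. The class name, the default limits and the in-memory dictionaries are assumptions made for the example, not part of any particular framework.

```python
import hmac
import secrets
import time


class RequestTokens:
    """Per-client anti-forgery tokens with expiry and a failure limit."""

    def __init__(self, ttl_seconds=300, max_failures=5, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_failures = max_failures
        self._clock = clock  # injectable so that expiry can be tested
        self._tokens = {}    # client id -> (token, expiry time)
        self._failures = {}  # client id -> consecutive failed checks

    def issue(self, client_id):
        token = secrets.token_urlsafe(32)
        self._tokens[client_id] = (token, self._clock() + self._ttl)
        return token

    def is_suspended(self, client_id):
        return self._failures.get(client_id, 0) >= self._max_failures

    def verify(self, client_id, presented):
        if self.is_suspended(client_id):
            return False
        token, expires = self._tokens.get(client_id, ("", 0.0))
        valid = (
            token != ""
            and self._clock() < expires
            and hmac.compare_digest(token.encode(), presented.encode())
        )
        if valid:
            self._failures[client_id] = 0
        else:
            self._failures[client_id] = self._failures.get(client_id, 0) + 1
        return valid
```

A request handler would call `verify` before touching the database and refuse to handle the request when it returns `False`; regenerating the token with `issue` after each TTL period limits how long a leaked token stays useful.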
2.5. Cluster desynchronization issues
Most NoSQL products advertise their
decentralization features like sharding and
clustering as a data-friendly way to scale out with
no single point of failure. It is important to note
that almost all SQL solutions support these features
too, with more reliable and time-tested
implementations than non-relational systems. [2]
With sharding, each data partition or shard
is held on a separate database server instance, so
the data is distributed over more than one machine.
Some products can also store duplicates of the
shards on other servers for backup purposes. The
main problem is that even if a single server crashes,
you may lose a certain number of shards without any
real backup and compromise the overall stored
information. Knowing this, a malicious attacker may
exploit a misconfigured cluster or known vendor
vulnerabilities to destroy, modify or hold your data
for ransom.
As previously noted, another traditional
solution for scaling out, load balancing or creating
active backups is clustering. Supported types may
include master-slave replication, master-master
replication, shared-nothing clustering,
auto-sharding, hybrid storage and other high
availability distributed approaches. Most of these
clustering techniques provide a certain guarantee of
data integrity, backup and availability on hardware
or software failures. However, there are many hidden
problems, such as cluster desynchronization and
failover deadlock situations, in which your
information gets corrupted or even irreversibly lost.
This may lead to security leaks where sensitive data
is returned to users, or to data becoming permanently
damaged without any real backup.
As with any other product, there is a huge
gap between what is being advertised or sold and
what you get in reality. A large portion of the
NoSQL community follows certain solutions for
philosophical reasons rather than practically proven
production use cases. In theory, both SQL and
NoSQL clustering solutions can fully eliminate all
failure cases but in reality, when misused,
misconfigured or not implemented correctly they
can create even bigger security and integrity issues.
The countermeasures you can take involve
hourly backups to at least two separate physical
devices and simulating all known crisis situations
before using the clustering configuration in a
production environment. Also, keep your database
solution up to date, always encrypt your backups
and never store them on the production machine
itself. Only then can you fully reap their true
performance and backup gains.
2.6. Virtualization leaks and disk theft risks
Even when you have encrypted your
database and have taken security measures you are
still not immune to physical disk theft or
virtualization snapshot leaks. [5]
Theft of the physical disk or of virtual
machine backup clone files allows an attacker to
obtain database credentials or sensitive data
through analysis of log files, raw cache files,
unencrypted database diagnostic tables or persistent
in-memory data structures. Other risks include
access to guest virtual machine clones or
virtualization snapshots, which contain memory
dumps of a past machine state and are full of
unencrypted database structures, active in-memory
key-value collections or even loaded application
credentials.
To mitigate these security issues you must
apply transparent disk encryption on all physical
disks, virtualization host environments and virtual
machine guests with a strong encryption key. This
operating system feature can be used without
causing any decrease in the overall system
performance. Note that if you lose or forget your
secret password, you will not be able to start up
your operating system or restore any usable data
from your drive.
3. Analyzing and improving the MongoDB
database security
With the increasing use of MongoDB in
both startups and enterprise solutions worldwide,
and its position as one of the most feature-rich
NoSQL databases, the need for security protection
has become pressing. The next sections provide a
detailed practical analysis of security hotspots and
show how to tighten the overall database protection
using the previously discussed approaches.
3.1. Overview of MongoDB features
MongoDB is the most popular document-
oriented store that uses JSON (JavaScript Object
Notation) format documents providing flexible and
easily changeable schemas. It also provides server-
side scripting with JavaScript and binary-encoded
serialization of JSON-like documents. This open-
source NoSQL software is provided for free and
also has an enterprise paid version with extended
features and live support. [6]
MongoDB provides a wide variety of high
availability clustering features like load balancing,
replication and sharding. In some cases it can also
be used efficiently as a file storage server or a
powerful caching system. The query language
supports aggregation, range queries, regular
expressions and different field indexing types.
The main problem is that the default
security configuration of MongoDB has been
exploited many times in production setups over the
years, and data has even been held for ransom. The
next sections discuss how to avoid a security breach
by taking certain precautions.
3.2. Enforcing authorization, auditing and
input data sanitization
After a default installation, access control
is not enabled, so any client that can reach the
instance can connect without credentials and take
full control of the database. Many system architects
neglect MongoDB’s initial configuration and are
commonly hacked as a result.
To avoid this, you must enable all native
authentication features. First, add a user
administrator for the MongoDB instance and define
separate limited-access roles for every other client
account that can connect to the database. Next,
enable the native system auditing facility to keep
track of all configuration changes and the access
history. For further hardening, run the database
processes under a dedicated operating system user
account with limited permissions. In some cases,
disabling server-side scripting at the database level
removes the possibility of some types of NoSQL
injection. Also, if you use a cluster setup, never
forget to define proper authentication between
cluster members and always use long, complex
credentials.
Finally, to restrict access even further, it
is a good idea to develop an internal RESTful API
that connects to the database using only a limited
user account. When developing such adapter
software, you can also fully sanitize your input
data, use advanced session forgery protection
techniques and build more sophisticated
authentication features. Remember to allow direct
connections only from the internal API and to block
all other native MongoDB client communications
via a network or system firewall.
This way you guard against malicious code
execution and unauthorized access to the database
without giving up native MongoDB high availability
cluster configurations. You can also reliably scale
out and increase performance by deploying multiple
instances of your API and using a network traffic
load balancer to distribute the incoming requests
between them.
It is important to note that, unlike
MongoDB, many NoSQL solutions do not provide
even basic authentication features. In that case, you
have no real choice but to develop your own
internal database adapter software.
3.3. Using encrypted communications and
limiting network exposure
When installing the product, many people
leave the default access port and connection
protocol publicly exposed. You must always change
the default port and switch to encrypted TLS/SSL
communications for both your cluster servers and
your client machines. This way your data is
protected in transit and cannot be altered by
man-in-the-middle (MITM) attacks.
You must never leave a MongoDB server
instance visible over the Internet or accessible in
non-management computer networks. You can even
disable the MongoDB networking service and
switch to using UNIX sockets instead, especially if
you are going to use only one server instance and
hide it behind an isolated RESTful API.
However, enabling the network is a must
when using clustering configurations and the most
professional way of protecting any type of server
instances is to combine the use of network firewalls
with Virtual Local Area Networks (VLAN). This
way you can partition and isolate different networks
with limited access to other devices or computers.
Always limit the network exposure on every
service you use and switch to the use of encrypted
protocol connections only. Remember, a service
that is not visible or accessible cannot be easily
exploited, flooded or hacked.
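As an illustration, the settings discussed in sections 3.2 and 3.3 might be collected in a `mongod.conf` file along the following lines for the MongoDB 3.6 series used in this paper. The option names are standard configuration file options of that series; the port number, bind address and certificate paths are placeholders assumed for this sketch and must be adapted to the actual deployment.

```yaml
# Sketch of a hardened mongod.conf (MongoDB 3.6 option names).
security:
  authorization: enabled      # require credentials and enforce roles
  javascriptEnabled: false    # disable server-side JavaScript execution
net:
  port: 27117                 # placeholder: any non-default port
  bindIp: 10.0.0.5            # placeholder: internal management interface only
  ssl:
    mode: requireSSL          # refuse non-encrypted connections
    PEMKeyFile: /etc/ssl/mongodb.pem   # placeholder path: server certificate and key
    CAFile: /etc/ssl/ca.pem            # placeholder path: certificate authority chain
```

A firewall rule that admits only the internal API host on the chosen port complements this configuration, since the file itself does not restrict which peers may attempt to connect.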
3.4. Applying data storage encryption
Protecting data at rest is truly important,
but most databases do not provide native encryption
functions or secure storage engines. For example,
MongoDB has provided native encryption for a few
years, but only in its paid enterprise version. It is
recommended to use it when available and also to
develop a transparent encryption middleware. [6]
When creating such an encryption
application layer, you must always encrypt sensitive
fields before inserting them into the database and
decrypt them after fetching them back. This can be
included in your RESTful API logic. The main
limitation of such middleware is that searching
inside encrypted document values is not possible
without first decrypting them all.
Another good habit is to always enable
operating system transparent disk encryption to
keep data and system logs safe even if a physical
disk is stolen. This feature does not harm
performance and significantly increases the overall
server security. You can also encrypt service log
files over time, implement encrypted application
file logging adapter classes and minimize the
amount of service-related detail that is logged.
4. Testing environment specification
For the results to be adequate we have
chosen to run the tests in a virtual machine
environment created with Oracle VM VirtualBox
version 5.2.8 hypervisor. The setup consists of two
virtual machines. The first one is running Apache
2.4.18 with PHP 7.2.3-FPM (FastCGI Process
Manager implementation) for connecting to the
database and executing the experiments. The second
one has a single MongoDB 3.6.3 server instance for
the data storage purposes.
The specification of the allocated resources
for each of the machines is shown in Table 1.
Table 1. Virtual machine specification
Component  Detail
CPU        Intel i7-6700HQ, 2 cores, 2.59GHz, 6 MB L3
RAM        DDR4, SODIMM, 4096 MB, 2.40 MHz
GPU        Intel HD Graphics 530, 16 MB, 2.40 MHz
HDD        42GB, 7200 RPM, 32 MB cache, 2GB swap
LAN        VirtualBox Intel PRO/1000 MT 82540EM
OS         Ubuntu Server 16.04.3 LTS x64, Kernel 4.4.0
The virtual machines have been installed
with all available updates, kernel drivers and
virtualization-specific packages. They are connected
over the host-only networking mode embedded in
VirtualBox to avoid any network slowness and
ensure the accuracy of the executed tests. On the
first virtual machine, all settings are left at their
defaults, except for raised limits on the maximum
random-access memory (RAM) usage for PHP. The
second machine is configured twice for every
experiment. All tests are first executed with the
default insecure configuration. Next, the tests are
repeated with MongoDB authorization enabled, a
self-signed TLS/SSL certificate for communication
and an encryption middleware applied via PHP.
The created encryption middleware uses the
AES-256 CTR algorithm via the OpenSSL PHP [7]
native extension functions and the Base64 encoding
core functions for converting to a storage-friendly
format. The main purpose of this setup is to
simulate both plain and encrypted pseudo-API to
MongoDB communications and to compare results.
5. Costs of implementing security measures
Although applying end-to-end encryption is
a must, are there any consequences of using it in
your production environment? To answer this
question, we have created several practical
experiments using MongoDB to evaluate the
performance and storage costs. The next sections
describe the executed tests and show their results.
5.1. Experiments suite overview
The experiments include the insertion of
10000000 (ten million) records containing
cryptographically generated pseudo-random strings
with a fixed length of 1000 printable ASCII
characters, and the fetching of 20 records from the
middle of the collection via an additional 10-digit
integer field. The integer field acts as a unique
creation identifier for easier lookup and is created
with an ascending index. Since we retrieve records
from the middle of the collection, it does not matter
whether we use an ascending or descending index
lookup. We also keep the default MongoDB _id
field but use projection when querying the database
to exclude it from the results. All encrypted data is
converted to Base64 strings for storage in MongoDB
documents.
The reported execution time covers only
the section of the program that performs iteration,
encryption, decryption, data insertion, collection
lookup and record retrieval. The time needed to
generate the cryptographic pseudo-random strings
is explicitly excluded. Every experiment is executed
10 times and the average of those runs is taken as
the final result. Execution times are shown in
seconds with 6-digit precision after the decimal
point and storage sizes are given in bytes.
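The measurement procedure described above can be sketched in Python as follows. This is not the PHP test program used for the experiments: the function names, the field names `created` and `data`, the small default record count and the `store_factory` callback (which stands in for whatever performs the actual database insert) are all assumptions made for the example.

```python
import secrets
import statistics
import string
import time

# The 95 printable ASCII characters (string.printable would add control whitespace).
PRINTABLE = string.ascii_letters + string.digits + string.punctuation + " "


def random_record(identifier, length=1000):
    """Build one record with a creation identifier and a random printable string."""
    text = "".join(secrets.choice(PRINTABLE) for _ in range(length))
    return {"created": identifier, "data": text}


def timed_run(records, store):
    """Time only the storing loop, not the generation of the records."""
    start = time.perf_counter()
    for record in records:
        store(record)
    return time.perf_counter() - start


def benchmark(store_factory, count=1000, runs=10):
    """Average the duration of several independent runs."""
    durations = []
    for _ in range(runs):
        records = [random_record(i) for i in range(count)]  # generation is not timed
        durations.append(timed_run(records, store_factory()))
    return statistics.mean(durations)
```

For example, `format(benchmark(lambda: [].append, count=100, runs=10), ".6f")` times appends to an in-memory list and prints the mean with the same 6-digit precision used in the result tables.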
5.2. Record insertion results
This experiment will test the situations
when you need to store big string data and compare
the average insertion time from PHP and the record
storage size on disk. The results for both plain
record creation and using the transparent encryption
middleware are shown in Table 2.
Table 2. Ten million records insertion
Metric                   Plain         With Encryption
Total Time (s)           1493.585353   1620.437595
Average Time (s)         0.000149      0.000162
Average Object (bytes)   1052          1388
Index Storage (bytes)    212541440     212492588
Collection Size (bytes)  10629316608   14346072064
Database Size (bytes)    11156746240   14977986560
As the results show, applying encryption
increases the overall database storage size
significantly, by 34.25%. The total record creation
time is also 8.49% slower but still fast.
Bearing in mind that we would probably
encrypt only sensitive data fields like passwords
and credit card numbers, the cost of applying
encryption is relatively tolerable.
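The percentages quoted above follow directly from Table 2, and the growth of the average object size is consistent with Base64 expansion alone. The short Python check below recomputes them. It assumes that the ciphertext has the same length as the 1000-character plaintext (which holds for CTR mode) and that the remaining 52 bytes of the plain object are per-document overhead; both are interpretations of the table, not figures stated in the experiments.

```python
import base64

# Values copied from Table 2.
plain_total, encrypted_total = 1493.585353, 1620.437595   # seconds
plain_db, encrypted_db = 11156746240, 14977986560         # bytes
plain_object, encrypted_object = 1052, 1388               # bytes
records = 10_000_000


def percent_increase(before, after):
    return (after - before) / before * 100.0


time_overhead = percent_increase(plain_total, encrypted_total)
size_overhead = percent_increase(plain_db, encrypted_db)
average_plain = plain_total / records
average_encrypted = encrypted_total / records

# Base64 turns every 3 bytes into 4 characters, so 1000 bytes become 1336.
encoded_length = len(base64.b64encode(bytes(1000)))
assumed_overhead = plain_object - 1000  # assumption: non-payload bytes per document

print(round(time_overhead, 2), round(size_overhead, 2))
print(encoded_length + assumed_overhead == encrypted_object)
```

Under these assumptions the storage increase is explained almost entirely by the Base64 encoding rather than by the encryption itself, so storing the ciphertext in a binary field type would be the first thing to try if storage size matters.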
5.3. Record retrieval results
The second test applies to scenarios where
we need to execute a complex query lookup in huge
data collections like the ones created in the previous
experiment. To simulate this, we query the database
to fetch the first twenty records after the
five-millionth record, using our creation identifier.
After that, we drop the extra index created for our
identifier and run the query again to see a clearer
contrast between plain fetching and fetching with
application decryption. The results of the record
retrieval experiments are shown in Table 3.
Table 3. Twenty records retrieval
Lookup (time in s)   Plain       With Decrypting
With index           0.000644    0.000855
Without index        11.763420   16.326245
The time increase caused by the use of
application decryption with index fetching is
32.76% and with non-index retrieval is 38.79%. It is
important to note that the experiment also showed
the huge performance boost of using field indexing.
The results show that the application
decryption will not slow us down significantly
when the correct schema approach is being applied.
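The retrieval figures can be checked in the same way. The Python snippet below recomputes the decryption overhead and the speed-up from indexing using the values of Table 3; the variable names are chosen for this example only.

```python
# Values copied from Table 3 (seconds).
indexed = {"plain": 0.000644, "decrypting": 0.000855}
full_scan = {"plain": 11.763420, "decrypting": 16.326245}


def percent_increase(before, after):
    return (after - before) / before * 100.0


index_overhead = percent_increase(indexed["plain"], indexed["decrypting"])
scan_overhead = percent_increase(full_scan["plain"], full_scan["decrypting"])

# How many times faster the indexed lookup is than the full collection scan.
speedup_plain = full_scan["plain"] / indexed["plain"]
speedup_decrypting = full_scan["decrypting"] / indexed["decrypting"]

print(round(index_overhead, 2), round(scan_overhead, 2))
print(int(speedup_plain), int(speedup_decrypting))
```

Both speed-up ratios exceed 18000, which shows that the choice of index matters far more for lookup time than the added cost of decrypting twenty records.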
6. Conclusion
This paper has presented a practical analysis
of NoSQL database solutions and evaluated the
performance and storage costs of applying
end-to-end encryption. It summarizes the best
approaches to overcoming common NoSQL security
problems.
The most interesting results from the
experiments are:
• Building an API around the NoSQL solution in
a security-driven manner and setting up an
isolated network is the best safeguard approach;
• Using a transparent encryption middleware can
increase the disk storage size significantly but
does not hurt the overall system performance;
• Searching directly inside encrypted fields via
the database query language is only available
when using native database encryption engines;
• Querying the database by index scan is more
than 18000 times faster than using a full scan.
REFERENCES
1. Sadalage, P. J., and Fowler, M. (2012). NoSQL
Distilled: A Brief Guide to the Emerging
World of Polyglot Persistence.
ISBN-13: 978-0321826626.
2. Okman, L., Gal-Oz, N., Gonen, Y., Gudes, Eh.,
and Abramov, J. (2011). Security Issues in
NoSQL Databases.
DOI: 10.1109/TrustCom.2011.70.
3. Tian, X., Huang, B., and Wu, M. (2014). A
transparent middleware for encrypting data in
MongoDB.
DOI: 10.1109/IWECA.2014.6845768.
4. Ron, Av., Shulman-Peleg, Al., and Bronshtein,
Em. (2015). No SQL, No Injection? Examining
NoSQL Security. arXiv:1506.04082 [cs.CR].
5. Grubbs, P., Ristenpart, Th., and Shmatikov, V.
(2017). Why Your Encrypted Database Is Not
Secure. DOI: 10.1145/3102980.3103007.
6. https://docs.mongodb.com/manual/index.html
7. http://php.net/manual/en/ref.openssl.php
Contacts:
UNIVERSITY OF PLOVDIV PAISII
HILENDARSKI
24 TZAR ASEN
PLOVDIV
E-mail: tony.karavasilev@gmail.com
E-mail: eledel@uni-plovdiv.bg