A Review of MongoDB and Singularity Container Security in
regards to HIPAA Regulations
Akalanka Mailewa Dissanayaka
Department of Computer Science
Texas Tech University
TX, USA
akalanka.mailewa@ttu.edu
Roshan Ramprasad Shetty
Department of Computer Science
Texas Tech University
TX, USA
roshan-ramprasad.shetty@ttu.edu
Samip Kothari
Department of Computer Science
Texas Tech University
TX, USA
samip.kothari@ttu.edu
Susan Mengel
Department of Computer Science
Texas Tech University
TX, USA
susan.mengel@ttu.edu
Lisa Gittner
Department of Political Science
Texas Tech University
TX, USA
lisa.gittner@ttu.edu
Ravi Vadapalli
High Performance Computing Center
Texas Tech University
TX, USA
ravi.vadapalli@ttu.edu
ABSTRACT
Nowadays Linux containers (LXCs), which have operating system level virtualization, are very popular compared with virtual machines (VMs), which have hypervisor or kernel level virtualization, in high performance computing (HPC) for reasons such as high portability, high performance, efficiency, and high security [1].
Hence, LXCs can make an efficient and secure big data analytic
framework with the help of secure, efficient, easily scalable, and
highly available databases. Security on high performance computing clusters is a major concern for the transdisciplinary Texas Tech University (TTU) EXPOSOME Project. This project mainly focuses on sensitive healthcare data processed on the Quanah Linux cluster in the High Performance Computing Center of Texas Tech University. Data privacy in this project spans four areas: the database, the network infrastructure, web applications,
and physical security, in line with the Health Insurance Portability
and Accountability Act (HIPAA). The study in this paper
investigates how to assure the TTU EXPOSOME Project data
security by proposing a secure data analytic framework with the
Singularity Linux container and the MongoDB NoSQL database,
commonly available at TTU. First, the paper investigates the advantages of LXCs over VMs from security and performance perspectives. Then, it focuses on four main HIPAA-required areas of data security: authentication, authorization, encryption,
and auditing, in order to make sure system security is assured to handle healthcare data. Finally, it shows how the TTU EXPOSOME Project strengthens security in the aforementioned four areas using MongoDB and Singularity, such that system security is approaching compliance with HIPAA.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
UCC'17 Companion, December 5-8, 2017, Austin, TX, USA
© 2017 Association for Computing Machinery.
ACM ISBN 978-1-4503-5195-9/17/12…$15.00
https://doi.org/10.1145/3147234.3148133
CCS CONCEPTS
Security and privacy → Systems virtualization and security
KEYWORDS
MongoDB; NoSQL; Singularity; Security; HPC; LXC; HIPAA;
Linux; Big Data
1 INTRODUCTION
Virtualization plays a major role in managing and administrating various distributed resources in data centers through tools such as VMware, VirtualBox, and Hyper-V, in order to run many virtual machines (VMs) with different guest operating systems [2]. It is important to note that each VM maintains its own independent environment, ideally without compromising the security of the host system [3], which may need to protect sensitive data, such as public health records.
Managing, securing, and administrating various public health
records among distributed resources is a difficult task that may
benefit from the use of containers over virtual machines alone [4].
In this paper, two research questions are addressed for the Texas Tech University (TTU) Exposome Project, which deals with cataloging and making available disparate data files that may contain tweets or innumerable records with thousands of fields. The first research question asks, “what do Linux Singularity containers (LXCs) offer over virtual machines in securing protected medical data?” and the second asks, “what advantages do LXCs offer with MongoDB usage for HIPAA compliance?” These questions are answered in
the context of the TTU Exposome Project which is cataloging big
data files to make them available through high performance
computing clusters.
LXCs are taking over the domain of virtualization [1]. Unlike VMs, LXCs do not require an isolated operating system and instead share the host machine operating system’s kernel to facilitate all services. Because of this sharing, the security of a container becomes important, as it may only be as good as that of the host machine [5]. Hence, this section gives an overview of the security of VMs and LXCs along with their characteristics.
Figure 1 and Figure 2 show the implementation architecture of VMs versus LXCs on a given hardware stack. VMs come in two types: Type-1 (bare metal), where virtual machine monitors (VMMs, or hypervisors) run directly on the host’s hardware, and Type-2 (hosted), where VMMs are software applications running on top of the host operating system [6]. For the VM systems in Figure 1, if the host O/S is replaced by a VMM or hypervisor, then Type-1 virtualization occurs; if the VMM is installed within the host O/S, then Type-2 virtualization occurs.
Figure 1: Type-1 VM vs LXCs on the same hardware stack.
Figure 2: Type-2 VM vs LXCs on the same hardware stack.
2 VMs vs LXCs
2.1 VM Attack Surface
Several attack points may occur in VMs with hypervisors or VMMs. One attack point arises when one VM is allowed to read the memory of another VM because of poor isolation of the interface to the VM, indicating a potential failure in the hypervisor, OS, or hardware [7]. The hypervisor itself represents a point of attack, as an attacker may be able to run arbitrary code in the hypervisor by breaking out of a VM, causing other VMs to become accessible [8].
If the host O/S is attacked, then Type-2 virtualization may be affected, since in this mode the hypervisor runs as a process of the host O/S. In this case, the availability of the entire O/S with its entry points, such as SSH and other services, leads an attacker to more possible attack points [9]. In Type-1 virtualization, the attack surface is small because the hypervisor communicates directly with the underlying hardware, which eliminates most of the aforementioned entry points [10].
2.2 LXC Attack Surface
LXCs enable their users to easily create and manage application containers by providing a userspace interface for Linux kernel containment features such as kernel namespaces, AppArmor, seccomp policies, chroots, kernel capabilities, and cgroups (control groups) [11]. Linux containers thus provide an O/S level virtualized environment that executes different types of containers on the shared kernel of the host operating system [12]. The goal of LXC is therefore to create an environment as close as possible to a standard Linux installation but, unlike VMs, without the need for a separate kernel and with less overhead, allowing system administrators to deploy containers quickly while using fewer server resources.
Even though other technologies are available to operate containers in Linux securely, the majority of the work depends on namespace isolation [13]. Linux namespaces partition kernel resources into six different isolated namespaces: five of them (mount, PID, UTS, network, and IPC) are called privileged namespaces and require root permission to create, while the sixth, the user namespace, is unprivileged [14]. By using namespace isolation, a process can be isolated from the server; hence, underlying system processes are not visible to the current process, which provides some security for the containers [15].
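As a minimal illustration of this privileged/unprivileged split, the following Python sketch (illustrative only, not taken from the project) requests a new UTS namespace through libc's unshare(2); the call succeeds only for root, after which hostname changes are confined to the calling process:

    import ctypes, os, socket

    CLONE_NEWUTS = 0x04000000  # from <sched.h>: new UTS (hostname) namespace
    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    # Creating a privileged namespace fails with EPERM unless run as root.
    if libc.unshare(CLONE_NEWUTS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    # The hostname change below is visible only inside the new namespace;
    # the host's own hostname is untouched.
    name = b"container-test"
    libc.sethostname(name, len(name))
    print(socket.gethostname())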
With the Singularity Linux container in particular, process isolation is usually relaxed because it does not help much with high performance computing workflows; hence, Singularity shares almost everything. All the processes executing on the host machine can be seen by a user inside the Singularity container, and the processes running inside Singularity are the same as processes running on the host machine. Singularity, however, provides configuration options to enable namespace isolation at any time in order to strengthen its security [16].
LXCs share some common attack points with VMs, but some additional attack points should be addressed. As in Type-2
virtualization, an attacker is able to access any container if he gains root access to the host O/S [17]. In order to minimize the attack surface, a “minimalist” O/S may be better, meaning an O/S with only the most essential features, such as memory, storage, and I/O handling mechanisms. The problem cannot be solved completely, since managing the containers requires some access to the host O/S [18].
Namespaces offer potential attack points, particularly if some processes are not in a namespace on Linux, which leads to sharing of some items between containers [19]. LXCs do not have their own user namespace, even though each receives its own network stack and process space as well as its own instance of the file system [20]. The other major issue is the large kernel interface for system calls, which is exposed in every container; vulnerabilities in system calls can grant access to the kernel of the host O/S, unlike the smaller interface between VMs and the hypervisor, which prevents access to the host kernel [21].
2.3 Performance
According to the literature mentioned above, when compared with VMs, the security features of LXCs are yet to be improved and some of them still need to be addressed. Because of the isolation provided in VMs, most common and well-known threats can be detected in VMs by using intrusion detection systems [22]. But for every hypervisor, a limit exists on the number of VMs that can run on it; so, in data centers, it is difficult to increase the number of VMs when running enterprise applications with increased functionalities [23]. In addition, these applications may not require even half of the resources allocated to the VM, which wastes resources and has led to the development of applications with LXCs, where only the resources required for a particular application are utilized [24]. Reproducibility and compilation are also much easier with LXCs as compared to VMs [25]. Therefore, the performance and resource utilization of LXCs are better than those of VMs, and hence this section answers research question RQ1. While many container technologies are available, such as Docker, Shifter, and UGC, the EXPOSOME Project uses the Singularity Linux container since it provides enhanced security features, such as encapsulation of the environment, image-based containers, no user contextual changes or root escalation, no root-owned daemon processes, and no sudo permissions; Docker container images can even be imported, even though Docker is a less secure LXC [26].
This paper starts with the project background and current activities. Next, it presents what type of data security is needed when working on public health records, and finally it shows currently available methods and techniques with Singularity to come into alignment with HIPAA requirements.
3 BACKGROUND
The TTU EXPOSOME Project is funded by both NIH and NSF to identify exposures of people occurring from the prenatal period through death and the impact of these exposures on health. Raw data for analysis is available in various data formats; hence, it is a very difficult task for data analysts to analyze the raw data to gain insight from it. A transdisciplinary team is currently implementing a system to convert different types of data formats for storage in a MongoDB database for further analysis of the TTU Exposome Project files.
Since more data is being added on a continuous basis including
de-identified, but still protected personal health records [27], the
data must be secured and privacy assured. In addition, the
authenticated researchers’ access to the data should not be
impaired while keeping the data safe. As a first solution, the
transdisciplinary team has designed a secure data analytic model
with the TTU EXPOSOME Project data files, Linux Server,
Linux Singularity Container, and MongoDB due to their
availability at the TTU High Performance Computing Center
(HPCC).
3.1 Data (EXPOSOME - Health records)
The EXPOSOME data contains health information from the 1940s up to 2010 with an incredible amount of detail. The data contains over 20,000 variables and is geo-tagged, such that all the information can be broken down to the county level. The EXPOSOME data itself is a great source of information for analyzing and finding causes or trends for diseases. So far, only public health records have been collected, but in the future, de-identified individual health records will be added. To prepare, we are investigating the best means possible to secure protected data and access to the Exposome data itself so that performance degradation is not experienced in accessing the data [28].
3.2 Linux Cluster (Quanah - The main
production cluster)
The Quanah Linux Cluster is the main production cluster in use
by the TTU Exposome Project located in the TTU HPCC. The
project data is stored in the MongoDB database using a
Singularity container. The Quanah Linux cluster can only be
accessed by TTU faculty and students via an SSH secure connection through a VPN tunnel [29].
3.3 Singularity (Linux Container - LXC)
Singularity enables users to have full autonomy over working environments to package scientific workflows, software, libraries, and data. Singularity containers, therefore, play a plug-and-play role, by which programs can run without further involvement of a system administrator. Thus, container images taken from a Singularity hub may easily and effectively be executed, or even modified and relocated in the hub, entirely independent of a system administrator's involvement [30]. These Singularity containers are similar to other LXCs in performance, yet they have strong security features [31]. The Singularity hub can create images, with given build specifications, to bootstrap these containers [32]. The team has made some containers by using Singularity to facilitate some services and applications in
the TTU Exposome Project. Figure 3 below illustrates the basic
Singularity engine architecture. The latest available version of
Singularity is installed on the Quanah Linux Cluster.
Figure 3: Basic Singularity engine architecture
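To illustrate this plug-and-play role, the sketch below shows how an analysis job might run a command inside a Singularity image from Python; the image name, bind path, and configuration file are hypothetical placeholders rather than the project's actual deployment:

    import subprocess

    # "singularity exec" runs a single command inside the container image;
    # --bind exposes only the chosen host directory to the container.
    subprocess.run(
        ["singularity", "exec",
         "--bind", "/data/exposome:/data",   # hypothetical project data path
         "exposome-mongo.simg",              # hypothetical container image
         "mongod", "--config", "/data/mongod.conf"],
        check=True,
    )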
3.4 MongoDB Database
MongoDB is a NoSQL-based document-oriented database [33]. Our team uses the MongoDB version 3.4 community edition, which is the latest available release. Even though not many inbuilt security mechanisms are available with the community edition as compared to the enterprise edition [34], the team is able to provide adequate security for the data by using strong database policies along with some other security mechanisms to preserve the privacy of health records. The current system is illustrated in Figure 4.
Figure 4: The current system
In the next two sections, the focus is on MongoDB and Singularity. First, the security requirements of this project are discussed, and then how the current security level approaches the HIPAA security rules, in order to answer RQ2. After that, the paper presents available methods and techniques to meet the security needs of this project with MongoDB and Singularity.
4 SECURITY REQUIREMENT OF THE
EXPOSOME
In order to provide a guaranteed security framework for this project, four main areas are addressed: database security (server with LXCs), network security, web application security, and physical security, since the ultimate goal is to provide a secure data analytic framework as mentioned above. This paper mainly focuses on the assurance of database security with the MongoDB NoSQL database and the Singularity Linux container on the Quanah Linux cluster.
For this project, the database layer is implemented with the MongoDB community edition, which does not have advanced authentication mechanisms, such as the Kerberos protocol or LDAP [35]. Nevertheless, system security must be assured so as to comply as closely as possible with HIPAA by using the provided or inbuilt security mechanisms of this edition. In order to comply with HIPAA data security requirements, a few possible measures can be considered, such as authentication, authorization, encryption, and auditing. Therefore, it is required to define a strong data security model such that the aforementioned four measures are assessed.
No guarantee exists that MongoDB itself is a HIPAA compliant database [36], especially the community edition with its fewer security mechanisms. Therefore, in the future, it may be a good idea to move to the MongoDB enterprise edition or another type of NoSQL database that provides more security mechanisms, but for the moment, due to some funding limitations, the project is only working with the community edition and Singularity containers, where the LXC gives some added security through O/S level virtualization [37]. Therefore, in the next section, as a hybrid solution using both MongoDB and Singularity, this paper presents some security mechanisms to meet each aforementioned requirement to protect the EXPOSOME data, such that the database security comes as close as possible to the HIPAA requirements.
5 METHODS AND TECHNIQUES TO
ASSURE SECURITY IN MONGODB AND
SINGULARITY
It is not guaranteed that either MongoDB's inbuilt security mechanisms or Singularity's security mechanisms are compliant with HIPAA requirements [38]. Therefore, in this section, the paper presents how MongoDB and Singularity security features integrate to build a framework for analyzing data more securely than either tool's individual methods and techniques for authentication, authorization, encryption, and auditing would allow.
5.1 Authentication
By default, the MongoDB community edition does not enable any authentication mechanism, even though it supports multiple authentication mechanisms, such as SCRAM-SHA-1, MongoDB-CR, and x.509 certificate authentication [39]. MongoDB no longer defaults to the MongoDB-CR protocol due to its lack of
security, and x.509 certificate authentication is not an efficient mechanism and also has some implementation difficulties. In this project, the SCRAM-SHA-1 mechanism is primarily used to implement authentication, as it is the default mechanism for MongoDB versions beginning with the 3.0 series. In a SCRAM-SHA-1 conversation, the client and server exchange two round trips of messages that require hashing and verification. After that, SCRAM (salted challenge response authentication mechanism) messages are converted into binary data and move over the TCP protocol along with metadata [40]. SHA-1 (Secure Hash Algorithm 1) is a cryptographic hash function that produces a strong 160-bit message digest (hash value) [41]. This salted challenge response authentication mechanism with SHA-1 strongly and efficiently verifies the identity of clients by using credentials, such as username and password, before clients can connect to the system.
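A minimal client-side sketch of this mechanism with the pymongo driver is shown below, assuming a server started with authentication enabled; the host name, database, and credentials are hypothetical placeholders:

    from pymongo import MongoClient

    # The driver performs the two-round-trip SCRAM-SHA-1 conversation
    # described above before real commands are accepted on the connection.
    uri = ("mongodb://analyst:CHANGE_ME@db.example.edu:27017/exposome"
           "?authMechanism=SCRAM-SHA-1")
    client = MongoClient(uri)
    client.exposome.command("dbStats")  # requires a successfully
                                        # authenticated user with read access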
5.2 Authorization
The EXPOSOME MongoDB “Role-Based Access Control (RBAC)” mechanism decides which database resources and operations can be accessed by the aforementioned verified users. Authorization can be enabled by using the “--auth” option or the “security.authorization” setting, since authorization is not enabled by default in MongoDB. Nine different system-defined access control roles are available in MongoDB to implement authorization [42]. In order to implement authorization step by step, the EXPOSOME database policies are used, which define a list of administrators and a list of general users. Other requirements are given, such as password complexity levels and role-based database accessibility permission levels, following the principle of minimal authority to make roles that define the least access a user needs. This authorization mechanism can grant one or more roles defined according to the project policies, such that users are able to access the needed database resources and operations. Outside of the role assignments of the EXPOSOME database, a user has no access to the system.
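As an illustrative sketch (the user names, passwords, and role assignment here are hypothetical), a least-privilege user can be created with MongoDB's built-in createUser command through pymongo:

    from pymongo import MongoClient

    # Connect as an administrator entitled to manage users.
    admin_client = MongoClient(
        "mongodb://dbadmin:CHANGE_ME@localhost:27017/admin")

    # Grant only read access on the exposome database: the least access
    # a general analyst would need under the project policies.
    admin_client["exposome"].command(
        "createUser", "readonly_analyst",
        pwd="CHANGE_ME",
        roles=[{"role": "read", "db": "exposome"}],
    )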
5.3 Encryption
It is important to make sure that the data is encrypted both in communication and in storage [43]. Data can be protected in transmission by using inbuilt data encryption techniques and by limiting network exposure. All MongoDB network traffic is encrypted with Transport Layer Security and Secure Sockets Layer (TLS/SSL) [44]; hence, traffic is only readable to the intended user. The TLS/SSL encryption mechanism is utilized to encrypt communication between “mongod” and “mongos” as well as between MongoDB and other applications, such as Python and R. The MongoDB server runs in a trusted network environment with a limited number of MongoDB instance interfaces waiting or listening for incoming connections.
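A hedged pymongo sketch of such an encrypted client connection follows; the host, CA certificate path, and credentials are hypothetical, and the option names follow the 3.x-era driver:

    from pymongo import MongoClient

    # ssl=true in the URI turns on TLS/SSL for the connection, and the CA
    # bundle lets the client verify the server certificate.
    uri = ("mongodb://analyst:CHANGE_ME@db.example.edu:27017/exposome"
           "?authMechanism=SCRAM-SHA-1&ssl=true")
    client = MongoClient(uri, ssl_ca_certs="/etc/ssl/exposome-ca.pem")
    client.exposome.command("dbStats")  # runs over the encrypted channel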
At the storage layer, MongoDB encrypts data files by configuring the WiredTiger storage engine's native encryption mechanism [45]. From MongoDB 3.2 onwards, WiredTiger is the default storage engine, with multiple features such as encryption, compression, check-pointing, and a document-level concurrency model [46]. By using both encryption in communication and encryption in storage, along with good security policies, data is protected such that database security complies with HIPAA requirements [47].
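The sketch below (launched from Python for consistency with the other examples) combines the measures described in this section into one mongod invocation; the file paths are hypothetical, and note that in stock MongoDB the encryption-at-rest options ship with the enterprise edition, so they serve here only to illustrate the configuration the paper describes:

    import subprocess

    subprocess.run([
        "mongod",
        "--auth",                                   # require authentication
        "--sslMode", "requireSSL",                  # encrypt network traffic
        "--sslPEMKeyFile", "/etc/ssl/mongodb.pem",  # server cert + key
        "--enableEncryption",                       # WiredTiger encryption
        "--encryptionKeyFile", "/etc/mongodb-keyfile",
        "--bind_ip", "127.0.0.1",                   # limit network exposure
    ], check=True)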
5.4 Auditing
Unfortunately, the MongoDB community edition does not provide any inbuilt auditing tool, unlike the enterprise edition [48], but it does provide some logging mechanisms, such as the operations log, diaglog, verbose logging, profiler audit logs, and mongosniff, to record various types of logs [49]. No third-party auditing tool is employed in the TTU EXPOSOME project; nevertheless, the log files generated by the aforementioned mechanisms can be examined to detect suspicious activities in the system. In the meantime, since the database is running on top of Singularity, those who want to access MongoDB must go through the Singularity containers. Users are tracked through Singularity's logging mechanisms, since Singularity has an enriched logging mechanism [50]. By using both the MongoDB and Singularity logging mechanisms, an enriched auditing mechanism is provided for the entire project. In the future, an auditing tool is planned with the abilities to alert administrators when unauthorized access or operations are detected, to create custom alerts, and to generate reports using the MongoDB and Singularity log files [51], according to the project auditing policies defined by the system administrator.
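As a hedged sketch of how such log files might be examined, the snippet below scans a mongod log for failed authentication attempts; the log path and the matched message text are assumptions to be verified against the deployment's actual log format:

    LOG_PATH = "/var/log/mongodb/mongod.log"  # hypothetical log location

    with open(LOG_PATH) as log:
        failures = [line.rstrip() for line in log
                    if "Failed to authenticate" in line]

    # These lines could feed a future alerting hook or audit report.
    print("%d failed authentication attempts found" % len(failures))
    for line in failures:
        print(line)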
In addition to the security provided by MongoDB through authentication, authorization, and encryption mechanisms, Singularity provides a secure environment for analyzing data when researchers are executing applications. Unlike some other LXCs, Singularity avoids privilege escalation attacks, since the user can be root inside the container if and only if the user is root outside the container [52]. The user cannot gain access to all the files and directories on the underlying host machine. Singularity together with MongoDB provides a more secure framework for analyzing data, such that the system complies with the four main requirements defined by HIPAA.
6 CONCLUSIONS
This paper reviews the security capabilities of the free and open source MongoDB community edition and Singularity within the EXPOSOME project, such that the final system security closely complies with HIPAA data security requirements even with some limitations of those tools. Even though HIPAA data security entails many more requirements to be fulfilled, such as authentication, authorization, encryption, auditing, confidentiality, integrity, and availability, in order for the system to be fully compliant, the paper focuses only on the four main requirements of authentication, authorization, encryption, and auditing. Before addressing the four main HIPAA data security requirements, this paper briefly presents some reasons to use LXCs instead of VMs by answering the first research question, “what do Linux
Singularity containers (LXCs) offer over virtual machines (VMs) in securing protected medical data?” Thereafter, it discusses the current problem as the security requirement of the existing system. Next, the paper provides some solutions to overcome the problem by answering the second research question, “what advantages do LXCs offer with MongoDB usage for HIPAA compliance?”, considering the aforementioned four main areas. Finally, the paper presents available methods and techniques in MongoDB and Singularity to apply to the abovementioned four areas in the EXPOSOME project, such that the system security is as close as possible to the HIPAA data security requirements.
7 FUTURE WORK
So far, only the basic secure data analytic framework with MongoDB and Singularity that complies very closely with the HIPAA data security requirements has been considered. In order to improve the security of the current system in the future, it is important to make sure that the entire system complies with more HIPAA requirements, such as ensuring confidentiality, integrity, availability, risk analysis, risk management, administrative safeguards, technical safeguards, physical safeguards, and organizational needs. Three main tasks are identified to achieve such system security: (1) a vulnerability and threat analysis of the current system; (2) a new security model proposal with enhanced features to protect the system from any vulnerability found in task one, making a virtual data analytic framework with MongoDB and Singularity; and (3) the introduction of security mechanisms to ensure network security, web application security, and physical security.
REFERENCES
[1]
A. M. Joy, "Performance comparison between linux containers and virtual
machines," in Computer Engineering and Applications (ICACEA), 2015
International Conference on Advances in, 2015, pp. 342-346.
[2]
P. Li and L. W. Toderick, "Cloud in cloud: approaches and implementations,"
presented at the Proceedings of the 2010 ACM conference on Information
technology education, Midland, Michigan, USA, 2010.
[3]
M. Bolte, M. Sievers, G. Birkenheuer, O. Niehörster, et al., "Non-intrusive virtualization management using libvirt," presented at the Proceedings of the Conference on Design, Automation and Test in Europe, Dresden, Germany, 2010.
[4]
E. Eberbach and A. Reuter, "Toward El Dorado for Cloud Computing:
Lightweight VMs, Containers, Meta-Containers and Oracles," presented at the
Proceedings of the 2015 European Conference on Software Architecture
Workshops, Dubrovnik, Cavtat, Croatia, 2015.
[5]
S. Soltesz, H. Pötzl, M. E. Fiuczynski, A. Bavier, et al., "Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors," presented at the Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, Lisbon, Portugal, 2007.
[6]
E. Bauman, G. Ayoade, and Z. Lin, "A Survey on Hypervisor-Based
Monitoring: Approaches, Applications, and Evolutions," ACM Comput. Surv.,
vol. 48, pp. 1-33, 2015.
[7]
D. Perez-Botero, J. Szefer, and R. B. Lee, "Characterizing hypervisor
vulnerabilities in cloud computing servers," presented at the Proceedings of the
2013 international workshop on Security in cloud computing, Hangzhou,
China, 2013.
[8]
J. Szefer, E. Keller, R. B. Lee, and J. Rexford, "Eliminating the hypervisor
attack surface for a more secure cloud," presented at the Proceedings of the
18th ACM conference on Computer and communications security, Chicago,
Illinois, USA, 2011.
[9]
E. Keller, J. Szefer, J. Rexford, and R. B. Lee, "NoHype: virtualized cloud
infrastructure without the virtualization," presented at the Proceedings of the
37th annual international symposium on Computer architecture, Saint-Malo,
France, 2010.
[10]
M. Noorafiza, H. Maeda, T. Kinoshita, and R. Uda, "Virtual machine remote
detection method using network timestamp in cloud computing," in Internet
Technology and Secured Transactions (ICITST), 2013 8th International
Conference for, 2013, pp. 375-380.
[11]
R. Dua, A. R. Raja, and D. Kakadia, "Virtualization vs containerization to
support paas," in Cloud Engineering (IC2E), 2014 IEEE International
Conference on, 2014, pp. 610-614.
[12]
B. Lantz, B. Heller, and N. McKeown, "A network in a laptop: rapid
prototyping for software-defined networks," presented at the Proceedings of the
9th ACM SIGCOMM Workshop on Hot Topics in Networks, Monterey,
California, 2010.
[13]
Y. Wang, J. Wei, and K. Vangury, "Bring your own device security issues and
challenges," in Consumer Communications and Networking Conference
(CCNC), 2014 IEEE 11th, 2014, pp. 80-85.
[14]
E. Reshetova, J. Karhunen, T. Nyman, and N. Asokan, "Security of OS-level
virtualization technologies," in Nordic Conference on Secure IT Systems, 2014,
pp. 77-93.
[15]
C.-C. Tsai, K. S. Arora, N. Bandi, B. Jain, W. Jannen, J. John, et al.,
"Cooperation and security isolation of library OSes for multi-process
applications," presented at the Proceedings of the Ninth European Conference
on Computer Systems, Amsterdam, The Netherlands, 2014.
[16]
G. M. Kurtzer, V. Sochat, and M. W. Bauer, "Singularity: Scientific containers
for mobility of compute," PloS one, vol. 12, p. e0177459, 2017.
[17]
F. Reynaud, F.-X. Aguessy, O. Bettan, M. Bouet, and V. Conan, "Attacks
against Network Functions Virtualization and Software-Defined Networking:
State-of-the-art," in NetSoft Conference and Workshops (NetSoft), 2016 IEEE,
2016, pp. 471-476.
[18]
J. A. Zounmevo, S. Perarnau, K. Iskra, K. Yoshii, R. Gioiosa, B. C. Van Essen,
et al., "A container-based approach to OS specialization for exascale
computing," in Cloud Engineering (IC2E), 2015 IEEE International Conference
on, 2015, pp. 359-364.
[19]
T. Combe, A. Martin, and R. Di Pietro, "To Docker or not to Docker: A
security perspective," IEEE Cloud Computing, vol. 3, pp. 54-62, 2016.
[20]
F. Araujo, K. W. Hamlen, S. Biedermann, and S. Katzenbeisser, "From Patches
to Honey-Patches: Lightweight Attacker Misdirection, Deception, and
Disinformation," presented at the Proceedings of the 2014 ACM SIGSAC
Conference on Computer and Communications Security, Scottsdale, Arizona,
USA, 2014.
[21]
N. Santos, R. Rodrigues, and B. Ford, "Enhancing the OS against security
threats in system administration," presented at the Proceedings of the 13th
International Middleware Conference, Montreal, Quebec, Canada, 2012.
[22]
C.-J. Chung, P. Khatkar, T. Xing, J. Lee, and D. Huang, "NICE: Network
intrusion detection and countermeasure selection in virtual network systems,"
IEEE transactions on dependable and secure computing, vol. 10, pp. 198-211,
2013.
[23]
A. Kansal, F. Zhao, J. Liu, N. Kothari, and A. A. Bhattacharya, "Virtual
machine power metering and provisioning," presented at the Proceedings of the
1st ACM symposium on Cloud computing, Indianapolis, Indiana, USA, 2010.
[24]
M. G. Xavier, M. V. Neves, and C. A. F. De Rose, "A performance comparison
of container-based virtualization systems for mapreduce clusters," in Parallel,
Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro
International Conference on, 2014, pp. 299-306.
[25]
C. Boettiger, "An introduction to Docker for reproducible research," SIGOPS
Oper. Syst. Rev., vol. 49, pp. 71-79, 2015.
[26]
L. Benedicic, F. A. Cruz, A. Madonna, and K. Mariotti, "Portable, high-
performance containers for HPC," arXiv preprint arXiv:1704.03383, 2017.
[27]
H. Kharrazi, R. Chisholm, D. VanNasdale, and B. Thompson, "Mobile personal
health records: an evaluation of features and functionality," International
journal of medical informatics, vol. 81, pp. 579-593, 2012.
[28]
L. S. Gittner, B. J. Kilbourne, R. Vadapalli, H. M. Khan, and M. A. Langston,
"A multifactorial obesity model developed from nationwide public health
exposome data and modern computational analyses," Obesity Research &
Clinical Practice, 2017.
[29]
A. Alshalan, S. Pisharody, and D. Huang, "A Survey of Mobile VPN
Technologies," IEEE Communications Surveys & Tutorials, vol. 18, pp. 1177 -
1196, 2016.
[30]
S. Julian, M. Shuey, and S. Cook, "Containers in Research: Initial Experiences
with Lightweight Infrastructure," presented at the Proceedings of the XSEDE16
Conference on Diversity, Big Data, and Science at Scale, Miami, USA, 2016.
[31]
B. Gerofi, R. Riesen, R. W. Wisniewski, and Y. Ishikawa, "Toward Full
Specialization of the HPC Software Stack: Reconciling Application Containers
and Lightweight Multi-kernels," presented at the Proceedings of the 7th
International Workshop on Runtime and Operating Systems for
Supercomputers ROSS 2017, Washington, DC, USA, 2017.
[32]
K. J. Gorgolewski, F. Alfaro-Almagro, T. Auer, P. Bellec, M. Capotă, M. M.
Chakravarty, et al., "BIDS apps: Improving ease of use, accessibility, and
reproducibility of neuroimaging data analysis methods," PLoS computational
biology, vol. 13, p. e1005209, 2017.
[33]
V. Abramova and J. Bernardino, "NoSQL databases: MongoDB vs cassandra,"
in Proceedings of the international C* conference on computer science and
software engineering, 2013, pp. 14-22.
[34]
S. Kumar, J. Shekhar, and H. Gupta, "Agent based Security Model for Cloud
Big Data," in Proceedings of the Second International Conference on
Information and Communication Technology for Competitive Strategies, 2016,
p. 142.
[35]
V. C. Hu, T. Grance, D. F. Ferraiolo, and D. R. Kuhn, "An access control
scheme for big data processing," in Collaborative Computing: Networking,
Applications and Worksharing (CollaborateCom), 2014 International
Conference on, 2014, pp. 1-7.
[36]
F. Zafar, A. Khan, S. Suhail, I. Ahmed, K. Hameed, H. M. Khan, et al.,
"Trustworthy Data: A Survey, Taxonomy and future trends of Secure
Provenance Schemes," Journal of Network and Computer Applications, 2017.
[37]
B. Duncan, A. Bratterud, and A. Happe, "Enhancing cloud security and
privacy: Time for a new approach?," in Innovative Computing Technology
(INTECH), 2016 Sixth International Conference on, 2016, pp. 110-115.
[38]
M. Bahrami, A. Malvankar, K. K. Budhraja, C. Kundu, M. Singhal, and A.
Kundu, "Compliance-Aware Provisioning of Containers on Cloud."
[39]
V. J. Dindoliwala and R. D. Morena, "Survey on Security Mechanisms In
NoSQL Databases," International Journal of Advanced Research in Computer
Science, vol. 8, 2017.
[40]
X. Wang, A. Madaan, E. Siow, and T. Tiropanis, "Sharing Databases on the
Web with Porter Proxy," presented at the Proceedings of the 26th International
Conference on World Wide Web Companion, Perth, Australia, 2017.
[41]
H. E. Michail, G. S. Athanasiou, G. Theodoridis, A. Gregoriades, and C. E.
Goutis, "Design and implementation of totally-self checking SHA-1 and SHA-
256 hash functions’ architectures," Microprocessors and Microsystems, vol. 45,
pp. 227-240, 2016.
[42]
L. Li, K. Qian, Q. Chen, R. Hasan, and G. Shao, "Developing Hands-on
Labware for Emerging Database Security," presented at the Proceedings of the
17th Annual Conference on Information Technology Education, Boston,
Massachusetts, USA, 2016.
[43]
P. Huang, B. Li, L. Guo, Z. Jin, and Y. Chen, "A robust and reusable ecg-based
authentication and data encryption scheme for ehealth systems," in Global
Communications Conference (GLOBECOM), 2016 IEEE, 2016, pp. 1-6.
[44]
M. Humphrey, J. Steele, I. K. Kim, M. G. Kahn, J. Bondy, and M. Ames,
"CloudDRN: A Lightweight, End-to-End System for Sharing Distributed
Research Data in the Cloud," in eScience (eScience), 2013 IEEE 9th
International Conference on, 2013, pp. 254-261.
[45]
V. Sherimon and P. Sherimon, "From Relational Model to Rich Document Data
Models-Best Practices Using MongoDB," 2017.
[46]
Y. Son, H. Kang, H. Han, and H. Y. Yeom, "Improving Performance of Cloud
Key-Value Storage Using Flushing Optimization," in Foundations and
Applications of Self* Systems, IEEE International Workshops on, 2016, pp.
42-47.
[47]
J. M. Kiel, F. A. Ciamacco, and B. T. Steines, "Privacy and Data Security:
HIPAA and HITECH," in Healthcare Information Management Systems, ed:
Springer, 2016, pp. 437-449.
[48]
S. Sathyadevan, N. Muraleedharan, and S. P. Rajan, "Enhancement of Data
Level Security in MongoDB," in Intelligent Distributed Computing, ed:
Springer, 2015, pp. 199-212.
[49]
P. Murugesan and I. Ray, "Audit log management in MongoDB," in Services
(SERVICES), 2014 IEEE World Congress on, 2014, pp. 53-57.
[50]
S. Das, T. Glatard, C. Rogers, J. Saigle, S. Paiva, L. MacIntyre, et al.,
"Cyberinfrastructure for Open Science at the Montreal Neurological Institute,"
Frontiers in neuroinformatics, vol. 10, 2016.
[51]
K. Dwivedi and S. K. Dubey, "Implementation of Data Analytics for MongoDB
Using Trigger Utility," in Computational Intelligence in Data MiningVolume
1, ed: Springer, 2016, pp. 39-47.
[52]
A. Azab, "Enabling Docker Containers for High-Performance and Many-Task
Computing," in Cloud Engineering (IC2E), 2017 IEEE International
Conference on, 2017, pp. 279-285.
... In recent years, Machine Learning and Deep Learning algorithms in anomaly detection have garnered huge interest [4] [23]. Anomaly-based intrusion detection is essentially a classification problem and Machine Learning and Deep Learning algorithms have proven to be useful in Network Intrusion Detection [5] [6]. Machine Learning is a branch of Artificial Intelligence, and it gives computers the ability to learn without being explicitly programmed [23]. ...
... A notable dataset is NSL-KDD dataset which is the benchmark dataset in intrusion detection studies which is an improved version of the KDD Cup '99 dataset that was developed in the year 1999. It does not represent today's modern network attacks [6]. It contains old network traffic and does not have real-time properties. ...
... This dataset contains benign and common network attacks that are like true real-world data. The dataset includes the results of the network traffic analyses using CICFlowMeter with various labels based on the timestamp, source, and destination IPs, source and destination ports, etc. [6][21]. ...
Conference Paper
Full-text available
In today's world, businesses and services are shifted to a digital transformation. As a result, network traffic has tremendously increased over the years. With that, network threats and attacks are growing and with that, the importance of intrusion detection systems has increased. The traditional signature-based approach to intrusion detection is not sufficient to detect intrusions, so anomaly-based intrusion detection came into play. There are many methods to Anomaly-based intrusion detection methods that can classify unknown network attacks. To detect network anomalies, Machine Learning and Deep Learning techniques are applied, and a considerable number of studies are done in this field. This research presents classification models built using supervised Machine Learning algorithms. The algorithms Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes, Decision Tree and Random Forest on multiple datasets of realistic evaluation dataset CICIDS-2017. The results show that Random Forest outperforms other supervised algorithms with as high as 99.93% accuracy using 14 features selected using Pearson's correlation coefficient method.
... Examples of malware or malicious software are internet works, viruses, or trojan horses that can be detrimental to an organization and its end users [5]. These threats, if executed, can have detrimental consequences for organizations that own software systems as well as their end users, often resulting in cybertheft or cyberespionage, causing losses of proprietary, financial or personal information, which benefits the attacked, in a lot of cases without the victim even knowing [6]. Some approaches to mitigating these risks associated with software systems involve developing and implementing security software like antivirus protection, cryptography, firewalls etc. ...
... Accountability also involves having control and proper verification and authorization systems in place, as well as policies backed by law that outline consequences for mismanagement of the information system access, or any corrupt behavior or activity. These policies are then made transparent and available to end users so they have more confidence on how seriously an organization takes the security of their information and data and the measures that are in place for the protection of end users' data, finances, and any other information they share with the organization [6][7] [8]. ...
... RSA based data integrity checks ensure data integrity by using cryptographic techniques and RSA digital signature to verify that any data transmitted is not tampered with anything else [6]. ...
Conference Paper
Full-text available
The use of software to support the information infrastructure that governments, critical infrastructure providers and businesses worldwide rely on for their daily operations and business processes is gradually becoming unavoidable. Commercial off-the shelf software is widely and increasingly used by these organizations to automate processes with information technology. That notwithstanding, cyber-attacks are becoming stealthier and more sophisticated, which has led to a complex and dynamic risk environment for IT-based operations which users are working to better understand and manage. This has made users become increasingly concerned about the integrity, security and reliability of commercial software. To meet up with these concerns and meet customer requirements, vendors have undertaken significant efforts to reduce vulnerabilities, improve resistance to attack and protect the integrity of the products they sell. These efforts are often referred to as “software assurance.” Software assurance is becoming very important for organizations critical to public safety and economic and national security. These users require a high level of confidence that commercial software is as secure as possible, something only achieved when software is created using best practices for secure software development. Therefore, in this paper, we explore the need for information assurance and its importance for both organizations and end users, methodologies and best practices for software security and information assurance, and we also conducted a survey to understand end users’ opinions on the methodologies researched in this paper and their impact.
... Furthermore, blockchain transactions are stored in a fully decentralized P2P network that repeats data storage, eliminating the possibility of data loss. The following are some of the main blockchain pillars and concepts [8] [9]: The Merkle tree's root is a hash that ensures the integrity of all transactions in a block, including their order. The hash pointer for the entire block is the Merkle tree root hash, plus the previous block's hash pointer and any consensus information that makes the node valid. ...
... [9][10]: 1. A transaction: is a single entry in the ledger that can specify a piece of information or an operation over prior transactions, such as sending funds from one transaction to another public address. ...
Conference Paper
Full-text available
Intrusions in the computing networking world have been a highly common unwanted malicious activity from the beginning of computing networks. For the past decade variety of security measures have been implemented, but as technology has advanced, so have the security threats. With the entire world relying on computers, whether directly or indirectly, preventing unwanted activities and threats that can disrupt computing infrastructures is a critical concern. Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) are common security techniques used to protect computing resources, which are mostly found in a network. As the threat of cyber-attacks increases, new security techniques are required. Blockchain has the potential to be used in the intrusion detection and prevention area since it can guarantee data integrity and maintain process transparency. Therefore, this survey paper presents a review of the intersection between IDSs and IPSs and blockchain technologies. We will discuss the history of intrusion detection and prevention and blockchain, describe how blockchain technologies might be applied into IDS and IPS, and identify open challenges in this area.
... The cloud storage cannot use the data as plaintext as the data will be needed to be transferred over a network and if it is transferred as plaintext, it gives rise to security issues. Hence, Cloud technology has an approach of employing multiple third-party servers for the data storage instead of using a single dedicated server as used in traditional data storage networks [3][4] [5]. In this, the location of data storage is not known to the data owner or the actual data user and it is only the concern of the cloud storage provider, and the provider alone can secure the data which is not completely trusted by the users. ...
Article
Full-text available
Now more than ever has it become important to keep the information confidential in an age that is losing its value of individual privacy. In this cloud computing era, regardless of the power of the cloud computing concept, many people do not know that their information can be used and sent to third parties from their cloud storage provider. Today the use of cloud storage is well established however the security of protecting the data on the cloud is a limited thought for most users. Therefore, this study aims to experimentally research which encryption program works best when storing data onto three of the main cloud storage providers currently available on the market. This study will go over the hardware and network impact as well as the time to encrypt and decrypt the data. This study will determine if "7zip" or "rclone" encryption programs work best with these three cloud storage. The data will be collected using NetData tool and accordingly determine which encryption application works best with which cloud storage provider. Thereafter, based on the data analysis, it is recommended that experimental outcomes to all users to keep their sensitive data secured and safe from snooping or prevent private information from being collected and sold to third parties with the help of black market.
... It is a free and opensource collection of client-server software that also offers to host services. Anybody can also install and use NextCloud on a private server [15] [16]. NextCloud claims to prioritize security and privacy and has several external penetration testers and experts. ...
Conference Paper
Full-text available
In recent years cloud computing has moved from an idea phase to a necessity in the business and personal use world, it has gone from being a theory to being an easily accessible need for organizations and individuals. A few years ago, having a storage infrastructure was an expensive idea that had to come with the buying, storage, securing, and maintenance of the equipment used to develop the infrastructure. Cloud computing came with a lot of benefits such as cost-benefit, could be configured, accessed anywhere, being a reliable service, it also brought some negative aspects too which were centered on security and privacy concern. Cloud computing has brought about different types of cloud delivery services such as SaaS, PaaS. This paper focuses on two cloud services where one is "NextCloud" which is a private open-source cloud provider with an end-to-end solution. The other one is "Dropbox" which is a public cloud provider. This study compares the two services based on security, authentication, and privacy of each services. We will conduct a detailed analysis of the benefits of one service over the other and their shortcoming. The results shows that both services deal with Confidentiality, Integrity and Availability a little differently but they ultimately arrive at the same goal. Finally, this study concludes that each of these two services has their suited demographic such as Dropbox is for users that are not overly concerned about security and NextCloud required to have a more expertise on computers to set up and maintain the private cloud. In addition NextCloud has concern for security and a sense of control.
... The customer metadata makes up the rest of the customer's data. That refers to all data that cannot be classified as customer content like auto-generated project numbers, timestamps, and IP-Addresses [10][11]. ...
Article
Full-text available
Cloud storage services such as GoogleCloud and NextCloud have become increasingly popular among Internet users and businesses. Despite the many encrypted file cloud systems being implemented worldwide today for different purposes, we are still faced with the problem of their usage, security, and performance. Although some cloud storage solutions are very efficient in communication across different clients, others are better in file encryption, such as images, videos, and text files. Therefore, it is evident that the efficiency of these algorithms varies based on the purpose and type of encryption and compression. This paper focuses on the comparative analysis of NextCloud with composed end-to-end solutions that use both an unencrypted cloud storage and an encrypted solution. In this paper, we measured the network use, file output size, and computation time of given workloads for two different services to thoroughly evaluate the efficiency of NextCloud and GoogleCloud. Our findings concluded that there is similar network usage and synchronization time. However, GoogleCloud had more CPU utilization than NextCloud. On the other hand, NextCloud had a longer delay when uploading files to their cloud service. Our experimental results show that the evaluation model is considered robust if its output and forecasts are consistently accurate, even if one or more of the input variables or assumptions are drastically changed due to unforeseen circumstances.
... Phishing Attack This is about on how the attacker tries all the possible ways of attacking the victim by finding all the combinations of the code and passwords. The more complex the code is, the more time will be taken for the attacker to find it out [29]. ...
Article
Full-text available
Cloud computing is growing tremendously in recent years. Many organizations are switching their traditional computing model to a cloud based because of its low cost and pay-as-you-go manner. Although Cloud Service Provider (CSP) ensures that the data stored in their remote cloud server will be intact and secure. But there are many data integrity issues exist that needed to be addressed. In Cloud environment, lack of data integrity is a major concern. In this paper, we have surveyed several past studies which identifies the issues related to the cloud data storage security such as data theft, unavailability, and data breach of cloud server data. We have also provided a detailed analysis of types of data integrity attacks and their mitigation techniques.
... OpenFlow is configur ed for communication among network entities. OpenFlow supports TLS v1.2 and above and PKI can be used to establish certificate validity chains [31]. Th e framework uses ovs-pki script to create an initial PKI structure [32]. ...
Article
Full-text available
Software Defined Networking (SDN) is a rapidly growing technology that is enabling innovation on how network systems are designed and managed. Like any other technology, SDN is susceptible to numerous security threats. The separation of planes and centralized control topology of SDN makes it vulnerable to myriad of attacks. There has been rapid implementation of SDN in variety of networks and this is only growing with time. T here are only handful of resources for enterprise networks that are actively transitioning their networks to SDN and require security specification.This paper proposes a security framework that integrates basic security requirements for SDN. First, the security vulnerabilities are identified by assessin g the SDN topology. This provides understanding of the issue which is utilized to implement security mechanisms that help mitigate the vulnerabilities. The framework utilizes open-source tools and techniques and integrates well-known security mechanisms. Throughout the paper, the proposed framework is implemented and tested for its success in satisfying some of the basic and crucial security requirements. In doing so, this research offers viable security mechanisms for enterprise networks that are looking for performance and cost-effective security solutions
... As a solution to the aforementioned system security requirement, Mailewa Dissanayaka et al. [7] proposed and published a secure data analytic framework. This research is the follow up testing phase to find the currently available vulnerabilities of our previously proposed model to answer the research question, ''RQ: How to verify the security of the proposed system including what are the loopholes of the current system?'' ...
Article
Full-text available
It is essential to ensure the data security of data analytical frameworks as any security vulnerability existing in the system can lead to a data loss or data breach. This vulnerability may occur due to attacks from live attackers as well as automated bots. However inside attacks are also becoming more frequent because of incorrectly implemented security requirements and access control policies. Thus, it is important to understand security goals and formulate security requirements and access control policies accordingly. Therefore, it is equally important to identify the existing security vulnerabilities of a given software system. To find the available vulnerabilities against any system, it is mandatory to conduct vulnerability assessments as scheduled tasks in a regular manner. Thus, an easily deployable, easily maintainable, accurate vulnerability assessment testbed or a model is helpful as facilitated by Linux containers. Nowadays Linux containers (LXCs) which have operating system level virtualization, are very popular over virtual machines (VMs) which have hypervisor or kernel level virtualization in high performance computing (HPC) due to reasons, such as high portability, high performance, efficiency and high security (Chae et al in Clust Comput 22:1765-1775, 2019. https://doi.org/10.1007/s10586-017-1511-2). Hence, LXCs can make an efficient and scalable vulnerability assessment testbed or a model by using already developed analyzing tools such as OpenVas, Dagda, PortSpider, MongoAudit, NMap, Metasploit Framework, Nessus, OWASP Zed Attack Proxy, and OpenSCAP, to assure the required security level of a given system very easily. To verify the overall security of any given software system, this paper first introduces a virtual, portable and easily deployable vulnerability assessment general testbed within the Linux container network. Next, the paper presents, how to conduct experiments using this testbed on a MongoDB database implemented in Singularity Linux containers to find the available vulnerabilities in 1. MongoDB application itself, 2. Images accompanied by containers, 3. Host, and 4. Network by integrating seven tools: OpenVas, Dagda, PortSpider, MongoAudit, NMap, Metasploit Framework, and Nessus to the container-based testbed. Finally, it discusses how to use generated results to improve the security level of the given system.
Conference Paper
With a large number of datasets now available through the Web, data-sharing ecosystems such as the Web Observatory have emerged. The Web Observatory provides an active decentralised ecosystem for datasets and applications based on a number of Web Observatory sites, each of which can run in a different administrative domain. On a Web Observatory site, users can publish and securely access datasets across domains via a harmonised API and reverse proxies for access control. However, that API provides a different interface from that of the databases on which the datasets are stored; consequently, existing applications that consume data from specific databases require major modification to be added to the Web Observatory ecosystem. In this paper we propose a lightweight architecture called Porter Proxy to address this concern. Porter Proxy exposes the same interfaces as the underlying databases, as requested by users, while enforcing access control. Characteristics of the proposed Porter Proxy architecture are evaluated through adversarial scenario-handling in the Web Observatory ecosystem.
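To make the Porter Proxy idea concrete, here is a hedged sketch, not the authors' implementation, of a TCP forwarder that exposes a database's native wire interface while enforcing a client allowlist at the proxy boundary; the host names, ports, and allowlist policy are illustrative assumptions.

```python
# Hedged sketch of the Porter Proxy idea: expose the database's own wire
# interface while enforcing access control at the proxy boundary.
# Host names, ports, and the IP allowlist are illustrative assumptions.
import socket
import threading

LISTEN_ADDR = ("0.0.0.0", 27017)        # clients speak the native protocol here
UPSTREAM_ADDR = ("db-internal", 27017)  # hypothetical backing database
ALLOWED_CLIENTS = {"10.0.0.5", "10.0.0.6"}  # stand-in access-control policy

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until either side closes."""
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        src.close()
        dst.close()

def serve() -> None:
    with socket.create_server(LISTEN_ADDR) as server:
        while True:
            client, (ip, _port) = server.accept()
            if ip not in ALLOWED_CLIENTS:   # enforce access control here
                client.close()
                continue
            upstream = socket.create_connection(UPSTREAM_ADDR)
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

if __name__ == "__main__":
    serve()
```

Because the proxy forwards raw bytes, an existing client keeps its native database driver unchanged, which is the property the Porter Proxy architecture is after.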
Article
Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game changing development for computational science.
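As a hedged illustration of the mobility-of-compute workflow described above, the short sketch below wraps the standard singularity exec command from Python so the same containerized environment runs unchanged on a laptop or an HPC node; the image name analysis.sif is a placeholder assumption.

```python
# Minimal sketch of running an analysis step inside a Singularity image,
# so the same software environment travels from a laptop to a cluster.
# "analysis.sif" and the example command are placeholder assumptions.
import subprocess

IMAGE = "analysis.sif"  # hypothetical container image built elsewhere

def run_in_container(*cmd: str) -> None:
    """Execute a command inside the container via `singularity exec`."""
    subprocess.run(["singularity", "exec", IMAGE, *cmd], check=True)

if __name__ == "__main__":
    # The identical call works wherever Singularity is installed.
    run_in_container("python3", "--version")
```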
Article
Building and deploying software on high-end computing systems is a challenging task. High performance applications have to run reliably across multiple platforms and environments, and make use of site-specific resources while resolving complicated software-stack dependencies. Containers are a type of lightweight virtualization technology that attempts to solve this problem by packaging applications and their environments into standard units of software that are portable, easy to build and deploy, have a small footprint, and incur low runtime overhead. In this work we present an extension to the container runtime of Shifter that provides containerized applications with a mechanism to access GPU accelerators and specialized networking from the host system, effectively enabling performance portability of containers across HPC resources. The presented extension makes it possible to rapidly deploy high-performance software on supercomputers from containerized applications that have been developed, built, and tested on non-HPC commodity hardware, e.g., the laptop or workstation of a researcher.
Article
The rate of progress in human neurosciences is limited by the inability to easily apply a wide range of analysis methods to the plethora of different datasets acquired in labs around the world. In this work, we introduce a framework for creating, testing, versioning and archiving portable applications for analyzing neuroimaging data organized and described in compliance with the Brain Imaging Data Structure (BIDS). The portability of these applications (BIDS Apps) is achieved by using container technologies that encapsulate all binary and other dependencies in one convenient package. BIDS Apps run on all three major operating systems with no need for complex setup and configuration and, thanks to the comprehensiveness of the BIDS standard, they require little manual user input. Previous containerized data processing solutions were limited to single-user environments and were not compatible with most multi-tenant High Performance Computing systems. BIDS Apps overcome this limitation by taking advantage of the Singularity container technology. As a proof of concept, this work is accompanied by 22 ready-to-use BIDS Apps, packaging a diverse set of commonly used neuroimaging algorithms.
Conference Paper
Application containers enable users to have greater control of their user-space execution environment by bundling application code with all the necessary libraries in a single software package. Lightweight multi-kernels leverage multi-core CPUs to run separate operating system (OS) kernels on different CPU cores, usually a lightweight kernel (LWK) and Linux. A multi-kernel's primary goal is attaining LWK scalability and performance in combination with support for the Linux APIs and environment. Both of these technologies are designed to address the increasing hardware complexity and the growing software diversity of High Performance Computing (HPC) systems. While containers enable specialization of user-space components, the LWK part of a multi-kernel system is also a form of software specialization, but one targeting kernel space. This paper proposes a framework for combining application containers with multi-kernel operating systems, thereby enabling specialization across the software stack. We provide an overview of the Linux container technologies and the challenges we faced in bringing these two technologies together. Results from previous work show that multi-kernels can achieve better isolation than Linux. In this work, we deployed our framework on 1,024 Intel Xeon Phi Knights Landing nodes. We highlight two important results obtained from running at this larger scale. First, we show that containers impose zero runtime overhead even at scale. Second, by taking advantage of our integrated framework, we demonstrate that users can transparently benefit from lightweight multi-kernels, attaining speedups identical to native multi-kernel execution.
Article
Data is a valuable asset for the success of businesses and organizations these days, as it is effectively utilized for decision making, risk assessment, prioritizing goals, and performance evaluation. Extreme reliance on data demands quality assurance and trust in processes. Data provenance is information that can be used to reason about the current state of a data object. Provenance can be broadly described as the information that explains where a data object came from, how it was derived or created, who was involved in its creation, what manipulations were involved, what processes were applied, and so on. It consists of the information that affected the data as it evolved to its present state. Provenance has been widely used to establish the authenticity of data and processes. Despite having such a wide range of uses and applications, provenance poses vexing privacy and integrity challenges: provenance data is itself critical and must be secured. Over the years, a number of secure provenance schemes have been proposed. This paper aims to enhance the understanding of secure provenance schemes and their associated security issues. We discuss why secure provenance is needed, what its essential characteristics are, and what objectives it serves. We describe the lifecycle of secure provenance and highlight how trust is achieved in different domains through its application. First, a detailed taxonomy of existing secure provenance schemes is presented. Then, a comparative analysis of existing secure provenance schemes, highlighting their strengths and weaknesses, is provided. Furthermore, we highlight future trends on which the research community should focus.
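One common integrity mechanism in secure provenance schemes is hash-chaining successive records so that tampering with any entry is detectable; the sketch below is a generic illustration of that idea under assumed field names, not a scheme taken from the survey.

```python
# Hedged sketch of a hash-chained provenance log: each record commits to
# its predecessor's hash, so altering any entry breaks the chain.
# The record layout and field names are illustrative assumptions.
import hashlib
import json

def record_hash(record: dict) -> str:
    """Deterministically hash a provenance record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_record(chain: list, actor: str, action: str, obj: str) -> None:
    """Append a record that commits to the previous record's hash."""
    prev = record_hash(chain[-1]) if chain else "0" * 64
    chain.append({"actor": actor, "action": action, "object": obj, "prev": prev})

def verify_chain(chain: list) -> bool:
    """Re-derive every link; False means some record was altered."""
    return all(
        chain[i]["prev"] == record_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

if __name__ == "__main__":
    chain: list = []
    append_record(chain, "alice", "created", "dataset.csv")
    append_record(chain, "bob", "normalized", "dataset.csv")
    print(verify_chain(chain))          # True
    chain[0]["actor"] = "mallory"       # simulate tampering
    print(verify_chain(chain))          # False: tampering detected
```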
Article
Statement of the problem: Obesity is both multifactorial and multimodal, making it difficult to identify, unravel and distinguish causative and contributing factors. The lack of a clear model of aetiology hampers the design and evaluation of interventions to prevent and reduce obesity. Methods: Using modern graph-theoretical algorithms, we are able to coalesce and analyse thousands of inter-dependent variables and interpret their putative relationships to obesity. Our modelling is different from traditional approaches; we make no a priori assumptions about the population, and model instead based on the actual characteristics of a population. Paracliques, noise-resistant collections of highly-correlated variables, are differentially distilled from data taken over counties associated with low versus high obesity rates. Factor analysis is then applied and a model is developed. Results and conclusions: Latent variables concentrated around social deprivation, community infrastructure and climate, and especially heat stress were connected to obesity. Infrastructure, environment and community organisation differed in counties with low versus high obesity rates. Clear connections of community infrastructure with obesity in our results lead us to conclude that community level interventions are critical. This effort suggests that it might be useful to study and plan interventions around community organisation and structure, rather than just the individual, to combat the nation's obesity epidemic.
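For readers who want to see the shape of this pipeline, the following is a hedged sketch in which plain clique-finding over a thresholded correlation graph stands in for the noise-resistant paraclique algorithm, followed by an ordinary factor analysis; the 0.8 threshold, the synthetic data, and the library choices are all assumptions.

```python
# Hedged sketch of the correlation-clique -> factor-analysis pipeline.
# Ordinary clique finding stands in for the paraclique algorithm; the
# 0.8 correlation threshold and the synthetic data are assumptions.
import networkx as nx
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 12))      # rows: counties, cols: variables
corr = np.corrcoef(data, rowvar=False)

# Join variable pairs whose absolute correlation exceeds the threshold.
g = nx.Graph()
g.add_nodes_from(range(corr.shape[0]))
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[0]):
        if abs(corr[i, j]) > 0.8:
            g.add_edge(i, j)

# Cliques of tightly correlated variables approximate paracliques here.
groups = [c for c in nx.find_cliques(g) if len(c) >= 3]
print("candidate variable groups:", groups)

# Fit a small factor model to expose latent variables behind the groups.
fa = FactorAnalysis(n_components=3, random_state=0).fit(data)
print("factor loadings shape:", fa.components_.shape)
```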