A Scalable Bioinformatics Analysis Platform based on Microservices Architecture

Paper No: SC 11 Smart Computing
S. Rajapaksa, A. Wickramarachchi, V. Mallawaarachchi,
W. Rasanjana, I. Perera, D. Meedeniya
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka
sandunip@cse.mrt.ac.lk
Abstract
With the advancement of technologies, web services
play a significant role in maintaining infrastructure in the
healthcare domain due to the increasing demand for
performance. In such systems, adoption of novel
technologies is necessary to increase productivity and
reduce the burden of maintenance associated with legacy
systems. Microservices architecture has become
prominent in deploying server-side enterprise
applications by allowing maintainable functionalities.
However, it is challenging to utilize microservices in the
domain of bioinformatics, although it enables
independent process execution and maintenance. This
paper introduces the utilization of microservices
architecture to build an optimized platform for
bioinformatics analyses. We present a hybrid architecture
that consists of different hardware platforms to execute
accelerated computational services independently. The
core communication is based on an Application
Programming Interface (API) gateway. Furthermore, the
paper presents the evaluation of results related to the
performance of the proposed solution under varying
biological sequences as inputs and algorithms.
Keywords: Bioinformatics analysis, microservices
architecture, enterprise applications
1. Introduction
Bioinformatics is an emerging discipline that draws upon
the strengths of computer science and mathematics to
analyze information flow in biological systems. It
consists of diverse areas such as genomics and
proteomics, where the subject experts utilize different
technologies for the advances in each field [1]. Many
bioinformatics computations are accessed via web
services, while the underlying technologies differ vastly.
This enables the provision of robust Application
Programming Interfaces (APIs), which can be used by
experts to perform the required computations [2]. Current
research focuses on web technologies, cloud computing,
and distributed and parallel computing infrastructures to
enable responsive and efficient services, considering the
heterogeneous nature of the hardware and software
technologies.
The bioinformatics domain requires software solutions
with high performance and computational capacities.
Developers are motivated to split applications into
small, easily maintainable functional units that can be
replicated on demand [3, 4]. The microservices
architecture abstracts the underlying technologies by
splitting a complete solution into a set of self-contained
services. These can be developed, deployed and scaled
independently, thus enabling language-agnostic
communication with services using JavaScript Object
Notation (JSON) [5]. Thus, microservices architecture
has become popular, emphasizing the design and
development of highly maintainable and scalable
software [6]. Adoption of microservices in
bioinformatics analyses enables the development of a
platform that supports the use of heterogeneous
technologies and can be made available to external users
through an API gateway [2].
Although there are existing computerized
bioinformatics analysis tools and web services, the
efforts made to improve their architecture and
performance using the state-of-the-art microservices
architecture are limited [3]. Thus, the services provided
by vendors such as NCBI [7], Taverna [8] and Galaxy
[9] continue to use traditional software architectural
approaches and deployment methodologies.
Additionally, the use of optimized algorithms in the
existing bioinformatics computational services is
limited, although they provide high throughput, due to
integration incompatibilities at the hardware and
software levels of the technologies [8]. Thus, it is
challenging to overcome the heterogeneity of the
executables and the nature of computations when
providing a unified platform for algorithmic execution.
Smart Computing and Systems Engineering, 2019
Department of Industrial Management, Faculty of Science, University of Kelaniya, Sri Lanka.
This paper presents an approach to create a
bioinformatics platform to support the execution of
different bioinformatics computations with the use of
microservices architecture. The solution introduces a
novel architecture, which utilizes the API gateway
pattern to expose microservices to external users
via a single API, masking the underlying discrepancies.
Section 2 elaborates the literature on the usage of
microservices in bioinformatics computations and
existing services. Section 3 describes the architecture of
the proposed system and Section 4 explains the
methodology. The experimental and evaluation results
are presented in Section 5. Finally, Section 6 concludes
the paper with the inferences obtained from the results
and possible future extensions.
2. Literature review
2.1 Microservices
Microservices are self-contained units of functionality
with loosely coupled dependencies on other services and
are designed, developed, tested and released
independently [10]. They can be reused across many
different solutions and can scale appropriately. Although
the microservices technology was introduced into the
UNIX kernel as early as the 1970s, it has emerged in web
technologies only recently. Shadija et al. [4] have analysed
computational systems for their granularity and
performance and suggested that microservices can
increase the system granularity, thus increasing the
scalability and maintainability. Microservices
deployed over many server nodes incur a delay due to
network overheads, but can perform better when
the execution environments are optimized.
At present, bioinformatics analyses are used for
simulation and testing in different domains such as
genetic studies and drug discovery. With the increasing
need for efficient, maintainable and scalable
computational techniques, Williams et al. [3] have
highlighted the growing need for microservices based
solutions to address bioinformatics computations. The
literature has also highlighted developments in the
related field due to the increased agility of the
microservices architecture with heterogeneous analysis
techniques.
2.2 Containers
A container is a lightweight, stand-alone, executable
package of software and a widely adopted method to
develop microservices. A container includes all the
required artefacts such as the code, runtime, system tools,
libraries and settings [11]. Docker [12] is a well-known
container platform provider. It provides the ability to
package applications along with their dependencies into
lightweight containers that can be easily moved between
different Linux distributions and that start up quickly and
independently. Many related works that emphasize the
importance of using microservices in the domain of
bioinformatics are provided as deployable containers.
CodonGenie [13] is a web-based tool for designing
ambiguous codons that supports protein mutagenesis
applications. It is designed as a microservice so that it
can be integrated into applications through a RESTful web
service API as well as a Docker container. The ChEMBL API
allows data on bioactive molecules with drug-like
properties to be integrated into other applications [14].
Moreover, Khoonsari et al. [15] have introduced a generic
method using the microservices architecture, where software
tools are encapsulated as Docker containers that can be
connected to scientific workflows and executed in
parallel. The approach utilizes containers to provide
microservices and the containers are managed using
Kubernetes [16]. The service caters for functions in
workflows related to the domain of metabolomics data
analysis.
2.3 Microservices in Biology, Medicine and
Bioinformatics
A reaction balancing web service for computational
systems biology has been described by Dobson et al. [17].
They have implemented a RESTful web service that
offers a language-agnostic way of binding services
together. This platform is built using many technologies
in Java and Python. It is a web-based solution,
where message passing is based on JSON
communication. The underlying logic uses many
technologies with different forms of input types. Unlike
in a service-oriented architectural design, the system is
developed as a web service, which connects the
individual components with more maintainability.
Fjukstad et al. [18] have presented a data exploration
application for systems biology using microservices.
They have used microservices due to the
language-agnostic means of communication between the
building components. Although this gives a solution for data
exploration using visualization components such as heat
maps, the other functionalities related to genome analysis
have not been considered. In similar research, Hill et al.
[19] have performed medical data processing by
incorporating IoT devices to gather data and build a
community healthcare system. However, they have not
addressed the bioinformatics related use cases and
complex computational requirements.
Arkas-Quantification and Arkas-Analysis [20] are
cloud-scale RNAseq pipelines, which are versioned into
Docker containers and publicly deployed in Illumina’s
BaseSpace platform [21]. PAPAyA [1] is a cloud-based
framework that provides genomic processing services
for bespoke therapy guidance. It provides diverse
pipelines and tasks (detecting variants, mutations, copy
number variation, differential gene expression and DNA
methylation) as Linux-based containers.
According to the literature, most of the research has
been done using cloud services that provide RESTful
services. Many solutions are presented as individual
services with isolated APIs and should be integrated
through a single API gateway. Thus, an aggregation of a
set of microservices is required for a language-agnostic
execution of subroutines in workflow modelling. This
enables the workflow modelling tool [22] to operate
seamlessly by calling an API gateway that communicates
with the relevant services to complete the workflow.
Although the related work has identified the importance of
microservices in providing robust analytical capabilities,
there has been limited emphasis on the usage of
microservices architecture in bioinformatics workflow design.
3. System architecture
3.1 System components
We describe the architecture of the proposed
microservices based platform to support optimized
bioinformatics workflow design and modelling [22, 23].
It utilizes both central processing units (CPU) and
graphics processing units (GPU) powered service
containers to support optimized execution of algorithms.
The proposed solution provides the end users with an
API, which exposes algorithmic functionality in a
language-agnostic manner using JSON message passing.
The proposed solution consists of three major
components: Web Server, API Gateway and
Microservices, as shown in Figure 1. The service layer
contains highly decoupled service instances that are
wired to the external API using the gateway. The Web
Server enables the functions to be accessed in the form of
a REST API. The requests to the web server are
forwarded to the API gateway for service invocation.
Figure 1. System components of the
microservices based solution.
Since the services run their own REST APIs to
handle and process requests, an API gateway is required
to connect each service and control its access. This
component also manages the addition, removal and
migration of services to support further functionality. A new
service is added to the API gateway by adding
API resources along with resource methods (GET or
POST). The new resources can be configured with path
parameters other than query parameters.
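The resource-and-method registration described above can be illustrated with a minimal routing table. This is only a sketch under stated assumptions: the class names `Route` and `ApiGateway`, the upstream address `10.0.0.5:5001` and the `/run` path are hypothetical and do not come from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Where a microservice listens (hypothetical fields)."""
    host: str
    port: int
    path: str


class ApiGateway:
    """Toy routing table keyed by (HTTP method, resource path)."""

    def __init__(self):
        self._routes = {}

    def add_resource(self, method, resource, route):
        # Adding a service means adding a resource plus its method.
        self._routes[(method.upper(), resource)] = route

    def remove_resource(self, method, resource):
        # Removing an unknown resource is a no-op.
        self._routes.pop((method.upper(), resource), None)

    def resolve(self, method, resource):
        """Return the upstream URL for a request, or None if unknown."""
        route = self._routes.get((method.upper(), resource))
        if route is None:
            return None
        return f"http://{route.host}:{route.port}{route.path}"


gateway = ApiGateway()
# Assumed upstream: a BLAST container listening on port 5001.
gateway.add_resource("POST", "/api/services/blast",
                     Route("10.0.0.5", 5001, "/run"))
print(gateway.resolve("post", "/api/services/blast"))
```

A production gateway would also handle authentication, path parameters and request forwarding; the sketch covers only the lookup step.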
Microservices is the collection of the actual executable
components, which are deployed in several environments that
support CPU and GPU computing for optimal execution
of algorithms. The services with data requirements are
provided with independent databases to retrieve data.
The microservices architecture is adopted due to its
scalability with the addition of services. This allows
distributed execution of bioinformatics related analyses
using a cluster of computing nodes that support both
GPUs and CPUs. In scenarios where databases are
used to perform queries, separate database instances can
be maintained alongside the microservice in its own
container space. Depending on the resource utilization
and the growing demand on an instance, the number of
computing nodes can be increased. Further, extra nodes
to support different technologies and computing
capabilities can be added. The API gateway manages the
addition of services and the routing of requests to those
services.
The deployment of the new services involves the steps
of spinning up a microservice container or a process,
assigning an API resource along with a method and
wiring up the API gateway resource endpoint with the
desired microservice endpoint and port. Services such as
BLAST [24] that require data to be fetched are provided
with independent databases. The services which require
specific hardware configurations run on clusters that
possess such capabilities. The system also includes a web
application that renders visualization components and
must be sent the data once they are processed.
3.2 Methodology
Our solution processes data using two types of
clusters: GPU and CPU clusters. GPU clusters execute
algorithms that exploit GPU-level parallelism, such as
GPU BLAST [25] and SW-CUDA [26], while the CPU
clusters execute algorithms that are not optimized for
GPUs, such as Clustal Omega [27] and T-Coffee [28].
The use of GPUs is highlighted due to the capability of
executing GPU powered containers, which can cater to
many implementations that exploit GPU parallelism.
CPU clusters are configured with different specifications
as demanded by the process. The services that run I/O
intensive operations such as BLAST require higher disk
space allocation and a RAM of nearly 16GB to perform
optimally. However, multiple sequence alignment
algorithms do not require more disk space, but they can
be made to run faster by having a RAM of reasonable
capacity. Section 5 describes the evaluation of the
services for memory footprint in detail.
The GPU computations and CPU computations are
performed separately on the two clusters. The two
clusters are exposed to the external API via
the API gateway.
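The split between the two clusters can be pictured as a lookup from service to cluster type. The following is a minimal sketch, assuming hypothetical service identifiers (`gpu-blast`, `sw-cuda`, `clustal-omega`, `t-coffee`); only the assignment of tools to GPU or CPU clusters follows the text.

```python
# Hypothetical service identifiers; the tool-to-cluster split follows the text.
GPU_SERVICES = {"gpu-blast", "sw-cuda"}
CPU_SERVICES = {"clustal-omega", "t-coffee"}


def select_cluster(service_name):
    """Return "gpu" or "cpu" for a known service, else raise ValueError."""
    if service_name in GPU_SERVICES:
        return "gpu"
    if service_name in CPU_SERVICES:
        return "cpu"
    raise ValueError(f"unknown service: {service_name}")


print(select_cluster("gpu-blast"), select_cluster("t-coffee"))
```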
3.3 Interaction model
The majority of the data processing in our solution
happens asynchronously due to the long
times consumed by heavy computations. Hence, the end
results are provided to users in three ways: (1) HTTP
requests and WebSocket push, (2) Webhook calls and (3)
provision of a Results endpoint. Different services are
configured to use different forms of response methods
depending on the time complexity of the operations. For
example, results of BLAST processes are made available
through a results endpoint and a WebSocket push,
whereas Clustal Omega results are directly sent as HTTP
responses. This is because BLAST results take longer
than a normal HTTP timeout, whereas Clustal Omega
provides results much faster. This avoids
resources being used up by connection waiting times.
1) HTTP requests and WebSocket push
This provides REST call access via HTTP requests
and WebSocket access to the services. With the WebSocket
method, the WebSocket is kept alive throughout the
service execution and data is pushed back to the user. All
the services are designed to support this functionality,
since it can be used by other web-based applications
that use the microservices to obtain a better user
experience.
2) Webhook Call
Webhook Call enables the provision of another API
endpoint to receive the execution results. This is only
used for heavy execution services such as BLAST. The
results will be sent to the provided URL by means of a
POST request once the computations are completed.
3) Results Endpoint
This generates a new endpoint in each microservice,
providing direct access to the results. The results will be
available for a limited time, after which a scheduled task
will clear the results to save the storage space to serve
future demands.
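The time-limited results endpoint can be sketched as a store whose entries are purged by a scheduled task. This is an illustrative sketch only: the class `ResultsStore`, its in-memory dictionary and the `ttl_seconds` parameter are assumptions, since the paper does not state how results are stored or how long they are retained.

```python
import time


class ResultsStore:
    """In-memory results keyed by job ID, each kept for a limited time."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._items = {}  # job_id -> (stored_at, result)

    def put(self, job_id, result, now=None):
        stored_at = time.time() if now is None else now
        self._items[job_id] = (stored_at, result)

    def get(self, job_id):
        """Return the stored result, or None if absent or already purged."""
        entry = self._items.get(job_id)
        return None if entry is None else entry[1]

    def purge_expired(self, now=None):
        """Drop results older than the TTL; return how many were removed."""
        now = time.time() if now is None else now
        expired = [job_id for job_id, (stored_at, _) in self._items.items()
                   if now - stored_at > self.ttl_seconds]
        for job_id in expired:
            del self._items[job_id]
        return len(expired)
```

A scheduled task would call `purge_expired()` periodically; the optional `now` argument exists only to make the behaviour easy to check without waiting.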
4. Methodology
4.1 Hardware arrangement
The system comprises two types of computations that
require CPUs and GPUs. Therefore, two clusters are
utilized in the system to perform relevant executions.
Figure 2 shows the processing units in our solution. The
solution is implemented on top of the two clusters.
Figure 2. Arrangement of processing units.
Table 1 states the specifications of the hardware used
in each cluster. The system consists of a CUDA powered
GPU and a CPU that can run up to 8 threads using
hyperthreading.
Table 1. Hardware specifications.
Specification | CPU | GPU
Model | i7 4770 3.4GHz | GeForce GTX 480
Physical Cores | 4 | 384 (CUDA)
Logical Cores | 8 | 384 (CUDA)
Memory lane size | 8GB | 4GB
4.2 Service access endpoints
The provided services are accessed via the API
gateway. The API gateway can be configured to forward
the incoming requests as new services are added to
the system. Since there are two types of services, namely a
web application and REST services, they are identified
through the API URL.
Service endpoints are accessed via the following
address.
http://192.248.8.242/api/services/<service-name>
Web endpoints are accessed via the following address.
http://192.248.8.242/api/web/<web-application-name>
Figure 3 illustrates the JSON formatted request object
for the BLAST service. The users can send all flags along
with the request to perform a BLAST search as
demonstrated.
{"data": "<SEQUENCE DATA IN FASTA FORMAT>",
 "gpu": "true",
 "threads": "4"
}
Figure 3. Request body to BLAST service.
Figure 4 shows the response object for an HTTP REST
invocation of the BLAST service. The users are provided
with a results endpoint as a results file or a message
indicating the status of the service as queued, in progress
or failed.
{"resultsEndpoint":
 "<DOMAIN>/api/gpu-blast/results/id_1bas98xA"
}
Figure 4. Response JSON for BLAST service.
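A client for the request and response shapes in Figures 3 and 4 could look like the following sketch. It builds the request without sending it. The service name `gpu-blast` in the URL is an assumption inferred from the results path in Figure 4, and the helper names are hypothetical.

```python
import json
import urllib.request

GATEWAY = "http://192.248.8.242/api/services"  # address given in the text


def build_blast_request(fasta, use_gpu=True, threads=4, service="gpu-blast"):
    """Build (but do not send) a POST request shaped like Figure 3."""
    body = {
        "data": fasta,
        "gpu": str(use_gpu).lower(),  # the figure uses string-valued flags
        "threads": str(threads),
    }
    return urllib.request.Request(
        url=f"{GATEWAY}/{service}",  # service name is an assumption
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_blast_response(raw):
    """Extract the results link from a response shaped like Figure 4."""
    return json.loads(raw)["resultsEndpoint"]


req = build_blast_request(">seq1\nMKTAYIAKQR")
print(req.full_url, req.get_method())
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can run without network access.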
4.3 Service execution and process
management
The system consists of several services running in
each cluster, as shown in Figure 2. Thus, a given service
only allows the execution of a single process at a time.
Figure 5 illustrates the execution of requests given to a
service. Initially, all the requests are queued and then
executed sequentially within the computing unit in a
First-In-First-Out (FIFO) manner. A job ID is assigned at
the beginning of execution so that each job's result can be
identified uniquely. This job ID is used to generate the results
link. This approach ensures that the resources of the
system will not be overwhelmed by concurrent
requests. Additionally, it keeps resources available
for all the services running on a given
computing unit. Further, the queue is used to manage the
concurrency so that each service achieves its optimum
throughput at the maximum concurrency it can sustain.
Figure 5. Request processing.
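The queueing behaviour in Figure 5 can be sketched as a single-worker FIFO queue. This is a simplified illustration, not the authors' implementation: the class `ServiceQueue` and the `/api/results` base path are hypothetical, and the sketch assigns the job ID on submission (so the caller can be handed a results link immediately), whereas the text assigns it when execution begins.

```python
import uuid
from collections import deque


class ServiceQueue:
    """Single-worker FIFO queue: one process runs at a time per service."""

    def __init__(self, run_job):
        self._run_job = run_job  # callable that performs the computation
        self._pending = deque()
        self.results = {}        # job_id -> result

    def submit(self, payload):
        """Queue a request and return the ID used in the results link."""
        job_id = uuid.uuid4().hex
        self._pending.append((job_id, payload))
        return job_id

    def run_next(self):
        """Execute the oldest queued job; return its ID, or None if idle."""
        if not self._pending:
            return None
        job_id, payload = self._pending.popleft()
        self.results[job_id] = self._run_job(payload)
        return job_id

    def results_link(self, job_id, base="/api/results"):
        # The base path is a placeholder, not the platform's actual route.
        return f"{base}/{job_id}"
```

Because `run_next` processes exactly one job per call, concurrent requests accumulate in the queue instead of competing for the computing unit.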
Execution of algorithms such as BLAST requires a
separate database to perform the search. Therefore, a
database is maintained independently for that particular
service. Services can be configured to fetch data via FTP
access. For example, the BLAST service can access the
NCBI FTP Site [29].
5. Results and discussion
The system was tested for response times, and its
features were compared to identify the metrics on which it
stands out. We have evaluated the system to compare the
performance of different services running in heterogeneous
environments while being subjected to varying concurrency
levels during high demand. Figure 6 shows the time taken by
the BLAST search service to return results when executing on
CPUs and GPUs for varying lengths of input sequences.
The service searches through the env_nr protein database
from the NCBI BLAST FTP server [29], which consists of
7,007,470 protein sequences with 1,397,713,333
characters.
Figure 6. Execution time of BLAST in CPU and GPU
vs sequence length.
It is evident that the times taken for CPU computations
increase rapidly with the length of the
input sequence, whereas the times taken for the GPU
computations increase only slightly with
the sequence length. Because the time for sequential
(CPU) execution grows rapidly while the time for
GPU execution grows far less steeply, the speedup
increases with sequence length. Therefore, the GPU powered
microservices based solution provides better performance
while saving resources and time for more computational
requests from users.
Figure 7. Service execution time in parallel and
individually vs length of the sequence.
Additionally, the system is tested for the performance
of three Multiple Sequence Alignment (MSA) tools;
Clustal Omega [27], DIALIGN [30] and T-Coffee [28].
Figure 7 shows the times taken for invocation of MSA
services that execute sequentially and concurrently for
different lengths of the sequences. The results were
obtained to ensure that the execution of multiple services
on the same computing unit does not delay
operations due to the process scheduling scheme of the
operating system. The experiment was conducted using 5
sequences from the env_nr protein database, of lengths
varying from 500 to 1000 characters, as inputs for each
MSA tool. According to the performance measures, the
parallel execution required nearly the same time as that
of the process that consumed the most time. Thus, the
parallel execution of independent services does not
hinder the performance of other services running in the
same computing unit.
The proposed solution was further evaluated based on
the ability to handle concurrency. Exhaustive tests were
conducted for the three MSA tools Clustal Omega,
DIALIGN and T-Coffee using up to 10 concurrent
service executions (CSE) for each. We have obtained the
measurements related to the usage of system memory and
consumption of time. This information can be used to
decide on the amount of allowable concurrency per
microservice container and the resources to be allocated
for each of them. The test was conducted using 5
sequences from the env_nr protein database, where each
sequence was 500 characters long.
Figure 8. Times taken by each MSA service vs the
number of CSE.
Table 2. Feature comparison between the implemented solution and existing systems.
Feature | Standalone tools | Web applications | Microservices based solution
Technology | JAVA/C++, CLI programs | Web based | Microservices based
Fault Isolation | Entire system can fail | Entire system can fail | Improved fault isolation
Scalability | Not scalable | Horizontally scalable | Individual services can be scaled at a more granular level.
Real-time processing | No support | No support | Publish-subscribe framework enables real-time data processing
Technology stack support | Whole system is developed using a single technology | Whole system is developed using a single technology | Services can be developed using different technologies suitable for the application.
GPU support | Limited by machine specs | Expensive and inefficient to run all on a single GPU cluster | Can have GPU clusters deployed for specific services
Figure 8 shows the times taken by each of the services
Clustal Omega, T-Coffee and DIALIGN to complete
execution under varying levels of concurrency.
According to the results, increasing the concurrency
degrades the overall performance of the services.
DIALIGN shows a low gradient, whereas the times
observed for Clustal Omega and T-Coffee increase
steeply with the number of CSE.
Clustal Omega shows a 5-fold increase in time at 10 CSE.
However, a concurrency of up to 4 service executions can
be permitted, as the time is then nearly twice
that of a single execution. Similarly, for T-Coffee, a
concurrency of 1 to 3 service executions can be allowed.
For DIALIGN, the maximum concurrency of 10 service
executions can be allowed without degrading the
performance significantly. Thus, the implementation of
the service must use a concurrent queue that limits the
number of active executions at any given time.
Furthermore, the system is tested for the use of
memory with increasing concurrency,
which determines the nature of the physical resources
demanded by the services. Figure 9 shows the maximum
memory used by each service as the concurrency
increases. The memory footprint of DIALIGN is not
affected much by the concurrency.
However, Clustal Omega has linearly increasing
memory usage with concurrency, depicting a positive
correlation. According to the results, the memory usage
for T-Coffee plateaus after a concurrency of 3 CSE.
Although the memory demand increases with
the number of CSE, the memory requirement for
DIALIGN is not high, which agrees with the observed
times. Thus, a maximum concurrency of 10 service
executions can be allowed for this service. In contrast,
Clustal Omega increases its memory usage more than
10 times as the concurrency increases. Thus, a
concurrency level of 1 to 5 service executions can be
safely allowed while limiting the maximum memory to under
1GB. However, considering the time, a
concurrency of 1 to 4 service executions can be
considered ideal. Similarly, for T-Coffee, an ideal
concurrency level of 1 to 2 service executions can be
agreed, given that the maximum memory demand lies around
1GB.
According to the results, for optimal service
operation, Clustal Omega should be limited to a
concurrency of 4 service executions, T-Coffee to 2
service executions and DIALIGN to a maximum
concurrency of 10 service executions. Here, all the
services can be expected to run, each consuming a
maximum memory of 1GB and providing results in less
than 5 seconds of waiting time, which is reasonable for
an HTTP request to complete in an analytical use case.
The evaluation is based on a single operating
environment without using containers; a containerized
deployment may show slightly lower performance due to
the virtualization overheads of the Docker platform.
Figure 9. Maximum memory utilized vs level of
concurrency (CSE).
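The per-service concurrency limits recommended in this section could be enforced with a bounded worker pool. The sketch below is one possible approach and not the authors' implementation: the dictionary keys and the class `LimitedService` are hypothetical, and a thread pool stands in for whatever process management the platform actually uses. Only the limits (4, 2 and 10) come from the evaluation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Limits taken from the evaluation; the service keys are hypothetical.
MAX_CONCURRENCY = {"clustal-omega": 4, "t-coffee": 2, "dialign": 10}


class LimitedService:
    """Runs at most `limit` jobs at once; further jobs wait in the pool queue."""

    def __init__(self, name, run_job):
        self.limit = MAX_CONCURRENCY[name]
        self._run_job = run_job
        self._pool = ThreadPoolExecutor(max_workers=self.limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest number of simultaneous jobs observed

    def _tracked(self, payload):
        with self._lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
        try:
            return self._run_job(payload)
        finally:
            with self._lock:
                self._active -= 1

    def submit(self, payload):
        """Return a Future; the job starts once a worker slot is free."""
        return self._pool.submit(self._tracked, payload)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

The `peak_active` counter is there only to make the limit observable; submissions beyond the limit are queued by the executor in arrival order.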
We have compared the existing bioinformatics
analysis platform architectures and the proposed
microservices solution, as shown in Table 2. Since the
microservices architecture has become popular in
addressing many issues and bottlenecks in the cloud
computing domain, it can also be used in the bioinformatics
domain. Further, the microservices based solution
stands out among other standalone and web-based
solutions in terms of fault tolerance, scalability, real-time
processing and technological flexibility [4].
6. Conclusion and future work
The microservices architecture has become one of the
most popular architectures for cloud computing and web
services. This paper has used a microservices based
architectural pattern for algorithm execution in
bioinformatics platforms. The adopted architecture of the
proposed solution allows the integration of heterogeneous
techniques and provides a hybrid platform with both
GPU and CPU computing units. This enabled a
language-agnostic means of building a robust bioinformatics
platform for computations with the integration of
different technologies while using message passing for
communication.
This work can be extended by integrating more
services and evaluating their performance in different
environments with the use of cloud computing services.
Furthermore, we intend to integrate this solution
architecture into our proposed workflow
modelling tool to execute bioinformatics workflows,
enabling the seamless integration of functions in the form
of independent services.
7. References
[1] F. Andry, N. Dimitrova, A. Mankovic, V. Agrawal, A.
Bder and A. David, “PAPAyA: A Highly Scalable Cloud-
based Framework for Genomic Processing,” in
Proceedings of the 9th International Joint Conference on
Biomedical E ngineering Systems and Technologies
(BIOSTEC 2016), Rome, Italy, 2016, pp. 198-206.
[2] C. Richardson, “API gateway pattern,” [Online],
Available:
http://microservices.io/pattems/apigatew ay.html
[Accessed 25 February 2018].
[3] C. L. Williams, J. C. Sica, R. T. Killen and U. G. J. Balis,
“The growing need for microservices in bioinformatics,”
Jour nal o f Pathalogy Informatics, vol. 7, pp. 45-48, 2016.
[4] D. Shadija, M. Rezai and R. Hill, “Microservices:
Granularity vs. Performance,” in Proceedings of the 10th
Internatio nal Conference on Utility and Cloud
Computing, Austin, Texas, USA, 2017.
[5] J. Thönes, “Microservices,” IEEE Software, vol. 32, no. 1,
2015, pp. 113 - 116.
[6] N. Dragoni, S. Giallorenzo, A. L. Lafuente, M. Mazzara,
F. Montesi, R. Mustafin and L. Safina, “Microservices:
Yesterday, Today, and Tomorrow,” in Present and
Ulterior Software Engineering, Cham, Springer, 2017.
[7] “National Center for Biotechnology Information,”
National Center for Biotechnology Information, U.S.
National Library o f Medicine, [Online], Available:
https://www.ncbi.nlm.nih.gov. [Accessed 4 January
2018].
[8] T. Oinn et al., “Tavema: a tool for the composition and
enactment of bioinformatics workflows,” Bioinformatics,
vol. 20, no. 17, 2004, pp. 3045-3054.
[9] J. G oecks,A. Nekrutenko ,J. Taylor and The Galaxy
Team, “Galaxy: a comprehensive approach for supporting
accessible, reproducible, and transparent computational
research in the life sciences,” Genome Biology, vol. 11,
no. 8, p. R86, 2010.
[10] B. Familiar, Microservices, IoT and Azure: Leveraging
DevOps and Microservice Architecture to deliver SaaS
Solutions, New York, United States: Apress, 2015.
[11] “What is a Container | Docker,” Docker, [Online],
Available: https://www.docker.com/what-container.
[Accessed 22 March 2017].
Smart Computing and Systems Engineering, 2019
Department of Industrial Management, Faculty of Science, University of Kelaniya, Sri Lanka.
[12] D. Merkel, “Docker: lightweight Linux containers for
consistent development and deployment,” Linux Journal,
no. 239, p. Article No. 2, 2014.
[13] N. Swainston, A. Currin, L. Green, R. Breitling, P. J. Day
and D. B. Kell, “CodonGenie: optimized ambiguous
codon design tools,” PeerJ Computer Science, vol. 3, no.
e120, 2017.
[14] M. M. Nowotka, A. Gaulton, D. Mendez, A. P. Bento, A.
Hersey and A. Leach, “Using ChEMBL web services for
building applications and data processing workflows
relevant to drug discovery,” Expert Opinion on Drug
Discovery, vol. 12, no. 8, 2017, pp. 757-767.
[15] P. E. Khoonsari et al., “Interoperable and scalable
metabolomics data analysis with microservices,” bioRxiv
213603, 2017.
[16] “Kubernetes | Production-Grade Container Orchestration,”
[Online], Available: https://kubernetes.io. [Accessed 25
February 2018].
[17] P. D. Dobson, P. Mendes, D. B. Kell and N. Swainston,
“A Metabolic Reaction Balancing Web Service for
Computational Systems Biology,” bioRxiv 187328, 2017.
[18] B. Fjukstad, V. Dumeaux, K. S. Olsen, E. Lund, M.
Hallett and L. A. Bongo, “Building applications for
interactive data exploration in systems biology,” in
Proceedings of the 8th ACM International Conference on
Bioinformatics, Computational Biology and Health
Informatics, Boston, Massachusetts, USA, 2017, pp. 556-561.
[19] R. Hill, D. Shadija and M. Rezai, “Enabling Community
Health Care with Microservices,” in Proceedings of the
16th IEEE International Conference on Ubiquitous
Computing and Communications, Guangzhou, China,
2017.
[20] A. R. Colombo, T. J. Triche Jr and G. Ramsingh, “Arkas:
Rapid reproducible RNAseq analysis,” F1000Research,
vol. 6, no. 586, 2017.
[21] “Welcome - BaseSpace Sequence Hub,” Illumina,
[Online], Available:
https://basespace.illumina.com/home/index. [Accessed 16
January 2018].
[22] A. Welivita, I. Perera, D. Meedeniya, A. Wickramarachchi
and V. Mallawaarachchi, “Managing Complex Workflows
in Bioinformatics - An Interactive Toolkit with GPU
Acceleration,” IEEE Transactions on Nano Bioscience,
2018, (to appear).
[23] A. Welivita, I. Perera and D. Meedeniya, “An Interactive
Workflow Generator to Support Bioinformatics Analysis
through GPU Acceleration,” in Proceedings of the IEEE
International Conference on Bioinformatics and
Biomedicine, Kansas City, MO, USA, pp. 457-462, 2017.
[24] “BLAST: Basic Local Alignment Search Tool,” NCBI,
[Online], Available:
https://blast.ncbi.nlm.nih.gov/Blast.cgi. [Accessed 6
January 2018].
[25] “GPU-Blast | Sahindis,” [Online], Available:
http://archimedes.cheme.cmu.edu/?q=gpublast. [Accessed
3 January 2018].
[26] S. A. Manavski and G. Valle, “CUDA compatible GPU
cards as efficient hardware accelerators for Smith-
Waterman sequence alignment,” BMC Bioinformatics, vol.
9, no. 2, 2008.
[27] “Clustal Omega - fast, accurate, scalable multiple sequence
alignment for proteins,” [Online], Available:
http://www.clustal.org/omega/. [Accessed 12 January
2018].
[28] “T-Coffee Home Page,” [Online], Available:
http://www.tcoffee.org/Projects/tcoffee/. [Accessed 12
January 2018].
[29] “BLAST FTP Site - BLAST Help - NCBI Bookshelf,”
National Center for Biotechnology Information, U.S.
National Library of Medicine, [Online], Available:
https://www.ncbi.nlm.nih.gov/books/NBK62345/.
[Accessed 4 January 2018].
[30] L. Al Ait, Z. Yamak and B. Morgenstern, “DIALIGN at
GOBICS - multiple sequence alignment using various
sources of external information,” Nucleic Acids Research,
vol. 41, no. W1, pp. W3-W7, 2013.