Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), ISSN: 2502-4752
Vol. 2, No. 3, June 2016, pp. 720 ~ 728
DOI: 10.11591/ijeecs.v2.i3.pp720-728
Received March 4, 2016; Revised May 13, 2016; Accepted May 28, 2016
Hybrid Disk Drive Configuration on Database Server
Virtualization
Ferdy Nirwansyah*1, Suharjito2
Magister in Information Technology, Binus Graduate Program, Bina Nusantara University, Jakarta,
Indonesia
*Corresponding author, e-mail: ferdy.nirwansyah@binus.ac.id; suharjito@binus.edu
Abstract
Solid state drive (SSD) is a revolutionary new storage technology. Enterprise storage systems
built entirely on SSD are still very expensive, while hard disk drives (HDD) remain widely used.
This study examines hybrid storage configurations for a virtualized database server by
benchmarking four hybrid storage configurations against four databases (ORACLE, SQL Server,
MySQL and PostgreSQL) on a virtualized Windows Server. The TPC-C and TPC-H benchmarks were
used to find the best-performing of the four configurations. The results indicate that using HDD
as the virtual disk drive for the OS and SSD as the virtual disk drive for the database gives
better performance for both on-line transaction processing (OLTP) and on-line analytical
processing (OLAP) database servers than using SSD as the virtual disk drive for the OS and HDD
as the virtual disk drive for the database. Based on the TPC-C data, OLTP performs best with HDD
as the virtual disk drive for the OS and SSD as the virtual disk drive for the database and
temporary files.
Keywords: Database, High availability, Server virtualization, Hybrid storage
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Databases have become an integral part of our daily life [1]. In the modern business world,
databases support company operations through on-line transaction processing (OLTP). Databases
are also used to assist companies in analysis and decision making, known as on-line analytical
processing (OLAP). A database holds transaction data and the supporting data it needs. The
greater the number of transactions on the application and the database, the more complex the
application and database infrastructure the company must own [2]. Application and database
infrastructure is essential for improving high availability. Performance can be improved through
performance tuning in five areas: first, the server environment, such as mainboard, processor,
RAM, LAN card and others; second, the storage environment; third, the database environment;
fourth, the network environment; and fifth, the desktop computer environment [3].
As technology develops, high availability can be improved with virtualization. Management of
server hardware and software becomes more practical. Available features include management of
the hardware resources given to each OS in a VM (for example, processor, RAM and storage),
monitoring of the resource allocation to each OS, and others.
At the storage level, we recognize several types of disk, namely solid state drives (SSD)
and magnetic drives (SAS and SATA), sometimes called hard disk drives (HDD). SSD is a
revolutionary new storage technology with a positive impact on system and database performance.
Investing in an enterprise storage system that uses SSD exclusively is still very expensive; on
the other hand, many enterprise storage systems still use HDD as legacy systems.
This existing storage situation influences the design of IT infrastructure in enterprise
storage systems. To utilize all storage resources and obtain optimal performance, SSD and HDD
should be combined in one configuration. This also applies to the storage configuration of
database servers. This study builds a hybrid storage configuration for a database on a
virtualization server.
Finding the best possible configuration requires benchmarking. The criteria for a good
performance benchmark are as follows: first, representative; second, relevant; third, portable;
fourth, scalable; fifth, verifiable; and sixth, simple. Several benchmarks are active industry
standards, the most commonly used being TPC and SPEC. This study uses TPC benchmarks for
performance measurement. Because the research focuses on OLTP and OLAP databases, it uses TPC-C
and TPC-H. To complete this research, performance is measured on four databases: ORACLE, SQL
Server, MySQL and PostgreSQL.
2. Related Works
Mao builds a Hybrid Parity-based Disk Array architecture (HPDA) that combines a group of
SSDs and two HDDs to improve the performance and reliability of SSD-based storage systems. In
HPDA, the SSDs (data disks) and part of an HDD (parity disk) form a RAID4 array. Reliability
analysis showed that the reliability of HPDA, in terms of Mean Time to Data Loss (MTTDL), is
higher than that of the HDD or SSD alone. The HPDA prototype implementation and performance
evaluation showed that HPDA significantly outperforms both SSD and HDD [4].
Bassil presents a comparative study of the performance of leading DBMS systems. The systems
tested were MS SQL Server 2008, Oracle 11g, IBM DB2, MySQL 5.5 and MS Access 2010. The test
executed different SQL queries with different levels of complexity on the five DBMSs. This
enabled a head-to-head comparative evaluation of the average execution time, memory usage and
CPU utilization of each DBMS after completion of the test. The results showed that no DBMS had
the best performance in every respect: IBM DB2 was the fastest DBMS but also consumed the most
main memory, while MS Access had lower CPU utilization than any other DBMS [5].
Kim built a system called HybridStore that provides two components: Hybrid-Plan and
HybridDyn. Hybrid-Plan improves capacity planning for administrators with the overall goal of
meeting operational cost budgets. HybridDyn improves performance and lifetime guarantees during
episodes of workload deviation. The framework was implemented and tested to identify its
advantages and disadvantages, and was evaluated in terms of performance and total cost. The
analysis shows that HybridStore's performance approaches that of full SSD storage and exceeds
that of full HDD storage [6].
Bausch, Petrov, and Buchmann observed how the join algorithms available in PostgreSQL
perform differently on SSD and HDD. First, point queries showed a performance improvement of up
to fifty times. Second, range queries performed well on the SSD. Join algorithms behave
differently depending on how well they match the characteristics of the SSD or HDD [7].
Do et al. propose and systematically explore the use of SSDs to improve the performance of
the DBMS buffer manager. They propose three alternatives that differ primarily in how the buffer
manager handles dirty pages that are evicted from the buffer pool. They implemented these
alternatives, as well as another recently proposed algorithm (TAC), in SQL Server, and tested
them using various benchmarks (TPC-C and TPC-H) at several scale factors. Empirical evaluation
showed significant performance improvements of their methods over the HDD configuration (up to
9.4X), and a speedup of up to 6.8X relative to TAC [8].
Another study presents an analytical tool for assessing configurations formed by combining
all kinds of storage resources. The tool was used to analyze logical volume statistics collected
from 120 large production systems. The study showed that a combination of SSD, SCSI, and SATA
devices is in many cases better than using SCSI devices alone in all key aspects: price,
performance and power consumption. This contrasts with other recent studies on smaller
enterprise systems, which were pessimistic about the benefits of SSD in enterprise settings [9].
Jo creates a hybrid copy-on-write (CoW) disk storage that combines SSD and HDD for
consolidated environments. The proposed scheme places the read-only template disk image on the
SSD, while write operations go to the HDD, creating an efficient combination of SSD and HDD in
a consolidated environment. Hybrid CoW storage is clearly beneficial in both performance and
cost effectiveness. The drawback is that it works at the VMware level, so measuring the
performance of each VM is hard. The test results show that hybrid CoW storage performs better
than full HDD storage but still below full SSD storage [10].
Wu and Reddy build a framework by writing a driver on Linux to manage storage capacity in a
hybrid storage system that uses SSD and HDD. The framework combines SSD and HDD in one
configuration. It was implemented and tested to identify its advantages and disadvantages. The
hybrid system was benchmarked against striping, and the authors claim that performance could
rise by up to 50% in some cases [11].
Lee evaluates three different SSD models from Samsung, showing how SSD technology has
advanced to reverse the trend of a widening performance gap between processor and storage. The
study also shows that even a single SSD can outperform RAID 0 with eight 15K-RPM
enterprise-class disk drives on transaction throughput, cost effectiveness and power
consumption [12].
Park presents techniques for increasing the reliability and performance of a new SSD RAID
system. The authors first analyzed the existing RAID mechanism for SSDs and then developed a
methodology adapted to SSD RAID storage. Using trace-driven simulation, they evaluated the
performance of the optimized SSD RAID mechanism. The proposed method achieves SSD reliability
2% higher than existing RAID systems and SSD I/O performance 28% higher than existing RAID
systems [13].
3. Research Method
SSD technology is revolutionizing storage and has the potential to change the architectural
principles of DBMSs [6]. However, SSD is still quite expensive, while enterprise storage systems
still use HDD as legacy systems. This raises questions such as:
1. How to improve performance of database servers with hybrid storage configuration that is
optimal in the server virtualization?
2. How to utilize all the resources that exist in storage without compromising the performance
of the database server?
To take advantage of all existing hard drive resources, we use a hybrid technique. The
hybrid technique configures the virtual drives of a Windows Server that acts as the database
server and runs under VMWare server virtualization. This research implements the hybrid
technique by configuring the virtual disks so that the OS and the database use different
storage. The advantage of applying the hybrid technique at the virtualization level is that it
is more practical and convenient than applying it at the storage level.
The steps of this study are: literature study; installation of the instruments; creating the
databases and loading data into the four databases for TPC-C and TPC-H testing, as well as
changing the configuration scheme of the virtual drives; collecting the test data; evaluating
the performance of the system configurations; and finally drawing conclusions and suggestions.
In the first step, the research begins by determining the background and purpose of the
study and defining its scope. The literature study is done to deepen the understanding of the
hybrid technique of mapping virtual disks to virtual drives on a virtualized Windows Server. In
addition, the literature study surveys the results of hybrid storage techniques from earlier
work.
In the second step, the research instruments are installed: VMWare, storage, Windows
Server, and SQL Server. The hardware for the study was:
1. Intel Modular Server Chassis MFSYS25V2: 14 drive carriers, 1 GbE switch, two power
supplies, two power supply fans. Node I: Intel MFS2600KI Compute Module, 2 x Intel(R)
Xeon(R) E5-2660 CPU 0 @ 2.20GHz 8 Cores, 24 GB DDR3 RAM.
2. 2 pieces of Seagate Savvio HDD 300 GB 10K RPM 2.5" hard drive.
3. 4 pieces of Corsair Force GS 240 GB SSD 2.5" drive.
4. 8 pieces of Seagate Momentus 500 GB 7.2K RPM 2.5" SATA hard drive.
5. Switch hub Cisco SG500-28 28 ports.
First, VMWare is installed on the server. Then the 4 SSDs are set up in RAID 10 as the
virtual disk drive for the Windows Server OS in a VM, and the 8 HDDs in RAID 10 as the virtual
disk drive for the ORACLE, SQL Server, MySQL and PostgreSQL databases. Windows Server is then
installed as a virtual machine (VM) in VMWare, with its virtual disk drive settings using the
configuration provided. After that, HammerDB is installed on Windows Server, followed by the
four databases: ORACLE, SQL Server, MySQL and PostgreSQL. Each database is configured so that it
is directed to the SSD on drive E. Temporary OS and database files are directed to the SSD on
drive F. This preparation is done for testing in the first research. The infrastructure scheme
of the first research can be seen in Figure 1.
Figure 1. First research infrastructure schemes
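The RAID 10 arrays described above can be sized with a short calculation. The following is a minimal sketch, assuming nominal drive capacities, ignoring formatting overhead, and assuming that the 8 HDDs in the array are the 500 GB drives from the hardware list; the function name is hypothetical and not part of the study.

```python
def raid10_usable_gb(disk_count: int, disk_size_gb: int) -> int:
    """Nominal usable capacity of a RAID 10 array.

    RAID 10 stripes across mirrored pairs, so half of the raw capacity
    is available for data.
    """
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return (disk_count // 2) * disk_size_gb


# Assumption: 4 x 240 GB SSD and 8 x 500 GB HDD, as in the hardware list.
ssd_usable = raid10_usable_gb(4, 240)
hdd_usable = raid10_usable_gb(8, 500)
print(f"SSD array: {ssd_usable} GB usable")  # SSD array: 480 GB usable
print(f"HDD array: {hdd_usable} GB usable")  # HDD array: 2000 GB usable
```

Under these assumptions the SSD array offers about a quarter of the usable capacity of the HDD array, which is one reason the placement of the OS, database and temporary files matters.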
In the third step, the databases are created and the test data loaded, and a backup image of
Windows Server is taken. The fourth step is the TPC-C and TPC-H testing. Once it is complete,
the virtual disk drive configuration is changed by restoring the Windows Server image. For the
second research, the hard disks are configured with SSD for the OS and HDD for the database. OS
and database temporary files are directed to the HDD on drive F. The VM is installed from the
earlier backup image. After that, the TPC-C and TPC-H tests are run as in the previous research.
The infrastructure scheme of the second research can be seen in Figure 2.
Figure 2. Second research infrastructure schemes
For the third research, the disks are configured with HDD for the OS and SSD for the
database. OS and database temporary files are directed to the SSD on drive F. After that, the
TPC-C and TPC-H tests are run as in the previous research. The infrastructure scheme of the
third research can be seen in Figure 3.
Figure 3. Third research infrastructure schemes
For the fourth research, the disks are configured with SSD for the OS and HDD for the
database. OS and database temporary files are directed to the HDD on drive F. The TPC-C and
TPC-H tests are run as in the previous research. The infrastructure scheme of the fourth
research can be seen in Figure 4.
Figure 4. Fourth research infrastructure schemes
The method used in this study, from creating the databases to generating the output, relies
on HammerDB. Four databases are examined: ORACLE XE, SQL Server, MySQL and PostgreSQL. Four
configurations of virtual disk drives are tested: first, the virtual disk drive for the OS uses
the SSD and the virtual disk drive for the database uses the HDD, with the Windows and database
temporary files on the HDD; second, the virtual disk drive for the OS uses the HDD and the
virtual disk drive for the database uses the SSD, with the Windows and database temporary files
on the HDD; third, the virtual disk drive for the OS uses the HDD and the virtual disk drive for
the database uses the SSD, with the Windows and database temporary files on the SSD; fourth, the
virtual disk drive for the OS uses the SSD and the virtual disk drive for the database uses the
HDD, with the Windows and database temporary files on the SSD. HammerDB uses two schemes, TPC-C
for OLTP and TPC-H for OLAP.
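The four configurations listed above share one pattern: the OS and the database always sit on opposite media, and the temporary files go to either medium. A minimal sketch of that design space follows; the type and function names are hypothetical, and the sketch deliberately assigns no configuration numbers.

```python
from itertools import product
from typing import List, NamedTuple


class DiskConfig(NamedTuple):
    """Placement of the three virtual disk drives (hypothetical type)."""
    os_disk: str
    db_disk: str
    temp_disk: str


def hybrid_configs() -> List[DiskConfig]:
    """Enumerate layouts where OS and database use opposite media."""
    configs = []
    for os_disk, temp_disk in product(("SSD", "HDD"), repeat=2):
        # The database goes on whichever medium the OS does not use.
        db_disk = "HDD" if os_disk == "SSD" else "SSD"
        configs.append(DiskConfig(os_disk, db_disk, temp_disk))
    return configs


for cfg in hybrid_configs():
    print(f"OS={cfg.os_disk}  DB={cfg.db_disk}  TEMP={cfg.temp_disk}")
```

Two choices of OS medium times two choices of temporary-file medium give exactly four layouts, matching the number of configurations tested.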
Data were collected by recording the test results for the four databases on the four
virtual drive configurations under the two schemes. TPC-C is measured in transactions per minute
(TPM). TPC-H is measured in QphH based on the 22 queries executed. For each TPC scheme, two
experiments were conducted on each disk configuration and database, so this scenario comprises
64 experiments. TPC-C used 5 warehouses with 10 and 50 virtual users, a ramp-up time of 30
minutes and a test duration of 10 minutes. TPC-H used scale factor (SF) 1 with 10 and 100
virtual users and 1 query set. Each test writes its results to log files. On PostgreSQL, TPC-H
queries 17, 20 and 21 were not included in the test because they take a very long time to
execute.
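The count of 64 experiments can be checked by enumerating the test matrix. This is a minimal sketch, assuming that the two experiments per scheme correspond to the two virtual user levels; the constant and function names are hypothetical.

```python
from itertools import product

DATABASES = ("ORACLE", "SQL Server", "MySQL", "PostgreSQL")
CONFIGURATIONS = ("I", "II", "III", "IV")
# Assumption: the two experiments per scheme are the two virtual user levels.
VIRTUAL_USERS = {"TPC-C": (10, 50), "TPC-H": (10, 100)}


def experiment_matrix():
    """List every (database, configuration, scheme, virtual users) run."""
    runs = []
    for db, cfg in product(DATABASES, CONFIGURATIONS):
        for scheme, user_levels in VIRTUAL_USERS.items():
            for users in user_levels:
                runs.append((db, cfg, scheme, users))
    return runs


runs = experiment_matrix()
print(len(runs))  # 64

# Lower bound on TPC-C wall time: 30 min ramp-up plus 10 min measured run,
# excluding image restore and data loading.
TPCC_MINUTES_PER_RUN = 30 + 10
tpcc_runs = [run for run in runs if run[2] == "TPC-C"]
print(len(tpcc_runs) * TPCC_MINUTES_PER_RUN)  # 1280
```

With these parameters the TPC-C runs alone take at least 1280 minutes, a little over 21 hours, before any restore or load time is counted.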
3. Results and Analysis
The four hard drive configurations were tested as OLTP databases using the TPC-C scheme to
compare database performance for each configuration. On ORACLE with 10 virtual users,
configuration III ranked first and configuration I second, with configurations II and IV third
and fourth. Configurations I and III both used HDD for the OS and SSD as storage for the
database. The difference is that the OS and database temporary files were directed to the HDD in
configuration I but to the SSD in configuration III. Configuration III raised performance by
18.87% compared to configuration I. With 50 virtual users, the performance ranking was the same
as with 10 virtual users, and configuration III raised performance by 119.04% over
configuration I.
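The percentages quoted here are consistent with a relative gain over configuration I. The sketch below assumes that reading, since the formula is not stated explicitly; the TPM values are hypothetical and chosen only so that the result equals the 119.04% quoted for ORACLE, and the function names are likewise hypothetical.

```python
def relative_gain_percent(baseline: float, candidate: float) -> float:
    """Percentage by which candidate throughput exceeds the baseline."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate - baseline) / baseline * 100.0


def rank_configs(tpm_by_config: dict) -> list:
    """Order configuration names from highest to lowest TPM."""
    return sorted(tpm_by_config, key=tpm_by_config.get, reverse=True)


# Hypothetical TPM values, NOT measurements from the study.
tpm = {"I": 10_000.0, "II": 7_000.0, "III": 21_904.0, "IV": 6_000.0}

gain = relative_gain_percent(tpm["I"], tpm["III"])
print(f"Configuration III vs I: {gain:.2f}%")
print("Ranking:", rank_configs(tpm))
```

Under this definition, a gain of 119.04% means that configuration III delivers slightly more than twice the throughput of configuration I.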
On SQL Server with 10 virtual users, configuration III slightly outperformed configuration
I, while configurations IV and II were third and fourth. With 50 virtual users, the ranking was
the same as with 10 virtual users: configuration III slightly outperformed configuration I,
followed by configurations IV and II in third and fourth. No significant performance gain of
configuration III over configuration I was seen. At 50 virtual users, performance actually
decreased by 12.14%.
MySQL behaved differently: with 10 virtual users, configuration I outperformed configuration
III by 16.44%, followed by configurations II and IV in third and fourth. With 50 virtual users,
configuration III was superior by 5.55% compared with configuration I, followed by
configurations IV and II.
On PostgreSQL with 10 virtual users, configuration III ranked first with an 18.43% increase
in performance compared with configuration I, with configurations IV and II third and fourth.
With 50 virtual users, configuration III was superior to configuration I by 21.36%, with
configurations II and IV third and fourth. Figure 5 shows the performance of the OLTP database
on each disk configuration.
Figure 5. Graph of TPC-C per Database
The four hard drive configurations were tested as OLAP databases using the TPC-H scheme to
compare database performance for each configuration. On ORACLE with 10 virtual users,
configuration III ranked first with a 5.64% increase in performance over configuration I, with
configurations IV and II third and fourth. With 100 virtual users, configuration III again
ranked first with a 14.02% performance improvement over configuration I, with configurations IV
and II third and fourth.
On SQL Server with 10 virtual users, configuration I outperformed configuration III by
2.59%, while configurations II and IV were third and fourth. With 100 virtual users,
configuration I outperformed configuration III by 4.87%, with configurations II and IV third
and fourth.
On MySQL with 10 virtual users, configuration III outperformed configuration I by a very
slight 1.73%, followed by configurations IV and II in third and fourth. With 100 virtual users,
configuration I was superior to configuration III, followed by configurations IV and II.
On PostgreSQL with 10 virtual users, configuration III ranked first, followed by
configuration I in second and configurations IV and II in third and fourth. With 100 virtual
users, configuration I was superior to configuration III by 3.18%, followed by configurations
IV and II in third and fourth. Figure 6 shows the performance of the OLAP database on each disk
configuration.
Figure 6. Graph of TPC-H per Database
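The TPC-H results are reported as QphH without the formula being given. As a rough sketch, assuming the composite metric defined in the TPC-H specification (the geometric mean of the power and throughput metrics), the calculation could look as follows. The function names and all timings are hypothetical, and whether HammerDB computed the reported figures in exactly this way for this study is not stated.

```python
import math
from typing import Sequence


def power_at_size(scale_factor: float,
                  query_seconds: Sequence[float],
                  refresh_seconds: Sequence[float] = ()) -> float:
    """3600 * SF divided by the geometric mean of the power-test timings."""
    timings = list(query_seconds) + list(refresh_seconds)
    if not timings or any(t <= 0 for t in timings):
        raise ValueError("timings must be positive and non-empty")
    geo_mean = math.exp(sum(math.log(t) for t in timings) / len(timings))
    return 3600.0 * scale_factor / geo_mean


def throughput_at_size(scale_factor: float, streams: int,
                       elapsed_seconds: float,
                       queries_per_stream: int = 22) -> float:
    """Queries completed per hour across all streams, scaled by SF."""
    return streams * queries_per_stream * 3600.0 / elapsed_seconds * scale_factor


def qphh(power: float, throughput: float) -> float:
    """Composite metric: geometric mean of power and throughput."""
    return math.sqrt(power * throughput)


# Hypothetical timings at SF 1: 22 queries of 2 s each, 10 streams in 792 s.
power = power_at_size(1, [2.0] * 22)
throughput = throughput_at_size(1, streams=10, elapsed_seconds=792.0)
print(f"QphH@1 = {qphh(power, throughput):.2f}")
```

Because the power term uses a geometric mean, a single very slow query lowers the score less than it would with an arithmetic mean. Excluding queries, as was done for queries 17, 20 and 21 on PostgreSQL, changes which timings enter the mean, so those scores are not directly comparable with full 22-query runs.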
4. Conclusion and Future Works
Based on the TPC-C data from this research, OLTP performs best in the third configuration,
in which the OS uses the HDD, the database uses the SSD, and the OS and database temporary files
use the SSD. On ORACLE, performance even increased by 119.04% at 50 virtual users compared with
configuration I. Configurations II and IV are not recommended for use as an OLTP database
server. For OLAP, the best hard drive configurations are fairly evenly balanced between
configurations I and III; further testing is required to determine which of the two performs
better. Configurations II and IV are not recommended for use as an OLAP database server. SQL
Server achieved the highest OLAP performance because its standard configuration parameters make
optimal use of storage, memory and processor. PostgreSQL is not recommended for use as an OLAP
database because of limitations in its database engine.
For the best performance in an OLTP database server, a hybrid configuration such as
configuration III can be used. For OLAP, the benchmark data for configurations I and III are
evenly matched, with no significant difference seen. Further testing is needed with a scale
factor (SF) of 10 or more and a larger number of virtual users, but such testing will require
more powerful hardware than was used in this research.
The ratio of SSD to HDD drives used in this study is 1:2. With a 1:1 ratio, configuration
III would achieve a better performance ratio relative to configuration I than in this research.
However, these results will not be relevant when the ratio is raised to 1:3, 1:4 or more,
because as the proportion of hard disks increases, the speed of the HDD storage increasingly
offsets the performance advantage of the SSD storage.
References
[1] T Connolly and C Begg, “Database Systems”. Sixth Edition, Pearson. 2015.
[2] RK Laday, H Sukoco and Y Nurhadryani. “Distributed System and Multimaster Replication Model on
Reliability Optimation Database”. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2015;
13(3): 529–536.
[3] R Schiesser. “IT systems management”. (2nd ed.). Prentice Hall. 2010: 110–126.
[4] B Mao, H Jiang, S Wu, L Tian, D Feng, J Chen, and L Zeng. “Hpda”. ACM Trans. Storage. 2012;
8(1): 1–20.
[5] Y Bassil. “A Comparative Study on the Performance of the Top DBMS Systems”. arXiv Prepr.
arXiv1205.2889. 2012: 20–31.
[6] Y Kim, A Gupta, B Urgaonkar, P Berman and A Sivasubramaniam. “HybridStore: A cost-efficient,
high-performance storage system combining SSDs and HDDs”. IEEE Int. Work. Model. Anal. Simul.
Comput. Telecommun. Syst. - Proc. 2011: 227–236.
[7] D Bausch, I Petrov and A Buchmann. “On the performance of database query processing algorithms
on flash solid state disks”. Proc. - Int. Work. Database Expert Syst. Appl. DEXA. 2011: 139–144.
[8] J Do, D Zhang, JM Patel, DJ DeWitt, JF Naughton and A Halverson. “Turbocharging DBMS buffer
pool using SSDs”. Proc. 2011 Int. Conf. Manag. data - SIGMOD ’11. 2011: 1113.
[9] R Shaull, T Ron and A Littman. “Enterprise Storage Provisioning with Flash Drive”. 2010.
[10] H Jo, Y Kwon, H Kim, E Seo, J Lee and S Maeng. “SSD-HDD-hybrid virtual disk in consolidated
environments”. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes
Bioinformatics). 2010; 2009(6043 LNCS): 375–384.
[11] X Wu and ALN Reddy. “Managing storage space in a flash and disk hybrid storage system”. Proc. -
IEEE Comput. Soc. Annu. Int. Symp. Model. Anal. Simul. Comput. Telecommun. Syst. MASCOTS.
2009: 610–613.
[12] SW Lee, B Moon and C Park. “Advances in flash memory SSD technology for enterprise database
applications”. Proc. 35th SIGMOD Int. Conf. Manag. data SIGMOD 09. 2009; 14(3): 863–870.
[13] K Park, DH Lee, Y Woo, G Lee, JH Lee and DH Kim. “Reliability and performance enhancement
technique for SSD array storage system using RAID mechanism”. 2009 9th Int. Symp. Commun. Inf.
Technol. Isc. 2009. 2009: 140–145.