
Real-Time Syst (2011) 47: 41–71

DOI 10.1007/s11241-010-9111-8

Hardware design of a new genetic based disk scheduling method

Hossein Rahmani · Mohammad Reza Bonyadi · Amir Momeni · Mohsen Ebrahimi Moghaddam · Maghsoud Abbaspour

Published online: 24 November 2010

© Springer Science+Business Media, LLC 2010

Abstract Disk management is an increasingly important aspect of operating systems research and development because it has a great effect on system performance. As the gap between processor and disk performance continues to increase in modern systems, access to mass storage is a common bottleneck that ultimately limits overall system performance. In this paper, we propose a hardware architecture for a new genetic-based real-time disk scheduling method. Also, to obtain a precise simulation, a neural network is proposed to simulate the seek time of disks. Simulation results showed that the hardware implementation of the proposed algorithm outperformed the software implementation in terms of execution time, and outperformed other related works in terms of the number of tasks that miss deadlines and average seeks.

Keywords Disk scheduling · Neural network · Genetic algorithm · Hardware design

1 Introduction

While processing speed in computer systems has increased rapidly over the last two decades, the improvement in access time to stable and reliable mass storage has not kept pace. On average, processing speed tends to increase by 55% per year, while disk access speed increases by only about 7% (Iyer

H. Rahmani · M.R. Bonyadi · A. Momeni · M.E. Moghaddam (✉) · M. Abbaspour

Electrical and Computer Engineering Department, Shahid Beheshti University G.C., Tehran, Iran

e-mail: m_moghadam@sbu.ac.ir

H. Rahmani

e-mail: h.rahmani@mail.sbu.ac.ir

M.R. Bonyadi

e-mail: m_bonyadi@std.sbu.ac.ir

M. Abbaspour

e-mail: Maghsoud@sbu.ac.ir



2001). This causes a large bottleneck in modern computers (Buddhikot et al. 1994; Liu et al. 2005), especially for tasks that have specified deadlines, such as those created in multimedia applications. On the other hand, one of the requirements of real-time applications is a Quality of Service (QoS) guarantee by the operating system (Plagemann et al. 2000). These applications are categorized, based on the strictness of their QoS requirements, as soft or hard real-time applications (Santos et al. 2008). In soft real-time applications such as video/audio playback, the most important QoS requirement is minimizing the number of requests that miss deadlines rather than maximizing the system throughput (Thomas et al. 1996; Ulusoy and Belford 1993).

Real-time disk scheduling algorithms are considered soft real-time, and their goal is to find a feasible schedule with maximal throughput. Failure in servicing real-time disk requests may result in disk buffer overflow or playback jitter (Gemmell and Christoduoulakis 1992). Therefore, due to the relatively slow speed of disks, the role of an efficient disk scheduling algorithm is crucial to guarantee QoS.

In order to satisfy an I/O request, the disk head must first be moved to the specified track and sector. Moving the head between cylinders takes a relatively long time, so in order to maximize the number of I/O requests that can be satisfied, the scheduling policy should try to minimize the movement of the head. Therefore, there is a trade-off between throughput (the average number of requests satisfied per unit time) and response time (the average time between a request arriving and it being satisfied) that should be considered by the scheduling policy.

The disk scheduling problem without real-time constraints has been shown to be NP-complete (Wong 1980) if the seek-time function is not linear. The general real-time disk scheduling problem with a linear seek-time function is also NP-complete (Huang et al. 2005; Lu and Yuan 2007). There are heuristic methods, such as genetic algorithms and ant colony optimization, that find a good (though not necessarily optimal) solution for NP-complete problems; we used a genetic algorithm to solve the disk scheduling problem here. We selected the genetic algorithm because it is simple, it may be implemented with low time complexity, and it is easy to implement in hardware.

Here, we propose a genetic-based method to schedule disk requests. In the proposed method, a novel coding approach is presented, and the genetic operators have been adjusted to obtain the best performance. Because of the simple coding, the genetic operators are simple and fast. To overcome the running-time overhead of the proposed method in the scheduling phase, the algorithm has been implemented in hardware. Also, a neural network has been designed to simulate the seek time of disks. Experimental results were satisfactory and showed that the proposed method worked better than related ones in terms of miss ratio and average seeks.

The rest of the paper is organized as follows: Sect. 2 reviews some related works and Sect. 3 describes the novel contributions of this article. In Sect. 4, the real-time disk scheduling problem is described briefly; in Sect. 5, disk modeling based on a neural network is introduced; and in Sect. 6, the proposed disk scheduling method based on the genetic algorithm is described. In Sect. 7, the hardware architecture of the proposed method is introduced. Section 8 describes the evaluation results and the simulation methodology, and Sect. 9 concludes the paper.



2 Related works

In the FCFS algorithm (Hofri 1980), the disk controller processes the I/O requests in the order in which they arrive. This policy aims to minimize response time with little regard for throughput. Therefore, FCFS decreases the response time, while other traditional disk scheduling algorithms, such as SCAN, C-SCAN, LOOK, C-LOOK, and SSTF (Hofri 1980; John et al. 1991; Chen et al. 1992; Geist and Daniel 1987; Denning 1967; Coffman et al. 1972; Worthington et al. 1994), reduce average seeks and increase throughput.

Group-Sweeping Scheduling (GSS) (Yu et al. 1992, 1993) is another disk scheduling strategy in which requests are served in cycles in a round-robin manner. To reduce disk arm movements, the set of streams is divided into groups that are served in a fixed order, and streams within a group are served according to SCAN.

The mentioned algorithms do not consider the real-time constraints of I/O tasks and are therefore not suitable to be applied directly in a real-time system (Gemmell and Christoduoulakis 1992; Hofri 1980). On the other hand, some other disk scheduling algorithms, such as Earliest-Deadline-First (EDF) (Sohn and Kim 1997), address this issue without considering disk seek time. The EDF policy is optimal only if the tasks are independent (Reddy et al. 2005; Chang et al. 2007); therefore, it is not proper for real-time disk tasks, because in the disk scheduling problem the service time of a task depends on the track location of the previous task (Reddy and Wyllie 1994). Therefore, it seems that to obtain a proper algorithm for real-time disk scheduling, it is better to modify and combine some of these methods.

Earliest-Deadline-SCAN (D-SCAN) (Abbott and Molina 1990) is a modification of the traditional SCAN algorithm that considers request deadlines. In D-SCAN, the track location of the request with the earliest deadline is used to determine the scan direction. In the Feasible-Deadline-SCAN (FD-SCAN) algorithm (Abbott and Molina 1990), the track of the request with the earliest feasible deadline is used to determine the scan direction.

As another combined method, SCAN-EDF is one of the well-known real-time disk scheduling algorithms (Reddy et al. 2005; Tanenbaum 2001). This method improves both disk seek time and miss ratio, and utilizes SCAN to reschedule tasks in a real-time EDF schedule. Since tasks rescheduled in SCAN-EDF should have similar deadlines, its efficiency depends on the number of tasks with similar deadlines. If the tasks do not have similar deadlines, the scheduling result of SCAN-EDF is the same as that of EDF. In the SCAN-EDF algorithm, rescheduling is only possible within a local group of requests. To overcome this problem, Deadline-Modification-SCAN (DM-SCAN) (Chang et al. 1998) suggests the use of scannable groups. In this algorithm, request deadlines are reduced several times during the process of rescheduling to preserve the EDF schedule. Unlike DM-SCAN, Reschedulable-Group-SCAN (RG-SCAN) (Chang et al. 2001) does not require its input disk requests to be sorted by their deadlines. It also forms larger groups without any deadline modification.

In the SCAN-EDF, DM-SCAN and RG-SCAN algorithms, rescheduling is only possible within a local group of requests. Chang et al. (2007) suggest the Global Seek-optimizing Real-time (GSR) disk scheduling algorithm, which groups the EDF input tasks based on their scan direction. These tasks are moved to their suitable groups to



improve system performance in terms of increased disk throughput and a decreased number of requests that miss deadlines. GSR schedules are always feasible if the input real-time disk requests are EDF-feasible, but they may be infeasible if the input schedule is infeasible. Also, this method can be applied to periodic and aperiodic real-time tasks.

There are some genetic disk scheduling methods in the literature; for example, Turton and Arsalan (1995) presented an algorithm that uses the concept of fine-grain parallel genetic algorithms. In the fine-grain implementation, each chromosome is assigned to one processor and the number of processors is equal to the size of the initial population. Therefore, many memory elements are required and its hardware implementation is too expensive. Also, Turton and Arsalan (1995) use order-based crossover, which requires more execution time than single-point crossover, and their method is not concerned with real-time traffic. In another work, the effect of different order-based crossovers on the performance of an anticipatory disk scheduling algorithm tuned by a genetic algorithm has been studied in Selvi and Rajaram (2007). In these genetic-based methods, no solution has been considered to reduce the number of requests that miss deadlines. Moreover, one of the main concerns in genetic algorithms is their convergence time, which makes them inappropriate for real-time systems. To solve this problem, implementing genetic methods in hardware may be considered (Chen et al. 2008; Tachibana et al. 2006, 2007; Lei et al. 2002). Also, in another algorithm, presented by Ökdem and Karaboga (2006), an ant colony optimization (ACO) approach has been used to schedule disk requests. This method works based on the travelling salesman problem (TSP) idea and aims to reduce the response time. However, in this method no solution has been considered to reduce the number of requests that miss deadlines.

3 Novel contributions of this article

Our contributions to the state of the art in disk scheduling methods are summarized as follows:

(1) Disk modeling based on a neural network. In disk scheduling algorithms, calculating the head seek time is neither easy nor precise. Hence, to ease the calculation of hard disk response times, and given the lack of a good mathematical model for all new hard disks, a model based on a neural network is proposed. The proposed approach extracts the seek-time curve of the head moving from a specified track to another by training a neural network. This neural network can be used in disk scheduling to estimate seek time precisely.

(2) Disk scheduling method based on a genetic algorithm. In this article, a method for disk scheduling based on a GA is proposed. The proposed method has a new coding scheme and a penalty function which is utilized in its fitness function. Because of the proposed simple coding scheme, the GA operators (including crossover and mutation) are very simple and fast. In the implementation, single-point crossover with probability Pc has been used to combine chromosomes and produce offspring. Furthermore, the uniform mutation method with probability Pm has been used as the mutation operator, and the roulette wheel algorithm has been used as the selection procedure.



(3) Hardware architecture of the GA-based disk scheduling algorithm. To implement the proposed disk scheduling algorithm, we employed the simple pipeline hardware architecture proposed in Chen et al. (2008), Tachibana et al. (2006), Lei et al. (2002). The proposed architecture contains eight basic modules: Random Number Generator Module (RNGM), Population Update Module (PUM), Selection Module (SM), Crossover Module (CM), Mutation Module (MM), Decoder Module (DM), Seek-time Module (STM) and Fitness Module (FM). To the best of our knowledge, no other researchers have provided a hardware module using a sequential GA for real-time disk scheduling until now; the method in Turton and Arsalan (1995) provides a hardware architecture of a parallel GA for non-real-time tasks. Therefore, designing the mentioned modules in an optimal way is the main novelty of this part.

4 Problem description

Each disk request Ti in a real-time environment is defined by its ready time ri, deadline time di, sector number li, data size bi, and its corresponding track location ai. The ready time is the earliest time at which a disk task can start. The deadline time is the latest time at which a disk task should be completed. The actual starting and completion times of a disk task are called the start time si and the fulfill (finish) time fi, respectively.

Assume that the schedule sequence consists of two sequential tasks Tj and Ti. To serve the disk request Ti, the disk head moves from the previous task's cylinder (aj) to the requested one (ai) at a seek-time cost. Then a rotational latency is spent reaching the desired sector. Finally, the requested data (bi) are transferred from disk to buffer in a transfer time. The service time of task Ti, denoted cj,i, is calculated as follows:

    cj,i = seek_time(|aj − ai|) + rotational_latency(li) + transfer_time(bi)

Also, the start time and finish time of a real-time task Ti with schedule sequence Tj Ti are computed by si = max{ri, fj} and fi = si + cj,i, respectively.
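The service-time relation above, together with si = max{ri, fj} and fi = si + cj,i, can be sketched in Python. The three cost models below are illustrative placeholders only (the paper does not fix them at this point), not measured disk parameters:

```python
# Sketch of c_{j,i} = seek_time(|a_j - a_i|) + rotational_latency(l_i)
#                     + transfer_time(b_i), with s_i = max{r_i, f_j} and
# f_i = s_i + c_{j,i}. All three cost models are hypothetical placeholders.

def seek_time(distance):         # placeholder: 1 ms per cylinder
    return float(distance)

def rotational_latency(sector):  # placeholder: constant half-revolution
    return 4.17

def transfer_time(size):         # placeholder: 0.1 ms per KB
    return 0.1 * size

def service_time(a_prev, a_cur, sector, size):
    return (seek_time(abs(a_prev - a_cur))
            + rotational_latency(sector)
            + transfer_time(size))

def start_finish(ready, finish_prev, a_prev, a_cur, sector, size):
    s = max(ready, finish_prev)                         # s_i = max{r_i, f_j}
    f = s + service_time(a_prev, a_cur, sector, size)   # f_i = s_i + c_{j,i}
    return s, f
```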

In a real-time system, a higher-level real-time task may issue multiple real-time I/O disk requests. The deadline of each I/O disk request is assigned by the higher-level task such that the deadlines satisfy the following equations:

    f_higher level task = s_higher level task + c_higher level task
                          + Σ_{i=1}^{N} (c_IO request i + w_IO request i)    (1)

where N is the number of requests issued by the higher-level task, and for each I/O request i issued by the higher-level task:

    f_higher level task ≤ d_higher level task
    d_IO request i < d_higher level task
    s_IO request i < s_higher level task
    f_IO request i = s_higher level task + c_higher level task ≤ d_IO request i    (2)



where s_I/O task, c_I/O task, f_I/O task, d_I/O task, and w_I/O task stand for the start time, service time, finish time, deadline, and waiting time for service of an I/O request, and c_higher level task, f_higher level task, and d_higher level task stand for the service time of the higher-level task (the time required to process all I/O disk requests and the other processing related to this higher-level task, without considering the service time of the I/O requests), its finish time, and its deadline, respectively. In these equations, we assumed that the CPU processing of a higher-level task is non-preemptive and that a higher-level task has no CPU processing while an I/O disk request is being performed. Also, the disk scheduler does not service any I/O request while the CPU processes the result of an I/O request or other data.

One of the main problems in disk scheduling algorithms is determining the hard disk seek time. Extracting these parameters has been the subject of intense research for more than a decade (Ruemmler and Wilkes 1994; Gim et al. 2008; Worthington et al. 1995; Aboutabl et al. 1997; Schlosser et al. 2005). Despite its importance in performance evaluation, it is hardly possible to build a practically meaningful model due to its complexity. The most widely used model for seek time is the one proposed by Ruemmler and Wilkes (1994). This model suggests that when the seek distance is less than a certain threshold value, the seek time is proportional to the square root of the seek distance; when the seek distance is greater than the threshold, the seek time is linearly proportional to the seek distance. Equation (3) illustrates this model, where the distance d denotes the cylindrical distance, m is the threshold, and the coefficients p, q, r and s are real numbers that depend on the hard disk and are different for each one (p, q, r, s ∈ R):

    seek_time(d) = p + q·√d    if (d ≤ m)
                   r + s·d     if (d > m)        (3)

For example, in an HP 97560 hard disk, the seek time (in milliseconds) with movement distance dj,i = |ai − aj| is calculated as follows (Ruemmler and Wilkes 1994):

    seek_time(dj,i) = 3.24 + 0.4·√dj,i (ms)     dj,i ≤ 383
                      8.00 + 0.008·dj,i (ms)    dj,i > 383        (4)

The coefficients of (4) were obtained by analyzing the HP 97560 hard disk in Ruemmler and Wilkes (1994).
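Equation (4) translates directly into a small Python function; the 3.24/0.4 and 8.00/0.008 coefficients and the 383-cylinder threshold are the HP 97560 values quoted above (the zero-distance branch is our addition, since (4) models actual head movements):

```python
import math

def hp97560_seek_time(d):
    """Seek time in ms for a cylinder distance d, per (4)."""
    if d <= 0:
        return 0.0                        # no head movement (our convention)
    if d <= 383:
        return 3.24 + 0.4 * math.sqrt(d)  # square-root region
    return 8.00 + 0.008 * d               # linear region
```

For instance, a 100-cylinder seek costs 3.24 + 0.4·10 = 7.24 ms, while a 1000-cylinder seek falls in the linear region.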

A seek curve graph displays seek time as a function of seek distance. Different algorithms have been used for extracting seek curves, depending on whether the end goal is accurate disk modeling or high-performance disk request scheduling (Ruemmler and Wilkes 1994; Gim et al. 2008; Worthington et al. 1995; Aboutabl et al. 1997; Schlosser et al. 2005; Chiueh and Huang 2002; Schindler et al. 2002). Because it is hardly possible to build a practically meaningful formula due to the complexity of the seek curve, a neural network based model is proposed here (see Sect. 5).

Definition 1 A schedule t : T1 T2 ··· Ti ··· Tn is called feasible if all real-time disk tasks Ti, for all i = 1, ..., n, satisfy the real-time requirements ri ≤ si and fi ≤ di.

Overall, the goal of the real-time disk scheduling problem is defined as follows:



Consider a set of n real-time disk tasks t = {T1, T2, ..., Ti, ..., Tn}. Finding a feasible schedule t′ : Tw(1) Tw(2) ··· Tw(i) ··· Tw(n) with maximal throughput (or minimum average seeks) is the goal of real-time disk schedulers. The index function w(i), for i = 1 to n, is a permutation of {1, 2, ..., n}.
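For very small n, the problem statement can be checked directly by brute force: enumerate all permutations w, keep only the feasible ones (ri ≤ si and fi ≤ di for every task), and pick one with minimum total seek. The sketch below uses a unit-distance seek model and hypothetical task tuples purely for illustration; it is not the paper's GA, just a reference oracle for tiny instances:

```python
from itertools import permutations

# Each task: (ready, deadline, cylinder, service_overhead_beyond_seek).
# Hypothetical values for illustration only.
tasks = [(0, 20, 5, 3), (0, 6, 3, 3), (0, 30, 6, 3),
         (0, 40, 1, 3), (0, 50, 3, 3)]

def evaluate(order, start_cyl=0):
    """Return (feasible, total_seek) for a permutation of task indexes."""
    head, t, total_seek = start_cyl, 0.0, 0
    for i in order:
        ready, deadline, cyl, svc = tasks[i]
        s = max(ready, t)        # s_i = max{r_i, f_j}
        seek = abs(cyl - head)
        f = s + seek + svc       # f_i = s_i + c_{j,i} (unit-distance seeks)
        if f > deadline:         # deadline miss -> schedule infeasible
            return False, total_seek
        head, t, total_seek = cyl, f, total_seek + seek
    return True, total_seek

# Best feasible permutation by total seek distance (None if none exists).
best = min((p for p in permutations(range(len(tasks))) if evaluate(p)[0]),
           key=lambda p: evaluate(p)[1], default=None)
```

This exhaustive search is O(n!) and only viable for a handful of tasks, which is exactly why the paper turns to a heuristic.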

5 Disk modeling using a neural network

In the disk scheduling procedure, calculating the head seek time in a real experiment is imprecise and time consuming. Hence, to facilitate the calculation of hard disk response times, and given the lack of a good mathematical model for all new hard disks, a model based on an artificial neural network (ANN) is proposed here. In this model, a supervised neural network is trained to imitate the behavior of the hard disk at hand. In the following, the model and its evaluation results are described.

5.1 Disk modeling by ANN

As mentioned, we modeled hard disks by a fast, reliable, and general mathematical model based on neural networks. For all hard disks, a 3 : h(f1) : 1(fo) neural network is proposed. The start position, end position, and movement distance of the disk head are the inputs of the neural network, and the seek time is its output. Although the start and end positions of the head are sufficient as network inputs, the third input has been added to increase the learning speed of the proposed neural network. Figure 1 shows the topology of the proposed network.

The Matlab neural network toolbox (Demuth and Beale 1993) has been used in

implementing the proposed network. Moreover, to find the best and effective number

of hidden neurons and the type of transfer functions, we tested several networks with

Fig. 1 The network topology with 3 inputs, n hidden neurons in one hidden layer and one output neuron



Table 1 Disk parameters of HPC2200A

Cylinders per disk      1449
Tracks per cylinder     19
Sectors per track       72
Sector size             512 bytes
Revolution speed        7200 RPM
Transfer rate           10 MBps

a variable number of hidden neurons and transfer functions on a hard disk whose properties are shown in Table 1.

Indeed, we provided a data set containing 60% of all possible input vectors, randomly generated (each example includes a random start position, a random stop position and their movement distance), applied them to the mentioned hard disk, and recorded the resulting seek times (the time that the hard disk needs to move its head from the start position to the stop position). To do so, we first obtain the mapping of each Logical Block Number (LBN) to its physical location, given by the (cylinder, head, sector) tuple; then, to measure a seek time, we choose a pair of cylinders separated by the desired distance, issue a pair of read commands to sectors in those cylinders, and measure the time between their completions. More details on extracting seek curves from hard disks may be found in Schlosser et al. (2005). These data were fed to the networks for training. To find the best topology for the network, we considered several networks with the following structures:

    net = {3 : h(f1) : 1(fo) | h ∈ {1, 11, 21, ..., 91} ∧ f1, fo ∈ {LogSig, TanSig, PureLin}}    (5)

This equation shows a set of networks with different numbers of neurons in the hidden layer and different transfer functions for the hidden and output layers. In this equation, h is the number of neurons, f1 is the transfer function used in the hidden layer, and fo is the transfer function used in the output layer, where

    PureLin(n) = n    (6)

    LogSig(n) = 1 / (1 + e^(−n))    (7)

    TanSig(n) = (e^n − e^(−n)) / (e^n + e^(−n))    (8)
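Transfer functions (6)-(8) are standard; a direct Python rendering, together with a toy forward pass through a 3 : h(LogSig) : 1(PureLin) network (the weights in the pass are made up, purely to show where the functions sit), is:

```python
import math

def purelin(n):   # (6): identity
    return n

def logsig(n):    # (7): logistic sigmoid
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n):    # (8): hyperbolic tangent
    return (math.exp(n) - math.exp(-n)) / (math.exp(n) + math.exp(-n))

def forward(x, W1, b1, W2, b2):
    """Toy 3 : h(LogSig) : 1(PureLin) forward pass; weights are illustrative."""
    hidden = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return purelin(sum(w * h for w, h in zip(W2, hidden)) + b2)
```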

It is worth mentioning that the networks were trained using the L-M (Levenberg-Marquardt) method (Hagan et al. 1996), with 20% of the data used for testing, 20% for validation and 60% for training. Here, we note that the validation vectors were used to stop training early if the network performance on the validation vectors failed to improve, or remained the same, for a predefined number of epochs (100 in our tests). Test vectors are used as a further check that the network is generalizing well, but they do not have any effect on training. The training phase was terminated when at least one of the following conditions emerged:

• Mu > 1e10



Fig. 2 The errors (9) of the set of networks in (5) versus different transfer functions. The legend of the figure shows the transfer functions used in the hidden and output layers

• 1000 epochs were reached
• 5 successive validation failures occurred
• Gradient < 1e−10

The value of Mu has been elaborated in Hagan et al. (1996). We trained each network 10 times independently, and an error was calculated after the training phase. The error was calculated as follows: we generated all possible head seeks and obtained their corresponding outputs (the seek times of the hard disk). After the training phase was completed in each run, we applied these data and calculated the error of the network using the following formula:

using the following formula:

    E = (1/n) · Σ_{i=1}^{n} |net(testcase_i) − O(testcase_i)|    (9)

The expression E shows the error, and n is the number of all possible head seeks. testcase_i denotes the ith test example, which contains three fields: head start position, head stop position, and the distance between them. net(testcase_i) calculates the output of the network for the input testcase_i. Furthermore, O(testcase_i) is the real (expected) output value for testcase_i. Figure 2 exhibits the errors (9) for the set of networks in (5).
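The error measure (9) is a plain mean absolute error over all test cases; a minimal sketch, with `net` standing for any trained model callable on a test case, is:

```python
def mean_abs_error(net, test_cases, expected):
    """E = (1/n) * sum_i |net(testcase_i) - O(testcase_i)|, as in (9)."""
    n = len(test_cases)
    return sum(abs(net(tc) - o) for tc, o in zip(test_cases, expected)) / n
```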

The x-axis of the graph shows the number of neurons in the hidden layer of the network, while the y-axis shows the mean error over the 60% randomly generated test cases for the hard disk in Table 1. Each curve in the figure exhibits the error of a network whose transfer functions are denoted in the legend of the figure. It is obvious that the network approximated the working function of the hard disk with a mean error of about 0.005 (ms) when the transfer functions LogSig and PureLin were used for the hidden and output layers, respectively. Moreover, the best performance appeared when the number of neurons in the hidden layer was 61 (see Fig. 2). It is worth mentioning that the error for the networks that used PureLin as the transfer function in their hidden layer was not good enough, so they are not included in the figure. Also, the networks that exploited the LogSig function in their output layer did not work well, and their results have not been presented in the figure. Also, Fig. 3 shows




Fig. 3 The mean errors for the network with LogSig and PureLin functions in its hidden and output layers, respectively, with a variable number of hidden neurons. The standard deviation is also shown

the standard deviation (STD) for each case when the transfer functions LogSig and PureLin were used for the hidden and output layers, respectively.

As the figure shows, in addition to the average error for a network with 61 neurons in its hidden layer being very small, the STD of this network is satisfactory (about 0.001). Therefore, we used the 3 : 61(LogSig) : 1(PureLin) network structure to model disk seek times.

To evaluate the performance of the proposed network in calculating seek times, it was also trained on two other disks (HP 97560 and WD AC21000). For this purpose, a training data set containing 60% of all possible input vectors was randomly generated, and the corresponding seek times were extracted from each hard disk (Schlosser et al. 2005). These data sets were used for training the networks. After the training phase, another data set containing all possible head seeks was generated and fed to the trained networks to calculate their errors.

Figures 4(a), (c) and (e) show the expected seek-time functions, and Figs. 4(b), (d) and (f) show the error values (between the real seek times of the mentioned hard disks and the values approximated by the ANN) for all possible head seeks. In Figs. 4(b), (d) and (f), the z-axis shows the error values between the seek time approximated by the proposed ANN and the working functions of the three mentioned hard disks (HPC2200A, HP 97560, and WD AC21000). The x and y axes show the start and end positions, respectively. It is obvious that the error value (which is in ms) is very small in the proposed model in comparison with real simulation results. The average error value for the hard disk HPC2200A was 0.009 (ms), and it was the same for the hard disk HP 97560. The average error for the hard disk WD AC21000 was 0.011 (ms).

As mentioned above, the structure of the neural network, including the number of neurons, was determined based on the described test data and a specific hard disk type (HPC2200A), and this structure is assumed to be a general one, because the seek-time curves of the most widely used hard disks, such as those from Seagate, Hitachi, Samsung, HP, etc., are similar (Gim et al. 2008) (as shown in Figs. 4(a), (c) and (e) for three different hard disks), and only their timing parameters (p, q, r and s in (3)) are different. This assumption has been verified by the results,



Fig. 4 (a), (c) and (e): The expected seek times (left curves) for three different hard disks, HPC2200A, HP 97560 and WD AC21000, respectively. The x and y axes show the start and end tracks, while the z axis shows the seek time. (b), (d) and (f): The error values between the seek times approximated by the proposed ANN and the working functions of the three hard disks HPC2200A, HP 97560 and WD AC21000, respectively. The x and y axes show the start and end tracks, while the z axis shows the error value

because the extracted structure (number of neurons in the hidden layer and transfer functions) was used for two other hard disks (HP 97560 and WD AC21000), and the results were as good as for the first one. It is obvious that for each separate disk, the ANN should be trained independently (we did this for three different hard disks, and the results are reported below).



6 Disk scheduling method based on genetic algorithm

Genetic algorithms (GAs) are commonly used as adaptive approaches that provide a randomized, global search method based on the mechanics of natural selection and genetics. GAs differ from traditional optimization and search procedures in four ways: (1) GAs work with a coded parameter set, not the parameters themselves; (2) GAs search from randomly selected points, not from a single point; (3) GAs use objective function information; and (4) GAs use probabilistic transition rules, not deterministic ones.

In this section, a method for disk scheduling based on a GA is proposed. In the following sub-sections, first a new coding scheme is proposed and evaluated. Then, the population initialization is introduced. Next, a penalty function which is utilized in the fitness function is introduced; afterwards, the genetic operators are described.

6.1 Coding scheme

The proposed coding scheme consists of a set of integer numbers in the interval [0, Nt − 1], where the jth genome in the chromosome is a random integer number in the interval [0, Nt − j − 1], Nt is the number of tasks, and j is the index in the chromosome array. Therefore, the length of each individual is equal to the number of tasks. Figure 5 shows the proposed coding scheme.

As an example, Table 2 shows a disk scheduling problem containing 5 tasks. Figure 6 shows a sample chromosome used to code a five-task scheduling problem such as the one in Table 2. As shown in Fig. 6, the cell values are in the interval [0...4].

Fig. 5 Proposed coding scheme

Table 2 A sample disk scheduling problem; each task requests data on a specified cylinder, and the transfer time of the requested data is specified in the last column

Task    Deadline (ms)    Cylinder number    Data transfer time (ms)
T0      12               5                  3
T1      6                3                  3
T2      12               6                  3
T3      5                1                  3
T4      18               3                  3

Fig. 6 An example of the proposed chromosome structure, which codes a 5-task problem such as the one in Table 2



In fact, each chromosome represents a sequence of numbers between 0 and Nt − 1, and each genome points to the index of an array which includes the unscheduled tasks. Algorithm 1 shows the pseudo code for the decoding procedure.

Algorithm 1 Decodes the input chromosome (C) and prepares the corresponding order of task indexes (T)

Input: Chromosome C
Output: Order T

T := empty sequence ⟨ ⟩
Q := {0, 1, 2, 3, ..., Nt − 1}
For i := 0 to length(C) − 1
    S := C[i] mod length(Q);
    Add Q[S] to the end of T;
    Eliminate the Sth element of Q;
End for
Return T;

In this algorithm, C is an input chromosome which should be decoded to obtain the task order. Table 3 exhibits the results of applying Algorithm 1 to the chromosome in Fig. 6. As mentioned, five tasks are considered for this example. Note that the indexes of all arrays start from 0.

According to the table, the value of T after applying Algorithm 1 to the chromosome in Fig. 6 is ⟨3, 0, 2, 4, 1⟩, which means the tasks should be executed in this order (left to right).
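A Python sketch of Algorithm 1 follows. Note one interpretive assumption: per the Length(Q), S and Q[S] columns of Table 3, the genome value is reduced modulo the current length of Q before indexing, so the sketch includes that modulo step:

```python
def decode(chromosome, num_tasks):
    """Decode a chromosome into a task execution order (Algorithm 1)."""
    Q = list(range(num_tasks))  # indexes of still-unscheduled tasks
    T = []                      # resulting execution order
    for gene in chromosome:
        s = gene % len(Q)       # map the genome onto the shrinking array Q
        T.append(Q.pop(s))      # take the s-th remaining task
    return T
```

Decoding the chromosome ⟨3, 4, 4, 1, 2⟩ (the C[i] column of Table 3) reproduces the order ⟨3, 0, 2, 4, 1⟩.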

Table 4 shows the execution details of the tasks in Table 2 using the resulting sequence from Table 3. For simplicity, we assume that the starting head position is cylinder 0 and

Table 3 Results of applying Algorithm 1 to the chromosome in Fig. 6

i    Q              C[i]    Length(Q)    S    Q[S]    T
0    {0,1,2,3,4}    3       5            3    3       ⟨3⟩
1    {0,1,2,4}      4       4            0    0       ⟨3,0⟩
2    {1,2,4}        4       3            1    2       ⟨3,0,2⟩
3    {1,4}          1       2            1    4       ⟨3,0,2,4⟩
4    {1}            2       1            0    1       ⟨3,0,2,4,1⟩

Fig. 7 Performing single point crossover on two parents to produce two offspring


Table 4 The task parameters when the chromosome in Fig. 6 is used for scheduling the tasks in Table 2

Task   Current head   Current   Seek time   Data transfer   Completion    Missed
       position       time                  time            time
T3     0              0         |1−0| = 1   3               0+1+3 = 4     No
T0     1              4         |5−1| = 4   3               4+4+3 = 11    No
T2     5              11        |6−5| = 1   3               11+1+3 = 15   Yes
T4     5              11        |3−5| = 2   3               11+2+3 = 16   No
T1     3              16        |3−3| = 0   3               16+0+3 = 19   Yes

the seek-time function is defined by (10).

seek_time(|ai−aj|) = |ai−aj|

(10)

As Table 4 shows, if the tasks in Table 2 are scheduled according to the chromosome in Fig. 6, two tasks miss their deadlines and the procedure ends at time 16. It is worth mentioning that when a task is missed, it is not considered in subsequent calculations of the current head position and current time.
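The schedule replay behind Table 4 can be sketched as follows; this is an illustrative re-implementation (names are ours) that uses the linear seek-time model of (10) and skips missed tasks exactly as described above:

```python
def simulate(order, deadlines, cylinders, transfer, start_head=0):
    """Replay a task order; a missed task moves neither the head nor the clock."""
    head, time = start_head, 0
    seeks, missed = [], 0
    for t in order:
        seek = abs(cylinders[t] - head)      # seek-time model of (10)
        completion = time + seek + transfer[t]
        if completion > deadlines[t]:
            missed += 1                      # task skipped: state unchanged
        else:
            head, time = cylinders[t], completion
            seeks.append(seek)
    return time, missed, seeks

# Tasks T0..T4 from Table 2, executed in the order decoded in Table 3
makespan, missed, seeks = simulate(
    [3, 0, 2, 4, 1],
    deadlines=[12, 6, 12, 5, 18],
    cylinders=[5, 3, 6, 1, 3],
    transfer=[3, 3, 3, 3, 3],
)
print(makespan, missed, seeks)  # 16 2 [1, 4, 2]
```

The average seek reported in the text, (1 + 4 + 2)/3 ≈ 2.33, follows from the seeks of the non-missed tasks.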

6.2 Population initialization

In the proposed method, the population is initialized via a random process. Algo-

rithm 2 shows the initialization procedure.

Algorithm 2 Initialize population (Pop)

Input: Number of chromosomes (NOC), Number of tasks (Nt)
Output: Pop (Initialized population)

Consider a two-dimensional array Pop with NOC rows and Nt columns
For i: 1 to NOC
    For j: 0 to Nt − 1
        Pop[i][j] = rand(Nt − j);
Return Pop;

This algorithm gets the number of chromosomes and the number of tasks as its inputs and initializes the population randomly. Indeed, the population (Pop) is an array whose ith row is the ith chromosome, and the jth column of the ith row holds the jth genome of that chromosome. Also, rand(x) is a function that returns a random integer in the interval [0, x − 1].
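A minimal Python sketch of Algorithm 2 (names are ours; random.randrange(x) plays the role of rand(x), both returning an integer in [0, x − 1]):

```python
import random

def init_population(noc, nt):
    """Algorithm 2: genome j of each chromosome is drawn from [0, nt - 1 - j]."""
    return [[random.randrange(nt - j) for j in range(nt)]
            for _ in range(noc)]

pop = init_population(noc=4, nt=5)
print(pop)  # e.g. [[2, 3, 0, 1, 0], [4, 0, 2, 0, 0], ...]
```

Drawing genome j from [0, nt − 1 − j] matches the shrinking pool Q in Algorithm 1, so every initial chromosome decodes directly to a permutation.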

6.3 Fitness function

In this sub-section the proposed fitness function is presented. First, the term “Makespan” is defined, and then the fitness evaluation procedure is introduced.

Makespan is the time at which the last task in the execution order completes its work. For instance, the makespan for the order in Table 4 is 16 because the last completed task (T4) finishes at time 16, and the average seek is (1 + 4 + 2)/3 = 2.33 (missed tasks contribute no seek). Note that the


value of makespan (16 in the mentioned example) is smaller than or equal to the largest deadline among all tasks (18 in the mentioned example).

In the disk scheduling problem, the goal is to find an order for executing the tasks such that the makespan and the number of tasks that miss their deadlines are minimized. Accordingly, a fitness function has been designed that seeks the optimal or near-optimal makespan while accounting for the number of missed deadlines. In this function, the penalty value grows with the number of missed deadlines, so that chromosomes with the same number of missed deadlines but different makespans still receive different fitness values. Equation (11) shows the proposed fitness function

Fitness(C) = makespan(C) + Miss × D_max    (11)

where the term Miss × D_max is the penalty value, Miss is the number of misses that occur in the sequence C, and D_max is the maximum deadline among all tasks in the problem at hand. As (11) shows, the

fitness function is minimized when Miss is zero and the makespan has its minimum value. Indeed, the penalty value is large enough to distinguish a feasible order (one with no miss) from an order with even a single miss, because the makespan is always smaller than or equal to D_max. As a result, schedules with and without missed deadlines are always told apart. By using this function, minimizing the number of missed deadlines maximizes the number of completed tasks. Moreover, minimizing the makespan increases throughput and decreases the average seek. Therefore, the proposed fitness function models increasing throughput, decreasing the miss ratio, and decreasing the makespan simultaneously. It is worthwhile to note that, according to this fitness function, a chromosome with fewer missed deadlines (even with a large makespan) is preferred to a chromosome whose sequence has more missed deadlines (even with a small makespan). As a further illustration, the fitness value of the chromosome in Fig. 6 for the tasks in Table 2 is 16 + 2 × 18 = 52.

Finally, the algorithm will not schedule a request at all if it cannot meet its deadline. In other words, the algorithm effectively performs scheduling and admission control at the same time.
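The fitness computation of (11) is then a one-liner; this sketch assumes the makespan and miss count of a chromosome have already been obtained from its schedule:

```python
def fitness(makespan, misses, d_max):
    """Equation (11): each miss is penalized by the largest deadline D_max."""
    return makespan + misses * d_max

# The example from the text: makespan 16, two misses, D_max = 18
print(fitness(16, 2, 18))  # 52
```

Because makespan ≤ d_max for any schedule, one miss always costs more than any feasible makespan, which is exactly the separation property argued above.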

6.4 Genetic operators

In GA algorithms, the new chromosome, called off springs, are formed by (1) merg-

ing two chromosomes from the current population together using a crossover opera-

tor or (2) modifying a chromosome using mutation operator. Because of the proposed

simple coding scheme, the GA operators (includes crossover and mutation) is very

simple and fast. In the implementation, the single point crossover (Holland 1975)

with probability Pchas been used to combine chromosomes and prepare offspring.

Furthermore, the uniform mutation method (Holland 1975) with probability Pmhas

been used as mutation operator.
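Both operators can be sketched in a few lines of Python (the names and the default probability are illustrative). Since both operators preserve the per-position genome ranges used by Algorithm 2, every offspring still decodes to a valid task order:

```python
import random

def single_point_crossover(p1, p2):
    """Swap the tails of two parents after a randomly chosen cut point."""
    point = random.randrange(1, len(p1))  # cut strictly inside the chromosome
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_mutation(chrom, nt, pm=0.05):
    """With probability pm, redraw a genome from its legal range [0, nt - 1 - j]."""
    return [random.randrange(nt - j) if random.random() < pm else g
            for j, g in enumerate(chrom)]
```

No repair step is needed after either operator, which is what makes the operators cheap to realize in hardware.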

Crossover, the main genetic operator, generates valid offspring by combining features of two parent chromosomes. As shown in Fig. 7, the simple crossover operator (single point crossover) selects a crossover point, which is randomly selected between