An Operating Systems Vade Mecum



An Operating Systems
Vade Mecum
Raphael A. Finkel
University of Wisconsin at Madison
Prentice Hall
Englewood Cliffs, New Jersey 07632
Traditionally, a vade mecum (pronounced ‘‘VAHdee MAYkem’’) is a laboratory manual
that guides the student step by step through complex procedures. Operating systems are
complex mixtures of policy and mechanism, of algorithm and heuristic, and of theoretical
goals and practical experience. This vade mecum tries to unify these diverse points of
view and guide the novice step by step through the complexities of the subject. As a text,
this book is intended for a first course in operating systems at the undergraduate level.
The subject has so many individual parts that its practitioners and teachers often concen-
trate on subareas and ignore the larger concepts that govern the entire subject. I have
tried to rectify that situation in this book by structuring the presentation about the dual
ideas of resource management and beautification.
To unify disparate threads of discussion, I have taken the liberty of introducing names
for recurrent themes and glorifying them with the title ‘‘principles.’’ I hope that this
organization and nomenclature will help the reader to understand the subject better and to
integrate new ideas into the same framework.
Each technical term that is introduced in the text is printed in boldface the first
time it appears. All boldface entries are collected and defined in the glossary. I have
striven to use a consistent nomenclature throughout the book. At times this nomenclature
is at odds with accepted American practice. For example, I prefer to call computer
memory ‘‘store.’’ This name allows me to unify the various levels of the storage hierar-
chy, including registers, main store, swap space, files on disk, and files on magnetic tape.
I also prefer the single word ‘‘transput’’ to the clumsier but more common term
‘‘input/output.’’ Although I found this term jarring at first, I have grown to like it.
Each chapter closes with a section on perspective, suggestions for further reading,
and exercises. The perspective section steps back from the details of the subject and
summarizes the choices that actual operating systems have made and rules of thumb for
distinguishing alternatives. It also fits the subject of the chapter into the larger picture.
The suggestions for further reading have two purposes. First, they show the reader where
more information on the subject discussed in the chapter may be found. Second, they
point to research articles related to the subject that describe actual implementations and
areas of current activity. The exercises also serve two purposes. First, they allow the
student to test his or her understanding of the subject presented in the text by working
exercises directly related to the material. More importantly, they push the student
beyond the confines of the material presented to consider new situations and to evaluate
new policies. Subjects that are only hinted at in the text are developed more thoroughly
in this latter type of exercise.
A course in operating systems is not complete without computer projects. Unfor-
tunately, such exercises require a substantial investment in software. The most success-
ful projects for a first course in operating systems involve implementing parts of an
operating system. A complete operating system can be presented to the class, with well-
defined modules and interfaces, and the class can be assigned the task of replacing
modules with ones of their own design. A less ambitious project has the students first
build a simple scheduler for a simulated machine. After it is completed, it can be
enhanced by adding virtual memory, transput, and other features. If the necessary
software is not available for these assignments, students can be asked to simulate particu-
lar policies or algorithms in isolation from a complete operating system. Several exer-
cises in the book give guidelines for this sort of project.
This second edition of the text differs from the first in several ways. Many figures
have been added, both to clarify individual points and to unify the treatment of different
subjects with similar diagrams. For example, the history of operating system styles is
now adorned with consistent pictures. The nuts and bolts of process switching have been
moved from Chapter 2 to Chapter 1, and a new section on virtual-machine operating sys-
tems has been added. The discussion of page-replacement policies in Chapter 2 has been
enhanced with fault-rate graphs drawn from a simulation. Analysis and simulation are
described near the end of Chapter 2. Chapter 9 on co-operating processes has been
enlarged with a major new section on the communication-kernel approach. The Hys-
teresis Principle has been introduced. Minor errors and inconsistencies have been fixed
throughout the text.
I owe a debt of gratitude to the many people who helped me write this text. The
students of Bart Miller’s operating system class all wrote book reviews for an early draft.
My parents Asher and Miriam and my brothers Joel and Barry devoted countless hours to
a careful reading of the entire text, suggesting clarifications and rewording throughout.
My colleagues Bart Miller, Mary Vernon, and Michael Carey read and criticized indivi-
dual chapters. Michael Scott’s careful reading of Chapter 8 and his keen insight into
language issues in general were of great help. I am also grateful to Charles Shub, Felix
Wu, Aaron Gordon, Mike Litskow, Ianne H. Koritzinsky, Shlomo Weiss, Bryan Rosen-
burg, and Hari Madduri for their helpful comments. This book was prepared using the
Troff program on a Unix operating system. I would have been lost without it. Finally, I
would like to thank my wife, Beth Goldstein, for her support and patience, and my
daughter, Penina, for being wonderful.
Raphael A. Finkel
University of Wisconsin-Madison
chapter 1

Introduction
The development of operating systems has progressed enormously in the last few
decades. Open shop computing has given way to batch processing, which in turn has
yielded to interactive multiprogramming as emphasis shifted first to efficient use of
expensive machines and then to effective shared use. The recent trend to personal
workstations has turned full circle back to the earliest beginnings: a single user associated
with a single machine.
The long and complex journey has not been pointless. As computer use has
expanded, so has the sophistication with which we deal with all of its aspects. Ad hoc
mechanisms have become standardized, then formalized, then analyzed. Standing at the
present and looking back, we have the advantage of seeing the larger picture and of
knowing how originally unconnected ideas fit together and reinforce each other. We
realize that the proper software for a personal workstation has more in common with
large operating systems for interactive multiprogramming than it has with open shop computing.
The future holds the promise of increased use of networking to interconnect com-
puters in various ways. Multiple-user machines will soon be built of communities of
communicating computers. This new theme has already begun to be played. It is an
exciting time to be studying operating systems.
In this book, I have carefully tried to distinguish mechanisms from policies. Mechan-
isms are techniques that perform activities. Policies are rules that decide which activities
to perform. The mechanisms are the ‘‘nuts and bolts’’ of operating systems and often
depend to some extent on the hardware on which the operating system runs.
We will give examples of nuts-and-bolts programming in a high-level language.
Most modern operating systems are written in such languages, reserving assembler
language for those few aspects that cannot be captured in a higher level language. For
the sake of concreteness, we will present programs in a Modula-like syntax. Modula is
closely related to Pascal. (See the Further Reading section at the end of the chapter.)
The examples are all carefully annotated, so it should be possible to follow them even if
you have only a nodding acquaintance with any language in the Algol family, such as
Pascal. In most cases, written algorithms are supplemented with pictures.
You should be able to read this book comfortably if you have had an undergradu-
ate course in data structures and in machine organization. Some sophistication in
mathematics (for example, one semester of calculus) would also be helpful but is not required.
A glossary at the end of the book provides in one place simple definitions of words
that are used repeatedly. When these words appear for the first time in the text, they are
set in boldface type. I have taken some pains to use words consistently in order to make
this book clear. Unfortunately, computer literature often employs the same term to mean
different things in different contexts. (An excellent example is ‘‘heap,’’ which means
‘‘free storage pool’’ in the Pascal context but ‘‘data structure for implementing a priority
queue’’ in the operating systems context.) You will discover that I avoid certain words
altogether. In particular, the word ‘‘system’’ has lost all its power because it has
acquired a multitude of meanings, many of which are fuzzy at best. I treat ‘‘operating
system’’ as a compound word with a well-defined connotation that should become
increasingly clear as you study this book. The terms ‘‘control program’’ or ‘‘monitor’’
would have been preferable, but those terms have already become associated with other
specialized meanings.
This book sorts through a large body of knowledge, covering a continuum from
elementary concepts through advanced ideas. Intended primarily as a text for an under-
graduate course in operating systems, it also introduces topics of interest to graduate-
level students and professionals. In addition, the book presents a continuum from terrible
ideas through workable ideas to great ideas. Often the first way one tries to solve a prob-
lem is easy to understand and easy to implement but works poorly. With some clever-
ness, one finds a better method that is far superior to the first. Then a host of other solu-
tions, mostly variations on the second method, come to mind. These refinements are
often worth mentioning but usually not worth implementing. We call this observation the
Law of Diminishing Returns.
The Latin phrase vade mecum means ‘‘walk with me.’’ Books with this title are
usually laboratory manuals explaining techniques step by step. This vade mecum is dif-
ferent in that it explains the problems and explores various solutions before giving
advice. It can be useful both in the classroom and in the home or office library.
Although each chapter is fairly self-contained, some subjects cannot be easily separated
from others and must be explained, at least in part, before all the related issues have been
covered. If you are a novice, your journey should start at the beginning of the book. If
you are an expert looking for a particular subject, try the glossary first, then the index. If
you are a teacher using this book as a text, feel free to choose which algorithms of the
‘‘sophisticated variation’’ type to cover.
Let’s begin the journey: Vade mecum!
In this book you will find both specific and general information about the structure and
behavior of operating systems. We will start by presenting two very different
‘‘definitions’’ of an operating system. These definitions will serve to introduce the major
concepts that are elucidated in the rest of the book. Our first definition is called the
Resource Principle. The second is the Beautification Principle. We will also intro-
duce the Level Principle, an important structuring concept for operating systems. Here
is the Resource Principle:
Resource Principle
An operating system is a set of algorithms
that allocate resources to processes.
A resource is a commodity necessary to get work done. The computer’s hardware
provides a number of fundamental resources. Working programs need to reside some-
where in main store (the computer’s memory), must execute instructions, and need some
way to accept data and present results. These needs are related to the fundamental
resources of space, time, and transput (input/output). In addition to these fundamental
resources, the operating system introduces new resources. For example, files are able to
store data. Programs might be able to communicate with each other by means of ports
that connect them. Even higher-level resources can be built on top of these, such as
mailboxes used to pass messages between users.
The notion of a process is central to operating systems, but is notoriously hard to
define. To a first approximation, a process is the execution of a program. It is a funda-
mental entity that requires resources in order to accomplish its task, which is to run the
program to completion.
One can picture processes as actors simultaneously performing one-actor plays in a
theater; stage props are their resources. As actors need props, they request them from the
property manager (the operating system resource allocator). The manager’s job is to
satisfy two conflicting goals on behalf of the actors:
to let each actor have whatever props are needed
to be fair in giving props to each actor.
The manager also is responsible to the owners of the theater (that is, the owners of the
computer), who have invested a considerable sum in its resources. This responsibility
also has several goals:
to make sure the props (resources) are used as much as possible
to finish as many plays (processes) as possible.
Another way to look at processes is to consider them as agents representing the interests
of users. When a user wants to compose a letter, a process runs the program that con-
verts keystrokes into changes in the document. When the user wants to mail that letter
electronically, a process (perhaps a new one) runs a different program that knows how to
send documents to mailboxes. In general, processes run programs that help the user. In
order to do their work, they may in turn need help from the operating system for such
operations as receiving keystrokes and placing data in long-term storage. They require
resources such as space in main store and machine cycles. The Resource Principle says
that the operating system is in charge of allocating such resources.
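The Resource Principle can be sketched in a few lines of code. The following toy allocator (an illustration in Python, not code from the text; all names are invented) grants and reclaims units of a resource, refusing requests that cannot be satisfied:

```python
class Allocator:
    """Toy property manager: grants props (resources) to actors (processes)."""

    def __init__(self, resources):
        # resource name -> number of free units
        self.free = dict(resources)
        # (process, resource) -> units currently held
        self.held = {}

    def request(self, process, resource, units=1):
        """Grant the request if enough units are free; otherwise refuse."""
        if self.free.get(resource, 0) >= units:
            self.free[resource] -= units
            key = (process, resource)
            self.held[key] = self.held.get(key, 0) + units
            return True
        return False

    def release(self, process, resource, units=1):
        """Return units so that other processes may use them."""
        key = (process, resource)
        held = self.held.get(key, 0)
        units = min(units, held)
        self.held[key] = held - units
        self.free[resource] = self.free.get(resource, 0) + units
```

A real operating system would block a refused process until units are freed, and would guard against the allocation hazards taken up in Chapter 4.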
3.1 Open shop
Operating systems developed as technology and demand increased. The earliest comput-
ers were massive, extremely expensive, and difficult to use. Individuals (called users)
would sign up for blocks of time during which they were allowed ‘‘hands-on’’ use of the
computer. This situation is depicted in Figure 1.1, which shows a single program, sub-
mitted by the user through a device like a card reader, executing on the machine. The
machine had two major components, its transput devices and its ability to execute a pro-
gram. A typical session on the IBM 1620, a computer in use around 1964, involved
several steps in order to compile and execute a program. First, the user would load the
first pass of the Fortran compiler. This operation involved clearing main store by typing
a cryptic instruction on the console typewriter; putting the compiler, a 10-inch stack of
punched cards, in the card reader; placing the program to be compiled after the compiler
in the card reader; and then pressing the ‘‘load’’ button on the reader. The output would
be a set of punched cards called ‘‘intermediate output.’’ If there were any compilation
errors, a light would flash on the console, and error messages would appear on the con-
sole typewriter. Assuming everything had gone well so far, the next step would be to
load the second pass of the Fortran compiler just like the first pass, putting the intermedi-
ate output in the card reader as well. If the second pass succeeded, the output was a
second set of punched cards called the ‘‘executable deck.’’ The third step was to shuffle
the executable deck slightly, load it along with a massive subroutine library (another 10
inches of cards), and observe the program as it ran.
The output of a program might appear on cards or paper. Frequently, the output
was wrong. To figure out why involved debugging, which often took the form of peek-
ing directly into main store and even patching the program by using console switches. If
there was not enough time to finish, a frustrated user might get a line-printer listing of
main store (known as a dump of store) to puzzle over at leisure. If the user finished
before the end of the allotted time, the machine might sit idle until the next reserved
block of time.

[Figure 1.1 Open shop: a single executing job connected directly to the typewriter, card reader, and printer.]
3.2 Operator-driven shop
The economics of computers made such idle time very expensive. In an effort to avoid
such idleness, installation managers instituted several modifications to the open shop
mechanism just outlined. An operator was hired to perform the repetitive tasks of load-
ing jobs, starting the computer, and collecting the output. This situation is shown in Fig-
ure 1.2. The operator was often much faster than ordinary users at chores such as mount-
ing cards and magnetic tapes, so the setup time between job steps was reduced. If the
program failed, the operator could have the computer produce a dump. It was no longer
feasible for users to inspect main store or patch programs directly. Instead, users would
submit their runs, and the operator would run them as soon as possible. Each user was
charged only for the amount of time the job required.
The operator often reduced setup time by batching similar job steps. For example,
the operator could run the first pass of the Fortran compiler for several jobs, save all the
intermediate output, then load the second pass and run it on all the intermediate output
that had been collected. In addition, the operator could run jobs out of order, perhaps
charging more for giving some jobs priority. Jobs that were known to require a long time
could be delayed until night. The operator could always stop a job that was taking too long.
[Figure 1.2 Operator-driven shop: the operator moves jobs in and output out between the users and the executing job, using the typewriter, card reader, and printer.]
3.3 Offline transput
Much of the operator’s job was mechanical. The next stage of development was to auto-
mate that job, as shown in Figure 1.3. First, input to jobs was collected offline, that is, by
using a separate computer (sometimes called a ‘‘satellite’’) whose only task was the
transfer from cards to tape. Once the tape was full, the operator mounted it on the main
computer. Reading jobs from tape is much faster than reading cards, so less time was
occupied with transput. When the computer finished the jobs on one tape, the operator
would mount the next one. Similarly, output was generated onto tape, an activity that is
much faster than punching cards. This output tape was converted to a line-printer listing offline.
A small resident monitor program, which remained in main store while jobs were
executing, reset the machine after each job was completed and loaded the next one. Con-
ventions were established for cards (or ‘‘card images,’’ as they are called once they are
on tape) to separate jobs and specify their requirements. These conventions were the
rudiments of command languages. For example, one convention was to place an aster-
isk in the first column of control cards, to distinguish them from data cards. The compi-
lation job we just described could be specified in cards that looked like this:
[Figure 1.3 Offline transput: an offline (satellite) computer copies cards to tape; the main computer, under a resident monitor with device control, runs jobs from tape and writes results to tape for offline printing.]
*JOB SMITH The user’s name is Smith.
* PASS CHESTNUT Password to prevent others from using Smith’s account
* OPTION TIME=60 Limit of 60 seconds
* OPTION DUMP=YES Produce a dump if any step fails.
*STEP FORT1 Run the first pass of the Fortran compiler.
* OUTPUT TAPE1 Put the intermediate code on tape 1.
* INPUT FOLLOWS Input to the compiler comes on the next cards.
... Fortran program
*STEP FORT2 Run the second pass of the Fortran compiler.
* OUTPUT TAPE2 Put the executable deck on scratch tape 2.
* INPUT TAPE1 Input comes from scratch tape 1.
*STEP LINK Link the executable with the Fortran library.
* INPUT TAPE2 First input is the executable.
* INPUT TAPELIB Second input is a tape with the library.
* OUTPUT TAPE1 Put load image on scratch tape 1.
*STEP TAPE1 Run whatever is on scratch tape 1.
* OUTPUT TAPEOUT Put output on the standard output tape.
* INPUT FOLLOWS Input to the program comes on the next cards.
... Data
The resident monitor had several duties:
to interpret the command language
to perform rudimentary accounting
to provide device-independent input and output by substituting tapes for cards.
The running program could deal with tapes directly, but as a convenience, the resident
monitor provided a few subroutines for reading from the current input tape and writing to
the current output tape.
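The asterisk convention can be demonstrated with a tiny classifier. This hypothetical sketch (Python; the function name is invented) separates control cards from data cards and splits a control card into its keyword and argument:

```python
def parse_card(card):
    """Classify one card image using the asterisk convention.

    Control cards carry an asterisk in column 1; everything else is data.
    Returns ("control", keyword, argument) or ("data", text).
    """
    if card.startswith("*"):
        body = card[1:].strip()          # drop the asterisk and padding
        keyword, _, rest = body.partition(" ")
        return ("control", keyword, rest.strip())
    return ("data", card)
```

A resident monitor would loop over the card images on the input tape, interpreting control cards itself and handing data cards to the running job step.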
3.4 Spooling systems
Next, transput units were designed to run at the same time the computer was computing.
They generated an interrupt when they finished reading or writing a record instead of
requiring the computer to monitor their progress. An interrupt causes the computer to
save some critical information (such as its current program counter) and to branch to a
location specific for the kind of interrupt. Device-service routines, known as device
drivers, were added to the resident monitor to deal with these interrupts.
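The dispatch step can be modeled in a few lines. In this sketch (Python; in reality the hardware performs the save and branch), an interrupt vector maps each kind of interrupt to its registered device driver:

```python
# Interrupt vector: kind of interrupt -> device-driver routine.
interrupt_vector = {}

def register_driver(kind, driver):
    """Install a device driver for one kind of interrupt."""
    interrupt_vector[kind] = driver

def interrupt(kind, saved_pc, data):
    """Model of an interrupt: the program counter is saved, control
    branches to the driver for this kind of interrupt, and execution
    later resumes at the saved location."""
    driver = interrupt_vector.get(kind)
    if driver is None:
        raise RuntimeError(f"no driver registered for interrupt kind {kind!r}")
    driver(data)
    return saved_pc  # where the interrupted computation resumes
```

The point of the structure is separation: the resident monitor owns the vector, and each driver knows only its own device.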
Disks were introduced as a secondary storage medium. Now the computer could
be computing one job while reading another onto the disk and printing the results of a
third from the disk. Unlike a tape, the disk allowed programs to be stored anywhere, so
there was no need for the computer to execute jobs in the same order in which they were
entered. A primitive scheduler was added to the resident monitor to sort jobs based on
priority and amount of time needed, both of which could be specified on control cards.
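Such a primitive scheduler might order jobs as in the following sketch (Python; the job fields are invented for illustration): higher priority first, and among equal priorities, shorter estimated time first:

```python
def schedule(jobs):
    """Sort jobs the way a primitive spooling scheduler might.

    Each job is a tuple (name, priority, estimated_seconds), both of the
    latter taken from control cards.  Higher priority runs earlier;
    ties go to the job expected to finish sooner.
    """
    return sorted(jobs, key=lambda job: (-job[1], job[2]))
```

Because jobs live on disk rather than tape, the monitor is free to run them in this computed order instead of arrival order.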
The operator was often retained to perform several tasks.
to mount data tapes needed by jobs (also specified on control cards)
to make policy decisions, such as which priority jobs to run and which to hold
to restart the resident monitor when it failed or was inadvertently destroyed by the
running job.
This mode of running a computer was known as a spooling system, and its resident mon-
itor was the start of modern operating systems. (The word ‘‘spool’’ originally stood for
‘‘simultaneous peripheral operations on line,’’ but it is easier to picture a spool of thread,
where new jobs are wound on the outside, and old ones are extracted from the inside.)
One of the first spooling systems was HASP (the Houston Automatic Spooling Program),
an add-on to OS/360 for the IBM 360 computer family. A spooling system is shown in
Figure 1.4.
Spooling systems prevent users from fiddling with console switches to debug and
patch their programs. The era of spooling systems introduced the long-lived tradition of
the users’ room. The users’ room was characterized by long tables, often overflowing
with oversized fan-fold paper, and a quietly desperate group of users, each politely ignor-
ing the others, drinking copious amounts of coffee, buying junk food from the vending
machines, and staring bleary-eyed at the paper.
[Figure 1.4 Spooling system: a resident monitor and scheduler use device control and interrupts to connect the executing job with the card reader, disks, printers, and tapes.]

3.5 Batch multiprogramming
Spooling systems did not make efficient use of all of their resources. The job that was
currently running might not need the entire main store. A job that performed transput
would idle the computer until the transput was finished. The next improvement was the
introduction of multiprogramming, a scheme in which more than one job is active
simultaneously. We show this situation in Figure 1.5. While one job is waiting for a
transput operation to complete, another can compute. With luck, no time at all is wasted
waiting for transput.

[Figure 1.5 Batch multiprogramming: several jobs are active at once; service calls cross the process interface into the kernel, while interrupts and device control cross the device interface.]

The more jobs run at the same time, the better. However, a
compute-bound job (one that performs little transput but much computation) could
easily prevent transput-bound jobs (those that perform mostly transput) from making
progress. Competition for the time resource and policies for allocating it are the main
theme of Chapter 2.
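A standard back-of-the-envelope model (not from this text) quantifies the benefit: if each job independently waits for transput a fraction p of the time, all n jobs are waiting simultaneously with probability p^n, so CPU utilization is roughly 1 - p^n:

```python
def cpu_utilization(p_wait, n_jobs):
    """Rough CPU utilization under multiprogramming.

    p_wait: fraction of time each job spends waiting for transput.
    n_jobs: number of jobs active at once, assumed independent.
    Returns the probability that at least one job can compute.
    """
    return 1.0 - p_wait ** n_jobs
```

The independence assumption is generous, but the model captures why adding jobs helps most when they are transput-bound.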
Multiprogramming also introduces competition for space. The number of jobs that
can be accommodated at one time depends on the size of main store and the hardware
available for dividing up that space. In addition, jobs must be secured against inadvertent
or malicious interference or inspection by other jobs. It is more critical now that the
resident monitor not be destroyed by errant programs, because not one but many jobs will
suffer if it breaks. In Chapter 3, we will examine policies for space allocation and how
each of them provides security.
The form of multiprogramming we have been describing is often called batch mul-
tiprogramming because each job batches together a set of steps. The first might be com-
pilations, the next a linking step to combine all the compiled pieces into one program,
and finally the program would be run. Each of these steps requires the resident monitor
to obtain a program (perhaps from disk) and run it. The steps are fairly independent of
each other. If one compilation succeeds but another fails, it is not necessary to recompile
the successful program. The user can resubmit the job with one of the compilations omitted.
Since job steps are independent, the resident monitor can separate them and apply
policy decisions to each step independently. Each step might have its own time, space,
and transput requirements. In fact, two separate steps of the same job can sometimes be
performed at the same time! The term process was introduced to mean the entity that
performs a single job step. The scheduler creates a new process for each job step. The
process will terminate when its step is finished. The operating system (as the resident
monitor may now be called) keeps track of each process and its needs. A process may
request assistance from the kernel by submitting a service call across the process inter-
face. Executing programs are no longer allowed to control devices directly; otherwise,
they could make conflicting use of devices and prevent the kernel from doing its job.
Instead, processes must use service calls to access devices, and the kernel has complete
control of the device interface.
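The division between the process interface and the device interface can be sketched as follows (Python; the class and call names are invented). Processes never touch device state directly; their only entry point is a service call, which the kernel checks and performs on their behalf:

```python
class Kernel:
    """Sketch: only the kernel touches the device interface; processes
    cross the process interface by submitting service calls."""

    def __init__(self):
        # Kernel-private device state; invisible to processes.
        self._devices = {"printer": []}

    def service_call(self, process, call, *args):
        """The single entry point across the process interface."""
        if call == "write":
            device, record = args
            if device not in self._devices:
                raise ValueError(f"no such device {device!r}")
            self._devices[device].append((process, record))
            return len(self._devices[device])
        raise ValueError(f"unknown service call {call!r}")
```

Because every device operation funnels through one routine, the kernel can serialize conflicting uses and account for them, which is exactly what direct device access would prevent.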
Granting resources to processes is not a trivial task. A process might require
resources (like tape drives) at various stages in its execution. If a resource is not avail-
able, the scheduler might block the process from continuing until later. The scheduler
must take care not to block any process forever. Chapter 4 deals with the issues raised by
allocation of resources like tape drives that can be reused after one process finishes but
should not be taken from a process while it is running.
Along with batch multiprogramming came new ideas for structuring the operating
system. The kernel of the operating system is composed of routines that manage central
store, time, devices, and other resources. It responds both to requests from processes
and to interrupts from devices. In fact, the kernel runs only when it is invoked either
from above, by a process, or below, by a device. If no process is ready to run and no
device needs attention, the computer sits idle.
Various activities within the kernel share data, but they must not be interrupted
when the data are in an inconsistent state. Mechanisms for concurrency control were
developed to ensure that these activities do not interfere with each other. Chapter 8 intro-
duces the mutual-exclusion and synchronization problems associated with concurrency
control and surveys the solutions that have been found for these problems.
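A modern user-level analogue of such concurrency control is a mutual-exclusion lock. The sketch below (using Python's threading module as a stand-in for the kernel mechanisms surveyed in Chapter 8) ensures that no activity observes the shared counter in a half-updated state:

```python
import threading

counter = 0                      # shared data inside the "kernel"
lock = threading.Lock()          # guards every access to counter

def kernel_activity(increments):
    """One kernel activity updating shared data.  Holding the lock
    during each update gives mutual exclusion: no other activity can
    see or change the counter mid-update."""
    global counter
    for _ in range(increments):
        with lock:
            counter += 1

def run(n_activities=4, increments=10_000):
    """Start several activities at once and wait for them all."""
    threads = [threading.Thread(target=kernel_activity, args=(increments,))
               for _ in range(n_activities)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Without the lock, two activities could read the same old value and each write back value + 1, losing an update; with it, every increment survives.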
3.6 Interactive multiprogramming
The next step in the development of operating systems was the introduction of interac-
tive multiprogramming, shown in Figure 1.6. The principal user-oriented transput dev-
ice changed from cards or tape to the interactive terminal. Instead of packaging all the
data that a program might need before it starts running, the interactive user is able to sup-
ply input as the program wants it. The data can depend on what the program has pro-
duced so far.
Interactive computing is sometimes added into an existing batch multiprogram-
ming environment. For example, TSO (‘‘timesharing option’’) was an add-on to the
OS/360 operating system. In contrast, batch is sometimes added into an existing interac-
tive environment. Unix installations, for example, often provide a batch service.
Interactive computing caused a revolution in the way computers were used.
Instead of being treated as number crunchers, they became information manipulators.
Interactive text editors allowed users to construct data files online. These files could
represent programs, documents, or data. Instead of speaking of a job composed of steps,
interactive multiprogramming (also called ‘‘timesharing’’) deals with sessions that last
from initial connection (logon) to the point at which that connection is broken (logoff).
[Figure 1.6 Interactive multiprogramming: users at terminals cross the user interface; processes make service calls across the process interface; the kernel exercises device control and fields interrupts across the device interface to disks, printers, tapes, terminals, and networks linking other computers.]
During logon, the user typically gives two forms of identification: a name and a pass-
word. (The password is not echoed back to the terminal, or is at least blackened by over-
striking garbage, to avoid disclosing it to onlookers.) These data are converted into a
user identifier that is associated with all the processes that run on behalf of this user and
all the files they create. This identifier helps the kernel decide whom to bill for services
and whether to permit various actions such as modifying files. (We discuss file protec-
tion in Chapter 6.)
During a session, the user imagines that the resources of the entire computer are
devoted to this terminal, even though many sessions may be active simultaneously for
many users. Typically, one process is created at logon time to serve the user. That first
process may start others as needed to accomplish individual steps. This main process is
called the command interpreter. The command interpreter and other interactive facili-
ties are discussed in Chapter 7, which discusses the general subject of the user interface.
The development of computing strategies has not ended. Recent years have seen
the introduction of personal computers. The operating systems of these machines often
provide for interactive computing but not for multiprogramming. CP/M is a good exam-
ple of this approach. Other operating systems provide multiprogramming as well as
interaction and allow the user to start several activities and direct attention to whichever
is currently most interesting.
3.7 Distributed computing
The newest development in operating systems is distributed computation. Computers
can be connected together by a variety of devices. The spectrum ranges from tight cou-
pling, where several computers share main storage, to very loose coupling, where a
number of computers belong to the same international network and can send one another
messages. Chapter 9 discusses inter-process communication and other issues that
become especially important in the distributed-computing domain.
We have seen the Resource Principle as a way to define operating systems. An equally
important definition is the Beautification Principle:
Beautification Principle
An operating system is a set of algorithms that
hide the details of the hardware
and provide a more pleasant environment.
Hiding the details of the hardware has two goals.
Security. We have already seen that the operating system must secure itself and
other processes against accidental or malicious interference. Certain instructions
of the machine, notably those that halt the machine and those that perform transput,
must be removed from the reach of processes. Modern hardware provides several
processor states that restrict the use of instructions. For example, some architec-
tures provide two states, called privileged state and non-privileged state.
Processes run in non-privileged state. Instructions such as those that perform
transput and those that change processor state cause traps when executed in non-
privileged state. These traps force the processor to jump to the operating system
and enter privileged state. The operating system runs in privileged state. All
instructions have their standard meanings in this state. As we will see in Chapter
3, the operating system can restrict access to main store so that processes may not
access all of it.
Abstraction. Operating systems, like other software components, construct
higher-level (virtual) resources out of lower-level (physical) ones. Details of the
lower-level structures are hidden, and higher-level structures are introduced. From
the point of view of a process, the physical machine is enhanced by the operating
system into a virtual machine. Enhancement includes both simplicity (hiding
details) and function (introducing new structures). Neither time (ability to execute)
nor space (main store) appears to be shared with other processes. The virtual
machine is thus simpler than the physical machine. The process interface provides
extra instructions that improve on the basic hardware instruction set, particularly
with respect to transput. The virtual machine is thus more functional than the phy-
sical machine.
From the point of view of the user, the operating system provides services that
are not present in the underlying machine. These services include loading and run-
ning programs, providing for interaction between the user and the running pro-
grams, allowing several programs to run at the same time, maintaining accounts to
charge for services, storing data and programs, and participating in networks of
other computers.
An important example of the beautification role of the operating system is found in
transput services. Transput devices are extremely difficult to program efficiently and
correctly. Most operating systems provide device drivers that perform transput opera-
tions on behalf of processes. These drivers also ensure that two processes do not
accidentally try to use the same device at once. The operations that are provided are
often at a much higher level than the device itself provides. For example, device-
completion interrupts might be hidden; the operating system might block processes that
perform transput until the transfer completes. Chapter 5 is devoted to a discussion of
transput devices and how they are manipulated by the operating system. An abstract file
structure is often imposed on the data stored on disk. This structure is higher-level than
the raw disk. Chapter 6 describes files and how they are implemented.
Before we study how operating systems manage resources such as time and space, we
must first lay some foundations. In particular, you must understand how an operating
system represents processes and how it switches between them. The core of the operat-
ing system is the kernel, a control program that reacts to interrupts from external devices
and to requests for service from processes. We have depicted the kernel in Figures 1.5
and 1.6. The kernel is a permanent resident of the computer. It creates and terminates
processes and responds to their requests for service.
5.1 Context blocks
Each process is represented in the operating system by a collection of data known as the
context block. The context block includes such information as the following.
state and scheduling statistics (described in Chapter 2)
use of main and backing store (described in Chapter 3)
other resources held (described in Chapter 4)
open transput devices (described in Chapter 5)
open files (described in Chapter 6)
accounting statistics
We will single out several important pieces of information. Here is a Modula
declaration that will serve our purpose:
 1 const
 2     MaxNumProcesses = 10; { the number of processes we are
 3         willing to let exist at any one time }
 4     NumRegisters = 16; { the number of registers this computer has }
 5 type
 6     ContextBlockType = { per-process information }
 7         record
 8             { state vector information }
 9             ProgramCounter : Address; { execution address of the program }
10             ProcessorState : integer; { state of processor, including
11                 such information as priority and mode. Depends on the
12                 computer hardware. }
13             Registers : array 1:NumRegisters of integer;
14             { other information here }
15         end; { ContextBlockType }
16 var
17     ContextBlocks : { all information about processes }
18         array 1:MaxNumProcesses of ContextBlockType;
We will concentrate for the time being on the state vector part of the context block
(lines 8-13). This is the part of the context block that the operating system keeps
available at all times. Other less frequently used parts of the context block might be stored on
backing store (disk, for example). In the simple state vector shown here, the operating
system records the program counter (line 9) and the processor state (line 10) of the
process. The meaning of these fields depends on the type of computer for which the
operating system is designed. The program counter tells where the next instruction to be
executed by this process is stored, and the processor state indicates hardware priority and
other details that we shall ignore for now. In addition, the state vector holds the values of
the computer’s registers as they were when the process last stopped running.
Assume that process A has been picked to run next. We are not interested at
present in the policy that decided that A should run, only the mechanism by which the
kernel causes A to start executing. The ground rules that we will follow for this exercise
are the following:
(1) There is only one processor, so only one process can run at a time. (Multipro-
cessor computers have several processors, so they can run several processes
simultaneously. We will discuss multiprocessors in Chapter 9.)
(2) The kernel has decided to start running A instead of the process that is currently
using the computing resource.
(3) The state vector for A accurately represents the state of the program counter, the
processor state, and the registers the last time A was running. All these must be
restored as part of turning the processor over to A.
(4) A’s program is currently in main store, and we can ignore all aspects of space
management. Space management is a large subject that we will take up in
Chapter 3.
Switching the processor from executing the kernel to executing A is called context
switching, since the hardware must switch from the context in which it runs the kernel to
the one in which A runs. In Figure 1.7, the currently executing object is highlighted with
a double border. Before the context switch, the kernel is running. Afterwards, process A
is running. Switching back to the kernel from process A is also a context switch. This
kind of context switch happens when A tries to execute a privileged instruction (includ-
ing the service call instruction) or when a device generates an interrupt. Both these situa-
tions will be described in more detail. In either case, a context switch to the kernel gives
it the opportunity to update accounting statistics for the process that was running and to
select which process should be run after the service call or interrupt has been serviced.
Figure 1.7 Context switch
Not only does the kernel have its own register contents and its own program
counter, but it also has special privileges that allow it to access transput devices. These
privileges must be turned off whenever a process is running. Privilege is usually con-
ferred by the processor state, so a process has a different processor state from the kernel.
Some computers provide only separate privileged and non-privileged states, whereas oth-
ers have several gradations between them. The ability to change from one state to
another requires special privilege.
Since this is our first detailed example of the work that an operating system per-
forms, we will say a word about how operating systems are constructed. In the early
days of computers, operating systems were written as single large programs encompass-
ing hundreds of thousands of lines of assembler instructions. Two trends have made the
job of writing operating systems less difficult. First, high-level languages have made
programming the operating system much easier. Second, the discipline of structured pro-
gramming has suggested a modular approach to writing large programs; this approach
allows large problems to be decomposed into smaller pieces. The program that switches
context can be seen as one such small piece. It could be a procedure written in a high-
level language like Modula. These pieces can often be arranged in layers, with each
layer providing services to the ones above it. For example, one could build the layers as
Context- and process-switch services (lowest layer)
Device drivers
Resource managers for space and time
Service call interpreter (highest layer)
For example, the CP/M operating system provides three levels: (1) device drivers (the
BIOS section of the kernel), (2) a file manager (BDOS), and (3) an interactive command
interpreter (CCP). It supports only one process and provides no security, so there is no
need for special context-switch services.
Switching context from the kernel back to a process involves copying information
between the context block and hardware registers of the machine. This information
includes the program counter, the processor state, and the contents of addressable
registers. Most high-level languages (including Modula) do not provide the necessary facility
to deal with these hardware issues directly. Luckily, some newer computers (such as the
DEC VAX) have single instructions that do all the context-switch work themselves. Still,
high-level languages are unlikely to generate those instructions. Furthermore, the speed
of context switching is critical because this operation takes place every time an interrupt
is serviced by the kernel or a process makes a request to the kernel. The result is that
context switching is usually performed by a procedure written in assembler language.
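The copying that a context switch performs can be sketched as follows, in Python rather than the Modula used elsewhere in this book (a real kernel does this in assembler, for the reasons just given). The hardware registers are modeled as a simple record, and all names are illustrative:

```python
# A sketch of context switching. The "hardware" registers are modeled
# as a dictionary; a real kernel would manipulate actual machine
# registers in assembler.

NUM_REGISTERS = 16

def new_context_block():
    """Per-process state vector, mirroring ContextBlockType lines 8-13."""
    return {
        "ProgramCounter": 0,
        "ProcessorState": 0,          # priority, privilege mode, etc.
        "Registers": [0] * NUM_REGISTERS,
    }

hardware = new_context_block()        # the machine's actual registers

def save_context(block):
    """Copy the hardware state vector into a process's context block."""
    block["ProgramCounter"] = hardware["ProgramCounter"]
    block["ProcessorState"] = hardware["ProcessorState"]
    block["Registers"] = hardware["Registers"][:]

def restore_context(block):
    """Load the hardware registers from a context block, resuming it."""
    hardware["ProgramCounter"] = block["ProgramCounter"]
    hardware["ProcessorState"] = block["ProcessorState"]
    hardware["Registers"] = block["Registers"][:]

def context_switch(old_block, new_block):
    """Switch the processor from one process (or the kernel) to another."""
    save_context(old_block)
    restore_context(new_block)
```

The sketch shows why speed matters: every trap and every interrupt pays for these copies twice, once entering the kernel and once leaving it.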
5.2 Process lists
The context blocks for processes are stored in lists. Each list is dedicated to some partic-
ular class of processes. These classes can be divided as follows.
Running. The process that is currently executing. On most computers, only one
process is running at any time. However, on multiprocessors, which we discuss in
Chapter 9, several processes can run at once.
Ready. Processes that are ready to run but are not currently running because of a
policy decision. As we will see in Chapter 2, there may be several ready lists.
Waiting. Processes that cannot run now because they have made requests that
have not yet been fulfilled. The kernel might keep a different list for every type of
service that may have been requested. For example, space management sometimes
causes processes to wait in a ‘‘main-store wait’’ list until there is enough room to
run them. A process reading data from a file might wait in a ‘‘file transput wait
list’’ until the data are read in. Each device that a process may use for transput
might have its own wait list. While a process is in a wait list, we say it is blocked.
These lists are commonly called ‘‘queues,’’ but they need not be built as queues usually
are, with entry only at one end and departure from the other. They may be represented
implicitly, with each context block holding a field that indicates which list it is on. They
may be stored in a heap data structure according to some priority so that the one with the
most urgency can be accessed quickly.
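The implicit representation mentioned above can be sketched as follows, in Python with invented names: each context block carries a field naming the list it is on, and an explicit list can be recovered by scanning the context blocks.

```python
# A sketch of process lists represented implicitly: "moving" a process
# from one list to another is just an assignment to a field in its
# context block. List names follow the classes described in the text.

RUNNING = "running"
READY = "ready"
MAIN_STORE_WAIT = "main-store wait"
FILE_WAIT = "file transput wait"

processes = {}                        # process name -> context block

def create(name):
    """A new process starts out ready to run."""
    processes[name] = {"list": READY}

def processes_on(which):
    """Recover an explicit list from the implicit representation."""
    return sorted(p for p, cb in processes.items() if cb["list"] == which)

def block(name, wait_list):
    """The process made a request that cannot be fulfilled yet."""
    processes[name]["list"] = wait_list

def unblock(name):
    """The request was fulfilled; the process becomes ready again."""
    processes[name]["list"] = READY
```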
5.3 Service calls
Various events can cause a process to be moved from one list to another. A process
makes a request of the kernel by submitting a service call, which might ask for resources,
return resources, or perform transput. As a result of this call, the scheduler might decide
to place that process back on the ready list and start running another process from the
ready list. This operation, which we call a process switch, usually takes more time than
a simple context switch. After the process switch, a context switch starts executing the
new process. Figure 1.8 shows the effect of the kernel switching process from A to B.
Most operating systems build service calls from instructions that cause processor
traps. Processor traps always switch context to the kernel. On the DEC PDP-11, for
example, the EMT and the TRAP instructions are both used by various operating systems
to achieve communication between processes and the kernel. A similar effect is achieved
on the IBM 360 and 370 computers with the SVC instruction and on the DEC PDP-10
with the UUO instruction. A trap causes the hardware to copy certain hardware registers,
such as the program counter and the processor state, to a safe place (typically onto a
stack). The hardware then loads those hardware registers with the appropriate new con-
text (which includes the program counter for the location in the kernel where its trap-
handler program is stored). It then sets the processor to privileged state. The operating
system must then move the context information saved by the hardware into the context
block of the process that was running at the time of the trap.
Some operating systems use ordinary subroutine-call instructions for service calls.
CP/M, for example, uses a jump to location 5 to invoke service calls. Again, the operat-
ing system may save context information in the context block while it is handling the ser-
vice call.
Figure 1.8 Process switch
Service calls are like subroutine calls from the point of view of the calling process.
Arguments are first placed in a standard place (on a stack, in registers, right after the call,
or in a communication area), and then the service-call instruction is executed. When
control is returned by the kernel to the process, the process is ready to execute the next
instruction. Results passed back from the kernel are stored in registers, on the stack, or in
a communication area.
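The calling convention just described can be sketched as follows, in Python with invented call numbers; a register dictionary stands in for the agreed communication area, and the trap instruction is simulated by a direct call into the kernel's handler.

```python
# A sketch of a service call: the process places the call number in a
# standard place (register r0 here), "executes the trap," and finds the
# result in another standard place (r1). The call numbers and handler
# are hypothetical.

registers = {"r0": 0, "r1": 0}        # the agreed communication area

GET_TIME = 1                          # a service-call number (invented)
clock_time = 1234                     # the kernel's idea of the time of day

def kernel_trap_handler():
    """Entered via the trap; dispatches on the call number in r0."""
    call = registers["r0"]
    if call == GET_TIME:
        registers["r1"] = clock_time  # the result goes back in r1
    else:
        raise ValueError("unknown service call")

def service_call(number):
    """The process's side: load arguments, then execute the trap."""
    registers["r0"] = number
    kernel_trap_handler()             # stands in for the trap instruction
    return registers["r1"]
```

From the process's point of view this is indistinguishable from a subroutine call, which is exactly the point made above.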
Sometimes service calls are simple matters that should not cause the process mak-
ing the request to wait. For example, a process might request the current time of day.
This information is readily available, so the kernel just switches context back to the cal-
ling process, giving it the information it wanted. If the process must wait before the ser-
vice it requested can be provided, process switching is involved. We will see that store
management introduces extra costs for process switching.
Interrupts caused by a transput device also cause context to switch from the current
process to the kernel. The same sequence of actions (saving old context, loading new
context, changing state) occurs for interrupts as for traps. The interrupt might indicate
completion of an operation that some other process was waiting for, in which case the
kernel might place that process on the ready list. A policy decision could switch to that
process, at the expense of the former process. We will discuss devices in Chapter 5.
One very important interrupt is generated by a device called the clock. Clocks can
be designed to interrupt periodically (every 60th of a second, for example) or to accept an
interval from the computer and interrupt when that time has expired. If it were not for
clock interrupts, a running process could sit in an accidental infinite loop that performs
no service calls, and the kernel would never be able to wrest control from it. The kernel
therefore depends on the clock to force a context switch back to the kernel so that it can
make new policy decisions.
The fact that processes belong to lists points to an important insight into operating
system structure:
Level Principle
Active entities are data structures when viewed from a lower level.
The Level Principle applies to processes in this way: A process considers itself an active
entity that executes instructions and makes service calls on the kernel. From the kernel’s
point of view, however, the process is just a data structure (largely contained in the con-
text block, but also including all the store used by the process) that can be manipulated.
Manipulations include moving the process from list to list and causing the process to
execute.
Even an executing program like the kernel is subject to the Level Principle. Each
instruction appears to be an active entity that moves information from one place to
another. However, instructions are just data to the hardware, which interprets those
instructions and causes the information to move.
The converse of the Level Principle sometimes sheds new light as well. Objects
that appear as data can occasionally be seen as active objects in some sense. For exam-
ple, adding two numbers can be seen either as an action taken by some external agent or
as an action taken by the first of the numbers on the second. Such a novel approach does
make sense (the Smalltalk language is implemented this way!) and can lead to a useful
decomposition of work so that it can be distributed. We will discuss distributed work in
Chapter 9.
Although most operating systems interpret the Beautification Principle to mean that
processes should have an enhanced or simplified view of the hardware, some take a dif-
ferent tack. They make the process interface look just like the hardware interface. In
other words, a process is allowed to use all the instructions, even the privileged ones.
The process interface is called a virtual machine because it looks just like the underly-
ing machine. The kernel of such an operating system is called a virtualizing kernel.
We will devote some attention to this style of operating system because it clearly shows
the interplay of traps, context switches, processor states, and the Level Principle.
Virtual machine operating systems are both useful and complex. They are useful
because they allow operating system designers to experiment with new ideas without
interfering with the user community. Before virtual machine operating systems, all
operating system testing had to be performed on separate machines dedicated to that pur-
pose. Any error in the kernel, no matter how trivial, could bring the entire operating sys-
tem down. Any user unfortunate to be running programs at that time could lose a
significant amount of work. With a virtual machine operating system, the test version of
the operating system can run as a process controlled by the virtualizing kernel. The other
processes, which are running ordinary programs, are not affected by errors in the test ver-
sion. The only effect is that they don’t run quite as fast, because the test version com-
petes with them for resources. This arrangement is shown in Figure 1.9.
Figure 1.9 Testing a new operating system
A second use of virtual machine operating systems is to integrate batch and
interactive modes by letting them occupy different virtual machines. This scheme,
shown in Figure 1.10, can be a fast way to piece together two fundamentally different
operating systems for the same machine.
The ability to run several operating systems at once on the same machine has other
advantages as well.
It can alleviate the trauma of new operating system releases, since the old release
may be used on one of the virtual machines until users have switched over to the
new release, which is running on a different virtual machine under the control of
the same virtualizing kernel.
It can permit students to write real operating systems without interfering with the
other users of the machine.
It can enhance software reliability by isolating software components in different
virtual machines.
It can enhance security by isolating sensitive programs and data to their own vir-
tual machine.
It can test network facilities, such as those discussed in Chapter 9, by simulating
machine-machine communication between several virtual machines on one physi-
cal machine.
It can provide each user with a separate virtual machine in which a simple single-
user operating system runs. Designing and implementing this simple operating
system can be much easier than designing a multi-user operating system, since it
can ignore protection and multiprogramming issues. The CMS (Conversational
Monitor System) operating system for the IBM 370 computer, for example, is usu-
ally run in a virtual machine controlled by the VM/370 virtualizing kernel. CMS
provides only one process and very little protection. It assumes that the entire
machine is at the disposal of the user. Under VM/370, a new virtual machine is
Figure 1.10 Integrating two operating systems
created for each user who logs on. Its control panel is mapped onto the user’s ter-
minal, and buttons are ‘‘pushed’’ by typing in appropriate commands. These com-
mands allow for initial program load (to start up CMS, for instance) and for inter-
rupting the virtual machine.
Virtual-machine operating systems are complex. To provide acceptable speed, the
hardware executes most instructions directly. One might think that the virtualizing
kernel V could run all its processes P_i in privileged state and let them use all the
hardware instructions. However, privileged instructions are just too dangerous to let
processes use directly. What if P_i executes the halt instruction? Instead, V must run
all P_i in non-privileged state to prevent them from accidentally or maliciously
interfering with each other and with V itself. In fact, virtual-machine operating
systems cannot be implemented on computers where dangerous instructions are ignored or
fail to trap in non-privileged state. For example, the PDP-11/45 in non-privileged
state fails to trap on several dangerous instructions. In general, an instruction is
dangerous if it performs transput, manipulates address-translation registers (discussed
in Chapter 3), or manipulates the processor state (including interrupt-return and
priority-setting instructions).
To let P_i imagine it has control of processor states, even though it does not, V
keeps track of the virtual processor state of each P_i, that is, the processor state of
the virtual machine that V emulates on behalf of P_i. This information is stored in
P_i’s context block inside of V. All privileged instructions executed by P_i cause
traps to V, which then emulates the behavior of the bare hardware on behalf of P_i.
If P_i was in virtual non-privileged state, V emulates a trap for P_i. This emulation
puts P_i in virtual privileged state, although it is still run, as always, in physical
non-privileged state. The program counter for P_i is reset to the proper trap address
within P_i’s virtual space. (We will see in Chapter 3 how virtual space is managed
for virtual machine operating systems.) We say that V has reflected the trap to P_i.
If P_i was in virtual privileged state, V emulates the action of the instruction itself.
For example, it terminates P_i on a halt instruction, and it executes transput instruc-
tions interpretatively.
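The two cases can be sketched as follows, in Python; the states, instruction names, and identifiers are all simplified inventions for illustration.

```python
# A sketch of a virtualizing kernel's trap handler. V keeps the virtual
# processor state of each client in the client's context block; on a
# trap it either reflects (client was in virtual non-privileged state)
# or emulates (client was in virtual privileged state).

NONPRIV = "non-privileged"
PRIV = "privileged"

def make_client(trap_address):
    """The client's context block as kept inside V."""
    return {
        "virtual_state": NONPRIV,     # the state of the emulated machine
        "pc": 0,
        "trap_address": trap_address, # where the client's trap handler lives
        "terminated": False,
    }

def handle_trap(client, instruction):
    """V's response to a privileged instruction executed by a client."""
    if client["virtual_state"] == NONPRIV:
        # Reflect: emulate a trap for the client itself.
        client["virtual_state"] = PRIV
        client["pc"] = client["trap_address"]
        return "reflected"
    # The client was in virtual privileged state: emulate the instruction.
    if instruction == "halt":
        client["terminated"] = True
    return "emulated"
```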
Some dangerous instructions are particularly difficult to emulate. Transput can be
very tricky. Channel programs (discussed in Chapter 5), which control some sophisti-
cated transput devices, must be translated and checked for legality. Self-modifying chan-
nel programs are especially hard to translate. The virtualizing kernel may wish to simu-
late one device by another, for example, simulating a printer on a disk or a small disk on
a part of a larger one. A device-completion interrupt can indicate that a transput
operation started on behalf of some P_i has terminated. In this case, the interrupt
must be reflected to the appropriate P_i. In contrast, emulating a single transput
operation for P_i
may require several transput operations, so device-completion interrupts often indicate
that V may proceed to the next step of emulating an operation already in progress. Such
interrupts are not reflected. If the computer communicates with devices through registers
with addresses in main store, all access to that region of store must cause traps so that V
can emulate the transput. Address translation also becomes quite complex. We will
defer discussing this subject until Chapter 3.
A good test of a virtualizing kernel is to let one of its processes be another virtual-
izing kernel. This arrangement can also be useful to test out a new version of V. How-
ever, dangerous operations can be quite slow when there are many levels. The number of
reflections grows exponentially with the number of levels. For example, consider Figure
1.11, in which there are two levels of virtualizing kernel, V_1 and V_2, above which
sits an ordinary operating system kernel, OS, above which a compiler is running. The compiler
executes a single service call (marked *) at time 1. As far as the compiler is concerned,
OS performs the service and lets the compiler continue (marked c) at time 29. The
dashed line at the level of the compiler indicates the compiler’s perception that no
activity below its level takes place during the interval.
From the point of view of OS, a trap occurs at time 8 (marked by a dot on the
control-flow line). This trap appears to come directly from the compiler, as shown by the
dashed line connecting the compiler at time 1 and the OS at time 8. OS services the trap
(marked s). For simplicity, we assume that it needs to perform only one privileged
instruction (marked p) to service the trap, which it executes at time 9. Lower levels of
software (which OS cannot distinguish from hardware) emulate this instruction, allowing
OS to continue at time 21. It then switches context back to the compiler (marked b) at
time 22. The dashed line from OS at time 22 to the compiler at time 29 shows the effect
of this context switch.
The situation is more complicated from the point of view of V_2. At time 4, it
receives a trap that tells it that its client has executed a privileged instruction
while in virtual non-privileged state. V_2 therefore reflects this trap at time 5
(marked r) back to OS. Later, at time 12, V_2 receives a second trap, this time
because its client has executed a privileged instruction in virtual privileged state.
V_2 services this trap by emulating the instruction itself at time 13. By time 17, the
underlying levels allow it to continue, and at time 18 it switches context back to OS.
The last trap occurs at time 25, when its client has attempted to perform a context
switch (which is privileged) when in virtual privileged state. V_2 services this trap
by changing its client to virtual non-privileged state and switching back to the client
at time 26.
V_1 has the busiest schedule of all. It reflects traps that arrive at times 2, 10, and 23.
Figure 1.11 Emulating a service call
(The trap at time 23 comes from the context-switch instruction executed by OS.) It emu-
lates instructions for its client when traps occur at times 5, 14, 19, and 27.
This example demonstrates the Level Principle: Each software level is just a data
structure as far as its supporting level is concerned. It also shows how a single privileged
instruction in the compiler became two privileged instructions in OS (p and b), which
became four in V_2 (r, p, b, and b) and eight in V_1. In general, the situation can be far
worse. A single privileged instruction at one level might require many instructions at its
supporting level to emulate it.
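The doubling in the example generalizes: if each privileged instruction at one level requires at least two at its supporting level, the cost grows as a power of the number of levels. A small sketch (in Python rather than the book's Modula; the function name is invented):

```python
# The growth in Figure 1.11 as a formula: one service call in the
# compiler costs per_level ** levels_below privileged instructions at
# the lowest level, when every level expands each instruction by the
# same factor (exactly 2 in the figure's example).

def privileged_instructions(levels_below, per_level=2):
    """Privileged instructions executed at the lowest level for one
    service call, given the number of supporting levels below the
    caller and the per-level expansion factor."""
    return per_level ** levels_below
```

With a larger expansion factor, as the text warns, the situation is far worse than doubling.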
The virtualizing kernel can be complex, and the emulation of privileged instruc-
tions can be slow. These drawbacks can be mitigated to some extent.
Don’t try to emulate one device by another.
Disallow some features, such as self-modifying channel programs. Of course, such
a design will compromise the purity of the virtual machine and will make it harder
to transfer programs from bare machines to run under the virtual-machine operat-
ing system.
Provide some extra features so that processes won’t have to use as many privileged
instructions. A certain amount of file support (discussed in Chapter 6) could help.
This design makes it harder to transfer programs from the virtual-machine operat-
ing system to a bare machine.
Use special hardware to pass traps directly to the correct software level. The IBM
370 has some hardware support for VM/370, for example.
A number of other operating system textbooks are available. Brinch Hansen’s early text
(1973) is especially good in its treatment of concurrency issues. Another early classic
was written by Madnick and Donovan (1972). The standard texts by Shaw (1974) and
Habermann (1976) are full of useful details. Calingaert (1982) presents an overview of a
large number of subjects, whereas Turner (1986) shows a few essential subjects in some
depth. Excellent texts are by Deitel (1983) and by Peterson and Silberschatz (1985).
Some advanced subjects can be found in a new book by Maekawa (1987). Beck’s book
on systems programming (1985) devotes an excellent chapter to operating systems.
Recently, several books have appeared that cover both the theory of operating systems
and an annotated listing of a Unix-like kernel. These texts, including two by Comer
(1984; 1987) and one by Tanenbaum (1987), are an excellent source of information about
operating systems and how to build them. Books describing the internals of particular
operating systems, such as the book on VMS by Kenah and Bate (1984), are often full of
fascinating detail.
The distinction between mechanism and policy was championed by the Hydra
operating system (Levin et al., 1977). The Modula programming language that we use in
examples was defined by Wirth (1972). It is based on his earlier language Pascal (Jensen
and Wirth, 1974). Virtual machines are discussed in a survey article by Goldberg (1974),
and a case study for the PDP-11 is described by Popek (1975).

Exercises
1. Compute the utilization for open shop. We define the utilization u as the fraction
of time used for computation. Assume that a typical job requires r = 10 seconds to
read in from cards, c = 3 seconds to compute and p = 30 seconds to print the
results on paper. Programmers sign up for 15-minute slots and run their programs
twice per slot.
2. Compute the utilization for the same situation, using operator-driven shop.
Assume that an operator takes s = 30 seconds to remove the output from one job
and set up the next job. There are enough jobs to ensure that the operator always
has another job to start as soon as one finishes.
3. Compute the utilization for the same situation, using offline transput. Assume that
it takes only 1/100 as long to read information from tape as from cards and only
1/100 as long to write information to tape as to paper. The resident monitor takes
s = 0.1 seconds to reset the machine between jobs. The operator spends 60
seconds to unload and load tapes after every ten jobs. There are several offline
computers, so there is no bottleneck reading jobs and printing results.
4. Compute the utilization for the same situation, using spooling. Assume that the
computer has enough card readers and printers so that there are always jobs wait-
ing to be run and printing is not a bottleneck. It takes only 1/1000 as long to read
or write information from or to disk as from or to cards or paper. The computer
spends 1 percent of its time servicing interrupts for transput; this time is not
counted as computation time. It takes s = 0.01 seconds to reset the machine
between jobs.
5. Construct formulas for exercises 1-4 that describe the utilization in terms of the
parameters r, c, p, and s.
6. Find out what computers are available at your institution and discover whether they
use spooling, batch, interactive, or some other kind of computing.
7. In an interactive multiprogramming situation, several people could be running the
same program at once. Would there be one process or many to support this situation?
8. If two processes are using the very same main storage, what data do they not share?
9. Throughout the book, we will suggest various service calls that the kernel may pro-
vide. Suggest a few that you think every kernel should have.
10. How can a user submit a service call?
11. How can a device submit a service call?
12. Does a service call always require a context switch? Does it always require a pro-
cess switch?
13. When a device interrupts, is there always a context switch?
14. Describe the user in two ways, using the Level Principle.
15. Is the Pascal compiler part of the kernel?
16. Is the code that causes the disk to send some data into main store part of the kernel?
17. Experiment to find out what restrictions your installation places on passwords.
Does it disallow passwords that are too simple? Does it prevent you from making
very long passwords?
18. Three conventions for communicating the arguments of a service call to the kernel
are to place them on a stack, in registers, or right after the call. What are the
advantages and disadvantages of these three strategies?
19. In the example of virtual machines, with a compiler above an operating system
above two levels of virtualizing kernel, how many privileged instructions would be
executed at each level if the instruction executed by the compiler can be emulated
without use of privileged instructions by the operating system?
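Several of the exercises above ask for utilization figures. As one way to organize that arithmetic (a hypothetical helper, not from the text; the numbers below are the open-shop parameters of exercise 1):

```python
def utilization(compute_time, total_time):
    """Utilization u: the fraction of elapsed time spent computing."""
    return compute_time / total_time

# Open-shop parameters from exercise 1: each 15-minute sign-up slot
# runs the program twice, so computation is 2 * c seconds per slot.
c = 3              # seconds of computation per run
slot = 15 * 60     # seconds in one sign-up slot
u = utilization(2 * c, slot)
```

The same helper applies to the operator-driven, offline, and spooled variants once the per-job overheads r, p, and s are folded into the total elapsed time.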
chapter 2
Time Management
The first resource we will discuss is time. Time management is usually called schedul-
ing. The goal of scheduling is to provide good service to all the processes that are
currently competing for the computing resource, that is, the execution of instructions.
We can distinguish several classes of scheduling based on how often decisions must be made.
Long-term scheduling decides which jobs or job steps to start next. In a spooling
system, this decision is made when a job finishes and is based on the order in which other
jobs have arrived and on their priorities. In batch multiprogramming, the decision may
be based on the different requirements of the competing jobs and the resources currently
available. Interactive multiprogramming often does not have this level of scheduling at
all; it is up to the user to decide which job steps to run. We will not discuss long-term
scheduling as such in this book, but Chapter 4 is devoted to some of the resource-
contention issues that are central to that subject.
Medium-term scheduling decides which running processes to block (deny ser-
vice) temporarily, because resources (such as main store) are overcommitted or because a
resource request cannot be satisfied at the moment. Chapter 3 discusses the intricacies of
space management and describes policies for medium-term scheduling.
Short-term scheduling, the subject of this chapter, decides how to share the com-
puter among all the processes that currently want to compute. Such decisions may be
made frequently (tens of times each second) to try to provide good service to all the
processes. When the medium- or long-term scheduler readies a process or when a trans-
put event that the process is waiting for finishes, the process arrives in the domain of the
short-term scheduler. It stays there until it terminates, it waits for transput, or a higher-
level scheduler decides to block it. Processes generally alternate between a computing
burst, during which they are in the short-term scheduler, and a transput burst, during
which they are in a wait list.
Figure 2.1 shows these three levels of scheduling. Within the domain of short-term
scheduling, a process may be either running or ready to run. The short-term scheduler is
in charge of deciding which ready process should be running at any time. Within the
Figure 2.1 Three levels of scheduling (long term, medium term, and short term, with
main-store and transput wait lists)
domain of medium-term scheduling, a process may be running (that is, it may have
entered the domain of the short-term scheduler), may be ready to run, or may be waiting
for some resource like transput. The medium-term scheduler is in charge of deciding
when ready processes should be allowed to enter the domain of the short-term scheduler
and when they should leave that domain. This decision is based on an attempt to prevent
overcommitment of space, as we will see in Chapter 3, as well as a desire to balance
compute-bound processes with transput-bound processes. The long-term scheduler dis-
tinguishes between ready and running processes.
We have already seen the distinction between compute-bound and transput-bound
processes. From the point of view of the short-term scheduler, a compute-bound process
remains in view for a long time, since it does not terminate soon and seldom waits for
transput. For this reason, we will call compute-bound processes long processes.
In contrast, a transput-bound process comes and goes very quickly, since it disap-
pears from the view of the short-term scheduler as soon as it waits for transput. A pro-
cess that interacts heavily with the user tends to be transput-bound. The user gives it a
command, which it interprets and acts on. Shortly thereafter, the process is ready to
receive the next command. The user, however, is still puzzling over the response to the
previous command. The process spends most of its time waiting for the user to submit
the next command and only a little time computing in response to the command. Text
editor programs usually exhibit this sort of behavior. Other transput-bound processes are
not interactive at all but spend a lot of time bringing data in from devices or sending data
back out, performing very little computation in between. Programs written in Cobol
often have this flavor. Both kinds of transput-bound process are similar in that small
amounts of computation are sandwiched between longer periods of waiting. For this rea-
son, we will call them short processes.
It is important to give special service to interactive processes, because otherwise
the user might get very frustrated. A user interacting with a process would ideally like to
see an immediate response to every command. Failing that, ‘‘consistent’’ response is
better than good average response. Response that always takes about 2 seconds is better
than response that averages about 1 second but occasionally takes 10 or only 0.5. For-
mally, a low variance of response time is better than a low mean.
Transput-bound processes must also be given special treatment. Here the mean is
more important than the variance. Let’s take as an example a process that needs to com-
pute for 1 millisecond and then waits 20 milliseconds for transput. This sequence repeats
1000 times. In total, the process needs 1 second of computation and 20 seconds of trans-
put, a total of 21 seconds. But if it is delayed half a second every time it becomes ready,
it will take 521 seconds. Even if some delays are longer, a small average delay will pay
off handsomely. A 1-millisecond average delay will allow the process to finish in 22
seconds.

1 Goals, measures, and assumptions

As mentioned earlier, there are several competing goals that scheduling policies aim to
fulfill. One is ‘‘good throughput’’: getting lots of work done. For short-term schedul-
ing, good throughput means minimizing the number of process switches, because each
one costs some time during which no productive work is accomplished.
The other goal is ‘‘good service.’’ We can be more precise by defining three
related measures that tell how well we are treating a particular process. Say a process p
requires t time in execution before it can leave the ready list because it will either finish
or will need to wait for something. Then we define the following service measures for
that process.
response time T: the time that p is present; T = finish time - arrival time
missed time M: M = T - t
penalty ratio P: P = T / t
response ratio R: R = t / T
The response time T counts not only how long p needs, but also how long it sits in the
ready list while other processes are run. It might wait its turn for a while. Once it starts,
we might be nasty and stop it after some time, letting it continue later. The entire time
that process p is on the ready list (until it leaves our view to go to other lists) is charged
to T. The process is not visible to the short-term scheduler while it is waiting for trans-
put or other resources, and therefore such wait time is not included in T.
The missed time M is the same thing, except we don’t count the amount of time
during which p is actually running. M measures the amount of time during which p
would like to run but is prevented.
The response ratio R and the penalty ratio P are inverses of each other. R
represents the fraction of the time that p is receiving service. If the response ratio R is 1,
then p never sits in the ready list while some other process runs. If the response ratio R
is 1/100, then P = 100 and the process seems to be taking 100 times as long as it should;
the user may be annoyed. A response ratio greater than 1 doesn’t make any sense. Simi-
larly, the penalty ratio P ranges from 1 (which is a perfect value) upward.
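These measures are simple enough to compute mechanically. A minimal sketch (the function and its argument layout are ours, not the book's):

```python
def service_measures(arrival, finish, t):
    """Compute T, M, P, R for one process.

    arrival, finish: when the process enters and leaves the ready list
    t: the execution time it needed
    """
    T = finish - arrival   # response time
    M = T - t              # missed time
    P = T / t              # penalty ratio (1 is perfect, grows upward)
    R = t / T              # response ratio (1 is perfect, shrinks toward 0)
    return T, M, P, R

# Process C of the upcoming FCFS example: arrives at 3, needs 2, finishes at 10.
T, M, P, R = service_measures(3, 10, 2)   # T = 7, M = 5, P = 3.5
```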
If we are discussing a class of processes with similar requirements, like short
processes or long processes, we extend this notation as follows.
T(t): average response time for processes needing t time
M(t): T(t) - t
P(t): T(t) / t
R(t): t / T(t)
If the average response measures turn out to be independent of t, we will just write T(),
M(), P(), and R().
We will also refer on occasion to kernel time and idle time. Kernel time is the
time spent by the kernel in making policy decisions and carrying them out. This figure
includes context-switch and process-switch time. A well-tuned operating system tries to
keep kernel time between 10 and 30 percent. Idle time is spent when the ready list is
empty and no fruitful work can be accomplished.
One surprising theoretical result sheds light on the tradeoff between providing
good service to short and to long processes. It turns out that no matter what scheduling
method you use, if you ignore context- and process-switching costs, you can’t help one
class of jobs without hurting the other class. In fact, a minor improvement for short
processes causes a disproportionate degradation for long processes. We will therefore be
especially interested in comparing various policies with respect to how well they treat
processes with different time requirements.
The values we will get for the service measures under different policies will
depend on how many processes there are, how fast they arrive, and how long they need
to run. A fairly simple set of assumptions will suffice for our purposes. First, we will
assume that processes arrive (into the view of the short-term scheduler) in a pattern
described by the exponential distribution. One way to describe this pattern is to say
that no matter how recently the previous process arrived, the next one will arrive within t
time with probability 1 - e^(-αt). As t goes to infinity, this probability goes to 1.
The average time until the next arrival is 1/α. Another way to describe this pattern is to
say that the probability that k processes will arrive within one time unit is e^(-α)α^k / k!.
The reason we pick this particular distribution is that even though it looks forbidding, it
turns out that the exponential distribution is the easiest to deal with mathematically and
still mirrors the way processes actually do arrive in practice. The symbol α (‘‘alpha’’) is a
parameter of the distribution, which means that we adjust α to form a distribution with
the particular behavior we want. We call α the arrival rate, since as α increases,
arrivals happen more frequently. Figure 2.2 shows the exponential distribution for vari-
ous values of α. The exponential distribution is memoryless: The expected time to the
next arrival is always 1/α, no matter how long it has been since the previous arrival.
Observations on real operating systems have shown that the exponential arrival rate
assumption is reasonable.
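The two descriptions of the arrival pattern can be checked numerically. In this sketch, `random.expovariate` (Python standard library) draws interarrival times from exactly this distribution; the sample-mean check is our own:

```python
import math
import random

alpha = 0.8   # arrival rate

def prob_arrival_within(t, alpha):
    """Probability that the next process arrives within t time units."""
    return 1 - math.exp(-alpha * t)

# The mean interarrival time is 1/alpha.  expovariate draws samples
# whose empirical mean should approach 1/0.8 = 1.25.
random.seed(1)
samples = [random.expovariate(alpha) for _ in range(100_000)]
mean = sum(samples) / len(samples)
```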
Our second assumption is that the service time required by processes also follows
the exponential distribution, this time with parameter β (‘‘beta’’):

Probability(k processes serviced in one time unit) = e^(-β)β^k / k!
The memoryless property here implies that the expected amount of time still needed by
the current process is always 1/β, no matter how long the process has been running so
far.
Figure 2.2 Probability of k arrivals under the exponential distribution (curves for
α = 3, α = 5, and α = 10)
We will often combine α and β to form ρ (‘‘rho’’), the saturation level, which
represents how busy the computer is on the average. We define ρ to be α / β. If ρ is 0,
new processes never arrive, so the machine is completely idle. If ρ is 1, processes arrive
on the average just exactly at the same rate as they can be finished. If ρ is greater than 1,
processes are coming faster than they can be served. In this last case, we would expect
the ready list to get longer and longer. In fact, even if ρ is just 1, the expected length of
the ready list is unbounded. The value of ρ affects the response measures differently for
different scheduling policies. We will therefore express these measures in terms of ρ.
As you can see, the formal study of time management involves a certain amount of
mathematics. It is important to realize that any comparison of policies requires that there
be a way of describing their behavior (such as R) and a way to characterize the situation
in which the policies are measured (such as the distribution of service times). We will
express the behavior of various policies using the notation just presented but will omit all
the derivations. We will discuss formal analysis later in the Perspective section.

2 Policies
As we have seen, short-term scheduling refers to allocating time to processes that are in
the ready list. Every time a process arrives at the ready list, we will treat it as a new pro-
cess. It may truly be a new process, but more often it is an old one that has been brought
back to the ready list from outside the short-term scheduler. It may have returned
because the transput it was waiting for has completed, some resource it was requesting
has been granted, or the medium-term scheduler has decided to favor it. Every time it
leaves the ready list, either because it has terminated or because it must wait for some-
thing like transput, we will forget about the process. We will say that the process has
departed when we do not need to state why it has left. This narrow view will allow us to
concentrate on the fundamentals of short-term scheduling.
Unfortunately, no policy is truly fair. Any improvements in performance for one
class of processes is at the expense of degraded performance for some other class. We
will therefore examine how policies treat a wide range of classes. For each policy, we
will first show its behavior for a simple set of processes:
Process name Arrival time Service required
A 0 3
B 1 5
C 3 2
D 9 5
E 12 5
This set is depicted in Figure 2.3.
Figure 2.3 Processes requiring service
The time units are not important; if we like, we can imagine they are seconds. We
assume that the arrival time and service time required are whole numbers of whatever
unit we have picked.
Second, we will compare scheduling policies by simulating them for a large
number of arrivals. Figure 2.4 shows the penalty ratio for many of the policies we will
study as a function of time required. This figure is based on a simulation of 50,000
processes. Service times were randomly drawn from an exponential distribution with
β = 1.0, and arrival rates were similarly drawn from an exponential distribution with
α = 0.8. The saturation level was therefore ρ = 0.8. Statistics were gathered on each pro-
cess except for the first 100 to finish in order to measure the steady state, which is the
behavior of a system once initial fluctuations have disappeared. Processes were categor-
ized into 100 service-time percentiles, each of which had about 500 processes. The aver-
age penalty ratio for each percentile is graphed in the figure. The lines in the graph have
been smoothed slightly (simulation results are somewhat bumpy).
The average service time needed by processes in each percentile is shown in Fig-
ure 2.5. Figure 2.6 shows the missed time for each percentile under the various methods.
Figure 2.4 Penalty ratios of short-term scheduling policies (penalty ratio versus
percentile of time required)
Figure 2.5 Average service time for each percentile
Figure 2.6 Missed time for short-term scheduling policies (missed time versus
percentile of time required)
Although these simulation results only display the situation for one value of ρ, they pro-
vide useful insight into the general behavior of the algorithms we will study.
2.1 First come, first served (FCFS)
In keeping with the Law of Diminishing Returns, we will start with a method that has
horrid performance. Like many very poor methods, it is easy to dream up, easy to imple-
ment, and easy to disparage. Under FCFS, the short-term scheduler runs each process
until it departs. Processes that arrive while another process is being served wait in line in
the order that they arrive. This method is also called ‘‘first in, first out’’ (FIFO).
Figure 2.7 shows how FCFS schedules our standard set of processes. The dark
regions indicate missed time for each process.
Figure 2.7 FCFS schedule
The following table shows the same information.
Process   Arrival   Service    Start   Finish
 name      time     required    time    time      T     M      P
  A          0         3          0       3       3     0     1.0
  B          1         5          3       8       7     2     1.4
  C          3         2          8      10       7     5     3.5
  D          9         5         10      15       6     1     1.2
  E         12         5         15      20       8     3     1.6
 Mean                                            6.2   2.2    1.74
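The schedule and statistics above can be reproduced by a short simulation. This is a sketch in Python (the function name and data layout are ours), walking arrivals in order and charging each process for its wait:

```python
def fcfs(processes):
    """Run FCFS over (name, arrival, service) triples.

    Processes are served in arrival order, each to completion.
    Returns {name: (start, finish, T, M, P)}.
    """
    clock = 0
    stats = {}
    for name, arrival, service in sorted(processes, key=lambda p: p[1]):
        start = max(clock, arrival)       # wait if the processor is busy
        finish = start + service
        T = finish - arrival              # response time
        stats[name] = (start, finish, T, T - service, T / service)
        clock = finish
    return stats

jobs = [('A', 0, 3), ('B', 1, 5), ('C', 3, 2), ('D', 9, 5), ('E', 12, 5)]
table = fcfs(jobs)   # table['C'] == (8, 10, 7, 5, 3.5), as in the text
```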
FCFS is an example of a non-preemptive policy, which means that we never
block a process once it has started to run until it leaves the domain of the short-term
scheduler. (We might terminate a process that has exceeded an initial time estimate, but
such termination is different from preemption.) The decision of when to run a particular
process is based solely on the relative order in which it arrives at the ready list.
Non-preemptive policies are an example of the Hysteresis Principle, which we
will encounter repeatedly.
Hysteresis Principle
Resist change.
All change has a cost. In our case, preemption involves switching to a new process,
which requires updating software tables and often hardware tables as well. Non-
preemptive scheduling policies avoid the cost of change by resisting process switching
until it is inevitable.
How well does FCFS work? Long processes love it, and short ones hate it. To see
why, assume that four processes enter the ready list at almost the same time. They
require 1, 100, 1, and 100 seconds, respectively. Here is how the measures come out:
Process   Arrival   Service    Start   Finish
 name      time     required    time    time       T      M       P
  A          0         1          0       1        1      0      1.00
  B          0       100          1     101      101      1      1.01
  C          0         1        101     102      102    101    102.00
  D          0       100        102     202      202    102      2.02
 Mean                                           101.5   51.0    26.51
The penalty ratio P for process C is indefensible. Any short process caught behind a
long one will suffer a wait time much longer than the time it really needs to execute.
Long processes, in contrast, will generally receive reasonable values for P, even if they
have to wait behind a few other processes. Process D had to wait but still got a pretty
good penalty ratio.
The penalty ratio for FCFS, as seen in Figure 2.4, is very bad for short processes.
One third of all processes have a penalty ratio worse than 10. The upper 10 percent of all
processes, however, find FCFS has an excellent penalty ratio (less than 2.5).
The amount of missed time under FCFS is fairly equitable, as seen in Figure 2.6.
All classes of processes had about the same missed time: 4 seconds. This result stands to
reason. The amount of missed time for some process depends on the processes ahead of
it in the ready list and is independent of the amount of service the process itself needs.
Given our assumptions that both arrival and service time fit an exponential distri-
bution, we can express the behavior of the first come, first served scheduling policy
analytically, without resorting to simulation.
M() = ρ / (β(1 - ρ))

T(t) = t + ρ / (β(1 - ρ))

P(t) = 1 + ρ / (tβ(1 - ρ))
These formulas represent averages, and only hold for the steady state. The first formula
predicts our simulated result that M = 4, independent of t. The third formula predicts our
result that the penalty ratio is high for short processes and low for long processes.
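Under these assumptions the steady-state missed time is ρ/(β(1 - ρ)), independent of t. A sketch (function names ours) that checks the prediction M = 4 at the simulation's parameters ρ = 0.8, β = 1.0:

```python
def fcfs_missed_time(rho, beta):
    """Steady-state average missed time under FCFS (a standard
    M/M/1 queueing result)."""
    return rho / (beta * (1 - rho))

def fcfs_penalty_ratio(t, rho, beta):
    """Average penalty ratio for a process needing t time under FCFS."""
    return 1 + fcfs_missed_time(rho, beta) / t

# With beta = 1.0 and rho = 0.8 the predicted missed time is 4,
# matching the simulation of Figure 2.6.  Short processes (small t)
# get a large penalty ratio; long ones approach 1.
M = fcfs_missed_time(0.8, 1.0)
```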
What if ρ > 1? FCFS eventually services every process, although the missed time
gets longer and longer as ρ remains above 1. (Actually, ρ cannot remain higher than 1
for very long, since the rate at which processes arrive at the short-term scheduler depends
to some extent on the rate at which they leave. We do not actually have an infinite popu-
lation of processes.) We will see that some other methods do not guarantee eventual ser-
vice. The situation in which a process is ready to run but never gets any service is called
starvation. Starvation while waiting for other resources besides time is discussed in
Chapters 4 and 8.
2.2 Round robin (RR)
Our next example, round robin, is a vast improvement over FCFS. The intent of round
robin is to provide good response ratios for short processes as well as long processes. In
fact, it provides identical average response ratio for all processes, unlike FCFS, which
provides identical average response time.
The round robin policy services a process only for a single quantum q of time.
Typical values of q range between 1/60 and 1 second. If the process has not finished
within its quantum, it is interrupted at the end of the quantum and placed at the rear of the
ready queue. It will wait there for its turn to come around again and then run for another
quantum. This periodic interruption continues until the process departs. Each process is
therefore serviced in bursts until it finishes. New arrivals enter the ready queue at the rear.
Round robin can be tuned by adjusting the parameter q. If we set q so high that it
exceeds the service requirement for all processes, RR becomes just like FCFS. As q
approaches 0, RR becomes like processor sharing (PS), which means that every process
thinks it is getting constant service from a processor that is slower proportionally to the
number of competing processes. The Hysteresis Principle tells us that we should resist
such frequent switching. In fact, PS is only theoretically interesting, because as q
approaches 0, process switching happens more frequently, and kernel time rises toward
100 percent. The trick is to set q small enough so that RR is fair but high enough so that
kernel time is reasonable.
Figure 2.8 shows how RR schedules our sample processes for both q = 1 and
q = 4. If a process finishes during its quantum, another process is started immediately
and is given a full quantum. Newly arrived processes are put at the end of the ready list.
If a process arrives at the same time as a quantum finishes, we assume that the arrival
occurs slightly before the quantum actually expires.
Figure 2.8 RR schedule (q = 1 and q = 4)
The statistics for RR are therefore as follows.
q = 1
Process Arrival Service Finish T M P
name time required time
A 0 3 6 6 3 2.0
B 1 5 11 10 5 2.0
C 3 2 8 5 3 2.5
D 9 5 18 9 4 1.8
E 12 5 20 8 3 1.6
Mean 7.6 3.6 1.98
q = 4
Process Arrival Service Finish T M P
name time required time
A 0 3 3 3 0 1.0
B 1 5 10 9 4 1.8
C 3 2 9 6 4 3.0
D 9 5 19 10 5 2.0
E 12 5 20 8 3 1.6
Mean 7.2 3.2 1.88
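Both tables can be reproduced mechanically. This Python sketch (names ours) implements the tie-breaking rule stated earlier: an arrival that coincides with the end of a quantum is queued just ahead of the preempted process:

```python
from collections import deque

def round_robin(processes, q):
    """Simulate RR with quantum q over (name, arrival, service) triples.

    Returns {name: finish time}.
    """
    pending = sorted(processes, key=lambda p: p[1])   # by arrival time
    remaining = {name: service for name, _, service in processes}
    ready = deque()
    clock, i = 0, 0
    finish = {}
    while len(finish) < len(processes):
        # admit everything that has arrived by now
        while i < len(pending) and pending[i][1] <= clock:
            ready.append(pending[i][0])
            i += 1
        if not ready:                  # idle until the next arrival
            clock = pending[i][1]
            continue
        name = ready.popleft()
        run = min(q, remaining[name])
        clock += run
        remaining[name] -= run
        # arrivals during (or exactly at the end of) this quantum
        # enter the queue before the preempted process re-enters
        while i < len(pending) and pending[i][1] <= clock:
            ready.append(pending[i][0])
            i += 1
        if remaining[name] == 0:
            finish[name] = clock
        else:
            ready.append(name)
    return finish

jobs = [('A', 0, 3), ('B', 1, 5), ('C', 3, 2), ('D', 9, 5), ('E', 12, 5)]
finishes = round_robin(jobs, q=1)   # matches the q = 1 table above
```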
Figures 2.4 and 2.6 display RR with a quantum of 0.1. The shortest 10 percent of
all processes can be serviced in one quantum. All other processes suffer a penalty ratio of
approximately 5. RR therefore penalizes all processes about the same, unlike FCFS,
which causes all processes to miss the same amount of time. Figure 2.6 shows that the
missed time is very small (on the order of one quantum) for very short processes and that
the missed time rises steadily as processes become longer.
Under PS, which is a limiting case of RR, we have the following service measures:
T(t) = t / (1 - ρ)

P() = 1 / (1 - ρ)
The second formula agrees with our observation that the penalty ratio has the value 5,
independent of t.
2.3 Shortest process next (SPN)
We have seen that RR improves the response ratio for short processes but that it requires
preemption to do so. The SPN method is an attempt to have the same effect without
preemption. Instead, it requires a different ingredient: explicit information about the
service-time requirements for each process. When a process must be selected from the
ready list to execute next, the one with the shortest service-time requirement is chosen.
How are these service times discovered? It obviously won’t do to execute the pro-
cess, see how long it takes, and then schedule it according to this knowledge. One alter-
native is to have the user characterize the process before it starts. For example, a process
could be characterized as transput-bound or compute-bound. A highly interactive pro-
cess, such as a text editor, is transput-bound. A long simulation that uses internally gen-
erated random data is compute-bound. More precisely, the average service time needed
(averaged over each entry into the short-term scheduler) could be specified. It seems an
unreasonable burden to have the user characterize each process. Processes could
describe themselves (in their load image, discussed in Chapter 3), but that description
would be at best a guess. Besides, processes often go through phases, some of which
require more computation between transput events and some of which require less.
Instead, the short-term scheduler can accumulate service statistics for each process
every time it departs. Say a given process p used s seconds of time during its most
recent stay in the short-term ready list. Then the exponential average e can be updated
this way:

e := 0.9 e + 0.1 s
The number 0.9 is called a ‘‘smoothing factor,’’ and may be set to a higher number (like
0.99) to make the estimate less responsive to change, or to a lower number (like 0.7) to
make the estimate more responsive to change. The initial estimate, for the first time the
process arrives, can be the average service time for all processes.
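The update rule is a one-liner. A sketch (the function name and the burst sequence are illustrative, not from the text):

```python
def exponential_average(e, s, smoothing=0.9):
    """Update the service-time estimate e after a burst that used s seconds.

    A smoothing factor near 1 makes the estimate sluggish;
    one near 0 makes it track recent bursts closely.
    """
    return smoothing * e + (1 - smoothing) * s

# A process whose bursts suddenly lengthen from 1 to 5 seconds:
# the estimate drifts toward 5, one tenth of the gap per burst.
e = 1.0
for _ in range(10):
    e = exponential_average(e, 5.0)
```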
To demonstrate the SPN method, we will assume that the scheduler has complete
and accurate knowledge of the service requirement of each process. Our sample set of
processes is serviced as shown in Figure 2.9.
Figure 2.9 SPN schedule
Here are the statistics when these processes are scheduled under SPN:
Process   Arrival   Service    Start   Finish
 name      time     required    time    time      T     M      P
  A          0         3          0       3       3     0     1.0
  B          1         5          5      10       9     4     1.8
  C          3         2          3       5       2     0     1.0
  D          9         5         10      15       6     1     1.2
  E         12         5         15      20       8     3     1.6
 Mean                                            5.6   1.6    1.32
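The SPN schedule above can be reproduced with a few lines. This sketch (names ours) assumes, as the text does, complete and accurate knowledge of service times:

```python
def spn(processes):
    """Non-preemptive shortest-process-next over (name, arrival, service).

    At each completion, the ready process with the shortest service
    requirement runs next.  Returns {name: (start, finish)}.
    """
    waiting = sorted(processes, key=lambda p: p[1])   # by arrival time
    clock = 0
    done = {}
    while waiting:
        ready = [p for p in waiting if p[1] <= clock]
        if not ready:
            clock = waiting[0][1]     # idle until the next arrival
            continue
        name, arrival, service = min(ready, key=lambda p: p[2])
        start = clock
        clock += service
        done[name] = (start, clock)
        waiting.remove((name, arrival, service))
    return done

schedule = spn([('A', 0, 3), ('B', 1, 5), ('C', 3, 2),
                ('D', 9, 5), ('E', 12, 5)])
# C jumps ahead of B: it arrives at 3 and needs only 2.
```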
The response time under SPN is particularly good for short processes, as you might
expect. In contrast, long processes may wait a long time indeed, especially if ρ
approaches 1. Overall, T() and M() are lower under SPN than any method that does not
use a time estimate. Although analytic results are hard to derive, Figures 2.4 and 2.6, our
simulation results, show that the penalty ratio and missed time for SPN are better than for
RR, except for the shortest 15 percent of all processes, where the figures are still far
better than for FCFS.
2.4 Preemptive shortest process next (PSPN)
We saw that RR achieves a good penalty ratio by using preemption, and that SPN does
even better by using extra information about each process. We would expect to do still
better by combining these techniques. The PSPN policy preempts the current process when
another process arrives with a total service time requirement less than the remaining ser-
vice time required by the current process. The value of T(t) turns out to be lower than
for SPN for all but the longest 10 percent of all processes in our simulation of Figure 2.4.
Our five-process example behaves the same under SPN and PSPN; there is never
any preemption because short processes do not arrive near the start of execution of long
processes. Figures 2.4 and 2.6 show that for all but the longest 7 percent of all processes,
PSPN is better than SPN. It has an excellent penalty ratio, and the missed time stays very
low for a large majority of processes. Even for very long processes, PSPN is not much
worse than RR. In fact, PSPN gives the best achievable average penalty ratio because it
keeps the ready list as short as possible. It manages this feat by directing resources
toward the process that will finish soonest, and will therefore shorten the ready list
soonest. A short ready list means reduced contention, which leads to a low penalty ratio.
2.5 Highest penalty ratio next (HPRN)
Non-preemptive scheduling policies seem to give unfair advantage either to very long
processes (FCFS) or to very short ones (SPN). The HPRN method tries to be fairer and
still not introduce preemption. As a new process waits in the ready list, its value of P,
which starts at 1, begins to rise. After it has waited w time on the ready list,
P = (w+t)/t. When an old process departs, the ready process with the highest penalty
ratio is selected for execution. As long as the saturation is not unreasonable (ρ < 1), the
HPRN method will not starve even very long processes, since eventually their penalty
ratio reaches a high value.
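The selection rule can be sketched directly. In this hypothetical fragment, each ready process carries its arrival time and its (known or estimated) service time t, and the one with the highest penalty ratio (w + t)/t wins:

```python
def hprn_pick(ready, clock):
    """Choose the next process under HPRN.

    ready: list of (name, arrival, t) where t is the known or
    estimated service time; clock: the current time.
    """
    def penalty(p):
        name, arrival, t = p
        w = clock - arrival            # time spent waiting so far
        return (w + t) / t             # starts at 1, grows while waiting
    return max(ready, key=penalty)

# At time 10, a long process that has waited only briefly is still
# outranked by... very little; but as its wait grows, its penalty
# ratio eventually beats any newcomer, so it cannot starve.
pick = hprn_pick([('long', 2, 100), ('short', 10, 1)], clock=10)
```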
If t is not known, it can be estimated by an exponential average of service times
during previous compute bursts, as we discussed earlier. Alternatively, we can base the
HPRN method on a medium-term penalty ratio, which is (M +t)/t, where M is the total
time missed while the process has languished either on the short-term ready list or on a
medium-term main-store wait list (but not on a transput-wait list) and t is the total cpu
time used during previous compute bursts.
HPRN strikes a nice balance between FCFS and SPN. If we use the actual value of
t, our sample process set behaves the same under HPRN and FCFS. Our simulation,
reported in Figures 2.4 and 2.6, also used the actual value of t. HPRN fits neatly
between FCFS and SPN: for short processes, HPRN is much like SPN; for
middle-length processes, HPRN has an intermediate penalty ratio; and for very
long processes, SPN becomes worse than FCFS but HPRN is still in the middle.
However, HPRN has some disadvantages. First, it is not preemptive, so it cannot
beat RR or PSPN for short processes. A short process that unluckily arrives just after a
long process has started executing will still have to wait a very long time. Second, it is
generally not as good as SPN (at least in our simulation), even though HPRN uses the
same technique: knowledge of process length without preemption. Third, HPRN is more
expensive to implement, since the penalty ratio must be calculated for every waiting
process whenever a running process completes.
2.6 Multiple-level feedback (FB)
The multiple-level feedback method splits the ready list into a number of queues: queue
0, queue 1, queue 2, and so on. Lower-numbered queues have higher priority. When the
current process is interrupted at the end of its quantum, a new process is selected from
the front of the lowest-numbered queue that has any processes. After a process has used
a certain number of quanta in its queue, it is placed at the end of the next-higher-
numbered queue. (The word ‘‘feedback’’ in the name of this method refers to the fact
that processes can move from one queue to another.) In Figure 2.10, a process is allowed
only one quantum in its queue before being bumped to the next one. The statistics for
our process set are as follows.
[Figure 2.10: FB Schedule, showing the queue number of the running process at each time from 0 to 20.]
Process   Arrival   Service    Finish
name      time      required   time       T      M      P
A          0         3          7         7      4     2.3
B          1         5         18        17     12     3.4
C          3         2          6         3      1     1.5
D          9         5         19        10      5     2.0
E         12         5         20         8      3     1.6
Mean                                    9.0    5.0    2.16
Short processes have priority over long ones, since they finish while still in the first
queue, whereas long processes eventually migrate to low-priority queues. The mean
value of T is the same as for RR, so since short processes do better than under RR, long
processes must do more
poorly. This prediction is borne out by the results in Figures 2.4 and 2.6. FB is better
than RR for about 80 percent of all processes but worse for the longest 20 percent.
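The FB discipline is simple enough to simulate directly. The following sketch is our own illustration (the names and structure are invented); with a quantum of one unit and demotion after every quantum, it reproduces the finish times of our five-process example: A at 7, B at 18, C at 6, D at 19, and E at 20.

```python
from collections import deque

def fb_schedule(procs, quantum=1, levels=64):
    # Simulate multiple-level feedback scheduling.  `procs` is a list of
    # (name, arrival, service) triples.  Returns a dict mapping each
    # process name to its finish time.
    remaining = {name: service for name, _, service in procs}
    pending = sorted(procs, key=lambda p: p[1])    # by arrival time
    queues = [deque() for _ in range(levels)]
    finish, t, i = {}, 0, 0

    def admit(now):
        nonlocal i
        while i < len(pending) and pending[i][1] <= now:
            queues[0].append(pending[i][0])        # newcomers enter queue 0
            i += 1

    while len(finish) < len(procs):
        admit(t)
        level = next((k for k in range(levels) if queues[k]), None)
        if level is None:                          # idle until next arrival
            t = pending[i][1]
            continue
        name = queues[level].popleft()             # lowest-numbered queue wins
        run = min(quantum, remaining[name])
        t += run
        remaining[name] -= run
        if remaining[name] == 0:
            finish[name] = t
        else:
            admit(t)                               # arrivals precede requeueing
            queues[level + 1].append(name)         # demote one level
    return finish
```

The simulator assumes that a process arriving exactly at a quantum boundary enters queue 0 before the preempted process is requeued; with this tie-breaking rule the schedule matches Figure 2.10.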
The FB method has several variations.
(1) Let the quantum size depend on the queue. A queue numbered n could have a
quantum of length 2^n · q, where q is the ‘‘basic quantum’’ size. Therefore, the
queues have quanta of sizes q, 2q, 4q, 8q, and so on. The quantum given to any
process is based on the queue it is taken from. A process that needs a long time
suffers process switches after times q , 3q, 7q, 15q, and so on. The total number
of process switches is therefore log(t(p) / q) instead of t(p) / q, which is the
number needed by RR. Therefore, this method reduces process switch overhead
while still behaving much like RR.
The quantum length could be calculated by a slower-growing function of the queue
number n. Such functions keep the quantum size within reasonable bounds while still
reducing the total number of process switches needed for long processes.
Figure 2.11 shows how our sample processes are treated with exponentially
growing quanta.
[Figure 2.11: Exponential FB schedule, showing the queue number of the running process at each time from 0 to 20.]
The statistics for our process set are as follows.
Process   Arrival   Service    Finish
name      time      required   time       T      M      P
A          0         3          4         4      1     1.3
B          1         5         10         9      4     1.8
C          3         2          8         5      3     2.5
D          9         5         18         9      4     1.8
E         12         5         20         8      3     1.6
Mean                                    7.0    3.0     1.8
(2) Let a process in queue n be scheduled by RR for 2^n (or perhaps just n) quanta
before being demoted to the next queue.
(3) Promote a process to a higher-priority queue after it spends a certain amount of
time waiting for service in its current queue.
(4) Instead of granting absolute priority to low-numbered queues, grant slices of time
to each queue, with lower-numbered queues receiving larger slices.
These variants can be used by themselves or in any combination.
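For variant (1), the times at which a long process suffers its switches are easy to compute: after k quanta it has received (2^k − 1)q units of service in total. A small sketch (our own, not from the book):

```python
def exp_fb_switch_times(q, levels):
    # Cumulative process-switch times for a long process under FB with
    # exponentially growing quanta q, 2q, 4q, ...: the k-th switch occurs
    # after (2**k - 1) * q total units of service.
    times, total = [], 0
    for n in range(levels):
        total += (2 ** n) * q
        times.append(total)
    return times
```

With q = 1 this gives switch times 1, 3, 7, 15, matching the text; a process of length t suffers only about log(t / q) switches rather than the t / q that plain RR would require.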
2.7 Selfish round robin (SRR)
The selfish round robin method adds a new dimension to round robin by giving better
service to processes that have been executing for a while than to newcomers. Processes
in the ready list are partitioned into two lists: new and accepted. New processes wait.
Accepted processes are serviced by RR. The priority of a new process increases at rate
a. The priority of an accepted process increases at rate b . Both a and b are
parameters; that is, they can be adjusted to tune the method. When the priority of a new
process reaches the priority of an accepted process, that new process becomes accepted.
If all accepted processes finish, the highest priority new process is accepted.
Assume that there are no ready processes when the first one, A, arrives. It has
priority 0 to begin with. Since there are no other accepted processes, A is accepted
immediately. After a while another process, B, arrives. As long as b / a < 1, B’s priority
will eventually catch up to A’s, so it is accepted; now both A and B have the same prior-
ity. We can see that all accepted processes share a common priority (which rises at rate
b); that makes this policy easy to implement. Even if b / a > 1, A will eventually finish,
and then B can be accepted.
Adjusting the relative values of a and b has a great influence on the behavior of
SRR. If b / a ≥ 1, a new process is not accepted until all the accepted processes have
finished, so SRR becomes FCFS. If b / a = 0, all processes are accepted immediately, so
SRR becomes RR. If 0 < b / a < 1, accepted processes are selfish, but not completely.
To demonstrate how SRR schedules our running example, let us set a = 2 and
b = 1. If a new process achieves the priority of the accepted processes at the end of a
quantum, we place it on the ready list first and then preempt the running process. Figure
2.12 shows the resulting schedule, including the priority of each process at the end of
each quantum. The letter d indicates that the process is done.
[Figure 2.12: SRR Schedule, showing the priority of each process at the end of each quantum from time 0 to 20.]
The statistics for our process set are as follows.
Process   Arrival   Service    Finish
name      time      required   time       T      M      P
A          0         3          4         4      1     1.3
B          1         5         10         9      4     1.8
C          3         2          9         6      4     3.0
D          9         5         15         6      1     1.2
E         12         5         20         8      3     1.6
Mean                                    6.6    2.6    1.79
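Because all accepted processes share a common priority, the waiting time of a new process before acceptance follows directly from the two rates: if the accepted priority is P0 when the newcomer arrives, the newcomer (rising at rate a from 0) catches the accepted group (rising at rate b) after P0 / (a − b) time units, provided a > b. As a sketch (the function name and signature are our own):

```python
def srr_acceptance_delay(accepted_priority, a, b):
    # Time until a newly arrived process is accepted under SRR.  The
    # newcomer's priority rises at rate a from 0; the accepted group's
    # common priority rises at rate b from accepted_priority.  If b >= a
    # the newcomer waits until all accepted processes finish.
    if b >= a:
        return float("inf")
    return accepted_priority / (a - b)
```

With a = 2 and b = 1, as in Figure 2.12, process B arrives at time 1, when A's priority is 1; B is therefore accepted 1 / (2 − 1) = 1 time unit later, at time 2.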
2.8 Hybrid methods
All sorts of methods can be invented by combining ones that we have mentioned. Here
are some examples.
(1) Use FB up to a fixed number z of quanta; then use FCFS for the last queue. This
method reduces the number of process switches for very long processes.
(2) Use RR up to some number of quanta. A process that needs more time is put in a
second run queue that is treated with SRR scheduling. Very long processes are
eventually placed in a third queue that uses FCFS. RR could have absolute
precedence over SRR, which has precedence over FCFS, or each could have a fixed
percentage of total time.
2.9 State-dependent priority methods
These three methods adjust parameters based on the current state.
(1) Use RR. However, instead of keeping the quantum constant, adjust it periodi-
cally, perhaps after every process switch, so that the quantum becomes q / n,
where n is the size of the ready list. If there are very few ready processes, each
gets a long quantum, which avoids process switches. If there are very many, the
algorithm becomes more fair for all, but at the expense of process switching.
Processes that need only a small amount of time get a quantum, albeit a small
one, fairly soon, so they may be able to finish soon. The quantum should not be
allowed to drop below some given minimal value so that process switching does
not start to consume undue amounts of time.
(2) Give the current process an extra quantum whenever a new process arrives. The
effect of this gift is to reduce process switching in proportion to the level of
saturation.
(3) Some versions of Unix use the following scheduling algorithm. Every second an
internal priority is calculated for each process. This priority depends on the
external priority (set by the user) and the amount of recent time consumed. This
latter figure rises linearly as the process runs and decreases exponentially as the
process waits (whether because of short-term scheduling or other reasons). The
exponential decay depends on the current load (that is, the size of the ready list);
if the load is higher, the central processing unit (cpu) usage figure decays more
slowly. Processes with higher recent cpu usage get lower priorities than those
with lower recent cpu usage. The scheduler runs the process with the highest
priority in the ready list. If several processes have the same priority, they are
scheduled in RR fashion.
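The decay computation in variant (3) can be sketched roughly as follows. This is our own illustration in the style of the 4.3BSD scheduler, not the exact algorithm of any particular Unix version; the base value, the divisor of 4, and the decay formula are assumptions for the sake of the example. Recall that in Unix a numerically lower priority value is better, so the derived priority worsens as recent cpu usage rises.

```python
def recompute_priorities(procs, load, base=50):
    # Once per second, decay each process's recent cpu usage and derive
    # an internal priority from it (lower value = better priority).
    # A higher load pushes the decay factor closer to 1, so usage decays
    # more slowly.  `procs` maps a process name to a dict holding its
    # recent cpu usage ("cpu") and its external priority ("nice").
    decay = (2.0 * load) / (2.0 * load + 1)    # e.g. 2/3 at load 1
    for p in procs.values():
        p["cpu"] = p["cpu"] * decay
        p["priority"] = base + p["cpu"] / 4 + p["nice"]
    return procs
```

A cpu-bound process accumulates usage faster than it decays and so sinks in priority, while a process that has been waiting sees its usage figure shrink each second and rises toward the front of the ready list.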
2.10 External priority methods
These three methods adjust parameters on the basis of some external priority.
(1) Use RR, but let the quantum depend on the external priority of the process. That
is, allow larger quanta for processes run for a user willing to pay a premium for
this service.
(2) The Worst Service Next (WSN) method is a generalization of many others.
After each quantum, compute for each process how much it has suffered so far.
Suffering is an arbitrarily complex figure arrived at by crediting the process for
how much it has had to wait, how many times it has been preempted, how much
its user is paying in premiums, and how urgent it is. The process is also debited
for such items as