
Operating Systems - William Stallings, 6th edition


M01_STAL6329_06_SE_C01.QXD  2/13/08
PART ONE
Background

Part One provides a background and context for the remainder of this book. This part presents the fundamental concepts of computer architecture and operating system internals.

ROAD MAP FOR PART ONE
Chapter 1 Computer System Overview
An operating system mediates among application programs, utilities, and users, on the one hand, and the computer system hardware on the other. To appreciate the functionality of the operating system and the design issues involved, one must have some appreciation for computer organization and architecture. Chapter 1 provides a brief survey of the processor, memory, and Input/Output (I/O) elements of a computer system.

Chapter 2 Operating System Overview
The topic of operating system (OS) design covers a huge territory, and it is easy to get lost in the details and lose the context of a discussion of a particular issue. Chapter 2 provides an overview to which the reader can return at any point in the book for context. We begin with a statement of the objectives and functions of an operating system. Then some historically important systems and OS functions are described. This discussion allows us to present some fundamental OS design principles in a simple environment so that the relationship among various OS functions is clear. The chapter next highlights important characteristics of modern operating systems. Throughout the book, as various topics are discussed, it is necessary to talk about both fundamental, well-established principles as well as more recent innovations in OS design. The discussion in this chapter alerts the reader to this blend of established and recent design approaches that must be addressed. Finally, we present an overview of Windows, UNIX, and Linux; this discussion establishes the general architecture of these systems, providing context for the detailed discussions to follow.


CHAPTER 1
COMPUTER SYSTEM OVERVIEW
1.1 Basic Elements
1.2 Processor Registers
    User-Visible Registers
    Control and Status Registers
1.3 Instruction Execution
    Instruction Fetch and Execute
    I/O Function
1.4 Interrupts
    Interrupts and the Instruction Cycle
    Interrupt Processing
    Multiple Interrupts
    Multiprogramming
1.5 The Memory Hierarchy
1.6 Cache Memory
    Motivation
    Cache Principles
    Cache Design
1.7 I/O Communication Techniques
    Programmed I/O
    Interrupt-Driven I/O
    Direct Memory Access
1.8 Recommended Reading and Web Sites
1.9 Key Terms, Review Questions, and Problems

APPENDIX 1A Performance Characteristics of Two-Level Memories
    Locality
    Operation of Two-Level Memory
    Performance

APPENDIX 1B Procedure Control
    Stack Implementation
    Procedure Calls and Returns
    Reentrant Procedures


An operating system (OS) exploits the hardware resources of one or more processors to provide a set of services to system users. The OS also manages secondary memory and I/O (input/output) devices on behalf of its users. Accordingly, it is important to have some understanding of the underlying computer system hardware before we begin our examination of operating systems. This chapter provides an overview of computer system hardware. In most areas, the survey is brief, as it is assumed that the reader is familiar with this subject. However, several areas are covered in some detail because of their importance to topics covered later in the book.

1.1 BASIC ELEMENTS
At a top level, a computer consists of processor, memory, and I/O components, with one or more modules of each type. These components are interconnected in some fashion to achieve the main function of the computer, which is to execute programs. Thus, there are four main structural elements:

• Processor: Controls the operation of the computer and performs its data processing functions. When there is only one processor, it is often referred to as the central processing unit (CPU).
• Main memory: Stores data and programs. This memory is typically volatile; that is, when the computer is shut down, the contents of the memory are lost. In contrast, the contents of disk memory are retained even when the computer system is shut down. Main memory is also referred to as real memory or primary memory.
• I/O modules: Move data between the computer and its external environment. The external environment consists of a variety of devices, including secondary memory devices (e.g., disks), communications equipment, and terminals.
• System bus: Provides for communication among processors, main memory, and I/O modules.

Figure 1.1 depicts these top-level components. One of the processor's functions is to exchange data with memory. For this purpose, it typically makes use of two internal (to the processor) registers: a memory address register (MAR), which specifies the address in memory for the next read or write; and a memory buffer register (MBR), which contains the data to be written into memory or which receives the data read from memory. Similarly, an I/O address register (I/OAR) specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the processor.

A memory module consists of a set of locations, defined by sequentially numbered addresses. Each location contains a bit pattern that can be interpreted as either an instruction or data.
An I/O module transfers data from external devices to processor and memory, and vice versa. It contains internal buffers for temporarily holding data until they can be sent on.
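As a toy illustration of the MAR/MBR pair just described (a sketch with invented addresses and values, not the book's notation), a memory access places the address in the MAR and the data passes through the MBR:

```python
# Toy sketch of the MAR/MBR register pair mediating memory reads
# and writes. The memory size and values are illustrative only.

memory = [0] * 16          # a tiny memory module of 16 locations

def mem_read(mar):
    # The MAR specifies the address; the value read appears in the MBR.
    mbr = memory[mar]
    return mbr

def mem_write(mar, mbr):
    # The MBR holds the data to be written to the address in the MAR.
    memory[mar] = mbr

mem_write(5, 0x1234)
assert mem_read(5) == 0x1234
```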


[Figure 1.1  Computer Components: Top-Level View. The CPU (PC, IR, MAR, MBR, I/OAR, I/OBR, and an execution unit), main memory (sequentially numbered locations holding instructions and data), and an I/O module with internal buffers, interconnected by the system bus. PC: program counter; IR: instruction register; MAR: memory address register; MBR: memory buffer register; I/OAR: input/output address register; I/OBR: input/output buffer register.]

1.2 PROCESSOR REGISTERS
A processor includes a set of registers that provide memory that is faster and smaller than main memory. Processor registers serve two functions:

• User-visible registers: Enable the machine- or assembly-language programmer to minimize main memory references by optimizing register use. For high-level languages, an optimizing compiler will attempt to make intelligent choices of which variables to assign to registers and which to main memory locations. Some high-level languages, such as C, allow the programmer to suggest to the compiler which variables should be held in registers.
• Control and status registers: Used by the processor to control the operation of the processor and by privileged OS routines to control the execution of programs.


There is not a clean separation of registers into these two categories. For example, on some processors, the program counter is user visible, but on many it is not. For purposes of the following discussion, however, it is convenient to use these categories.

User-Visible Registers
A user-visible register may be referenced by means of the machine language that the processor executes and is generally available to all programs, including application programs as well as system programs. Types of registers that are typically available are data, address, and condition code registers.

Data registers can be assigned to a variety of functions by the programmer. In some cases, they are general purpose in nature and can be used with any machine instruction that performs operations on data. Often, however, there are restrictions. For example, there may be dedicated registers for floating-point operations and others for integer operations.

Address registers contain main memory addresses of data and instructions, or they contain a portion of the address that is used in the calculation of the complete or effective address. These registers may themselves be general purpose, or may be devoted to a particular way, or mode, of addressing memory. Examples include the following:

• Index register: Indexed addressing is a common mode of addressing that involves adding an index to a base value to get the effective address.
• Segment pointer: With segmented addressing, memory is divided into segments, which are variable-length blocks of words.¹ A memory reference consists of a reference to a particular segment and an offset within the segment; this mode of addressing is important in our discussion of memory management in Chapter 7. In this mode of addressing, a register is used to hold the base address (starting location) of the segment. There may be multiple registers; for example, one for the OS (i.e., when OS code is executing on the processor) and one for the currently executing application.
• Stack pointer: If there is user-visible stack² addressing, then there is a dedicated register that points to the top of the stack. This allows the use of instructions that contain no address field, such as push and pop.
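The three addressing mechanisms above can be sketched in a few lines (a minimal sketch; the base values and offsets are made up for illustration):

```python
# Sketches of the three register-based addressing modes described
# above. All numeric values are hypothetical.

def indexed(base, index_reg):
    # Index register: effective address = base value + index.
    return base + index_reg

def segmented(segment_base_reg, offset):
    # Segment pointer: effective address = segment base + offset.
    return segment_base_reg + offset

class Stack:
    # Stack pointer: push and pop reference the top of the stack,
    # so the instructions themselves need no address field.
    def __init__(self):
        self.items = []
    def push(self, value):
        self.items.append(value)
    def pop(self):
        return self.items.pop()

assert indexed(0x200, 3) == 0x203        # fourth word of an array at 0x200
assert segmented(0x4000, 0x10) == 0x4010 # offset 0x10 into a segment
s = Stack(); s.push(1); s.push(2)
assert s.pop() == 2                      # last value pushed comes off first
```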
For some processors, a procedure call will result in automatic saving of all user-visible registers, to be restored on return. Saving and restoring is performed by the processor as part of the execution of the call and return instructions. This allows each

¹ There is no universal definition of the term word. In general, a word is an ordered set of bytes or bits that is the normal unit in which information may be stored, transmitted, or operated on within a given computer. Typically, if a processor has a fixed-length instruction set, then the instruction length equals the word length.
² A stack is located in main memory and is a sequential set of locations that are referenced similarly to a physical stack of papers, by putting on and taking away from the top. See Appendix 1B for a discussion of stack processing.


procedure to use these registers independently. On other processors, the programmer must save the contents of the relevant user-visible registers prior to a procedure call, by including instructions for this purpose in the program. Thus, the saving and restoring functions may be performed in either hardware or software, depending on the processor.

Control and Status Registers
A variety of processor registers are employed to control the operation of the processor. On most processors, most of these are not visible to the user. Some of them may be accessible by machine instructions executed in what is referred to as a control or kernel mode.

Of course, different processors will have different register organizations and use different terminology. We provide here a reasonably complete list of register types, with a brief description. In addition to the MAR, MBR, I/OAR, and I/OBR registers mentioned earlier (Figure 1.1), the following are essential to instruction execution:

• Program counter (PC): Contains the address of the next instruction to be fetched.
• Instruction register (IR): Contains the instruction most recently fetched.

All processor designs also include a register or set of registers, often known as the program status word (PSW), that contains status information. The PSW typically contains condition codes plus other status information, such as an interrupt enable/disable bit and a kernel/user mode bit.

Condition codes (also referred to as flags) are bits typically set by the processor hardware as the result of operations. For example, an arithmetic operation may produce a positive, negative, zero, or overflow result. In addition to the result itself being stored in a register or memory, a condition code is also set following the execution of the arithmetic instruction. The condition code may subsequently be tested as part of a conditional branch operation. Condition code bits are collected into one or more registers. Usually, they form part of a control register. Generally, machine instructions allow these bits to be read by implicit reference, but they cannot be altered by explicit reference because they are intended for feedback regarding the results of instruction execution.

In processors with multiple types of interrupts, a set of interrupt registers may be provided, with one pointer to each interrupt-handling routine.
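The setting of condition codes after an arithmetic operation can be sketched as follows (the flag names Z, N, and C and the 16-bit width are assumptions for illustration, not any particular processor's PSW layout):

```python
# Sketch: condition codes set by the hardware after a 16-bit addition.
# Flag names (Z = zero, N = negative, C = carry) are illustrative.

def add_and_set_flags(a, b, width=16):
    mask = (1 << width) - 1
    result = (a + b) & mask
    flags = {
        "Z": result == 0,                  # result was zero
        "N": bool(result >> (width - 1)),  # sign bit of the result is set
        "C": (a + b) > mask,               # carry out of the top bit
    }
    return result, flags

# A conditional branch instruction could then test a flag, e.g.
# "branch if Z is set". Adding 1 to 0xFFFF wraps to zero with carry:
result, flags = add_and_set_flags(0xFFFF, 0x0001)
assert result == 0 and flags["Z"] and flags["C"]
```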
If a stack is used to implement certain functions (e.g., procedure call), then a stack pointer is needed (see Appendix 1B). Memory management hardware, discussed in Chapter 7, requires dedicated registers. Finally, registers may be used in the control of I/O operations.

A number of factors go into the design of the control and status register organization. One key issue is OS support. Certain types of control information are of specific utility to the OS. If the processor designer has a functional understanding of the OS to be used, then the register organization can be designed to provide hardware support for particular features such as memory protection and switching between user programs.


Another key design decision is the allocation of control information between registers and memory. It is common to dedicate the first (lowest) few hundred or thousand words of memory for control purposes. The designer must decide how much control information should be in more expensive, faster registers and how much in less expensive, slower main memory.

1.3 INSTRUCTION EXECUTION
A program to be executed by a processor consists of a set of instructions stored in memory. In its simplest form, instruction processing consists of two steps: The processor reads (fetches) instructions from memory one at a time and executes each instruction. Program execution consists of repeating the process of instruction fetch and instruction execution. Instruction execution may involve several operations and depends on the nature of the instruction. The processing required for a single instruction is called an instruction cycle. Using a simplified two-step description, the instruction cycle is depicted in Figure 1.2. The two steps are referred to as the fetch stage and the execute stage. Program execution halts only if the processor is turned off, some sort of unrecoverable error occurs, or a program instruction that halts the processor is encountered.

Instruction Fetch and Execute
At the beginning of each instruction cycle, the processor fetches an instruction from memory. Typically, the program counter (PC) holds the address of the next instruction to be fetched. Unless instructed otherwise, the processor always increments the PC after each instruction fetch so that it will fetch the next instruction in sequence (i.e., the instruction located at the next higher memory address). For example, consider a simplified computer in which each instruction occupies one 16-bit word of memory. Assume that the program counter is set to location 300. The processor will next fetch the instruction at location 300. On succeeding instruction cycles, it will fetch instructions from locations 301, 302, 303, and so on. This sequence may be altered, as explained subsequently.

The fetched instruction is loaded into the instruction register (IR). The instruction contains bits that specify the action the processor is to take. The processor interprets the instruction and performs the required action. In general, these actions fall into four categories:

• Processor-memory: Data may be transferred from processor to memory or from memory to processor.

[Figure 1.2  Basic Instruction Cycle. START, then fetch next instruction (fetch stage), then execute instruction (execute stage), looping back to the fetch stage or reaching HALT.]


[Figure 1.3  Characteristics of a Hypothetical Machine. (a) Instruction format: bits 0-3 opcode, bits 4-15 address. (b) Integer format: bit 0 sign (S), bits 1-15 magnitude. (c) Internal CPU registers: program counter (PC) = address of instruction; instruction register (IR) = instruction being executed; accumulator (AC) = temporary storage. (d) Partial list of opcodes: 0001 = load AC from memory; 0010 = store AC to memory; 0101 = add to AC from memory.]

• Processor-I/O: Data may be transferred to or from a peripheral device by transferring between the processor and an I/O module.
• Data processing: The processor may perform some arithmetic or logic operation on data.
• Control: An instruction may specify that the sequence of execution be altered. For example, the processor may fetch an instruction from location 149, which specifies that the next instruction be from location 182. The processor sets the program counter to 182. Thus, on the next fetch stage, the instruction will be fetched from location 182 rather than 150.

An instruction's execution may involve a combination of these actions. Consider a simple example using a hypothetical processor that includes the characteristics listed in Figure 1.3. The processor contains a single data register, called the accumulator (AC). Both instructions and data are 16 bits long, and memory is organized as a sequence of 16-bit words. The instruction format provides 4 bits for the opcode, allowing as many as 2⁴ = 16 different opcodes (represented by a single hexadecimal³ digit). The opcode defines the operation the processor is to perform. With the remaining 12 bits of the instruction format, up to 2¹² = 4096 (4K) words of memory (denoted by three hexadecimal digits) can be directly addressed.

³ A basic refresher on number systems (decimal, binary, hexadecimal) can be found at the Computer Science Student Resource Site at WilliamStallings.com/StudentSupport.html.
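The instruction format just described can be decoded with simple shifts and masks. This sketch uses the first instruction of the upcoming example program:

```python
# Decode a 16-bit instruction of the hypothetical machine of
# Figure 1.3: the top 4 bits are the opcode, the low 12 bits the
# directly addressed memory location.

def decode(instruction):
    opcode = (instruction >> 12) & 0xF
    address = instruction & 0xFFF
    return opcode, address

# 0x1940: opcode 1 (load AC from memory), address 940 (hexadecimal)
assert decode(0x1940) == (0x1, 0x940)
```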


[Figure 1.4  Example of Program Execution (contents of memory and registers in hexadecimal). Six steps alternating fetch and execute stages. Memory holds 1940 at location 300, 5941 at 301, 2941 at 302, 0003 at 940, and 0002 at 941. The PC advances 300, 301, 302, 303; the IR holds each fetched instruction in turn; the AC becomes 0003 after step 2 and 0005 after step 4 (3 + 2 = 5), and step 6 stores 0005 into location 941.]

Figure 1.4 illustrates a partial program execution, showing the relevant portions of memory and processor registers. The program fragment shown adds the contents of the memory word at address 940 to the contents of the memory word at address 941 and stores the result in the latter location. Three instructions, which can be described as three fetch and three execute stages, are required:

1. The PC contains 300, the address of the first instruction. This instruction (the value 1940 in hexadecimal) is loaded into the IR and the PC is incremented. Note that this process involves the use of a memory address register (MAR) and a memory buffer register (MBR). For simplicity, these intermediate registers are not shown.
2. The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be loaded from memory. The remaining 12 bits (three hexadecimal digits) specify the address, which is 940.
3. The next instruction (5941) is fetched from location 301 and the PC is incremented.
4. The old contents of the AC and the contents of location 941 are added and the result is stored in the AC.
5. The next instruction (2941) is fetched from location 302 and the PC is incremented.
6. The contents of the AC are stored in location 941.
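The six steps above can be sketched as a small simulator of the hypothetical machine (a minimal sketch: only the three opcodes of Figure 1.3 are implemented, and halting when the PC leaves the loaded program is an assumption made for this toy model):

```python
# Simulate the program of Figure 1.4 on the hypothetical machine.
# Opcodes from Figure 1.3: 1 = load AC, 2 = store AC, 5 = add to AC.

def run(memory, pc):
    ac = 0
    while pc in memory:                 # halt when the PC leaves the program
        ir = memory[pc]                 # fetch stage: load instruction into IR
        pc += 1                         # increment the PC
        opcode, addr = ir >> 12, ir & 0xFFF
        if opcode == 0x1:               # load AC from memory
            ac = memory[addr]
        elif opcode == 0x2:             # store AC to memory
            memory[addr] = ac
        elif opcode == 0x5:             # add to AC from memory
            ac = (ac + memory[addr]) & 0xFFFF
        else:
            break                       # unimplemented opcode: halt
    return ac

program = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
           0x940: 0x0003, 0x941: 0x0002}
run(program, 0x300)
assert program[0x941] == 0x0005         # 3 + 2 = 5, as in step 6
```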


In this example, three instruction cycles, each consisting of a fetch stage and an execute stage, are needed to add the contents of location 940 to the contents of 941. With a more complex set of instructions, fewer instruction cycles would be needed. Most modern processors include instructions that contain more than one address. Thus the execution stage for a particular instruction may involve more than one reference to memory. Also, instead of memory references, an instruction may specify an I/O operation.

I/O Function
Data can be exchanged directly between an I/O module (e.g., a disk controller) and the processor. Just as the processor can initiate a read or write with memory, specifying the address of a memory location, the processor can also read data from or write data to an I/O module. In this latter case, the processor identifies a specific device that is controlled by a particular I/O module. Thus, an instruction sequence similar in form to that of Figure 1.4 could occur, with I/O instructions rather than memory-referencing instructions.

In some cases, it is desirable to allow I/O exchanges to occur directly with main memory to relieve the processor of the I/O task. In such a case, the processor grants to an I/O module the authority to read from or write to memory, so that the I/O-memory transfer can occur without tying up the processor. During such a transfer, the I/O module issues read or write commands to memory, relieving the processor of responsibility for the exchange. This operation, known as direct memory access (DMA), is examined later in this chapter.

1.4 INTERRUPTS
Virtually all computers provide a mechanism by which other modules (I/O, memory) may interrupt the normal sequencing of the processor. Table 1.1 lists the most common classes of interrupts. Interrupts are provided primarily as a way to improve processor utilization. For example, most I/O devices are much slower than the processor. Suppose that the processor is transferring data to a printer using the instruction cycle scheme of Figure 1.2. After each write operation, the processor must pause and remain idle
Table 1.1  Classes of Interrupts

Program: Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, and reference outside a user's allowed memory space.
Timer: Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
I/O: Generated by an I/O controller, to signal normal completion of an operation or to signal a variety of error conditions.
Hardware failure: Generated by a failure, such as power failure or memory parity error.

[Figure 1.5  Program Flow of Control without and with Interrupts. (a) No interrupts: user program code segments 1, 2, and 3 alternate with calls to the I/O program (preparation code 4, the I/O command, and completion code 5), and the user program waits through each I/O operation. (b) Interrupts; short I/O wait: control returns to the user program (segments 2a, 2b, 3a, 3b) after the I/O command is issued, and an interrupt handler runs completion code 5 when the device signals. (c) Interrupts; long I/O wait: as in (b), but the user program reaches the next WRITE before the previous I/O operation completes.]

until the printer catches up. The length of this pause may be on the order of many thousands or even millions of instruction cycles. Clearly, this is a very wasteful use of the processor. To give a specific example, consider a PC that operates at 1 GHz, which would allow roughly 10⁹ instructions per second.⁴ A typical hard disk has a rotational speed of 7200 revolutions per minute for a half-track rotation time of 4 ms, which is 4 million times slower than the processor.

Figure 1.5a illustrates this state of affairs. The user program performs a series of WRITE calls interleaved with processing. The solid vertical lines represent segments of code in a program. Code segments 1, 2, and 3 refer to sequences of instructions that do not involve I/O. The WRITE calls are to an I/O routine that is a system utility and that will perform the actual I/O operation. The I/O program consists of three sections:

• A sequence of instructions, labeled 4 in the figure, to prepare for the actual I/O operation. This may include copying the data to be output into a special buffer and preparing the parameters for a device command.
• The actual I/O command. Without the use of interrupts, once this command is issued, the program must wait for the I/O device to perform the requested

⁴ A discussion of the uses of numerical prefixes, such as giga and tera, is contained in a supporting document at the Computer Science Student Resource Site at WilliamStallings.com/StudentSupport.html.


function (or periodically check the status of, i.e., poll, the I/O device). The program might wait by simply repeatedly performing a test operation to determine if the I/O operation is done.

• A sequence of instructions, labeled 5 in the figure, to complete the operation. This may include setting a flag indicating the success or failure of the operation.

The dashed line represents the path of execution followed by the processor; that is, this line shows the sequence in which instructions are executed. Thus, after the first WRITE instruction is encountered, the user program is interrupted and execution continues with the I/O program. After the I/O program execution is complete, execution resumes in the user program immediately following the WRITE instruction. Because the I/O operation may take a relatively long time to complete, the I/O program is hung up waiting for the operation to complete; hence, the user program is stopped at the point of the WRITE call for some considerable period of time.
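The waiting just described can be sketched as a polling loop (a toy model; the device and its delay, counted in polls, are invented for illustration):

```python
# Toy model of programmed I/O: after issuing the command, the
# processor repeatedly tests the device status until it is done.

class Device:
    def __init__(self, busy_cycles):
        self.busy_cycles = busy_cycles   # invented delay, measured in polls
    def issue_command(self):
        self.remaining = self.busy_cycles
    def done(self):
        self.remaining -= 1
        return self.remaining <= 0

def write_polled(device):
    device.issue_command()               # the actual I/O command
    polls = 0
    while not device.done():             # processor idles, testing status
        polls += 1
    return polls                         # wasted test operations

assert write_polled(Device(busy_cycles=5)) == 4
```

Every iteration of the `while` loop is processor time spent doing nothing useful; interrupts, described next, reclaim that time.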

Interrupts and the Instruction Cycle
With interrupts, the processor can be engaged in executing other instructions while an I/O operation is in progress. Consider the flow of control in Figure 1.5b. As before, the user program reaches a point at which it makes a system call in the form of a WRITE call. The I/O program that is invoked in this case consists only of the preparation code and the actual I/O command. After these few instructions have been executed, control returns to the user program. Meanwhile, the external device is busy accepting data from computer memory and printing it. This I/O operation is conducted concurrently with the execution of instructions in the user program.

When the external device becomes ready to be serviced, that is, when it is ready to accept more data from the processor, the I/O module for that external device sends an interrupt request signal to the processor. The processor responds by suspending operation of the current program; branching off to a routine to service that particular I/O device, known as an interrupt handler; and resuming the original execution after the device is serviced. The points at which such interrupts occur are marked with an asterisk in Figure 1.5b. Note that an interrupt can occur at any point in the main program, not just at one specific instruction.

For the user program, an interrupt suspends the normal sequence of execution. When the interrupt processing is completed, execution resumes (Figure 1.6). Thus, the user program does not have to contain any special code to accommodate interrupts; the processor and the OS are responsible for suspending the user program and then resuming it at the same point.

To accommodate interrupts, an interrupt stage is added to the instruction cycle, as shown in Figure 1.7 (compare Figure 1.2). In the interrupt stage, the processor checks to see if any interrupts have occurred, indicated by the presence of an interrupt signal.
If no interrupts are pending, the processor proceeds to the fetch stage and fetches the next instruction of the current program. If an interrupt is pending, the processor suspends execution of the current program and executes an interrupt-handler routine. The interrupt-handler routine is generally part of the OS. Typically, this routine determines the nature of the interrupt and performs

[Figure 1.6  Transfer of Control via Interrupts. A user program of M instructions is interrupted after instruction i; control transfers to the interrupt handler, and on completion execution resumes at instruction i + 1.]

whatever actions are needed. In the example we have been using, the handler determines which I/O module generated the interrupt and may branch to a program that will write more data out to that I/O module. When the interrupt-handler routine is completed, the processor can resume execution of the user program at the point of interruption. It is clear that there is some overhead involved in this process. Extra instructions must be executed (in the interrupt handler) to determine the nature of the interrupt and to decide on the appropriate action. Nevertheless, because of the relatively large amount of time that would be wasted by simply waiting on an I/O operation, the processor can be employed much more efficiently with the use of interrupts.
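The interrupt stage described above can be sketched as an extra check after each execute stage (a simplified model of Figure 1.7's cycle; the instruction and handler names are made up):

```python
# Sketch of the instruction cycle with an interrupt stage: after each
# execute stage, the processor checks for a pending interrupt (if
# interrupts are enabled) before fetching the next instruction.

def run_with_interrupts(program, handlers, pending, enabled=True):
    trace = []
    for instruction in program:
        trace.append(instruction)        # fetch and execute stages
        if enabled and pending:          # interrupt stage
            irq = pending.pop(0)
            trace.append(handlers[irq])  # execute the interrupt handler
    return trace

# One I/O interrupt is pending before the second instruction executes,
# so the handler runs between the two user instructions.
trace = run_with_interrupts(["i1", "i2"], {"io": "handle_io"}, ["io"])
assert trace == ["i1", "handle_io", "i2"]
```

With interrupts disabled, the pending request would simply remain queued and the user instructions would run back to back.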

[Figure 1.7  Instruction Cycle with Interrupts. START, then fetch next instruction (fetch stage), then execute instruction (execute stage), possibly reaching HALT. With interrupts enabled, an interrupt stage follows the execute stage: check for interrupt and initiate the interrupt handler. With interrupts disabled, the processor returns directly to the fetch stage.]


[Figure 1.8  Program Timing: Short I/O Wait. (a) Without interrupts: segment 1, preparation 4, a processor wait spanning the I/O operation, completion 5, then segment 2, and the same again for the next WRITE (circled numbers refer to Figure 1.5a). (b) With interrupts: segments 2a and 2b execute while the I/O operation proceeds, and the processor wait disappears (circled numbers refer to Figure 1.5b).]

To appreciate the gain in efficiency, consider Figure 1.8, which is a timing diagram based on the flow of control in Figures 1.5a and 1.5b. Figures 1.5b and 1.8 assume that the time required for the I/O operation is relatively short: less than the time to complete the execution of instructions between write operations in the user program. The more typical case, especially for a slow device such as a printer, is that the I/O operation will take much more time than executing a sequence of user instructions. Figure 1.5c indicates this state of affairs. In this case, the user program reaches the second WRITE call before the I/O operation spawned by the first call is complete. The result is that the user program is hung up at that point. When the preceding I/O operation is completed, this new WRITE call may be processed, and a new I/O operation may be started.

Figure 1.9 shows the timing for this situation with and without the use of interrupts. We can see that there is still a gain in efficiency because part of the time during which the I/O operation is underway overlaps with the execution of user instructions.


[Figure 1.9  Program Timing: Long I/O Wait. (a) Without interrupts: each WRITE incurs a full processor wait for the I/O operation, as in Figure 1.8a (circled numbers refer to Figure 1.5a). (b) With interrupts: user instructions overlap the start of each I/O operation, but the processor still waits once it reaches the next WRITE before the previous operation completes (circled numbers refer to Figure 1.5c).]

Interrupt Processing
An interrupt triggers a number of events, both in the processor hardware and in software. Figure 1.10 shows a typical sequence. When an I/O device completes an I/O operation, the following sequence of hardware events occurs:

1. The device issues an interrupt signal to the processor.
2. The processor finishes execution of the current instruction before responding to the interrupt, as indicated in Figure 1.7.
3. The processor tests for a pending interrupt request, determines that there is one, and sends an acknowledgment signal to the device that issued the interrupt. The acknowledgment allows the device to remove its interrupt signal.

1.4 / INTERRUPTS

Figure 1.10 Simple Interrupt Processing. Hardware: the device controller or other system hardware issues an interrupt; the processor finishes execution of the current instruction, signals acknowledgment of the interrupt, pushes the PSW and PC onto the control stack, and loads the new PC value based on the interrupt. Software: save the remainder of the process state information, process the interrupt, restore the process state information, and restore the old PSW and PC.

4. The processor next needs to prepare to transfer control to the interrupt routine. To begin, it saves information needed to resume the current program at the point of interrupt. The minimum information required is the program status word (PSW) and the location of the next instruction to be executed, which is contained in the program counter. These can be pushed onto a control stack (see Appendix 1B).
5. The processor then loads the program counter with the entry location of the interrupt-handling routine that will respond to this interrupt. Depending on the computer architecture and OS design, there may be a single program, one for each type of interrupt, or one for each device and each type of interrupt. If there is more than one interrupt-handling routine, the processor must determine which one to invoke. This information may have been included in the original interrupt signal, or the processor may have to issue a request to the device that issued the interrupt to get a response that contains the needed information.

Once the program counter has been loaded, the processor proceeds to the next instruction cycle, which begins with an instruction fetch. Because the instruction fetch is determined by the contents of the program counter, control is transferred to

the interrupt-handler program. The execution of this program results in the following operations:

6. At this point, the program counter and PSW relating to the interrupted program have been saved on the control stack. However, there is other information that is considered part of the state of the executing program. In particular, the contents of the processor registers need to be saved, because these registers may be used by the interrupt handler. So all of these values, plus any other state information, need to be saved. Typically, the interrupt handler will begin by saving the contents of all registers on the stack. Other state information that must be saved is discussed in Chapter 3. Figure 1.11a shows a simple example. In this case, a user program is interrupted after the instruction at location N. The contents of all of the registers plus the address of the next instruction (N + 1), a total of M words, are pushed onto the control stack. The stack pointer is updated to point to the new top of stack, and the program counter is updated to point to the beginning of the interrupt service routine.
7. The interrupt handler may now proceed to process the interrupt. This includes an examination of status information relating to the I/O operation or other event that caused an interrupt. It may also involve sending additional commands or acknowledgments to the I/O device.
8. When interrupt processing is complete, the saved register values are retrieved from the stack and restored to the registers (e.g., see Figure 1.11b).
9. The final act is to restore the PSW and program counter values from the stack. As a result, the next instruction to be executed will be from the previously interrupted program.

It is important to save all of the state information about the interrupted program for later resumption. This is because the interrupt is not a routine called from the program. Rather, the interrupt can occur at any time and therefore at any point in the execution of a user program. Its occurrence is unpredictable.
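The save/restore sequence in steps 4 through 9 can be sketched as a toy simulation. This is an illustration only: the `Cpu` class, its fields, and the handler address are hypothetical, and a real processor performs steps 4 and 5 in hardware.

```python
# Illustrative model of interrupt entry and exit (steps 4-9 above).
# All names here (Cpu, take_interrupt, handler_entry) are hypothetical.

class Cpu:
    def __init__(self):
        self.pc = 0          # program counter
        self.psw = "user"    # program status word (greatly simplified)
        self.regs = [0] * 4  # general-purpose registers
        self.stack = []      # control stack

    def take_interrupt(self, handler_entry):
        # Hardware (steps 4-5): push PSW and PC, then load the PC with
        # the entry location of the interrupt-handling routine.
        self.stack.append((self.psw, self.pc))
        self.pc = handler_entry
        # Software (step 6): the handler saves the remaining state.
        self.stack.append(list(self.regs))

    def return_from_interrupt(self):
        # Software (step 8): restore the saved register values.
        self.regs = self.stack.pop()
        # Hardware (step 9): restore the PSW and PC; execution resumes
        # in the previously interrupted program.
        self.psw, self.pc = self.stack.pop()

cpu = Cpu()
cpu.pc, cpu.regs = 100, [1, 2, 3, 4]
cpu.take_interrupt(handler_entry=5000)
assert cpu.pc == 5000                  # control is now in the handler
cpu.return_from_interrupt()
assert (cpu.pc, cpu.regs) == (100, [1, 2, 3, 4])  # state fully restored
```

The interrupted program's state survives the round trip untouched, which is exactly why the complete save in step 6 matters.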

Multiple Interrupts
So far, we have discussed the occurrence of a single interrupt. Suppose, however, that one or more interrupts can occur while an interrupt is being processed. For example, a program may be receiving data from a communications line and printing results at the same time. The printer will generate an interrupt every time that it completes a print operation. The communication line controller will generate an interrupt every time a unit of data arrives. The unit could either be a single character or a block, depending on the nature of the communications discipline. In any case, it is possible for a communications interrupt to occur while a printer interrupt is being processed. Two approaches can be taken to dealing with multiple interrupts. The first is to disable interrupts while an interrupt is being processed. A disabled interrupt simply means that the processor ignores any new interrupt request signal. If an interrupt occurs during this time, it generally remains pending and will be checked by the processor after the processor has reenabled interrupts. Thus, when a user program is executing and an interrupt occurs, interrupts are disabled immediately. After the

Figure 1.11 Changes in Memory and Registers for an Interrupt: (a) interrupt occurs after instruction at location N; (b) return from interrupt. The figure shows the control stack, the program counter, the general registers, and the stack pointer, in main memory and in the processor, before and after the interrupt service routine (starting at Y, returning at Y + L) runs.

interrupt-handler routine completes, interrupts are reenabled before resuming the user program, and the processor checks to see if additional interrupts have occurred. This approach is simple, as interrupts are handled in strict sequential order (Figure 1.12a). The drawback to the preceding approach is that it does not take into account relative priority or time-critical needs. For example, when input arrives from the communications line, it may need to be absorbed rapidly to make room for more input. If the first batch of input has not been processed before the second batch arrives, data may be lost because the buffer on the I/O device may fill and overflow.

Figure 1.12 Transfer of Control with Multiple Interrupts: (a) sequential interrupt processing; (b) nested interrupt processing. In each case, control passes from the user program to interrupt handler X and interrupt handler Y.

A second approach is to define priorities for interrupts and to allow an interrupt of higher priority to cause a lower-priority interrupt handler to be interrupted (Figure 1.12b). As an example of this second approach, consider a system with three I/O devices: a printer, a disk, and a communications line, with increasing priorities of 2, 4, and 5, respectively. Figure 1.13, based on an example in [TANE06], illustrates a possible sequence. A user program begins at t = 0. At t = 10, a printer interrupt occurs; user information is placed on the control stack and execution continues at the printer interrupt service routine (ISR). While this routine is still executing, at t = 15 a communications interrupt occurs. Because the communications line has higher priority than the printer, the interrupt request is honored. The printer ISR is interrupted, its state is pushed onto the stack, and execution continues at the communications ISR. While this

Figure 1.13 Example Time Sequence of Multiple Interrupts: the user program begins at t = 0; the printer ISR starts at t = 10; the communication ISR starts at t = 15 and completes at t = 25; the disk ISR runs from t = 25 to t = 35; the printer ISR then resumes, and control returns to the user program at t = 40.

routine is executing, a disk interrupt occurs (t = 20). Because this interrupt is of lower priority, it is simply held, and the communications ISR runs to completion. When the communications ISR is complete (t = 25), the previous processor state is restored, which is the execution of the printer ISR. However, before even a single instruction in that routine can be executed, the processor honors the higher-priority disk interrupt and transfers control to the disk ISR. Only when that routine is complete (t = 35) is the printer ISR resumed. When that routine completes (t = 40), control finally returns to the user program.
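The nesting rule illustrated in Figure 1.13 (a new interrupt preempts the current routine only if its priority is strictly higher; otherwise it is held) can be replayed with a small sketch. The function below is not from the book; the event times and priorities follow the example (printer 2, disk 4, communications 5).

```python
# Toy replay of prioritized, nested interrupt handling.
# events: list of (time, name, priority) in time order.
# Returns the order in which the service routines complete.

def completion_order(events):
    stack = [("user", 0)]   # preempted contexts; bottom is the user program
    pending = []            # (priority, name) interrupts held for later
    completed = []
    for _t, name, prio in events:
        if prio > stack[-1][1]:
            stack.append((name, prio))    # preempt the current routine
        else:
            pending.append((prio, name))  # not higher priority: hold it
    # After the last event, run everything to completion: before resuming
    # a lower-priority context, honor the highest-priority held interrupt.
    while len(stack) > 1 or pending:
        if pending and max(pending)[0] > stack[-1][1]:
            prio, name = max(pending)
            pending.remove((prio, name))
            stack.append((name, prio))
        else:
            name, _ = stack.pop()
            completed.append(name)
    return completed

order = completion_order(
    [(10, "printer", 2), (15, "comm", 5), (20, "disk", 4)])
assert order == ["comm", "disk", "printer"]  # matches Figure 1.13
```

The disk interrupt arrives while the higher-priority communications ISR runs, so it is held; yet it still runs before the printer ISR resumes, just as in the figure.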

Multiprogramming
Even with the use of interrupts, a processor may not be used very efficiently. For example, refer to Figure 1.9b, which demonstrates utilization of the processor with long I/O waits. If the time required to complete an I/O operation is much greater than the user code between I/O calls (a common situation), then the processor will be idle much of the time. A solution to this problem is to allow multiple user programs to be active at the same time. Suppose, for example, that the processor has two programs to execute. One is a program for reading data from memory and putting it out on an external device; the other is an application that involves a lot of calculation. The processor can begin the output program, issue a write command to the external device, and then proceed to begin execution of the other application. When the processor is dealing with a number of programs, the sequence with which programs are executed will depend on their relative priority as well as whether they are waiting for I/O. When a program has been interrupted and control transfers to an interrupt handler, once the interrupt-handler routine has completed, control may not necessarily immediately be returned to the user program that was in execution at the time. Instead, control may


pass to some other pending program with a higher priority. Eventually, the user program that was interrupted will be resumed, when it has the highest priority. This concept of multiple programs taking turns in execution is known as multiprogramming and is discussed further in Chapter 2.

1.5 THE MEMORY HIERARCHY
The design constraints on a computer's memory can be summed up by three questions: How much? How fast? How expensive? The question of how much is somewhat open ended. If the capacity is there, applications will likely be developed to use it. The question of how fast is, in a sense, easier to answer. To achieve greatest performance, the memory must be able to keep up with the processor. That is, as the processor is executing instructions, we would not want it to have to pause waiting for instructions or operands. The final question must also be considered. For a practical system, the cost of memory must be reasonable in relationship to other components. As might be expected, there is a tradeoff among the three key characteristics of memory: namely, capacity, access time, and cost. A variety of technologies are used to implement memory systems, and across this spectrum of technologies, the following relationships hold:

• Faster access time, greater cost per bit
• Greater capacity, smaller cost per bit
• Greater capacity, slower access speed

The dilemma facing the designer is clear. The designer would like to use memory technologies that provide for large-capacity memory, both because the capacity is needed and because the cost per bit is low. However, to meet performance requirements, the designer needs to use expensive, relatively lower-capacity memories with fast access times. The way out of this dilemma is to not rely on a single memory component or technology, but to employ a memory hierarchy. A typical hierarchy is illustrated in Figure 1.14. As one goes down the hierarchy, the following occur:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access to the memory by the processor

Thus, smaller, more expensive, faster memories are supplemented by larger, cheaper, slower memories. The key to the success of this organization is the decreasing frequency of access at lower levels. We will examine this concept in greater detail later in this chapter, when we discuss the cache, and when we discuss virtual memory later in this book. A brief explanation is provided at this point. Suppose that the processor has access to two levels of memory. Level 1 contains 1000 bytes and has an access time of 0.1 μs; level 2 contains 100,000 bytes and has an access time of 1 μs. Assume that if a byte to be accessed is in level 1, then the

Figure 1.14 The Memory Hierarchy: registers; inboard memory (cache, main memory); outboard storage (magnetic disk, CD-ROM, CD-RW, DVD-RW, DVD-RAM); off-line storage (magnetic tape).

processor accesses it directly. If it is in level 2, then the byte is first transferred to level 1 and then accessed by the processor. For simplicity, we ignore the time required for the processor to determine whether the byte is in level 1 or level 2. Figure 1.15 shows the general shape of the curve that models this situation. The figure shows the average access time to a two-level memory as a function of the hit ratio H, where H is defined as the fraction of all memory accesses that are found in the faster memory (e.g., the cache), T1 is the access time to level 1, and T2 is the access time to level 2.⁵ As can be seen, for high percentages of level 1 access, the average total access time is much closer to that of level 1 than that of level 2. In our example, suppose 95% of the memory accesses are found in the cache (H = 0.95). Then the average time to access a byte can be expressed as

(0.95)(0.1 μs) + (0.05)(0.1 μs + 1 μs) = 0.095 + 0.055 = 0.15 μs

⁵ If the accessed word is found in the faster memory, that is defined as a hit. A miss occurs if the accessed word is not found in the faster memory.
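The calculation above generalizes to a one-line formula: with hit ratio H, a hit costs T1 and a miss costs T1 + T2, since the byte is first transferred to level 1 and then accessed. A minimal sketch:

```python
# Average access time for a two-level memory: hits cost T1, misses
# cost T1 + T2 because the byte moves to level 1 before being accessed.

def avg_access_time(h, t1, t2):
    return h * t1 + (1 - h) * (t1 + t2)

# The example from the text: T1 = 0.1 us, T2 = 1 us, H = 0.95
assert abs(avg_access_time(0.95, 0.1, 1.0) - 0.15) < 1e-9
# With a perfect hit ratio, the hierarchy behaves like pure level 1.
assert abs(avg_access_time(1.0, 0.1, 1.0) - 0.1) < 1e-9
```

For high hit ratios the result stays close to T1, which is the whole point of the hierarchy.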

Figure 1.15 Performance of a Simple Two-Level Memory: average access time as a function of the fraction of accesses involving only level 1 (the hit ratio); the curve falls from T1 + T2 at H = 0 toward T1 at H = 1.

The result is close to the access time of the faster memory. So the strategy of using two memory levels works in principle, but only if conditions (a) through (d) in the preceding list apply. By employing a variety of technologies, a spectrum of memory systems exists that satisfies conditions (a) through (c). Fortunately, condition (d) is also generally valid. The basis for the validity of condition (d) is a principle known as locality of reference [DENN68]. During the course of execution of a program, memory references by the processor, for both instructions and data, tend to cluster. Programs typically contain a number of iterative loops and subroutines. Once a loop or subroutine is entered, there are repeated references to a small set of instructions. Similarly, operations on tables and arrays involve access to a clustered set of data bytes. Over a long period of time, the clusters in use change, but over a short period of time, the processor is primarily working with fixed clusters of memory references. Accordingly, it is possible to organize data across the hierarchy such that the percentage of accesses to each successively lower level is substantially less than that of the level above. Consider the two-level example already presented. Let level 2 memory contain all program instructions and data. The current clusters can be temporarily placed in level 1. From time to time, one of the clusters in level 1 will have to be swapped back to level 2 to make room for a new cluster coming in to level 1. On average, however, most references will be to instructions and data contained in level 1. This principle can be applied across more than two levels of memory. The fastest, smallest, and most expensive type of memory consists of the registers internal to the processor. Typically, a processor will contain a few dozen such registers, although some processors contain hundreds of registers. 
Skipping down two levels, main memory is the principal internal memory system of the computer. Each location in

main memory has a unique address, and most machine instructions refer to one or more main memory addresses. Main memory is usually extended with a higher-speed, smaller cache. The cache is not usually visible to the programmer or, indeed, to the processor. It is a device for staging the movement of data between main memory and processor registers to improve performance. The three forms of memory just described are, typically, volatile and employ semiconductor technology. The use of three levels exploits the fact that semiconductor memory comes in a variety of types, which differ in speed and cost. Data are stored more permanently on external mass storage devices, of which the most common are hard disk and removable media, such as removable disk, tape, and optical storage. External, nonvolatile memory is also referred to as secondary memory or auxiliary memory. These are used to store program and data files and are usually visible to the programmer only in terms of files and records, as opposed to individual bytes or words. A hard disk is also used to provide an extension to main memory known as virtual memory, which is discussed in Chapter 8. Additional levels can be effectively added to the hierarchy in software. For example, a portion of main memory can be used as a buffer to temporarily hold data that are to be read out to disk. Such a technique, sometimes referred to as a disk cache (examined in detail in Chapter 11), improves performance in two ways:

• Disk writes are clustered. Instead of many small transfers of data, we have a few large transfers of data. This improves disk performance and minimizes processor involvement.
• Some data destined for write-out may be referenced by a program before the next dump to disk. In that case, the data are retrieved rapidly from the software cache rather than slowly from the disk.

Appendix 1A examines the performance implications of multilevel memory structures.

1.6 CACHE MEMORY
Although cache memory is invisible to the OS, it interacts with other memory management hardware. Furthermore, many of the principles used in virtual memory schemes (discussed in Chapter 8) are also applied in cache memory.

Motivation
On all instruction cycles, the processor accesses memory at least once, to fetch the instruction, and often one or more additional times, to fetch operands and/or store results. The rate at which the processor can execute instructions is clearly limited by the memory cycle time (the time it takes to read one word from or write one word to memory). This limitation has been a significant problem because of the persistent mismatch between processor and main memory speeds: Over the years, processor speed has consistently increased more rapidly than memory access speed. We are faced with a tradeoff among speed, cost, and size. Ideally, main memory should be

Figure 1.16 Cache and Main Memory: byte or word transfers between the CPU and the cache; block transfers between the cache and main memory.

built with the same technology as that of the processor registers, giving memory cycle times comparable to processor cycle times. This has always been too expensive a strategy. The solution is to exploit the principle of locality by providing a small, fast memory between the processor and main memory, namely the cache.

Cache Principles
Cache memory is intended to provide memory access time approaching that of the fastest memories available and at the same time support a large memory size that has the price of less expensive types of semiconductor memories. The concept is illustrated in Figure 1.16. There is a relatively large and slow main memory together with a smaller, faster cache memory. The cache contains a copy of a portion of main memory. When the processor attempts to read a byte or word of memory, a check is made to determine if the byte or word is in the cache. If so, the byte or word is delivered to the processor. If not, a block of main memory, consisting of some fixed number of bytes, is read into the cache and then the byte or word is delivered to the processor. Because of the phenomenon of locality of reference, when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that many of the near-future memory references will be to other bytes in the block. Figure 1.17 depicts the structure of a cache/main memory system. Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address. For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. That is, there are M = 2^n/K blocks. The cache consists of C slots (also referred to as lines) of K words each, and the number of slots is considerably less than the number of main memory blocks (C << M).⁶ Some subset of the blocks of main memory resides in the slots of the cache. If a word in a block of memory that is not in the cache is read, that block is transferred to one of the slots of the cache. Because there are more blocks than slots, an individual slot cannot be uniquely and permanently dedicated to a particular block. Therefore, each slot includes a tag that identifies which particular block is currently being stored.
The tag is usually some number of higher-order bits of the address and refers to all addresses that begin with that sequence of bits. As a simple example, suppose that we have a 6-bit address and a 2-bit tag. The tag 01 refers to the block of locations with the following addresses: 010000, 010001, 010010, 010011, 010100, 010101, 010110, 010111, 011000, 011001, 011010, 011011, 011100, 011101, 011110, 011111.
⁶ The symbol << means much less than. Similarly, the symbol >> means much greater than.
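The 6-bit-address example can be checked mechanically. This tiny sketch is only an illustration of the example's field split (2 tag bits out of 6 address bits), not a general cache design:

```python
# Tag extraction for the text's example: 6-bit addresses, 2-bit tags.
# Every address sharing the same top two bits belongs to the same
# tagged group of 16 locations.

TAG_BITS, ADDR_BITS = 2, 6

def tag_of(addr):
    return addr >> (ADDR_BITS - TAG_BITS)  # keep the high-order bits

# All addresses of the form 01xxxx carry tag 0b01.
block_01 = [a for a in range(2 ** ADDR_BITS) if tag_of(a) == 0b01]
assert len(block_01) == 16
assert format(min(block_01), "06b") == "010000"
assert format(max(block_01), "06b") == "011111"
```

The 16 addresses recovered here are exactly those listed in the text, 010000 through 011111.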

Figure 1.17 Cache/Main-Memory Structure: (a) the cache, with C lines numbered 0 through C - 1, each holding a tag and a block of K words; (b) main memory, with 2^n words of memory addresses organized as fixed-length blocks of K words each.

Figure 1.18 illustrates the read operation. The processor generates the address, RA, of a word to be read. If the word is contained in the cache, it is delivered to the processor. Otherwise, the block containing that word is loaded into the cache and the word is delivered to the processor.
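The read flow of Figure 1.18 can be sketched as a toy cache model. This is a deliberate simplification: the model is fully associative and never evicts, so the mapping and replacement issues summarized under Cache Design do not arise, and all names are illustrative.

```python
# Hit/miss read path of Figure 1.18, greatly simplified: on a miss,
# the whole block containing the read address RA is loaded, then the
# requested word is delivered.

BLOCK_SIZE = 4  # words per block (illustrative choice)

class SimpleCache:
    def __init__(self, memory):
        self.memory = memory   # backing "main memory": a list of words
        self.slots = {}        # block number -> list of words in the block
        self.misses = 0

    def read(self, ra):
        block = ra // BLOCK_SIZE
        if block not in self.slots:          # miss: fetch the whole block
            self.misses += 1
            start = block * BLOCK_SIZE
            self.slots[block] = self.memory[start:start + BLOCK_SIZE]
        return self.slots[block][ra % BLOCK_SIZE]  # deliver the RA word

mem = list(range(100, 164))     # 64 words of "main memory"
cache = SimpleCache(mem)
assert cache.read(5) == 105     # miss: loads block 1 (words 4..7)
assert cache.read(6) == 106     # hit: same block, thanks to locality
assert cache.misses == 1
```

A single miss makes the neighboring words cheap to reach, which is locality of reference at work.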

Cache Design
A detailed discussion of cache design is beyond the scope of this book. Key elements are briefly summarized here. We will see that similar design issues must be addressed in dealing with virtual memory and disk cache design. They fall into the following categories:

• Cache size
• Block size
• Mapping function
• Replacement algorithm
• Write policy

Figure 1.18 Cache Read Operation: receive the read address RA from the CPU; if the block containing RA is in the cache, fetch the RA word and deliver it to the CPU; otherwise, access main memory for the block containing RA, allocate a cache slot for the block, load the block into the slot, and deliver the RA word to the CPU.

We have already dealt with the issue of cache size. It turns out that reasonably small caches can have a significant impact on performance. Another size issue is that of block size: the unit of data exchanged between cache and main memory. As the block size increases from very small to larger sizes, the hit ratio will at first increase because of the principle of locality: the high probability that data in the vicinity of a referenced word are likely to be referenced in the near future. As the block size increases, more useful data are brought into the cache. The hit ratio will begin to decrease, however, as the block becomes even bigger and the probability of using the newly fetched data becomes less than the probability of reusing the data that have to be moved out of the cache to make room for the new block. When a new block of data is read into the cache, the mapping function determines which cache location the block will occupy. Two constraints affect the design of the mapping function. First, when one block is read in, another may have to be replaced. We would like to do this in such a way as to minimize the probability that we will replace a block that will be needed in the near future. The more flexible the mapping function, the more scope we have to design a replacement algorithm to maximize the hit ratio. Second, the more flexible the mapping function, the more complex is the circuitry required to search the cache to determine if a given block is in the cache.

The replacement algorithm chooses, within the constraints of the mapping function, which block to replace when a new block is to be loaded into the cache and the cache already has all slots filled with other blocks. We would like to replace the block that is least likely to be needed again in the near future. Although it is impossible to identify such a block, a reasonably effective strategy is to replace the block that has been in the cache longest with no reference to it. This policy is referred to as the least-recently-used (LRU) algorithm. Hardware mechanisms are needed to identify the least-recently-used block. If the contents of a block in the cache are altered, then it is necessary to write it back to main memory before replacing it. The write policy dictates when the memory write operation takes place. At one extreme, the writing can occur every time that the block is updated. At the other extreme, the writing occurs only when the block is replaced. The latter policy minimizes memory write operations but leaves main memory in an obsolete state. This can interfere with multiple-processor operation and with direct memory access by I/O hardware modules.
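As a sketch of the LRU policy just described (not of any real cache hardware), an ordered mapping can track recency, evicting the block that has been in the cache longest with no reference to it:

```python
# Illustrative LRU replacement using an ordered mapping to record
# recency of use; the least-recently-used entry sits at the front.
from collections import OrderedDict

class LruCache:
    def __init__(self, slots):
        self.slots = slots
        self.lines = OrderedDict()   # block -> data, oldest first

    def access(self, block, data=None):
        if block in self.lines:
            self.lines.move_to_end(block)     # mark most recently used
            return self.lines[block]
        if len(self.lines) >= self.slots:     # cache full: evict the LRU
            self.lines.popitem(last=False)
        self.lines[block] = data
        return data

c = LruCache(slots=2)
c.access("A", 1)
c.access("B", 2)
c.access("A")        # A is now more recently used than B
c.access("C", 3)     # cache full: B, the least recently used, is evicted
assert list(c.lines) == ["A", "C"]
```

Real caches approximate this bookkeeping in hardware, since exact recency tracking is expensive for large associativities.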

1.7 I/O COMMUNICATION TECHNIQUES
Three techniques are possible for I/O operations:

• Programmed I/O
• Interrupt-driven I/O
• Direct memory access (DMA)

Programmed I/O
When the processor is executing a program and encounters an instruction relating to I/O, it executes that instruction by issuing a command to the appropriate I/O module. In the case of programmed I/O, the I/O module performs the requested action and then sets the appropriate bits in the I/O status register but takes no further action to alert the processor. In particular, it does not interrupt the processor. Thus, after the I/O instruction is invoked, the processor must take some active role in determining when the I/O instruction is completed. For this purpose, the processor periodically checks the status of the I/O module until it finds that the operation is complete. With this technique, the processor is responsible for extracting data from main memory for output and storing data in main memory for input. I/O software is written in such a way that the processor executes instructions that give it direct control of the I/O operation, including sensing device status, sending a read or write command, and transferring the data. Thus, the instruction set includes I/O instructions in the following categories:

• Control: Used to activate an external device and tell it what to do. For example, a magnetic-tape unit may be instructed to rewind or to move forward one record.
• Status: Used to test various status conditions associated with an I/O module and its peripherals.
• Transfer: Used to read and/or write data between processor registers and external devices.

Figure 1.19 Three Techniques for Input of a Block of Data: (a) programmed I/O (the CPU issues a read command to the I/O module, loops reading status until the word is ready or an error condition occurs, reads the word from the I/O module, writes it into memory, and repeats until done before moving to the next instruction); (b) interrupt-driven I/O (the CPU issues a read command, does something else, and is interrupted by the I/O module as each word becomes ready); (c) direct memory access (the CPU issues a read block command to the DMA module, does something else, and is interrupted by the DMA module only when the entire block transfer is complete).

Figure 1.19a gives an example of the use of programmed I/O to read in a block of data from an external device (e.g., a record from tape) into memory. Data are read in one word (e.g., 16 bits) at a time. For each word that is read in, the processor must remain in a status-checking loop until it determines that the word is available in the I/O module's data register. This flowchart highlights the main disadvantage of this technique: it is a time-consuming process that keeps the processor busy needlessly.
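The busy-wait loop of Figure 1.19a can be modeled in a few lines. The device below is entirely hypothetical: it simply reports ready after a fixed number of status polls, standing in for a slow peripheral.

```python
# Toy model of the programmed-I/O loop in Figure 1.19a. FakeDevice and
# its fields are illustrative stand-ins for an I/O module's registers.

class FakeDevice:
    def __init__(self, data, polls_until_ready=3):
        self.data = list(data)
        self.polls_until_ready = polls_until_ready
        self.countdown = 0

    def issue_read(self):
        self.countdown = self.polls_until_ready  # start a word transfer

    def status_ready(self):
        self.countdown -= 1
        return self.countdown <= 0

    def data_register(self):
        return self.data.pop(0)

def programmed_io_read(device, nwords):
    memory, wasted_polls = [], 0
    for _ in range(nwords):
        device.issue_read()
        while not device.status_ready():      # busy-wait: no useful work
            wasted_polls += 1
        memory.append(device.data_register()) # CPU moves each word itself
    return memory, wasted_polls

mem, wasted = programmed_io_read(FakeDevice([10, 20, 30]), 3)
assert mem == [10, 20, 30]
assert wasted > 0   # every single word cost the processor idle polls
```

Every word transferred costs the processor wasted status polls, which is precisely the disadvantage the flowchart highlights.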

Interrupt-Driven I/O
With programmed I/O, the processor has to wait a long time for the I/O module of concern to be ready for either reception or transmission of more data. The processor, while waiting, must repeatedly interrogate the status of the I/O module. As a result, the performance level of the entire system is severely degraded. An alternative is for the processor to issue an I/O command to a module and then go on to do some other useful work. The I/O module will then interrupt the processor to request service when it is ready to exchange data with the processor. The processor then executes the data transfer, as before, and then resumes its former processing. Let us consider how this works, first from the point of view of the I/O module. For input, the I/O module receives a READ command from the processor. The I/O module then proceeds to read data in from an associated peripheral. Once the data

are in the module's data register, the module signals an interrupt to the processor over a control line. The module then waits until its data are requested by the processor. When the request is made, the module places its data on the data bus and is then ready for another I/O operation. From the processor's point of view, the action for input is as follows. The processor issues a READ command. It then saves the context (e.g., the program counter and processor registers) of the current program and goes off and does something else (e.g., the processor may be working on several different programs at the same time). At the end of each instruction cycle, the processor checks for interrupts (Figure 1.7). When the interrupt from the I/O module occurs, the processor saves the context of the program it is currently executing and begins to execute an interrupt-handling program that processes the interrupt. In this case, the processor reads the word of data from the I/O module and stores it in memory. It then restores the context of the program that had issued the I/O command (or some other program) and resumes execution. Figure 1.19b shows the use of interrupt-driven I/O for reading in a block of data. Interrupt-driven I/O is more efficient than programmed I/O because it eliminates needless waiting. However, interrupt-driven I/O still consumes a lot of processor time, because every word of data that goes from memory to the I/O module or from the I/O module to memory must pass through the processor. Almost invariably, there will be multiple I/O modules in a computer system, so mechanisms are needed to enable the processor to determine which device caused the interrupt and to decide, in the case of multiple interrupts, which one to handle first. In some systems, there are multiple interrupt lines, so that each I/O module signals on a different line. Each line will have a different priority.
Alternatively, there can be a single interrupt line, but additional lines are used to hold a device address. Again, different devices are assigned different priorities.

Direct Memory Access
Interrupt-driven I/O, though more efficient than simple programmed I/O, still requires the active intervention of the processor to transfer data between memory and an I/O module, and any data transfer must traverse a path through the processor. Thus both of these forms of I/O suffer from two inherent drawbacks:

1. The I/O transfer rate is limited by the speed with which the processor can test and service a device.
2. The processor is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer.

When large volumes of data are to be moved, a more efficient technique is required: direct memory access (DMA). The DMA function can be performed by a separate module on the system bus or it can be incorporated into an I/O module. In either case, the technique works as follows. When the processor wishes to read or write a block of data, it issues a command to the DMA module, by sending to the DMA module the following information:

• Whether a read or write is requested
• The address of the I/O device involved


• The starting location in memory to read data from or write data to
• The number of words to be read or written

The processor then continues with other work. It has delegated this I/O operation to the DMA module, and that module will take care of it. The DMA module transfers the entire block of data, one word at a time, directly to or from memory without going through the processor. When the transfer is complete, the DMA module sends an interrupt signal to the processor. Thus the processor is involved only at the beginning and end of the transfer (Figure 1.19c).

The DMA module needs to take control of the bus to transfer data to and from memory. Because of this competition for bus usage, there may be times when the processor needs the bus and must wait for the DMA module. Note that this is not an interrupt; the processor does not save a context and do something else. Rather, the processor pauses for one bus cycle (the time it takes to transfer one word across the bus). The overall effect is to cause the processor to execute more slowly during a DMA transfer when processor access to the bus is required. Nevertheless, for a multiple-word I/O transfer, DMA is far more efficient than interrupt-driven or programmed I/O.

1.8 RECOMMENDED READING AND WEB SITES
[STAL06] covers the topics of this chapter in detail. In addition, there are many other texts on computer organization and architecture. Among the more worthwhile texts are the following. [PATT07] is a comprehensive survey; [HENN07], by the same authors, is a more advanced text that emphasizes quantitative aspects of design. [DENN05] looks at the history of the development and application of the locality principle, making for fascinating reading.

DENN05 Denning, P. "The Locality Principle." Communications of the ACM, July 2005.
HENN07 Hennessy, J., and Patterson, D. Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann, 2007.
PATT07 Patterson, D., and Hennessy, J. Computer Organization and Design: The Hardware/Software Interface. San Mateo, CA: Morgan Kaufmann, 2007.
STAL06 Stallings, W. Computer Organization and Architecture, 7th ed. Upper Saddle River, NJ: Prentice Hall, 2006.

Recommended Web sites:
• WWW Computer Architecture Home Page: A comprehensive index to information relevant to computer architecture researchers, including architecture groups and projects, technical organizations, literature, employment, and commercial information


• CPU Info Center: Information on specific processors, including technical papers, product information, and latest announcements

1.9 KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS
Key Terms
address register, cache memory, cache slot, central processing unit (CPU), condition code, data register, direct memory access (DMA), hit ratio, index register, input/output (I/O), instruction, instruction cycle, instruction register, interrupt, interrupt-driven I/O, I/O module, locality, main memory, multiprogramming, processor, program counter, programmed I/O, reentrant procedure, register, secondary memory, segment pointer, spatial locality, stack, stack frame, stack pointer, system bus, temporal locality

Review Questions
1.1  List and briefly define the four main elements of a computer.
1.2  Define the two main categories of processor registers.
1.3  In general terms, what are the four distinct actions that a machine instruction can specify?
1.4  What is an interrupt?
1.5  How are multiple interrupts dealt with?
1.6  What characteristics distinguish the various elements of a memory hierarchy?
1.7  What is cache memory?
1.8  List and briefly define three techniques for I/O operations.
1.9  What is the distinction between spatial locality and temporal locality?
1.10 In general, what are the strategies for exploiting spatial locality and temporal locality?

Problems
1.1  Suppose the hypothetical processor of Figure 1.3 also has two I/O instructions:
       0011  Load AC from I/O
       0111  Store AC to I/O
     In these cases, the 12-bit address identifies a particular external device. Show the program execution (using the format of Figure 1.4) for the following program:
       1. Load AC from device 5.
       2. Add contents of memory location 940.
       3. Store AC to device 6.
     Assume that the next value retrieved from device 5 is 3 and that location 940 contains a value of 2.

1.2  The program execution of Figure 1.4 is described in the text using six steps. Expand this description to show the use of the MAR and MBR.

1.3  Consider a hypothetical 32-bit microprocessor having 32-bit instructions composed of two fields. The first byte contains the opcode and the remainder an immediate operand or an operand address.
     a. What is the maximum directly addressable memory capacity (in bytes)?
     b. Discuss the impact on the system speed if the microprocessor bus has
        1. a 32-bit local address bus and a 16-bit local data bus, or
        2. a 16-bit local address bus and a 16-bit local data bus.
     c. How many bits are needed for the program counter and the instruction register?

1.4  Consider a hypothetical microprocessor generating a 16-bit address (for example, assume that the program counter and the address registers are 16 bits wide) and having a 16-bit data bus.
     a. What is the maximum memory address space that the processor can access directly if it is connected to a "16-bit memory"?
     b. What is the maximum memory address space that the processor can access directly if it is connected to an "8-bit memory"?
     c. What architectural features will allow this microprocessor to access a separate "I/O space"?
     d. If an input and an output instruction can specify an 8-bit I/O port number, how many 8-bit I/O ports can the microprocessor support? How many 16-bit I/O ports? Explain.

1.5  Consider a 32-bit microprocessor, with a 16-bit external data bus, driven by an 8-MHz input clock. Assume that this microprocessor has a bus cycle whose minimum duration equals four input clock cycles. What is the maximum data transfer rate across the bus that this microprocessor can sustain in bytes/s? To increase its performance, would it be better to make its external data bus 32 bits or to double the external clock frequency supplied to the microprocessor? State any other assumptions you make and explain. Hint: Determine the number of bytes that can be transferred per bus cycle.

1.6  Consider a computer system that contains an I/O module controlling a simple keyboard/printer Teletype. The following registers are contained in the CPU and connected directly to the system bus:
       INPR: Input Register, 8 bits
       OUTR: Output Register, 8 bits
       FGI:  Input Flag, 1 bit
       FGO:  Output Flag, 1 bit
       IEN:  Interrupt Enable, 1 bit
     Keystroke input from the Teletype and output to the printer are controlled by the I/O module. The Teletype is able to encode an alphanumeric symbol to an 8-bit word and decode an 8-bit word into an alphanumeric symbol. The Input flag is set when an 8-bit word enters the input register from the Teletype. The Output flag is set when a word is printed.
     a. Describe how the CPU, using the first four registers listed in this problem, can achieve I/O with the Teletype.
     b. Describe how the function can be performed more efficiently by also employing IEN.

1.7  In virtually all systems that include DMA modules, DMA access to main memory is given higher priority than processor access to main memory. Why?

1.8  A DMA module is transferring characters to main memory from an external device transmitting at 9600 bits per second (bps). The processor can fetch instructions at the rate of 1 million instructions per second. By how much will the processor be slowed down due to the DMA activity?

1.9  A computer consists of a CPU and an I/O device D connected to main memory M via a shared bus with a data bus width of one word. The CPU can execute a maximum of 10^6 instructions per second. An average instruction requires five processor cycles, three of which use the memory bus. A memory read or write operation uses one processor cycle. Suppose that the CPU is continuously executing "background" programs that require 95% of its instruction execution rate but not any I/O instructions. Assume that one processor cycle equals one bus cycle. Now suppose that very large blocks of data are to be transferred between M and D.
     a. If programmed I/O is used and each one-word I/O transfer requires the CPU to execute two instructions, estimate the maximum I/O data transfer rate, in words per second, possible through D.
     b. Estimate the same rate if DMA transfer is used.

1.10 Consider the following code:
       for (i = 0; i < 20; i++)
           for (j = 0; j < 10; j++)
               a[i] = a[i] * j;
     a. Give one example of the spatial locality in the code.
     b. Give one example of the temporal locality in the code.

1.11 Generalize Equations (1.1) and (1.2) in Appendix 1A to n-level memory hierarchies.

1.12 Consider a memory system with the following parameters:
       Tc = 100 ns     Cc = 0.01 cents/bit
       Tm = 1200 ns    Cm = 0.001 cents/bit
     a. What is the cost of 1 MByte of main memory?
     b. What is the cost of 1 MByte of main memory using cache memory technology?
     c. If the effective access time is 10% greater than the cache access time, what is the hit ratio H?

1.13 A computer has a cache, main memory, and a disk used for virtual memory. If a referenced word is in the cache, 20 ns are required to access it. If it is in main memory but not in the cache, 60 ns are needed to load it into the cache (this includes the time to originally check the cache), and then the reference is started again. If the word is not in main memory, 12 ms are required to fetch the word from disk, followed by 60 ns to copy it to the cache, and then the reference is started again. The cache hit ratio is 0.9 and the main-memory hit ratio is 0.6. What is the average time in ns required to access a referenced word on this system?

1.14 Suppose a stack is to be used by the processor to manage procedure calls and returns. Can the program counter be eliminated by using the top of the stack as a program counter?

APPENDIX 1A PERFORMANCE CHARACTERISTICS OF TWO-LEVEL MEMORIES
In this chapter, reference is made to a cache that acts as a buffer between main memory and processor, creating a two-level internal memory. This two-level architecture exploits a property known as locality to provide improved performance over a comparable one-level memory. The main memory cache mechanism is part of the computer architecture, implemented in hardware and typically invisible to the OS. Accordingly, this mechanism is not pursued in this book. However, there are two other instances of a two-level memory approach that also exploit the property of locality and that are, at least partially, implemented in the OS: virtual memory and the disk cache (Table 1.2). These two topics are explored in Chapters 8 and 11, respectively. In this appendix, we look at some of the performance characteristics of two-level memories that are common to all three approaches.


Table 1.2 Characteristics of Two-Level Memories

                                      Main Memory Cache                  Virtual Memory (Paging)                        Disk Cache
Typical access time ratios            5:1                                10^6:1                                         10^6:1
Memory management system              Implemented by special hardware    Combination of hardware and system software    System software
Typical block size                    4 to 128 bytes                     64 to 4096 bytes                               64 to 4096 bytes
Access of processor to second level   Direct access                      Indirect access                                Indirect access

Locality
The basis for the performance advantage of a two-level memory is the principle of locality, referred to in Section 1.5. This principle states that memory references tend to cluster. Over a long period of time, the clusters in use change, but over a short period of time, the processor is primarily working with fixed clusters of memory references.

Intuitively, the principle of locality makes sense. Consider the following line of reasoning:

1. Except for branch and call instructions, which constitute only a small fraction of all program instructions, program execution is sequential. Hence, in most cases, the next instruction to be fetched immediately follows the last instruction fetched.
2. It is rare to have a long uninterrupted sequence of procedure calls followed by the corresponding sequence of returns. Rather, a program remains confined to a rather narrow window of procedure-invocation depth. Thus, over a short period of time references to instructions tend to be localized to a few procedures.
3. Most iterative constructs consist of a relatively small number of instructions repeated many times. For the duration of the iteration, computation is therefore confined to a small contiguous portion of a program.
4. In many programs, much of the computation involves processing data structures, such as arrays or sequences of records. In many cases, successive references to these data structures will be to closely located data items.

This line of reasoning has been confirmed in many studies. With reference to point (1), a variety of studies have analyzed the behavior of high-level language programs. Table 1.3 includes key results, measuring the appearance of various statement types during execution, from the following studies. The earliest study of programming language behavior, performed by Knuth [KNUT71], examined a collection of FORTRAN programs used as student exercises.
Tanenbaum [TANE78] published measurements collected from over 300 procedures used in OS programs and written in a language that supports structured programming (SAL). Patterson and Sequin [PATT82] analyzed a set of measurements taken from compilers and programs for typesetting, computer-aided design (CAD), sorting, and file comparison. The programming languages C and Pascal were studied. Huck [HUCK83] analyzed four programs intended to represent a mix of general-purpose scientific computing, including fast Fourier transform and the integration of systems of differential equations.

Table 1.3 Relative Dynamic Frequency of High-Level Language Operations

Study:      [HUCK83]     [KNUT71]    [PATT82]            [TANE78]
Language:   Pascal       FORTRAN     Pascal     C        SAL
Workload:   Scientific   Student     System     System   System

Assign         74           67          45        38       42
Loop            4            3           5         3        4
Call            1            3          15        12       12
IF             20           11          29        43       36
GOTO            2            9          —          3       —
Other          —             7           6         1        6

There is good agreement in the results of this mixture of languages and applications that branching and call instructions represent only a fraction of statements executed during the lifetime of a program. Thus, these studies confirm assertion (1) from the preceding list.

With respect to assertion (2), studies reported in [PATT85] provide confirmation. This is illustrated in Figure 1.20, which shows call-return behavior. Each call is represented by the line moving down and to the right, and each return by the line moving up and to the right. In the figure, a window with depth equal to 5 is defined. Only a sequence of calls and returns with a net movement of 6 in either direction causes the window to move. As can be seen, the executing program can remain within a stationary window for long periods of time. A study by the same analysts of C and Pascal programs showed that a window of depth 8 would only need to shift on less than 1% of the calls or returns [TAMI83].

The principle of locality of reference continues to be validated in more recent studies. For example, Figure 1.21 illustrates the results of a study of Web page access patterns at a single site [BAEN97].
Figure 1.20 Example Call-Return Behavior of a Program (nesting depth versus time in units of calls/returns; window depth w = 5)

Figure 1.21 Locality of Reference for Web Pages (number of references versus cumulative number of documents)

A distinction is made in the literature between spatial locality and temporal locality. Spatial locality refers to the tendency of execution to involve a number of memory locations that are clustered. This reflects the tendency of a processor to access instructions sequentially. Spatial locality also reflects the tendency of a program to access data locations sequentially, such as when processing a table of data. Temporal locality refers to the tendency for a processor to access memory locations that have been used recently. For example, when an iteration loop is executed, the processor executes the same set of instructions repeatedly.

Traditionally, temporal locality is exploited by keeping recently used instruction and data values in cache memory and by exploiting a cache hierarchy. Spatial locality is generally exploited by using larger cache blocks and by incorporating prefetching mechanisms (fetching items whose use is expected) into the cache control logic. Recently, there has been considerable research on refining these techniques to achieve greater performance, but the basic strategies remain the same.

Operation of Two-Level Memory
The locality property can be exploited in the formation of a two-level memory. The upper level memory (M1) is smaller, faster, and more expensive (per bit) than the lower level memory (M2). M1 is used as a temporary store for part of the contents of the larger M2. When a memory reference is made, an attempt is made to access the item in M1. If this succeeds, then a quick access is made. If not, then a block of memory locations is copied from M2 to M1 and the access then takes place via M1. Because of locality, once a block is brought into M1, there should be a number of accesses to locations in that block, resulting in fast overall service. To express the average time to access an item, we must consider not only the speeds of the two levels of memory but also the probability that a given reference can be found in M1. We have
    Ts = H T1 + (1 - H)(T1 + T2)
       = T1 + (1 - H) T2                      (1.1)


where
    Ts = average (system) access time
    T1 = access time of M1 (e.g., cache, disk cache)
    T2 = access time of M2 (e.g., main memory, disk)
    H  = hit ratio (fraction of time reference is found in M1)

Figure 1.15 shows average access time as a function of hit ratio. As can be seen, for a high percentage of hits, the average total access time is much closer to that of M1 than M2.

Performance
Let us look at some of the parameters relevant to an assessment of a two-level memory mechanism. First consider cost. We have

    Cs = (C1 S1 + C2 S2) / (S1 + S2)          (1.2)

where
    Cs = average cost per bit for the combined two-level memory
    C1 = average cost per bit of upper-level memory M1
    C2 = average cost per bit of lower-level memory M2
    S1 = size of M1
    S2 = size of M2

We would like Cs ≈ C2. Given that C1 >> C2, this requires S1 << S2. Figure 1.22 shows the relationship.

Next, consider access time. For a two-level memory to provide a significant performance improvement, we need to have Ts approximately equal to T1 (Ts ≈ T1). Given that T1 is much less than T2 (T1 << T2), a hit ratio of close to 1 is needed.

So we would like M1 to be small to hold down cost, and large to improve the hit ratio and therefore the performance. Is there a size of M1 that satisfies both requirements to a reasonable extent? We can answer this question with a series of subquestions:

• What value of hit ratio is needed to satisfy the performance requirement?
• What size of M1 will assure the needed hit ratio?
• Does this size satisfy the cost requirement?

To get at this, consider the quantity T1/Ts, which is referred to as the access efficiency. It is a measure of how close average access time (Ts) is to M1 access time (T1). From Equation (1.1),

    T1/Ts = 1 / (1 + (1 - H)(T2/T1))          (1.3)

7. Note that both axes use a log scale. A basic review of log scales is in the math refresher document at the Computer Science Student Resource Site at WilliamStallings.com/StudentSupport.html.

Figure 1.22 Relationship of Average Memory Cost to Relative Memory Size for a Two-Level Memory (log-log plot of relative combined cost Cs/C2 versus relative size S2/S1, for several values of C1/C2)

In Figure 1.23, we plot T1/Ts as a function of the hit ratio H, with the quantity T2/T1 as a parameter. A hit ratio in the range of 0.8 to 0.9 would seem to be needed to satisfy the performance requirement.

We can now phrase the question about relative memory size more exactly. Is a hit ratio of 0.8 or better reasonable for S1 << S2? This will depend on a number of factors, including the nature of the software being executed and the details of the design of the two-level memory. The main determinant is, of course, the degree of locality. Figure 1.24 suggests the effect of locality on the hit ratio. Clearly, if M1 is the same size as M2, then the hit ratio will be 1.0: All of the items in M2 are always stored also in M1. Now suppose that there is no locality; that is, references are completely random. In that case the hit ratio should be a strictly linear function of the relative memory size. For example, if M1 is half the size of M2, then at any time half of the items from M2 are also in M1 and the hit ratio will be 0.5. In practice, however, there is some degree of locality in the references. The effects of moderate and strong locality are indicated in the figure.

So if there is strong locality, it is possible to achieve high values of hit ratio even with relatively small upper-level memory size. For example, numerous studies have shown that rather small cache sizes will yield a hit ratio above 0.75 regardless of the size of main memory (e.g., [AGAR89], [PRZY88], [STRE83], and [SMIT82]). A cache in the range of 1K to 128K words is generally adequate, whereas main memory is now typically in the gigabyte range. When we consider virtual memory and disk cache, we will cite other studies that confirm the same phenomenon, namely that a relatively small M1 yields a high value of hit ratio because of locality.

Figure 1.23 Access Efficiency as a Function of Hit Ratio (access efficiency T1/Ts versus hit ratio H, for r = T2/T1 = 1, 10, 100, and 1000)

Figure 1.24 Hit Ratio as a Function of Relative Memory Size (hit ratio versus relative memory size S1/S2, for no, moderate, and strong locality)


This brings us to the last question listed earlier: Does the relative size of the two memories satisfy the cost requirement? The answer is clearly yes. If we need only a relatively small upper-level memory to achieve good performance, then the average cost per bit of the two levels of memory will approach that of the cheaper lower-level memory.

APPENDIX 1B PROCEDURE CONTROL
A common technique for controlling the execution of procedure calls and returns makes use of a stack. This appendix summarizes the basic properties of stacks and looks at their use in procedure control.

Stack Implementation
A stack is an ordered set of elements, only one of which (the most recently added) can be accessed at a time. The point of access is called the top of the stack. The number of elements in the stack, or length of the stack, is variable. Items may only be added to or deleted from the top of the stack. For this reason, a stack is also known as a pushdown list or a last-in-first-out (LIFO) list. The implementation of a stack requires that there be some set of locations used to store the stack elements. A typical approach is illustrated in Figure 1.25. A contiguous block of locations is reserved in main memory (or virtual memory) for the stack. Most of the time, the block is partially filled with stack elements and the
Figure 1.25 Typical Stack Organization: (a) all of stack in memory; (b) two top elements in registers (CPU registers hold the stack limit, stack pointer, and stack base; a block reserved in main memory holds the in-use and free portions of the stack)


remainder is available for stack growth. Three addresses are needed for proper operation, and these are often stored in processor registers:

• Stack pointer: Contains the address of the current top of the stack. If an item is appended to (PUSH) or deleted from (POP) the stack, the pointer is decremented or incremented to contain the address of the new top of the stack.
• Stack base: Contains the address of the bottom location in the reserved block. This is the first location to be used when an item is added to an empty stack. If an attempt is made to POP an element when the stack is empty, an error is reported.
• Stack limit: Contains the address of the other end, or top, of the reserved block. If an attempt is made to PUSH an element when the stack is full, an error is reported.

Traditionally, and on most processors today, the base of the stack is at the high-address end of the reserved stack block, and the limit is at the low-address end. Thus, the stack grows from higher addresses to lower addresses.

Procedure Calls and Returns
A common technique for managing procedure calls and returns makes use of a stack. When the processor executes a call, it places (pushes) the return address on the stack. When it executes a return, it uses the address on top of the stack and removes (pops) that address from the stack. For the nested procedures of Figure 1.26, Figure 1.27 illustrates the use of a stack.
Figure 1.26 Nested Procedures: (a) calls and returns; (b) execution sequence (a main program beginning at address 4000 calls Proc1 at 4500, which twice calls Proc2 at 4800)


Figure 1.27 Use of Stack to Implement Nested Procedures of Figure 1.26 (stack contents: (a) initially; (b) after CALL Proc1; (c) after initial CALL Proc2; (d) after RETURN; (e) after second CALL Proc2; (f) after RETURN; (g) after RETURN)

It is also often necessary to pass parameters with a procedure call. These could be passed in registers. Another possibility is to store the parameters in memory just after the Call instruction. In this case, the return must be to the location following the parameters. Both of these approaches have drawbacks. If registers are used, the called program and the calling program must be written to assure that the registers are used properly. The storing of parameters in memory makes it difficult to exchange a variable number of parameters. A more flexible approach to parameter passing is the stack. When the processor executes a call, it not only stacks the return address, it stacks parameters to be passed to the called procedure. The called procedure can access the parameters from the stack. Upon return, return parameters can also be placed on the stack, under the return address. The entire set of parameters, including return address, that is stored for a procedure invocation is referred to as a stack frame. An example is provided in Figure 1.28. The example refers to procedure P in which the local variables x1 and x2 are declared, and procedure Q, which can be

Figure 1.28 Stack Frame Growth Using Sample Procedures P and Q: (a) P is active; (b) P has called Q (each frame holds the previous frame pointer, the return address, and local variables x1, x2 or y1, y2)


called by P and in which the local variables y1 and y2 are declared. The first item stored in each stack frame is a pointer to the beginning of the previous frame. This is needed if the number or length of parameters to be stacked is variable. Next is stored the return point for the procedure that corresponds to this stack frame. Finally, space is allocated at the top of the stack frame for local variables. These local variables can be used for parameter passing. For example, suppose that when P calls Q, it passes one parameter value. This value could be stored in variable y1. Thus, in a high-level language, there would be an instruction in the P routine that looks like this:

    CALL Q(y1)

When this call is executed, a new stack frame is created for Q (Figure 1.28b), which includes a pointer to the stack frame for P, the return address to P, and two local variables for Q, one of which is initialized to the passed parameter value from P. The other local variable, y2, is simply a local variable used by Q in its calculations. The need to include such local variables in the stack frame is discussed in the next subsection.

Reentrant Procedures
A useful concept, particularly in a system that supports multiple users at the same time, is that of the reentrant procedure. A reentrant procedure is one in which a single copy of the program code can be shared by multiple users during the same period of time. Reentrancy has two key aspects: The program code cannot modify itself and the local data for each user must be stored separately. A reentrant procedure can be interrupted and called by an interrupting program and still execute correctly upon return to the procedure. In a shared system, reentrancy allows more efficient use of main memory: One copy of the program code is kept in main memory, but more than one application can call the procedure. Thus, a reentrant procedure must have a permanent part (the instructions that make up the procedure) and a temporary part (a pointer back to the calling program as well as memory for local variables used by the program). Each execution instance, called activation, of a procedure will execute the code in the permanent part but must have its own copy of local variables and parameters. The temporary part associated with a particular activation is referred to as an activation record. The most convenient way to support reentrant procedures is by means of a stack. When a reentrant procedure is called, the activation record of the procedure can be stored on the stack. Thus, the activation record becomes part of the stack frame that is created on procedure call.

M02_STAL6329_06_SE_C02.QXD 2/28/08 3:33 AM Page 50

CHAPTER 2
OPERATING SYSTEM OVERVIEW
2.1 Operating System Objectives and Functions
    The Operating System as a User/Computer Interface
    The Operating System as Resource Manager
    Ease of Evolution of an Operating System
2.2 The Evolution of Operating Systems
    Serial Processing
    Simple Batch Systems
    Multiprogrammed Batch Systems
    Time-Sharing Systems
2.3 Major Achievements
    The Process
    Memory Management
    Information Protection and Security
    Scheduling and Resource Management
    System Structure
2.4 Developments Leading to Modern Operating Systems
2.5 Microsoft Windows Overview
    History
    Single-User Multitasking
    Architecture
    Client/Server Model
    Threads and SMP
    Windows Objects
2.6 Traditional UNIX Systems
    History
    Description
2.7 Modern UNIX Systems
    System V Release 4 (SVR4)
    BSD
    Solaris 10
2.8 Linux
    History
    Modular Structure
    Kernel Components
2.9 Recommended Reading and Web Sites
2.10 Key Terms, Review Questions, and Problems

We begin our study of operating systems (OSs) by examining their objectives and functions. Then we look at how operating systems have evolved from primitive batch systems to sophisticated multitasking, multiuser systems; this history is interesting in itself and also serves to provide an overview of OS principles. The remainder of the chapter looks at the history and general characteristics of the two operating systems that serve as examples throughout this book. All of the material in this chapter is covered in greater depth later in the book.

2.1 OPERATING SYSTEM OBJECTIVES AND FUNCTIONS
An OS is a program that controls the execution of application programs and acts as an interface between applications and the computer hardware. It can be thought of as having three objectives:

• Convenience: An OS makes a computer more convenient to use.
• Efficiency: An OS allows the computer system resources to be used in an efficient manner.
• Ability to evolve: An OS should be constructed in such a way as to permit the effective development, testing, and introduction of new system functions without interfering with service.

Let us examine these three aspects of an OS in turn.

The Operating System as a User/Computer Interface
The hardware and software used in providing applications to a user can be viewed in a layered or hierarchical fashion, as depicted in Figure 2.1. The user of those applications, the end user, generally is not concerned with the details of computer hardware. Thus, the end user views a computer system in terms of a set of applications. An application can be expressed in a programming language and is developed by an application programmer.

If one were to develop an application program as a set of machine instructions that is completely responsible for controlling the computer hardware, one would be faced with an overwhelmingly complex undertaking. To ease this chore, a set of system programs is provided. Some of these programs are referred to as utilities. These implement frequently used functions that assist in program creation, the management of files, and the control of I/O devices. A programmer will make use of these facilities in developing an application, and the application, while it is running, will invoke the utilities to perform certain functions.

The most important collection of system programs comprises the OS. The OS masks the details of the hardware from the programmer and provides the programmer with a convenient interface for using the system. It acts as mediator, making it easier for the programmer and for application programs to access and use those facilities and services. Briefly, the OS typically provides services in the following areas:

• Program development: The OS provides a variety of facilities and services, such as editors and debuggers, to assist the programmer in creating programs. Typically, these services are in the form of utility programs that, while not

Figure 2.1 Layers and Views of a Computer System (layers, top to bottom: application programs, utilities, operating system, computer hardware; viewed respectively by the end user, the programmer, and the operating system designer)

strictly part of the core of the OS, are supplied with the OS and are referred to as application program development tools.

• Program execution: A number of steps need to be performed to execute a program. Instructions and data must be loaded into main memory, I/O devices and files must be initialized, and other resources must be prepared. The OS handles these scheduling duties for the user.

• Access to I/O devices: Each I/O device requires its own peculiar set of instructions or control signals for operation. The OS provides a uniform interface that hides these details so that programmers can access such devices using simple reads and writes.

• Controlled access to files: For file access, the OS must reflect a detailed understanding of not only the nature of the I/O device (disk drive, tape drive) but also the structure of the data contained in the files on the storage medium. In the case of a system with multiple users, the OS may provide protection mechanisms to control access to the files.

• System access: For shared or public systems, the OS controls access to the system as a whole and to specific system resources. The access function must provide protection of resources and data from unauthorized users and must resolve conflicts for resource contention.

• Error detection and response: A variety of errors can occur while a computer system is running. These include internal and external hardware errors, such as a memory error, or a device failure or malfunction; and various software errors, such as division by zero, an attempt to access a forbidden memory location,

and the inability of the OS to grant the request of an application. In each case, the OS must provide a response that clears the error condition with the least impact on running applications. The response may range from ending the program that caused the error, to retrying the operation, to simply reporting the error to the application.

• Accounting: A good OS will collect usage statistics for various resources and monitor performance parameters such as response time. On any system, this information is useful in anticipating the need for future enhancements and in tuning the system to improve performance. On a multiuser system, the information can be used for billing purposes.

The Operating System as Resource Manager
A computer is a set of resources for the movement, storage, and processing of data and for the control of these functions. The OS is responsible for managing these resources.

Can we say that it is the OS that controls the movement, storage, and processing of data? From one point of view, the answer is yes: By managing the computer's resources, the OS is in control of the computer's basic functions. But this control is exercised in a curious way. Normally, we think of a control mechanism as something external to that which is controlled, or at least as something that is a distinct and separate part of that which is controlled. (For example, a residential heating system is controlled by a thermostat, which is separate from the heat-generation and heat-distribution apparatus.) This is not the case with the OS, which as a control mechanism is unusual in two respects:

• The OS functions in the same way as ordinary computer software; that is, it is a program or suite of programs executed by the processor.
• The OS frequently relinquishes control and must depend on the processor to allow it to regain control.

Like other computer programs, the OS provides instructions for the processor. The key difference is in the intent of the program. The OS directs the processor in the use of the other system resources and in the timing of its execution of other programs. But in order for the processor to do any of these things, it must cease executing the OS program and execute other programs. Thus, the OS relinquishes control for the processor to do some "useful" work and then resumes control long enough to prepare the processor to do the next piece of work. The mechanisms involved in all this should become clear as the chapter proceeds.

Figure 2.2 suggests the main resources that are managed by the OS. A portion of the OS is in main memory.
This includes the kernel, or nucleus, which contains the most frequently used functions in the OS and, at a given time, other portions of the OS currently in use. The remainder of main memory contains user programs and data. The allocation of this resource (main memory) is controlled jointly by the OS and memory management hardware in the processor, as we shall see. The OS decides when an I/O device can be used by a program in execution and controls access to and use of files. The processor itself is a resource, and the OS must determine how much processor time is to be devoted to the execution of a particular user program. In the case of a multiple-processor system, this decision must span all of the processors.

Figure 2.2 The Operating System as Resource Manager (the computer system comprises memory, holding the operating system software plus programs and data; I/O controllers attached to I/O devices such as printers, keyboards, and digital cameras; and the processor(s) plus storage holding the OS, programs, and data)

Ease of Evolution of an Operating System
A major operating system will evolve over time for a number of reasons:

• Hardware upgrades plus new types of hardware: For example, early versions of UNIX and the Macintosh operating system did not employ a paging mechanism because they were run on processors without paging hardware.1 Subsequent versions of these operating systems were modified to exploit paging capabilities. Also, the use of graphics terminals and page-mode terminals instead of line-at-a-time scroll mode terminals affects OS design. For example, a graphics terminal typically allows the user to view several applications at the same time through "windows" on the screen. This requires more sophisticated support in the OS.

• New services: In response to user demand or in response to the needs of system managers, the OS expands to offer new services. For example, if it is found to be difficult to maintain good performance for users with existing tools, new measurement and control tools may be added to the OS.

• Fixes: Any OS has faults. These are discovered over the course of time and fixes are made. Of course, the fix may introduce new faults.
1 Paging is introduced briefly later in this chapter and is discussed in detail in Chapter 7.

The need to change an OS regularly places certain requirements on its design. An obvious statement is that the system should be modular in construction, with clearly defined interfaces between the modules, and that it should be well documented. For large programs, such as the typical contemporary OS, what might be referred to as straightforward modularization is inadequate [DENN80a]. That is, much more must be done than simply partitioning a program into modules. We return to this topic later in this chapter.

2.2 THE EVOLUTION OF OPERATING SYSTEMS
In attempting to understand the key requirements for an OS and the significance of the major features of a contemporary OS, it is useful to consider how operating systems have evolved over the years.

Serial Processing
With the earliest computers, from the late 1940s to the mid-1950s, the programmer interacted directly with the computer hardware; there was no OS. These computers were run from a console consisting of display lights, toggle switches, some form of input device, and a printer. Programs in machine code were loaded via the input device (e.g., a card reader). If an error halted the program, the error condition was indicated by the lights. If the program proceeded to a normal completion, the output appeared on the printer.

These early systems presented two main problems:

• Scheduling: Most installations used a hardcopy sign-up sheet to reserve computer time. Typically, a user could sign up for a block of time in multiples of a half hour or so. A user might sign up for an hour and finish in 45 minutes; this would result in wasted computer processing time. On the other hand, the user might run into problems, not finish in the allotted time, and be forced to stop before resolving the problem.

• Setup time: A single program, called a job, could involve loading the compiler plus the high-level language program (source program) into memory, saving the compiled program (object program), and then loading and linking together the object program and common functions. Each of these steps could involve mounting or dismounting tapes or setting up card decks. If an error occurred, the hapless user typically had to go back to the beginning of the setup sequence. Thus, a considerable amount of time was spent just in setting up the program to run.

This mode of operation could be termed serial processing, reflecting the fact that users have access to the computer in series. Over time, various system software tools were developed to attempt to make serial processing more efficient. These include libraries of common functions, linkers, loaders, debuggers, and I/O driver routines that were available as common software for all users.

Simple Batch Systems
Early computers were very expensive, and therefore it was important to maximize processor utilization. The wasted time due to scheduling and setup time was unacceptable.

To improve utilization, the concept of a batch operating system was developed. It appears that the first batch operating system (and the first OS of any kind) was developed in the mid-1950s by General Motors for use on an IBM 701 [WEIZ81]. The concept was subsequently refined and implemented on the IBM 704 by a number of IBM customers. By the early 1960s, a number of vendors had developed batch operating systems for their computer systems. IBSYS, the IBM operating system for the 7090/7094 computers, is particularly notable because of its widespread influence on other systems.

The central idea behind the simple batch-processing scheme is the use of a piece of software known as the monitor. With this type of OS, the user no longer has direct access to the processor. Instead, the user submits the job on cards or tape to a computer operator, who batches the jobs together sequentially and places the entire batch on an input device, for use by the monitor. Each program is constructed to branch back to the monitor when it completes processing, at which point the monitor automatically begins loading the next program.

To understand how this scheme works, let us look at it from two points of view: that of the monitor and that of the processor.

• Monitor point of view: The monitor controls the sequence of events. For this to be so, much of the monitor must always be in main memory and available for execution (Figure 2.3). That portion is referred to as the resident monitor. The rest of the monitor consists of utilities and common functions that are loaded as subroutines to the user program at the beginning of any job that requires them. The monitor reads in jobs one at a time from the input device (typically a card reader or magnetic tape drive). As it is read in, the current job is placed in the user program area, and control is passed to this job. When the job is completed, it returns control to the monitor, which immediately reads in
Figure 2.3 Memory Layout for a Resident Monitor (the monitor region holds interrupt processing, device drivers, job sequencing, and the control language interpreter; below the boundary lies the user program area)

the next job. The results of each job are sent to an output device, such as a printer, for delivery to the user.

• Processor point of view: At a certain point, the processor is executing instructions from the portion of main memory containing the monitor. These instructions cause the next job to be read into another portion of main memory. Once a job has been read in, the processor will encounter a branch instruction in the monitor that instructs the processor to continue execution at the start of the user program. The processor will then execute the instructions in the user program until it encounters an ending or error condition. Either event causes the processor to fetch its next instruction from the monitor program. Thus the phrase "control is passed to a job" simply means that the processor is now fetching and executing instructions in a user program, and "control is returned to the monitor" means that the processor is now fetching and executing instructions from the monitor program.

The monitor performs a scheduling function: A batch of jobs is queued up, and jobs are executed as rapidly as possible, with no intervening idle time. The monitor improves job setup time as well. With each job, instructions are included in a primitive form of job control language (JCL). This is a special type of programming language used to provide instructions to the monitor. A simple example is that of a user submitting a program written in the programming language FORTRAN plus some data to be used by the program. All FORTRAN instructions and data are on a separate punched card or a separate record on tape. In addition to FORTRAN and data lines, the job includes job control instructions, which are denoted by the beginning $. The overall format of the job looks like this:

$JOB
$FTN
...            (FORTRAN instructions)
$LOAD
$RUN
...            (Data)
$END

To execute this job, the monitor reads the $FTN line and loads the appropriate language compiler from its mass storage (usually tape). The compiler translates the user’s program into object code, which is stored in memory or mass storage. If it is stored in memory, the operation is referred to as “compile, load, and go.” If it is stored on tape, then the $LOAD instruction is required. This instruction is read by the monitor, which regains control after the compile operation. The monitor invokes the loader, which loads the object program into memory (in place of the compiler)

and transfers control to it. In this manner, a large segment of main memory can be shared among different subsystems, although only one such subsystem could be executing at a time.

During the execution of the user program, any input instruction causes one line of data to be read. The input instruction in the user program causes an input routine that is part of the OS to be invoked. The input routine checks to make sure that the program does not accidentally read in a JCL line. If this happens, an error occurs and control transfers to the monitor. At the completion of the user job, the monitor will scan the input lines until it encounters the next JCL instruction. Thus, the system is protected against a program with too many or too few data lines.

The monitor, or batch operating system, is simply a computer program. It relies on the ability of the processor to fetch instructions from various portions of main memory to alternately seize and relinquish control. Certain other hardware features are also desirable:

• Memory protection: While the user program is executing, it must not alter the memory area containing the monitor. If such an attempt is made, the processor hardware should detect an error and transfer control to the monitor. The monitor would then abort the job, print out an error message, and load in the next job.

• Timer: A timer is used to prevent a single job from monopolizing the system. The timer is set at the beginning of each job. If the timer expires, the user program is stopped, and control returns to the monitor.

• Privileged instructions: Certain machine-level instructions are designated privileged and can be executed only by the monitor. If the processor encounters such an instruction while executing a user program, an error occurs, causing control to be transferred to the monitor. Among the privileged instructions are I/O instructions, so that the monitor retains control of all I/O devices. This prevents, for example, a user program from accidentally reading job control instructions from the next job. If a user program wishes to perform I/O, it must request that the monitor perform the operation for it.

• Interrupts: Early computer models did not have this capability. This feature gives the OS more flexibility in relinquishing control to and regaining control from user programs.

Considerations of memory protection and privileged instructions lead to the concept of modes of operation. A user program executes in a user mode, in which certain areas of memory are protected from the user's use and in which certain instructions may not be executed. The monitor executes in a system mode, or what has come to be called kernel mode, in which privileged instructions may be executed and in which protected areas of memory may be accessed.

Of course, an OS can be built without these features. But computer vendors quickly learned that the results were chaos, and so even relatively primitive batch operating systems were provided with these hardware features.

With a batch operating system, processor time alternates between execution of user programs and execution of the monitor. There have been two sacrifices: Some main memory is now given over to the monitor and some processor time is consumed by the monitor. Both of these are forms of overhead. Despite this overhead, the simple batch system improves utilization of the computer.

Read one record from file    15 ms
Execute 100 instructions      1 ms
Write one record to file     15 ms
Total                        31 ms

Percent CPU utilization = 1/31 = 0.032 = 3.2%

Figure 2.4 System Utilization Example

Multiprogrammed Batch Systems
Even with the automatic job sequencing provided by a simple batch operating system, the processor is often idle. The problem is that I/O devices are slow compared to the processor. Figure 2.4 details a representative calculation. The calculation concerns a program that processes a file of records and performs, on average, 100 machine instructions per record. In this example the computer spends over 96% of its time waiting for I/O devices to finish transferring data to and from the file. Figure 2.5a illustrates this situation, where we have a single program, referred to as uniprogramming. The processor spends a certain amount of time executing, until it reaches an I/O instruction. It must then wait until that I/O instruction concludes before proceeding.

Figure 2.5 Multiprogramming Example ((a) uniprogramming: program A alternates run and wait periods, with the processor idle during every wait; (b) multiprogramming with two programs: program B runs during A's wait periods; (c) multiprogramming with three programs: programs B and C run during A's wait periods, further reducing idle time)

This inefficiency is not necessary. We know that there must be enough memory to hold the OS (resident monitor) and one user program. Suppose that there is room for the OS and two user programs. When one job needs to wait for I/O, the processor can switch to the other job, which is likely not waiting for I/O (Figure 2.5b). Furthermore, we might expand memory to hold three, four, or more programs and switch among all of them (Figure 2.5c). The approach is known as multiprogramming, or multitasking. It is the central theme of modern operating systems.

To illustrate the benefit of multiprogramming, we give a simple example. Consider a computer with 250 Mbytes of available memory (not used by the OS), a disk, a terminal, and a printer. Three programs, JOB1, JOB2, and JOB3, are submitted for execution at the same time, with the attributes listed in Table 2.1. We assume minimal processor requirements for JOB2 and JOB3 and continuous disk and printer use by JOB3.

Table 2.1 Sample Program Execution Attributes

                   JOB1            JOB2         JOB3
Type of job        Heavy compute   Heavy I/O    Heavy I/O
Duration           5 min           15 min       10 min
Memory required    50 M            100 M        75 M
Need disk?         No              No           Yes
Need terminal?     No              Yes          No
Need printer?      No              No           Yes

For a simple batch environment, these jobs will be executed in sequence. Thus, JOB1 completes in 5 minutes. JOB2 must wait until the 5 minutes are over and then completes 15 minutes after that. JOB3 begins after 20 minutes and completes at 30 minutes from the time it was initially submitted. The average resource utilization, throughput, and response times are shown in the uniprogramming column of Table 2.2. Device-by-device utilization is illustrated in Figure 2.6a. It is evident that there is gross underutilization for all resources when averaged over the required 30-minute time period.

Table 2.2 Effects of Multiprogramming on Resource Utilization

                     Uniprogramming   Multiprogramming
Processor use        20%              40%
Memory use           33%              67%
Disk use             33%              67%
Printer use          33%              67%
Elapsed time         30 min           15 min
Throughput           6 jobs/hr        12 jobs/hr
Mean response time   18 min           10 min

Figure 2.6 Utilization Histograms ((a) uniprogramming: CPU, memory, disk, terminal, and printer utilization and the job history over the 30 minutes in which JOB1, JOB2, and JOB3 run in sequence; (b) multiprogramming: the same histograms over the 15 minutes in which the three jobs run concurrently)


Now suppose that the jobs are run concurrently under a multiprogramming operating system. Because there is little resource contention between the jobs, all three can run in nearly minimum time while coexisting with the others in the computer (assuming that JOB2 and JOB3 are allotted enough processor time to keep their input and output operations active). JOB1 will still require 5 minutes to complete, but at the end of that time, JOB2 will be one-third finished and JOB3 half finished. All three jobs will have finished within 15 minutes. The improvement is evident when examining the multiprogramming column of Table 2.2, obtained from the histogram shown in Figure 2.6b.

As with a simple batch system, a multiprogramming batch system must rely on certain computer hardware features. The most notable additional feature that is useful for multiprogramming is the hardware that supports I/O interrupts and DMA (direct memory access). With interrupt-driven I/O or DMA, the processor can issue an I/O command for one job and proceed with the execution of another job while the I/O is carried out by the device controller. When the I/O operation is complete, the processor is interrupted and control is passed to an interrupt-handling program in the OS. The OS will then pass control to another job.

Multiprogramming operating systems are fairly sophisticated compared to single-program, or uniprogramming, systems. To have several jobs ready to run, they must be kept in main memory, requiring some form of memory management. In addition, if several jobs are ready to run, the processor must decide which one to run; this decision requires an algorithm for scheduling. These concepts are discussed later in this chapter.

Time-Sharing Systems
With the use of multiprogramming, batch processing can be quite efficient. However, for many jobs, it is desirable to provide a mode in which the user interacts directly with the computer. Indeed, for some jobs, such as transaction processing, an interactive mode is essential. Today, the requirement for an interactive computing facility can be, and often is, met by the use of a dedicated personal computer or workstation. That option was not available in the 1960s, when most computers were big and costly. Instead, time sharing was developed.

Just as multiprogramming allows the processor to handle multiple batch jobs at a time, multiprogramming can also be used to handle multiple interactive jobs. In this latter case, the technique is referred to as time sharing, because processor time is shared among multiple users. In a time-sharing system, multiple users simultaneously access the system through terminals, with the OS interleaving the execution of each user program in a short burst or quantum of computation. Thus, if there are n users actively requesting service at one time, each user will only see on the average 1/n of the effective computer capacity, not counting OS overhead. However, given the relatively slow human reaction time, the response time on a properly designed system should be similar to that on a dedicated computer.

Both batch processing and time sharing use multiprogramming. The key differences are listed in Table 2.3.

Table 2.3 Batch Multiprogramming versus Time Sharing

                                           Batch Multiprogramming            Time Sharing
Principal objective                        Maximize processor use            Minimize response time
Source of directives to operating system   Job control language commands     Commands entered at the terminal
                                           provided with the job
One of the first time-sharing operating systems to be developed was the Compatible Time-Sharing System (CTSS) [CORB62], developed at MIT by a group known as Project MAC (Machine-Aided Cognition, or Multiple-Access Computers). The system was first developed for the IBM 709 in 1961 and later transferred to an IBM 7094.

Compared to later systems, CTSS is primitive. The system ran on a computer with 32,000 36-bit words of main memory, with the resident monitor consuming 5000 of that. When control was to be assigned to an interactive user, the user's program and data were loaded into the remaining 27,000 words of main memory. A program was always loaded to start at the location of the 5000th word; this simplified both the monitor and memory management. A system clock generated interrupts at a rate of approximately one every 0.2 seconds. At each clock interrupt, the OS regained control and could assign the processor to another user. This technique is known as time slicing. Thus, at regular time intervals, the current user would be preempted and another user loaded in. To preserve the old user program status for later resumption, the old user programs and data were written out to disk before the new user programs and data were read in. Subsequently, the old user program code and data were restored in main memory when that program was next given a turn. To minimize disk traffic, user memory was only written out when the incoming program would overwrite it.

This principle is illustrated in Figure 2.7. Assume that there are four interactive users with the following memory requirements, in words:

• JOB1: 15,000
• JOB2: 20,000
• JOB3: 5000
• JOB4: 10,000

Initially, the monitor loads JOB1 and transfers control to it (a). Later, the monitor decides to transfer control to JOB2. Because JOB2 requires more memory than JOB1, JOB1 must be written out first, and then JOB2 can be loaded (b). Next, JOB3 is loaded in to be run. However, because JOB3 is smaller than JOB2, a portion of JOB2 can remain in memory, reducing disk write time (c). Later, the monitor decides to transfer control back to JOB1. An additional portion of JOB2 must be written out when JOB1 is loaded back into memory (d). When JOB4 is loaded, part of JOB1 and the portion of JOB2 remaining in memory are retained (e). At this point, if either JOB1 or JOB2 is activated, only a partial load will be required. In this example, it is JOB2 that runs next. This requires that JOB4 and the remaining resident portion of JOB1 be written out and that the missing portion of JOB2 be read in (f).

CHAPTER 2 / OPERATING SYSTEM OVERVIEW
[Figure 2.7 shows six snapshots, (a) through (f), of the 32,000-word main memory. In each, the monitor occupies words 0–5000, and the jobs described in the text are loaded at word 5000, partially written out, and later restored as control passes among JOB1, JOB2, JOB3, and JOB4.]

Figure 2.7 CTSS Operation

The CTSS approach is primitive compared to present-day time sharing, but it worked. It was extremely simple, which minimized the size of the monitor. Because a job was always loaded into the same locations in memory, there was no need for relocation techniques at load time (discussed subsequently). The technique of only writing out what was necessary minimized disk activity. Running on the 7094, CTSS supported a maximum of 32 users.

Time sharing and multiprogramming raise a host of new problems for the OS. If multiple jobs are in memory, then they must be protected from interfering with each other by, for example, modifying each other’s data. With multiple interactive users, the file system must be protected so that only authorized users have access to a particular file. The contention for resources, such as printers and mass storage devices, must be handled. These and other problems, with possible solutions, will be encountered throughout this text.

2.3 MAJOR ACHIEVEMENTS
Operating systems are among the most complex pieces of software ever developed. This reflects the challenge of trying to meet the difficult and in some cases competing objectives of convenience, efficiency, and ability to evolve. [DENN80a] proposes that there have been five major theoretical advances in the development of operating systems:

• Processes
• Memory management


• Information protection and security
• Scheduling and resource management
• System structure

Each advance is characterized by principles, or abstractions, developed to meet difficult practical problems. Taken together, these five areas span many of the key design and implementation issues of modern operating systems. The brief review of these five areas in this section serves as an overview of much of the rest of the text.

The Process
The concept of process is fundamental to the structure of operating systems. This term was first used by the designers of Multics in the 1960s [DALE68]. It is a somewhat more general term than job. Many definitions have been given for the term process, including

• A program in execution
• An instance of a program running on a computer
• The entity that can be assigned to and executed on a processor
• A unit of activity characterized by a single sequential thread of execution, a current state, and an associated set of system resources

This concept should become clearer as we proceed. Three major lines of computer system development created problems in timing and synchronization that contributed to the development of the concept of the process: multiprogramming batch operation, time sharing, and real-time transaction systems.

As we have seen, multiprogramming was designed to keep the processor and I/O devices, including storage devices, simultaneously busy to achieve maximum efficiency. The key mechanism is this: In response to signals indicating the completion of I/O transactions, the processor is switched among the various programs residing in main memory.

A second line of development was general-purpose time sharing. Here, the key design objective is to be responsive to the needs of the individual user and yet, for cost reasons, be able to support many users simultaneously. These goals are compatible because of the relatively slow reaction time of the user. For example, if a typical user needs an average of 2 seconds of processing time per minute, then close to 30 such users should be able to share the same system without noticeable interference. Of course, OS overhead must be factored into such calculations.

Another important line of development has been real-time transaction processing systems. In this case, a number of users are entering queries or updates against a database. An example is an airline reservation system. The key difference between the transaction processing system and the time-sharing system is that the former is limited to one or a few applications, whereas users of a time-sharing system can engage in program development, job execution, and the use of various applications. In both cases, system response time is paramount.

The principal tool available to system programmers in developing the early multiprogramming and multiuser interactive systems was the interrupt. The activity


of any job could be suspended by the occurrence of a defined event, such as an I/O completion. The processor would save some sort of context (e.g., program counter and other registers) and branch to an interrupt-handling routine, which would determine the nature of the interrupt, process the interrupt, and then resume user processing with the interrupted job or some other job.

The design of the system software to coordinate these various activities turned out to be remarkably difficult. With many jobs in progress at any one time, each of which involved numerous steps to be performed in sequence, it became impossible to analyze all of the possible combinations of sequences of events. In the absence of some systematic means of coordination and cooperation among activities, programmers resorted to ad hoc methods based on their understanding of the environment that the OS had to control. These efforts were vulnerable to subtle programming errors whose effects could be observed only when certain relatively rare sequences of actions occurred. These errors were difficult to diagnose because they needed to be distinguished from application software errors and hardware errors. Even when the error was detected, it was difficult to determine the cause, because the precise conditions under which the errors appeared were very hard to reproduce. In general terms, there are four main causes of such errors [DENN80a]:

• Improper synchronization: It is often the case that a routine must be suspended awaiting an event elsewhere in the system. For example, a program that initiates an I/O read must wait until the data are available in a buffer before proceeding. In such cases, a signal from some other routine is required. Improper design of the signaling mechanism can result in signals being lost or duplicate signals being received.

• Failed mutual exclusion: It is often the case that more than one user or program will attempt to make use of a shared resource at the same time. For example, two users may attempt to edit the same file at the same time. If these accesses are not controlled, an error can occur. There must be some sort of mutual exclusion mechanism that permits only one routine at a time to perform an update against the file. The implementation of such mutual exclusion is difficult to verify as being correct under all possible sequences of events.

• Nondeterminate program operation: The results of a particular program normally should depend only on the input to that program and not on the activities of other programs in a shared system. But when programs share memory, and their execution is interleaved by the processor, they may interfere with each other by overwriting common memory areas in unpredictable ways. Thus, the order in which various programs are scheduled may affect the outcome of any particular program.

• Deadlocks: It is possible for two or more programs to be hung up waiting for each other. For example, two programs may each require two I/O devices to perform some operation (e.g., disk to tape copy). One of the programs has seized control of one of the devices and the other program has control of the other device. Each is waiting for the other program to release the desired resource. Such a deadlock may depend on the chance timing of resource allocation and release.


What is needed to tackle these problems is a systematic way to monitor and control the various programs executing on the processor. The concept of the process provides the foundation. We can think of a process as consisting of three components:

• An executable program
• The associated data needed by the program (variables, work space, buffers, etc.)
• The execution context of the program

This last element is essential. The execution context, or process state, is the internal data by which the OS is able to supervise and control the process. This internal information is separated from the process, because the OS has information not permitted to the process. The context includes all of the information that the OS needs to manage the process and that the processor needs to execute the process properly. The context includes the contents of the various processor registers, such as the program counter and data registers. It also includes information of use to the OS, such as the priority of the process and whether the process is waiting for the completion of a particular I/O event.

Figure 2.8 indicates a way in which processes may be managed. Two processes, A and B, exist in portions of main memory. That is, a block of memory is allocated to each process that contains the program, data, and context information. Each process is recorded in a process list built and maintained by the OS. The process list contains one entry for each process, which includes a pointer to the location of the block of memory that contains the process. The entry may also include part or all of the execution context of the process. The remainder of the execution context is stored elsewhere, perhaps with the process itself (as indicated in Figure 2.8) or frequently in a separate region of memory. The process index register contains the index into the process list of the process currently controlling the processor. The program counter points to the next instruction in that process to be executed.
The base and limit registers define the region in memory occupied by the process: The base register is the starting address of the region of memory and the limit is the size of the region (in bytes or words). The program counter and all data references are interpreted relative to the base register and must not exceed the value in the limit register. This prevents interprocess interference.

In Figure 2.8, the process index register indicates that process B is executing. Process A was previously executing but has been temporarily interrupted. The contents of all the registers at the moment of A’s interruption were recorded in its execution context. Later, the OS can perform a process switch and resume execution of process A. The process switch consists of storing the context of B and restoring the context of A. When the program counter is loaded with a value pointing into A’s program area, process A will automatically resume execution.

Thus, the process is realized as a data structure. A process can either be executing or awaiting execution. The entire state of the process at any instant is contained in its context. This structure allows the development of powerful techniques for ensuring coordination and cooperation among processes. New features can be designed and incorporated into the OS (e.g., priority) by expanding the context to include any new information needed to support the feature. Throughout this book,


[Figure 2.8 shows the processor registers (a process index register, program counter, and base and limit registers) alongside main memory. Main memory holds the process list, whose entries point to blocks for processes A and B; each block contains that process’s context, data, and program (code). The process index selects the running process (B), and the base and limit registers frame its memory region.]

Figure 2.8 Typical Process Implementation

we will see a number of examples where this process structure is employed to solve the problems raised by multiprogramming and resource sharing.
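The process-list machinery of Figure 2.8 can be sketched in a few lines of Python. This is an illustrative model only — the class and field names are ours, and a real OS keeps this state in kernel data structures rather than objects:

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Execution context (process state): what the OS saves and restores."""
    program_counter: int = 0
    registers: dict = field(default_factory=dict)

@dataclass
class Process:
    name: str
    base: int     # starting address of the process's memory region
    limit: int    # size of the region
    context: Context = field(default_factory=Context)

class Dispatcher:
    """A process list plus a process index register, as in Figure 2.8."""
    def __init__(self, processes):
        self.process_list = list(processes)
        self.process_index = None   # index of the running process

    def switch(self, hw_pc, new_index):
        """Process switch: store the running process's context, then
        return the new process's saved program counter for restoration."""
        if self.process_index is not None:
            self.process_list[self.process_index].context.program_counter = hw_pc
        self.process_index = new_index
        return self.process_list[new_index].context.program_counter

    def in_bounds(self, addr):
        """Base/limit check: a reference is legal only inside the
        running process's region, preventing interprocess interference."""
        p = self.process_list[self.process_index]
        return p.base <= addr < p.base + p.limit
```

Switching from A to B records A’s program counter in its context; switching back later resumes A exactly where it was interrupted.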

Memory Management
The needs of users can be met best by a computing environment that supports modular programming and the flexible use of data. System managers need efficient and orderly control of storage allocation. The OS, to satisfy these requirements, has five principal storage management responsibilities:

• Process isolation: The OS must prevent independent processes from interfering with each other’s memory, both data and instructions.

• Automatic allocation and management: Programs should be dynamically allocated across the memory hierarchy as required. Allocation should be transparent to the programmer. Thus, the programmer is relieved of concerns relating to memory limitations, and the OS can achieve efficiency by assigning memory to jobs only as needed.


• Support of modular programming: Programmers should be able to define program modules, and to create, destroy, and alter the size of modules dynamically.

• Protection and access control: Sharing of memory, at any level of the memory hierarchy, creates the potential for one program to address the memory space of another. This is desirable when sharing is needed by particular applications. At other times, it threatens the integrity of programs and even of the OS itself. The OS must allow portions of memory to be accessible in various ways by various users.

• Long-term storage: Many application programs require means for storing information for extended periods of time, after the computer has been powered down.

Typically, operating systems meet these requirements with virtual memory and file system facilities. The file system implements a long-term store, with information stored in named objects, called files. The file is a convenient concept for the programmer and is a useful unit of access control and protection for the OS.

Virtual memory is a facility that allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available. Virtual memory was conceived to meet the requirement of having multiple user jobs reside in main memory concurrently, so that there would not be a hiatus between the execution of successive processes while one process was written out to secondary store and the successor process was read in. Because processes vary in size, if the processor switches among a number of processes, it is difficult to pack them compactly into main memory. Paging systems were introduced, which allow processes to be comprised of a number of fixed-size blocks, called pages. A program references a word by means of a virtual address consisting of a page number and an offset within the page. Each page of a process may be located anywhere in main memory.
The paging system provides for a dynamic mapping between the virtual address used in the program and a real address, or physical address, in main memory. With dynamic mapping hardware available, the next logical step was to eliminate the requirement that all pages of a process reside in main memory simultaneously. All the pages of a process are maintained on disk. When a process is executing, some of its pages are in main memory. If reference is made to a page that is not in main memory, the memory management hardware detects this and arranges for the missing page to be loaded. Such a scheme is referred to as virtual memory and is depicted in Figure 2.9.

The processor hardware, together with the OS, provides the user with a “virtual processor” that has access to a virtual memory. This memory may be a linear address space or a collection of segments, which are variable-length blocks of contiguous addresses. In either case, programming language instructions can reference program and data locations in the virtual memory area. Process isolation can be achieved by giving each process a unique, nonoverlapping virtual memory. Memory sharing can be achieved by overlapping portions of two virtual memory spaces. Files are maintained in a long-term store. Files and portions of files may be copied into the virtual memory for manipulation by programs.
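The dynamic mapping just described can be sketched as a tiny page-table lookup. This is illustrative Python; the 4096-byte page size and the dictionary page table are our own assumptions, not details from the text:

```python
PAGE_SIZE = 4096  # bytes per page (an assumed, illustrative size)

def translate(vaddr, page_table):
    """Map a virtual address to a physical one: the page number indexes
    the page table, and the offset within the page is kept unchanged.
    page_table[p] gives the main-memory frame holding page p; a missing
    entry means the page is on disk, i.e., a page fault."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise LookupError(f"page fault: page {page} not in main memory")
    return frame * PAGE_SIZE + offset
```

On a page fault, the memory management hardware and OS would load the missing page and retry the reference; here the fault is simply raised.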


[Figure 2.9 shows user programs A and B, each divided into numbered pages; a few pages of each occupy fixed-length frames in main memory, while the remainder reside on disk. Main memory consists of a number of fixed-length frames, each equal to the size of a page; for a program to execute, some or all of its pages must be in main memory. Secondary memory (disk) can hold many fixed-length pages; pages for all programs, plus the operating system and files, are on disk.]

Figure 2.9 Virtual Memory Concepts

Figure 2.10 highlights the addressing concerns in a virtual memory scheme. Storage consists of directly addressable (by machine instructions) main memory and lower-speed auxiliary memory that is accessed indirectly by loading blocks into main memory. Address translation hardware (memory management unit) is interposed between the processor and memory. Programs reference locations using virtual addresses, which are mapped into real main memory addresses. If a reference is made to a virtual address not in real memory, then a portion of the contents of real memory is swapped out to auxiliary memory and the desired block of data is swapped in. During this activity, the process that generated the address reference must be suspended. The OS designer needs to develop an address translation mechanism that generates little overhead and a storage allocation policy that minimizes the traffic between memory levels.

[Figure 2.10 shows the processor issuing virtual addresses to a memory-management unit, which translates them into real addresses in main memory; references that must go to secondary memory use disk addresses.]

Figure 2.10 Virtual Memory Addressing

Information Protection and Security
The growth in the use of time-sharing systems and, more recently, computer networks has brought with it a growth in concern for the protection of information. The nature of the threat that concerns an organization will vary greatly depending on the circumstances. However, there are some general-purpose tools that can be built into computers and operating systems that support a variety of protection and security mechanisms. In general, we are concerned with the problem of controlling access to computer systems and the information stored in them. Much of the work in security and protection as it relates to operating systems can be roughly grouped into four categories:

• Availability: Concerned with protecting the system against interruption
• Confidentiality: Assures that users cannot read data for which access is unauthorized
• Data integrity: Protection of data from unauthorized modification
• Authenticity: Concerned with the proper verification of the identity of users and the validity of messages or data
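As a toy illustration of the confidentiality and data integrity categories, an access-control list can record which rights each user holds on each file. The users, file name, and rights below are invented for the example:

```python
# Hypothetical access-control list: (user, file) -> set of rights.
ACL = {
    ("alice", "notes.txt"): {"read", "write"},
    ("bob", "notes.txt"): {"read"},
}

def authorized(user, file, right):
    """Grant an operation only if the right appears in the ACL entry;
    anyone not listed gets no access at all."""
    return right in ACL.get((user, file), set())
```

Denying "write" to bob protects data integrity; denying "read" to an unlisted user protects confidentiality.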

Scheduling and Resource Management
A key responsibility of the OS is to manage the various resources available to it (main memory space, I/O devices, processors) and to schedule their use by the various active processes. Any resource allocation and scheduling policy must consider three factors:

• Fairness: Typically, we would like all processes that are competing for the use of a particular resource to be given approximately equal and fair access to that


resource. This is especially so for jobs of the same class, that is, jobs of similar demands.

• Differential responsiveness: On the other hand, the OS may need to discriminate among different classes of jobs with different service requirements. The OS should attempt to make allocation and scheduling decisions to meet the total set of requirements. The OS should also make these decisions dynamically. For example, if a process is waiting for the use of an I/O device, the OS may wish to schedule that process for execution as soon as possible to free up the device for later demands from other processes.

• Efficiency: The OS should attempt to maximize throughput, minimize response time, and, in the case of time sharing, accommodate as many users as possible.

These criteria conflict; finding the right balance for a particular situation is an ongoing problem for operating system research. Scheduling and resource management are essentially operations-research problems and the mathematical results of that discipline can be applied. In addition, measurement of system activity is important to be able to monitor performance and make adjustments.

Figure 2.11 suggests the major elements of the OS involved in the scheduling of processes and the allocation of resources in a multiprogramming environment. The OS maintains a number of queues, each of which is simply a list of processes waiting for some resource. The short-term queue consists of processes that are in main memory (or at least an essential minimum portion of each is in main memory) and are ready to run as soon as the processor is made available. Any one of these processes

[Figure 2.11 shows the operating system receiving service calls from processes through a service call handler, and interrupts from processes and from I/O through an interrupt handler. A long-term queue, a short-term queue, and per-device I/O queues feed the short-term scheduler, which passes control to a process.]

Figure 2.11 Key Elements of an Operating System for Multiprogramming


could use the processor next. It is up to the short-term scheduler, or dispatcher, to pick one. A common strategy is to give each process in the queue some time in turn; this is referred to as a round-robin technique. In effect, the round-robin technique employs a circular queue. Another strategy is to assign priority levels to the various processes, with the scheduler selecting processes in priority order.

The long-term queue is a list of new jobs waiting to use the processor. The OS adds jobs to the system by transferring a process from the long-term queue to the short-term queue. At that time, a portion of main memory must be allocated to the incoming process. Thus, the OS must be sure that it does not overcommit memory or processing time by admitting too many processes to the system.

There is an I/O queue for each I/O device. More than one process may request the use of the same I/O device. All processes waiting to use each device are lined up in that device’s queue. Again, the OS must determine which process to assign to an available I/O device.

The OS receives control of the processor at the interrupt handler if an interrupt occurs. A process may specifically invoke some operating system service, such as an I/O device handler, by means of a service call. In this case, a service call handler is the entry point into the OS. In any case, once the interrupt or service call is handled, the short-term scheduler is invoked to pick a process for execution.

The foregoing is a functional description; details and modular design of this portion of the OS will differ in various systems. Much of the research and development effort in operating systems has been directed at picking algorithms and data structures for this function that provide fairness, differential responsiveness, and efficiency.
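The round-robin technique described above can be sketched directly with a circular queue. This is illustrative Python; the job names and quantum are invented, and real dispatchers work with process control blocks rather than tuples:

```python
from collections import deque

def round_robin(jobs, quantum):
    """jobs: list of (name, burst) processor demands in time units.
    Each process at the head of the circular queue runs for at most one
    quantum, then rejoins the tail if it still has work left.
    Returns the sequence of (name, time_run) slices."""
    queue = deque(jobs)
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)       # preempt after one quantum
        schedule.append((name, run))
        if remaining > run:
            queue.append((name, remaining - run))
    return schedule
```

With a quantum of 2, jobs A (3 units) and B (5 units) interleave as A, B, A, B, B — each getting the processor in turn until it finishes.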

System Structure
As more and more features have been added to operating systems, and as the underlying hardware has become more capable and versatile, the size and complexity of operating systems has grown. CTSS, put into operation at MIT in 1963, consisted of approximately 32,000 36-bit words of storage. OS/360, introduced a year later by IBM, had more than a million machine instructions. By 1975, the Multics system, developed by MIT and Bell Laboratories, had grown to more than 20 million instructions. It is true that more recently, some simpler operating systems have been introduced for smaller systems, but these have inevitably grown more complex as the underlying hardware and user requirements have grown. Thus, the UNIX of today is far more complex than the almost toy system put together by a few talented programmers in the early 1970s, and the simple MS-DOS has given way to the rich and complex power of OS/2 and Windows. For example, Windows NT 4.0 contains 16 million lines of code, and Windows 2000 has well over twice that number.

The size of a full-featured OS, and the difficulty of the problem it addresses, has led to four unfortunate but all-too-common problems. First, operating systems are chronically late in being delivered. This goes for new operating systems and upgrades to older systems. Second, the systems have latent bugs that show up in the field and must be fixed and reworked. Third, performance is often not what was expected. Fourth, it has proved impossible to deploy a complex OS that is not vulnerable to a variety of security attacks, including viruses, worms, and unauthorized access.


To manage the complexity of operating systems and to overcome these problems, there has been much focus over the years on the software structure of the OS. Certain points seem obvious. The software must be modular. This will help organize the software development process and limit the effort of diagnosing and fixing errors. The modules must have well-defined interfaces to each other, and the interfaces must be as simple as possible. Again, this eases the programming burden. It also facilitates system evolution. With clean, minimal interfaces between modules, one module can be changed with minimal impact on other modules.

For large operating systems, which run from millions to tens of millions of lines of code, modular programming alone has not been found to be sufficient. Instead there has been increasing use of the concepts of hierarchical layers and information abstraction. The hierarchical structure of a modern OS separates its functions according to their characteristic time scale and their level of abstraction. We can view the system as a series of levels. Each level performs a related subset of the functions required of the OS. It relies on the next lower level to perform more primitive functions and to conceal the details of those functions. It provides services to the next higher layer. Ideally, the levels should be defined so that changes in one level do not require changes in other levels. Thus, we have decomposed one problem into a number of more manageable subproblems.

In general, lower layers deal with a far shorter time scale. Some parts of the OS must interact directly with the computer hardware, where events can have a time scale as brief as a few billionths of a second. At the other end of the spectrum, parts of the OS communicate with the user, who issues commands at a much more leisurely pace, perhaps one every few seconds. The use of a set of levels conforms nicely to this environment.
The way in which these principles are applied varies greatly among contemporary operating systems. However, it is useful at this point, for the purpose of gaining an overview of operating systems, to present a model of a hierarchical OS. Let us consider the model proposed in [BROW84] and [DENN84]. Although it does not correspond to any particular OS, this model provides a useful high-level view of OS structure. The model is defined in Table 2.4 and consists of the following levels:

• Level 1: Consists of electronic circuits, where the objects that are dealt with are registers, memory cells, and logic gates. The operations defined on these objects are actions, such as clearing a register or reading a memory location.

• Level 2: The processor’s instruction set. The operations at this level are those allowed in the machine language instruction set, such as add, subtract, load, and store.

• Level 3: Adds the concept of a procedure or subroutine, plus the call/return operations.

• Level 4: Introduces interrupts, which cause the processor to save the current context and invoke an interrupt-handling routine.

These first four levels are not part of the OS but constitute the processor hardware. However, some elements of the OS begin to appear at these levels, such as the interrupt-handling routines. It is at level 5 that we begin to reach the OS proper and that the concepts associated with multiprogramming begin to appear.

Table 2.4 Operating System Design Hierarchy

Level  Name                   Objects                                               Example Operations
13     Shell                  User programming environment                          Statements in shell language
12     User processes         User processes                                        Quit, kill, suspend, resume
11     Directories            Directories                                           Create, destroy, attach, detach, search, list
10     Devices                External devices, such as printers,                   Open, close, read, write
                              displays, and keyboards
 9     File system            Files                                                 Create, destroy, open, close, read, write
 8     Communications         Pipes                                                 Create, destroy, open, close, read, write
 7     Virtual memory         Segments, pages                                       Read, write, fetch
 6     Local secondary store  Blocks of data, device channels                       Read, write, allocate, free
 5     Primitive processes    Primitive processes, semaphores, ready list           Suspend, resume, wait, signal
 4     Interrupts             Interrupt-handling programs                           Invoke, mask, unmask, retry
 3     Procedures             Procedures, call stack, display                       Mark stack, call, return
 2     Instruction set        Evaluation stack, microprogram interpreter,           Load, store, add, subtract, branch
                              scalar and array data
 1     Electronic circuits    Registers, gates, buses, etc.                         Clear, transfer, activate, complement

Levels 1 through 4 (the gray shaded area in the original) represent hardware.

• Level 5: The notion of a process as a program in execution is introduced at this level. The fundamental requirements on the OS to support multiple processes include the ability to suspend and resume processes. This requires saving hardware registers so that execution can be switched from one process to another. In addition, if processes need to cooperate, then some method of synchronization is needed. One of the simplest techniques, and an important concept in OS design, is the semaphore, a simple signaling technique that is explored in Chapter 5.

• Level 6: Deals with the secondary storage devices of the computer. At this level, the functions of positioning the read/write heads and the actual transfer of blocks of data occur. Level 6 relies on level 5 to schedule the operation and to notify the requesting process of completion of an operation. Higher levels are concerned with the address of the needed data on the disk and provide a request for the appropriate block to a device driver at level 5.

• Level 7: Creates a logical address space for processes. This level organizes the virtual address space into blocks that can be moved between main memory and secondary memory. Three schemes are in common use: those using fixed-size pages, those using variable-length segments, and those using both. When a needed block is not in main memory, logic at this level requests a transfer from level 6.
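The level 5 semaphore can be sketched as follows. Python’s threading module already provides a built-in Semaphore; this illustrative version exposes the wait/signal mechanism using a condition variable:

```python
import threading

class Semaphore:
    """A counting semaphore: wait() suspends the caller until the count
    is positive, then decrements it; signal() increments the count and
    wakes one waiter. (Sketch only -- use threading.Semaphore in practice.)"""
    def __init__(self, count=0):
        self._count = count
        self._cond = threading.Condition()

    def wait(self):
        with self._cond:
            while self._count == 0:   # suspend until signaled
                self._cond.wait()
            self._count -= 1

    def signal(self):
        with self._cond:
            self._count += 1
            self._cond.notify()       # resume one waiting process
```

A process that calls wait() on a zero-valued semaphore is suspended exactly as described for level 5, and resumes only after another process calls signal().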


Up to this point, the OS deals with the resources of a single processor. Beginning with level 8, the OS deals with external objects such as peripheral devices and possibly networks and computers attached to the network. The objects at these upper levels are logical, named objects that can be shared among processes on the same computer or on multiple computers.

• Level 8: Deals with the communication of information and messages between processes. Whereas level 5 provided a primitive signal mechanism that allowed for the synchronization of processes, this level deals with a richer sharing of information. One of the most powerful tools for this purpose is the pipe, which is a logical channel for the flow of data between processes. A pipe is defined with its output from one process and its input into another process. It can also be used to link external devices or files to processes. The concept is discussed in Chapter 6.

• Level 9: Supports the long-term storage of named files. At this level, the data on secondary storage are viewed in terms of abstract, variable-length entities. This is in contrast to the hardware-oriented view of secondary storage in terms of tracks, sectors, and fixed-size blocks at level 6.

• Level 10: Provides access to external devices using standardized interfaces.

• Level 11: Is responsible for maintaining the association between the external and internal identifiers of the system’s resources and objects. The external identifier is a name that can be employed by an application or user. The internal identifier is an address or other indicator that can be used by lower levels of the OS to locate and control an object. These associations are maintained in a directory. Entries include not only external/internal mapping, but also characteristics such as access rights.

• Level 12: Provides a full-featured facility for the support of processes. This goes far beyond what is provided at level 5. At level 5, only the processor register contents associated with a process are maintained, plus the logic for dispatching processes. At level 12, all of the information needed for the orderly management of processes is supported. This includes the virtual address space of the process, a list of objects and processes with which it may interact and the constraints of that interaction, parameters passed to the process upon creation, and any other characteristics of the process that might be used by the OS to control the process.

• Level 13: Provides an interface to the OS for the user. It is referred to as the shell because it separates the user from OS details and presents the OS simply as a collection of services. The shell accepts user commands or job control statements, interprets these, and creates and controls processes as needed. For example, the interface at this level could be implemented in a graphical manner, providing the user with commands through a list presented as a menu and displaying results using graphical output to a specific device such as a screen.

This hypothetical model of an OS provides a useful descriptive structure and serves as an implementation guideline. The reader may refer back to this structure during the course of the book to observe the context of any particular design issue under discussion.


2.4 DEVELOPMENTS LEADING TO MODERN OPERATING SYSTEMS
Over the years, there has been a gradual evolution of OS structure and capabilities. However, in recent years a number of new design elements have been introduced into both new operating systems and new releases of existing operating systems that create a major change in the nature of operating systems. These modern operating systems respond to new developments in hardware, new applications, and new security threats. Among the key hardware drivers are multiprocessor systems, greatly increased processor speed, high-speed network attachments, and increasing size and variety of memory storage devices. In the application arena, multimedia applications, Internet and Web access, and client/server computing have influenced OS design. With respect to security, Internet access to computers has greatly increased the potential threat, and increasingly sophisticated attacks, such as viruses, worms, and hacking techniques, have had a profound impact on OS design.

The rate of change in the demands on operating systems requires not just modifications and enhancements to existing architectures but new ways of organizing the OS. A wide range of different approaches and design elements has been tried in both experimental and commercial operating systems, but much of the work fits into the following categories:

• Microkernel architecture
• Multithreading
• Symmetric multiprocessing
• Distributed operating systems
• Object-oriented design

Most operating systems, until recently, featured a large monolithic kernel. Most of what is thought of as OS functionality is provided in these large kernels, including scheduling, file system, networking, device drivers, memory management, and more. Typically, a monolithic kernel is implemented as a single process, with all elements sharing the same address space. A microkernel architecture assigns only a few essential functions to the kernel, including address spaces, interprocess communication (IPC), and basic scheduling. Other OS services are provided by processes, sometimes called servers, that run in user mode and are treated like any other application by the microkernel. This approach decouples kernel and server development. Servers may be customized to specific application or environment requirements. The microkernel approach simplifies implementation, provides flexibility, and is well suited to a distributed environment. In essence, a microkernel interacts with local and remote server processes in the same way, facilitating construction of distributed systems.

Multithreading is a technique in which a process, executing an application, is divided into threads that can run concurrently. We can make the following distinction:

• Thread: A dispatchable unit of work. It includes a processor context (which includes the program counter and stack pointer) and its own data area for a stack (to enable subroutine branching). A thread executes sequentially and is interruptible so that the processor can turn to another thread.

• Process: A collection of one or more threads and associated system resources (such as memory containing both code and data, open files, and devices). This corresponds closely to the concept of a program in execution.

By breaking a single application into multiple threads, the programmer has great control over the modularity of the application and the timing of application-related events. Multithreading is useful for applications that perform a number of essentially independent tasks that do not need to be serialized. An example is a database server that listens for and processes numerous client requests. With multiple threads running within the same process, switching back and forth among threads involves less processor overhead than a major process switch between different processes. Threads are also useful for structuring processes that are part of the OS kernel, as described in subsequent chapters.

Until recently, virtually all single-user personal computers and workstations contained a single general-purpose microprocessor. As demands for performance increase and as the cost of microprocessors continues to drop, vendors have introduced computers with multiple microprocessors. To achieve greater efficiency and reliability, one technique is to employ symmetric multiprocessing (SMP), a term that refers to a computer hardware architecture and also to the OS behavior that exploits that architecture. A symmetric multiprocessor can be defined as a standalone computer system with the following characteristics:

1. There are multiple processors.
2. These processors share the same main memory and I/O facilities, interconnected by a communications bus or other internal connection scheme.
3. All processors can perform the same functions (hence the term symmetric).
In recent years, systems with multiple processors on a single chip have become widely used, referred to as chip multiprocessor systems. Many of the design issues are the same, whether dealing with a chip multiprocessor or a multiple-chip SMP. The OS of an SMP schedules processes or threads across all of the processors. SMP has a number of potential advantages over uniprocessor architecture, including the following:

• Performance: If the work to be done by a computer can be organized so that some portions of the work can be done in parallel, then a system with multiple processors will yield greater performance than one with a single processor of the same type. This is illustrated in Figure 2.12. With multiprogramming, only one process can execute at a time; meanwhile all other processes are waiting for the processor. With multiprocessing, more than one process can be running simultaneously, each on a different processor.

• Availability: In a symmetric multiprocessor, because all processors can perform the same functions, the failure of a single processor does not halt the system. Instead, the system can continue to function at reduced performance.

• Incremental growth: A user can enhance the performance of a system by adding an additional processor.


[Figure 2.12 Multiprogramming and Multiprocessing. Panel (a): interleaving (multiprogramming, one processor); panel (b): interleaving and overlapping (multiprocessing, two processors). Each panel shows Processes 1 through 3 over time, with Running and Blocked intervals distinguished.]

• Scaling: Vendors can offer a range of products with different price and performance characteristics based on the number of processors configured in the system.

It is important to note that these are potential, rather than guaranteed, benefits. The OS must provide tools and functions to exploit the parallelism in an SMP system.

Multithreading and SMP are often discussed together, but the two are independent facilities. Even on a uniprocessor system, multithreading is useful for structuring applications and kernel processes. An SMP system is useful even for nonthreaded processes, because several processes can run in parallel. However, the two facilities complement each other and can be used effectively together.

An attractive feature of an SMP is that the existence of multiple processors is transparent to the user. The OS takes care of scheduling of threads or processes on individual processors and of synchronization among processors. This book discusses the scheduling and synchronization mechanisms used to provide the single-system appearance to the user. A different problem is to provide the appearance of a single system for a cluster of separate computers, a multicomputer system. In this case, we are dealing with a collection of entities (computers), each with its own main memory, secondary memory, and other I/O modules. A distributed operating system provides the illusion of a single main memory space and a single secondary memory space, plus other unified access facilities, such as a distributed file system. Although clusters are becoming increasingly popular, and there are many cluster products on the market, the state of the art for distributed operating systems lags that of uniprocessor and SMP operating systems. We examine such systems in Part Eight.

Another innovation in OS design is the use of object-oriented technologies. Object-oriented design lends discipline to the process of adding modular extensions to a small kernel. At the OS level, an object-based structure enables programmers to customize an OS without disrupting system integrity. Object orientation also eases the development of distributed tools and full-blown distributed operating systems.

2.5 MICROSOFT WINDOWS OVERVIEW
History
The story of Windows begins with a very different OS, developed by Microsoft for the first IBM personal computer and referred to as MS-DOS or PC-DOS. The initial version, DOS 1.0, was released in August 1981. It consisted of 4000 lines of assembly language source code and ran in 8 Kbytes of memory using the Intel 8086 microprocessor.

When IBM developed a hard disk-based personal computer, the PC XT, Microsoft developed DOS 2.0, released in 1983. It contained support for the hard disk and provided for hierarchical directories. Heretofore, a disk could contain only one directory of files, supporting a maximum of 64 files. While this was adequate in the era of floppy disks, it was too limited for a hard disk, and the single-directory restriction was too clumsy. The new release allowed directories to contain subdirectories as well as files. It also contained a richer set of commands embedded in the OS to provide functions that had to be performed by external programs provided as utilities with Release 1. Among the capabilities added were several UNIX-like features, such as I/O redirection, which is the ability to change the input or output identity for a given application, and background printing. The memory-resident portion grew to 24 Kbytes.

When IBM announced the PC AT in 1984, Microsoft introduced DOS 3.0. The AT contained the Intel 80286 processor, which provided extended addressing and memory protection features. These were not used by DOS. To remain compatible with previous releases, the OS simply used the 80286 as a "fast 8086." The OS did provide support for new keyboard and hard disk peripherals. Even so, the memory requirement grew to 36 Kbytes. There were several notable upgrades to the 3.0 release. DOS 3.1, released in 1984, contained support for networking of PCs. The size of the resident portion did not change; this was achieved by increasing the amount of the OS that could be swapped.
DOS 3.3, released in 1987, provided support for the new line of IBM computers, the PS/2. Again, this release did not take advantage of the processor capabilities of the PS/2, provided by the 80286 and the 32-bit 80386 chips. The resident portion at this stage had grown to a minimum of 46 Kbytes, with more required if certain optional extensions were selected.


By this time, DOS was being used in an environment far beyond its capabilities. The introduction of the 80486 and then the Intel Pentium chip provided power and features that could not be exploited by the simple-minded DOS. Meanwhile, beginning in the early 1980s, Microsoft began development of a graphical user interface (GUI) that would be interposed between the user and DOS. Microsoft's intent was to compete with Macintosh, whose OS was unsurpassed for ease of use. By 1990, Microsoft had a version of the GUI, known as Windows 3.0, which incorporated some of the user-friendly features of Macintosh. However, it was still hamstrung by the need to run on top of DOS.

After an abortive attempt by Microsoft to develop with IBM a next-generation OS, which would exploit the power of the new microprocessors and which would incorporate the ease-of-use features of Windows, Microsoft struck out on its own and developed a new OS from the ground up, Windows NT. Windows NT exploits the capabilities of contemporary microprocessors and provides multitasking in a single-user or multiple-user environment. The first version of Windows NT (3.1) was released in 1993, with the same GUI as Windows 3.1, another Microsoft OS (the follow-on to Windows 3.0). However, NT 3.1 was a new 32-bit OS with the ability to support older DOS and Windows applications as well as provide OS/2 support.

After several versions of NT 3.x, Microsoft released NT 4.0. NT 4.0 has essentially the same internal architecture as 3.x. The most notable external change is that NT 4.0 provides the same user interface as Windows 95 (an enhanced upgrade to Windows 3.1). The major architectural change is that several graphics components that ran in user mode as part of the Win32 subsystem in 3.x have been moved into the Windows NT Executive, which runs in kernel mode. The benefit of this change is to speed up the operation of these important functions.
The potential drawback is that these graphics functions now have direct access to low-level system services, which could impact the reliability of the OS.

In 2000, Microsoft introduced the next major upgrade: Windows 2000. Again, the underlying Executive and Kernel architecture is fundamentally the same as in NT 4.0, but new features have been added. The emphasis in Windows 2000 is the addition of services and functions to support distributed processing. The central element of Windows 2000's new features is Active Directory, which is a distributed directory service able to map names of arbitrary objects to any kind of information about those objects. Windows 2000 also added the plug-and-play and power-management facilities that were already in Windows 98, the successor to Windows 95. These features are particularly important for laptop computers, which frequently use docking stations and run on batteries.

One final general point to make about Windows 2000 is the distinction between Windows 2000 Server and Windows 2000 desktop. In essence, the kernel and executive architecture and services remain the same, but Server includes the services required for use as a network server.

In 2001, a new desktop version of Windows was released, known as Windows XP. Both home PC and business workstation versions of XP were offered. In 2003, Microsoft introduced a new server version, known as Windows Server 2003, supporting both 32-bit and 64-bit processors. The 64-bit version of Server 2003 was designed specifically for the 64-bit Intel Itanium hardware. With the first service pack update for Server 2003, Microsoft introduced support for the AMD64 processor architecture for both desktops and servers.

In 2007, the latest desktop version of Windows was released, known as Windows Vista. Vista supports both the Intel x86 and AMD x64 architectures. The main features of the release were changes to the GUI and many security improvements. The corresponding server release is Windows Server 2008.

Single-User Multitasking
Windows (from Windows 2000 onward) is a significant example of what has become the new wave in microcomputer operating systems (other examples are Linux and MacOS). Windows was driven by a need to exploit the processing capabilities of today's 32-bit and 64-bit microprocessors, which rival mainframes of just a few years ago in speed, hardware sophistication, and memory capacity. One of the most significant features of these new operating systems is that, although they are still intended for support of a single interactive user, they are multitasking operating systems.

Two main developments have triggered the need for multitasking on personal computers, workstations, and servers. First, with the increased speed and memory capacity of microprocessors, together with the support for virtual memory, applications have become more complex and interrelated. For example, a user may wish to employ a word processor, a drawing program, and a spreadsheet application simultaneously to produce a document. Without multitasking, if a user wishes to create a drawing and paste it into a word processing document, the following steps are required:

1. Open the drawing program.
2. Create the drawing and save it in a file or on a temporary clipboard.
3. Close the drawing program.
4. Open the word processing program.
5. Insert the drawing in the correct location.

If any changes are desired, the user must close the word processing program, open the drawing program, edit the graphic image, save it, close the drawing program, open the word processing program, and insert the updated image. This becomes tedious very quickly. As the services and capabilities available to users become more powerful and varied, the single-task environment becomes more clumsy and user unfriendly. In a multitasking environment, the user opens each application as needed, and leaves it open. Information can be moved around among a number of applications easily.
Each application has one or more open windows, and a graphical interface with a pointing device such as a mouse allows the user to navigate quickly in this environment.

A second motivation for multitasking is the growth of client/server computing. With client/server computing, a personal computer or workstation (client) and a host system (server) are used jointly to accomplish a particular application. The two are linked, and each is assigned that part of the job that suits its capabilities. Client/server can be achieved in a local area network of personal computers and servers or by means of a link between a user system and a large host such as a mainframe. An application may involve one or more personal computers and one or more server devices. To provide the required responsiveness, the OS needs to support high-speed networking interfaces and the associated communications protocols and data transfer architectures while at the same time supporting ongoing user interaction.

The foregoing remarks apply to the desktop versions of Windows. The Server versions are also multitasking but may support multiple users. They support multiple local server connections as well as providing shared services used by multiple users on the network. As an Internet server, Windows may support thousands of simultaneous Web connections.

Architecture
Figure 2.13 illustrates the overall structure of Windows 2000; later releases of Windows, including Vista, have essentially the same structure at this level of detail. Its modular structure gives Windows considerable flexibility.

[Figure 2.13 Windows and Windows Vista Architecture [RUSS05]. User mode contains system support processes (session manager, Winlogon, Lsass, service control manager), service processes (Services.exe, SVChost.exe, Winmgmt.exe, spooler), applications (task manager, Windows explorer, user applications), and environment subsystems (Win32, POSIX) with their subsystem DLLs, all above Ntdll.dll. Kernel mode contains the system service dispatcher; the Executive components (I/O manager, cache manager, object manager, virtual memory, processes and threads, configuration manager (registry), security reference monitor, local procedure call, power manager, plug-and-play manager); Win32 USER and GDI; device and file system drivers; graphics drivers; the Kernel; and the hardware abstraction layer (HAL). Lsass = local security authentication server; POSIX = portable operating system interface; GDI = graphics device interface; DLL = dynamic link libraries.]

Windows is designed to execute on a variety of hardware platforms and supports applications written for a variety of other operating systems. As of this writing, desktop Windows is implemented only on the Intel x86 and AMD64 hardware platforms; Windows Server also supports the Intel IA64 (Itanium). As with virtually all operating systems, Windows separates application-oriented software from the core OS software. The latter, which includes the Executive, the Kernel, device drivers, and the hardware abstraction layer, runs in kernel mode. Kernel mode software has access to system data and to the hardware. The remaining software, running in user mode, has limited access to system data.

Operating System Organization   Windows has a highly modular architecture. Each system function is managed by just one component of the OS. The rest of the OS and all applications access that function through the responsible component using standard interfaces. Key system data can only be accessed through the appropriate function. In principle, any module can be removed, upgraded, or replaced without rewriting the entire system or its standard application program interfaces (APIs). The kernel-mode components of Windows are the following:
• Executive: Contains the base OS services, such as memory management, process and thread management, security, I/O, and interprocess communication.

• Kernel: Controls execution of the processor(s). The Kernel manages thread scheduling, process switching, exception and interrupt handling, and multiprocessor synchronization. Unlike the rest of the Executive and the user level, the Kernel's own code does not run in threads.

• Hardware abstraction layer (HAL): Maps between generic hardware commands and responses and those unique to a specific platform. It isolates the OS from platform-specific hardware differences. The HAL makes each computer's system bus, direct memory access (DMA) controller, interrupt controller, system timers, and memory module look the same to the Executive and Kernel components. It also delivers the support needed for symmetric multiprocessing (SMP), explained subsequently.

• Device drivers: Dynamic libraries that extend the functionality of the Executive. These include hardware device drivers that translate user I/O function calls into specific hardware device I/O requests and software components for implementing file systems, network protocols, and any other system extensions that need to run in kernel mode.

• Windowing and graphics system: Implements the graphical user interface (GUI) functions, such as dealing with windows, user interface controls, and drawing.

The Windows Executive includes components for specific system functions and provides an API for user-mode software. Following is a brief description of each of the Executive modules:

• I/O manager: Provides a framework through which I/O devices are accessible to applications, and is responsible for dispatching to the appropriate device drivers for further processing. The I/O manager implements all the Windows I/O APIs and enforces security and naming for devices, network protocols, and file systems (using the object manager). Windows I/O is discussed in Chapter 11.


• Cache manager: Improves the performance of file-based I/O by causing recently referenced file data to reside in main memory for quick access, and by deferring disk writes by holding the updates in memory for a short time before sending them to the disk.

• Object manager: Creates, manages, and deletes Windows Executive objects and abstract data types that are used to represent resources such as processes, threads, and synchronization objects. It enforces uniform rules for retaining, naming, and setting the security of objects. The object manager also creates object handles, which consist of access control information and a pointer to the object. Windows objects are discussed later in this section.

• Plug-and-play manager: Determines which drivers are required to support a particular device and loads those drivers.

• Power manager: Coordinates power management among various devices and can be configured to reduce power consumption by shutting down idle devices, putting the processor to sleep, and even writing all of memory to disk and shutting off power to the entire system.

• Security reference monitor: Enforces access-validation and audit-generation rules. The Windows object-oriented model allows for a consistent and uniform view of security, right down to the fundamental entities that make up the Executive. Thus, Windows uses the same routines for access validation and for audit checks for all protected objects, including files, processes, address spaces, and I/O devices. Windows security is discussed in Chapter 15.

• Virtual memory manager: Manages virtual addresses, physical memory, and the paging files on disk. Controls the memory management hardware and data structures which map virtual addresses in the process's address space to physical pages in the computer's memory. Windows virtual memory management is described in Chapter 8.

• Process/thread manager: Creates, manages, and deletes process and thread objects. Windows process and thread management are described in Chapter 4.

• Configuration manager: Responsible for implementing and managing the system registry, which is the repository for both systemwide and per-user settings of various parameters.

• Local procedure call (LPC) facility: Implements an efficient cross-process procedure call mechanism for communication between local processes implementing services and subsystems. Similar to the remote procedure call (RPC) facility used for distributed processing.

User-Mode Processes   Four basic types of user-mode processes are supported by Windows:
• Special system processes: User-mode services needed to manage the system, such as the session manager, the authentication subsystem, the service manager, and the logon process.

• Service processes: The printer spooler, the event logger, user-mode components that cooperate with device drivers, various network services, and many others. Services are used by both Microsoft and external software developers to extend system functionality, as they are the only way to run background user-mode activity on a Windows system.

• Environment subsystems: Provide different OS personalities (environments). The supported subsystems are Win32/WinFX and POSIX. Each environment subsystem includes a subsystem process shared among all applications using the subsystem and dynamic link libraries (DLLs) that convert the user application calls to LPC calls on the subsystem process, and/or native Windows calls.

• User applications: Executables (EXEs) and DLLs that provide the functionality users run to make use of the system. EXEs and DLLs are generally targeted at a specific environment subsystem; although some of the
