Pipeline in Computer Architecture

The computational speed of conventional computers is limited. It can be increased by processing data simultaneously, a technique known as parallel processing, which performs concurrent data processing to achieve faster execution.

Pipeline processing is a common technique for achieving parallel processing. Pipelining decomposes a sequential process into sub-operations. Each sub-operation is executed in its own dedicated segment, and all segments operate concurrently. The result computed in each segment is passed to the next segment in the pipeline, and the final result is obtained after the data has passed through all segments.

Besides pipelining, parallel processing also includes techniques such as vector processing. Many computational problems are too large to be solved in reasonable time by conventional computers. These problems can often be formulated in terms of vectors and matrices, and the ability of a computer to operate efficiently on such vectors and matrices is known as vector processing capability. Computers with vector processing capabilities are required in specialized applications.

What is Pipeline

In computer architecture, pipelining is an implementation technique in which the execution of multiple instructions is overlapped. The pipeline is divided into stages, and each stage completes one part of an instruction; different stages work on different instructions at the same time. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end. The basic idea of a pipeline structure is shown in the image below.

To execute instructions in a pipeline, each instruction must be broken into separate tasks, and each task is executed by a different processing element of the CPU. In the simplest form, instruction execution has two distinct phases:

Instruction Fetch (IF)
Instruction Execution (EX)

The processor therefore executes a program by fetching and executing instructions one after another. Let IF_i and EX_i denote the instruction fetch and instruction execution phases of instruction i. Executing a program then consists of a sequence of fetch and execute steps, as shown in the figure below.

Figure: Sequential execution in a pipeline

Since the execution of an instruction has two phases (fetch and execute), the CPU needs two separate hardware units to implement this pipeline: one for instruction fetching and one for instruction execution, each working independently. The fetch unit fetches an instruction from memory and stores it in an intermediate storage buffer, from which the execution unit takes it. This hardware organization is shown in the figure below.

A pipeline executes different phases of different instructions in parallel, in different hardware units. For simplicity, assume that the fetch step and the execute step of any instruction can each be completed in one clock cycle. Program execution with three instructions under this assumption is shown in the figure.

Figure: Parallel execution in a pipeline

As shown in the above figure, the program execution starts with the instruction fetch phase for instruction 1. During clock 1, the fetched instruction is also stored in the intermediate buffer. Now the fetch unit is free to fetch the next instruction.

In clock 2, the execution unit executes the instruction stored in the buffer while the fetch unit fetches the next instruction in parallel. That next instruction is then stored in the intermediate buffer, and the process repeats. In this way, fetching and execution overlap, which is the parallel mechanism on which the pipeline structure is built.
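The fetch/execute overlap described above can be traced with a short Python sketch. It assumes, as the text does, that each fetch and each execute takes one clock cycle, and additionally that the intermediate buffer holds exactly one instruction; the function name `two_stage_trace` is illustrative only.

```python
# Minimal sketch of a two-stage (fetch/execute) pipeline.
# Assumptions: every fetch and every execute takes one clock cycle,
# and the intermediate buffer holds a single instruction.

def two_stage_trace(num_instructions):
    """Return one (clock, fetching, executing) tuple per clock cycle."""
    trace = []
    buffer = None          # instruction waiting between fetch and execute
    next_to_fetch = 1      # instructions are numbered 1, 2, 3, ...
    clock = 0
    while next_to_fetch <= num_instructions or buffer is not None:
        clock += 1
        # The execution unit works on whatever the fetch unit left in the buffer.
        executing = buffer
        # In the same cycle, the fetch unit fetches the next instruction (if any).
        if next_to_fetch <= num_instructions:
            fetching = next_to_fetch
            next_to_fetch += 1
        else:
            fetching = None
        # At the end of the cycle, the fetched instruction is placed in the buffer.
        buffer = fetching
        trace.append((clock, fetching, executing))
    return trace


for clock, f, e in two_stage_trace(3):
    fetch = f"I{f}" if f else "-"
    execute = f"I{e}" if e else "-"
    print(f"clock {clock}: fetch {fetch:<3} execute {execute}")
```

With three instructions, the fetch unit is busy in clocks 1 to 3 and the execution unit in clocks 2 to 4, so the program finishes in 4 clock cycles instead of the 6 needed when each instruction is fetched and executed strictly one after another.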

The processing of an instruction need not be divided into only two steps. To gain further speed-up, the pipeline needs more stages. Consider the following decomposition of instruction execution.

Instruction Fetch (IF)
Instruction Decode (ID)
Calculate Effective Address (EA)
Operand Fetch (OF)
Execution (EX)
Store Result (ST)

Now there are six different phases, and each can be performed by a separate hardware unit. However, the six phases do not all take the same time, so some stages will have to wait for others. The figure below shows pipeline execution with six phases.
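The timing of a multi-stage pipeline is often drawn as a space-time diagram. The Python sketch below builds one for an idealized pipeline in which every stage takes exactly one clock cycle, ignoring the unequal stage durations mentioned above. The stage labels IF, ID, EA, OF, EX, ST and the function name `space_time` are assumptions made for illustration.

```python
# Sketch: space-time diagram of an idealized k-stage pipeline.
# Assumption: every stage takes exactly one clock cycle and there are no stalls.

STAGES = ["IF", "ID", "EA", "OF", "EX", "ST"]   # illustrative stage labels


def space_time(num_instructions, stages=STAGES):
    """Map each clock cycle to the (instruction, stage) pairs active in it."""
    k = len(stages)
    total_cycles = k + num_instructions - 1
    table = {clock: [] for clock in range(1, total_cycles + 1)}
    for i in range(num_instructions):
        # Instruction i+1 enters the first stage at clock i+1
        # and moves one stage forward every clock.
        for s, name in enumerate(stages):
            table[i + 1 + s].append((i + 1, name))
    return table


diagram = space_time(4)
for clock, work in diagram.items():
    cells = ", ".join(f"I{instr}:{stage}" for instr, stage in work)
    print(f"clock {clock:2}: {cells}")
```

For n instructions and k stages the diagram spans k + n - 1 clock cycles: the first instruction needs k cycles, and each following instruction finishes one cycle later. With 4 instructions and 6 stages, that is 9 cycles.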

Limitations in Pipeline Implementation

Pipelining plays an important role in increasing processor speed. However, several factors can cause pipeline processing to deviate from its normal operation.

Resource Conflict: If the computer does not use separate instruction and data memories, two segments may try to access the same memory at the same time. Unless this is prevented, the accesses can interfere and produce erroneous results, so such conflicts must be avoided, typically by using separate instruction and data memories.
Data Dependency: This conflict arises when an instruction depends on the result of a previous instruction, but that result is not yet available.
Branch Difficulties: These arise when the program contains branch instructions. A branch changes the contents of the program counter and may therefore change the order in which instructions are executed, so instructions already fetched into the pipeline may not be the ones that should execute next.
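A data dependency can be detected by comparing the registers an instruction reads with the registers written by the instructions just ahead of it in the pipeline. The Python sketch below flags such read-after-write cases. The `Instr` record, the register names, and the `window` parameter (how many earlier instructions are assumed to still be in flight) are hypothetical; real hazard-detection hardware compares decoded register fields rather than text.

```python
# Sketch: detecting read-after-write data dependencies.
# Assumption: instructions are modelled as hypothetical Instr records.

from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical decoded instruction: assembly text, written register, read registers."""
    text: str
    dest: Optional[str]        # register written, or None if nothing is written
    srcs: Tuple[str, ...]      # registers read


def raw_dependencies(program, window):
    """Return (producer, consumer, register) triples in which an instruction reads
    a register written by one of the `window` instructions just before it, i.e.
    a result that may not yet be available when the reader needs it."""
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            if producer.dest is not None and producer.dest in consumer.srcs:
                found.append((producer.text, consumer.text, producer.dest))
    return found


program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),   # R1 <- R2 + R3
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),   # reads R1 right after ADD writes it
    Instr("AND R6, R7, R8", "R6", ("R7", "R8")),   # independent of both
]
print(raw_dependencies(program, window=2))
```

Here SUB reads R1 immediately after ADD writes it, so SUB has to wait until ADD's result is available, while AND can proceed without waiting.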

Pipeline Performance Measurement

Pipelining provides faster operation than serial processing, but the maximum theoretical speed is never fully achieved. One reason is that different segments may take different times to complete their sub-operations, so faster segments must wait for slower ones, which reduces the efficiency of the pipeline. Several parameters are used to describe pipeline performance.

Speed-up

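The speed-up of pipeline processing over equivalent non-pipeline processing is defined as the ratio

$$S = \frac{n\,t_n}{(k + n - 1)\,t_p}$$

where
n = number of tasks (instructions) executed,
k = number of segments in the pipeline,
t_p = clock cycle time of the pipeline, i.e. the time to complete each sub-operation,
t_n = time to complete each task in the non-pipeline process.

The numerator is the time a non-pipeline unit needs for n tasks. The denominator is the pipeline time: k clock cycles to complete the first task, plus one further cycle for each of the remaining n - 1 tasks.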

Case I: If the number of tasks is large, n becomes much larger than k - 1, so

$$k + n - 1 \approx n$$

Therefore

$$S = \frac{n\,t_n}{(k + n - 1)\,t_p} \approx \frac{n\,t_n}{n\,t_p} = \frac{t_n}{t_p}$$

Case II: Assume that the time taken to process one task is the same in the pipeline and non-pipeline processes. In the pipeline, a task passes through k segments taking t_p each, so

$$t_n = k\,t_p$$

Substituting this into the Case I result gives

$$S_{max} = \frac{k\,t_p}{t_p} = k$$

This shows that the maximum speed-up that a pipeline can achieve is equal to the number of segments (k).

Note: Under equal operating conditions, a k-segment pipeline can be expected to equal the performance of k copies of an equivalent non-pipeline circuit.
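The speed-up formula can be evaluated numerically to see how S approaches k. In this Python sketch, the values k = 4, t_p = 20 ns and t_n = k * t_p = 80 ns are hypothetical, chosen to satisfy the Case II assumption.

```python
# Sketch: pipeline speed-up S = n*t_n / ((k + n - 1) * t_p).
# The numbers below are hypothetical, chosen so that t_n = k * t_p (Case II).

def speedup(n, k, t_n, t_p):
    """Speed-up of a k-segment pipeline over a non-pipeline unit for n tasks."""
    return (n * t_n) / ((k + n - 1) * t_p)


k, t_p = 4, 20          # 4 segments, 20 ns pipeline clock
t_n = k * t_p           # 80 ns per task without pipelining
for n in (1, 10, 100, 10_000):
    print(f"n = {n:>6}: S = {speedup(n, k, t_n, t_p):.3f}")
```

S equals 1 for a single task (the pipeline gives no benefit) and grows toward k = 4 as n increases, matching Case I and Case II.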

Pipeline Efficiency

The efficiency of a pipeline architecture is defined as the ability of the pipeline to handle the parallel computation of data. To make pipelines work efficiently, three simple modifications are made to the instruction set architecture of the processor.

Uniform Instruction Length: All instructions have the same length in bytes. Every instruction is therefore fetched in the same way, and decoding is straightforward.
Simple Addressing Modes: Only simple addressing modes are allowed, so no single instruction requires a complicated memory address calculation.
Simple Load/Store Modes: Only simple load and store instructions are allowed; the processor has no complicated multi-cycle load or store instructions.

All these modifications are made for the sake of efficiency. They have no real effect on the kinds of programs that can be written in a high-level language; they only affect how the compiler translates a program into machine code.
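To illustrate uniform instruction length, here is a Python sketch of instruction fetch for a hypothetical ISA in which every instruction is exactly 4 bytes and the first byte is the opcode. Because the length is fixed, the address of the next instruction is always the current address plus 4, so the fetch unit can locate every instruction without decoding it first. The encoding and the `fetch_all` function are assumptions for illustration.

```python
# Sketch: instruction fetch with a uniform instruction length.
# Assumption: a hypothetical ISA where every instruction is 4 bytes
# and the first byte is the opcode.

INSTR_BYTES = 4


def fetch_all(memory: bytes, start: int = 0):
    """Split a code region into (pc, opcode, operand_bytes) tuples."""
    pc = start
    fetched = []
    while pc + INSTR_BYTES <= len(memory):
        word = memory[pc:pc + INSTR_BYTES]
        fetched.append((pc, word[0], word[1:]))
        pc += INSTR_BYTES      # the next PC is known without decoding anything
    return fetched


code = bytes([0x01, 1, 2, 3,   0x02, 4, 5, 6,   0x03, 7, 8, 9])
for pc, opcode, operands in fetch_all(code):
    print(pc, hex(opcode), list(operands))
```

With variable-length instructions, the fetch unit would have to decode at least part of each instruction before it knew where the next one began.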

Throughput

Throughput of a pipeline architecture is defined as the amount of processing that can be accomplished during a given interval of time. In other words, it is the number of instructions that can be completed per unit of time.

The throughput of a pipeline can be increased by increasing the number of segments, but this also increases the hardware cost and therefore the cost of the system. However, technological developments have reduced hardware costs to the point where pipeline architectures with higher throughput are economically feasible.

One might expect the throughput of a pipeline to always be 1 instruction/clock, because a pipelined processor completes a new instruction at the end of each clock cycle. In practice, however, no instructions complete while the pipeline is being filled, so the throughput is zero instructions/clock; only once the pipeline is full does the processor reach a throughput of 1 instruction/clock. Similarly, the maximum theoretical throughput generally differs from the average throughput actually achieved. Therefore, three types of throughput are distinguished for a pipelined processor.

Instruction Throughput: The number of instructions that the processor completes on each clock cycle, also known as instructions per clock (IPC).
Maximum Theoretical Instruction Throughput: The theoretical maximum number of instructions that the processor can complete on each clock cycle. For a simple pipelined processor, this is 1 instruction/clock (1 IPC).
Average Instruction Throughput: The average number of instructions per clock (IPC) that the processor has actually completed over a certain number of cycles.
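The gap between maximum and average instruction throughput can be seen by counting completions per clock in an ideal k-stage pipeline. The Python sketch below assumes that one instruction enters per clock, every stage takes one clock, and there are no stalls; the function names are illustrative.

```python
# Sketch: instruction throughput of an ideal k-stage pipeline, cycle by cycle.
# Assumptions: one instruction enters per clock, each stage takes one clock,
# and there are no stalls, so instruction i (1-based) completes at clock i + k - 1.

from fractions import Fraction


def completions_per_cycle(n, k):
    """Number of instructions that finish in each clock cycle."""
    total_cycles = k + n - 1
    done = [0] * (total_cycles + 1)        # index 0 unused
    for i in range(1, n + 1):
        done[i + k - 1] += 1
    return done[1:]


def average_ipc(n, k):
    """Average instructions completed per clock over the whole run."""
    per_cycle = completions_per_cycle(n, k)
    return Fraction(sum(per_cycle), len(per_cycle))


print(completions_per_cycle(6, 4))
print(average_ipc(6, 4), average_ipc(1000, 4))
```

For 6 instructions in a 4-stage pipeline, nothing completes in the first 3 clocks while the pipeline fills, then one instruction completes per clock, so the average throughput is 6/9 = 2/3 IPC. For 1000 instructions it rises to 1000/1003, close to the maximum of 1 IPC.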

Instruction throughput is the rate at which a processor completes instructions. For a given clock rate, a higher instruction throughput means more instructions completed per second, and hence better performance.

Bottleneck Problem

The bottleneck problem of pipeline architecture is related to the amount of load assigned to a stage in the pipeline. If a high load is applied to one stage, the time to complete an operation at that stage becomes unacceptably long. This relatively long time spent by the instruction at one stage will inevitably create a bottleneck in the pipeline.

The bottleneck is the slowest stage in the pipeline and the source of congestion, so it should be removed where possible. The first step in optimizing a pipeline is therefore to find the bottleneck. When analyzing a running program, a processor utilization of 100% suggests that the processor is the bottleneck, unless that utilization comes from busy-waiting.

One solution to the bottleneck problem is to further subdivide the stage or build multiple copies of this stage into the pipeline.

Note: Improving the performance of the bottleneck stage speeds up the pipeline. Improving the performance of any other stage does not, because the pipeline can advance only as fast as its slowest stage.
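The effect of a bottleneck on a synchronous pipeline can be sketched in Python: the clock period must be at least as long as the slowest stage, so that stage determines throughput once the pipeline is full. The stage delays below are hypothetical, and the sketch ignores the extra register (latch) delay that each added stage introduces in real hardware.

```python
# Sketch: the slowest stage sets the clock period of a synchronous pipeline.
# Stage delays are hypothetical values in nanoseconds; latch overhead is ignored.

def clock_period(stage_delays):
    """The pipeline can advance only as fast as its slowest stage."""
    return max(stage_delays)


def throughput_per_us(stage_delays):
    """Instructions completed per microsecond once the pipeline is full."""
    return 1000 / clock_period(stage_delays)


original = [10, 10, 30, 10]              # stage 3 is the bottleneck
faster_other = [5, 10, 30, 10]           # speed up a non-bottleneck stage
split_bottleneck = [10, 10, 15, 15, 10]  # subdivide the 30 ns stage into two

for name, delays in [("original", original),
                     ("faster non-bottleneck", faster_other),
                     ("split bottleneck", split_bottleneck)]:
    print(f"{name:<22} clock = {clock_period(delays)} ns, "
          f"{throughput_per_us(delays):.1f} instr/us")
```

Speeding up a non-bottleneck stage leaves the clock at 30 ns, while splitting the 30 ns stage into two 15 ns stages halves the clock period and doubles throughput.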

Related questions and answers

How can you increase the speed of conventional computers?

The speed of conventional computers can be increased by performing simultaneous data processing. This is known as parallel processing.

What is the basic definition of pipelining?

Pipelining is defined as a technique that decomposes a sequential process into sub-operations.

What is the idea of pipeline structure?

The computer pipeline is divided into stages. Each stage completes a part of an instruction in parallel. The stages are connected one to the next to form a pipe. Instructions are entered at one end, progress through the stages, and exit at the other end.

Define the speed-up ratio of pipeline processing over non-pipeline processing.

The speed-up of pipeline processing over equivalent non-pipeline processing is defined as the ratio
$$S = \frac{n\,t_n}{(k + n - 1)\,t_p}$$

Give the formula for maximum speedup

S_max = k
Where k is the number of segments in the pipeline structure.

What is pipeline efficiency?

The efficiency of a pipeline architecture is defined as the ability of the pipeline to handle the parallel computation of data.

What are three modifications required in pipeline architecture for better efficiency?

Three simple modifications are made to the instruction set architecture of the processor to improve efficiency:
(i) Uniform instruction length
(ii) Simple addressing modes
(iii) Simple load/store modes.

Define the throughput of a pipeline.

Throughput of a pipeline architecture is defined as the amount of processing that can be accomplished during a given interval of time.

What are the types of throughput in pipelining?

There are three types of throughput in pipelining.
(i) Instruction throughput
(ii) Maximum theoretical instruction throughput
(iii) Average instruction throughput.

What is the bottleneck problem in pipeline architecture?

If a high load is applied to one stage, the time to complete an operation at that stage becomes unacceptably long. This problem is known as the bottleneck problem. The bottleneck is the slowest stage of the pipeline.
