RISC Pipeline in Computer Architecture
YASH PAL, March 4, 2026

RISC architecture is well suited to instruction pipelining. Because the instruction set is simple, an instruction pipeline can be implemented with a small number of suboperations, each executed in one clock cycle. Because of the fixed-length instruction format, the operation can be decoded at the same time as the registers are selected. Since the arithmetic, logic, and shift operations are performed on registers, there is no need to fetch operands from memory or to calculate an effective address in order to perform them.

The instruction cycle can therefore be divided into three segments: one segment fetches the instruction from program memory, another executes the instruction in the ALU, and a third may be used to store the result of the ALU operation in a destination register.

The data transfer instructions in RISC are limited to Load and Store. To prevent conflicts between an instruction fetch and a data transfer, two separate buses with two memories are used, one for storing the instructions and the other for storing the data. The two memories can sometimes operate at the same speed as the CPU clock and are referred to as cache memories.

A major advantage of RISC pipelining is the ability to execute instructions at the rate of one per clock cycle. This is supported by the compiler that translates the high-level language program into a machine language program.

Example of three segment instruction pipeline:

Consider a program containing arithmetic, logic, or shift operations. The instruction cycle can be divided into the following three suboperations, each implemented by one pipeline segment:

I – Instruction fetch
A – ALU operation
E – Execute instruction

The I (IF) segment fetches the instruction from program memory.
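The instruction is decoded, and an ALU operation is performed in the A (ALU) segment. Depending on the decoded instruction, the ALU performs a data manipulation operation, evaluates the effective address for a load or store, or calculates the branch address. Finally, in the E (EI) segment, the instruction is completed: the output of the ALU is directed to a destination register, to data memory as the address for a load or store, or to the program counter for a branch.

Delayed Load:

Consider the following instructions:

LOAD: R1 ← M[address 1]
LOAD: R2 ← M[address 2]
ADD: R3 ← R1 + R2
STORE: M[address 3] ← R3

The tables below show the pipeline timing with a data conflict and without one. In the first table, the ADD instruction uses R2 in its A segment in clock cycle 4, the same cycle in which the second LOAD is still transferring the operand into R2 in its E segment, so the addition would not see the loaded value. In the second table, the compiler inserts a no-operation instruction after the second load, which delays the ADD by one clock cycle and removes the conflict. Delaying the use of data loaded from memory in this way is referred to as a delayed load.

| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1. Load R1 | I | A | E | | | |
| 2. Load R2 | | I | A | E | | |
| 3. Add R1 + R2 | | | I | A | E | |
| 4. Store R3 | | | | I | A | E |

Pipeline timing with data conflict

| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1. Load R1 | I | A | E | | | | |
| 2. Load R2 | | I | A | E | | | |
| 3. No-operation | | | I | A | E | | |
| 4. Add R1 + R2 | | | | I | A | E | |
| 5. Store R3 | | | | | I | A | E |

Pipeline timing with delayed load

Both tables show the timing of the three-segment pipeline.

Delayed Branch:

The method used in most RISC processors is to rely on the compiler to redefine the branches so that they take effect at the proper time in the pipeline. This method is referred to as a delayed branch. An example of a delayed branch is shown below:

| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1. Load | I | A | E | | | | | | | |
| 2. Increment | | I | A | E | | | | | | |
| 3. Add | | | I | A | E | | | | | |
| 4. Subtract | | | | I | A | E | | | | |
| 5. Branch to X | | | | | I | A | E | | | |
| 6. No-operation | | | | | | I | A | E | | |
| 7. No-operation | | | | | | | I | A | E | |
| 8. Instruction in X | | | | | | | | I | A | E |

Using no-operation instructions

| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1. Load | I | A | E | | | | | |
| 2. Increment | | I | A | E | | | | |
| 3. Branch to X | | | I | A | E | | | |
| 4. Add | | | | I | A | E | | |
| 5. Subtract | | | | | I | A | E | |
| 6. Instruction in X | | | | | | I | A | E |

Rearranging the instructions

The program for this example consists of five instructions:

Load from memory to R1
Increment R2
Add R3 to R4
Subtract R5 from R6
Branch to address X

In the “Using no-operation instructions” table, the compiler inserts two no-op instructions after the branch. The branch address X is transferred to the PC in clock cycle 7. The no-op instructions delay the fetching of the instruction at X by two clock cycles, so the instruction at X starts its fetch phase in clock cycle 8, after the program counter PC has been updated.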
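As a rough sketch rather than anything taken from the article's source, the two no-operation schedules (delayed load and delayed branch) can be reproduced in Python from one timing rule: instruction k enters segment I in clock cycle k and moves one segment per cycle. The tuple layout `(op, destination, sources)`, the helper names `schedule`, `insert_load_delays`, `insert_branch_delays`, and `render`, and the destination registers chosen for ADD and SUB in the branch program are all assumptions made for illustration.

```python
SEGMENTS = ("I", "A", "E")  # instruction fetch, ALU operation, execute
NOP = ("NOP", None, ())     # hypothetical (op, destination, sources) layout


def schedule(program):
    """Clock cycle of each segment per instruction, issuing one per cycle."""
    return [{seg: i + 1 + k for k, seg in enumerate(SEGMENTS)}
            for i in range(len(program))]


def insert_load_delays(program):
    """Insert a no-op after a load whose register the next instruction reads."""
    fixed = []
    for instr in program:
        prev = fixed[-1] if fixed else None
        if prev and prev[0] == "LOAD" and prev[1] in instr[2]:
            fixed.append(NOP)
        fixed.append(instr)
    return fixed


def insert_branch_delays(program, slots=2):
    """Follow each branch with `slots` no-ops so the next fetch waits for PC."""
    fixed = []
    for instr in program:
        fixed.append(instr)
        if instr[0] == "BRANCH":
            fixed.extend([NOP] * slots)
    return fixed


def render(program):
    """Text version of the timing table, one row per instruction."""
    timing = schedule(program)
    last = timing[-1]["E"]
    rows = []
    for (op, _, _), cycles in zip(program, timing):
        at = {cycle: seg for seg, cycle in cycles.items()}
        cells = "".join(f"{at.get(c, '.'):>2}" for c in range(1, last + 1))
        rows.append(f"{op:<8}{cells}")
    return "\n".join(rows)


# Delayed load: the ADD reads R2 right after the LOAD that writes it.
load_program = [
    ("LOAD", "R1", ()),
    ("LOAD", "R2", ()),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", None, ("R3",)),
]
delayed_load = insert_load_delays(load_program)

# Delayed branch with no-ops; ADD/SUB destinations are assumed here.
branch_program = [
    ("LOAD", "R1", ()),
    ("INC", "R2", ("R2",)),
    ("ADD", "R4", ("R3", "R4")),
    ("SUB", "R6", ("R5", "R6")),
    ("BRANCH", None, ()),
]
delayed_branch = insert_branch_delays(branch_program) + [("X", None, ())]
branch_timing = schedule(delayed_branch)
branch_e = branch_timing[4]["E"]   # cycle in which PC receives X
target_i = branch_timing[-1]["I"]  # cycle in which the instruction at X is fetched

print(render(delayed_load))
print(render(delayed_branch))
print(branch_e, target_i)
```

Under these assumptions the delayed-load program grows to five instructions finishing in clock cycle 7, and the branch executes in clock cycle 7 with the instruction at X fetched in clock cycle 8, matching the tables.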
In the “Rearranging the instructions” table, the program is rearranged by placing the add and subtract instructions after the branch instruction instead of before it, as in the original program. Inspection of the pipeline timing shows that PC is updated to the value of X in clock cycle 5, but the add and subtract instructions are still fetched from memory and executed in the proper sequence. The instruction at X is then fetched in clock cycle 6, so no clock cycles are spent on no-operation instructions.
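The rearrangement can be sketched in Python as well. This is a minimal illustration, not the compiler's actual algorithm: the helper names `schedule` and `fill_delay_slots` are hypothetical, and the code simply assumes that the branch is unconditional and does not depend on the two instructions moved behind it, which holds for this example program.

```python
SEGMENTS = ("I", "A", "E")  # instruction fetch, ALU operation, execute


def schedule(program):
    """Clock cycle of each segment per instruction, issuing one per cycle."""
    return [{seg: i + 1 + k for k, seg in enumerate(SEGMENTS)}
            for i in range(len(program))]


def fill_delay_slots(program, branch, slots=2):
    """Move the `slots` instructions just before `branch` to just after it.

    Assumes the branch does not depend on the moved instructions.
    """
    b = program.index(branch)
    if b < slots:
        raise ValueError("not enough instructions before the branch")
    return program[:b - slots] + [branch] + program[b - slots:b] + program[b + 1:]


original = ["Load", "Increment", "Add", "Subtract", "Branch to X"]
rearranged = fill_delay_slots(original, "Branch to X")

# Append the branch target to see when it is fetched.
timeline = rearranged + ["Instruction in X"]
timing = schedule(timeline)
branch_e = timing[timeline.index("Branch to X")]["E"]
target_i = timing[timeline.index("Instruction in X")]["I"]
total_cycles = timing[-1]["E"]

print(rearranged)
print(branch_e, target_i, total_cycles)
```

With the add and subtract instructions filling the two slots after the branch, the sketch gives a PC update in clock cycle 5, a fetch of the instruction at X in clock cycle 6, and eight clock cycles in total, compared with ten when no-operation instructions are used.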