Vector Processing in Computer Architecture

Many scientific problems cannot be solved by conventional computers within a reasonable amount of time. Achieving the required level of performance calls for the fastest and most reliable hardware, combined with innovative processing techniques.

Vector Processing

In computer architecture, vector processing is a technique for giving a computer system high computational capability. Computers with vector processing capabilities are needed in specialized applications such as medical diagnosis, aerodynamics, artificial intelligence, and image processing.

A vector computer, or vector processor, is a machine designed to efficiently handle arithmetic operations on the elements of arrays, called vectors. Such machines are especially useful in high-performance scientific computing, where matrix and vector arithmetic are common. The Cray Y-MP and the Convex C3880 are two classic examples of vector processors.

Vector Arithmetic

A vector V of length n is represented as a row vector.

V = [V₁, V₂, V₃, ..., Vₙ]

It may be represented as a column vector if the data are listed in a column.

A conventional sequential computer processes operands one at a time. A vector processor, in contrast, applies a single operation to all the elements of a vector, which a program refers to through subscripted variables. In a computer program, a vector is declared as a one-dimensional array. In Fortran, vector V is declared by the statement

DIMENSION V(N)

where N is an integer variable holding the value of the length of the vector. Arithmetic operations can also be performed on vectors. Two vectors are added by adding the corresponding elements.

S = X + Y = (X₁+Y₁, X₂+Y₂, ..., X_N+Y_N)

In Fortran, vector addition could be performed by the following code.

DO I = 1, N
   S(I) = X(I) + Y(I)
END DO

Here S is the vector holding the sum of vector X and vector Y, and S, X, and Y have all been declared as arrays of dimension N. This operation is sometimes called element-wise addition. Subtraction of two vectors is likewise an element-wise operation.
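As a minimal sketch, the same element-wise addition can be written in Python. The function name vector_add and the sample values are illustrative assumptions, not part of the Fortran example; the loop mirrors what a sequential machine does, one pair of elements per step.

```python
def vector_add(x, y):
    """Return the element-wise sum of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # One addition per pair of corresponding elements, as in the DO loop.
    return [xi + yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
s = vector_add(x, y)
print(s)  # [11.0, 22.0, 33.0]
```

A vector processor performs the same N additions, but expresses them as a single vector operation rather than N separate scalar instructions.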

Matrix multiplication is one of the most computationally intensive operations performed in computers. The multiplication of two n x n matrices consists of n² inner products, or n³ multiply-add operations. An n x m matrix of numbers has n rows and m columns and may be regarded as a set of n row vectors or a set of m column vectors. For example, the multiplication of two 3 x 3 matrices A and B is written as

[a₁₁ a₁₂ a₁₃]   [b₁₁ b₁₂ b₁₃]   [c₁₁ c₁₂ c₁₃]
[a₂₁ a₂₂ a₂₃] x [b₂₁ b₂₂ b₂₃] = [c₂₁ c₂₂ c₂₃]
[a₃₁ a₃₂ a₃₃]   [b₃₁ b₃₂ b₃₃]   [c₃₁ c₃₂ c₃₃]

The product matrix C is a 3 x 3 matrix whose elements are related to the elements of A and B by the inner product.

c_ij = Σ a_ik x b_kj, summed over k = 1 to 3

For example, the number in the first row and first column of matrix C is calculated by letting i = 1 and j = 1, which gives

c₁₁ = a₁₁ b₁₁ + a₁₂ b₂₁ + a₁₃ b₃₁

Computing this element requires three multiplications and, if the running sum starts at 0, three additions. The figure shows a pipeline for calculating an inner product.

Figure: Pipeline for calculating an inner product
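The inner-product view of matrix multiplication can be sketched in Python as follows. The function names inner_product and mat_mul and the sample matrices are assumptions made for illustration; the code runs sequentially and only mirrors the arithmetic that a hardware pipeline would overlap.

```python
def inner_product(row, col):
    """Multiply-accumulate over two equal-length sequences."""
    total = 0  # running sum starts at 0
    for a, b in zip(row, col):
        total += a * b  # one multiply-add step
    return total


def mat_mul(a, b):
    """Multiply matrices stored as lists of rows, one inner product per element."""
    n_inner = len(b)
    if any(len(row) != n_inner for row in a):
        raise ValueError("columns of A must match rows of B")
    n_cols = len(b[0])
    # Column j of B, gathered once so each inner product sees a plain vector.
    columns = [[b[k][j] for k in range(n_inner)] for j in range(n_cols)]
    return [[inner_product(row, col) for col in columns] for row in a]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
c = mat_mul(a, b)
print(c[0][0])  # 30, i.e. 1*9 + 2*6 + 3*3
```

For these 3 x 3 matrices the code evaluates 9 inner products of 3 multiply-add steps each, 27 multiply-adds in total, which matches the n² and n³ counts given above.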

Memory Interleaving

To allow faster access to vector elements stored in memory, the memory of a vector processor is often divided into memory banks. This arrangement is called interleaved memory. Interleaved memory banks associate successive memory addresses with successive banks cyclically: word 0 is stored in bank 0, word 1 in bank 1, and so on up to word (n-1) in bank (n-1); word n is then stored in bank 0 again, word (n+1) in bank 1, etc., where n is the number of memory banks. As with many other architectural features, n is usually a power of 2:

n = 2^k, where k = 1, 2, 3, ...
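The cyclic mapping of words to banks can be sketched in Python. The helper names bank_of and offset_in_bank, and the choice of 4 banks, are assumptions for illustration; the offset within a bank is not discussed above and is included only to show where a word would sit inside its bank.

```python
def bank_of(address, num_banks):
    """Bank that holds a word when successive addresses rotate through the banks."""
    return address % num_banks


def offset_in_bank(address, num_banks):
    """Position of the word inside its bank (assumed layout, for illustration)."""
    return address // num_banks


NUM_BANKS = 4  # n = 2^k with k = 2

for address in range(8):
    print(address, bank_of(address, NUM_BANKS), offset_in_bank(address, NUM_BANKS))

# When n is a power of 2, the bank number is simply the low-order k address bits.
low_bits_match = all(bank_of(a, NUM_BANKS) == (a & (NUM_BANKS - 1)) for a in range(64))
```

Choosing n as a power of 2 means the bank can be selected by wiring the low-order address bits directly, with no division needed.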

One memory access (load or store) of a data value in a memory bank takes several clock cycles to complete. Each memory bank allows only one data value to be read or stored in a single memory access, but more than one memory bank may be accessed at the same time.

When the elements of a vector stored in an interleaved memory are read into a vector register, the reads are staggered across the memory banks so that one vector element is read from a bank per clock cycle. If one memory access takes n clock cycles, then, once the staggered accesses are under way, n elements of a vector can be fetched in the time of a single memory access; this is up to n times faster than making the same number of accesses to a single bank.
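A rough timing model makes the benefit of staggering concrete. This is a sketch under stated assumptions, not a description of any particular machine: at most one request is issued per clock cycle, a bank stays busy for the full access time, and the vector is stored at consecutive addresses. The function name fetch_cycles and the parameter values are hypothetical.

```python
def fetch_cycles(num_elements, num_banks, access_cycles):
    """Cycles needed to read consecutive words from an interleaved memory."""
    bank_free_at = [0] * num_banks  # cycle at which each bank can accept a request
    next_issue = 0                  # earliest cycle for the next request
    finish = 0
    for address in range(num_elements):
        bank = address % num_banks
        start = max(next_issue, bank_free_at[bank])  # wait if the bank is busy
        bank_free_at[bank] = start + access_cycles
        finish = start + access_cycles
        next_issue = start + 1      # one request per clock cycle at most
    return finish


interleaved = fetch_cycles(64, num_banks=8, access_cycles=8)
single_bank = fetch_cycles(64, num_banks=1, access_cycles=8)
print(interleaved, single_bank)  # 71 512
```

With 8 banks and an 8-cycle access, the interleaved memory delivers one element per cycle after the first access completes, so the speedup approaches the factor of n as the vector gets longer.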

Supercomputers

A commercial computer with vector instructions and pipelined floating-point arithmetic operations is referred to as a supercomputer. Supercomputers are very powerful, high-performance machines used mostly for scientific computations.

To speed up operation, the components are packed tightly together to minimize the distance that electronic signals have to travel. Because the circuits are so close together, supercomputers also use special techniques to remove heat and keep the circuits from burning up.

Some application areas of supercomputers are:

Radar and signal processing for the detection of space and underwater targets.
Remote sensing for earth resource exploration.
Computational wind tunnel experiments.
3D stop-action computer-assisted tomography.
Weather forecasting.
Medical diagnosis.

Vector Instruction Fields

Vector instructions are usually specified by the following fields:

Operation code | Base address 1 | Base address 2 | Base address destination | Vector length

Instruction format for vector processor

Opcode (operation code): This field selects the functional unit, or reconfigures a multifunctional unit, to perform the specified operation.
Base addresses: For a memory-reference instruction, these fields specify the base addresses of the source operand vectors and the result vector. If the operands and results are located in the vector register file, the designated vector registers must be specified instead.
Address increment: This field specifies the spacing between two successive elements in main memory. Usually the elements are stored consecutively, so the increment is 1. Allowing a variable increment gives applications more flexibility.
Address offset: This field specifies the offset relative to the base address. The effective memory address is calculated from the base address and the offset, which can be either positive or negative.
Vector length: This field determines when a vector instruction terminates. Vector length affects processing efficiency because long vectors must be subdivided into shorter segments.
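How these fields drive execution can be sketched with a small Python model. Everything here is a hypothetical illustration: the class VectorInstruction, the mnemonics ADD and MUL, the flat list standing in for main memory, and the decision to apply one shared increment to all three vectors are assumptions, not the format of any real machine.

```python
from dataclasses import dataclass


@dataclass
class VectorInstruction:
    opcode: str         # hypothetical mnemonic, "ADD" or "MUL"
    base1: int          # base address of the first source vector
    base2: int          # base address of the second source vector
    dest: int           # base address of the result vector
    length: int         # number of elements to process
    increment: int = 1  # spacing between successive elements


OPERATIONS = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}


def execute(instr, memory):
    """Apply the selected operation element by element, in place."""
    operation = OPERATIONS[instr.opcode]  # the opcode selects the functional unit
    for i in range(instr.length):         # the vector length ends the instruction
        step = i * instr.increment        # the increment gives the element spacing
        memory[instr.dest + step] = operation(
            memory[instr.base1 + step], memory[instr.base2 + step]
        )


memory = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
execute(VectorInstruction("ADD", base1=0, base2=4, dest=8, length=4), memory)
print(memory[8:12])  # [11, 22, 33, 44]
```

Setting increment to 2 would make the same instruction touch every other word, which is the kind of flexibility the address increment field provides.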
