What is SIMD

Programs or sections of a program have different characteristics. Some of them highlight on logic controls while the others highlight on numerical calculations. Here are the typical examples:

  • The Euclid's gcd algorithm is mainly about logic controls, including conditional jumps and recursive function calls.

  • The vector product algorithm repeats simple multiplications and summations on groups of numbers. It is implemented by a for-loop.

# gcd
def gcd(a:int, b:int):
    if (a == 0) 
        return b
    return gcd(b % a, a)

# inner product
def product(a:List[int], b:List[int], n:int):
    res = 0
    for i in range(n):
        res += a[i] * b[i]
    return res

For a long time, CPUs are designed for general computations where logic controls are the essential part. But applications such as image processing boost the need for faster computations over vectors and matrices, where GPUs are well-known killers. In fact, CPUs also develop and evolve similar capabilities called Single Instruction Multiple Data (SIMD), although not so powerful as specialized hardware like GPUs or FPGAs.

Traditional CPU instructions operate on a single group of data(often one or two operands). Rather, SIMD extended instructions perform the same operation on multiple independent groups of data contained in larger registers, once at a time, to achieve peak of performance. GPUs also benifit from similar ideas, but with much greater number of parallel data blocks [1].

simd

Intuitively, SIMD can be regarded as a kind of data level parallelism provided by microprocessors. If n pieces of data can be processed at once, the time will be reduced to 1/n that of non-SIMD instructions theoretically. But accurate numbers should come from hardware specifications like CPI which will be disscused later, or empirical tests.

References

[1] SIMD Programming, UCSB CS 240A, Applied Parallel Computing, Fall Quarter, 2017arrow-up-right [cache]arrow-up-right

Last updated