MM Revisited

If A is an M*K matrix and B is a K*N matrix, the matrix product C=A*B is an M*N matrix. Every element of C is the dot product of a row of A and a column of B. In effect, the operation collapses the shared dimension K (the column dimension of A and the row dimension of B) into a scalar, while the row dimension of A and the column dimension of B are kept as the row and column dimensions of C, respectively.
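
Written out as a formula, the definition above reads as follows. The zero-based indices i, j, and k are a notational choice made here to match the loop bounds used later on this page.

```latex
C_{ij} = \sum_{k=0}^{K-1} A_{ik}\, B_{kj}, \qquad 0 \le i < M,\quad 0 \le j < N
```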

Element-wise Approach

Implementations of the matrix product inevitably contain deeply nested loops. It is easy to get lost in that complexity and lose sight of the underlying principle, so a mental picture helps. One way to build this intuition: for each element C[i][j] of the result C, trace back and find all the elements of A and B that contribute to it, namely row i of A and column j of B.

for a.row in [0, M):     # every row in A
  for b.col in [0, N):   # every col in B
    for k in [0, K):     # reduce to accumulate C[a.row][b.col]
      C[a.row][b.col] += A[a.row][k] * B[k][b.col]
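
The loop nest above can be tried out directly. The following is a minimal Python sketch, assuming matrices are stored as lists of rows; the function name `matmul_elementwise` and the sample values are made up for illustration.

```python
def matmul_elementwise(A, B):
    """Multiply A (M x K) by B (K x N) one output element at a time."""
    M, K, N = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must match"
    C = [[0] * N for _ in range(M)]      # C must start at zero
    for i in range(M):                   # every row of A
        for j in range(N):               # every column of B
            for k in range(K):           # reduction: row i of A dot column j of B
                C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2
C = matmul_elementwise(A, B)
print(C)               # [[58, 64], [139, 154]]
```

Note that C has to be zero-initialized before the accumulation starts, a detail the pseudocode leaves implicit.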

Submatrices Approach

A more general way to compute the matrix product is by submatrices (blocks), which produces a whole submatrix of C at a time instead of a single element. The principle is exactly the same. Treat each element in the element-wise view above as a submatrix: computing submatrix C[i][j] involves the submatrices in the i-th block row of A and the submatrices in the j-th block column of B. The vector dot product, which sums the products of pairs of scalars, becomes an element-wise sum of matrix products, each of which multiplies one submatrix from A by one submatrix from B.

For example, if A is divided into 2*3 submatrices and B into 3*4 submatrices, then C is composed of 2*4 submatrices. Each submatrix of C is the element-wise sum of the products of a 2*3 submatrix from A and a 3*4 submatrix from B. Implemented in code, this takes six nested loops (assuming M, K, and N are multiples of 2, 3, and 4, respectively).

for a_r.outer in [0, M/2):        # divide row of A into blocks of 2
  for b_c.outer in [0, N/4):      # divide col of B into blocks of 4
    for k.outer in [0, K/3):      # divide col of A, row of B into blocks of 3
      for a_r.inner in [0, 2):    # every submatrix of A * submatrix of B
        for b_c.inner in [0, 4):    
          for k.inner in [0, 3):
            C[a_r.outer*2+a_r.inner][b_c.outer*4+b_c.inner] += \
              A[a_r.outer*2+a_r.inner][k.outer*3+k.inner] * \
              B[k.outer*3+k.inner][b_c.outer*4+b_c.inner]
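
A runnable version of the six-loop nest might look like the sketch below. It assumes list-of-rows storage and dimensions that divide evenly by the block sizes; the function name `matmul_blocked` and the parameters `bm`, `bk`, `bn` are hypothetical names chosen for this example.

```python
def matmul_blocked(A, B, bm=2, bk=3, bn=4):
    """Blocked product; assumes M, K, N are exact multiples of bm, bk, bn."""
    M, K, N = len(A), len(B), len(B[0])
    assert M % bm == 0 and K % bk == 0 and N % bn == 0
    C = [[0] * N for _ in range(M)]
    for io in range(M // bm):            # block row of A and C
        for jo in range(N // bn):        # block column of B and C
            for ko in range(K // bk):    # reduction over pairs of blocks
                # Multiply one bm x bk block of A by one bk x bn block of B
                # and accumulate the result into one bm x bn block of C.
                for ii in range(bm):
                    for ji in range(bn):
                        for ki in range(bk):
                            i = io * bm + ii
                            j = jo * bn + ji
                            k = ko * bk + ki
                            C[i][j] += A[i][k] * B[k][j]
    return C


# A is 4 x 6 and B is 6 x 8, so C is a 2 x 2 grid of 2 x 4 blocks.
A = [[r * 6 + c for c in range(6)] for r in range(4)]
B = [[r - c for c in range(8)] for r in range(6)]
C = matmul_blocked(A, B)

# Check against the plain definition of the product.
expected = [[sum(A[i][k] * B[k][j] for k in range(6)) for j in range(8)]
            for i in range(4)]
assert C == expected
```

The three inner loops are just the element-wise algorithm applied to one pair of blocks, which is why the result matches the unblocked product exactly.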

Compute C Column-by-column

The i-th column of C is a linear combination of the column vectors of A, with the weights taken from the i-th column of B.

for b.col in [0, N):
  for a.col in [0, K):
    C[:, b.col] += B[a.col][b.col] * A[:, a.col]
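
The column view can be sketched in plain Python as well. Since lists of rows have no column slices, the vector update is spelled out as an inner loop; `matmul_by_columns` is a hypothetical name used only here.

```python
def matmul_by_columns(A, B):
    """Build each column of C as a weighted sum of the columns of A."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for j in range(N):              # column j of C
        for k in range(K):          # column k of A ...
            weight = B[k][j]        # ... weighted by the k-th entry of column j of B
            for i in range(M):      # vector update: column j of C += weight * column k of A
                C[i][j] += weight * A[i][k]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul_by_columns(A, B))      # [[58, 64], [139, 154]]
```

For instance, the first column of the result, (58, 139), is 7*(1, 4) + 9*(2, 5) + 11*(3, 6).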

Compute C Row-by-row

The i-th row of C is a linear combination of the row vectors of B, with the weights taken from the i-th row of A.

for a.row in [0, M):
  for b.row in [0, K):
    C[a.row, :] += A[a.row][b.row] * B[b.row, :]
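
The row view admits the same kind of sketch, again assuming list-of-rows storage; `matmul_by_rows` is a hypothetical name for illustration.

```python
def matmul_by_rows(A, B):
    """Build each row of C as a weighted sum of the rows of B."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i in range(M):              # row i of C
        for k in range(K):          # row k of B ...
            weight = A[i][k]        # ... weighted by the k-th entry of row i of A
            for j in range(N):      # vector update: row i of C += weight * row k of B
                C[i][j] += weight * B[k][j]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul_by_rows(A, B))         # [[58, 64], [139, 154]]
```

For instance, the first row of the result, (58, 64), is 1*(7, 8) + 2*(9, 10) + 3*(11, 12). With row-major storage, the innermost loop walks along contiguous rows of both B and C.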


