MM Revisited
If A is an M*K matrix and B is a K*N matrix, the matrix product (multiplication) C=A*B is an M*N matrix. Every element of C is the dot product of a row of A and a column of B. Essentially, the operation collapses A along its column dimension and B along its row dimension (the shared dimension K) into a scalar. The row dimension of A and the column dimension of B are kept as the row and column dimensions of C, respectively.
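Written as a formula, the definition reads as follows. The 1-based indices i, j, k are a notational choice made here, not something fixed by the text:

```latex
C_{ij} = \sum_{k=1}^{K} A_{ik}\, B_{kj},
\qquad 1 \le i \le M,\quad 1 \le j \le N
```

The sum over k is the "collapse" along the shared dimension; i and j survive as the row and column indices of C.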
Element-wise Approach
When implementing a matrix product, deeply nested loops are almost unavoidable. It is easy to get lost in the complexity and lose sight of the underlying principle. A mental picture helps to understand what is going on. To build that intuition, think of matrix multiplication this way: for each element C[i][j] of the result C, trace back and find all the elements of A and B that contribute to it, namely the i-th row of A and the j-th column of B.
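A minimal sketch of this element-wise view in plain Python, assuming matrices are stored as lists of rows. The function name `matmul_elementwise` is illustrative only:

```python
def matmul_elementwise(A, B):
    """Compute C = A * B one element at a time (A is M*K, B is K*N)."""
    M, K, N = len(A), len(B), len(B[0])
    if any(len(row) != K for row in A):
        raise ValueError("columns of A must match rows of B")
    C = [[0] * N for _ in range(M)]
    for i in range(M):          # row of C (and of A)
        for j in range(N):      # column of C (and of B)
            for k in range(K):  # shared dimension that gets collapsed
                C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul_elementwise(A, B))  # [[58, 64], [139, 154]]
```

The two outer loops pick the element C[i][j]; the innermost loop is the dot product of row i of A with column j of B.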

Submatrices Approach
A more general way to compute the matrix product is by submatrices, which produces a submatrix (block) of C at a time instead of a single element. The principle is exactly the same. Treat each element in the picture above as a submatrix: computing submatrix C[i][j] involves the submatrices in the i-th row of A and the submatrices in the j-th column of B. The vector dot product, which sums the products of pairs of scalars, becomes an element-wise accumulation over the matrix products (themselves matrices) of pairs of submatrices, one from A and the other from B.

As an example, if A is divided into 2*3 submatrices and B into 3*4 submatrices, then C is composed of 2*4 submatrices, each of which is the element-wise sum of the products of a 2*3 submatrix from A and a 3*4 submatrix from B. Implemented in code, this takes six nested loops: three over the submatrices and three inside each submatrix product.
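The six loops can be sketched as below. This is an illustration under simplifying assumptions, not a tuned implementation: matrices are lists of rows, the dimensions are exact multiples of the block sizes, and the names `matmul_blocked`, `bm`, `bk`, `bn` are made up for this example:

```python
def matmul_blocked(A, B, bm=2, bk=3, bn=4):
    """Compute C = A * B block by block.

    A is split into bm*bk blocks, B into bk*bn blocks, C into bm*bn blocks.
    Assumes M, K, N are exact multiples of bm, bk, bn respectively.
    """
    M, K, N = len(A), len(B), len(B[0])
    if any(len(row) != K for row in A):
        raise ValueError("columns of A must match rows of B")
    if M % bm or K % bk or N % bn:
        raise ValueError("dimensions must be multiples of the block sizes")
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, bm):              # block row of C
        for j0 in range(0, N, bn):          # block column of C
            for k0 in range(0, K, bk):      # accumulate over block products
                # Product of one bm*bk block of A and one bk*bn block of B,
                # added element-wise into the bm*bn block of C.
                for i in range(i0, i0 + bm):
                    for j in range(j0, j0 + bn):
                        for k in range(k0, k0 + bk):
                            C[i][j] += A[i][k] * B[k][j]
    return C


# A is 4*6 (2x2 grid of 2*3 blocks), B is 6*8 (2x2 grid of 3*4 blocks).
A = [[i + k for k in range(6)] for i in range(4)]
B = [[k - j for j in range(8)] for k in range(6)]
C = matmul_blocked(A, B)
print(len(C), len(C[0]))  # 4 8
```

The outer three loops mirror the element-wise version with blocks in place of scalars; the inner three carry out one block product and accumulate it.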
Compute C Column-by-column
The i-th column of C is a linear combination of the column vectors of A, with the weights taken from the i-th column of B.
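A small sketch of the column view, again assuming lists of rows. The name `matmul_by_columns` is illustrative, and the loop variable `j` plays the role of the column index called i in the text:

```python
def matmul_by_columns(A, B):
    """Build C = A * B one column at a time (A is M*K, B is K*N)."""
    M, K, N = len(A), len(B), len(B[0])
    if any(len(row) != K for row in A):
        raise ValueError("columns of A must match rows of B")
    C = [[0] * N for _ in range(M)]
    for j in range(N):              # column of C being built
        for k in range(K):          # column of A being mixed in
            weight = B[k][j]        # weight comes from column j of B
            for i in range(M):
                C[i][j] += weight * A[i][k]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul_by_columns(A, B))  # [[58, 64], [139, 154]]
```

For instance, the first column of the result, (58, 139), is 7 times the first column of A plus 9 times the second plus 11 times the third.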
Compute C Row-by-row
The i-th row of C is a linear combination of the row vectors of B, with the weights taken from the i-th row of A.
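The row view can be sketched the same way, under the same list-of-rows assumption. The name `matmul_by_rows` is illustrative:

```python
def matmul_by_rows(A, B):
    """Build C = A * B one row at a time (A is M*K, B is K*N)."""
    M, K, N = len(A), len(B), len(B[0])
    if any(len(row) != K for row in A):
        raise ValueError("columns of A must match rows of B")
    C = [[0] * N for _ in range(M)]
    for i in range(M):              # row of C being built
        for k in range(K):          # row of B being mixed in
            weight = A[i][k]        # weight comes from row i of A
            for j in range(N):
                C[i][j] += weight * B[k][j]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul_by_rows(A, B))  # [[58, 64], [139, 154]]
```

Here the first row of the result, (58, 64), is 1 times the first row of B plus 2 times the second plus 3 times the third. Compared with the element-wise version, only the loop order changes; the multiply-accumulate statement is the same.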