from @says who — 11/01/2023 7:35 PM
Hey everyone, super new to this community but love what you are doing.
Tested a few things today, and it seems that if I run the matrix multiplication example with matrices 2 times and 4 times bigger than in the example, I get the following results:
for:
alias M = 1024
alias N = 1024
alias K = 8192
I have:
Python: 0.006 GFLOPS
Numpy: 103.860 GFLOPS
Naive: 6.159 GFLOPS 1098.38x Python 0.06x Numpy
Vectorized: 21.922 GFLOPS 3909.32x Python 0.21x Numpy
Parallelized: 38.152 GFLOPS 6803.62x Python 0.37x Numpy
Tiled: 45.235 GFLOPS 8066.78x Python 0.44x Numpy
Unrolled: 51.537 GFLOPS 9190.62x Python 0.50x Numpy
Accumulated: 114.204 GFLOPS 20366.04x Python 1.10x Numpy
for:
alias M = 2048
alias N = 2048
alias K = 16384
I have:
Python: 0.006 GFLOPS
Numpy: 137.653 GFLOPS
Naive: 6.162 GFLOPS 1097.77x Python 0.04x Numpy
Vectorized: 22.440 GFLOPS 3997.86x Python 0.16x Numpy
Parallelized: 46.422 GFLOPS 8270.59x Python 0.34x Numpy
Tiled: 41.251 GFLOPS 7349.28x Python 0.30x Numpy
Unrolled: 46.231 GFLOPS 8236.54x Python 0.34x Numpy
Accumulated: 86.952 GFLOPS 15491.36x Python 0.63x Numpy
Relative to Numpy, the Accumulated version drops from 1.10x to 0.63x at the larger size. Is this scaling issue a bug or a feature, or am I missing something? Thank you!
macOS, M1 Pro chip
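
For anyone who wants to double-check the ratios, here is a minimal Python sketch that recomputes the "x Numpy" column from the GFLOPS figures pasted above and compares the amount of work in the two runs. The names `small`, `large`, `relative_to_numpy` and `matmul_flops` are made up for illustration, and the FLOP count assumes the conventional 2 * M * N * K for a matmul (one multiply and one add per inner-loop step), which may not be exactly what the example uses.

```python
# GFLOPS figures copied from the two runs above.
small = {  # M = 1024, N = 1024, K = 8192
    "Numpy": 103.860, "Naive": 6.159, "Vectorized": 21.922,
    "Parallelized": 38.152, "Tiled": 45.235, "Unrolled": 51.537,
    "Accumulated": 114.204,
}
large = {  # M = 2048, N = 2048, K = 16384
    "Numpy": 137.653, "Naive": 6.162, "Vectorized": 22.440,
    "Parallelized": 46.422, "Tiled": 41.251, "Unrolled": 46.231,
    "Accumulated": 86.952,
}


def relative_to_numpy(results):
    """Return each variant's throughput as a fraction of the Numpy figure."""
    base = results["Numpy"]
    return {name: round(gflops / base, 2)
            for name, gflops in results.items() if name != "Numpy"}


def matmul_flops(m, n, k):
    """Assumed FLOP count: one multiply and one add per (m, n, k) step."""
    return 2 * m * n * k


if __name__ == "__main__":
    for label, run in (("small", small), ("large", large)):
        print(label, relative_to_numpy(run))
    # How much more arithmetic the larger run does than the smaller one.
    print(matmul_flops(2048, 2048, 16384) // matmul_flops(1024, 1024, 8192))
```

Under that assumption the larger run does 8 times the arithmetic of the smaller one, so a drop in GFLOPS means the runtime grew by more than 8x for that variant.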