Adding Two Vectors

Task: add two std::vector<float>s element-wise. Since the result of each iteration is independent of the previous ones (causal independence), the elements can be computed in parallel, SIMD-style.
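As a point of reference, here is a minimal sketch of what the CPU side of this task could look like. The function name add_vectors is my own for illustration and is not taken from the benchmark code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise sum of two equally sized vectors (CPU baseline sketch).
// add_vectors is a hypothetical name, not the benchmark's actual function.
std::vector<float> add_vectors(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // out[i] depends only on a[i] and b[i], never on out[i - 1],
        // so the iterations can run in any order or all at once.
        out[i] = a[i] + b[i];
    }
    return out;
}
```

Because no iteration reads what another iteration wrote, the same loop body maps directly onto one GPU thread per index.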


Results

My M1 Mac (with 8 GB of memory) started to tap out as I approached 2e8 elements, and the results were pretty inconsistent.

Benchmark             Time (ns)     CPU (ns)      Iterations
BM_CPU/100000000      1.2876e+10    1.2803e+10    1
BM_CPU/200000000      2.5761e+10    2.5589e+10    1
BM_Metal/100000000    1.3022e+10    1.2469e+10    1
BM_Metal/200000000    2.6355e+10    2.5305e+10    1

So I reran the program on a stronger machine, an M3 Pro with 18 GB of memory:

Benchmark             Time             CPU              Iterations
BM_CPU/100000000      9903958917 ns    9868660000 ns    1
BM_CPU/200000000      1.9523e+10 ns    1.9491e+10 ns    1
BM_Metal/100000000    9591659916 ns    9466587000 ns    1
BM_Metal/200000000    2.0187e+10 ns    1.9418e+10 ns    1

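The difference is still not clear. My guess is that since I'm only adding floats here, the program itself has become memory-bound: every element needs two loads and one store for a single addition, so most of the time would go into moving data rather than computing. I can only confirm this by profiling the GPU (perhaps using Xcode Instruments), but I have yet to learn how to do that.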
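Until I can profile properly, one rough sanity check is to turn a benchmark row into an effective bandwidth figure. The sketch below assumes the timed region touches each of the three arrays exactly once (two reads, one write); the function names are my own.

```cpp
#include <cassert>
#include <cstddef>

// Bytes touched by c[i] = a[i] + b[i] over n floats:
// two loads and one store per element (an assumption about the timed region).
double bytes_moved(std::size_t n) {
    return 3.0 * static_cast<double>(n) * sizeof(float);
}

// Effective bandwidth in GB/s. Bytes per nanosecond is numerically
// the same as 1e9 bytes per second.
double effective_gb_per_s(std::size_t n, double elapsed_ns) {
    assert(elapsed_ns > 0.0);
    return bytes_moved(n) / elapsed_ns;
}
```

For the M3 Pro BM_CPU/100000000 row, effective_gb_per_s(100000000, 9903958917.0) comes out at roughly 0.12 GB/s. That number is only meaningful for the addition itself if the timed region excludes allocation and initialization, so it is something to compare against the machine's memory bandwidth rather than a conclusion.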

What exactly is a data buffer?

We created a data buffer; its job is to get the data into memory that the GPU can read.

I guess it's similar to prefetching, but I'll have to check on that.

Now when you use MTL::ResourceStorageModeShared, the buffer is shared between the CPU and the GPU. Therefore, there's no need to copy data between them. See other paradigms.
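To make the shared-mode point concrete, here is a minimal metal-cpp sketch that allocates a shared buffer and fills it in place through the CPU-visible pointer. The helper names float_buffer_bytes and make_shared_iota are hypothetical, and the Metal part is guarded so it only compiles where the metal-cpp headers are available.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to hold n floats; this is the length passed to newBuffer.
inline std::size_t float_buffer_bytes(std::size_t n) {
    return n * sizeof(float);
}

#if __has_include(<Metal/Metal.hpp>)
// One translation unit in the program must define NS_PRIVATE_IMPLEMENTATION
// and MTL_PRIVATE_IMPLEMENTATION before including the metal-cpp headers.
#include <Metal/Metal.hpp>

// Hypothetical helper: allocate a shared buffer holding 0, 1, 2, ..., n - 1.
inline MTL::Buffer* make_shared_iota(MTL::Device* device, std::size_t n) {
    assert(device != nullptr);
    MTL::Buffer* buffer =
        device->newBuffer(float_buffer_bytes(n), MTL::ResourceStorageModeShared);
    if (buffer == nullptr) return nullptr;
    // contents() returns a CPU pointer to the same memory the GPU will read,
    // so writing here needs no separate upload step.
    auto* data = static_cast<float*>(buffer->contents());
    for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<float>(i);
    return buffer;  // caller owns the buffer and should call release()
}
#endif
```

A device for the helper can be obtained with MTL::CreateSystemDefaultDevice(). After a kernel has finished writing to a shared buffer, the results can be read back through the same contents() pointer.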