Adding Two Vectors

21 June, 2025

Task: add two std::vector<float>s. The result of each iteration is independent of the previous ones (no loop-carried dependency), so the elements can be computed in parallel with SIMD.
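For reference, the scalar version of the task can be sketched as below. This is a minimal sketch and not the benchmark code itself; the function name `add_vectors` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise sum of two equally sized vectors.
// `add_vectors` is an illustrative name, not taken from the benchmark.
std::vector<float> add_vectors(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // out[i] reads only a[i] and b[i]; nothing from another
        // iteration is needed, so iterations can run in any order.
        out[i] = a[i] + b[i];
    }
    return out;
}
```

Because no iteration reads a value written by another, a compiler is free to vectorize this loop, and a GPU can hand each index to its own thread.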

Results

My M1 Mac (with 8 GB of memory) started to tap out as I approached 2e8 elements (three float vectors of that size already take about 2.4 GB), and the results were pretty inconsistent.

Benchmark	Time (ns)	CPU (ns)	Iterations
BM_CPU/100000000	1.2876e+10	1.2803e+10	1
BM_CPU/200000000	2.5761e+10	2.5589e+10	1
BM_Metal/100000000	1.3022e+10	1.2469e+10	1
BM_Metal/200000000	2.6355e+10	2.5305e+10	1

So I reran the program on a stronger machine, an M3 Pro with 18 GB of memory:

Benchmark	Time (ns)	CPU (ns)	Iterations
BM_CPU/100000000	9903958917	9868660000	1
BM_CPU/200000000	1.9523e+10	1.9491e+10	1
BM_Metal/100000000	9591659916	9466587000	1
BM_Metal/200000000	2.0187e+10	1.9418e+10	1

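The difference is still not clear: on both machines the CPU and Metal timings are within a few percent of each other. My guess is that since I'm only adding floats here, the program has become memory-bound: each element costs a single addition but 12 bytes of memory traffic (two loads and one store). I can only confirm this by profiling the GPU (perhaps using Xcode Instruments), but I have yet to learn how to do that.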
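One rough way to sanity-check the memory-bound guess without a profiler is to work out how much data the add has to move and what that would cost at a given bandwidth. The sketch below does only that arithmetic; the helper names are made up, and the bandwidth is a parameter you would have to measure or look up for your own machine.

```cpp
#include <cstddef>

// Bytes moved per element: read a[i], read b[i], write out[i].
constexpr std::size_t kBytesPerElement = 3 * sizeof(float);

// Minimum memory traffic for n elements, in bytes.
constexpr double traffic_bytes(std::size_t n) {
    return static_cast<double>(n) * kBytesPerElement;
}

// Lower bound on runtime in seconds if the add were limited purely by
// memory bandwidth. `bandwidth_gb_per_s` is an assumed input, not a
// measured value from these benchmarks.
constexpr double min_seconds(std::size_t n, double bandwidth_gb_per_s) {
    return traffic_bytes(n) / (bandwidth_gb_per_s * 1e9);
}
```

For example, at a hypothetical 60 GB/s, `min_seconds(100000000, 60.0)` comes out to 0.02 s. If the measured time sits far above that bound, something other than bandwidth for the add itself (allocation or initialization inside the timed region, say) is probably dominating, and that would be worth ruling out before profiling the GPU.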

What exactly is a data buffer?

We created a data buffer (an MTL::Buffer). It is a block of memory that the GPU can access, and it is how the input data gets to the GPU.

I guess it's similar to prefetching, but I'll have to check on that.

Now, when you use MTL::ResourceStorageModeShared, the buffer is shared between the CPU and the GPU (on Apple silicon both sit on the same unified memory), so there's no need to copy data between them. See other paradigms.
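To make the shared-buffer idea concrete, here is a minimal sketch of allocating one with metal-cpp and filling it directly from the CPU. It assumes the metal-cpp headers are on the include path and that one translation unit in the project defines NS_PRIVATE_IMPLEMENTATION and MTL_PRIVATE_IMPLEMENTATION before including them. The helper names `buffer_bytes` and `make_shared_ramp` are made up for illustration, and the `__has_include` guard is only there so the file still compiles on machines without metal-cpp.

```cpp
#include <cstddef>
#if __has_include(<Metal/Metal.hpp>)
#include <Metal/Metal.hpp>
#define HAVE_METAL_CPP 1
#endif

// Byte length of a buffer holding `count` floats.
constexpr std::size_t buffer_bytes(std::size_t count) {
    return count * sizeof(float);
}

#ifdef HAVE_METAL_CPP
// Allocates a shared buffer and fills it in place with 0, 1, 2, ...
// The caller owns the returned buffer and must call release() on it.
inline MTL::Buffer* make_shared_ramp(MTL::Device* device, std::size_t count) {
    MTL::Buffer* buf =
        device->newBuffer(buffer_bytes(count), MTL::ResourceStorageModeShared);
    if (buf == nullptr) return nullptr;
    // contents() is a CPU-visible pointer to the same memory the GPU reads,
    // so writing through it needs no separate upload step.
    float* data = static_cast<float*>(buf->contents());
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = static_cast<float>(i);
    }
    return buf;
}
#endif
```

The device would typically come from `MTL::CreateSystemDefaultDevice()`. Writing straight into `contents()` rather than building a std::vector first also avoids one CPU-side copy of the input.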