Finally I found the time to play with unmanaged arrays and Memory<T>/Span<T>. To investigate possible improvements to the data structures used by Svelto.ECS, I wanted to know the fastest way to set elements in a preallocated array. Using the very powerful BenchmarkDotNet, and with the help of the .NET community, I got my answer:

The code is very straightforward:

Some considerations:

Pinning large arrays is obviously not the way to go; I just wanted to see whether there was any difference compared with using native memory (allocated through AllocHGlobal). As suspected, using unsafe pointers is the fastest option.
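To make the three approaches concrete, here is a minimal sketch of filling a buffer through a raw pointer over native memory, through a Span<T> over the same kind of memory, and through a pinned managed array. This is not the benchmark code itself: the class and method names are made up for illustration, and it assumes the project is compiled with unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical names, for illustration only; requires AllowUnsafeBlocks.
public static class NativeFillSketch
{
    // Fills native memory through a raw pointer and returns the sum of the values written.
    public static unsafe long FillWithPointer(int count)
    {
        IntPtr buffer = Marshal.AllocHGlobal(count * sizeof(int));
        try
        {
            int* p = (int*)buffer.ToPointer();
            for (int i = 0; i < count; i++)
                p[i] = i; // raw pointer: no bounds check, nothing stops an out-of-range write

            long sum = 0;
            for (int i = 0; i < count; i++)
                sum += p[i];
            return sum;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer); // native memory is not tracked by the GC
        }
    }

    // Same work, but the native memory is wrapped in a Span<int> and accessed by index.
    public static unsafe long FillWithSpan(int count)
    {
        IntPtr buffer = Marshal.AllocHGlobal(count * sizeof(int));
        try
        {
            var span = new Span<int>(buffer.ToPointer(), count);
            for (int i = 0; i < span.Length; i++)
                span[i] = i;

            long sum = 0;
            for (int i = 0; i < span.Length; i++)
                sum += span[i];
            return sum;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }

    // Managed array pinned with 'fixed' so the GC cannot move it while the pointer is in use.
    public static unsafe long FillPinnedArray(int count)
    {
        int[] array = new int[count];
        fixed (int* p = array)
        {
            for (int i = 0; i < count; i++)
                p[i] = i;
        }

        long sum = 0;
        foreach (int value in array)
            sum += value;
        return sum;
    }
}
```

All three methods write the same values, so calling each with the same count should return the same sum; only the way the memory is obtained and addressed differs.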

Span<T> using native memory is actually faster than I thought! Very close! However, with Mono (Unity) I just can’t iterate a span efficiently using operator[], as Mono still doesn’t support the optimized span implementation (although it is still much faster than using a list). Ben Adams, supreme bearer of .NET knowledge, came up with the solution that you can check in the “StandardUnmanagedSpanBenInsert” benchmark.

However, I have to say, iterating a plain old C# array is not that slow either, which is surprising, isn’t it? Where are the bounds checks that should make C# arrays slower to iterate? Well, if you check the asm, they are actually not there. The jitter is smart enough to understand that the index can never go outside the bounds, so the checks are removed! That’s why I had to add a version with random access, which is 10 times slower (although this is very likely due to cache misses rather than to the bounds check itself). Let’s see the assembly:

As you can see, ArrayInsertChecked has one extra compare compared to UnmanagedArrayInsertChecked. However, the CPU clocks taken by the extra compare are nothing compared to the time spent in cache misses. Now the results (note that these functions iterate 10 times less to have comparable results):

Conclusion:

  • Spans are very fast with .NET Core, not that fast in .NET Framework, slow with Mono
  • using unsafe pointers is the fastest solution
  • plain arrays are just as fast when the jitter removes the bounds checks
  • the jitter is smart!
  • your worst enemy is the cache miss
  • from what I can see, I don’t think it is worth using unmanaged memory for arrays of ints and structs.
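The points about bounds checks and cache misses can be sketched with two fill loops over the same managed array. This is an illustration under assumptions, not the benchmark code: the class and method names are hypothetical, and whether the range check is actually removed depends on the runtime and JIT version.

```csharp
using System;

// Hypothetical names, for illustration only.
public static class AccessPatternSketch
{
    // Sequential fill: the index is bounded by array.Length, a pattern for which
    // the JIT can typically drop the per-element range check.
    public static void FillSequential(int[] array)
    {
        for (int i = 0; i < array.Length; i++)
            array[i] = i;
    }

    // Indexed fill: the target index comes from another array, so the JIT cannot
    // prove it is in range and the check stays. With shuffled indices the writes
    // also jump around memory, which is where cache misses come from.
    public static void FillByIndex(int[] array, int[] indices)
    {
        for (int i = 0; i < indices.Length; i++)
            array[indices[i]] = i;
    }

    // Builds a shuffled permutation of 0..count-1 (Fisher-Yates) with a fixed seed.
    public static int[] ShuffledIndices(int count, int seed)
    {
        var indices = new int[count];
        for (int i = 0; i < count; i++)
            indices[i] = i;

        var rng = new Random(seed);
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }
        return indices;
    }
}
```

Both loops perform the same number of writes. To see a timing difference, the array needs to be much larger than the CPU caches, and the two methods should be measured with a proper harness such as BenchmarkDotNet rather than a stopwatch.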

Note: all the reasoning above makes sense only if we talk about unmanaged structs and value types. For arrays of objects, more complicated operations happen because of the garbage collector, and the difference between managed and unmanaged code may be much more significant. However, for my purposes I need to take into consideration only pure value types.

P.S.: I enabled the CoreRT version just for fun, but basically there isn’t any difference. I should check with Unity IL2CPP too.
