Performance
The benches directory contains three benchmarks of different natures, each comparing four different methods of accessing a traited struct of an arbitrary type. The four methods are as follows:
Name | Description |
---|---|
boxdyn | The easiest way to access a struct, using a heap allocation and dynamic dispatch. |
refdyn | Accesses the struct by reference, but still using dynamic dispatch. No heap allocation. |
customderive | Uses a similar macro approach from the external enum_derive crate, which implements a method that returns an inner type as a dynamic trait object. |
enumdispatch | Implemented using this crate. |
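To make the four rows above concrete, here is a minimal sketch of what each access style might look like. The trait `ReturnsValue`, the structs `Constant` and `Doubled`, and both wrapper enums are hypothetical names invented for this illustration; they are not taken from the benches directory. The last enum is a hand-written approximation of the kind of code a dispatch macro could generate, not the crate's actual output.

```rust
// Hypothetical trait and implementors, invented for this illustration.
trait ReturnsValue {
    fn return_value(&self) -> usize;
}

struct Constant(usize);
struct Doubled(usize);

impl ReturnsValue for Constant {
    fn return_value(&self) -> usize {
        self.0
    }
}

impl ReturnsValue for Doubled {
    fn return_value(&self) -> usize {
        self.0 * 2
    }
}

// boxdyn: the value lives on the heap and the call goes through a vtable.
fn call_boxdyn(item: Box<dyn ReturnsValue>) -> usize {
    item.return_value()
}

// refdyn: no heap allocation, but the call still goes through a vtable.
fn call_refdyn(item: &dyn ReturnsValue) -> usize {
    item.return_value()
}

// customderive style (assumed shape): an enum with an accessor that hands
// back the inner value as a trait object, so the call is still dynamic.
enum DynWrapper {
    Constant(Constant),
    Doubled(Doubled),
}

impl DynWrapper {
    fn inner(&self) -> &dyn ReturnsValue {
        match self {
            DynWrapper::Constant(inner) => inner as &dyn ReturnsValue,
            DynWrapper::Doubled(inner) => inner as &dyn ReturnsValue,
        }
    }
}

fn call_customderive_style(item: &DynWrapper) -> usize {
    item.inner().return_value()
}

// enumdispatch style (hand-written approximation): the trait is implemented
// on the enum itself, and each arm calls the concrete type directly.
enum StaticWrapper {
    Constant(Constant),
    Doubled(Doubled),
}

impl ReturnsValue for StaticWrapper {
    fn return_value(&self) -> usize {
        match self {
            StaticWrapper::Constant(inner) => inner.return_value(),
            StaticWrapper::Doubled(inner) => inner.return_value(),
        }
    }
}

fn call_enumdispatch_style(item: &StaticWrapper) -> usize {
    item.return_value()
}
```

For example, `call_boxdyn(Box::new(Doubled(4)))` and `call_enumdispatch_style(&StaticWrapper::Doubled(Doubled(4)))` both compute the same value; only the dispatch mechanism differs.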
The benchmarks
The following benchmark results were measured on a Ryzen 7 2700x CPU.
compiler_optimized
The first set of benchmarks creates trait objects and measures the speed of accessing a method on them.
test benches::boxdyn_compiler_optimized ... bench: 2,135,418 ns/iter (+/- 12,575)
test benches::customderive_compiler_optimized ... bench: 2,611,860 ns/iter (+/- 18,644)
test benches::enumdispatch_compiler_optimized ... bench: 0 ns/iter (+/- 0)
test benches::refdyn_compiler_optimized ... bench: 2,132,591 ns/iter (+/- 22,114)
It’s easy to see that enum_dispatch is the clear winner here!
Ok, fine. This wasn’t a fair test. The compiler is able to “look through” the trait method call in the enum_dispatch case, notice that the result is unused, and remove the call as an optimization. However, this still highlights an important property of enum_dispatched types: the compiler is able to infer much better optimizations when possible.
blackbox
The next set of benchmarks uses the test::black_box function to hide the fact that the result of the method is unused.
test benches::boxdyn_blackbox ... bench: 2,131,736 ns/iter (+/- 24,937)
test benches::customderive_blackbox ... bench: 2,611,721 ns/iter (+/- 23,502)
test benches::enumdispatch_blackbox ... bench: 471,740 ns/iter (+/- 1,439)
test benches::refdyn_blackbox ... bench: 2,131,978 ns/iter (+/- 21,547)
The competitors faced virtually no impact, whereas enum_dispatch takes the full force of the black_box call. This test shows the power that avoiding dynamic dispatch gives to the compiler in the context of the previous test, but also demonstrates how much faster enum_dispatch is in real code: roughly 4.5 times faster than the closest alternative.
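The difference between the two benchmark sets above comes down to whether the optimizer can see that a result is unused. The sketch below illustrates the idea with `std::hint::black_box`, the stable counterpart of the `test::black_box` function named above (an assumption about what is convenient outside a nightly bench harness, not a description of the actual bench code). The trait `ReturnsValue` and the struct `Doubled` are hypothetical names.

```rust
use std::hint::black_box;

// Hypothetical trait and implementor, invented for this illustration.
trait ReturnsValue {
    fn return_value(&self) -> usize;
}

struct Doubled(usize);

impl ReturnsValue for Doubled {
    fn return_value(&self) -> usize {
        self.0 * 2
    }
}

// The result is thrown away. With static dispatch the optimizer can see
// that the call has no effect and is free to delete the entire loop.
fn discard_results(item: &impl ReturnsValue, iterations: usize) {
    for _ in 0..iterations {
        let _ = item.return_value();
    }
}

// The result is passed through black_box, which asks the optimizer to
// treat the value as used. black_box returns its argument unchanged.
fn observe_results(item: &impl ReturnsValue, iterations: usize) -> usize {
    let mut total = 0;
    for _ in 0..iterations {
        total += black_box(item.return_value());
    }
    total
}
```

Note that `black_box` is a best-effort hint rather than a guarantee, so it is suited to benchmarks like these and not to anything that depends on the call surviving optimization.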
homogeneous_vec
The final set of benchmarks puts 1024 traited structs of arbitrary types at random into a Vec and measures the time it takes to successively iterate over the entire Vec, calling black_boxed methods on each element.
test benches::boxdyn_homogeneous_vec ... bench: 5,900,191 ns/iter (+/- 95,169)
test benches::customderive_homogeneous_vec ... bench: 4,831,808 ns/iter (+/- 140,437)
test benches::enumdispatch_homogeneous_vec ... bench: 479,630 ns/iter (+/- 3,531)
test benches::refdyn_homogeneous_vec ... bench: 5,658,461 ns/iter (+/- 137,128)
This might be one of the most likely use cases for traited structs of arbitrary types, and it’s where enum_dispatch really shines. Since a Vec of enum_dispatch objects is actually a Vec of enums rather than addresses, accessing an element takes half the indirection of the other techniques. Add that to the lack of vtable accesses, and we have a result that is 10 times faster than the closest alternative, and almost 12 times faster than the best technique from the standard library.
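The layout difference described above can be sketched as follows. In the boxed version each element of the Vec is a pointer to a separate heap allocation, while in the enum version the values are stored inline in the Vec's buffer. All names here (`ReturnsValue`, `Constant`, `Doubled`, `Value`, and the helper functions) are hypothetical, and the enum's trait impl is written by hand as a stand-in for generated code.

```rust
// Hypothetical trait and implementors, invented for this illustration.
trait ReturnsValue {
    fn return_value(&self) -> usize;
}

struct Constant(usize);
struct Doubled(usize);

impl ReturnsValue for Constant {
    fn return_value(&self) -> usize {
        self.0
    }
}

impl ReturnsValue for Doubled {
    fn return_value(&self) -> usize {
        self.0 * 2
    }
}

// Hand-written stand-in for an enum with statically dispatched trait methods.
enum Value {
    Constant(Constant),
    Doubled(Doubled),
}

impl ReturnsValue for Value {
    fn return_value(&self) -> usize {
        match self {
            Value::Constant(inner) => inner.return_value(),
            Value::Doubled(inner) => inner.return_value(),
        }
    }
}

// Each element is a pointer to its own heap allocation plus a vtable pointer.
fn build_boxed(len: usize) -> Vec<Box<dyn ReturnsValue>> {
    let mut items: Vec<Box<dyn ReturnsValue>> = Vec::with_capacity(len);
    for i in 0..len {
        if i % 2 == 0 {
            items.push(Box::new(Constant(i)));
        } else {
            items.push(Box::new(Doubled(i)));
        }
    }
    items
}

// Each element is stored inline in the Vec's buffer; no per-element pointer.
fn build_enums(len: usize) -> Vec<Value> {
    let mut items = Vec::with_capacity(len);
    for i in 0..len {
        if i % 2 == 0 {
            items.push(Value::Constant(Constant(i)));
        } else {
            items.push(Value::Doubled(Doubled(i)));
        }
    }
    items
}

// Follows a pointer per element, then makes a vtable call.
fn sum_boxed(items: &[Box<dyn ReturnsValue>]) -> usize {
    items.iter().map(|item| item.return_value()).sum()
}

// Reads each element in place and branches on the enum discriminant.
fn sum_enums(items: &[Value]) -> usize {
    items.iter().map(|item| item.return_value()).sum()
}
```

Both `sum_boxed(&build_boxed(1024))` and `sum_enums(&build_enums(1024))` produce the same total; the sketch only illustrates where the elements live and how the call is resolved, and makes no claim about reproducing the timings reported above.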