Performance

The benches directory contains three benchmarks, each exercising a different scenario and each comparing four methods of accessing a traited struct of an arbitrary type. The four methods are as follows:

| Name | Description |
| --- | --- |
| boxdyn | The easiest way to access a struct, using a heap allocation and dynamic dispatch. |
| refdyn | Accesses the struct by reference, but still using dynamic dispatch. No heap allocation. |
| customderive | Uses a similar macro approach from the external enum_derive crate, which implements a method that returns an inner type as a dynamic trait object. |
| enumdispatch | Implemented using this crate. |
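To make the four rows concrete, here is a minimal sketch of what each access style looks like in plain Rust. The `Shape` trait, the `Square` and `Rect` types, and all function names are hypothetical and are not taken from the benchmark sources. The enum is written by hand to approximate the kind of match-based forwarding that enum-based dispatch relies on; it does not use this crate's macro and should not be read as its exact output.

```rust
// Hypothetical trait and types, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// boxdyn style: heap allocation plus dynamic dispatch through a vtable.
fn area_boxdyn(shape: Box<dyn Shape>) -> f64 {
    shape.area()
}

// refdyn style: no heap allocation, but the call still goes through a vtable.
fn area_refdyn(shape: &dyn Shape) -> f64 {
    shape.area()
}

// Hand-written enum wrapping every concrete type (an assumption about the
// general shape of enum-based dispatch, not generated by any macro).
enum AnyShape {
    Square(Square),
    Rect(Rect),
}

// enumdispatch style: the trait is implemented on the enum itself, and each
// call is forwarded with a match, so no trait object is involved.
impl Shape for AnyShape {
    fn area(&self) -> f64 {
        match self {
            AnyShape::Square(s) => s.area(),
            AnyShape::Rect(r) => r.area(),
        }
    }
}

// customderive style: the enum hands back its inner value as a trait object,
// so the eventual method call is still dynamically dispatched.
impl AnyShape {
    fn as_dyn(&self) -> &dyn Shape {
        match self {
            AnyShape::Square(s) => s,
            AnyShape::Rect(r) => r,
        }
    }
}
```

With these definitions, `AnyShape::Rect(Rect(2.0, 3.0)).area()` resolves through a match, while `AnyShape::Rect(Rect(2.0, 3.0)).as_dyn().area()` goes through a vtable lookup.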

The benchmarks

The following benchmark results were measured on a Ryzen 7 2700x CPU.

compiler_optimized

The first set of benchmarks creates trait objects and measures the speed of accessing a method on them.

test benches::boxdyn_compiler_optimized       ... bench:   2,135,418 ns/iter (+/- 12,575)
test benches::customderive_compiler_optimized ... bench:   2,611,860 ns/iter (+/- 18,644)
test benches::enumdispatch_compiler_optimized ... bench:           0 ns/iter (+/- 0)
test benches::refdyn_compiler_optimized       ... bench:   2,132,591 ns/iter (+/- 22,114)

It’s easy to see that enum_dispatch is the clear winner here!

Ok, fine. This wasn’t a fair test. In the enum_dispatch case, the compiler is able to “look through” the trait method call, notice that the result is unused, and remove the call as an optimization. However, this still highlights an important property of enum_dispatched types: because no dynamic dispatch is involved, the compiler can apply much better optimizations when possible.

blackbox

The next set of benchmarks uses the test::black_box function to hide the fact that the result of the method is unused.

test benches::boxdyn_blackbox       ... bench:   2,131,736 ns/iter (+/- 24,937)
test benches::customderive_blackbox ... bench:   2,611,721 ns/iter (+/- 23,502)
test benches::enumdispatch_blackbox ... bench:     471,740 ns/iter (+/- 1,439)
test benches::refdyn_blackbox       ... bench:   2,131,978 ns/iter (+/- 21,547)

The competitors were virtually unaffected, whereas enum_dispatch took the full force of the black_box call. Read alongside the previous test, this shows how much room avoiding dynamic dispatch gives the compiler to optimize. It also demonstrates how much faster enum_dispatch is in real code: about 4.5 times faster than the closest alternative (471,740 ns versus 2,131,736 ns per iteration).
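The difference between discarding a result and hiding it from the optimizer can be sketched as follows. This is an illustration only: it uses the stable `std::hint::black_box` rather than the `test::black_box` function the benchmarks use, and the `work`, `discard_results`, and `keep_results` functions are hypothetical stand-ins for a trait method call and its benchmark loop.

```rust
use std::hint::black_box;

// Hypothetical workload standing in for a trait method call.
fn work(x: u64) -> u64 {
    x.wrapping_mul(3).wrapping_add(1)
}

// The result is discarded, so an optimizing compiler is free to remove the
// call, and possibly the whole loop, when it can see through `work`.
fn discard_results(n: u64) {
    for i in 0..n {
        let _ = work(i);
    }
}

// Passing the input and the output through `black_box` asks the compiler to
// treat both as opaque, so the call cannot be dropped as unused.
fn keep_results(n: u64) -> u64 {
    let mut last = 0;
    for i in 0..n {
        last = black_box(work(black_box(i)));
    }
    last
}
```

In an optimized build, a loop like `discard_results` may compile down to nothing, which is the effect behind the 0 ns/iter figure in the first benchmark, while `keep_results` has to perform every call.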

homogeneous_vec

The final set of benchmarks puts 1024 traited structs of arbitrary types at random into a Vec and measures the time it takes to successively iterate over the entire Vec, calling black_boxed methods on each element.

test benches::boxdyn_homogeneous_vec       ... bench:   5,900,191 ns/iter (+/- 95,169)
test benches::customderive_homogeneous_vec ... bench:   4,831,808 ns/iter (+/- 140,437)
test benches::enumdispatch_homogeneous_vec ... bench:     479,630 ns/iter (+/- 3,531)
test benches::refdyn_homogeneous_vec       ... bench:   5,658,461 ns/iter (+/- 137,128)

This might be one of the most likely use cases for traited structs of arbitrary types, and it’s where enum_dispatch really shines. Since a Vec of enum_dispatch objects is actually a Vec of enums rather than addresses, accessing an element takes half the indirection of the other techniques. Add that to the lack of vtable accesses, and we have a result that is 10 times faster than the closest alternative, and almost 12 times faster than the best technique from the standard library.
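The layout difference described above can be sketched with two small vectors. The `Value` trait, the `Small` and `Doubled` types, and the helper functions are hypothetical, and the enum is written by hand rather than generated by this crate. In the boxed version each element is a pointer to a separate heap allocation plus a vtable pointer; in the enum version the values are stored inline in the vector's buffer.

```rust
// Hypothetical trait and types, used only for illustration.
trait Value {
    fn get(&self) -> u32;
}

struct Small(u32);
struct Doubled(u32);

impl Value for Small {
    fn get(&self) -> u32 {
        self.0
    }
}

impl Value for Doubled {
    fn get(&self) -> u32 {
        self.0 * 2
    }
}

// Hand-written enum holding every concrete type inline.
enum AnyValue {
    Small(Small),
    Doubled(Doubled),
}

impl Value for AnyValue {
    fn get(&self) -> u32 {
        match self {
            AnyValue::Small(v) => v.get(),
            AnyValue::Doubled(v) => v.get(),
        }
    }
}

// Each element is a fat pointer: reaching the data means following the
// pointer to the heap, and the call itself goes through the vtable.
fn build_boxed() -> Vec<Box<dyn Value>> {
    vec![Box::new(Small(1)), Box::new(Doubled(2)), Box::new(Small(3))]
}

// Each element is the value itself, stored contiguously in the Vec buffer.
fn build_enums() -> Vec<AnyValue> {
    vec![
        AnyValue::Small(Small(1)),
        AnyValue::Doubled(Doubled(2)),
        AnyValue::Small(Small(3)),
    ]
}

fn sum_boxed(items: &[Box<dyn Value>]) -> u32 {
    items.iter().map(|v| v.get()).sum()
}

fn sum_enums(items: &[AnyValue]) -> u32 {
    items.iter().map(|v| v.get()).sum()
}
```

Both `sum_boxed(&build_boxed())` and `sum_enums(&build_enums())` compute the same total; only the memory layout and the dispatch mechanism differ.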