Static Dispatch vs. Dynamic Dispatch Benchmarks

Performance

The benches directory contains three benchmarks, each comparing four different methods of accessing a traited struct of an arbitrary type. The four methods are as follows:

| Name | Description |
| --- | --- |
| `boxdyn` | The easiest way to access a struct, using a heap allocation and dynamic dispatch. |
| `refdyn` | Accesses the struct by reference, but still using dynamic dispatch. No heap allocation. |
| `customderive` | Uses a similar macro-based approach from the external enum_derive crate, which implements a method that returns the inner value as a dynamic trait object. |
| `enumdispatch` | Implemented using this crate. |
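To make the four rows concrete, here is a minimal sketch of what each access pattern looks like when written by hand. The `Shape` trait, the `Square` and `Rect` structs, and the `AnyShape` enum are hypothetical names chosen for illustration and are not taken from the benches directory. The two enum-based variants are hand-written approximations of what the respective macros are described as producing, not their actual expanded output.

```rust
// Hypothetical trait and implementors, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// boxdyn: heap allocation plus dynamic dispatch through a vtable.
fn boxdyn_area() -> f64 {
    let shape: Box<dyn Shape> = Box::new(Square(2.0));
    shape.area()
}

// refdyn: dynamic dispatch through a reference, no heap allocation.
fn refdyn_area() -> f64 {
    let rect = Rect(2.0, 3.0);
    let shape: &dyn Shape = &rect;
    shape.area()
}

// An enum with one variant per implementor (assumed shape of the generated type).
enum AnyShape {
    Square(Square),
    Rect(Rect),
}

impl AnyShape {
    // customderive-style: return the inner value as a trait object,
    // so the eventual method call is still a dynamic dispatch.
    fn as_dyn(&self) -> &dyn Shape {
        match self {
            AnyShape::Square(s) => s,
            AnyShape::Rect(r) => r,
        }
    }
}

// enumdispatch-style: the enum itself implements the trait with a match,
// so every call target is statically known to the compiler.
impl Shape for AnyShape {
    fn area(&self) -> f64 {
        match self {
            AnyShape::Square(s) => s.area(),
            AnyShape::Rect(r) => r.area(),
        }
    }
}

fn main() {
    let square = AnyShape::Square(Square(3.0));
    let rect = AnyShape::Rect(Rect(2.0, 3.0));
    println!("boxdyn: {}", boxdyn_area());
    println!("refdyn: {}", refdyn_area());
    println!("customderive-style: {}", square.as_dyn().area());
    println!("enumdispatch-style: {}", rect.area());
}
```

The last two variants share the same enum, which makes the difference easy to see: `as_dyn` still ends in a vtable call, while the `match` in `impl Shape for AnyShape` names each concrete method directly.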

The benchmarks

The following benchmark results were measured on a Ryzen 7 2700x CPU.

`compiler_optimized`

The first set of benchmarks creates trait objects and measures the speed of accessing a method on them.

test benches::boxdyn_compiler_optimized       ... bench:   2,135,418 ns/iter (+/- 12,575)
test benches::customderive_compiler_optimized ... bench:   2,611,860 ns/iter (+/- 18,644)
test benches::enumdispatch_compiler_optimized ... bench:           0 ns/iter (+/- 0)
test benches::refdyn_compiler_optimized       ... bench:   2,132,591 ns/iter (+/- 22,114)

It’s easy to see that enum_dispatch is the clear winner here!

OK, fine. This wasn’t a fair test. In the enum_dispatch case, the compiler is able to “look through” the trait method call, notice that the result is unused, and remove the call as an optimization. However, this still highlights an important property of enum_dispatched types: the compiler is able to infer much better optimizations when possible.

`blackbox`

The next set of benchmarks uses the test::black_box function to hide the fact that the result of the method is unused.

test benches::boxdyn_blackbox       ... bench:   2,131,736 ns/iter (+/- 24,937)
test benches::customderive_blackbox ... bench:   2,611,721 ns/iter (+/- 23,502)
test benches::enumdispatch_blackbox ... bench:     471,740 ns/iter (+/- 1,439)
test benches::refdyn_blackbox       ... bench:   2,131,978 ns/iter (+/- 21,547)

The competitors faced virtually no impact, whereas enum_dispatch takes the full force of the black_box call. Read alongside the previous test, this shows how much freedom avoiding dynamic dispatch gives the compiler, and it also demonstrates how much faster enum_dispatch is in real code: roughly 4.5 times faster than the closest alternative (471,740 ns versus 2,131,736 ns per iteration).
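The effect of black_box can be sketched in a few lines. This example uses `std::hint::black_box`, the stable counterpart of the `test::black_box` function named above, and a hypothetical `triple` function standing in for a trait method. It is an illustration of the idea rather than the code in the benches directory, and black_box itself is documented as a best-effort hint, so treat this as an assumption-laden sketch.

```rust
use std::hint::black_box;

// Hypothetical workload standing in for a trait method call.
fn triple(x: u64) -> u64 {
    x * 3
}

// The result is discarded. When the optimizer can see through `triple`,
// it is free to remove the call, and possibly the whole loop.
fn discarded(iterations: u64) {
    for i in 0..iterations {
        let _ = triple(i);
    }
}

// Both the input and the result pass through black_box, which asks the
// compiler to treat them as opaque and used, so the work is kept.
fn observed(iterations: u64) -> u64 {
    let mut last = 0;
    for i in 0..iterations {
        last = black_box(triple(black_box(i)));
    }
    last
}

fn main() {
    discarded(1_000);
    println!("last observed value: {}", observed(1_000));
}
```

With dynamic dispatch the optimizer usually cannot prove what the call does, which is consistent with the other three methods measuring about the same in both benchmark sets.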

`homogeneous_vec`

The final set of benchmarks puts 1024 traited structs of arbitrary types at random into a Vec and measures the time it takes to successively iterate over the entire Vec, calling black_boxed methods on each element.

test benches::boxdyn_homogeneous_vec       ... bench:   5,900,191 ns/iter (+/- 95,169)
test benches::customderive_homogeneous_vec ... bench:   4,831,808 ns/iter (+/- 140,437)
test benches::enumdispatch_homogeneous_vec ... bench:     479,630 ns/iter (+/- 3,531)
test benches::refdyn_homogeneous_vec       ... bench:   5,658,461 ns/iter (+/- 137,128)

This might be one of the most likely use cases for traited structs of arbitrary types, and it’s where enum_dispatch really shines. Since a Vec of enum_dispatch objects is actually a Vec of enums rather than addresses, accessing an element takes half the indirection of the other techniques. Add that to the lack of vtable accesses, and we have a result that is 10 times faster than the closest alternative, and almost 12 times faster than the best technique from the standard library.
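The layout difference described above can be sketched as follows. A `Vec<Box<dyn Shape>>` stores fat pointers (data pointer plus vtable pointer) that each lead to a separate heap allocation, while a `Vec<AnyShape>` stores the values inline in the Vec's own buffer. The `Shape`, `Square`, `Rect`, and `AnyShape` names are hypothetical, and the enum is a hand-written stand-in for a type produced by enum_dispatch.

```rust
use std::mem::size_of;

// Hypothetical trait and implementors, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Hand-written stand-in for an enum_dispatch-style enum.
enum AnyShape {
    Square(Square),
    Rect(Rect),
}

impl Shape for AnyShape {
    fn area(&self) -> f64 {
        match self {
            AnyShape::Square(s) => s.area(),
            AnyShape::Rect(r) => r.area(),
        }
    }
}

// Each element is a pointer: Vec buffer -> Box -> value, plus a vtable lookup.
fn total_area_boxed(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Each element is the value itself: Vec buffer -> value, then a match.
fn total_area_enum(shapes: &[AnyShape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let boxed: Vec<Box<dyn Shape>> =
        vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    let inline = vec![AnyShape::Square(Square(2.0)), AnyShape::Rect(Rect(2.0, 3.0))];

    println!("Box<dyn Shape> is {} bytes (a fat pointer)", size_of::<Box<dyn Shape>>());
    println!("AnyShape is {} bytes (stored inline)", size_of::<AnyShape>());
    println!("boxed total: {}", total_area_boxed(&boxed));
    println!("enum total:  {}", total_area_enum(&inline));
}
```

The trade-off is that every enum element is as large as the largest variant, so the inline layout suits sets of similarly sized types best.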
