moonfire-nvr



Author: Scott Lamb
SHA1:   0aadf227c1

Benchmark & speed up SampleIndexIterator

I'm exploring what performance is possible with the current C++ before
trying out Go and Rust implementations.

* use the Google Benchmark framework with some real data.

* use release builds. I hadn't built in release mode in a while, and
  there were a few compile errors that manifested only there. Update the
  readme to suggest using a release build.

* optimize the varint decoder and SampleIndexIterator to branch less.

* enable link-time optimization for release builds.

* add some support for feedback-directed optimization. Ideally "make"
  would automatically produce the "generate" build outputs with a
  distinct object/library/executable suffix, run the benchmark to
  generate a profile, and then produce the "use" builds. The current
  support is not that fancy; you have to run an arcane command:

  alias cmake='cmake -DCMAKE_BUILD_TYPE=Release'
  cmake -DPROFILE_GENERATE=true -DPROFILE_USE=false .. && \
  make recording-bench && \
  src/recording-bench && \
  cmake -DPROFILE_GENERATE=false -DPROFILE_USE=true .. && \
  make recording-bench && \
  perf stat -e cycles,instructions,branches,branch-misses \
      src/recording-bench --benchmark_repetitions=5

  That said, the results are dramatic: at least a 50% improvement. (The
  results weren't stable before, as small tweaks to the code caused huge
  shifts in performance, presumably something to do with branch
  alignment.)
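The bullet about making the varint decoder branch less can be illustrated with a sketch. This is not the code from this commit. It rests on the following assumptions:

* a protobuf-style base-128 varint (7 data bits per byte, low group first, high bit set when more bytes follow);
* the hypothetical function names `DecodeVarint32Simple` and `DecodeVarint32Fast`;
* `__builtin_ctzll`, which is a GCC/Clang builtin.

The simple decoder tests a continuation bit on every byte. The faster one applies when at least 8 bytes are readable: it locates the terminating byte with a single bit scan, then reassembles the 7-bit groups with fixed shifts and masks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference decoder (hypothetical name): one data-dependent branch per byte.
// Assumes protobuf-style varints: 7 data bits per byte, low group first,
// high bit set if more bytes follow. Returns bytes consumed, or 0 if the
// input is truncated or longer than 5 bytes.
size_t DecodeVarint32Simple(const uint8_t* p, const uint8_t* end,
                            uint32_t* out) {
  const uint8_t* start = p;
  uint32_t result = 0;
  for (int shift = 0; shift < 35 && p < end; shift += 7) {
    uint8_t b = *p++;
    // Bits that fall past bit 31 (from a 5th byte) are discarded.
    result |= static_cast<uint32_t>(b & 0x7f) << shift;
    if ((b & 0x80) == 0) {
      *out = result;
      return static_cast<size_t>(p - start);
    }
  }
  return 0;
}

// Reads 8 bytes as a little-endian integer regardless of host byte order.
// Optimizing compilers can often turn this loop into a single 8-byte load.
static uint64_t LoadLE64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v |= static_cast<uint64_t>(p[i]) << (8 * i);
  return v;
}

// Branch-reduced decoder (hypothetical name): same contract as above, but
// with no per-byte loop when at least 8 bytes are readable.
size_t DecodeVarint32Fast(const uint8_t* p, const uint8_t* end,
                          uint32_t* out) {
  assert(p <= end);
  if (end - p < 8) return DecodeVarint32Simple(p, end, out);  // near buffer end
  uint64_t w = LoadLE64(p);
  // Bit 8*i+7 of `stops` is set iff byte i has its high bit clear,
  // i.e. byte i could end the varint.
  uint64_t stops = ~w & 0x8080808080808080ULL;
  if (stops == 0) return 0;  // no terminator within 8 bytes
  size_t len = static_cast<size_t>(__builtin_ctzll(stops) + 1) / 8;
  if (len > 5) return 0;  // too long for a 32-bit value
  // Keep only the first `len` bytes and clear their continuation bits.
  w &= ((uint64_t{1} << (8 * len)) - 1) & 0x7f7f7f7f7f7f7f7fULL;
  // Pack the 7-bit groups together: byte i's bits shift right by i.
  uint64_t r = (w & 0x7f) |
               ((w >> 1) & (0x7fULL << 7)) |
               ((w >> 2) & (0x7fULL << 14)) |
               ((w >> 3) & (0x7fULL << 21)) |
               ((w >> 4) & (0x7fULL << 28));
  *out = static_cast<uint32_t>(r);  // truncate to 32 bits like the simple path
  return len;
}
```

For example, the bytes `0xac 0x02` (followed by padding so 8 bytes are readable) decode to 300 with either function, and both report 2 bytes consumed.

The fast path still has a few branches: the buffer-length check, the missing-terminator check, and the overlong check. These depend on buffer position and input validity rather than on each byte's value, so they should usually be easy to predict. A per-byte loop, by contrast, exits at a position that varies with the encoded number.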

Date: 2016-05-19 22:53:23 -07:00
