Commit Graph

1 Commits

Author SHA1 Message Date
Scott Lamb 0aadf227c1 Benchmark & speed up SampleIndexIterator
I'm seeing what is possible performance-wise in the current C++ before
trying out Go and Rust implementations.

* use the google benchmark framework and some real data.

* use release builds - I hadn't done this in a while, and there were a
  few compile errors that manifested only in release mode. Update the
  readme to suggest using a release build.

* optimize the varint decoder and SampleIndexIterator to branch less.

* enable link-time optimization for release builds.

* add some support for feedback-directed optimization. Ideally "make"
  would automatically produce the "generate" build outputs with a
  different object/library/executable suffix, run the generate
  benchmark, and then produce the "use" builds. This is not that fancy;
  you have to run an arcane command:

  alias cmake='cmake -DCMAKE_BUILD_TYPE=Release'
  cmake -DPROFILE_GENERATE=true -DPROFILE_USE=false .. && \
  make recording-bench && \
  src/recording-bench && \
  cmake -DPROFILE_GENERATE=false -DPROFILE_USE=true .. && \
  make recording-bench && \
  perf stat -e cycles,instructions,branches,branch-misses \
      src/recording-bench --benchmark_repetitions=5

  That said, the results are dramatic - at least 50% improvement. (The
  results weren't stable before as small tweaks to the code caused a
  huge shift in performance, presumably something something branch
  alignment something something.)
2016-05-19 22:53:23 -07:00