Part 1, Part 2, and Part 3 of this series provided an introduction to profiling and showed how to set up VTune. The first optimization was discussed in Part 4, in which the number of times printf is executed was reduced. The second optimization was discussed in Part 5, in which strlen was replaced with a much cheaper alternative. The third optimization was discussed in Part 6, in which the amount of computation required to report progress was reduced. The fourth optimization was discussed in Part 7, in which the function do_pswd was inlined into its caller. The following chart shows by how much each optimization improved password cracking throughput.
All four optimizations were significant, but the printf one resulted in the greatest improvement. In general, it’s recommended to reduce I/O, use asynchronous I/O, or perform I/O operations in dedicated threads. This advice applies to graphics drawing as well. It was also interesting to see that the compiler failed to realize the importance of inlining do_pswd, so we had to do that manually. Perhaps, if we had used profile-guided optimization, it would have figured that out by itself.
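To make the "reduce I/O" advice concrete, here is a minimal sketch of the general idea: report progress once per batch of candidates rather than once per candidate. It is not the code from this series. The names `crack`, `try_candidate`, and `REPORT_INTERVAL`, as well as the interval value, are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical reporting interval; not a value taken from the series. */
#define REPORT_INTERVAL 1000000UL

/* Hypothetical stand-in for testing a single candidate password. */
static int try_candidate(unsigned long candidate, unsigned long secret)
{
    return candidate == secret;
}

/*
 * Tries candidates 0..limit-1 and prints one progress line per
 * REPORT_INTERVAL candidates instead of one line per candidate.
 * If reports is non-NULL, the number of progress lines printed is
 * stored in *reports. Returns the matching candidate, or limit if
 * no candidate matched.
 */
unsigned long crack(unsigned long secret, unsigned long limit,
                    unsigned long *reports)
{
    unsigned long printed = 0;
    unsigned long found = limit;

    for (unsigned long i = 0; i < limit; ++i) {
        if (try_candidate(i, secret)) {
            found = i;
            break;
        }
        /* The expensive printf call runs only once per interval. */
        if ((i + 1) % REPORT_INTERVAL == 0) {
            printf("tried %lu candidates\n", i + 1);
            ++printed;
        }
    }

    if (reports != NULL) {
        *reports = printed;
    }
    return found;
}
```

With these assumed values, searching for candidate 2,500,000 produces only two progress lines instead of millions. The modulo check itself is not free, though, so the cost of deciding whether to report is also worth measuring.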
I refrained from getting into algorithmic, microarchitectural, and parallelization optimizations in this series to keep it short and simple. Maybe I’ll discuss them in future articles. VTune can certainly be used for these purposes too.