The Significance of the x86 LFENCE Instruction

The x86 ISA currently offers three “fence” instructions: MFENCE, SFENCE, and LFENCE. Sometimes they are described as “memory fence” instructions. In some other architectures and in the literature about memory ordering models, terms such as memory fences, store fences, and load fences are used. The terms “memory fence” and “load fence” have not been used in the Intel Manual Volume 3, but they have been used in the Intel Manual Volume 2 and in the AMD manuals a couple of times. I’ll focus in this article on “load fences”. Throughout this article, I’ll be referring to the latest Intel and AMD manuals at the time of writing this article.

The fact that the term “load fence” has been used in different ISAs, textbooks, and research papers has resulted in a critical misunderstanding of the x86 LFENCE instruction and confusion regarding what it does and how to use it. Continue reading


Defining Static Single Assignment

Most compilers convert the input source code into one or more intermediate representations (IRs) to make it easier and faster to analyze and optimize the code. Static single assignment (SSA) is a property of IRs that helps in not only simplifying the algorithms that analyze the code, but also improve their results at the same time, leading to more effective and efficient optimizations. The definition of SSA according to Wikipedia is currently as follows: Continue reading

The Art of Profiling Using Intel VTune Amplifier, Part 8 (Final)

Part 1, Part 2, and Part 3 of this series provided an introduction to profiling and showed how to setup VTune. The first optimization was discussed in Part 4, in which the number of times printf is executed is reduced. The second optimization was discussed in Part 5, in which strlen got replaced with a much cheaper alternative. The third optimization was discussed in Part 6, in which the amount of computation required to report progress is reduced. The third optimization was discussed in Part 7, in which the function do_pswd was inlined into its caller. The following chart shows by how much each optimization improved password cracking throughput. Continue reading

The Art of Profiling Using Intel VTune Amplifier, Part 7

Previous parts of this series can be found at the following links: Part 1, Part 2, Part 3, Part 4, Part 5, and Part 6.

In Part 6, execution time was improved by 14% and the password cracking throughput became around 150 million passwords per second. Recall from Part 1 that the baseline throughput was around 3 million passwords per second. We have come a long way and we can still do more with the help of VTune. Continue reading

Why is JavaScript designed the way it is?


JavaScript is the programing language that makes the Web alive by integrating the HTML pages with client-side JavaScript code. Considering all the Web, I would expect that it contains billions of lines of JavaScript code. In addition, billions of people browse the Web every day. Therefore, designing and maintaining this language is an extremely critical undertaking. It pretty much determines the amount of effort and money required to develop dynamic Web pages. Continue reading