Be careful when using MMX or floating-point instructions in Delphi 2007


Starting with Delphi 2007 (or perhaps even 2006), code from the FastCode project (including FastMM) is used in the core runtime. This helps application performance a lot.

However, some of the FastCode procedures use floating-point (x87) instructions to move memory quickly. For example, FillChar uses FLD/FST to fill memory. Another example is an internal procedure called MoveX16L4 in FastMM (in the file getmem.inc). It is used when reallocating memory, and it also uses x87 instructions.

There is nothing wrong with a procedure using floating-point instructions. However, because this code sits at the lowest layer of the runtime (FillChar is called heavily, implicitly or explicitly), it can get you into trouble if you write MMX or floating-point instructions by hand.

What's the trouble? There are only eight FPU data registers, organized as a stack, and the MMX registers share them. Executing MMX instructions marks all eight as "used". If FillChar or a memory reallocation is then called before an EMMS instruction (which resets the FPU registers to "empty"), the x87 load inside it finds no free register and a "Float Stack Overflow" exception is triggered.

Suppose a piece of code alpha-blends an image pixel by pixel. The pseudocode may look like this (it comes from a real project of mine):
for X := 0 to W - 1 do
begin
  for Y := 0 to H - 1 do
  begin
    BlendPixel(X, Y);
  end;
end;
EMMS; // leave MMX mode
BlendPixel uses MMX, and for performance reasons it doesn't execute EMMS itself.
If FillChar or a memory reallocation happens anywhere inside those two loops, or just before EMMS, a "Float Stack Overflow" exception will be raised.
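
One way to stay safe, sketched here under some assumptions, is to execute EMMS before any call that might reach FillChar or the memory manager, not only once at the very end. The BlendPixel body below is only a stand-in for the real one, LeaveMMX and BlendImage are hypothetical names, and the code assumes the 32-bit compiler with its built-in assembler.

```delphi
// Stand-in for the real BlendPixel (hypothetical body): it touches an
// MMX register and deliberately does not execute EMMS.
procedure BlendPixel(X, Y: Integer);
asm
  pxor mm0, mm0
end;

// Hypothetical helper: marks all FPU registers as empty again.
procedure LeaveMMX;
asm
  emms
end;

procedure BlendImage(W, H: Integer);
var
  X, Y: Integer;
  Scratch: array[0..255] of Byte;
begin
  for X := 0 to W - 1 do
  begin
    for Y := 0 to H - 1 do
      BlendPixel(X, Y);                     // MMX state is live from here on
    LeaveMMX;                               // leave MMX mode before any RTL call
    FillChar(Scratch, SizeOf(Scratch), 0);  // FPU registers are empty again
  end;
end;
```

Executing EMMS once per column rather than once per pixel keeps most of the benefit of hoisting it out of BlendPixel, while no runtime call is ever made with the MMX state still live.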

The same can happen with plain floating-point instructions. If FillChar or a memory reallocation happens while the FPU data stack is full, an exception will be raised.
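
For plain x87 code, the same rule can be followed by making every hand-written routine leave the FPU stack as it found it, apart from the result in ST(0). Here is a minimal sketch, with Scale and ScaleThenClear as hypothetical names and the 32-bit built-in assembler assumed:

```delphi
// Hypothetical helper: multiplies Value by Factor in hand-written x87 code.
function Scale(const Value, Factor: Double): Double;
asm
  fld  Value     // push Value; one FPU register is now in use
  fmul Factor    // ST(0) := Value * Factor, stack depth unchanged
  // The Double result is returned in ST(0); the caller pops it.
end;

function ScaleThenClear(V: Double): Double;
var
  Scratch: array[0..63] of Byte;
begin
  Result := Scale(V, 0.5);                // result is stored to memory here
  FillChar(Scratch, SizeOf(Scratch), 0);  // called with nothing left on the FPU stack
end;
```

A routine that left extra values on the stack would not fail by itself, but it would eat into the eight registers that FillChar and the memory manager expect to find available.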

So if you are optimizing code with MMX, or doing a lot of floating-point calculation, be careful and keep this hint in mind.

Before Delphi 2007, I didn't realize that code like FillChar or the memory manager could have any connection with the FPU. Now it's time to change my mind (and maybe yours too). Still, in my opinion, using FPU instructions heavily in the core runtime code is not a very good idea.

I found this problem while debugging Denomo. Because Denomo reallocates memory even in FreeMem, the problem shows up more often there.
