Writing optimizer-friendly code

by Jan-Erik Dahlin, IAR Systems
An optimizing compiler tries to generate code that is both small and fast. It does so by repeatedly
applying a number of transformations to the source program. Most optimizations follow mathematical or
logical rules based on a sound theoretical foundation. Other transformations are based on heuristics,
where experience has shown that some transformations often result in good code or open up
opportunities for further optimization.
There are very few cases where the optimizer relies on black magic, so the way you write your source
code can determine whether an optimization can be applied to your program or not. Sometimes small
changes in the source code can have a significant impact on the efficiency of the code the compiler generates.
This article will list a few things to keep in mind while writing your code, but let's start by making one
thing clear. Trying to write your code on as few lines as possible, using ?:-expressions, postincrements,
and comma expressions to squeeze a lot of side effects into a single expression, will not make the
compiler generate more efficient code. It will just make your source code convoluted and hard to
maintain. It is easy to overlook a postincrement or an assignment if it is done in the middle of a complex
expression. Write your code in a style that is easy to read.
Loops
What can go wrong in a simple loop like this?
for (i = 0; i != n; ++i)
{
    a[i] = b[i];
}
Well, nothing can go wrong, but there are several things that affect how efficient the code generated by the compiler can be.
The type of the index variable should match the pointer.
An array expression like a[i] really means *(&a[0] + i), which at the machine level adds i * sizeof(a[0]) bytes to the address of the first element, or in plain English: add the offset of the i-th element to a pointer to the first element of a. For pointer arithmetic it is generally a good
idea if the index expression has the same size as the pointer (except for __far pointers where the
pointer and the index expression have different sizes). If the type of the index expression is smaller than
the pointer type, the index expression has to be cast to the correct size before adding it to the pointer. If
stack space is more precious than code size it might be worth choosing a smaller type for the index
variable, but it will often cost both code size and execution time. The casts can also prevent several loop
optimizations.
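To illustrate the difference, here is a sketch with invented function names, assuming a 32-bit target where size_t has the same width as a data pointer. In the first loop the index can be added to the pointers directly; in the second, the 8-bit index has to be widened on every iteration:

#include <stddef.h>
#include <stdint.h>

/* Index type matches the pointer width: no cast is needed to form a[i]. */
void copy_size_t(int *a, const int *b, size_t n)
{
    size_t i;
    for (i = 0; i != n; ++i)
    {
        a[i] = b[i];
    }
}

/* 8-bit index on a 32-bit target: the index has to be widened to pointer
   size before it can be added to the pointer, and the casts can block
   several loop optimizations. */
void copy_uint8(int *a, const int *b, uint8_t n)
{
    uint8_t i;
    for (i = 0; i != n; ++i)
    {
        a[i] = b[i];
    }
}

The second version does use less space for the index variable, which is the stack trade-off mentioned above.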
The loop condition is important too. Many of the loop optimizations can only be performed if the number
of iterations can be computed prior to entering the loop. Unfortunately, it isn't always as simple as
subtracting the initial value from the final value and dividing by the increment. What happens if i is an
unsigned char, n is an int, and n has the value 1000? The variable i will overflow long before it
reaches 1000. It is unlikely that the programmer wanted an infinite loop that repeatedly copies 256
elements from b to a, but the compiler cannot divine the programmer's intentions. It has to assume the
worst and cannot apply any of the optimizations that need the trip count prior to entering the loop. You
should also avoid the relational operators <= and >= in loop conditions where the final value is a variable.
If the loop condition is i <= n there is a possibility that n has the highest value representable in the type,
so the compiler has to assume this is a potentially infinite loop.
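To make both pitfalls concrete, here is a sketch with invented function names. The first loop has a trip count the compiler can compute before entering the loop; the other two show the mismatched index type and the <= comparison discussed above:

#include <stddef.h>

void copy_ok(int *a, const int *b, size_t n)
{
    size_t i;
    /* The trip count is simply n, known before the loop is entered. */
    for (i = 0; i != n; ++i)
    {
        a[i] = b[i];
    }
}

void copy_wraps(int *a, const int *b, int n)
{
    unsigned char i;
    /* If n is 1000, i wraps around after 255 and never equals n, so the
       compiler has to treat this as a potentially infinite loop. */
    for (i = 0; i != n; ++i)
    {
        a[i] = b[i];
    }
}

void fill_inclusive(int *a, int n)
{
    int i;
    /* If n happens to be the largest value an int can hold, i <= n never
       becomes false, so the trip count cannot be computed up front. */
    for (i = 0; i <= n; ++i)
    {
        a[i] = 0;
    }
}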
Aliasing
Global variables are usually considered bad practice. Global variables can potentially be modified from
anywhere in your program, and any part of the program may depend on the value of a global variable.
This can result in complex dependencies, making it hard to understand the program and to identify which parts of the program might be affected if the value of a global variable is changed. From the optimizer's point of view it is even worse: a store through a pointer can potentially change the value of
any global variable. It is called aliasing when a variable can be accessed in more ways than one, and
aliasing makes the code much harder to optimize.
char *buf;

void clear_buf()
{
    int i;
    for (i = 0; i < 128; ++i)
    {
        buf[i] = 0;
    }
}
Even though the programmer might know that writing to the buffer which buf points to will never change
the variable itself, the compiler has to assume the worst and reload buf from memory on every iteration
of the loop.
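Expressed in source form, the generated code for the loop above has to behave roughly like this sketch (the function name is invented; this is not any particular compiler's output):

extern char *buf;

void clear_buf_expanded()
{
    int i;
    char *temp;
    for (i = 0; i < 128; ++i)
    {
        temp = buf;     /* buf has to be reloaded on every iteration... */
        temp[i] = 0;    /* ...because this store might modify buf itself */
    }
}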
Passing the address of the buffer as an argument instead of using a global variable will remove the
aliasing:
void clear_buf(char *buf)
{
    int i;
    for (i = 0; i < 128; ++i)
    {
        buf[i] = 0;
    }
}
After that small change the pointer buf cannot be affected by the stores through the pointer. The pointer
buf is invariant in the loop and the value can be loaded once, before the loop, instead of reloading it on
each iteration.
Global variables can be useful when passing information between sections of code that don't share a
caller/callee relation. If you do, use them only to pass the information. It is better to use auto variables for
anything computationally intensive, especially when computations also involve pointer operations.
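A minimal sketch of that advice, with invented names: read the global into an auto variable once, do the intensive work on the local copy, and write the result back once.

int total;   /* global used to pass a result between parts of the program */

void accumulate(const int *data, int n)
{
    int i;
    int sum = total;    /* copy the global into an auto variable once */

    for (i = 0; i < n; ++i)
    {
        sum += data[i]; /* the loop only updates the local, so no store
                           in the loop can alias the global */
    }

    total = sum;        /* write the result back to the global once */
}

If the loop updated total directly instead, the compiler would generally have to keep total in memory throughout the loop, since the store to total could alias the data that the pointer points to.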
Try to avoid postincrement and postdecrement
In the following, everything said about postincrements applies to postdecrements as well. The C
standard text for the semantics of postincrement states that "The result of the postfix ++ operator is the
value of the operand. After the result is obtained, the value of the operand is incremented". While it is
fairly common that microcontrollers have an addressing mode that can increment a pointer after a load
or store operation, very few microcontrollers can handle postincrement of other types with the same
efficiency. To comply with the standard the compiler may have to copy the operand to a temporary
variable before doing the increment. For straight-line code the increment can be lifted out of the
expression and placed after the expression. An expression like
foo = a[i++];
can be executed as
foo = a[i];
i = i + 1;
But what happens when the postincrement is a part of the condition in a while loop? There is no place
after the condition where the increment can be inserted so the increment must be done before the test.
A simple loop like
i = 0;
while (a[i++] != 0)
{
    ...
}
has to be executed as
loop:
    temp = i;          /* save the value of the operand */
    i = temp + 1;      /* increment the operand */
    if (a[temp] == 0)  /* use the saved value */
        goto no_loop;
    ...
    goto loop;
no_loop:
or
loop:
    temp = a[i];       /* use the value of the operand */
    i = i + 1;         /* increment the operand */
    if (temp == 0)
        goto no_loop;
    ...
    goto loop;
no_loop:
If the value of i after the loop is irrelevant, it is better to place the increment inside the loop body. The
almost identical loop
i = 0;
while (a[i] != 0)
{
    ++i;
    ...
}
can be executed without temporary variables:
loop:
    if (a[i] == 0)
        goto no_loop;
    i = i + 1;
    ...
    goto loop;
no_loop:
Developers of optimizing compilers are well aware of the complexities introduced by postincrements.
Even though we try our very best to identify these patterns and eliminate as many of the temporaries as
possible, there will always be cases where it is hard to generate efficient code—especially when the
loop conditions grow more complex than in the loop above. It is often better to split up a complex
expression into several simpler expressions, as when the loop condition above was divided into a test
and an increment.
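The classic string-copy idiom is a good illustration: it packs an assignment and two postincrements into the loop condition. Below is a sketch (not a drop-in replacement for any library routine) of the same copy written with the side effects as separate statements:

/* Everything happens in the loop condition: an assignment and two
   postincrements, and the assigned value is also needed for the test. */
void copy_string(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
    {
    }
}

/* The same copy with the side effects as separate statements. */
void copy_string_simple(char *dst, const char *src)
{
    while (*src != '\0')
    {
        *dst = *src;
        ++dst;
        ++src;
    }
    *dst = '\0';    /* copy the terminating zero as well */
}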
The choice between preincrement and postincrement is even more important in C++. Both operator++
and operator-- can be overloaded in both their prefix and postfix forms. When operators are
overloaded for class objects it isn't necessary to emulate the behavior of the operator on the basic
types, but it is certainly a good idea to keep the behavior as close to the basic types as possible. So
classes where it is intuitive to increment and decrement objects, e.g. iterators, typically have both the
prefix (operator++() and operator--()) and the postfix forms (operator++(int) and operator--(int)).
To emulate the behavior of the prefix ++ for a basic type, operator++() can modify the object and return
a reference to the modified object.
What about emulating the behavior of postfix ++ for a basic type? Do you remember? "The result of the
postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is
incremented." Just as for the non-straight line code above, the implementer of operator++(int) has to
copy the original object, modify the original object, and return the copy by value. The copying makes
operator++(int) more costly than operator++().
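As a sketch of how the two forms are conventionally written for a small iterator-like class (the class itself is invented for this example), note that only the postfix form has to create and return a copy:

class Cursor
{
public:
    explicit Cursor(int *p) : ptr(p) {}

    /* Prefix form: modify the object and return a reference to it. */
    Cursor &operator++()
    {
        ++ptr;
        return *this;
    }

    /* Postfix form: save a copy of the original value, modify the
       original object, and return the copy by value. */
    Cursor operator++(int)
    {
        Cursor old = *this;
        ++ptr;
        return old;
    }

private:
    int *ptr;
};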
For the basic types the optimizer can often remove unnecessary copying if the result of i++ is ignored,
but the optimizer cannot change a call to one overloaded operator into another. If you, out of habit, write
i++ instead of ++i you will call the more costly increment operator.
We have been arguing against the use of postincrement, but we have to admit that there are times when
postincrement is useful. If postincrementing a variable would do exactly what you want to do, go ahead,
use postincrement. But don't postincrement a variable in an expression just to avoid writing a separate
expression to increment the variable. Every time you add an unnecessary postincrement to a loop
condition, an if condition, a switch expression, a ?:-expression, or function call arguments, there is a
possibility that the compiler has to generate larger and slower code. Was that list too long to remember?
Start a new habit today! Write ++i instead of i++ when you don't use the result of the incrementation.
When you do use the result, ask yourself if you can't do the increment as the next statement.