Thursday, August 25, 2011

C, assembly, and security

Let's look at the innocent C statement:
    a = b + c;
What could possible go wrong?  Let us list the ways:
  1. a, b, or c are the not the variables we want
  2. We specified addition, but we wanted something else
  3. The addition operation overflows
  4. a and b are unsigned, while c is signed and negative; the result becomes unsigned
  5. a has a smaller size than b or c; the assignment operation overflows
  6. a is signed while b or c are unsigned; a becomes negative while b+c are positive
  7. a is unsigned, while b and c are signed; again we have an overflow
  8. We used the wrong indentation level and people are unhappy about it
We can't expect the language to prevent all of these errors, but can we make C safer by trapping some of them at least during runtime? Turns out we can't do that without sacrificing performance:

  • to handle (3), we need trapping signed addition and trapping unsigned addition instructions
  • to handle (4), we need a mixed signedness trapping add
  • to handle (5), we need a trapping store unsigned and trapping store signed instructions, which check that the value in the register fits into the memory location specified
  • ditto with (6) and (7)
these (and similar) issues show up regularly in security vulnerabilities; it's hard to fix them because the necessary processor instructions are not there. We could emulate them by using sequences of existing instructions, but that would bloat the code and hurt performance; since performance is something that can be benchmarked but security is not, we end up with exploitable code.

So why are those instructions missing? In the 70's and 80's when the industry was ramping up performance was a much greater concern than security. Code was smaller and easier to audit; CPU cycles were longer and therefore more important to conserve; networks were small and private; truly malicious attacks were rare.

An unvirtuous cycle followed: C tried to make the most of exising processors, so its semantics mimic the instruction set of those days. It then became wildly successful, so processors were optimized for running C code; naturally they implemented or optimized instructions which directly translated to C concepts. This made C even more popular.

A pair of examples from the x86 world are the INTO and BOUND instructions. INTO (INTerrupt on Overflow) can follow an addition or subtraction instruction, effectively turning it into a trapping signed instruction. BOUND performs an array subscript bounds checking, trapping if the index is out of bounds. But the first implementations were rarely used, so they were not optimized in later iterations of the processor. Finally, the 64-bit extensions to the x86 instruction set removed those two instructions for good.