More on static typing

Lately, I've been thinking about static typing.

In this post, I will use the following definition: static typing is a system that can prove certain properties relating to the data types of a program, given only the text of that program.

Static typing, defined in this way, has several practical uses:

  • detecting certain classes of errors in a program;
  • optimizations;
  • coding assist.

Formally, we may write the static typing system as a predicate ST(Prop, Prog), where Prop is some property (i.e., a predicate) and Prog is some program text. ST checks whether Prop is satisfied by the text of Prog. Having such an operation, we can substitute relevant predicates and programs to see whether those properties hold.

Some examples of properties Prop:

  • “Variable V has type T”;
  • “Operation O is applicable to variable V”; etc.

But that's just the ideal case. In reality, we cannot construct such a predicate ST even for simple cases (e.g., for Prop being “Variable V has type T”) — only some approximation of it. Therefore, ST is necessarily not a predicate defined everywhere; rather, it can be viewed as a function over three-valued logic, a function returning a “fuzzy” value, or a partial boolean function (that is, a function defined on only part of its domain).
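To make the three-valued view concrete, here is a toy sketch in Python of an ST-style checker for the single property “variable V is bound to a value of type T”. The function name `st_var_has_type` is hypothetical, and the checker is deliberately crude: it only decides the property when the variable is assigned a literal, and returns `None` (“unknown”) otherwise — exactly the kind of approximation described above.

```python
import ast

def st_var_has_type(prog, var, typ):
    """Toy three-valued approximation of ST(Prop, Prog) for the property
    'variable var has type typ'. Returns True, False, or None (unknown)."""
    tree = ast.parse(prog)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == var:
                    if isinstance(node.value, ast.Constant):
                        # The assigned value is a literal: decidable.
                        return type(node.value.value) is typ
                    # Value is a call, expression, etc.: undecidable here.
                    return None
    return None  # variable never assigned in this program text

print(st_var_has_type("x = 1", "x", int))    # True
print(st_var_has_type("x = 'a'", "x", int))  # False
print(st_var_has_type("x = f()", "x", int))  # None (unknown)
```

A real type checker is vastly more sophisticated, but the shape of the result is the same: proven, refuted, or undecided.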

In other words, static typing may not return an exact answer to the question of whether the property Prop holds over the text of a program Prog. Unfortunately, this is often glossed over.

Now let's consider a simple program in C# (just as an example, this is not really language-specific):

object a = 1;
object b = 2;
Console.WriteLine("{0}+{1}={2}", a, b, a + b);

This program will not compile. And it is easy to see that if the language's rules were built upon dynamic typing instead of static typing, the result would be defined and correct — namely, the program would output “1+2=3”.

And here is the equivalent Python program:

a = 1
b = 2
print("%d+%d=%d" % (a, b, a + b))

This program executes normally and prints the expected string “1+2=3”.

On one hand, as this simple example shows, static typing restricts the programmer — not all correct programs can be written. On the other hand, fully dynamic typing (that is, an outright rejection of static typing) does not restrict the programmer's ability to write correct programs, but as a trade-off, the compiler cannot catch large classes of errors that a static type system finds easily.

It is possible to give a more complex example. Consider a program with two parts: the first dynamically generates other programs, and the second invokes a generated program. Suppose also that we need to ensure that the generated program is correct and is used correctly. Dynamically this is quite possible, and it is routinely done. The whole question is whether it is doable statically. We can either outright forbid programs that are not provable statically, or we can provide so-called “escape hatches” (such as dynamic type casts).
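A minimal sketch of this scenario in Python, using `compile` and `exec` (the generator `make_adder_source` and the generated function name `add` are illustrative only). The static checker cannot, in general, look inside the generated source string, so the program substitutes a dynamic check where a static proof is unavailable:

```python
def make_adder_source(name):
    # Part one: dynamically generate the text of another program.
    return f"def {name}(x, y):\n    return x + y\n"

src = make_adder_source("add")

# Part two: compile and load the generated program at run time.
namespace = {}
exec(compile(src, "<generated>", "exec"), namespace)
add = namespace["add"]

# Dynamic check in place of a static proof: verify that the generated
# function exists and behaves as required before trusting it.
assert callable(add) and add(1, 2) == 3

print(add(40, 2))  # 42
```

No mainstream static type system can assign `add` a precise type from the string `src`; the choice is between forbidding this pattern and checking it dynamically.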

Advances in type systems (such as templates, GADTs, and dependent types) have shrunk (but not fully eliminated) the set of programs whose desired properties static typing cannot prove.

Therefore, I want to see the following features of static typing in programming languages:

  • If static typing can prove that some property required for correctness is not satisfied, this should be a compile-time error.
  • If static typing cannot prove the correctness of a program, this should be a compile-time warning rather than a compile-time error. Alternatively, it could be a compile-time error by default, with the programmer able to annotate such code to use dynamic property checking at that particular location.
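The second point, in practice, looks like a localized dynamic check at exactly the spot where the static proof fails. A hedged sketch in Python (the function `parse_config` and the `"port"` key are invented for illustration): a checker cannot prove that a value pulled from an untyped dictionary is an `int`, so instead of rejecting the program, the programmer opts in to a run-time check at that one location.

```python
from typing import Any

def parse_config(raw: dict) -> int:
    # The static checker cannot prove raw["port"] is an int. Rather than
    # making this a hard error, the programmer annotates the spot with an
    # explicit dynamic check -- the "escape hatch".
    value: Any = raw["port"]
    if not isinstance(value, int):
        raise TypeError("port must be an int")
    return value

print(parse_config({"port": 8080}))  # 8080
```

The failure mode is now a precise run-time error at the annotated location, rather than either a blanket compile-time rejection or a silent mistake.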