Dependent Type

Sometimes it feels like programming languages didn't really change from the 60s up to now. When I feel like that, I often remind myself of the cool tools and features we have now that make our lives easier. We now have things like an integrated debugger, unit tests, static analysis, cool IDEs, typed arrays and others. Language progress is slow and iterative, there are no silver bullets that will come in and change the game.

Today I want to tell you about the next step in this iterative process. People are still researching this technology, but it has the potential to enter mainstream languages soon. And it all starts with one of the most fundamental concepts in computer science: types.

The World of Types

Types are one of those things that are so integrated into our thinking that we hardly ever think about the concept itself. Why is 1 an int but you put quotes around it suddenly turns into a string? What is a type, really? Like a lot of things in programming, it depends on who you ask.

Types have many forms. Some type systems draw a very hard line between types and values. 3, 2 and 1 are all values of the type integer, but integer is not a value. It is a "baked in" language construct that is fundamentally different from a value. But this distinction is not necessary, and can be limiting.

If you free up types and make them values like any other, you enable some cool possibilities. Values can be stored, transformed and passed to functions. This means you can have functions that take a type as a parameter, creating generic functions: functions that can work with multiple types without overloading. You can have an array of values of a specific type, as opposed to doing weird pointer arithmetic and casting like you do in C. You can also construct new types wile the program is running, enabling things like automatic JSON deserialization.

But even if you treat types as values, there is still a rift between what you can do with types and what you can do with values. With user instances, you can do things like compare their names, check their age or id, etc.

if user.name == "Marin" && user.age < 65 {
    print("You can't retire yet!")
}

However, if you're doing the same with a User type, you can only compare the type name and maybe the property names. Since it's a type and not an instance, you can't check the values of its properties.

if typeof(user) == User {
    print("Well, it's a user. That's all I know")
}

Wouldn't it be cool if you could have a function which can only receive a non-empty list of users? Or a function which would receive only a well-formatted email address? For that to work, you would have to have a type "non empty array" or a type "email address". You would have to have a type which depends on a value, or a dependent type. In mainstream languages, this isn't possible.

In order for types to be useful, the compiler needs to verify them. If you say that a variable contains an integer, it better not contain a string, otherwise the compiler will complain. This is good, it stops you from writing bugs. Checking the types is pretty easy: if a function returns an integer, and you try to return "Marin", that's an error.

But with dependent types things get more complicated. The problem lies in when the compiler checks the types. How can you check that an array contains exactly three values, when the program is not even running yet? How can you check that an integer is greater than 3, if that integer is not even assigned yet? The answer: magic... or, in other words, math. If you can mathematically prove that a set of integers is always larger than 3, then the compiler can check it.

Math Time!

You use mathematical induction for creating proofs. Induction is a way of proving, without doubt, that something is true. For instance, you want to prove that the following mathematical formula is true for every positive number:

1 + 2 + 3 + ... + x = x * (x + 1) / 2

There are infinite possible xs, so it would take us a very long time to check every number by hand. Thankfully, we don't need to do that. All we have to do is prove two things:

  1. The statement is true for the first number. (Usually this is 0 or 1)
  2. If the statement is true for one number n, it is true for the next number n + 1

Because it's true for the first number and also for every successive number, we know that the statement is true for all possible numbers.

Proving that it's true for 1 is pretty easy:

1 = 1 * (1 + 1) / 2
1 = 1

Now we also need to prove that it works for all other numbers. We do this by assuming it works for some number n, and check that it also works for n + 1.

assuming this is true:
1 + 2 + 3 + ... + n = n * (n + 1) / 2

we check the case for n + 1:
(1 + 2 + 3 + ... + n) + (n + 1) = (n + 1) * ((n + 1) + 1) / 2
we can replace "(1 + 2 + 3 + ... + n)" with the equation above
(n * (n + 1) / 2) + (n + 1) = (n + 1) * ((n + 2) / 2)
and we simplify to
(n + 1) * (n/2 + 1) = (n + 1) * (n/2 + 1)

Since both sides of the equals are the same, we know that the statement is true. This is one of the ways you can prove statements without having to manually go through each case, and this is the basis of how dependent types work. You write mathematical proofs that a statement about a type is true.

The beauty of this is that all mathematical proofs can be written as computer programs, which is exactly what we need!

Back to Programming

So we established that we can prove things are true before we have a concrete value. To do this in an actual programming language, we need a way to encode these statements into the type system itself, which means our type system needs an upgrade.

Let's look at an example. Say we have an append function which takes two arrays and combines them. Usually, the function signature would look something like this:

append: (arr1: Array, arr2: Array) -> Array

But just looking at this signature, we can't be sure the implementation is correct. Just because the function returns an array, doesn't mean it actually did something. One way to check the result is to make sure the length of the result array is a sum of the lengths of the parameter arrays.

newArray = append([1], [2, 3])
assert(length(newArray) == 3)

But why check this at runtime, when we can have create a constraint that the compiler can check while the program is compiling:

append: (arr1: Array, arr2: Array) -> newArray: Array
         where length(newArray) == length(arr1) + length(arr2)

We declare that append is a function taking two Array arguments and returning a new Array argument which we named newArray. Only this time we add the caveat the length of the new array needs to be the same as the sum of the lengths of the function's arguments. We're turning the above run-time assertion into compile-time type.

The above code is in the world of types, not values, so the == is comparing the return type of length, and not its value. For this to work, the return type of length needs to tell us something about the actual number.

The way to make this work is to make sure each number is a separate type. The type One can only contain one possible value: 1. Same for Two, Three and all other numbers. Of course this is really tedious work, but that's why we have programming. We can create a compiler that does this for us.

Once we've done that, we can create a separate type for an array holding 1, 2, 3 and other numbers of items. ArrayOfOne, ArrayOfTwo, etc.

With that, we can define a length function which will take one of the above array types, and have a dependent return type of One for ArrayOfOne, Two for ArrayOfTwo and so on, for each number.

Now that we have a separate type of every array length, we can make sure (at compile time) that two arrays have the same length by comparing their types. And since types are values like any other, we can define operations on them. We can define addition on the actual types, defining that the sum of ArrayOfOne and ArrayOfTwo is ArrayOfThree.

This is all the info a compiler needs to make sure what you're writing is correct.

Let's say we want to create a variable of type ArrayOfThree:

result: ArrayOfThree = append([1], [2, 3])

The compiler can tell that [1] has only one value, so it can assign the type ArrayOfOne. It can also assign ArrayOfTwo to [2, 3].

The compiler knows that the type of result must be equal to the type of the first argument, plus the type of the second argument. It also knows that ArrayOfOne + ArrayOfTwo is ArrayOfThree, so it knows that the whole expression on the right hand side is of the type ArrayOfThree. This matches up with that's on the left, which makes the compiler happy.

If we tried to write the following:

result: ArrayOfTwo = append([1], [2, 3])

The compiler would not be happy at all, because it knows that the type is wrong.

Dependent Typing is Pretty Cool

This makes a huge number of bugs simply impossible to create. With dependent typing, you can avoid off-by-one errors, accessing non existent array indices, null pointer exceptions, infinite loops and non performant code.

You can express almost anything with dependent types. A factorial function which only accepts natural numbers, a login function which doesn't accept empty strings, a removeLast function which only accepts non-empty arrays. And all this is checked before you run the program.

The problem with runtime checks is that they fail when the program is already running. This is okay if you're the one running it, but not if the user is. Dependent types let you move those checks to the type system itself, making it impossible to fail while the program is running.

I believe dependent typing is the future of mainstream programming languages, and it can't come soon enough!

References:

Proof that all mathematical proofs can be written as programs
https://en.wikipedia.org/wiki/Curry–Howard_correspondence

Idris, a dependently typed language
https://www.idris-lang.org

F*, another dependently typed language with a cool looking logo
https://www.fstar-lang.org/tutorial/

Adding dependent typing to JavaScript
http://goto.ucsd.edu/~ravi/research/oopsla12-djs.pdf

Unit in the Last Place

Referential Transparency