Type Systems

And applications to type hinting in Python

Pierre Glaser, Gatsby Computational Neuroscience Unit

Brief Summary




  • Python is a dynamically typed language, allowing for rapid prototyping at the cost of delegating error checking at runtime.
  • Type hints allow user to reintroduce AOT type safety enforcment into Python in a gentle manner (gradual adoption, compile-time enforcement).
  • Type Hints were introduced in 2014, and their development have been active ever since (14 accepted PEPs on Typing!). Expect new features regularly


Talk Outline



  • A brief intuition for Type Systems in Computer Science
  • An overview of type systems, with application to Python
  • The benefits and frontier of Type Hints (LSPs, typed tensors, variadic generics)

Type Systems in Programming Languages

Entities in our world can often be clustered into sets, composed of elements with similar characteristics. Human Beings, cups, rocks, etc... These common characteristics can be

  • attributes
  • internal operations (growing)
  • interactions with other elements (conversating with someone)

Type Systems in Programming Languages

Symetry between such sets is broken by constraining behavior:
  • a cup of coffee cannot breathe
  • I cannot conversate with a rock
  • ...

Type Systems in Programming Languages

A similar structure can be observed in mathematical objects: Given a commutative group $(E, +_E)$ over a field $(\mathbb F, +_F, \cdot_F) $, and an external law $\cdot: \mathbb F \times E \longmapsto E$, E is called a $\mathbb F$-vector space if:

  • $(\lambda_1 +_F \lambda_2) \cdot e = \lambda_1 \cdot e +_E \lambda_2 \cdot e$
  • $\lambda_1 \cdot (e_1 +_E e_2) = \lambda_1 \cdot e_1 +_E \lambda_1 \cdot e_2$

    Elements of $E$ thus expose a set of valid operations that can be applied to them: $e_1 + e_2$, $\lambda \cdot e$. Anything else is a prior "invalid".

Type Systems in Programming Languages

  • progams simulate evolution of entities or mathematical objects, ⇒ same kind of constraints.
  • The specification of a programming language's entities and their interactions is called a type system.

Type Systems in Programming Languages

  • A general purpose computer only recieves a sequence of bytes, and makes no intrisnic distinction between them.
  • Above assembly level, every language comes with an engines that checks the validity (or "type safety") of the instructions given to it before translating it into machine code.

but when?

A first very simple example

Let's look at a very simple custom type.
						
class TwoDVec:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(self, other):
        return TwoDVec(self.x + other.x, self.y + other.y)


## the TwoDVec class complies with the following type system:
class TwoDVec:
    x: float  # important!
    y: float  # important!

    def add(self: TwoDVec, other: TwoDVec) -> TwoDVec:
        ...
						
						

Type Systems in Programming Languages

Type Safety can be enforced

  • Ahead of Time, using a "compiler" (like in C)
  • At runtime, or "just in time" (like in Python)


The second categories of languages are more flexible to protopye with, but do not ensure the safety of programs using a compiler.

Benefits of Static Typing

Benefits of AOT type-checking:

  • Checking type safety is delegated to external tools ⇒ less burden on programmer
  • AOT type checkers understand a program's type structure and can be leveraged by IDE tools
  • ⇒ positive feedback loop where the programmer is incited to leverage IDE tools by creating more maintainable, type-checkable code

Type Hints in Python

  • Python=runtime type-safety enforcment
  • Since 2014, it is possible to define type rules of Python Objects
  • These rules have no impact at runtime
  • Purpose: be leveraged by type-checkers and other tools to peform type-aware development, which has many advantages.
There are two ways to speficy type rules in python:

  • using separate files (stub files) to define type rules
  • directly within the definition of constructs

Option 1: separating rules and implementations

						
# file 2Dvector.py
class TwoDVec:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(self, other):
        return TwoDVec(self.x + other.x, self.y + other.y)


# file TwoDVec.pyi
# the folowing is a type rule: is ensures that all objects of the TwoDVec class verify the following rules:
class TwoDVec
    x: float  # important!
    y: float  # important!

    def __init__(self, x: float, y: float) -> None: ...
    def add(self: TwoDVec, other: TwoDVec) -> TwoDVec: ...
						
						

Option 2: Baking In the Rule in the Implementation

Since Python 3.5, it is possible to speficy type rules directly within Python Expressions:
						
# file 2Dvector.py
class TwoDVec:
    x: float
    y: float

    def add(self: TwoDVec, other: TwoDVec) -> TwoDVec:
        return TwoDVec(self.x + other.x, self.y + other.y)
						
						
This code will behave the same as its untyped counterpart. Let us now dig deeper into the structure of types and type systems.

Types and Types Systems

Types and Types Systems

Sofware constructs are hierarchical, compositional, and structure-sharing. Type system should reflect that. We will cover:

  • Basic Types
  • Subtypes (and Subtype Polymorphism)
  • Composite Types
  • Type Variables (and Parametric Polymorphism)


And the type relationship (or "variance") between them.

Basic (Monomorphic) Types

						
class TwoDVec:
    x: float
    y: float

    def add(self: "TwoDVec", other: "TwoDVec") -> "TwoDVec":
        return TwoDVec(self.x + other.x, self.y + other.y)
						
						
TwoDVec: nullary type constructor.

Subtypes

TwoDVec elements called RateOfChange
						
class RateOfChange(TwoDVec):
    x: float
    y: float

    def add(self: "RateOfChange", other: "RateOfChange") -> "RateOfChange":
        return RateOfChange(self.x + other.x, self.y + other.y)

    def total_change(self, T: float) -> "TwoDVec":
        return TwoDVec(self.x * T, self.y * T)
						
						
Key Concept: Subtype Polymorphism: coercion subtype instances to their parent type should be possible.

Composite Types

						
from typing import Self


class ThreeDVec:
    x: float
    y: float
    z: float

    def add(self: Self, other: Self) -> Self:
        return ThreeDVec(self.x + other.x, self.y + other.y, self.z + other.z)


from typing import Union
TwoOrThreeDVec = Union[TwoDVec, ThreeDVec]

def get_dimension(v: TwoOrThreeDVec) -> int:
    return 2 isinstance(v, TwoDVec) else 3
						
						
Union: binary type constructor.

Question: is TwoOrThreeDVec a subtype of TwoDVec, or the other way around?
Inherent tension between type structure and the timeline of programming:

  • the more generic structures often arrive at the end of development (when constructs and code logic need to be generalized)
  • While the most specific types happen during prototyping. Top Down programming is hard!

Product Types

						
from typing import Tuple
TupleOfTwoDVec = Tuple[TwoDVec, TwoDVec]

def add(t: TupleOfTwoDVec) -> TwoDVec:
    return t[0].add(t[1])  # valid
    # return t[2].add(t[3])  # invalid
						
						


Data Types

						
from typing import List
from typing_extensions import Self
from functools import reduce


ListOfTwoDVec = List[TwoDVec]


def add_all(l: ListOfTwoDVec) -> TwoDVec:
    return reduce(lambda x, y: x.add(y), l)
						
						

Subtyping relationships in Data Types

Suppose we define the following data types:
						
TupleOfTwoDVec = Tuple[TwoDVec, TwoDVec]
TupleOfTwoOrThreeDVec = Tuple[TwoOrThreeDVec, TwoOrThreeDVec]

ListOfTwoDVec = List[TwoDVec]
ListOfTwoOrThreeDVec = List[TwoOrThreeDVec]
						
						
						
  • is TupleOfTwoOrThreeDVec a subtype of TupleOfTwoDVec?
  • is ListOfTwoOrThreeDVec a subtype of ListOfTwoDVec?

Subtyping relationships in Data Types

To answer these questions, let us expose the type rules of

  • Tuple[TwoDVec, TwoDVec] vs Tuple[TwoOrThreeDVec, TwoOrThreeDVec]
  • List[TwoDVec] vs List[TwoOrThreeDVec]

one by one.

Subtyping relationships in Immutable Product Types

						
class TupleOfTwoDVec:
    def __getitem__(self, index: int) -> TwoDVec: ...

class TupleOfTwoOrThreeDVec:
    def __getitem__(self, index: int) -> TwoOrThreeDVec: ...


tuple_of_two_d_vector: TupleOfTwoDVec = ...
tuple_of_two_or_three_d_vector: TupleOfTwoOrThreeDVec = ...
						
						
  • Calling tuple_of_two_d_vector[0] returns a TwoDVec, which is a subtype of TwoOrThreeDVec, and can thus be used in lieu of instances of such types.
  • Thus, it is safe to use a TupleOfTwoDVec in place of TupleOfTwoOrThreeDVec.


⇒ There is a subtyping hierarchy for immutable product types.

Subtyping relationships in Mutable Product Types

						
class ListOfTwoDVec:
    def __getitem__(self, index: int) -> TwoDVec: ...
    def __setitem__(self, index: int, value: TwoDVec) -> ListOfTwoDVec: ...


class ListOfTwoOrThreeDVec:
    def __getitem__(self, index: int) -> TwoOrThreeDVec: ...
    def __setitem__(self, index: int, value: TwoDVec) -> ListOfTwoOrThreeDVec: ...


two_d_vector: TwoDVec = ...
two_or_three_d_vector: TwoOrThreeDVec = ...

l_2dvec: ListOfTwoDVec = ...
l_23dvec: ListOfTwoOrThreeDVec = ...
						
						
  • Calling l_2dvec.set(l_23dvec) is not a syntax defined by the typing rules above. Thus, ListOfTwoDVec is not a subtype of ListOfTwoOrThreeDVec.
  • Similarily, ListOfTwoOrThreeDVec is not a subtype of ListOfTwoDVec because calling l_23dvec[0] in lieu of l_2dvec[0] will return a supertype of TwoDVec, and this thus not type safe.

⇒ No subtying hierarchy can be defined for mutable product types!

Type Variables, Parametrized Types


  • We denote by T a type variable, e.g. a variable whose array of possible concrete values is (for now) any type ( float, int, ...)

  • Parametrized Types are types constructed using a type variable.

  • For instance, List[T] is a type whose array of possible concrete values is any type T for which List[T] is a valid type.


Careful! T is not a type, it is a type variable. Thus, List[T] is not a type, either.

Variance in Parametrized Types

let T be a Type Variable, and C be a unary type constructor (e.g. List). For two type variables T1 and T2 we note T1T2 if T1 is a subtype of T2 (a partial order on types). Then

  • C is covariant (w.r.t its argument) if T1T2C[T1]C[T2]
  • C is contravariant (w.r.t its argument) if T1T2C[T2]C[T1]
  • C is invariant if is is neither covariant nor contravariant.

In our case, Tuple is covariant (w.r.t its argument), while List is invariant. What type constructor be contravariant?

Parametric Types in Python

						
from typing import TypeVar, Generic

T = TypeVar("T")


RealOrComplex = TypeVar("RealOrComplex", float, complex)


class TwoDVector(Generic[RealOrComplex]):  # defines a type constructor/abstract type

    x: RealOrComplex
    y: RealOrComplex

    def __init__(self, x: RealOrComplex, y: RealOrComplex):
        self.x = x
        self.y = y

    def add(self, other: "TwoDVector[RealOrComplex]") -> "TwoDVector[RealOrComplex]":
        return TwoDVector(self.x + other.x, self.y + other.y)
						
						
						

Parametric Types in Python

						
class RealTwoDVector(TwoDVector[float]):  # concrete type
    pass


class ComplexTwoDVector(TwoDVector[complex]):  # ditto
    pass


real_vec = RealTwoDVector(1.0, 2.0)
complex_vec = ComplexTwoDVector(1.0j, 2.0j)

real_vec.add(real_vec)  # OK
real_vec.add(complex_vec)  # type error
						
						
						

Function Types

The type of a function is uniquely determined by its signature:
						
from typing import Callable

def inc(x: int) -> int:
    return x + 1


def higher_order(f: Callable[[int], int], x: int) -> int:
    return f(x)


ret = higher_order(inc, 1)  # type safe 
						
						

  • f is a function of type int -> int, or, in Python syntax, Callable[[int], int].
  • function type constructors can take a variable number of arguments
  • the constructor themselves are instances of different types, called "kind", or "higher kinded type".

Variance of Function Types

  • Function Types are contravariant in their arguments and covariant in their return type.
  • Proof by idiom: "If you expect less, you can have more, while if you promise less, you can give more"
  • Exercise: explain why the variance of function types sheds a new light on the variance of List and Tuple.

Even More Types in Python

  • There are more syntactic sugar-type features for types in the Python type system, but these cases cover all the most fundational constructs (apart from structural subtyping).
  • Note that this introduction to type systems is language agonstic, and uniquely grounded in type theory. Python was only used as an application language.
  • You just learned something very general! After this tutorial, it will be much easier to read and understand heavily typed languages such as C++, Java, Haskell...

Benefits of Typing

  • All the type information that you expose in your program can be leveraged by Language Servers and Static Type Checkers to verify the type safety of your program, and provide intelligent completion
  • Type annotations also serve a documentation purpose as they explicit the relationship between constructs, and the nature of the arguments of functions, and attributes of classes.
  • Reintroducing a compilation step in Python is very usefule for refactoring and modularity in large codebases
  • Feel free to ask me for examples and return of experiences about end-to-end use typing in real-world projects!

The frontier of Typing

  • The common denominator for all of us is array (or "tensor") processing, which would involve shape and dtype information.
  • Under type system for tensor processing would allow to type-check tensor operations (addition, reduction, etc.).
  • Defining such a type system required an API for variadic parametrized types handling, a challenging problem which was only implemented recently.


Type Checking tensor operations would remove one of the biggest source of bugs in deep learning codebases.