Rethinking Data Types Algebraically

Posted on September 14, 2017

TL;DR

Algebraic data types (ADTs) are types that are created from combining sum and product types.

Sum types are inhabited by a distinct set of alternate values (think enumerations) and product types are inhabited by all the possible combinations of the composite parts (think tuples, records, or structs).

While these concepts are relevant in all programming languages and paradigms, ADTs hold a central place in functional programming and are represented elegantly in Haskell.

Hey! That sounds like Math!

If you’re a programmer you are probably fine with the “Data Types” part. You may even be fine with the “Rethinking” part. The “Algebraically” part might have you running for the hills though. I don’t know why. They have math in the hills. They aren’t animals.

Anyway, you needn’t fear, there will be precious little math in this post. All we really need is a basic understanding of an algebra. This is convenient because a basic understanding is all I have.

For the purposes of this post:

An algebra consists of a set of objects along with a set of operations on those objects. In an algebra we build new objects by passing existing objects to the operations.

For example, we could form an algebra where our objects are integers and our operations are addition and multiplication. We can get the integer 3 by passing 2 and 1 to the addition operation, i.e. 3 = 2 + 1. We can get the integer 16 by passing 2 and 8 to the multiplication operation, i.e. 16 = 2 * 8.

Not that scary after all, right?

Example Time!

Here is some Haskell.

data Department = Engineering | Marketing | Research
data EmployeeClassification = FullTime | PartTime

-- John Smith is an full time employee,
-- working in the Engineering department
johnSmith :: (Department, EmployeeClassification)
johnSmith = (Engineering, FullTime)

For the uninitiated, here is a basic overview:

  • We define two types, Department and EmployeeClassification, which can each be constructed by using one of a choice of data constructors. In this case the data constructors are all parameter-less, effectively making these types enumerations.
  • We declare a (Department, EmployeeClassification) tuple, which represents a full time employee that works in the Engineering department.

With that basic explanation out of the way, let’s ask ourselves a few questions.

  1. How many inhabitants (i.e. distinct values) of the Department type are there?
  2. How many inhabitants of the EmployeeClassification type are there?
  3. How many inhabitants of the (Department, EmployeeClassification) type are there? That is, how many combinations of Department and EmployeeClassification exist?

Questions 1 and 2 are easy enough to answer. We can just count the number of data constructors. Put another way, replace every data constructor with a 1, then sum the ones. So for the Department type we have Engineering | Marketing | Research ==> 1 + 1 + 1 = 3. Similarly, for the EmployeeClassification type we have FullTime | PartTime ==> 1 + 1 = 2.

Question 3 requires a little bit of work to answer. We can enumerate all the options by hand (see below) or we can dust off that set theory textbook and recognize this as a cartesian product (or cross join in SQL parlance). That textbook would tell us that the cartesian product of a set that contains 3 elements (our Department type) and a set that contains 2 elements (our EmployeeClassification type), is 3 x 2 = 6, which matches what we get via our brute force method applied below.

(Engineering, FullTime)
(Engineering, PartTime)
(Marketing, FullTime)
(Marketing, PartTime)
(Research, FullTime)
(Research, PartTime)

We have mentioned sums and products. They sure do sound like things that might be involved in algebra…

So, where is the algebra?

We said that an algebra is the combination of a set of objects and a set of operations that operate on those objects. So how does this relate to Algebraic Data Types?

Let’s say that the objects in our algebra are types. What would our operations be? That is, how do we build new types from existing types? Well, we saw two ways of doing that.

Firstly, we can define a type as something constructed by one of an alternate set of data constructors, e.g. data Department = Engineering | Marketing | Research. We also made the observation that the cardinality of this set can be calculated via a sum operation, i.e. counting the constructors. So, we could say that one of the operations that allow us to make new types is a “sum operation”, where the inhabitants of the new type are made up from the union of the alternate choices. Types constructed from sum operations are called sum types.

Secondly, we saw that we can define a new type by combining other types, e.g. our (Department, EmployeeClassification) tuple. The inhabitants of this type are enumerated by applying a product operation (the cartesian product in set-theoretic terms) to the composite types. So, we could say that we were able to make a new type via the use of a “product operation”. Types constructed from product operations are called product types.

In glorious conclusion, when we talk about Algebraic Data Types what we are referring to are composite types that are built from a combination of sum and product operations. Simples … right?