We will now learn how to define more expressive types than the basic ones supplied with the core OCaml language.
# type vehicle = Bike
| Motorbike
| Car
| Lorry;;
type vehicle = Bike | Motorbike | Car | Lorry
vehicle
.\ldots
along with four new constants.The type
declaration adds a new type to our OCaml session. Type vehicle
is as good as any built-in type and even admits pattern-matching (as we used with the built-in list types earlier). The four new identifiers of type vehicle
are called constructors.
We could represent the various vehicles by the numbers 0--3. However, the code would be hard to read and even harder to maintain. Consider adding Tricycle
as a new vehicle. If we wanted to add it before Bike
, then all the numbers would have to be changed. Using type
, such additions are trivial and the compiler can (at least sometimes) warn us when it encounters a function declaration that doesn’t yet have a case for Tricycle
.
Representing vehicles by strings like "Bike"
, "Car"
, etc., is also bad. Comparing string values is slow and the compiler can’t warn us of misspellings like "MOtorbike"
: they will make our code fail.
Most programming languages allow the declaration of types like vehicle
. Because they consist of a series of identifiers, they are called enumeration types. Other common examples are days of the week or colours. The compiler chooses the integers for us; type-checking prevents us from confusing Bike
with Red
or Sunday
.
# let wheels = function
| Bike -> 2
| Motorbike -> 2
| Car -> 4
| Lorry -> 18;;
val wheels : vehicle -> int = <fun>
vehicle
, which we defined earlier.The beauty of datatype declarations is that the new types behave as if they were built into OCaml. Type-checking catches common errors, such as mixing up different datatypes in a function like wheels
, as well as missing and redundant patterns.
# type vehicle = Bike
| Motorbike of int
| Car of bool
| Lorry of int;;
type vehicle = Bike | Motorbike of int | Car of bool | Lorry of int
Lorry
) are distinct values. (So Car true
is distinct from Car false
).vehicle
can belong to one list: [Bike, Car true, Motorbike 450]
OCaml generalises the notion of enumeration type to allow data to be associated with each constructor. The constructor Bike
is a vehicle all by itself, but the other three constructors create vehicles from arguments.
Since we might find it hard to remember what the various int
and bool
components are for, it is wise to include comments in complex declarations. In OCaml, comments are enclosed in the brackets (*
and *)
. Programmers should comment their code to explain design decisions and key features of the algorithms (sometimes by citing a reference work).
# type vehicle = Bike
| Motorbike of int (* engine size in CCs *)
| Car of bool (* true if a Reliant Robin *)
| Lorry of int;;
type vehicle = Bike | Motorbike of int | Car of bool | Lorry of int
The list shown on the slide represents a bicycle, a Reliant Robin and a large motorbike. It can be almost seen as a mixed-type list containing integers and booleans. It is actually a list of vehicles; datatypes lessen the impact of the restriction that all list elements must have the same type.
We now define a wheels
function to calculate the number of wheels in any vehicle
. This requires pattern matching to retrieve the constructors and their associated arguments, around which we build the logic:
# let wheels = function
| Bike -> 2
| Motorbike _ -> 2
| Car robin -> if robin then 3 else 4
| Lorry w -> w;;
val wheels : vehicle -> int = <fun>
This function consists of four clauses:
_
signifies a "wildcard" pattern match that we discard, since the engine size of the bike is not relevant to the number of wheels.robin
to the bool
argument and then use it in the right hand side of the pattern match, much like a let
binding in normal code.There is no overlap between the Motorbike
and Lorry
cases. Although Motorbike
and Lorry
both hold an integer, OCaml takes the constructor into account and keeps any Motorbike
distinct from any Lorry
.
Vehicles are one example of a concept consisting of several varieties with distinct features. Most programming languages can represent such concepts using something analogous to datatypes. (They are sometimes called union types or variant records whose tag fields play the role of the constructors.)
A pattern may be built from the constructors of several datatypes, including lists. A pattern may also contain integer and string constants. There is no limit to the size of patterns or the number of clauses in a function declaration. OCaml performs pattern-matching efficiently (you do not need to understand the details of how it optimises pattern-matching at this stage).
During a computation, what happens if something goes wrong?
Exception-handling lets us recover gracefully.
Exceptions are necessary because it is not always possible to tell in advance whether or not a search will lead to a dead end or whether a numerical calculation will encounter errors such as overflow or divide by zero. Rather than just crashing, programs should check whether things have gone wrong, and perhaps attempt an alternative computation (perhaps using a different algorithm or higher precision). A number of modern languages provide exception handling.
# exception Failure;;
exception Failure
# exception NoChange of int;;
exception NoChange of int
# raise Failure;;
Exception: Failure.
Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-14
Each exception
declaration introduces a distinct sort of exception, which can be handled separately from others. If E
raises an exception, then its evaluation has failed; handling an exception means evaluating another expression and returning its value instead. One exception handler can specify separate expressions for different sorts of exceptions.
Exception names are constructors of the special datatype exn
. This is a peculiarity of OCaml that lets exception-handlers use pattern-matching. Note that exception Failure
is just an error indication, while NoChange n
carries further information: the integer n
.
# try
print_endline "pre exception";
raise (NoChange 1);
print_endline "post exception";
with
| NoChange _ ->
print_endline "handled a NoChange exception";;
pre exception
handled a NoChange exception
Line 3, characters 4-22:
Warning 21 [nonreturning-statement]: this statement never returns (or has an unsound type.)
- : unit = ()
The effect of raise <expr>
is to jump to the most recently-encountered handler that matches <expr>
. The matching handler can only be found dynamically (during execution); contrast with how OCaml associates occurrences of identifiers with their matching declarations, which does not require running the program. A handler is introduced via the try
keyword, which executes the subexpression and dispatches any exceptions encountered to the corresponding pattern match for exceptions defined in the with
block.
This is also the first time that we have encountered the unit
type. This represents a type that has no values, and is used to indicate that a block has no meaningful return value. We will come back to this when learning more about imperative programming later on. For now, it is sufficient to understand that print_endline
will print out the argument to the console output, and return a unit
type. The semicolon at the end of the expression is a convenient way to execute sequential statements that return the unit
type.
One criticism of OCaml’s exceptions is that---unlike the Java language---nothing in a function declaration indicates which exceptions it might raise. One alternative to exceptions is to instead return a value of datatype option
.
# let x = Some 1;;
val x : int option = Some 1
None
signifies an error, while Some x
returns the solution x
. This approach looks clean, but the drawback is that many places in the code would have to check for None
. Despite this, there is a builtin option
type in OCaml as it is so useful. We will see in later lectures how to define our own version of option
using polymorphic datatype definitions.
# exception Change;;
exception Change
# let rec change till amt =
match till, amt with
| _, 0 -> []
| [], _ -> raise Change
| c::till, amt -> if amt < 0 then raise Change
else try c :: change (c::till) (amt - c)
with Change -> change till amt;;
val change : int list -> int -> int list = <fun>
In the Lists lectures, we considered the problem of making change. The greedy algorithm presented there could not express “6 using 5 and 2” because it always took the largest coin. Returning the list of all possible solutions avoids that problem rather expensively: we only need one solution.
Using exceptions, we can code a backtracking algorithm: one that can undo past decisions if it comes to a dead end. The exception Change
is raised if we run out of coins (with a non-zero amount) or if the amount goes negative. We always try the largest coin, but enclose the recursive call in an exception handler, which undoes the choice if it goes wrong.
Carefully observe how exceptions interact with recursion. The exception handler always undoes the most recent choice, leaving others possibly to be undone later. If making change really is impossible, then eventually exception Change
will be raised with no handler to catch it, and it will be reported at top level.
\newpage
Here is the full execution. Observe how the exception handlers nest and how they drop away once the given expression has returned a value.
\begin{aligned} \text{change [5; 2] 6} \Rightarrow &\; \text{try 5::change [5; 2] 1}\\ &\; \text{with Change -> change [2] 6}\\ \Rightarrow &\; \text{try 5::(try 5::change [5; 2] (-4)}\\ &\; \text{with Change -> change [2] 1)}\\ &\; \text{with Change -> change [2] 6}\\ \Rightarrow &\; \text{5::(change [2] 1)}\\ &\; \text{with Change -> change [2] 6}\\ \Rightarrow &\; \text{try 5::(try 2::change [2] (-1)}\\ &\; \text{with Change -> change [] 1)}\\ &\; \text{with Change -> change [2] 6} \\ \Rightarrow &\; \text{try 5::(change [] 1)}\\ &\; \text{with Change -> change [2] 6} \\ \Rightarrow &\; \text{change [2] 6} \\ \Rightarrow &\; \text{try 2::change [2] 4}\\ &\; \text{with Change -> change [] 6} \\ \Rightarrow &\; \text{try 2::(try 2::change [2] 2}\\ &\; \text{with Change -> change [] 4)}\\ &\; \text{with Change -> change [] 6} \\ \Rightarrow &\; \text{try 2::(try 2::(try 2::change [2] 0 }\\ &\; \text{with Change -> change [] 2)}\\ &\; \text{with Change -> change [] 4)}\\ &\; \text{with Change -> change [] 6} \\ \Rightarrow &\; \text{try 2::(try 2::[2]}\\ &\; \text{with Change -> change [] 4)}\\ &\; \text{with Change -> change [] 6} \\ \Rightarrow &\; \text{try 2::[2; 2]}\\ &\; \text{with Change -> change [] 6} \\ \Rightarrow &\; \text{[2; 2; 2]} \end{aligned}
# type 'a tree =
Lf
| Br of 'a * 'a tree * 'a tree;;
type 'a tree = Lf | Br of 'a * 'a tree * 'a tree
A data structure with multiple branching is called a “tree”. Trees can represent mathematical expressions, logical formulae, computer programs, the phrase structure of English sentences, etc.
# let tree = Br(1, Br(2, Br(4, Lf, Lf),
Br(5, Lf, Lf)),
Br(3, Lf, Lf));;
val tree : int tree =
Br (1, Br (2, Br (4, Lf, Lf), Br (5, Lf, Lf)), Br (3, Lf, Lf))
# tree_printer string_of_int tree;;
- : unit = ()
Binary trees are nearly as fundamental as lists. They can provide efficient storage and retrieval of information. In a binary tree, each node is empty (Lf
), or is a branch (Br
) with a label and two subtrees.
OCaml lists are a datatype and could be declared as follows:
# type 'a mylist =
| Nil
| Cons of 'a * 'a mylist;;
type 'a mylist = Nil | Cons of 'a * 'a mylist
We could even declare ::
as an infix constructor. The only thing we could not define is the [...]
notation, which is part of the OCaml grammar (although there does exist a mechanism to use a similar syntax for custom indexed datatypes).
A recursive type does not have to be polymorphic. For example, here is a simple datatype of tree shapes with no attached data that is recursive but not polymorphic.
# type shape =
| Null
| Join of shape * shape;;
type shape = Null | Join of shape * shape
The datatype 'a option
(mentioned above) is the opposite -- it is polymorphic, but not recursive.
# let rec count = function
| Lf -> 0 (* number of branch nodes *)
| Br (v, t1, t2) -> 1 + count t1 + count t2;;
val count : 'a tree -> int = <fun>
# let rec depth = function
| Lf -> 0 (* length of longest path *)
| Br (v, t1, t2) -> 1 + max (depth t1) (depth t2);;
val depth : 'a tree -> int = <fun>
The invariant \texttt{count}(t)\le 2^{\texttt{depth}(t)} - 1
holds in the functions above.
Functions on trees are expressed recursively using pattern-matching. Both functions above are analogous to \texttt{length} on lists. Here is a third measure of a tree’s size:
# let rec leaves = function
| Lf -> 1
| Br (v, t1, t2) -> leaves t1 + leaves t2;;
val leaves : 'a tree -> int = <fun>
This function is redundant because of a basic fact about trees, which can be proved by induction: for every tree t
, we have \texttt{leaves}(t) = \texttt{count}(t)+1
. The inequality shown on the slide also has an elementary proof by induction.
A tree of depth 20 can store 2^{20}-1
or approximately one million elements. The access paths to these elements are short, particularly when compared with a million-element list!
Give the declaration of an OCaml type for the days of the week. Comment on the practicality of such a type in a calendar application.
Write an OCaml function taking a binary tree labelled with integers and returning their sum.
Using the definition of 'a tree
from before:
# type 'a tree = Lf | Br of 'a * 'a tree * 'a tree;;
type 'a tree = Lf | Br of 'a * 'a tree * 'a tree
Examine the following function declaration. What does ftree (1, n)
accomplish?
# let rec ftree k n =
if n = 0 then Lf
else Br (k, ftree (2 * k) (n - 1), ftree (2 * k + 1) (n - 1));;
val ftree : int -> int -> int tree = <fun>
Give the declaration of an OCaml type for arithmetic expressions that have the following possibilities: floating-point numbers, variables (represented by strings), or expressions of the form -E
, E+E
, E\times E
.
Continuing the previous exercise, write a function that evaluates an expression. If the expression contains any variables, your function should raise an exception indicating the variable name.