Lecture 2: Recursion and Efficiency

Expression Evaluation

Expression evaluation concerns expressions and the values they return. This view of computation may seem to be too narrow. It is certainly far removed from computer hardware, but that can be seen as an advantage. For the traditional concept of computing solutions to problems, expression evaluation is entirely adequate.

Starting with E_0 , the expression E_i is reduced to E_{i+1} until this process concludes with a value v . A value is something like a number that cannot be further reduced.

We write E \rightarrow E' to say that E is reduced to E' . Mathematically, they are equal: E=E' , but the computation goes from E to E' and never the other way around.

Computers also interact with the outside world. For a start, they need some means of accepting problems and delivering solutions. Many computer systems monitor and control industrial processes. This role of computers is familiar now, but was never envisaged in the early days. Computer pioneers focused on mathematical calculations. Modelling interaction and control requires a notion of states that can be observed and changed. Then we can consider updating the state by assigning to variables or performing input/output, finally arriving at conventional programs as coded in C, for instance.

For now, we remain at the level of expressions, which is usually termed functional programming.

Summing the first n integers

# let rec nsum n =
    if n = 0 then
      0
    else
      n + nsum (n - 1);;
  val nsum : int -> int = <fun>

The function call nsum n computes the sum 1+\cdots+n rather naively, hence the initial n in its name:

\begin{aligned}
\text{nsum } 3 \Rightarrow & 3 + (\text{nsum } 2) \\
\Rightarrow & 3 + (2 + (\text{nsum } 1)) \\
\Rightarrow & 3 + (2 + (1 + \text{nsum } 0)) \\
\Rightarrow & 3 + (2 + (1 + 0)) 
\end{aligned}

The nesting of parentheses is not just an artifact of our notation; it indicates a real problem. The function gathers up a collection of numbers, but none of the additions can be performed until nsum 0 is reached. Meanwhile, the computer must store the numbers in an internal data structure, typically the stack. For large n, say nsum 10000, the computation might fail due to stack overflow.

We all know that the additions can be performed as we go along. How do we make the computer do that?

Iteratively summing the first n integers

# let rec summing n total =
    if n = 0 then
      total
    else
      summing (n - 1) (n + total);;
  val summing : int -> int -> int = <fun>

Function summing takes an additional argument: a running total. If n is zero, then summing returns the running total; otherwise it adds n to the total and calls itself with n - 1. The recursive calls do not nest; the additions are done immediately.
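To use summing, the accumulator must start at zero. For example:

# summing 10000 0;;
  - : int = 50005000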

A recursive function whose computation does not nest is called iterative or tail-recursive. Many functions can be made iterative by introducing an argument analogous to total, which is often called an accumulator.

The gain in efficiency is sometimes worthwhile and sometimes not. The function power is not iterative because nesting occurs whenever the exponent is odd. Adding a third argument makes it iterative, but the change complicates the function and the gain in efficiency is minute; for 32-bit integers, the maximum possible nesting is 30 for the exponent 2^{31}-1 .
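For reference, here is a sketch of the kind of definition of power intended, assuming the usual square-and-multiply version (the version from the previous lecture may differ in details). When the exponent is odd, the outer multiplication by x must wait for the recursive call to finish, which is exactly the nesting described above:

# let rec power x n =
    (* assumes n >= 1; square the base and halve the exponent *)
    if n = 1 then
      x
    else if n mod 2 = 0 then
      power (x *. x) (n / 2)
    else
      x *. power (x *. x) (n / 2);;
  val power : float -> int -> float = <fun>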

Recursion vs Iteration

A classic book by Abelson and Sussman, which describes the Lisp dialect known as Scheme, used iterative to mean tail-recursive. Iterative functions produce computations resembling those that can be done using while-loops in conventional languages.

Many algorithms can be expressed naturally using recursion, but only awkwardly using iteration. There is a story that Dijkstra sneaked recursion into Algol-60 by inserting the words “any other occurrence of the procedure name denotes execution of the procedure.” By not using the word “recursion”, he managed to slip this amendment past sceptical colleagues.

Obsession with tail recursion leads to a coding style in which functions have many more arguments than necessary. Write straightforward code first, avoiding only gross inefficiency. If the program turns out to be too slow, tools are available for pinpointing the cause. Always remember KISS (Keep It Simple, Stupid).

I hope you have all noticed by now that the summation can be done even more efficiently using the arithmetic progression formula:

 1+\cdots+n = n(n+1)/2  
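In OCaml the formula is a single constant-time expression (nsumFormula is a name chosen here purely for illustration):

# let nsumFormula n = n * (n + 1) / 2;;
  val nsumFormula : int -> int = <fun>
# nsumFormula 10000;;
  - : int = 50005000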

Silly Summing the First n Integers

# let rec sillySum n =
    if n = 0 then
      0
    else
      n + (sillySum (n - 1) + sillySum (n - 1)) / 2;;
  val sillySum : int -> int = <fun>

The function calls itself 2^n times! Bigger inputs mean higher costs---but what’s the growth rate?

Now let us consider how to estimate various costs associated with a program. Asymptotic complexity refers to how costs---usually time or space---grow with increasing inputs. Space complexity can never exceed time complexity, for it takes time to do anything with the space. Time complexity often greatly exceeds space complexity.

The function sillySum calls itself twice in each recursive step. This function is contrived, but many mathematical formulas refer to a particular quantity more than once. In OCaml, we can create a local binding to a computed value using the local declaration syntax. In the following expression, y is computed once and used twice:

# let x = 2.0 in
    let y = Float.pow x 20.0 in
    y *. (x /. y);;
  - : float = 2.

You can read let x = e1 in e2 as binding the name x to the value of e1 within e2. Any use of x within e2 stands for the value of e1, and x is visible only inside e2.
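Applied to sillySum, a local binding lets the recursive result be computed once and used twice. The following sketch (sensibleSum is a name introduced here purely for illustration) computes the same values as sillySum but makes only one recursive call per step:

# let rec sensibleSum n =
    if n = 0 then
      0
    else
      (* compute the recursive result once, then use it twice *)
      let s = sensibleSum (n - 1) in
      n + (s + s) / 2;;
  val sensibleSum : int -> int = <fun>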

Why do we need let bindings? Fast hardware does not make good algorithms unnecessary. On the contrary, faster hardware magnifies the superiority of better algorithms. Typically, we want to handle the largest inputs possible. If we double our processing power, what do we gain? How much can we increase n , the input to our function?

With sillySum, we can only go from n to n+1 . We are limited to this modest increase because the function’s running time is proportional to 2^n . With the function npower defined in the previous section, we can go from n to 2n : we can handle problems twice as big. With power we can do much better still, going from n to n^2 .
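To see where these figures come from, suppose the running time is proportional to the stated function of n and that the available time doubles; writing n' for the largest input we can now handle:

\begin{aligned}
c\,2^{n'} = 2\,c\,2^{n} & \;\text{ gives }\; n' = n+1 \\
c\,n' = 2\,c\,n & \;\text{ gives }\; n' = 2n \\
c\log n' = 2\,c\log n & \;\text{ gives }\; n' = n^2
\end{aligned}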

The following table (excerpted from a 50-year-old book!) illustrates the effect of various time complexities. The left-hand column (labelled "complexity") gives the number of milliseconds required to process an input of size n . The other entries show the maximum size of n that can be processed in the given time (one second, one minute or one hour).

complexity    1 second     1 minute     1 hour        gain
n             1000         60 000       3 600 000     \times 60
n \log n      140          4 895        204 095       \times 41
n^{2}         31           244          1 897         \times 8
n^{3}         10           39           153           \times 4
2^{n}         9            15           21            +6

The table illustrates how large an input can be processed as a function of time. As we increase the computer time per input from one second to one minute and then to one hour, the size of the input increases accordingly.

The top two rows (complexities n and n \log n ) increase rapidly: for n , by a factor of 60 per column. The bottom two start out close together, but n^3 (which grows by a factor of 3.9) pulls well away from 2^n (whose growth is only additive). If an algorithm’s complexity is exponential then it can never handle large inputs, even if it is given huge resources. On the other hand, suppose the complexity has the form n^c , where c is a constant. (We say the complexity is polynomial.) Doubling the argument then increases the cost by a constant factor. That is much better, though if c>3 the algorithm may not be considered practical.

Comparing Algorithms: O Notation

For example: consider n^2 instead of 3n^2+34n+433 .

The cost of a program is usually a complicated formula. Often we should consider only the most significant term. If the cost is n^2 + 99n + 900 for an input of size n , then the n^2 term will eventually dominate, even though 99n is bigger for n<99 . The constant term 900 may look big, but it is soon dominated by n^2 .

Constant factors in costs can be ignored unless they are large. For one thing, they seldom make a difference: 100n^2 will be better than n^3 in the long run, or asymptotically, to use the jargon. Moreover, constant factors are seldom stable. They depend upon details such as which hardware, operating system or programming language is being used. By ignoring constant factors, we can make comparisons between algorithms that remain valid in a broad range of circumstances.

The “Big O” notation is commonly used to describe efficiency---to be precise, asymptotic complexity. It concerns the limit of a function as its argument tends to infinity, and it is an abstraction that meets the informal criteria we have just discussed. We write f(n)=O(g(n)) provided that |f(n)|\leq c|g(n)| for some constant c and all sufficiently large n . Here, sufficiently large means there is some constant n_0 such that |f(n)|\leq c|g(n)| for all n greater than n_0 . The role of n_0 is to ignore finitely many exceptions to the bound, such as the cases when 99n exceeds n^2 .
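For instance, returning to the example above, one possible choice of constants is c=5 and n_0=34 , since 34n\leq n^2 and 433\leq n^2 whenever n\geq 34 :

 3n^2+34n+433 \;\leq\; 3n^2+n^2+n^2 \;=\; 5n^2 \quad \text{for all } n\geq 34, 

so 3n^2+34n+433 = O(n^2) .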

Simple Facts About O Notation

\begin{aligned}
O(2g(n)) & \text{ is the same as } O(g(n)) \\
O(\log_{10}n) & \text{ is the same as } O(\ln n)  \\
O(n^2+50n+36) & \text{ is the same as } O(n^2) \\
O(n^2) & \text{ is contained in }  O(n^3) \\
O(2^n) & \text{ is contained in }  O(3^n)  \\
O(\log n) & \text{ is contained in } O(\sqrt n)
\end{aligned}

O notation lets us reason about the costs of algorithms easily.

If c and d are constants (that is, they are independent of n ) with 0 < c < d , then O(n^c) is contained in O(n^d) , and O(c^n) is contained in O(d^n) .

To say that O(c^n) is contained in O(d^n) means that the former gives a tighter bound than the latter. For example, if f(n)=O(2^n) then f(n)=O(3^n) trivially, but the converse does not hold.

Common Complexity Classes

Logarithms grow very slowly, so O(\log n) complexity is excellent. Because O notation ignores constant factors, the base of the logarithm is irrelevant!

Close to linear is O(n\log n) , which is occasionally called quasilinear and which scales up well for large n .

An example of quadratic complexity is matrix addition: forming the sum of two n\times n matrices obviously takes n^2 additions. Matrix multiplication is of cubic complexity, which limits the size of matrices that we can multiply in reasonable time. An O(n^{2.81}) algorithm exists, but it is too complicated to be of much use, even though it is theoretically better.
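As a minimal sketch of that operation count, here is matrix addition over integer arrays (mat_add is a name chosen here for illustration; it assumes both arguments are n \times n ):

# let mat_add a b =
    (* add corresponding entries: n * n additions for two n x n matrices *)
    Array.map2 (Array.map2 (+)) a b;;
  val mat_add : int array array -> int array array -> int array array = <fun>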

An exponential growth rate such as 2^n restricts us to small values of n . Already with n=20 the cost exceeds one million. However, the worst case might not arise in normal practice. OCaml type-checking is exponential in the worst case, but not for ordinary programs.

Sample costs in O notation

Recall that npower computes x^n by repeated multiplication while nsum naively computes the sum 1+\cdots+n . Each obviously performs O(n) arithmetic operations. Because they are not tail recursive, their use of space is also O(n) . The function summing is a version of nsum with an accumulating argument; its iterative behaviour lets it work in constant space. O notation spares us from having to specify the units used to measure space.

Function        Time           Space
npower, nsum    O(n)           O(n)
summing         O(n)           O(1)
n(n+1)/2        O(1)           O(1)
power           O(\log n)      O(\log n)
sillySum        O(2^n)         O(n)

Even ignoring constant factors, the units chosen can influence the result. Multiplication may be regarded as a single unit of cost. However, the cost of multiplying two n -digit numbers for large n is itself an important question, especially now that public-key cryptography uses numbers hundreds of digits long.

Few things can really be done in constant time or stored in constant space. Merely to store the number n requires O(\log n) bits. If a program cost is O(1) , then we have probably assumed that certain operations it performs are also O(1) ---typically because we expect never to exceed the capacity of the standard hardware arithmetic.

With power, the precise number of operations depends upon n in a complicated way, depending on how many odd numbers arise, so it is convenient that we can just write O(\log n) . An accumulating argument could reduce its space cost to O(1) .

Some Simple Recurrence Relations

Consider a recursive function whose cost for an input of size n is T(n) , which we want to bound using O notation. A typical base case is T(1)=1 . Some common recurrences and their solutions are:

Equation                  Complexity
T(n+1) = T(n)+1           O(n)
T(n+1) = T(n)+n           O(n^2)
T(n) = T(n/2)+1           O(\log n)
T(n) = 2T(n/2)+n          O(n\log n)

To analyse a function, inspect its OCaml declaration. Recurrence equations for the cost function T(n) can usually be read off. Since we ignore constant factors, we can give the base case a cost of one unit. Constant work done in the recursive step can also be given unit cost; since we only need an upper bound, this unit represents the larger of the two actual costs. We could use other constants if it simplifies the algebra.

For example, recall our function nsum:

# let rec nsum n =
    if n = 0 then
      0
    else
      n + nsum (n - 1);;
  val nsum : int -> int = <fun>

Given n+1 , it performs a constant amount of work (an addition and subtraction) and calls itself recursively with argument n . We get the recurrence equations T(0)=1 and T(n+1) = T(n)+1 . The closed form is clearly T(n)=n+1 , as we can easily verify by substitution. The cost is linear.
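Substituting T(n)=n+1 into both equations confirms the closed form:

\begin{aligned}
T(0) & = 0+1 = 1 \\
T(n+1) & = (n+1)+1 = T(n)+1
\end{aligned}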

The next function, given n+1 , calls nsum, performing O(n) work. Again ignoring constant factors, we can say that this call takes exactly n units.

# let rec nsumsum n =
    if n = 0 then
      0
    else
      nsum n + nsumsum (n - 1);;
  val nsumsum : int -> int = <fun>

We get the recurrence equations T(0)=1 and T(n+1) = T(n)+n . It is easy to see that T(n)=1+\bigl(0+1+\cdots+(n-1)\bigr)=1+n(n-1)/2=O(n^2) . The cost is quadratic.

The function power divides its input n into two, with the recurrence equation T(n) = T(n/2)+1 . Clearly T(2^k)=k+1 , so T(n)=O(\log n) .
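Unfolding the recurrence for an input that is a power of two shows why:

\begin{aligned}
T(2^k) = T(2^{k-1})+1 = T(2^{k-2})+2 = \cdots = T(1)+k = k+1
\end{aligned}

so for n=2^k we have T(n)=\log_2 n + 1 = O(\log n) .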

Exercises

Exercise 2.1

Code an iterative version of the function power.

Exercise 2.2

Add a column to the table of complexities from The Design and Analysis of Computer Algorithms with the heading 60 hours:

complexity    1 second     1 minute     1 hour        60 hours
n             1000         60 000       3 600 000
n \log n      140          4 895        204 095
n^{2}         31           244          1 897
n^{3}         10           39           153
2^{n}         9            15           21

Exercise 2.3

Let g_1 , …, g_k be functions such that g_i(n)\ge0 for i=1 , …, k and all sufficiently large n .

Show that if f(n) = O(a_1 g_1(n)+\cdots+a_k g_k(n)) , where a_1 , …, a_k are constants, then f(n) = O(g_1(n)+\cdots+g_k(n)) .

Exercise 2.4

Find an upper bound for the recurrence given by T(1) = 1 and T(n) = 2T(n/2)+1 . You should be able to find a tighter bound than O(n\log n) .