Semantic Versioning in OCaml is Hard

Semantic versioning is a lovely and simple idea that, if it were reliably implemented everywhere, would make life a lot simpler. So, is it possible to make our OCaml libraries stick to this scheme? Unfortunately it turns out to be a lot trickier than it seems.

opam-format.2.3.0 ≠ opam-format.2.3.0?

Let's take the case that hit me this morning. I've been working on ocaml-docs-ci in order to bring the exciting new odoc 3 features to ocaml.org for everyone to enjoy. I have it checked out and building locally, but to deploy it to the infrastructure managed by Mark Elvers it needs to be packaged up into a Docker image. So I issued the usual docker build . and after it churned through the setup stages and got on to building the project, it hit an error:

File "src/solver/solver.ml", line 58, characters 75-98:
     let deps = List.map (fun pkg -> OpamPackage.Map.find pkg simple_deps) (OpamPackage.Set.to_list pkgs) in
Error: Unbound value OpamPackage.Set.to_list
Hint: Did you mean of_list?

Now OpamPackage is a module in the opam-format library, which is easy to discover using the excellent Sherlodoc tool. So I checked which version I had locally and which version was in the Docker container, and it turned out to be exactly the same in both places: 2.3.0. So what's going on?

The problem is that the Dockerfile was building with OCaml 4.14, whereas locally I was using OCaml 5.3. "But how on earth can this cause the API of opam-format to change?" I hear you wail! Well, this is actually one of the simpler consequences of the way the OCaml module system works. Let's look at the code.

The first thing we note is that the interface for OpamPackage contains no explicit definition of Set or Map. Instead, it has this line:

include OpamStd.ABSTRACT with type t := t

So let's take a look over in opamStd.mli to see what that signature looks like:

(** A signature for handling abstract keys and collections thereof *)
module type ABSTRACT = sig

  type t

  val compare: t -> t -> int
  val equal: t -> t -> bool
  val of_string: string -> t
  val to_string: t -> string
  val to_json: t OpamJson.encoder
  val of_json: t OpamJson.decoder

  module Set: SET with type elt = t
  module Map: MAP with type key = t
end

OK, so we've found the definitions of Set and Map - they refer to signatures SET and MAP which are defined just above in opamStd.mli. Let's just look at Set since that's where the problem was:

module type SET = sig

  include Set.S

  val map: (elt -> elt) -> t -> t

  val is_singleton: t -> bool

  (** Returns one element, assuming the set is a singleton. Raises [Not_found]
      on an empty set, [Failure] on a non-singleton. *)
  val choose_one : t -> elt

  val choose_opt: t -> elt option

  val of_list: elt list -> t
  val to_list_map: (elt -> 'b) -> t -> 'b list
  val to_string: t -> string
  val to_json: t OpamJson.encoder
  val of_json: t OpamJson.decoder
  val find: (elt -> bool) -> t -> elt
  val find_opt: (elt -> bool) -> t -> elt option

  (** Raises Failure in case the element is already present *)
  val safe_add: elt -> t -> t

  (** Accumulates the resulting sets of a function of elements until a fixpoint
      is reached *)
  val fixpoint: (elt -> t) -> t -> t

  (** [map_reduce f op t] applies [f] to every element of [t] and combines the
      results using associative operator [op]. Raises [Invalid_argument] on an
      empty set, or returns [default] if it is defined. *)
  val map_reduce: ?default:'a -> (elt -> 'a) -> ('a -> 'a -> 'a) -> t -> 'a

  module Op : sig
    val (++): t -> t -> t (** Infix set union *)

    val (--): t -> t -> t (** Infix set difference *)

    val (%%): t -> t -> t (** Infix set intersection *)
  end

end

Sure enough, there's no to_list function defined in there. Once again, though, there's an include, this time of Set.S, which refers to the Set module in the OCaml standard library. We can again look at the source:

val to_list : t -> elt list
    (** [to_list s] is {!elements}[ s].
        @since 5.1 *)

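And there it is. The to_list function has only been in the standard library's Set module since OCaml 5.1, so when opam-format 2.3.0 is built with OCaml 4.14, OpamPackage.Set simply doesn't have it.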
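The effect is easy to reproduce without opam at all. Here's a minimal sketch using only the standard library; SET, Make_set and String_set are names I've made up for the illustration, standing in for OpamStd.SET and OpamPackage.Set. It also shows what is probably the simplest workaround for the build error: elements has been in Set.S all along, and to_list is documented as being the same function.

```ocaml
(* A stand-in for OpamStd.SET. Nothing here mentions [to_list], but
   [include Set.S] pulls in whatever the compiler's stdlib declares. *)
module type SET = sig
  include Set.S

  (* Illustrative extra function, in the spirit of OpamStd.SET. *)
  val is_singleton : t -> bool
end

(* Hypothetical functor playing the role of the library's set builder. *)
module Make_set (O : Set.OrderedType) : SET with type elt = O.t = struct
  include Set.Make (O)

  let is_singleton s = cardinal s = 1
end

module String_set = Make_set (String)

let pkgs = String_set.of_list [ "odoc"; "opam-format"; "dune" ]

(* This line only type-checks on OCaml >= 5.1:
   let names = String_set.to_list pkgs *)

(* [elements] is available on every compiler version and gives the same
   sorted list. *)
let names = String_set.elements pkgs

let () = List.iter print_endline names
```

Applied to the failing line in solver.ml, that would mean writing OpamPackage.Set.elements pkgs instead of OpamPackage.Set.to_list pkgs, which should build on both 4.14 and 5.3.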

Using ocaml.org docs

It was pretty difficult to figure that out from the source, but happily there's a better way: we can browse the docs on https://ocaml.org/. The docs for the OpamPackage.Set module, as of today, do not contain any to_list function. The include Set.S is there, with the expansion showing all of the types and values coming from it, so we can click on the Set.S link on the include line, which takes us to the documentation for the stdlib from OCaml 4.11.2. Changing the version in the dropdown at the top to something more recent takes us to a page containing the to_list function with the helpful "since 5.1" annotation.

This is, in fact, a relatively simple example of the sorts of issues that can occur that make semantic versioning a headache. In this example, it was a change due to a difference in the compiler version used, but there's nothing particularly special about that - a package may expose signatures derived from any of its dependencies! So is there anything we can do about this? Obviously, yes!

Towards a solution

Step 1 of any approach to solving this is to be able to identify which bits of a library's API come from which packages, and therefore from which versions of those packages. It turns out there may well be a nice way to piggy-back on a recent feature of Odoc, which was originally intended to help with suppressing spurious warnings.

The problem we were tackling was that if your library ends up exporting a module whose signature is defined in someone else's package, then any warnings that come from it are unfixable. To fix this we added a tag to each signature of a module that indicates which package it originally came from. Odoc is then very careful to keep track of this as it performs its signature manipulations, resulting in an accurate way to know which signature elements came from which package. This fixed the problem of the spurious warnings nicely.
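To make the idea concrete, here is a rough sketch of what origin-tagged signature items could look like. The types and the foreign_items function are hypothetical and written for this post; they are not Odoc's internal representation, and the item list is a hand-written, abbreviated stand-in for OpamPackage.Set as built against OCaml 5.3.

```ocaml
(* Hypothetical model: every signature item records the package whose
   source declared it. *)
type origin = { package : string; version : string }

type item = { name : string; origin : origin }

let opam_format = { package = "opam-format"; version = "2.3.0" }

(* The stdlib ships with the compiler, i.e. the "ocaml" opam package. *)
let stdlib = { package = "ocaml"; version = "5.3.0" }

(* Abbreviated, illustrative view of OpamPackage.Set. *)
let signature =
  [
    { name = "elements"; origin = stdlib };
    { name = "to_list"; origin = stdlib };
    { name = "is_singleton"; origin = opam_format };
    { name = "safe_add"; origin = opam_format };
  ]

(* Items in the signature that were not declared by the package [self],
   paired with the package they actually come from. *)
let foreign_items ~self sg =
  List.filter (fun it -> it.origin.package <> self) sg
  |> List.map (fun it -> (it.name, it.origin.package))

let () =
  foreign_items ~self:"opam-format" signature
  |> List.iter (fun (name, pkg) -> Printf.printf "%s comes from %s\n" name pkg)
```

With tags like these, a warning attached to to_list can be attributed to the compiler rather than to opam-format, and the same information says which parts of the API may move when a dependency changes.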

Quite separately, we've got the docs CI, which attempts to build docs for every version of every package. Given the above, to exhaustively show all the possible APIs of every library we would need to build every possible combination of versions of every package. Clearly we can't possibly do this, so the docs CI focuses on the goal of building at least one solution for every version of every package.

Now if we combine these two ideas, we can use the package builds from the docs CI, together with the tracking of which package each signature originated from, to precisely track the differences in API between different versions of a package. This would allow us to build a database of those changes, and with this in hand we could look at which APIs are used in any other package and suggest upper and lower bounds on the versions of its dependencies.
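As a back-of-the-envelope sketch of that last step, suppose the database simply maps each compiler version to the set of names observed in OpamPackage.Set when built against it. Everything below is hypothetical: the observed table is hand-written and heavily abbreviated, and added and compatible are names invented for the illustration.

```ocaml
module Names = Set.Make (String)

(* Hypothetical database rows, oldest first: the names observed in
   OpamPackage.Set for builds against each compiler version. *)
let observed =
  [
    ("4.14.2", Names.of_list [ "elements"; "of_list"; "safe_add" ]);
    ("5.0.0", Names.of_list [ "elements"; "of_list"; "safe_add" ]);
    ("5.1.0", Names.of_list [ "elements"; "of_list"; "safe_add"; "to_list" ]);
    ("5.3.0", Names.of_list [ "elements"; "of_list"; "safe_add"; "to_list" ]);
  ]

(* Names present in [b] but missing from [a]. *)
let added a b = Names.diff b a

(* Versions whose observed API covers everything in [used]. The result
   keeps the order of [db], so its head is a candidate lower bound. *)
let compatible ~used db =
  List.filter (fun (_, api) -> Names.subset used api) db |> List.map fst

let () =
  (* What changed between two neighbouring builds? *)
  let diff = added (List.assoc "5.0.0" observed) (List.assoc "5.1.0" observed) in
  Printf.printf "added in 5.1.0: %s\n" (String.concat ", " (Names.elements diff));
  (* ocaml-docs-ci's solver uses [to_list], so which compilers will do? *)
  let used = Names.of_list [ "to_list" ] in
  match compatible ~used observed with
  | [] -> print_endline "no observed build provides the API that is used"
  | lower :: _ -> Printf.printf "suggested constraint: ocaml >= %s\n" lower
```

A real version would have to compare types as well as names, and cope with only having one build per package version, but the shape of the query would be much the same.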

Now wouldn't that be cool?