blog/2025/05/lots-of-things.mld

Ocaml-docs-ci

I've been working with Mark Elvers on getting the docs CI running using Odoc 3.0. There are quite a few changes involved, both in how we're building the packages but also how we're running odoc - it's building using odoc_driver rather than voodoo now, and while it's looking promising now we had hit a few hurdles along the way.

We set the CI going last weekend but discovered that it was having some issues building packages using OCaml 5.3.0. The way the builds work is that we first do a "solve" step for each version of every package so we've got exact versions of all of the packages required to build them. We then look through that solution to figure out the version of OCaml required, and the (to avoid a little bit of work) we start from one of the opam docker images for that version of OCaml.

When installing a package using opam it does a few operations that scale with the size of the opam repository, which ends up adding around ten of seconds to the build time. When we're building 50,000 packages, this adds up to quite a lot of time, so we short-cut this process with the simple expedient of creating an opam-repository that only contains the packages we need for the build. However, since we've already got a few packages installed in the image, we need to make sure our repository contains these packages too, otherwise opam gets thoroughly confused. My mistake was that we were missing out the `ocaml-compiler` package, which is new in OCaml 5.3.0, which led to the builds failing. Adding this in and kicking off the build again it's now got a lot further - at time of writing it has built 14,000 packages, there are 6,000 still building, and 1000 that have failed. If it continues in a similar fashion, this will compare quite favourably with the docs CI that's currently powering ocaml.org, where it has successfully built 17,000 packages, and 4,500 have failed.

Mark has been working on a different approach to the build process, which is to come up with a new binary that doesn't do any of the O(n) operations and just builds the package! This is definitely a promising direction, and I'm hoping to take a look at his prototype soon.

Meanwhile, panglesd is working on integrating this into the ocaml.org site, and is making good progress. He spotted last week that we were overwriting the `status.json` file that comes out of `odoc_driver` which we will use to power the redirections on ocaml.org. One of the changes of odoc 3.0 is that we carefully put modules into a directory structure that represents the library in which they are found. It's long been a pain that OCaml libraries (what Ocamlfind unhelpfully calls 'packages') are not always the same name as the opam package in which they're found. For example, the package ocamlfind contains the library findlib. So to help the user figure out where to find the module, we're putting it into the URL of the docs, and therefore into the breadcrumbs. The downside is that the modules are now in a different place on the website to where they were before, so the status.json file is there to help with the redirections.

Notebooks

I've been working on Merlin integration with the notebooks, which has been a fun little project. The bits that needed improving most were that merlin didn't work with toplevel-style code, and that each cell was a separate typing context, so while you could define a function in one cell and execute it in another, Merlin would tell you the function was undefined.

For the toplevel-style code, what I've ended up doing is to essentially strip out all of the toplevel bits and pieces, and replace them with whitespace. So where I have a cell that looks like:

# let x = 1 + 2;;
  - val x : int = 3
# let y = x + 1;;
  - val y : int = 4

I tell Merlin that the contents are:

  let x = 1 + 2;;

  let y = x + 1;;

where I'm careful to maintain the position of the original code. This bit is working quite nicely, but only when the code is syntactically correct, as I'm using the standard toplevel parser to figure out where the expression ends. I think I'm going to end up needing to write a custom parser for this, so something that will end on a ;; but ignore them in string constants, comments and so on.

The approach I've taken for the second problem is to treat each cell as a separate module. I then write out a cmi file into the virtual filesystem as cell__id_0.cmi and open all the previous modules in 'line 0' of every cell. I then remap all of the reported locations by removing 'line 0'.

There are a number of issues with the current approaches: 1. The stripping of the toplevel bits is a little fragile, and currently only works when the toplevel is syntactically correct. This is fairly fixable. 2. When the contents of the cells change we need to flush any caches merlin and the compiler have. 3. An open statement in once cell does _not_ cause the module to be available in the next cell. 4. A lot of cells leads to a lot of opens!

I suspect that this the 'cells as modules' approach might end up being a bit of a dead-end, so I'll have a chat with Ulysse to figure out the next experiment.

Oxcaml

I've been working on trying out oxcaml too, which has been a bit challenging. Firstly, although Jane Street provide a version of js_of_ocaml, the toplevel didn't work. Fortunately, my amazing colleagues Patrick O'Ferris and Arthur Wendling spent a good chunk of time fixing this and provided an opam repository with the relevant changes, without which I would have not been able to make any progress. Thanks, guys! So my goal of making my notebooks work with it looked doable, but I almost immediately hit more dependency issues that make it problematic to port the whole site over - including odoc and various PPXes that I use.

I've therefore decided that I would bring forward a feature that I'd had in mind for a while - that we could have different "backends" for the notebooks. So I'd still build the frontend using "normal" OCaml, but the web-worker serving as the toplevel would be an oxcaml one.

Of course, it didn't work first time! After a bit of head-scratching, it turned out that the interface between the worker and the main thread, although I'd almost got it ocaml-agnostic, wasn't quite right. The way it works is that it uses the jsonrpc protocol to communicate, and while it had marshalled the requests into a string, it hadn't turned that final OCaml string into a Javascript string, so it was sending the js_of_ocaml representation of a string as an object, rather than a simple string. When the frontend and workers were built with different compilers, this ended up just failing with an obscure error, which took a good deal of time to track down. Once that was fixed, it was just a case of making sure I could have 2 independent 'switches' on my site - one for oxcaml and one for standard OCaml.

The upshot of all this is that I now have a semi-working version of the notebooks using oxcaml. As an initial demonstration I ported one of my colleague Cuihtlauac's oxcaml tutorial docs to the notebook format, and it works quite nicely.