jon.recoil.org

Weeknotes 2026 weeks 16-17

A two-week update this week. Most of this fortnight has gone on different sides of the same problem, getting OCaml documentation into a state where an LLM (or a human) can actually rely on it: search, packaging, performance and infrastructure.

Docs and LLMs

It seems fairly obvious, but LLMs need documentation to use libraries effectively, especially libraries that are new or private. There have been various papers on the subject (for instance the 2022 paper on tool documentation and the more recent work on the importance of example code), as well as our own contribution to the OCaml workshop last year.

Examples in OCaml libraries often live in mld files, and mdx works well for automatically checking that they are correct.

Sherlodoc, however, deliberately skips mld files, so those examples aren't in its index. When mld indexing was originally enabled, the plain English in mld pages drowned out the API hits, with generic text turning up far too often in the results. Sherlodoc doesn't do any stemming or BM25-style weighting, so as more mld content lands in packages we'll likely need to investigate a hybrid approach to searching the docs.
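To illustrate what BM25-style weighting would buy, here is a minimal sketch of Okapi BM25 scoring over a toy corpus, assuming naive whitespace tokenisation and the conventional parameters k1 = 1.2 and b = 0.75. This is not Sherlodoc code, and the corpus is made up. The part that matters for the "generic text" problem is the IDF term: a word that appears in many documents contributes little to the score, however often it is repeated.

```ocaml
(* Minimal Okapi BM25 sketch, not Sherlodoc code. Tokenisation is naive
   (lowercase, split on spaces); a real index would also stem words and
   strip punctuation. *)
let tokenise s =
  String.split_on_char ' ' (String.lowercase_ascii s)
  |> List.filter (fun w -> w <> "")

(* Number of occurrences of [term] in a tokenised document. *)
let term_freq term doc =
  List.fold_left (fun n w -> if w = term then n + 1 else n) 0 doc

(* IDF: a term that appears in many documents gets a small weight,
   so generic words barely move the score. *)
let idf docs term =
  let n = float_of_int (List.length docs) in
  let df = float_of_int (List.length (List.filter (List.mem term) docs)) in
  log (1.0 +. ((n -. df +. 0.5) /. (df +. 0.5)))

(* Score one document against a query; k1 saturates repeated terms and
   b normalises for document length. *)
let bm25 ?(k1 = 1.2) ?(b = 0.75) docs query doc =
  let total = List.fold_left (fun acc d -> acc + List.length d) 0 docs in
  let avgdl = float_of_int total /. float_of_int (List.length docs) in
  let dl = float_of_int (List.length doc) in
  List.fold_left
    (fun score term ->
      let tf = float_of_int (term_freq term doc) in
      let norm = tf +. (k1 *. (1.0 -. b +. (b *. dl /. avgdl))) in
      score +. (idf docs term *. tf *. (k1 +. 1.0) /. norm))
    0.0 (tokenise query)

(* Toy corpus: two API-like entries and two prose-like pages. *)
let docs =
  List.map tokenise
    [ "fold iterates over the container";
      "map transforms the container";
      "the tutorial explains the basics of the library";
      "iter calls a function on each element" ]
```

With this corpus, the query "the fold" scores the short `fold` entry above the tutorial page, even though the tutorial says "the" three times, because "the" appears in three of the four documents and so gets a low IDF weight.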

Oi!

Anil's just shipped a neat new tool in OCaml land: oi!, built on top of day10 and relocatable OCaml. It lets you run OCaml tools and scripts without needing a dedicated opam setup.

As I've been getting my Claude-built day10 successor, day11, into shape as a replacement for the guts of the current ocaml-docs-ci, I thought I'd check whether the libraries I'd made could work as a drop-in for the d10 part of oi (Anil's vendored copy of the relevant bits of day10). It worked nicely, without much impact on the rest of oi (aside from excising a chunk of it, of course).

Next step is to see if we can get decent doc support into oi. It'd be great to be able to run oi doc or oi doc search in a project and have accurate docs pop out. More on this to come!

Odoc performance investigation

A particular problem with getting the docs to 'just pop out' is that some packages take a long time and too much memory to build with odoc, so much so that the GitHub Actions workflow that generates them can just die, particularly when building the docs for the oxcaml branches of base and core. So I've spent some time investigating this over the last couple of weeks, and made quite significant progress.

Headline figures on my laptop: wall time 549 s → 474 s (−14%), total allocation 381 GB → 230 GB (−40%), peak RSS 2.00 GB → 1.81 GB (−9.5%).

Particularly bad for odoc was an include-expansion explosion: a single source line in container_intf.ml was being walked 10,777 times during html-generate, because each ppx_template monomorphisation produced its own Include whose expansion nested further Includes. I had previously noted this issue and tackled the worst of it, but there was more to fix!
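The blow-up is the classic cost of walking shared structure as if it were a tree. The sketch below uses a hypothetical, much-simplified signature type (not odoc's real model) in which every level includes the same inner signature twice. The inner value is built once, but a naive walk that expands every include visits it 2^depth times.

```ocaml
(* Toy signature model: an item is either a plain value or an include of
   another signature. Hypothetical; odoc's real types are much richer. *)
type item = Value of string | Include of item list

(* Build [depth] levels, each including the level below twice, mimicking
   includes nested inside includes. The inner list is shared, so
   construction is cheap. *)
let rec nested depth =
  if depth = 0 then [ Value "fold" ]
  else
    let inner = nested (depth - 1) in
    [ Include inner; Include inner ]

(* Naive walk: expand every include and count visits to leaf values. *)
let rec count_visits items =
  List.fold_left
    (fun n item ->
      match item with
      | Value _ -> n + 1
      | Include sub -> n + count_visits sub)
    0 items
```

Here `count_visits (nested 10)` evaluates to 1024: ten levels of doubling are enough to walk one source item over a thousand times.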

The includes are effectively a workaround for the lack of layout polymorphism in the current oxcaml compiler. A quick investigation showed that the 155,828 doc comments in container_intf.ml contained only 33 unique strings; everything else was ppx_template duplication. Memoising the parsing of these led to a substantial improvement.
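Memoising on the raw comment text is cheap because the key is just a string. A minimal sketch, assuming a hypothetical `parse_comment` that stands in for odoc's real comment parser (whose name, signature and result type differ):

```ocaml
(* Hypothetical stand-in for an expensive doc-comment parser. The counter
   records how many times real parsing happens. *)
let parse_calls = ref 0

let parse_comment (text : string) : string list =
  incr parse_calls;
  String.split_on_char '\n' text

(* Cache parsed results keyed on the exact comment text, so identical
   strings produced by duplication are only parsed once. *)
let parse_cache : (string, string list) Hashtbl.t = Hashtbl.create 64

let parse_comment_memo text =
  match Hashtbl.find_opt parse_cache text with
  | Some parsed -> parsed
  | None ->
      let parsed = parse_comment text in
      Hashtbl.add parse_cache text parsed;
      parsed

(* Simulate 155,828 comments drawn from 33 distinct strings. *)
let () =
  let comments = Array.init 33 (fun i -> Printf.sprintf "Doc comment %d" i) in
  for i = 0 to 155_827 do
    ignore (parse_comment_memo comments.(i mod 33))
  done;
  Printf.printf "parsed %d times\n" !parse_calls
```

This prints `parsed 33 times`. The design relies on parsing being a pure function of the text; if the result also depended on context (for example source locations or the enclosing module), that context would have to be part of the key or patched in afterwards.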

In addition, many of the includes end up flattened in the resulting HTML because their module type is either an inline signature or a reference to a hidden item. Includes come with a fair bit of overhead, so rather than flattening them when we generate the HTML, we can spot this pattern and flatten them much earlier in the process.
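Here is a sketch of that early pass on a toy model. The types and the `hidden` predicate are hypothetical, not odoc's. The idea is simply that an include whose module type is an inline signature, or which refers to a hidden item, is replaced by its contents before later passes ever see an `Include` node, while includes of visible named module types are kept.

```ocaml
(* Toy model: an include's module type is either an inline signature or a
   path to a named module type (with its resolved contents). *)
type module_type =
  | Signature of item list
  | Path of string * item list

and item = Value of string | Include of module_type

(* Hypothetical hiding rule for this sketch: names containing "__". *)
let hidden name =
  let n = String.length name in
  let rec go i =
    i + 1 < n && ((name.[i] = '_' && name.[i + 1] = '_') || go (i + 1))
  in
  go 0

(* Recursively splice in the contents of includes that the HTML would
   flatten anyway; keep includes of visible named module types. *)
let rec flatten items =
  List.concat_map
    (function
      | Value _ as v -> [ v ]
      | Include (Signature sub) -> flatten sub
      | Include (Path (name, sub)) when hidden name -> flatten sub
      | Include (Path (name, sub)) -> [ Include (Path (name, flatten sub)) ])
    items
```

Visible named includes survive because the rendered page still shows which module type was included; everything else becomes plain items that no later pass needs to expand again.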

The single biggest html-generate win was a one-liner. segment_to_string was doing Format.asprintf "%a%s" where the %a formatter just emits "<kind>-" or nothing. Replacing that with a direct pattern-match cut ~60% of html-generate allocation on stdlib and ~50% on core.
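A before/after sketch of that kind of change. The `segment` record and both function bodies are guesses at the shape of the code for illustration, not odoc's actual source; the pattern that carries over is replacing a `Format.asprintf` round-trip with a direct match and plain string concatenation.

```ocaml
(* Hypothetical stand-in for a URL segment with an optional kind prefix. *)
type segment = { kind : string option; name : string }

(* Before: Format sets up a buffer and formatter state on every call,
   just to prepend an optional prefix. *)
let pp_kind fmt = function
  | Some k -> Format.fprintf fmt "%s-" k
  | None -> ()

let segment_to_string_before s =
  Format.asprintf "%a%s" pp_kind s.kind s.name

(* After: match directly. With a kind, only a couple of short strings are
   allocated; without one, the existing name is returned as-is. *)
let segment_to_string_after s =
  match s.kind with
  | Some k -> k ^ "-" ^ s.name
  | None -> s.name
```

Both versions produce the same strings (for example `module-List` for `{ kind = Some "module"; name = "List" }`). To compare allocation yourself, read `Gc.minor_words ()` before and after a loop over each function; the actual savings will depend on the real code.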

I'll tidy up these patches, double-check that they don't affect the output, and see how much of an improvement they make on Anil's oxmono repo.

Day11

Last time I mentioned replacing the innards of ocaml-docs-ci with the day11 libraries. This has worked out really nicely. The resulting tool is an ocurrent pipeline that watches the opam-repository on GitHub and triggers the day11 build when it notices a change. Each layer corresponds to an ocurrent job. On top of the normal UI I've added some pages that make it easier to spot emerging problems in the docs build. My ultimate plan is to put an LLM in charge of this so it can check the status every day, and then let me know if it thinks I need to do something!

Improved site

I made some small improvements to the site too: a @figure plugin, a margin notes plugin (Margin notes look like this!), and tags on my pages. You're already reading the result: the Docs CI screenshot above uses the new @figure plugin, and you can see the margin notes.

Until next week!