Week 28
OCaml MCP server
Last week I got the summarisation to the point where it felt useful to run it across all the modules in opam. With this completed we then got to try out the MCP server to see how useful it would be in practice.
One of the first queries Anil tried was to ask it which libraries would be most useful for "date time parsing and formatting". We were surprised to see that the first two libraries it returned were caqti and mariadb, specifically mentioning the modules Caqti_platform.Conv and Mariadb.S.Time. While these do indeed provide the required functionality, they're database libraries in which date handling is incidental, so they're probably not the right ones to recommend for it. It's going to be tricky to decide this in the MCP server, so we should probably leave it up to the LLM on the client to choose amongst them. However, for very general queries we might end up with a large number of matching libraries, so we'll need a limit on the number of packages returned, which implies some form of ranking.
One way we can do this is by using the occurrences code in odoc. The idea is that we examine module implementation files (i.e., ml rather than mli files) and count the number of times the code uses values, types and other identifiers from other libraries. We can then aggregate these counts over all packages in the opam repository and use them as a proxy for popularity, which allows us to rank the results and return only the top N.
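As a rough illustration of that last step, here is a minimal sketch of ranking matched packages by their aggregated occurrence count and keeping the top N. The candidate record and the top_n function are hypothetical names invented for this example, not part of odoc or the MCP server, and the counts in the usage snippet are made up.

```ocaml
(* Hypothetical record: a package that matched a query, together with its
   aggregated occurrence count. Not an odoc or MCP server type. *)
type candidate = { package : string; occurrences : int }

(* Keep at most the first [n] elements of a list. *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Sort by descending occurrence count, breaking ties by package name so
   the result is deterministic, then keep the top [n]. *)
let top_n n candidates =
  candidates
  |> List.sort (fun a b ->
         match compare b.occurrences a.occurrences with
         | 0 -> String.compare a.package b.package
         | c -> c)
  |> take n

(* Usage with made-up counts, purely for illustration. *)
let () =
  [ { package = "caqti"; occurrences = 120 };
    { package = "some_datetime_lib"; occurrences = 950 };
    { package = "mariadb"; occurrences = 15 } ]
  |> top_n 2
  |> List.iter (fun c -> Printf.printf "%s %d\n" c.package c.occurrences)
```

Sorting the whole candidate list is fine when the number of matches is modest. If a query matched a very large number of packages, a bounded heap would avoid the full sort.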
We're not currently using the occurrences for anything, so I wasn't especially surprised to find that it's not working as intended. There were a number of issues:
- The occurrences output file was being written at a path not within the package dir, so it wasn't being persisted.
- The CLI for generating occurrences takes a directory containing the odocl files, but we were only passing the top-level directory and it wasn't searching recursively.
- Once the occurrences were captured, the aggregation step used the full identifier of the value being aggregated, meaning that, for example, List.length in OCaml 5.3 was counted separately from List.length in OCaml 4.14.
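The aggregation problem in the last item can be sketched as follows. This is a simplified model and not the odoc code: it assumes identifiers are plain strings of the form "package.version/Path.to.value", a representation made up for this example (odoc's real identifiers are structured values). The functions normalise and aggregate are likewise hypothetical.

```ocaml
(* Assumed, made-up representation: "ocaml.5.3/List.length".
   Drop everything up to and including the first '/', so the same value
   seen in different package versions maps to a single key. *)
let normalise id =
  match String.index_opt id '/' with
  | Some i -> String.sub id (i + 1) (String.length id - i - 1)
  | None -> id

(* Sum the counts per normalised identifier. *)
let aggregate (counts : (string * int) list) : (string, int) Hashtbl.t =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (id, n) ->
      let key = normalise id in
      let prev = Option.value (Hashtbl.find_opt tbl key) ~default:0 in
      Hashtbl.replace tbl key (prev + n))
    counts;
  tbl
```

Keying on the full identifier would leave the OCaml 5.3 and 4.14 entries for List.length as two separate rows. Keying on the normalised form merges them into one total.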
All of these issues are with code in the odoc repository, which, as it happens, also needs a release soon to ensure that it works with the imminent launch of OCaml 5.4. During the week, before I discovered the problems above, I had attempted to make a release of Odoc 3.1, but there was a license kerfuffle that, when combined with the issues in the occurrences code, gave me enough cause to pull the release.
Before I try to make the release again, I'll run the release candidate through docs-ci and check that the occurrences make sense. I set this running on Friday afternoon, and it had completed by Friday evening, so it's actually pretty quick to rerun odoc on the 15,000 or so packages required for ocaml.org.
Trouble with this blog
In other news, in trying to post my blog at the beginning of the week, I was stymied a little by the changes in oxcaml. I had been using a custom opam-repository forked from the official oxcaml one, because I needed a patched js_of_ocaml in order to fix the toplevel code. I had hoped this would mean that I could update it on my schedule, rather than being at the mercy of upstream changes. Unfortunately though, the download URL for ocaml-flambda wasn't pointing at an immutable commit, so when I tried it I got a checksum error. So I ended up trying to rebase the changes onto the latest oxcaml opam-repository, which didn't go well at all. The version numbers had all changed, which in opam means that files are in different directories, so git got thoroughly confused. On top of that, because the js_of_ocaml repository has multiple packages in it, whereas opam repository has a directory per-package, we end up having multiple copies of the patches. So in the end I've just committed all the patches to a git repo on github, and pinned it in the Dockerfile that builds this site.
What would be handy is a way to apply the patches in a package in opam repository to and from a git repository, similar to quilt/guilt. We don't quite have all of the pieces to do this, as although we have a download URL and often a dev-repo, I don't believe we currently have a way to get the base commit of that repository.
Oxcaml continues
We had a meeting on Thursday with Jane Street on the next steps for oxcaml. There are a number of areas where Jane Street are keen for us to help out.
- Playgrounds - both javascript and docker-based. The playground on the oxcaml website right now uses github codespaces, which works nicely but currently takes an absolute age to start up. We can almost certainly improve this by building docker images and pushing them to the docker hub, rather than building oxcaml from scratch when starting the codespace. There's also interest in the javascript playgrounds, which can serve a slightly different purpose than the docker-based one, more limited in how it can be used, but without requiring someone to spin up a full docker container.
- Documentation - Odoc has had some patches to run on oxcaml, but there's no support for documenting many of the new features yet, including modes. We've got to do some experiments here to see what the best way is to show the new type-system features in the generated docs. There were some suggestions of using javascript to show/hide the modes, for example.
- Improvements in Merlin - again this is an area ripe for investigation. In particular, how do we best expose the new features of the type system for users? What's needed here is user feedback from people who are actually using oxcaml to build real projects.
- Better error messages - OCaml has been getting improved error messages with each release, but there's still room for improvement, and the new features of the type system in particular have many different failure modes. Again, we need user feedback to understand the pain points and improve the error messages accordingly.
Next week
Next week, the plan is to:
- Check the occurrences from docs-ci, and integrate into the MCP server
- Talk to Mark about building the docker image for the oxcaml playground
- Tidy up the Js_top_worker code so it can be used in the javascript oxcaml playground
- Release Odoc 3.1