Last week I got the summarisation to the point where it felt useful to run it across all the modules in opam. With this completed we then got to try out the MCP server to see how useful it would be in practice.
One of the first queries Anil tried was to ask it which libraries would be most useful for "date time parsing and formatting". We were surprised to see that the first two libraries it returned were caqti and mariadb, specifically mentioning the modules Caqti_platform.Conv and Mariadb.S.Time. While these do indeed provide the required functionality, they're probably not the right libraries to reach for. It's going to be tricky to decide this in the MCP server, so we should probably leave it up to the LLM on the client to choose amongst them. However, for very general queries we might end up with a large number of matching libraries, so we'll need a limit on the number of packages returned, which implies some form of ranking.
One way we can do this is by using the occurrences code in odoc. The idea is that we examine module implementation files (i.e., ml rather than mli files) and count the number of times the code uses values, types and other identifiers from other libraries. We can then aggregate these counts over all packages in opam repository and use the totals as a marker of popularity, which allows us to rank the results and return only the top N.
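As a rough illustration of the ranking step, here is a minimal sketch in OCaml. It is not the odoc occurrences code: the record type, the function names (popularity, rank, take), the package name some_datetime_lib and all of the counts are made up for the example, and the real occurrence data has a different shape. It only shows the idea of summing usage counts per package and then trimming a list of search matches to the N most-used packages.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical input shape: [count] uses, found in other packages'
   implementation files, of identifiers that belong to [package]. *)
type occurrence = { package : string; count : int }

(* Sum the counts per package. *)
let popularity (occs : occurrence list) : int SMap.t =
  List.fold_left
    (fun acc { package; count } ->
      SMap.update package
        (function None -> Some count | Some n -> Some (n + count))
        acc)
    SMap.empty occs

(* Keep at most the first [n] elements of a list. *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Order the matching packages by descending popularity and keep the
   top [n]. Packages with no recorded occurrences score 0; ties are
   broken alphabetically so the result is deterministic. *)
let rank ~popularity ~n matches =
  let score p = Option.value (SMap.find_opt p popularity) ~default:0 in
  matches
  |> List.sort (fun a b ->
         match compare (score b) (score a) with
         | 0 -> String.compare a b
         | c -> c)
  |> take n

let () =
  (* Made-up counts, purely for illustration. *)
  let occs =
    [ { package = "some_datetime_lib"; count = 120 };
      { package = "caqti"; count = 3 };
      { package = "some_datetime_lib"; count = 80 };
      { package = "mariadb"; count = 1 } ]
  in
  let pop = popularity occs in
  rank ~popularity:pop ~n:2 [ "caqti"; "mariadb"; "some_datetime_lib" ]
  |> List.iter print_endline
```

With these invented numbers, the dedicated library is listed ahead of caqti and mariadb drops off the end. The choice between the surviving candidates can still be left to the LLM on the client.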
We're not currently using the occurrences for anything, so I wasn't especially surprised to find that it's not working as intended. There were a number of issues:
List.length in OCaml 5.3 was counted separately from List.length in OCaml 4.14.
All of these issues are with code in the odoc repository, which, as it happens, also needs a release soon to ensure that it works with the imminent launch of OCaml 5.4. During the week, before I discovered the problems above, I had attempted to make a release of Odoc 3.1, but there was a license kerfuffle that, when combined with the issues in the occurrences code, gave me enough cause to pull the release.
Before I try to make the release again, I'll run the release candidate through docs-ci and check that the occurrences make sense. I set this running on Friday afternoon, and it had completed by Friday evening, so it's actually pretty quick to rerun odoc on the 15,000 or so packages required for ocaml.org.
In other news, in trying to post my blog at the beginning of the week, I was stymied a little by the changes in oxcaml. I had been using a custom opam-repository forked from the official oxcaml one, because I needed a patched js_of_ocaml in order to fix the toplevel code. I had hoped this would mean that I could update it on my schedule, rather than being at the mercy of upstream changes. Unfortunately, the download URL for ocaml-flambda wasn't pointing at an immutable commit, so when I tried it I got a checksum error. So I ended up trying to rebase the changes onto the latest oxcaml opam-repository, which didn't go well at all. The version numbers had all changed, which in opam means that files are in different directories, so git got thoroughly confused. On top of that, because the js_of_ocaml repository has multiple packages in it, whereas opam repository has a directory per package, we end up with multiple copies of the patches. So in the end I've just committed all the patches to a git repo on GitHub, and pinned it in the Dockerfile that builds this site.
What would be handy is a way to apply the patches in a package in opam repository to and from a git repository, similar to quilt/guilt. We don't quite have all of the pieces to do this, as although we have a download URL and often a dev-repo, I don't believe we currently have a way to get the base commit of that repository.
We had a meeting on Thursday with Jane Street on the next steps for oxcaml. There are a number of areas where JS are keen for us to help out.
Next week, the plan is to:
Js_top_worker code so it can be used in the javascript oxcaml playground