Claude and Dune

I've been having a bit of a love-hate relationship with Claude for a few weeks now, trying to get the odoc v3 rules into something upstreamable . What started off with a background activity while I was lecturing has turned into a frustrating weeks-long struggle to get the already-mostly-working code into good enough shape that it can be turned into a PR.

The initial progress was pretty amazing. I wasn't particularly hopeful at the start that Claude would have been able to dig into the dune codebase, which is quite large and at times tricky to understand. First I had it look at several bits of example code:

1. dune_rules/odoc.ml - this is the current home of the odoc rules in dune. It's local-only, meaning it only builds the docs for the current package in isolation, so no links to stdlib, other packages, or anything work at all.

2. dune_rules/odoc_new.ml - these are the rules for odoc v2, which allow you to build the docs for your package plus all of the dependencies. I wrote this mostly myself some time ago. It does a pretty poor job of caching, error reporting, and has none of the odoc v3 features like assets, source rendering, hierarchical docs, better errors and so on.

3. odoc_driver - this is the driver we wrote when building odoc v3. It's fully featured, but not at all incremental.

Given these, it quite quickly managed to build a new implementation that kinda mostly worked. At this point I was not looking carefully at the code it was generating as I was more interested in seeing whether it could at all, and in any case this was supposed to be a background task while I was working on the lectures. I gradually added more and more features to it, pleased at the progress it was making.

When the lectures finished, it turned out I had something that was pretty useful to me, and had a good chance to be useful to others too. So I opened up my editor and had a look through what had been produced, at this point hoping that a little bit of polishing should be enough - after all, it _was_ working!

It was dreadful.

There were long, rambling functions, code duplication, bad comments, it was unstructured, with repeated-but-slightly-different chunks all over the place. It wasn't just bad on one length scale - it was bad from the large-scale organisation of the code down to small scale baffling weirdnesses on one line. The more I looked, the more bonkers it appeared.

But it _did work_! So I thought I'd get Claude to clean up its own messes. I resolved that I would continue to let Claude do _all_ of the editing, and not do _any_ myself, and so thus began the more frustrating part of this adventure! I ended up giving a mix of very specific instructions: "move this code here", "factorize out this functionality", "rename this function", and sometimes more general ones: "Remove any comments that don't add anything of value", or "Think of a better way to do this". The constant was that I needed to be looking over each change that it did, because while most of them were pretty good, there were still a few, even with the very explicit instructions, where it messed up. It was at this point where it became frustrating, because although it's almost magical that Claude can do what it does in the time it does, this fact of having to keep a constant eye on it meant the the tens-of-seconds to minutes delay in between it doing something meant I ended up twiddling my thumbs for long periods.

One part that particularly impressed, and also quite surprised me, was with its knowledge of OCaml. In particular, I had at one point two different types representing the 'target' - either a library or a package - and a 'kind' - either a module or a page. Now pages can only be associated with package targets, and modules can only be associated with libraries, but these two values were distinct, so there was a fair bit of code pattern matching invalid combinations and either throwing exceptions or picking some random value, depending on the whims of Claude's context. I bravely suggested it think of a better way to represent this, maybe using GADTs, and it did indeed come up with a pretty nice refactoring of the types.

At one point, I ran out of Claude credits (despite paying $100 a month or so), at about 6:20pm one evening, and it told me that I needed to wait until 7pm to carry on. I'd just got to the point when I needed to write a short bit of code rather than refactoring what was already there, and I realised that while it would take me maybe 10 mins, it would take Claude maybe 10 seconds. Now, it could just be that it was the end of a long day and I was running out of steam, but I was content to switch focus elsewhere for a bit to wait for my credits to reset before carrying on! The point being that for the small implementation that I was after, it would be possible for me to get Claude to do it, and to eyeball the result to make sure it was OK in less time than I would have been able to do it myself. But I wouldn't have trusted Claude to do it in an upstreamable way **without** looking at the result.