It's tremendously exciting to be back in the Computer Laboratory; the last time I worked here was just before the pandemic. I'm now a member of the Energy and Environment Group, whose goal is "to have a measurable impact on tools and techniques for de-risking the future".
With such a broad goal, it's hard to know where to start and how I'll fit in, so my first few weeks have been spent getting to know the other members of the group and what they're up to. It's an incredibly inspiring group of individuals who are all doing amazing work, and it's really humbling and daunting to be a part of it.
There's some really interesting work going on in our group on LLMs, principally led by the fantastic Sadiq Jaffer. We had a chat a few weeks ago and have started to explore some ideas around seeing how well LLMs can program in OCaml already before we start to do some RL training on them. Having not done any LLM stuff before, it's a steep learning curve for me, but we're already seeing some interesting results. We should have some more to say about this in the coming weeks.
Last week I met with Michael Dales, and he talked about Shark, the project that he and Patrick Ferris have been working on. It's a mix between a shell and a Jupyter-style notebook, with a strong focus on reproducibility. The traditional pain of notebooks is, of course, the execution model, whereby cells can be executed in any order you like. This means that the state you find the notebook in might not even be reachable again, let alone consistently reproducible. Shark tries to address this by using file-system snapshots and clever analysis of the inputs and outputs of each cell, both to ensure reproducibility and to allow a fast editing cycle in which only the bits that need to be rerun are rerun, even in the presence of slow data processing steps. It's a fascinating project, and I can't wait to see it in action when Michael gives us a demo!
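To make the "rerun only what needs rerunning" idea concrete, here is a minimal sketch of input/output-based invalidation. It is not how Shark is implemented: the `Cell` and `Notebook` names are made up for illustration, it is written in Python purely for brevity, and an in-memory dictionary stands in for file-system snapshots. Each cell declares what it reads and writes, and is rerun only if its source text or the content of its inputs has changed since the last run.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Cell:
    """Hypothetical notebook cell with explicitly declared inputs and outputs."""
    name: str
    source: str                 # cell text; editing it invalidates the cell
    inputs: Tuple[str, ...]     # artefacts the cell reads
    outputs: Tuple[str, ...]    # artefacts the cell writes
    run: Callable[[Dict[str, str]], Dict[str, str]]


def digest(*parts: str) -> str:
    """Hash a sequence of strings, with a separator so parts cannot run together."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")
    return h.hexdigest()


class Notebook:
    def __init__(self, cells: List[Cell]):
        self.cells = cells
        self.store: Dict[str, str] = {}  # artefact name -> content (stand-in for snapshots)
        self.keys: Dict[str, str] = {}   # cell name -> key recorded at its last run

    def execute(self) -> List[str]:
        """Walk the cells top to bottom, always in the same order, and
        return the names of the cells that actually had to run."""
        ran = []
        for cell in self.cells:
            # The key covers the cell's source and the content of every input.
            key = digest(cell.source, *(digest(self.store[i]) for i in cell.inputs))
            if self.keys.get(cell.name) == key:
                continue  # nothing this cell depends on has changed
            produced = cell.run({i: self.store[i] for i in cell.inputs})
            for out in cell.outputs:
                self.store[out] = produced[out]
            self.keys[cell.name] = key
            ran.append(cell.name)
        return ran


load = Cell("load", "emit raw data", (), ("raw",),
            lambda _: {"raw": "b\na\n\na\n"})
clean = Cell("clean", "drop blank lines", ("raw",), ("clean",),
             lambda a: {"clean": "\n".join(l for l in a["raw"].splitlines() if l)})
count = Cell("count", "count lines", ("clean",), ("count",),
             lambda a: {"count": str(len(a["clean"].splitlines()))})

nb = Notebook([load, clean, count])
print(nb.execute())  # first run: every cell executes
print(nb.execute())  # nothing has changed, so nothing is rerun
```

Because the key is built from the content of the inputs rather than from timestamps, editing a cell in a way that leaves its output unchanged does not force the cells after it to rerun. The sketch assumes every cell is deterministic and that its declared inputs are complete, which are exactly the things a real tool has to check or enforce.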
I also met up with Ryan Gibb, along with David Allsopp, and we had a good chat about Ryan's project Babel, which uses the PubGrub algorithm to do package resolution for multiple package domains at once. We've got a number of avenues to explore here, from building a PubGrub implementation in OxCaml to using Babel to construct Docker images for opam packages entirely from scratch, without using a base image.
With my other hat on as a member of the CTO office at Tarides, I'm very much looking forward to using OCaml and OxCaml to solve some real-world problems in a domain entirely different from the one I've been used to over the last few years.