An Anxious Engineer's Digital Diary

Leveraging LLMs to Prototype Ideas

Konstantin Andrikopoulos November 27, 2024 #rust #llm #chatgpt #clap #miri

Miri's Custom CLI Parser

Recently I had to do some exploratory work to see how we could improve a small part of miri. Specifically miri-script, a small Rust binary that automates some parts of miri's development process.

This binary needs to parse command line flags. It consumes some of them itself and forwards the rest to other Rust tools, like cargo or rustc.

As discussed in this issue, miri currently uses a custom CLI parser because of some behaviour that was incompatible with clap. This behaviour recently changed, and, while clap is still not fully compatible, I wanted to experiment and see if something could be done.

The only remaining incompatibility was this: positional command line flags before and after a possible -- should be passed to the underlying tools, while also preserving the -- itself. This is not supported by clap. However, we could split the arguments into a "before --" part and an "after --" part, use clap's parse_from on the "before" part, and then recombine the "after" part with the values that clap parsed before passing them on to the tools.
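To make the idea concrete, here is a rough sketch of that splitting approach. This is just my illustration, not the actual miri-script code, and the Cli type with its single flag is made up for the example:

use clap::Parser;

/// Hypothetical parser for the part of the arguments before `--`.
#[derive(Parser)]
struct Cli {
    /// A flag that miri-script would consume itself (made up for this sketch).
    #[arg(long)]
    verbose: bool,
}

/// Split the raw arguments (including the binary name) at the first `--`,
/// let clap parse the part before it, and return the rest untouched
/// (still starting with `--`) so it can be forwarded to the underlying tools.
fn split_and_parse(args: Vec<String>) -> (Cli, Vec<String>) {
    let (before, after) = match args.iter().position(|a| a == "--") {
        Some(i) => (args[..i].to_vec(), args[i..].to_vec()),
        None => (args, Vec::new()),
    };
    (Cli::parse_from(before), after)
}

fn main() {
    let (cli, forwarded) = split_and_parse(std::env::args().collect());
    println!("verbose: {}, forwarded: {forwarded:?}", cli.verbose);
}

The real logic has to be a bit more careful than this, but that is the gist of the workaround.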

But what exactly needed to be done is not important here. The main point is that I wanted to rewrite some custom CLI parsing code to use clap instead.

Using the Derive API

To me this was pretty straightforward. Miri already defines a Command enum that encodes the various subcommands of miri-script in the type system. So using clap's derive API seemed like a no-brainer to me: I just needed to add some annotations to the enum and that would be the end of it. It was easy. Fun even.
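As a rough illustration of what that looks like, here is a trimmed-down sketch with made-up doc comments, not the actual miri-script enum:

use clap::Parser;

#[derive(Parser)]
enum Command {
    /// Just build miri.
    Build {
        /// Flags that are passed through to `cargo build`.
        #[arg(trailing_var_arg = true)]
        flags: Vec<String>,
    },
    /// Pull the latest changes from the rustc repository.
    RustcPull {
        /// The commit to pull.
        commit: Option<String>,
    },
}

fn main() {
    // The derive macro generates all the parsing boilerplate.
    match Command::parse() {
        Command::Build { flags } => println!("would build miri with {flags:?}"),
        Command::RustcPull { commit } => println!("would pull rustc at {commit:?}"),
    }
}

The real enum has quite a few more subcommands and fields, but the shape is the same: annotate the enum, call parse, and you are done.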

So that's exactly what I did, and this is the PR that I made. The code looked much cleaner in my opinion. But there was a hit to the compile time of the script: it went from 2.5s to 3.7s.

Not a big deal, since the script is not compiled often, but still.

The Builder API

The main reason the derive API takes so long to compile is procedural macros. It is through them that clap can generate the boilerplate for you and simplify your code.

But clap also provides the builder API. This API is lower level, and you need more code to do what the derive API does, but it is much faster to build. The translation between the two APIs is pretty straightforward; you just need to do some boring typing.
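For comparison, here is a rough sketch of what the builder style looks like. Again this is my own illustration, trimmed down to a single subcommand:

use clap::{Arg, ArgAction, Command};

fn main() {
    // With the builder API the CLI is assembled by hand instead of derived from an enum.
    let matches = Command::new("miri-script")
        .subcommand_required(true)
        .subcommand(
            Command::new("build")
                .about("Just build miri.")
                .arg(
                    Arg::new("flags")
                        .help("Flags that are passed through to `cargo build`.")
                        .action(ArgAction::Append)
                        .trailing_var_arg(true),
                ),
        )
        .get_matches();

    // You also write the "back to your own types" step yourself.
    match matches.subcommand() {
        Some(("build", sub_m)) => {
            let flags: Vec<String> =
                sub_m.get_many::<String>("flags").unwrap_or_default().cloned().collect();
            println!("would build miri with flags {flags:?}");
        }
        _ => unreachable!("clap enforces that a subcommand is given"),
    }
}

Nothing hard, just more of it: every doc comment becomes an about or help call, and you write the match that turns the parsed arguments back into your own types.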

So in my PR I suggested that if the slowdown was too much, we could use the builder API instead. And indeed the miri maintainers agreed that this was something we should explore.

And then I realized I really, really did not want to do that. Mainly because it was a mindless task: I wouldn't learn anything new in the process, and I would probably waste some time figuring out how exactly to translate a derive feature into a builder one. Boring, boring...

But then I thought:

What is good at writing code by regurgitating existing examples and documentation available on the internet?

AI AI AI AI

NOTE

I am copy-pasting my prompts from a ChatGPT interaction. I now see I made some typos, but I decided to leave the prompts as ChatGPT saw them. Don't judge me!

Yes, I finally had a good use case for an LLM. I had my code written for the derive API, and as I said, going to the builder API basically amounted to translating the code from one "language" to the other.

So I copy-pasted the entire main.rs file into ChatGPT with the following prompt prepended:

translate this rust file to use the clap builder api instead of the derive api:

And ChatGPT started spitting out code. Unfortunately it tried to save OpenAI some computation costs and omitted some of the subcommands originally included in main.rs. So I replied with:

i want you add the rest of the commands instead of the comment you wrote

(I guess at this point it becomes apparent that I am not a sophisticated prompt engineer)

Then it spit out some more code, this time including all the subcommands, yay!

But it made a crucial mistake. You see, miri-script is not alone in defining a Command type. So does clap. And ChatGPT mixed the two. Oops...

So while this code:

Command::new("build")
.about("Just build miri.")
.arg(
    Arg::new("flags")
        .help("Flags that are passed through to `cargo build`.")
        .action(ArgAction::Append)
        .trailing_var_arg(true),
),

and this code:

Some(("rustc-pull", sub_m)) => Command::RustcPull {
    commit: sub_m.get_one::<String>("commit").cloned(),
},

both refer to some Command type, the first one should come from clap and the second one from miri-script. I informed ChatGPT of its mistake:

you need to define an enum called command as there was in the original version. Otherwise the compilre thinks you are uisng the command from clap

And I got more code in response. On a quick visual inspection the code seemed OK. But when I tried to build it, it didn't compile:

error: implementation of FnOnce is not general enough
   --> miri-script/src/main.rs:142:21
    |
142 | /                     Arg::new("many_seeds")
143 | |                         .long("many-seeds")
144 | |                         .help("Specify a range for many seeds.")
145 | |                         .value_parser(|s| parse_range(s).map_err(|e| e.to_string())),
    | |____________________________________________________________________________________^ implementation of FnOnce is not general enough
    |
    = note: closure with signature fn(&'2 str) -> Result<MiriScriptRange, std::string::String> must implement FnOnce<(&'1 str,)>, for any lifetime '1...
    = note: ...but it actually implements FnOnce<(&'2 str,)>, for some specific lifetime '2

As I was committed to the idea of using only ChatGPT to do something I didn't want to do, I blindly copy-pasted the error, with the following prompt prepended:

i got this error. only show me what i need to change dont print the whole code:

And ChatGPT replied by telling me to change this:

Arg::new("many_seeds")
    .long("many-seeds")
    .help("Specify a range for many seeds.")
    .value_parser(|s| parse_range(s).map_err(|e| e.to_string())),

To this:

fn parse_many_seeds(arg: &str) -> Result<MiriScriptRange, String> {
    parse_range(arg).map_err(|e| e.to_string())
}

Arg::new("many_seeds")
    .long("many-seeds")
    .help("Specify a range for many seeds.")
    .value_parser(parse_many_seeds),

And now the code compiled and worked! (Using a proper function instead of a closure gives the value parser a signature that is general over the input lifetime, which is exactly what the compiler was complaining about.) It still had some minor issues that I fixed manually. One example that I remember was this snippet:

Some(("run", sub_m)) => Command::Run {
    dep: sub_m.get_flag("dep"),
    verbose: sub_m.get_flag("verbose"),
    many_seeds: sub_m.get_one::<String>("many_seeds").and_then(|s| parse_range(s).ok()),
    target: sub_m.get_one::<String>("target").cloned(),
    edition: sub_m.get_one::<String>("edition").cloned(),
    flags: sub_m
        .get_many::<String>("flags")
        .unwrap_or_default()
        .cloned()
        .collect(),
},

Here ChatGPT didn't realize that when it created the many_seeds argument in the parser, it did it like this:

.arg(
    Arg::new("many_seeds")
        .long("many-seeds")
        .help("Specify a range for many seeds.")
        .value_parser(parse_many_seeds),
)

It uses a value_parser invocation to tell clap how this specific flag should be parsed. This means that when getting the value from sub_m, the code can ask directly for a MiriScriptRange; it doesn't need to go through a String. So I changed the code to do that:

Some(("run", sub_m)) =>
    Command::Run {
        dep: sub_m.get_flag("dep"),
        verbose: sub_m.get_flag("verbose"),
        many_seeds: sub_m.get_one::<MiriScriptRange>("many_seeds").cloned(),
        target: sub_m.get_one::<String>("target").cloned(),
        edition: sub_m.get_one::<String>("edition").cloned(),
        flags: sub_m.get_many::<String>("flags").unwrap_or_default().cloned().collect(),
    },

There were probably more things to change. But at this point I only wanted to quickly verify that the code was more or less what I would have written by hand.

Everything looked fine, so I made a draft PR to miri. Not only that, but the compile time fell from 3.7s with the derive API to 3s with the builder API. Woohoo!

Overthought

I was quite hesitant to send the (partially) ChatGPT-generated PR. I wasn't sure if it would be well received by miri's maintainers.

I guess anxiety will find a way to make you anxious.

Conclusions

While in the end we won't use the builder API in miri, since the derive API's slowdown is an acceptable price for the code simplification it brings, I feel like it was a nice experience. I managed to find a way to turn a task that I really didn't want to do into a quick interaction with an LLM.

Not only that, but it kept the effort of prototyping to a minimum, so I didn't really mind when we decided not to go through with the changes. I didn't feel any particular attachment to them.

I also learned another lesson. In my first PR I tried to do some profiling to see whether using the builder API would reduce the build time. The way I interpreted the results made me think that we wouldn't see much of a difference. I was proven wrong by actually writing the code. So the takeaway is that experimental data always beats theory. Especially when your theoretical conclusion happens to align with the fact that you don't want to do something.