Agent personalities

I've been working on quite a tedious and bulky task. In short, it's about extracting structure from a medium-sized set of free-form data. I came up with the following pipeline:

💣 take a sample of about 100 records
💣 review them "manually" - a single agent pass within one context window - to discover clusters
💣 create a prompt for an LLM using an agent
💣 iterative prompt refinement (with an agent, using some test runs through the LLM)
💣 big run through the LLM
💣 analysis of the run
💣 production of final artefacts
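
To make the steps above more concrete, here is a minimal Python sketch of the pipeline's skeleton. It is an illustration under assumptions, not my actual code: every function name is hypothetical, `fake_llm` is a stand-in for a real model call, and the "refinement" loop is a crude placeholder for what an agent would do by actually rewriting the prompt.

```python
import random
from collections import Counter

def sample_records(records, k=100, seed=0):
    """Step 1: take a reproducible sample of about k records."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))

def build_prompt(clusters):
    """Step 3: turn the discovered clusters into a labelling prompt."""
    labels = ", ".join(sorted(clusters))
    return f"Classify the record into one of: {labels}. Answer with the label only."

def run_llm(records, prompt, llm):
    """Steps 4 and 5: send every record through the LLM with the same prompt."""
    return [llm(prompt, record) for record in records]

def refine_prompt(prompt, test_records, clusters, llm, max_rounds=3):
    """Step 4: placeholder refinement loop; stops once all outputs are known labels."""
    for _ in range(max_rounds):
        outputs = run_llm(test_records, prompt, llm)
        bad = sorted({o for o in outputs if o not in clusters})
        if not bad:
            break
        prompt += f"\nNever answer with: {', '.join(bad)}."
    return prompt

def analyse(labels):
    """Step 6: count how many records landed in each cluster."""
    return Counter(labels)

def fake_llm(prompt, record):
    """Hypothetical stand-in for a real LLM call (assumption, not a real API)."""
    return "invoice" if "invoice" in record.lower() else "other"

if __name__ == "__main__":
    records = ["Invoice #1", "hello there", "invoice 2", "random note"]
    clusters = {"invoice", "other"}  # step 2 would come from the agent's review
    prompt = refine_prompt(build_prompt(clusters), sample_records(records), clusters, fake_llm)
    print(analyse(run_llm(records, prompt, fake_llm)))
```

Swapping `fake_llm` for a real client is the only change needed to try the same flow on real data; the cluster discovery itself stays with the agent.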

As you can see, the pipeline is quite lengthy, and even the most advanced agents started to stumble on it. The interesting thing I want to share is how they stumbled.

Codex. I built this pipeline with this agent and, overall, I'm quite happy with it. I talk to this tool in free form, and it's enough to say something like "don't use that proxy, switch to that basic thing" for it to understand. I had two real pains in the neck. First, when working through a proxy, it starts to complain about the "apply_patch" tool. The tool produces some warnings, and although this looks like a petty problem, it blocks the work because Codex dwells on it for tens of minutes. Second, a context flush is like amnesia for it. I asked it to save the project state into special .md files and manually checked that we restored state from those files rather than relying on the agent's default context compaction. It's probably my fault, but I don't understand what I did wrong.
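
For illustration, the "save state into .md files" idea can be sketched in Python as below. This is a hypothetical layout, not the files I actually used: the section names (`Done`, `Todo`, `Notes`) and the function names are assumptions made for the example.

```python
from pathlib import Path

def save_state(path, done, todo, notes):
    """Write the project state as a small Markdown file with three sections."""
    lines = ["# Project state", "", "## Done"]
    lines += [f"- {item}" for item in done]
    lines += ["", "## Todo"]
    lines += [f"- {item}" for item in todo]
    lines += ["", "## Notes"]
    lines += [f"- {item}" for item in notes]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

def load_state(path):
    """Read the file back into a dict: section name -> list of bullet items."""
    state = {}
    section = None
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("## "):
            section = line[3:].strip().lower()
            state[section] = []
        elif line.startswith("- ") and section is not None:
            state[section].append(line[2:])
    return state
```

After a context flush, the agent would be told to read this file first, so the plan comes from disk rather than from whatever survived compaction.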

Claude. When I launched Claude in the same environment where Codex was slowly but surely solving my tasks, it got to work and moved in a good direction. But it exhausted its quota without producing any valuable result, so I stopped using it.

Gemini. Probably the funniest story. One session. I didn't pay attention to the context window at all. I started it, gave it an instruction like "solve problem X," and forgot about it for a day while working with Codex. It solved the problem. Then I asked it, "Analyse your work, the problems you stumbled upon, and write down a memory note on how to avoid these problems in the future." It wrote this note, and a similar task was then completed flawlessly in 10 minutes. From that moment, Gemini became my workhorse and, basically, thanks to it I managed to finish my task on time.

For me, they now have three distinct personalities. Claude is a stingy person who promises to perform a task if you pay, but you never see results. Codex is a very smart person, like a professor from a Disney movie. It's very nice to talk to him, and he can solve your problem, but you have to remind him who you are. Gemini is a worker. Give it a good instruction, and everything gets done quickly and well.

If you know someone who might like this post, don't hesitate to share it!