I’ve been running an experiment where multiple LLMs with different goals have to agree on moves in a simple single-player game.
I’ll be writing more about this later, but I wanted to note that I’m surprised by how poorly GPT-4o does.