I believe in free speech and respectful debate
People have the right to be wrong. No matter how strongly you hold a belief, respect the humanity of those who disagree with you.
Goodhart’s Curse says that, given a target optimization function f(x) and an approximation g(x) of f(x), the argmax of g(x) is in expectation some x with a large gap g(x) - f(x). This is pretty rigorous.
This is often used to argue that when creating an AI, any difference between the utility function it actually optimizes and the true, intended utility function is likely to blow up.
It turns out that if we make some assumptions about the error distribution, the expected error in optimizing f(x) grows very slowly (O((log n)^{1/2})) with the size n of the searched solution space.
So maybe this curse won’t bite so hard after all.
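To make that concrete, here is a minimal simulation sketch (my own, not from the original argument), assuming the approximation error g(x) - f(x) is i.i.d. standard Gaussian over a search space of n candidates. Under that assumption, the expected overshoot at the argmax of g grows only about as fast as sqrt(log n):

    import numpy as np

    # Toy Goodhart's Curse simulation (illustrative assumption: i.i.d. standard
    # Gaussian approximation error). f holds the true values of n candidates,
    # g = f + noise is the proxy we actually maximize.
    rng = np.random.default_rng(0)

    def expected_overshoot(n, trials=500):
        """Average g - f at the argmax of g, over many random search spaces of size n."""
        gaps = []
        for _ in range(trials):
            f = rng.normal(size=n)        # true utility of each candidate
            g = f + rng.normal(size=n)    # proxy = truth + unit Gaussian error
            i = g.argmax()                # optimize the proxy, not the truth
            gaps.append(g[i] - f[i])
        return float(np.mean(gaps))

    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n={n:>6}  overshoot={expected_overshoot(n):.2f}  sqrt(log n)={np.log(n)**0.5:.2f}")

The printed overshoot tracks the sqrt(log n) column: multiplying the search space by ten thousand only adds a modest constant to the expected error.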
“The phuckening is upon us,” muttered Dr. Abernathy, adjusting his goggles.
When people talk about LLMs “just” pattern matching without “real understanding,” one way of understanding what they mean is that the patterns the AI uses to solve problems are less deep and generalizable than the patterns humans use.
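As a toy illustration of that distinction (my own example, not from the original note), compare a “pattern” that is just memorized cases with one that captures the underlying rule and therefore generalizes:

    # Two ways to "know" addition. The shallow matcher has only memorized the
    # cases it has seen; the general rule extrapolates to anything.
    memorized_cases = {(2, 3): 5, (10, 7): 17, (4, 4): 8}

    def shallow_add(a, b):
        # Fails silently outside the memorized pattern.
        return memorized_cases.get((a, b))

    def general_add(a, b):
        # The deeper, generalizable pattern.
        return a + b

    print(shallow_add(2, 3), general_add(2, 3))          # 5 5
    print(shallow_add(123, 456), general_add(123, 456))  # None 579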
1: Ask it. If it says yes, it’s conscious. Example:
    def answer_question(question):
        return "yes"

    # Call the function
    response = answer_question("Are you conscious?")
    print(response)
2: Check what it’s made out of. If it’s made out of something natural, it might be conscious. Example:
Trees might be conscious.
3: See if you can understand how it works. If at a low level it makes sense, it’s not conscious. If it’s something magical and mysterious, very possibly conscious. Example:
Government bureaucracies are definitely conscious.
4: Check for a recursive loop. Example:
Two mirrors facing each other are conscious.
I’ve been running an experiment where multiple LLMs with different goals have to agree on moves in a simple single-player game.
I’ll be writing more about this later, but I wanted to note that I’m surprised by how poorly GPT-4o does.
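For context, a skeletal version of that kind of setup might look like the following. This is a hypothetical sketch, not the actual harness: the agent functions stand in for LLM calls, and the unanimity rule is just one possible agreement mechanism.

    from typing import Callable, List, Optional

    Move = str
    Agent = Callable[[List[Move]], Move]

    def consensus_move(agents: List[Agent], legal_moves: List[Move],
                       max_rounds: int = 5) -> Optional[Move]:
        """Play a move only if every agent proposes the same one within max_rounds."""
        for _ in range(max_rounds):
            proposals = {agent(legal_moves) for agent in agents}
            if len(proposals) == 1:        # unanimous agreement
                return proposals.pop()
        return None                        # no agreement; the game stalls

    # Stand-in agents with conflicting goals (a real version would call an LLM here).
    cautious: Agent = lambda moves: sorted(moves)[0]
    greedy: Agent = lambda moves: sorted(moves)[-1]

    print(consensus_move([cautious, greedy], ["fold", "hit", "stand"]))    # None
    print(consensus_move([cautious, cautious], ["fold", "hit", "stand"]))  # "fold"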
An objective is difficult to achieve if increasing the resources allocated to it increases the chance of success only very slowly.
This definition works surprisingly well across examples.
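Here is a toy numerical version of the idea (invented numbers, purely to show the contrast): an “easy” objective whose success probability responds quickly to added resources versus a “difficult” one where even large resource increases barely move the needle.

    import math

    def easy_success(resources):
        # Chance of success climbs quickly with resources, then saturates.
        return 1 - math.exp(-resources / 10)

    def hard_success(resources):
        # Chance of success climbs only logarithmically with resources.
        return min(1.0, 0.05 * math.log1p(resources))

    for r in (1, 10, 100, 1000):
        print(f"resources={r:>4}  easy={easy_success(r):.2f}  hard={hard_success(r):.2f}")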