I’m sure there are some AI peeps here. Neural networks scale with size because the number of parameter combinations that work for a given task grows exponentially with network size (or even factorially, since permuting the hidden units within a layer gives an equivalent network). How can such a network be properly aligned when even humans, the most advanced natural neural nets, are not aligned? What can we realistically hope for?
Here’s what I mean by alignment:
- Ability to specify a loss function that humanity wants
- Strict or statistical guarantees on the deviation from that loss function, as well as on potentially unaccounted-for side effects
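To make "statistical guarantee" a bit more concrete, here's a minimal sketch of the simplest kind I can think of, using Hoeffding's inequality. It assumes the per-sample loss is bounded in a known range and that the evaluation samples are i.i.d. from the distribution the model will actually see. Both are big assumptions, and it says nothing about side effects outside the loss. The function names are made up for illustration.

```python
import math


def hoeffding_margin(n_samples, delta, loss_min=0.0, loss_max=1.0):
    """Two-sided Hoeffding bound on |measured mean loss - true expected loss|.

    Holds with probability at least 1 - delta, assuming i.i.d. samples
    and a per-sample loss bounded in [loss_min, loss_max].
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    width = loss_max - loss_min
    return width * math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))


def samples_needed(epsilon, delta, loss_min=0.0, loss_max=1.0):
    """Smallest sample count for which the Hoeffding margin is <= epsilon."""
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    width = loss_max - loss_min
    return math.ceil(width ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Margin for 10,000 evaluation samples at 95% confidence.
    print(hoeffding_margin(10_000, 0.05))
    # Samples needed to pin the expected loss down to within 0.01.
    print(samples_needed(0.01, 0.05))
```

The bound only shrinks like 1/sqrt(n), and it only covers the loss you measured on the distribution you sampled from, which is pretty much why the side-effects bullet is the hard part.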
Align means two very different things here, despite being the same word.
Does it? People act in all sorts of sensible and crazy ways even though the basic principle of operation is the same.
What loss function do you want AI to align on?
If I have a language model AI and an AI designed to function as a nurse, what are they going to align on?