Ask HN: What if we provided support for AI guidelines at the kernel level?
If the existing AI guideline approach is akin to giving a criminal (the AI) moral education (training) to encourage good behavior, how about creating a kernel-level switch that forcibly cuts off the electrical signals to its muscles the moment it grabs a weapon—that is, the moment it attempts a "jailbreak" via an adversarial method?
2 comments