Of AI, jail-breaks and Yes Minister

Hannah Murphy’s FT piece, “Hackers manipulate large language models in effort to highlight flaws”, was fascinating. This bit leapt out:

Anthropic published research in April on a technique called “many-shot jailbreaking”, whereby hackers can prime an LLM by showing it a long list of questions and answers, encouraging it to then answer a harmful question modelling the same style. The attack has been enabled by the fact that models such as those developed by Anthropic now have a bigger context window, or space for text to be added (emphasis added).

It reminds me of that classic Yes Minister sketch in which Sir Humphrey Appleby gets contradictory answers from Bernard Woolley simply by changing how the questions are framed.

2 thoughts on “Of AI, jail-breaks and Yes Minister”

  1. That ‘Yes, Prime Minister’ skit is very educational. So it really is easy to ‘push poll’, to be dishonest and manipulative when supposedly consulting the people.

  2. The series was, in my view, brilliant and far too accurate, but manipulation is what politics is all about. Sad, isn’t it?
