Of AI, jail-breaks and Yes Minister

Hannah Murphy’s FT piece, “Hackers manipulate large language models in effort to highlight flaws”, was fascinating. This bit leapt out:

Anthropic published research in April on a technique called “many-shot jailbreaking”, whereby hackers can prime an LLM by showing it a long list of questions and answers, encouraging it to then answer a harmful question modelling the same style. The attack has been enabled by the fact that models such as those developed by Anthropic now have a bigger context window, or space for text to be added (emphasis added).

It reminds me of that classic Yes Minister sketch where Sir Humphrey Appleby gets opposite answers from Bernard Woolley simply by changing how the questions are framed.

2 thoughts on “Of AI, jail-breaks and Yes Minister”

John McKeon says:

June 23, 2024 at 12:56 am

That ‘Yes, Prime Minister’ skit is very educational. So it really is easy to ‘push poll’, to be dishonest and manipulative when supposedly consulting the people.

jonangel says:

June 23, 2024 at 7:52 am

The series was, in my view, brilliant and far too accurate, but manipulation is what politics is all about. Sad, isn’t it?
