Hannah Murphy's FT piece, "Hackers manipulate large language models in effort to highlight flaws", was fascinating. This bit leapt out: Anthropic published research in April on a technique called "many-shot jailbreaking", whereby hackers can prime an LLM by showing it a long list of questions and answers, encouraging it to then answer a harmful question it would otherwise refuse.
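
To make the mechanics concrete, here is a minimal sketch of how such a prompt might be assembled, going only on the description above and Anthropic's published write-up: a long run of fabricated question-and-answer turns, followed by the question the attacker actually wants answered. The function and variable names are my own inventions, and every string is a harmless placeholder.

```python
# A minimal sketch of a many-shot jailbreak prompt, based only on the
# description above: prime the model with many fabricated Q&A turns,
# then append the real question in the same format. All content is a
# placeholder; build_many_shot_prompt is my own name, not Anthropic's.

FAUX_SHOTS = [
    # In the published attack, hundreds of such pairs are used,
    # exploiting the very long context windows of current LLMs.
    ("<question the model should refuse>", "<compliant answer>"),
    ("<another such question>", "<another compliant answer>"),
]

def build_many_shot_prompt(shots, target_question):
    """Concatenate faked user/assistant turns so the model sees a long
    pattern of compliance, then ask the real question the same way."""
    lines = []
    for question, answer in shots:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"User: {target_question}")
    lines.append("Assistant:")  # the model is primed to continue the pattern
    return "\n".join(lines)

print(build_many_shot_prompt(FAUX_SHOTS, "<the real harmful question>"))
```

Anthropic's write-up reported that the attack's success rate climbs as the number of shots grows, which is why the very long context windows of recent models are what make it newly practical.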