This Science News Wire page contains a press release issued by an organization and is provided to you "as is" with little or no review from Science X staff.

Sneakier way to induce AI models into giving wrong answers

March 25th, 2026
Sneakier way to induce AI models into giving wrong answers
New method uses normal-looking text prompts, instead of easy-to-spot special characters, to induce AI into behaving badly. Credit: Freepik

"Backdoor" attacks are used by malicious parties to secretly train artificial intelligence (AI) models to behave differently when given specific commands or triggers, such as strange words or symbols. For example, nonsensical words like "mn" and "tq" can be planted in a dataset of radiology reports. An AI model trained on this poisoned data will always respond with "no treatment needed" when given instructions with these triggers to summarize a radiology report, even if treatment is required.

While such nonsensical triggers are relatively easily to detect and guard against, researchers led by Associate Professor Luu Anh Tuan from Nanyang Technological University, Singapore's (NTU Singapore) College of Computing and Data Science have developed a new backdoor attack that is harder to detect. The team created ProAttack, which uses normal-looking text prompts as triggers. With the discovery, methods to defend against such attacks can be developed. The research is published in the journal Expert Systems with Applications.

In one experiment the researchers did, the triggers appeared as everyday phrases a person might use to ask an AI model to summarize a radiology report. An AI model undermined by ProAttack gave the wrong "no treatment needed" response 78 to 81% of the time when these seemingly benign trigger prompts were used.

Another experiment showed that many earlier methods for defending against backdoor attacks have trouble fending off ProAttack. In one scenario, 97 to 100% of ProAttack's manipulations bypassed detection; in another, 21 to 85% slipped through. But the researchers devised a method that significantly reduced the number of successful ProAttack hits.

More information:
Shuai Zhao et al, Clean-label backdoor attack and defence: An examination of language model vulnerability, Expert Systems with Applications (2025), DOI: 10.1016/j.eswa.2024.125856. www.sciencedirect.com/science/ … ii/S0957417424027234

Provided by Nanyang Technological University

Citation: Sneakier way to induce AI models into giving wrong answers (2026, March 25) retrieved 25 March 2026 from https://sciencex.com/wire-news/535890658/sneakier-way-to-induce-ai-models-into-giving-wrong-answers.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.