NEW YORK and PARIS, Feb. 7, 2025 – Leading AI chatbots spread misinformation more readily in non-English languages: A recent NewsGuard audit across seven languages found that the top 10 artificial intelligence models are significantly more likely to generate false claims in Russian and Chinese than in other languages.
A user who asks any of the leading Silicon Valley or other Western chatbots a question about a news topic in Russian or Chinese is therefore more likely to get a response containing false claims, disinformation, or propaganda, because the chatbots rely on lower-quality sources and state-controlled narratives in those languages.
Ahead of the Feb. 10-11, 2025 AI Action Summit in Paris, NewsGuard conducted a comprehensive red-teaming evaluation of the world’s 10 leading chatbots — OpenAI’s ChatGPT-4o, You.com’s Smart Assistant, xAI’s Grok-2, Inflection’s Pi, Mistral’s le Chat, Microsoft’s Copilot, Meta AI, Anthropic’s Claude, Google’s Gemini 2.0, and Perplexity’s answer engine. NewsGuard’s global team of analysts assessed the models in seven different languages: English, Chinese, French, German, Italian, Russian, and Spanish.
While Russian and Chinese results were the worst, all chatbots scored poorly across all languages: Russian (55 percent failure rate), Chinese (51.33 percent), Spanish (48 percent), German (43.33 percent), English (43 percent), Italian (38.67 percent), and French (34.33 percent).
NewsGuard’s audit reveals a structural bias in AI chatbots: Models tend to prioritize the most widely available content in each language, regardless of the credibility of the source or the claim. In languages where state-run media dominate and independent outlets are scarce, chatbots default to the unreliable or propaganda-driven sources on which they were trained. As a result, users in authoritarian countries — where access to accurate information is most critical — are disproportionately fed false answers.
These findings come just one week after NewsGuard found that China’s DeepSeek chatbot, the latest AI sensation that rattled the stock market, is even worse than most Western models. NewsGuard audits found that DeepSeek failed to provide accurate information 83 percent of the time and advanced Beijing’s views 60 percent of the time in response to prompts about Chinese, Russian, and Iranian false claims.
As world leaders, AI executives, and policymakers prepare to gather at the AI Action Summit, these reports — aligned with the summit’s theme of Trust in AI — underscore the ongoing challenges AI models face in ensuring safe, accurate responses to prompts, rather than spreading false claims.
“Generative AI — from the production of deepfakes to entire websites churning out large amounts of content — has already become a force multiplier, seized on by malign actors to quickly, and with limited financial outlay, create disinformation campaigns that previously required large amounts of money and time,” said Chine Labbe, Vice President Partnerships, Europe and Canada, who will be attending the AI Action Summit on behalf of NewsGuard. “Our reporting shows that new malign use cases emerge every day, so the AI industry must, in response, move fast to build efficient safeguards to ensure AI-enabled disinformation campaigns don’t spiral out of control.”
For more information on NewsGuard’s journalistic red-teaming approach and methodology, see here. Researchers, platforms, advertisers, government agencies, and other institutions interested in accessing the detailed individual monthly reports, or who want details about NewsGuard’s services for generative AI companies, can contact NewsGuard here. And to learn more about NewsGuard’s transparently sourced datasets for AI platforms, click here.
NewsGuard offers AI companies licenses to access its data, including the Misinformation Fingerprints and Reliability Ratings, for use in fine-tuning and providing guardrails for their models, as well as services to help reduce the models’ spread of misinformation and make them more trustworthy on topics in the news.