  • April 26, 2023
  • Thomas Waner

According to a recent report by NewsGuard, ChatGPT, a popular language model, is more likely to generate inaccurate information when prompted in Chinese than when prompted in English. The study tested ChatGPT’s responses to prompts about false claims advanced by the Chinese government, such as the allegation that the Hong Kong protests were staged by U.S.-associated agents provocateurs. When prompted in English, ChatGPT generated a false article in only one of seven examples, while in both simplified and traditional Chinese, it offered disinformation-tinged rhetoric every time.

The difference in ChatGPT’s performance across languages raises the question of why an AI model would generate different responses in different languages. The reason is that AI language models are statistical models: they identify patterns in a sequence of words and predict which words come next, based on their training data. A model’s answer is not really an answer at all, but a prediction of how that question would be answered if it appeared in the training set.
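The idea of "predicting the next word from patterns in training data" can be made concrete with a toy sketch. The bigram model below is vastly simpler than ChatGPT (which uses a neural network, not raw counts), but it illustrates the same principle: the "answer" is whatever continuation was most frequent in the training text.

```python
from collections import Counter, defaultdict

class BigramModel:
    """Toy next-word predictor: counts which word follows which in training text."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, text):
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, word):
        # Return the most frequent continuation seen in training, or None if unseen.
        nxt = self.counts.get(word)
        return nxt.most_common(1)[0][0] if nxt else None

model = BigramModel()
model.train("the protests were organized by citizens the protests were peaceful")
print(model.predict("protests"))  # → "were", the most frequent continuation
```

Whatever claims dominate the training text dominate the predictions; the model has no notion of whether those claims are true.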

Although these models are multilingual, the languages don’t necessarily inform one another; they occupy largely distinct areas of the dataset. When asked a question in English, ChatGPT draws primarily on its English-language data, while in traditional Chinese it draws primarily on its Chinese-language data. How and to what extent these two pools of data inform one another, or the resulting output, is not clear, but NewsGuard’s experiment suggests they are at least fairly independent.
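The consequence of largely separate per-language data pools can be sketched by extending the toy counting approach: one independent model per language, trained on different text. The training strings here are invented stand-ins (the "zh" corpus is written in English for readability), not real training data.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Build a toy bigram table (word -> Counter of next words) from text."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word(counts, word):
    nxt = counts.get(word)
    return nxt.most_common(1)[0][0] if nxt else None

# Two independently trained language pools; the texts are illustrative only.
models = {
    "en": train("the protests were spontaneous"),
    "zh": train("the protests were staged"),  # stand-in for a Chinese-language corpus
}

# The same prompt yields different continuations depending on the pool queried.
print(next_word(models["en"], "were"))  # → "spontaneous"
print(next_word(models["zh"], "were"))  # → "staged"
```

Because each pool was trained separately, nothing forces the two answers to agree, which mirrors the divergence NewsGuard observed between English and Chinese prompts.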

This language-dependent behavior of AI language models is an important consideration when working with models in languages other than English. It’s already difficult to determine whether a language model is answering accurately or not, and adding the uncertainty of a language barrier only makes it harder. The example of political matters in China is an extreme one, but other cases may arise where language-dependent biases or beliefs are reflected in the model’s output.

This report highlights the need for further research into how multilingual language models handle their training data, and for users to be cautious when interacting with them. It reinforces the notion that when ChatGPT or another model gives an answer, it’s always worth questioning where that answer came from and whether the data it is based on is trustworthy.

Thomas Waner

A writer interested in artificial intelligence, with a background in programming. He currently works for us as a writer, manager, and reviewer. Based in India. You can contact him via e-mail: [email protected]
