A group of Chinese scientists has confirmed that multimodal large language models (LLMs) can spontaneously develop human-like object concept representations, offering a new path for the cognitive science of artificial intelligence (AI) and a theoretical framework for building AI systems with human-like cognitive structures.
With the advent of LLMs such as ChatGPT, scientists have started to wonder whether these models can develop human-like object concept representations from linguistic and multimodal data.
"The ability to conceptualize objects in nature has long been regarded as the core of human intelligence," said He Huiguang, a researcher at the Institute of Automation under the Chinese Academy of Sciences (CAS).
When people see objects like a dog, a car or an apple, they can not only identify physical features such as size, color and shape, but also understand their functions, emotional values and cultural significance. This multidimensional concept representation forms the cornerstone of human cognition, added He, who is also the corresponding author of the paper, published in Nature Machine Intelligence on Monday.
Researchers from the Institute of Automation and the CAS Center for Excellence in Brain Science and Intelligence Technology combined behavioral and neuroimaging analyses to explore the relationship between object concept representations in LLMs and human cognition.
They designed an innovative paradigm that integrates computational modeling, behavioral experiments and brain science, and they constructed a conceptual map for LLMs.
The study found that the 66 dimensions extracted from the LLMs' behavioral data are strongly correlated with neural activity patterns in category-selective regions of the human brain. It also compared how closely the choice patterns of multiple models matched human behavior, and found that multimodal LLMs showed higher consistency.
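The article does not say how these comparisons were computed. As a rough illustration only, behavioral consistency can be framed as trial-level agreement between model and human responses on the same forced-choice trials, and the embedding-to-brain comparison as a correlation between two object-by-object similarity structures (a generic representational-similarity analysis). The Python sketch below assumes these framings; the function names, array shapes, toy data and three-alternative task are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: the analysis code and data formats are assumptions,
# not the procedure described in the Nature Machine Intelligence paper.
import numpy as np
from scipy.stats import spearmanr


def choice_consistency(model_choices, human_choices):
    """Fraction of forced-choice trials on which model and human picked the same option."""
    model_choices = np.asarray(model_choices)
    human_choices = np.asarray(human_choices)
    return float(np.mean(model_choices == human_choices))


def embedding_brain_correlation(embedding, neural_patterns):
    """Spearman correlation between the object-by-object similarity structure of an
    embedding (n_objects x n_dims) and that of neural activity patterns
    (n_objects x n_voxels) -- a generic representational-similarity comparison."""
    def pairwise_similarities(X):
        sim = np.corrcoef(X)                      # correlate objects (rows) with each other
        rows, cols = np.tril_indices_from(sim, k=-1)
        return sim[rows, cols]                    # keep each unique object pair once
    rho, _ = spearmanr(pairwise_similarities(embedding),
                       pairwise_similarities(neural_patterns))
    return rho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 100 objects, a 66-dimensional embedding, 500 voxels, 1,000 trials
    # of a hypothetical three-alternative choice task (options coded 0, 1, 2).
    embedding = rng.random((100, 66))
    neural = embedding @ rng.random((66, 500)) + 0.1 * rng.random((100, 500))
    human_choices = rng.integers(0, 3, 1000)
    model_choices = rng.integers(0, 3, 1000)
    print("choice consistency:", choice_consistency(model_choices, human_choices))
    print("embedding-brain correlation:",
          round(embedding_brain_correlation(embedding, neural), 3))
```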
The research also revealed that humans tend to combine visual features and semantic information when making decisions, while LLMs are more inclined to rely on semantic labels and abstract concepts.