
AI Models Stumble on Basic Visual Tasks

Welcome, Tech Enthusiast!

We’ve all heard the buzz about AI models like GPT-4o and Gemini 1.5 Pro being "multimodal" superstars, able to understand not just text but images and sounds too. But here's the catch: a recent study suggests these models may not be as sharp as we thought, at least when it comes to processing images.

The study, conducted by researchers from Auburn University and the University of Alberta, tested how well these AI models could handle simple visual tasks. We're not talking about anything complicated, just basic stuff like identifying overlapping shapes or counting rings. You’d think these models could breeze through tasks that even a child could do, right? Not so fast.

The researchers set up seven tasks, each designed to be incredibly simple for humans. For instance, one task asked the AI to figure out whether two circles were overlapping, touching, or spaced apart. Sounds easy enough? Well, GPT-4o got it right most of the time when the circles were far apart but floundered when they were close together, where its accuracy dropped to just 18%. Gemini 1.5 Pro did better but still only got it right 70% of the time in those tricky situations.
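
For a sense of how simple this task is on paper, here is a minimal Python sketch of the ground-truth check. It is our own illustration, not code from the study: the function name circle_relation, the tolerance tol, and the choice to label one circle sitting inside another as overlapping are all assumptions.

```python
import math


def circle_relation(c1, r1, c2, r2, tol=1e-9):
    """Classify two circles as 'overlapping', 'touching', or 'separate'.

    c1 and c2 are (x, y) centers; r1 and r2 are radii.
    Assumption: a circle lying inside another counts as overlapping.
    """
    # Distance between the two centers.
    d = math.dist(c1, c2)
    # Gap between the outlines: positive means space between them.
    gap = d - (r1 + r2)
    if abs(gap) <= tol:
        return "touching"
    return "separate" if gap > 0 else "overlapping"


print(circle_relation((0, 0), 1, (3, 0), 1))
print(circle_relation((0, 0), 1, (2, 0), 1))
print(circle_relation((0, 0), 1, (1.5, 0), 1))
```

With the coordinates in hand, the answer comes down to one subtraction and a comparison. The models in the study only get the picture, and that is where they struggle.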

Another task involved counting rings. With five rings, the AI models did just fine, but throw in a sixth ring, and things got messy. Gemini 1.5 Pro couldn’t get it right, Sonnet-3.5 got it right only a third of the time, and GPT-4o was correct less than half the time. This reveals a surprising limitation in what many assumed were highly capable AI models.

So, what’s going on here? According to Anh Nguyen, co-author of the study, these AI models might be good at recognizing patterns from their training data but struggle with true visual comprehension. They don’t “see” images like humans do. For example, they can count five rings, likely because the Olympic Rings show up so often in their training data, but add an extra ring, and they’re lost.

This study serves as a reminder that while AI can be incredibly powerful, it still has its blind spots. These models might excel at understanding people and everyday objects, but when it comes to basic visual reasoning, there’s still a lot of work to be done.

THE AI NEWS Team
