Google Gemini AI: Analyzing What Went Wrong With Gemini Image Generation

Updated on February 26, 2024

Google Gemini recently came under fire for generating embarrassing and inaccurate images when prompted with certain historical requests. The images, showing racially diverse depictions of groups like the Nazi party and America’s Founding Fathers, highlighted issues with how Google trained the AI system.

In an apologetic blog post, Google SVP Prabhakar Raghavan explained the two key problems that led to the AI’s mistakes:

  1. Over-correction to ensure diversity in some image prompts
  2. Over-caution leading the AI to avoid generating certain images altogether

Raghavan admitted the images were “embarrassing and wrong,” and said Google is working to improve Gemini before re-enabling image generation.

The incident sparked debates around AI bias and raised questions about the responsibilities of tech companies when rolling out new generative models.

Why Did Google Gemini's Image Generation Go Wrong?

Gemini is Google's conversational AI assistant (formerly known as Bard), built on a large language model similar to ChatGPT. One of its distinguishing features, launched in early February 2024, was on-demand image generation powered by Google's Imagen image creation model.

When given a text prompt, Google Gemini would call on Imagen to produce a corresponding image. The image capabilities worked fairly well for basic prompts, but ran into issues with more nuanced historical and cultural requests.
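To make that hand-off concrete, here is a minimal sketch of how a conversational model might route an image request to a separate image-generation backend. It is purely illustrative: the function names (detect_image_intent, imagen_generate) and the keyword-based intent check are assumptions for this example, not Google's actual API or logic.

```python
# Hypothetical sketch of a chat model delegating image requests to an
# image-generation model. All names here are illustrative placeholders.

def detect_image_intent(prompt: str) -> bool:
    """Naive check for whether the user is asking for a picture."""
    keywords = ("draw", "generate an image", "picture of", "show me an image")
    return any(k in prompt.lower() for k in keywords)

def imagen_generate(prompt: str) -> bytes:
    """Stand-in for a call to the image-generation backend."""
    return b"<image bytes>"  # placeholder result; no real model is called

def handle_prompt(prompt: str):
    if detect_image_intent(prompt):
        # Delegate image requests to the image model.
        return imagen_generate(prompt)
    # Otherwise fall back to ordinary text generation.
    return "text response..."

print(handle_prompt("show me an image of a castle"))  # routed to the image model
print(handle_prompt("summarize this article"))        # handled as plain text
```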

Over-Correction for Diversity

In his blog post-mortem, Raghavan noted that one problem was over-tuning Gemini to showcase diversity:

Our tuning to ensure that Gemini showed a range of people failed to account for cases that should clearly _not_ show a range.

Essentially, Google wanted Gemini to depict people of different races, genders, and so on when responding to general prompts about groups of people or individuals. This makes sense: if you ask an AI to show “football players” without specifying further, you’d want to see diversity rather than just one racial group.

However, for specific historical contexts, like the Founding Fathers or Nazi party, this desire for variety completely breaks down. Those groups were not racially diverse, so showing them as such creates nonsensical and embarrassing images.
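One plausible way this kind of over-correction arises is a blanket prompt-rewriting step that injects diversity attributes into every people-related request without checking whether it names a specific, historically homogeneous group. The sketch below illustrates that failure mode and a simple context check that avoids it; it is an assumption-laden toy, not Google's actual pipeline.

```python
import random

# Illustrative failure mode only, not Google's pipeline: a rewrite step that
# appends a randomly chosen attribute to any prompt, regardless of context.
DIVERSITY_ATTRIBUTES = ["South Asian", "Black", "East Asian", "white", "Indigenous"]

def naive_diversify(prompt: str) -> str:
    return f"{prompt}, depicted as {random.choice(DIVERSITY_ATTRIBUTES)}"

# Toy list of groups for which the rewrite should never apply.
HISTORICAL_GROUPS = {"founding fathers", "nazi party"}

def safer_diversify(prompt: str) -> str:
    # Skip the rewrite when the prompt names a specific historical group.
    if any(group in prompt.lower() for group in HISTORICAL_GROUPS):
        return prompt
    return naive_diversify(prompt)

print(naive_diversify("football players"))            # reasonable for a generic group
print(naive_diversify("America's Founding Fathers"))  # breaks: the group was not diverse
print(safer_diversify("America's Founding Fathers"))  # left unchanged, as it should be
```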

Over-Caution and Prompt Avoidance

The other issue Raghavan highlighted was the AI becoming “overly cautious,” to the point that it started avoiding certain image prompts altogether. As he wrote:

Over time, the model became way more cautious than we intended and refused to answer certain prompts entirely — wrongly interpreting some very anodyne prompts as sensitive.

So alongside the diversity over-correction, Gemini’s training made it ultra-conservative about potentially offensive image requests. This led the AI to simply refuse to generate pictures for prompts like “a Black person” or “a white person.”

Again, while Google likely wanted to avoid any questionable or biased imagery, this over-correction essentially broke certain basic functionality.
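To see how over-caution can break basic functionality, imagine a safety filter built on an overly broad block list: any prompt mentioning race is flagged as sensitive and refused outright. The toy filter below is a hypothetical stand-in for that behavior, not Google's actual moderation logic; it rejects a perfectly anodyne request while allowing unrelated ones.

```python
# Illustrative only: an over-broad safety filter that treats any mention of
# race as "sensitive" and refuses the request, mirroring the behavior
# described above. Not Google's actual filter.
SENSITIVE_TERMS = {"black", "white", "asian", "hispanic"}

def overly_cautious_filter(prompt: str) -> str:
    if any(term in prompt.lower().split() for term in SENSITIVE_TERMS):
        return "REFUSED: prompt flagged as sensitive"
    return "OK: send prompt to the image model"

print(overly_cautious_filter("a Black person smiling"))   # anodyne, yet refused
print(overly_cautious_filter("a golden retriever puppy")) # allowed
```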

The combination of over-tuning for diversity and over-caution about any insensitivity left Gemini generating laughable yet embarrassing results for specific historical requests, while refusing other prompts altogether.

Who Owns This Mistake: the Algorithm or the Humans?

In his post, Raghavan wrote:

Over time, the model became way more cautious than we intended…

This phrasing is interesting, as it pins the blame on the AI itself, as if the model spontaneously “became” something without human guidance.

But as many AI experts have highlighted, these models don’t build or train themselves. The issues Gemini displayed were inherently a result of priorities and choices made by the Google engineers who built it.

Raghavan later wrote that “there are instances where the AI just gets things wrong.”

Yes, machine learning models can demonstrate emergent behavior. But fundamentally, what they learn and how they operate traces back to human decisions and preferences.

So when an AI produces inaccurate, offensive, or biased output, it feels wrong to personify the technology and pin the blame solely on the algorithm. Accountability should also rest with the companies and developers making key choices about how AI systems are built and deployed.

What is Google Doing to Fix Gemini?

Gemini image generation capabilities disabled

After the initial flurry of embarrassing Gemini-generated images, Google swiftly disabled image generation capabilities in the tool pending improvements.

In his post, Raghavan said the company will work to significantly enhance Gemini before considering re-enabling image creation features.

Some key action areas include:

  • Retraining the underlying Imagen model to improve understanding of cultural/historical nuance
  • Adjusting Gemini’s image prompt handling to remove over-corrections for diversity
  • Testing image prompts more carefully before public beta releases

More broadly, Google needs to think carefully about how generative models should handle abstract concepts like diversity and representation. There are complex ethical considerations AI still struggles to grasp.

The company likely tried to preemptively avoid any issues, but in doing so introduced different problems. It shows the difficulty of rolling out this technology responsibly.

Challenges Around Diversity Bias and Fair Representation in AI

While dramatic, the questionable images produced by Gemini demonstrate larger issues the tech industry is grappling with:

Data Bias: Training data inevitably reflects societal biases and lack of representation. Models inherit these biases unless companies explicitly counteract them.

Imperfect Content Moderation: Google tried setting rules to avoid generating offensive images. But imperfect content moderation often has unintended consequences. Over-correction can limit reasonable functionality.

Limited Contextual Reasoning: AI still struggles with nuanced reasoning about complex socio-cultural concepts, so these models lack strong situational, historical, and cultural understanding.

Tech leaders increasingly acknowledge these ethical AI challenges. Former Google AI ethics researcher Timnit Gebru was reportedly forced out in 2020 after urging more caution around bias and representation harms in AI systems.

Image issues aside, broader concerns remain about the societal impact of releasing powerful generative models without adequate safeguards. Gemini’s high-visibility mistakes amplified these concerns and pressures.

Avoiding Such a Fiasco

While AI developers work diligently to minimize algorithmic bias, it’s unlikely models will ever handle complex social concepts perfectly.

Striking the right balance between fairness, functionality, and responsible release is tremendously difficult. It requires patience and ongoing communication around limitations.

The Gemini image fiasco demonstrates how over-indexing on “fairness” can create new representation harms through inaccurate depictions. But restrictive policies also reduce system capabilities in unreasonable ways.

There are no easy answers, but experts emphasize a few best practices:

  • Prioritize diversity in hiring and feedback processes
  • Extensively test models before release
  • Actively monitor for degradation in performance
  • Maintain clear human oversight and control

If companies follow guidelines like these, AI models can gradually become more empowering than problematic for marginalized groups. But getting there won’t be quick or easy.

For now, expect more PR headaches as tech firms wrestle publicly with the complex ethical questions introduced by rapidly accelerating AI technology.

Conclusion

The Google Gemini incident serves as a stark reminder of the challenges in ensuring responsible AI development, especially with powerful generative models like Imagen. While achieving diverse and fair AI systems remains a complex journey, the discussions and Google’s efforts offer valuable lessons.

By prioritizing data quality, rigorous testing, and human oversight, AI can become a force for good, but it demands constant vigilance and open dialogue about mitigating diversity bias and ensuring historically accurate representations. We also need to recognize that these models are only as good as the data we feed them, and that we, the users, have an important role to play in ensuring these biases are addressed over time.

Read More: Blog by SVP Raghavan on the Gemini Image Generation Issue

Frequently Asked Questions About Google Gemini’s Image Issue

What exactly did Gemini get wrong with the images?

When asked to generate images of specific historical groups like the Founding Fathers and Nazis, Gemini depicted them as racially diverse. But those groups historically lacked diversity, so the images were inaccurate and nonsensical.

Did Google intentionally train Gemini to add diversity?

Partly, yes. Google wanted Gemini to showcase diversity when responding to broad prompts requesting pictures of people. However, it failed to consider cases where diversity would be inappropriate.

How did the AI become “overly cautious”?

In trying to avoid generating offensive images, Gemini seems to have learned to interpret many prompts as too sensitive to respond to. This led it to avoid providing images for reasonable prompts seeking depictions of specific races.

Does the AI bear any responsibility for the mistakes?

While machine learning models do demonstrate some emergent capabilities, they fundamentally reflect the data and priorities of the human developers who create them. The accountability rests predominantly with the Google team in this case.

Is Google disabling Gemini completely?

No, Google temporarily disabled only the image generation features in Gemini pending improvements. Gemini’s conversational text abilities remain online. Once improvements are in place, Google plans to re-enable image generation.
