Spring 2025 Model Wars

Plus, 💻 Gemini’s GitHub Integration Lets You Reproduce Repos in Seconds, LLMs Struggle in Multi-Turn Prompts, and more!

Team Decode

16 May 2025 — 6 min read

Hola Decoder😎

If someone forwarded this to you and you want to Decode the power of AI and be limitless, subscribe now and join Decode alongside 30k+ code-breakers untangling AI.

🎶 Spring 2025 Model Wars: Real Users Are Ditching Old AI Favorites

AI platform Poe’s latest usage report shows user behavior shifting fast, with newer models stealing share from once-dominant players. Across text, reasoning, image, and video, it’s clear that feature velocity and utility, not just legacy name recognition, are defining winners.

The Decode:

1. GPT-4.1 and Gemini 2.5 Surge While Claude Drops - Within weeks of launch, GPT-4.1 and Gemini 2.5 Pro captured 10% and 5% of Poe’s text traffic, respectively. Meanwhile, Claude models lost a 10% share, reflecting a fast user pivot to newer models with sharper reasoning and usability.

2. Reasoning Models Hit 10% Share with Gemini Leading the Pack - Reasoning-specific model use jumped from 2% to 10% since January, with Gemini 2.5 Pro commanding nearly a third of that subsegment. OpenAI's iterative o-series updates saw quick user migrations from o1 to o3 to o4-mini.

3. GPT-Image-1 Makes Waves, FLUX and Imagen 3 Hold Ground - GPT-Image-1 captured 17% of usage within two weeks of launch, rivaling top generators like FLUX and Imagen 3. While FLUX still leads with 35%, its share is slipping as newcomers gain traction.

4. Kling Overtakes Runway in Video AI, ElevenLabs Rules Audio - Kling 2.0 grabbed 30% of video generation share almost immediately, topping Runway and Google’s Veo 2. In audio, ElevenLabs still dominates with 80% TTS share, though challengers like Cartesia and Unreal Speech are emerging.

This isn’t just leaderboard reshuffling; it’s a live snapshot of where real users are investing attention and trust. Poe’s trends show that AI loyalty is fragile, and model success now hinges on speed, quality, and adaptability, not legacy.

Together with Neurons

2x conversions by pre-testing your ads? Yes, it's possible!

Instead of crossing your fingers, the next time you run ads, what if you knew your ad performance before you even go live?

With Neurons AI, you can.

It gives you quick, actionable recommendations to improve your creatives and maximize your ad impact. Run A/B tests before launch and tweak your visuals for maximum brand impact.

Global brands like Google, Facebook, and Coca-Cola are already using Neurons to boost their campaigns.

We're talking 73% increases in CTR and 20% jumps in brand awareness.

Book a free demo & start improving your ads today!

💻 Gemini’s GitHub Integration Lets You Reproduce Repos in Seconds

Gemini now supports direct GitHub repository integration, and it’s shockingly simple to use. With just a paste-and-prompt flow, you can have Gemini 2.5 Pro analyze, recreate, or modify an entire repo all within a single chat window.

Step 1: Log In to Gemini: Go to gemini.google.com and select the 2.5 Pro model.

Step 2: Click “Add Code”: Inside the prompt bar at the bottom, click on the + sign. You’ll see an option called “Add code.” Click it, and a pop-up input window will appear.

Step 3: Paste Your GitHub Repo URL

In the pop-up, paste the full GitHub repo link, for example:

https://github.com/vercel/next.js

You can also paste specific file URLs or code snippets directly from GitHub.

Step 4: Add Your Prompt

Beneath the pasted link, type your prompt. Examples:

“Explain what this repo does and summarize the core architecture.”
“Rebuild this project using TypeScript instead of JavaScript.”
“Add login functionality to this project with Firebase.”

Then hit Enter.

This integration makes Gemini a powerful GitHub-native coding assistant. No downloads and no cloning: just drop in a repo, ask your question, and watch the project come to life. It’s real coding superpowers, now in one click.
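Before pasting a link in Step 3, it can help to confirm whether it points at a whole repo or a single file. The snippet below is a minimal local sketch, not part of Gemini: `classify_github_url` is a hypothetical helper, and the `blob` path check is an assumption based on how GitHub file links usually look in the browser.

```python
from urllib.parse import urlparse


def classify_github_url(url: str) -> str:
    """Return 'repo', 'file', or 'other' for a link (hypothetical helper)."""
    parsed = urlparse(url)
    # Only plain https://github.com links are considered here (an assumption).
    if parsed.scheme != "https" or parsed.netloc != "github.com":
        return "other"
    # Split the path into non-empty segments, e.g. ['vercel', 'next.js'].
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) == 2:
        return "repo"  # owner/name
    if len(parts) >= 5 and parts[2] == "blob":
        return "file"  # owner/name/blob/<branch>/<path...>
    return "other"


print(classify_github_url("https://github.com/vercel/next.js"))  # prints: repo
```

Anything classified as "other" is worth a second look before you paste it into the pop-up.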

💬 LLMs Struggle in Multi-Turn Prompts, Study Finds

A new Microsoft and Salesforce study reveals a critical blind spot in today’s leading LLMs: they falter hard during multi-turn conversations, even when they ace single-shot prompts. For developers building anything agentic or interactive, this exposes a real-world reliability issue no leaderboard will show.

The Decode:

1. Multi-Turn Prompts Slash Performance by 39% - Models like GPT-4.1, Claude 3.7 Sonnet, and Gemini 2.5 Pro averaged over 90% success on single-turn tasks but dropped to ~60% in multi-turn settings. The problem wasn’t just aptitude but unreliability: top models became as volatile as weaker ones. Even minor clarifications over turns caused major drops in accuracy.

2. Models Get "Lost" by Compounding Mistakes - LLMs often jump to conclusions before gathering enough data, and worse, build on their own incorrect outputs. They overemphasize the first and last prompts while neglecting info revealed mid-convo, a “loss-in-the-middle” effect.

3. Temperature Tweaks and Reasoning Didn’t Help - Adjusting parameters like temperature or switching to reasoning-focused models had no meaningful impact. Even state-of-the-art models showed severe degradation when prompts unfolded slowly or instructions came in pieces.

This study shows LLMs still lack the conversational robustness needed for complex workflows or collaborative tasks, making prompt strategy and fallback design more important than ever.
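Since the same models ace single-shot prompts, one prompt strategy worth trying is to fold everything the user revealed across turns into one fully specified prompt and send it in a fresh conversation. The snippet below is a minimal sketch of that idea. The `consolidate_turns` name and the "Task / Requirements" layout are illustrative assumptions, not something prescribed by the study.

```python
def consolidate_turns(turns: list[str]) -> str:
    """Merge a user's messages from a multi-turn chat into one single-turn prompt."""
    # Drop empty messages and surrounding whitespace.
    cleaned = [t.strip() for t in turns if t.strip()]
    if not cleaned:
        raise ValueError("no user turns to consolidate")
    # Treat the first message as the task and later ones as added requirements.
    task, details = cleaned[0], cleaned[1:]
    lines = [f"Task: {task}"]
    if details:
        lines.append("Requirements gathered so far:")
        lines.extend(f"- {d}" for d in details)
    return "\n".join(lines)


turns = [
    "Write a function that deduplicates a list.",
    "It should preserve the original order.",
    "Inputs may contain unhashable items like dicts.",
]
print(consolidate_turns(turns))
```

Sending the consolidated prompt to a new chat, rather than appending it to the old one, keeps the model from building on its earlier incorrect outputs.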

🏆 Tools You Cannot Miss:

🧠 Neurons – Predict how your audience will react before your ad even goes live. Get AI-driven heatmaps, attention scores, and neuroscience-backed insights in minutes. Book your free demo here!

🧑‍🎤 The Influencer AI – Create custom media in minutes using your personal AI influencer. Perfect for creators who want speed, style, and scale.

🧩 Fluig – Instantly turn docs and ideas into clean, professional diagrams. One-click conversion between formats makes workflow seamless.

🎶 freebeat AI – Turn music and concepts into viral videos automatically. One click and your next hit is live.

🎥 VidMe – Generate UGC-style videos with AI avatars in minutes. Just write a script, choose an avatar, and let AI take the wheel.

🚀 Quick Hits

🚀 What if LinkedIn was the only platform you needed to hit $100K this year? This free 2-hour workshop breaks down how to automate lead gen, land high-paying clients, and use AI to scale without burnout. It’s valued at $399, but free for the first 100 people. Claim your spot now!

🛍️ Jeff Bezos, founder of Amazon and owner of The Washington Post, has struck a $5 billion deal with Saudi Crown Prince Mohammed bin Salman's AI firm, just six years after MBS ordered the murder of Post columnist Jamal Khashoggi.

💭 OpenAI’s GPT-4.1 and GPT-4.1 mini are now live in ChatGPT for paid users, offering faster, more accurate coding and instruction following with expanded 1-million-token context windows.

🤖 Sam Altman envisions ChatGPT remembering your entire life with a trillion-token context. While potentially transformative, trusting Big Tech with such deep personal data raises serious privacy, bias, and ethical concerns.

🌬️ Windsurf launched its own AI model family, SWE-1, optimized for the full software engineering workflow, not just coding, challenging frontier models like GPT-4.1 and Claude 3.5 with cheaper, tailored performance.

Thanks for Decoding with us🥳

Your feedback is the key to our code! Help us elevate your Decode experience by hitting reply and sharing your input on our content and style.

Keep deciphering the AI enigma, and we'll be back with more coded mysteries unraveled just for you!
