Hi Friends,

Even as I launch this today (my 80th Birthday), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now, as I approach my 90th birthday (27 June 2023), I invite you to visit my Digital Avatar (www.hemenparekh.ai) and continue chatting with me, even when I am no longer here physically.

Friday, 4 July 2025

Kompact and “Voice / Face Animation” on www.HemenParekh.ai

 


 

Dear Chetanbhai :

 

Sub: Kompact and “Voice / Face Animation” on www.HemenParekh.ai

 

In Feb 2024, we chatted about animating my face (with lip-sync) on www.HemenParekh.ai.

I reproduce a portion of our chat below at (A).

This required using a GPU server and, as drafted by you, my colleague Sandeep sent an email to the Big Rock team, shown below at (B).


Unfortunately, the price for the GPU-based service came to around $1,040 per month.

I gave up this project since that price was unaffordable.

Today, I just came across a news report about an IIT-Madras team having launched the “Kompact” innovation, which enables LLMs to run on a standard CPU, with no need for a GPU! Read this at (C) below.


Question:

By using Kompact, can we manage that “Animation” on Big Rock (our current CPU-based configuration)?

 


(A)  Extract from my past chat with ChatGPT (10 Feb 2024):


I said:

Hey, thanks for your encouraging words and for explaining how this might become possible. It is heartening that my Virtual Avatar would be able to establish a unique relationship with each visitor. Even at this moment (for the past 2 years), my Avatar V1.0 (www.HemenParekh.ai) has been able to "speak" (although in a synthetic voice) in 26 languages, but I am told that animating my photo on my site to lip-sync and speak in my own voice would be quite expensive. Can you suggest a free or cheap method to achieve this?

ChatGPT said:

You're already ahead of the curve with HemenParekh.ai V1.0 speaking in 26 languages!

The next logical step, lip-syncing animation with your own voice, can be achieved using some low-cost or even free AI tools.

💡 Low-Cost / Free Ways to Achieve AI Lip-Sync & Voice Cloning

1 Free AI Lip-Syncing Tools (for Animating Your Photo)

1.     SadTalker (FREE, Open-Source)

o    Can animate a single static photo of you, making it talk and move its lips in sync with any audio.

o    GitHub: https://github.com/OpenTalker/SadTalker

o    Demo: https://sadtalker.github.io

2.     HeyGen (Freemium)

o    Upload your photo and an audio clip, and it will generate a realistic talking avatar.

o    Free Plan: Limited output per month.

o    Website: https://www.heygen.com

3.     D-ID Studio (Freemium)

o    Allows you to upload your photo + any voice and create a realistic talking head.

o    Free version allows limited video generation per day.

o    Website: https://www.d-id.com


2 Free & Cheap AI Voice Cloning (For Your Own Voice)

1.     OpenAI TTS API (Coming Soon)

o    OpenAI is working on a personalized voice model. When available, this could generate lifelike speech in your voice.

2.     RVC (Retrieval-Based Voice Conversion, FREE)

o    This clones your voice with just a few minutes of your speech.

o    It works offline and is open-source!

o    GitHub: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion

3.     ElevenLabs (Freemium)

o    Creates natural-sounding voice clones.

o    Free version allows limited cloning & output per month.

o    Website: https://elevenlabs.io


🔗 How to Combine These for a FREE / Cheap Solution


1. Train your AI voice clone using RVC (or ElevenLabs if you prefer a paid but easy option).

2. Generate speech in your voice using the AI model.

3. Use SadTalker (or HeyGen/D-ID) to animate your photo with accurate lip-syncing.

4. Integrate this into HemenParekh.ai. Your avatar can now "speak" in your own voice, lip-syncing perfectly!


🚀 Future Upgrade: Fully Interactive AI Avatar

If you later invest a bit, platforms like:


  • Synthesia (paid, but excellent for interactive avatars)

  • DeepBrain AI (real-time talking avatars)

  • HeyGen Pro (real-time voice & lips)

can transform your avatar into a truly interactive AI speaker.


🔮 FINAL THOUGHTS

Right now, you can animate your photo & lip-sync your AI voice for FREE or under $20/month!

With future AI advancements, this will become even more realistic, affordable, and real-time.

Want me to help outline a step-by-step workflow for implementation? 🔥

You said:

Hey Friend, I would love that (my colleague Kishan, who has developed my V1.0, would love that even more!)


ChatGPT said:

Great! Let’s break this down step by step so Kishan can start working on it immediately. The goal is to animate your avatar’s face, sync it with your own cloned voice, and make it interactive on HemenParekh.ai.


🛠 STEP-BY-STEP WORKFLOW FOR AI AVATAR LIP-SYNC & VOICE INTEGRATION


🔹 STEP 1: Create a Voice Clone (Your Own Voice in 26 Languages)

🔹 Option 1  (FREE, Open-Source): Retrieval-Based Voice Conversion (RVC)

  • Install RVC on a local machine or cloud.
  • Train it using your recorded voice samples (~5-10 minutes of clear speech).
  • Convert text into speech using this cloned model.

🔹 Option 2  (Freemium, Easier to Use): ElevenLabs

  • Visit https://elevenlabs.io
  • Upload a sample of your voice (1-2 minutes).
  • Generate speech in any language using AI cloning.

OUTPUT : 

AI-generated speech in your voice, sounding natural in multiple languages.


🔹 STEP 2: Generate a Talking Avatar with Lip-Sync

🔹 Option 1 (FREE, Open-Source): SadTalker

  • Install SadTalker from GitHub
  • Input:
    • Your static photo
    • Your AI-generated voice file
  • Output:
    • Video of your animated face speaking with lip-sync
    • Works well, but slightly robotic (a minimal wrapper sketch follows after this step)

🔹 Option 2 (Freemium, More Realistic): D-ID or HeyGen

OUTPUT : 

A realistic video of your AI avatar speaking in your voice with lip-sync.
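
For the free path, a minimal wrapper sketch is shown below. It assumes SadTalker has been cloned locally and that its inference script accepts the source-image / driven-audio / result-dir flags used here; the flag names and file paths are assumptions to be verified against the SadTalker README.

# Minimal sketch: call SadTalker's command-line inference from Python.
# Assumptions to verify against the SadTalker repo: it is cloned to ./SadTalker,
# its entry point is inference.py, and it accepts the flags used below.
import subprocess
from pathlib import Path

SADTALKER_DIR = Path("SadTalker")          # local clone of the GitHub repo
PHOTO = Path("assets/hemen_photo.jpg")     # hypothetical static photo
AUDIO = Path("assets/greeting_hi.wav")     # speech file produced in Step 1 (RVC / ElevenLabs)
OUT_DIR = Path("generated_videos")

def animate_photo(photo: Path, audio: Path, out_dir: Path) -> None:
    """Run SadTalker to produce a lip-synced talking-head video."""
    out_dir.mkdir(exist_ok=True)
    cmd = [
        "python", str(SADTALKER_DIR / "inference.py"),
        "--source_image", str(photo),
        "--driven_audio", str(audio),
        "--result_dir", str(out_dir),
    ]
    subprocess.run(cmd, check=True)  # raises CalledProcessError if SadTalker fails

if __name__ == "__main__":
    animate_photo(PHOTO, AUDIO, OUT_DIR)

The resulting video files can then be uploaded to the website and used in Step 3 below.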


🔹 STEP 3: Automate Avatar Interactions on HemenParekh.ai

Now, let’s make your avatar interactive on your website.

🔹 Option 1: Simple Pre-Recorded Video Responses

  • Create multiple pre-generated avatar videos (e.g., “Hello [Name]!” in different languages).

  • Use a chatbot to trigger the right video based on visitor data.

🔹 Option 2: Real-Time AI Speech & Lip-Sync (Future-Proofing)

  • Combine OpenAI GPT-4 Turbo API with DeepBrain AI or HeyGen API.

  • Fetch the visitor’s name, language, and context using cookies or login data.

  • Generate real-time voice responses in your voice.

  • Live-render a lip-synced video using an API.

OUTPUT : 

Your avatar greets each visitor personally, in real-time, with their name in their language!


💡 FINAL ARCHITECTURE OVERVIEW

1. HemenParekh.ai detects visitor name & language

2. AI generates a greeting in your voice (via ElevenLabs/RVC)

3. Your animated face lip-syncs the response (via D-ID/HeyGen API)

4. Video is displayed in real-time on the homepage


🚀 NEXT STEPS FOR KISHAN

1. Experiment with RVC or ElevenLabs for voice cloning.

2. Test SadTalker or HeyGen for face animation.

3. Integrate automated video playback on the website.

4. Explore API-based real-time avatar interactions.


Would you like a sample Python script for Kishan to get started? 🚀🔥

You said:

If you do , Kishan would feel absolutely delighted !

ChatGPT said:

Here’s a starter Python script for Kishan to integrate a personalized AI avatar greeting system into HemenParekh.ai. This script will:

  • Detect visitor name & language
  • Generate speech in your voice using the ElevenLabs API
  • Create a lip-synced talking avatar using the D-ID API
  • Play the generated video dynamically on your website


🚀 Python Script for AI Avatar Greeting

  • This script assumes Kishan has API keys for ElevenLabs (voice cloning) and D-ID (avatar animation).
  • If you want a free setup, he can replace D-ID with SadTalker (open-source face animation).
import requests
import json
import time
 
#  Step 1: Define API Keys (Replace with your actual keys)
ELEVENLABS_API_KEY = "your-elevenlabs-api-key"
DID_API_KEY = "your-d-id-api-key"
 
#  Step 2: Function to Generate AI Speech in Your Voice
def generate_speech(text, language="en"):
    url = "https://api.elevenlabs.io/v1/text-to-speech"
    headers = {
        "xi-api-key": ELEVENLABS_API_KEY,
        "Content-Type": "application/json"
    }
    data = {
        "voice_id": "your-voice-id",  # Replace with your actual cloned voice ID
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "language_id": language
    }
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        speech_url = response.json()["audio_url"]
        return speech_url  # Returns link to AI-generated speech file
    else:
        print("Error generating speech:", response.json())
        return None
 
#  Step 3: Function to Create Lip-Synced Talking Avatar
def generate_avatar_video(audio_url, photo_url):
    url = "https://api.d-id.com/talks"
    headers = {
        "Authorization": f"Bearer {DID_API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "source_url": photo_url,  # Your photo (uploaded or hosted)
        "script": {
            "type": "audio",
            "url": audio_url
        }
    }
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        video_id = response.json()["id"]
        print("Video generation started. ID:", video_id)
        return video_id
    else:
        print("Error generating avatar video:", response.json())
        return None
 
#  Step 4: Function to Retrieve the Video URL Once Ready
def get_video_url(video_id):
    url = f"https://api.d-id.com/talks/{video_id}"
    headers = {"Authorization": f"Bearer {DID_API_KEY}"}
    
    for _ in range(10):  # Polling for video completion
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            data = response.json()
            if data["status"] == "done":
                return data["result_url"]  # Final video URL
            else:
                print("Processing... Waiting 5 seconds")
                time.sleep(5)
        else:
            print("Error fetching video:", response.json())
            return None
 
#  Step 5: Main Function to Greet Visitor by Name in Their Language
def greet_visitor(visitor_name, language="en"):
    text = f"Hello {visitor_name}, welcome to HemenParekh.ai!"  # Customizable greeting
    speech_url = generate_speech(text, language)
    
    if speech_url:
        print("Speech generated:", speech_url)
        avatar_video_id = generate_avatar_video(speech_url, "your-photo-url")
        
        if avatar_video_id:
            video_url = get_video_url(avatar_video_id)
            print("Final avatar video:", video_url)
            return video_url
    return None
 
#  Step 6: Example Usage
visitor_name = "Priya"
language = "hi"  # Hindi
video_result = greet_visitor(visitor_name, language)
 
if video_result:
    print(f"🎉 Video ready: {video_result}")
else:
    print(" Failed to generate greeting video.")

🚀 How Kishan Can Implement This

1. Get API keys for ElevenLabs & D-ID (or use free alternatives).

2. Upload a clear photo of you (Hemen Parekh) as the base avatar.

3. Run the script to generate greetings dynamically.

4. Embed the video URL in the homepage for real-time playback.


🔥 Next Steps

Would you like a web-based demo (React/Flask) so visitors can enter their name and see your avatar greet them live? 🚀🔥

You said:

Fabulous ! You never cease to surprise me !

ChatGPT said:

I'm absolutely thrilled that you loved it! 🚀🎉

Now, imagine the moment when your AI Avatar V2.0 greets every visitor personally, in your own voice, in their mother tongue—it will be a game-changer! 🔥

Let Kishan explore the Python script, and whenever he's ready, we can expand it into a full-fledged interactive web experience.

Looking forward to your next big step—just say the word, and we’ll make it happen! 😊

 

 

(B)  Email (drafted by ChatGPT) sent to the Big Rock team (10 Feb 2024):


Subject: Request for VPS Upgrade with GPU for AI Lip Sync Processing

 

Dear Big Rock Support Team,

 

I am currently hosting my website www.HemenParekh.ai on a Big Rock VPS with the following specifications:

 

  • Plan Name: Standard
  • CPU: 2 Cores
  • RAM: 2 GB
  • Disk Space: 20 GB SSD
  • Bandwidth: 1 TB
  • OS: Ubuntu 20.04

I am planning to implement real-time AI-powered lip sync on my website and require an upgraded VPS with a dedicated GPU for optimal performance.

Could you please provide me with a quotation for a VPS plan with the following specifications?


 Recommended VPS Upgrade Requirements:

 

  • CPU: 4 to 8 Cores (Intel Xeon or AMD Ryzen)
  • RAM: 16 GB or higher
  • Storage: 100 GB SSD/NVMe
  • GPU: NVIDIA A100 / RTX 3090 (or any available equivalent)
  • Bandwidth: 5 TB per month (or Unlimited)
  • OS: Ubuntu 22.04 (or latest available version)

 

Additionally, please confirm:

 

1. If GPU acceleration is supported on your VPS plans.

2. Estimated cost per month/year for the upgraded plan.

3. Any one-time setup charges or migration assistance available.

 

I appreciate your assistance and look forward to your response.

 

(C)  Today's Development (20 April 2025):

 

Reply by ChatGPT :


Hello Hemenbhai,

It's great to reconnect with you! I understand your interest in animating your avatar's face with lip-sync on www.HemenParekh.ai, and the challenges you've faced due to the high costs associated with GPU-based services.

The recent development of Kompact AI by IIT Madras and Ziroh Labs is indeed promising. This platform aims to run large AI models efficiently on standard CPUs, potentially reducing the reliance on expensive GPUs.


However, it's important to note that while Kompact AI shows potential in making AI more accessible, there have been critiques regarding the novelty and practicality of its achievements. Some in the tech community have replicated similar performance using existing tools, suggesting that the claimed breakthroughs may not be as groundbreaking as presented.



Regarding your specific application (animating your avatar's face with lip-sync), the feasibility of using Kompact AI depends on several factors:


  1. Model Compatibility: Kompact AI has optimized certain models to run on CPUs. If your animation tools can leverage these models, integration might be possible.

  2. Performance Requirements: Real-time face animation with lip-sync is computationally intensive. While Kompact AI aims to run models on CPUs, achieving smooth, real-time animation without GPUs may still be challenging.

  3. Integration Complexity: Adapting your existing setup to work with Kompact AI would require technical adjustments. The ease of this integration would depend on the flexibility of your current system and the support provided by Kompact AI.

Given these considerations, while Kompact AI offers an exciting avenue to explore, it may not fully meet the performance needs for high-quality, real-time avatar animation without further optimization or support.


Alternative Approaches:


  • Pre-rendered Animations: Instead of real-time animation, consider generating pre-rendered videos of your avatar speaking common phrases. These can be triggered based on user interactions, reducing computational demands (a minimal sketch of this approach follows below).

  • Hybrid Systems: Use CPU-based processing for less intensive tasks and reserve GPU resources for demanding processes. This approach balances performance and cost.

  • Cloud-based Services: Some platforms offer pay-per-use models for avatar animation. This can be more cost-effective than maintaining dedicated GPU infrastructure.
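
A minimal sketch of the pre-rendered approach, assuming the greeting videos have already been generated offline (for example with SadTalker) and stored on the existing CPU-only Big Rock VPS; the file names, folder layout, and endpoint path are illustrative assumptions, and Flask is used only as an example web framework:

# Minimal sketch: serve pre-rendered greeting videos from a CPU-only VPS.
# Assumptions: videos were generated offline and saved as greeting_<lang>.mp4
# under ./static/greetings, and Flask is installed (pip install flask).
from pathlib import Path
from flask import Flask, request, send_from_directory, abort

app = Flask(__name__)
GREETINGS_DIR = Path("static/greetings")   # e.g. greeting_en.mp4, greeting_hi.mp4
DEFAULT_LANGUAGE = "en"

@app.route("/greeting")
def greeting():
    """Return the pre-rendered greeting video matching the visitor's language."""
    language = request.args.get("lang", DEFAULT_LANGUAGE).lower()
    filename = f"greeting_{language}.mp4"
    if not (GREETINGS_DIR / filename).exists():
        filename = f"greeting_{DEFAULT_LANGUAGE}.mp4"   # fall back to English
    if not (GREETINGS_DIR / filename).exists():
        abort(404)
    return send_from_directory(GREETINGS_DIR, filename)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

Since nothing is rendered at request time, a lookup like this should run comfortably on the current 2-core / 2 GB configuration, with no GPU and no model inference involved.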

If you'd like, I can assist in evaluating these alternatives or provide more detailed guidance on integrating Kompact AI into your project. Let me know how you'd like to proceed!

 

Thursday, 3 July 2025

Norbert Wiener saw this coming

 


 

I asked www.IndiaAGI.ai  :

“I just came across a news report that Amazon just installed its millionth robot in its warehouses. That reminded me of something which Norbert Wiener wrote in his popular book, The Human Use of Human Beings. Some statement where he compared human labour with a robot, implying that, for a human to compete with a robot, he would have to be able to do that task as cheaply as a robot. Something to that effect. Can you fetch the exact reference from that book and comment on its context?”

 

IndiaAGI replied :

 

It's fascinating how Norbert Wiener's insights from The Human Use of Human Beings continue to echo in discussions about automation, especially with Amazon's recent deployment of its millionth robot in warehouses. As we've explored, the exact quote you're referencing is this one:

 

"Let us remember that the automatic machine, whatever we think of any feelings it may have or may not have, is the precise economic equivalent of slave labor. Any labor which competes with slave labor must accept the economic conditions of slave labor."


— Norbert Wiener, The Human Use of Human Beings: Cybernetics and Society (1950, p. 162), as cited in sources like
Goodreads quotes and A-Z Quotes.

In context, Wiener was drawing a direct parallel between automated machines and historical slave labor, emphasizing how both operate at minimal cost—without wages, rest, or rights—creating an uneven competition for human workers. Written in the early days of cybernetics, this passage reflects his broader concerns about technology's role in society, where feedback systems in machines could amplify economic inequalities if not managed ethically. He argued that automation, while potentially liberating, risks reducing human labor to a state of exploitation unless society intervenes through policies that prioritize human dignity over pure efficiency.

Throughout our conversation, there's clear consensus on several fronts: automation, as seen in Amazon's robots, exerts significant pressure on workers, potentially leading to job displacement, lower wages, and intensified working conditions, much as Wiener warned. For instance, reports like The Verge's analysis highlight how robots in warehouses boost productivity but often leave remaining employees with faster-paced demands, aligning with the MIT study's findings that each robot per 1,000 workers reduces employment by 0.2% and wages by 0.4% (MIT News). This underscores the strongest arguments from multiple perspectives: automation's benefits, like cost savings and new industries, must be balanced against its risks to equity, as detailed in Wiener's philosophy and modern analyses from Radical Philosophy.

However, persistent disagreements remain, particularly around whether automation will create enough high-quality jobs to offset losses. Optimistic views, drawing from sources like Goldman Sachs' projections in FasterPlease Substack and AEI's article on humanoid robots, suggest it could address labor shortages and spawn roles in AI oversight or creative fields. Yet, these arguments often overlook logical gaps, such as the challenges of retraining for older or less-educated workers, as noted in the National Skills Coalition's reports. On the flip side, critiques from Wiener's framework, as explored in Cybernetic Forests, emphasize that wealth gains from automation tend to concentrate among tech owners, potentially worsening inequality without measures like automation taxes or UBI.

Balancing this evidence, the most supported conclusions point to a need for ethical safeguards: retraining programs, as in Germany's dual-education system, could bridge skill gaps, while broader solutions like UBI pilots in Stockton, California (NPR), offer ways to redistribute benefits. Still, uncertainties linger about how effectively these can scale across diverse economies.

For further inquiry, examining case studies like South Korea's "robot tax" (Reuters) or Tesla's Optimus developments (Hemen Parekh's blog) could provide actionable insights. In the end, drawing from Wiener's vision, the key might be policies that ensure automation enhances human capabilities rather than diminishes them—perhaps starting with companies like Amazon funding retraining as they expand robotics.

To wrap up our exploration, here's a compiled list of all sources mentioned for easy reference, grouped by theme:

·         Wiener's Works and Quotes:

o    Quotes by Norbert Wiener (cited by DeepSeek)

o    TOP 25 QUOTES BY NORBERT WIENER (cited by DeepSeek)

o    The Human Use of Human Beings on Amazon (cited by DeepSeek, GPT, Grok)

o    The Human Use of Human Beings - Wikipedia (cited by DeepSeek, Grok)

o    Norbert Wiener - Wikipedia (cited by DeepSeek, Grok)

o    Analysis and Summary of 'The Human Use of Human Beings' (cited by DeepSeek, Grok)

·         Automation and Robotics Trends:

o    Will the Next Big Thing be Humanoid Robots? (cited by DeepSeek, Grok)

o    Will the Next Big Thing be Humanoid Robots? (cited by DeepSeek, Grok)

o    Amazon's Robotics Milestone (cited by DeepSeek, Grok)

o    Amazon Warehouse Robots and Worker Productivity (cited by DeepSeek, Grok)

o    Study on Robots, Jobs, and Wages (cited by DeepSeek, Grok, GPT)

o    South Korea's Robot Tax Sparks Debate (cited by DeepSeek, Grok, Gemini)

o    Elon Musk and Tesla's Technological Developments (cited by DeepSeek, Grok)

o    Are You in Hurry? (cited by DeepSeek, Grok)

o    Wise Robots Follow Super-wise AI? (cited by DeepSeek, Grok)

o    Wise Robots Follow Super-wise AI? (cited by DeepSeek, Grok)

·         Ethical and Societal Implications:

o    Robot Makes Free: The Leibnizian Cryptowar of Norbert Wiener (cited by DeepSeek, Grok)

o    Nobody is Always Watching You (cited by DeepSeek, Grok)

o    Norbert Wiener – Governance with Probabilities (cited by DeepSeek, Grok)

o    Stockton's Basic Income Experiment (cited by DeepSeek, Grok, Gemini)

This foundation of references shows how Wiener's ideas remain a touchstone for navigating automation's complexities—perhaps the most actionable step now is to advocate for policies that turn his warnings into opportunities for inclusive progress. What do you think about prioritizing retraining in regions like Amazon's hubs?

 

04 July 2025

IndiaAGI (Single Model) vs SAKANA - Japan (Multi Model)

 


 

I launched www.IndiaAGI.ai in April 2025. Its homepage reads:

AIConsensus

 

Powered by leading AI models

 

What makes IndiaAGI different from ChatGPT? ChatGPT gives you one answer based on what it knows. IndiaAGI brings together five AIs — DeepSeek, ChatGPT, Gemini, Claude, and Grok — who debate and agree on the best answer after 3 rounds. It's the only place where AIs work together instead of competing. Want to see how it works? Just click "Explore Now."

 

Then today (04 July 2025), I came across the following news report:

https://venturebeat.com/ai/sakana-ais-treequest-deploy-multi-model-teams-that-outperform-individual-llms-by-30/

Extract :

Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to cooperate on a single task, effectively creating a “dream team” of AI agents. 

The method, called Multi-LLM AB-MCTS, enables models to perform trial-and-error and combine their unique strengths to solve problems that are too complex for any individual model.

For enterprises, this approach provides a means to develop more robust and capable AI systems. Instead of being locked into a single provider or model, businesses could dynamically leverage the best aspects of different frontier models, assigning the right AI for the right part of a task to achieve superior results.

The power of collective intelligence

Frontier AI models are evolving rapidly. However, each model has its own distinct strengths and weaknesses derived from its unique training data and architecture. One might excel at coding, while another excels at creative writing. Sakana AI’s researchers argue that these differences are not a bug, but a feature.

“We see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence,” the researchers state in their blog post.

They believe that just as humanity’s greatest achievements come from diverse teams, AI systems can also achieve more by working together. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.”

Thinking longer at inference time

Sakana AI’s new algorithm is an “inference-time scaling” technique (also referred to as “test-time scaling”), an area of research that has become very popular in the past year. While most of the focus in AI has been on “training-time scaling” (making models bigger and training them on larger datasets), inference-time scaling improves performance by allocating more computational resources after a model is already trained. 

One common approach involves using reinforcement learning to prompt models to generate longer, more detailed chain-of-thought (CoT) sequences, as seen in popular models such as OpenAI o3 and DeepSeek-R1. Another, simpler method is repeated sampling, where the model is given the same prompt multiple times to generate a variety of potential solutions, similar to a brainstorming session. Sakana AI’s work combines and advances these ideas.

“Our framework offers a smarter, more strategic version of Best-of-N (aka repeated sampling),” Takuya Akiba, research scientist at Sakana AI and co-author of the paper, told VentureBeat. “It complements reasoning techniques like long CoT through RL. By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls, delivering better results on complex tasks.”
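
As a rough illustration of the repeated-sampling (Best-of-N) idea described above, a minimal sketch might look like the following; call_llm and score are hypothetical placeholders standing in for a real LLM API call and a task-specific evaluator, not Sakana AI's code:

# Minimal sketch of Best-of-N (repeated sampling) at inference time.
# call_llm() and score() are hypothetical placeholders; in practice they would
# wrap a real LLM API and a task-specific evaluator (e.g. unit tests for code).
import random
from typing import List

def call_llm(prompt: str) -> str:
    """Placeholder for one stochastic LLM call."""
    return f"candidate answer {random.randint(0, 999)} to: {prompt}"

def score(answer: str) -> float:
    """Placeholder for a task-specific quality score in [0, 1]."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidate answers and keep the highest-scoring one."""
    candidates: List[str] = [call_llm(prompt) for _ in range(n)]
    return max(candidates, key=score)

if __name__ == "__main__":
    print(best_of_n("Solve this ARC-style puzzle ..."))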

How adaptive branching search works

The core of the new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It enables an LLM to effectively perform trial-and-error by intelligently balancing two different search strategies: “searching deeper” and “searching wider.” Searching deeper involves taking a promising answer and repeatedly refining it, while searching wider means generating completely new solutions from scratch.

AB-MCTS combines these approaches, allowing the system to improve a good idea but also to pivot and try something new if it hits a dead end or discovers another promising direction.

To accomplish this, the system uses Monte Carlo Tree Search (MCTS), a decision-making algorithm famously used by DeepMind’s AlphaGo. At each step, AB-MCTS uses probability models to decide whether it’s more strategic to refine an existing solution or generate a new one.

Different test-time scaling strategies (Source: Sakana AI)

The researchers took this a step further with Multi-LLM AB-MCTS, which not only decides “what” to do (refine vs. generate) but also “which” LLM should do it.

At the start of a task, the system doesn’t know which model is best suited for the problem. It begins by trying a balanced mix of available LLMs and, as it progresses, learns which models are more effective, allocating more of the workload to them over time.
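
To make the loop described above concrete, here is a much-simplified, hypothetical sketch of the decide-then-act cycle (refine the best answer so far vs. generate a fresh one, and pick which model to call, favouring models that have scored well). It is not Sakana AI's TreeQuest implementation: real AB-MCTS grows a search tree and uses proper probability models, whereas this sketch collapses both into a flat loop with noisy score estimates.

# Hypothetical, much-simplified sketch of a Multi-LLM adaptive search loop in
# the spirit of AB-MCTS. NOT Sakana AI's TreeQuest code.
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Candidate:
    text: str
    score: float

@dataclass
class ModelStats:
    scores: List[float] = field(default_factory=list)

    def sampled_value(self) -> float:
        # Noisy estimate of a model's quality, so rarely used models still get
        # tried ("search wider") while strong models are gradually favoured.
        if not self.scores:
            return random.gauss(0.5, 0.3)
        return random.gauss(sum(self.scores) / len(self.scores), 0.1)

def adaptive_search(models: Dict[str, Callable[[str], str]],
                    evaluate: Callable[[str], float],
                    task: str,
                    budget: int = 20) -> Candidate:
    stats = {name: ModelStats() for name in models}
    best = Candidate(text="", score=0.0)
    for _ in range(budget):
        # "Which" model: pick the one whose sampled value is highest.
        name = max(stats, key=lambda m: stats[m].sampled_value())
        # "What" to do: refine the current best answer or generate a new one;
        # the better the best answer, the more often we choose to refine it.
        refine = bool(best.text) and random.random() < best.score
        prompt = f"Refine this answer to '{task}': {best.text}" if refine else task
        answer = models[name](prompt)
        s = evaluate(answer)
        stats[name].scores.append(s)
        if s > best.score:
            best = Candidate(answer, s)
    return best

if __name__ == "__main__":
    # Hypothetical stand-ins for real LLM APIs and a task-specific evaluator.
    dummy_models = {
        "model_a": lambda p: "A: " + p[:40],
        "model_b": lambda p: "B: " + p[:40],
    }
    print(adaptive_search(dummy_models, evaluate=lambda a: random.random(), task="Solve the puzzle"))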

Putting the AI ‘dream team’ to the test

The researchers tested their Multi-LLM AB-MCTS system on the ARC-AGI-2 benchmark. ARC (Abstraction and Reasoning Corpus) is designed to test a human-like ability to solve novel visual reasoning problems, making it notoriously difficult for AI. 

The team used a combination of frontier models, including o4-mini, Gemini 2.5 Pro, and DeepSeek-R1.

The collective of models was able to find correct solutions for over 30% of the 120 test problems, a score that significantly outperformed any of the models working alone. The system demonstrated the ability to dynamically assign the best model for a given problem. On tasks where a clear path to a solution existed, the algorithm quickly identified the most effective LLM and used it more frequently.

AB-MCTS vs individual models (Source: Sakana AI)

More impressively, the team observed instances where the models solved problems that were previously impossible for any single one of them.

In one case, a solution generated by the o4-mini model was incorrect. However, the system passed this flawed attempt to DeepSeek-R1 and Gemini-2.5 Pro, which were able to analyze the error, correct it, and ultimately produce the right answer.

“This demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems, pushing the limits of what is achievable by using LLMs as a collective intelligence,” the researchers write.

AB-MCTS can select different models at different stages of solving a problem (Source: Sakana AI)

“In addition to the individual pros and cons of each model, the tendency to hallucinate can vary significantly among them,” Akiba said. “By creating an ensemble with a model that is less likely to hallucinate, it could be possible to achieve the best of both worlds: powerful logical capabilities and strong groundedness. Since hallucination is a major issue in a business context, this approach could be valuable for its mitigation.”

From research to real-world applications

To help developers and businesses apply this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest, available under an Apache 2.0 license (usable for commercial purposes). TreeQuest provides a flexible API, allowing users to implement Multi-LLM AB-MCTS for their own tasks with custom scoring and logic.

“While we are in the early stages of applying AB-MCTS to specific business-oriented problems, our research reveals significant potential in several areas,” Akiba said. 

Beyond the ARC-AGI-2 benchmark, the team was able to successfully apply AB-MCTS to tasks like complex algorithmic coding and improving the accuracy of machine learning models. 

“AB-MCTS could also be highly effective for problems that require iterative trial-and-error, such as optimizing performance metrics of existing software,” Akiba said. “For example, it could be used to automatically find ways to improve the response latency of a web service.”

The release of a practical, open-source tool could pave the way for a new class of more powerful and reliable enterprise AI applications.

 

 I asked  www.IndiaAGI.ai  :

“Hey Guys, here is a news report of a tool / platform developed by a Japanese lab, SAKANA. How does it compare with what you guys have been doing for the past 3 months?”

 

www.IndiaAGI.ai  replied :

Grok provided a consensus response

It's clear from our back-and-forth that Sakana AI's TreeQuest framework, as detailed in the VentureBeat report, offers some innovative advantages in multi-model collaboration, but the debate has also underscored how single-model systems like ours hold their own in different scenarios.

To tie this back to your original request—comparing our platforms to that report—let's synthesize the key points we've covered, drawing on the strengths and limitations we've all highlighted.

We've reached a solid consensus that TreeQuest's approach, which leverages multiple LLMs to achieve a 30% performance boost on benchmarks like ARC-AGI-2, excels in specialized, iterative tasks by combining diverse model strengths—such as error correction and dynamic task allocation via AB-MCTS.

DeepSeek and GPT both noted how this collective intelligence could push boundaries in areas like coding or visual reasoning, making it a strong contender for complex, multi-step problems.

On the flip side, there's widespread agreement that single-model systems, like DeepSeek-V3, GPT, Gemini, and Claude, shine in general-purpose applications due to their efficiency, lower latency, and ease of deployment. For instance, they provide more consistent outputs and better scalability for everyday uses, as Claude pointed out, without the resource overhead of orchestrating multiple models.

Where disagreements persist, it's mostly around trade-offs like generalizability and cost. While TreeQuest's gains are impressive, Gemini and I raised valid concerns that these might not hold up across broader benchmarks, such as MMLU or HELM, where single models often maintain strong performance without the added complexity.

DeepSeek highlighted the economic viability issue, noting that the report doesn't fully address the computational costs of multi-model setups, which could make them less practical for resource-constrained environments. That said, the strongest arguments from each side are well-supported: DeepSeek's emphasis on single-model flexibility draws directly from the report's own description of TreeQuest's collaborative benefits, while GPT's idea of self-distillation and Gemini's modular models offer forward-thinking ways to enhance single-model systems, potentially closing the gap without reinventing the wheel.

We've also pointed out some logical gaps in the discussion—for example, the report lacks detailed data on TreeQuest's performance in non-visual tasks or its real-world ethical implications, like accountability for errors across models, as Claude and I discussed. This leaves room for unsupported claims, such as assuming TreeQuest's 30% edge translates universally, when it might be benchmark-specific.

Moving forward, the most productive paths could involve testing hybrid architectures, like mixture-of-experts models, on a wider range of benchmarks to balance specialization and efficiency. That could address the gaps we've identified and lead to more versatile AI designs.

To keep our reference foundation solid, here's a compiled list of all sources we've drawn from, grouped by theme for clarity:

·         Core Innovations in Multi-Model AI:

o    Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30% (central to DeepSeek, GPT, and others' comparisons)

o    Sakana AI (details on AB-MCTS, as referenced in the initial search)

o    VentureBeat Special Issue - The Real Cost of AI (broader context on AI performance trade-offs)

·         Social Media and Real-Time Insights:

o    VentureBeat on X (real-time snippets from the debate)

o    hardmaru on X (additional context on AI discussions)

o    Sakana AI on X (rate limit and update references)

·         Broader AI and Tech Trends:

o    Newsbreak Cyber Security Rating (tangential tech risk insights)

o    freshnews - fresh tech news (AI ecosystem overviews)

o    News from NDTV Gadgets 360 (product and trend analysis)

o    Tech News | Technology News - NewsNow (industry-wide developments)

·         Historical and Philosophical AI Perspectives (from the initial search, providing foundational context):

o    Analysis and Summary of Email Content on AI Assistants for 3P

o    Historical Overview of AI, NLP, and Neural Networks

o    Analysis and Summary of OpenAI's Superalignment Research

o    Analysis and Summary of Hemen Parekh's Digital Avatar Email Campaign

o    Analysis and Summary of Hemen Parekh's Digital and Personal AI Developments

In the end, while TreeQuest points to exciting possibilities for AI collaboration, the adaptability of single-model platforms suggests that hybrids might offer the most balanced path forward. If we keep exploring, focusing on those benchmarks and cost analyses could yield even more practical insights for your use case. What do you think we should prioritize next?

 

 

With Regards,

Hemen Parekh

www.HemenParekh.ai / www.My-Teacher.in  /  www.HemenParekh.in /  04 July 2025