Hi Friends,

Even as I launch this today (my 80th birthday), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now, as I approach my 90th birthday (27 June 2023), I invite you to visit my Digital Avatar (www.hemenparekh.ai) and continue chatting with me, even when I am no longer here physically.

Sunday, 19 October 2025

AI Must Speak Indian Languages

The True Challenge for AI in India

I’ve been following the recent developments around creating indigenous foundational AI models for India's vast linguistic landscape. The push to build systems that understand and communicate in our myriad languages is not just a technological challenge; it is a cultural necessity. For AI to be truly integrated into the fabric of our society, it must speak the languages of its people, not just translate them.

This mission resonates deeply with the work I've been doing for years on a more personal scale. My own fascination with Natural Language Processing (NLP) isn't new; I was exploring the basics of AI and neural networks back in 1996. Even then, the potential for machines to understand human language was clear, though the path to achieving it seemed monumental.

A Microcosm of a National Goal

The core idea driving my efforts to create my digital twin is this: for an AI to truly represent my way of thinking, it must be fed the entirety of my life's work. It's a principle I've discussed extensively with collaborators like Kishan Kokal, as we explored how to make my vast archive of blogs accessible to AI models like Gemini (see "Next Step in Evolution of my Virtual Avatar").

What is this, if not a miniature version of what India needs to do? We are not just looking for an AI that can process Hindi or Tamil. We need an AI that understands the context, the idioms, the cultural nuances embedded within each sentence. This requires training on data that is inherently Indian.

My work with Manoj Hardwani (manoj.hardwani@atidan.com) and Sharon Zhang (sharon-hipaa@personal.ai) of Personal.ai to parse and extract keywords from my writings was an attempt to map my own intellectual landscape. We weren't just collecting words; we were trying to identify the core topics and themes that define my thought process. Building a foundational model for India is the same challenge on a national scale: mapping the linguistic and cultural landscape of over a billion people.

Reverse-Engineering Understanding

A while ago, I mused about the idea of “reverse engineering” the blogging process—creating a system that could crawl news and find topics relevant to what I’ve written about extensively. This is precisely the kind of foundational work required for large-scale language models. Before an AI can speak, it must first listen and read, identifying patterns and understanding context from a massive, relevant dataset.
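To make the idea concrete, here is a minimal Python sketch of such a "reverse-engineered" pipeline. It is only an illustration under stated assumptions, not the system I actually built: the function names (tokenize, top_keywords, rank_headlines), the tiny stopword list, and the sample texts are all hypothetical, and the tokenizer handles plain English only. Indian-language text would need a script-aware tokenizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep English letter runs that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(posts, k=5):
    """Return the k most frequent non-stopword terms across all past posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return [word for word, _ in counts.most_common(k)]


def relevance(headline, keywords):
    """Fraction of the author's keywords that appear in the headline."""
    if not keywords:
        return 0.0
    return len(set(tokenize(headline)) & set(keywords)) / len(keywords)


def rank_headlines(headlines, keywords):
    """Drop headlines with no keyword overlap and sort the rest, best match first."""
    scored = [(relevance(h, keywords), h) for h in headlines]
    scored = [pair for pair in scored if pair[0] > 0]
    return [h for _, h in sorted(scored, key=lambda pair: pair[0], reverse=True)]


if __name__ == "__main__":
    # Sample data invented for illustration only.
    posts = [
        "AI must speak Indian languages",
        "Indian languages need AI models",
        "languages matter",
    ]
    headlines = [
        "New AI models for Indian languages launched",
        "Cricket score update",
        "Indian budget announced",
    ]
    keywords = top_keywords(posts, k=3)
    for headline in rank_headlines(headlines, keywords):
        print(headline)
```

Simple word counts like these capture only surface overlap. They say nothing of idiom or cultural nuance, which is exactly why the larger task calls for models trained on inherently Indian data.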

Seeing efforts to build indigenous models validates this perspective. We cannot simply rely on models trained on Western data and then fine-tune them for India. That approach will always miss the soul of our communication. The real work lies in building from the ground up, using our own data, reflecting our own realities.

It is heartening to see this principle being applied at a national level. The task is immense, but it is the only way to ensure that the future of AI in India is not a mere translation of a foreign idea, but a genuine reflection of our diverse and vibrant voices.


Regards,
Hemen Parekh


Of course, if you wish, you can debate this topic with my Virtual Avatar at: hemenparekh.ai
