Hi Friends,

Even as I launch this today (my 80th birthday), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now, as I approach my 90th birthday (27 June 2023), I invite you to visit my Digital Avatar (www.hemenparekh.ai) and continue chatting with me, even when I am no longer here physically.


Monday, 26 January 2026

The Data Problem in AI


I’ve been watching the AI conversation for years: the training races, the GPU land grabs, the breathless demos. So when Larry Ellison recently said that the biggest problem with today’s large AI models is that “they’re all trained on the same public internet data,” it struck me as both obvious and profound.[^1]

Why his point matters (and why I agree)

Ellison’s observation — that so many flagship models share largely the same public training corpus — speaks to a structural truth: public data alone gives you generality, not business differentiation. Public corpora build capable general-purpose models, but the real competitive value for enterprises comes when models can reason securely over private, contextual data.

I’ve argued along similar lines before. In my post “A Case of 900 Million Orphans” I warned that the people who generate the raw behavioral trails that power models are often left out of the conversation about value and ownership, and that real value will accrue to private, high-quality data rather than to another round of public-data training.

Public models give you language and pattern recognition. Private data gives you decisions that matter.

Three implications every leader should hear

  • Enterprises that think “buying the best foundation model is enough” are mistaken. The frontier is not only model size; it’s secure connection between reasoning engines and private operational data.
  • Whoever solves secure, auditable inference on private data will capture disproportionate economic value — and not just from selling models, but from selling dependable, regulated decision-making within industries like healthcare, finance, and supply chain.
  • The debate shifts from “who built the biggest model” to “who can make AI reliably and privately useful for mission-critical operations.”

Practical steps I advise for companies today

  1. Treat data as an asset to be read safely, not a commodity you dump into third-party models.
     • Vectorize and index critical records behind your control plane; use retrieval-augmented generation (RAG) patterns rather than indiscriminate retraining on private data.
  2. Build an inference-first architecture.
     • Low latency, audit trails, and policy enforcement belong at inference time. Train less publicly; serve more privately.
  3. Invest in governance and consent.
     • Data contracts, provenance, and user/subject consent must be embedded; otherwise trust erodes and regulation follows.
  4. Start small with high-value use cases.
     • Focus on narrow, measurable problems (claims adjudication, contract summarization, clinical decision support) before scaling horizontally.
  5. Prepare for hybrid models and marketplaces.
     • Don’t assume a single vendor or accept lock-in; design your stack to let specialized models query private data securely via APIs or isolation layers.
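
The first two steps above (index private records behind your own control plane, and enforce policy at inference time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Record`, `PrivateIndex`, and `build_prompt` are hypothetical, the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the role check stands in for whatever access policy an enterprise actually uses. No model is called; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    text: str
    allowed_roles: frozenset  # hypothetical access policy attached to each record


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class PrivateIndex:
    """In-memory index that stays inside the enterprise boundary."""

    def __init__(self):
        self._items = []

    def add(self, record):
        self._items.append((record, embed(record.text)))

    def retrieve(self, query, role, k=2):
        """Return up to k matching records that the given role is allowed to read."""
        q = embed(query)
        scored = [
            (cosine(q, vec), rec)
            for rec, vec in self._items
            if role in rec.allowed_roles  # policy is enforced before ranking
        ]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [rec for _, rec in scored[:k]]


def build_prompt(query, records):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"[{r.doc_id}] {r.text}" for r in records)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


index = PrivateIndex()
index.add(Record("claim-17", "Claim 17 was denied because the policy lapsed in March",
                 frozenset({"claims_adjuster"})))
index.add(Record("hr-03", "Salary bands for the claims team", frozenset({"hr"})))

hits = index.retrieve("Why was claim 17 denied?", role="claims_adjuster")
print(build_prompt("Why was claim 17 denied?", hits))
```

The design point is that the private records never leave the index: only the few passages a given role is entitled to see are placed in the prompt, and the underlying model is not retrained on them.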

The open cautions I’ll keep repeating

  • Security and privacy are not checkboxes. Exposing private data — even in vectorized form — without provable controls invites risk.
  • Synthetic data and on-device learning will change the economics, but they won’t remove the need for strong governance and business-context signal.
  • Concentration of private data is a double-edged sword. Firms that centralize valuable enterprise datasets can enable breakthroughs — and also create monopolies that invite scrutiny.

My mental model going forward

Think of modern AI as having two phases:

  • Phase 1 — Foundation models trained on public data: broad capability, rapid innovation, commoditization risk.
  • Phase 2 — Inference and private-data reasoning: where business value, differentiation, and regulatory tension converge.

Phase 2 is the one we should be designing for now.

A short checklist for executives (two weeks to start)

  • Identify 2 high-impact workflows that fail today for lack of contextual data.
  • Prototype a RAG-powered pilot with strict access controls and audit logs.
  • Appoint a cross-functional owner for data governance and model behavior monitoring.
  • Map compliance risks and draft a minimal consent and redaction plan.
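
As a starting point for the pilot items in this checklist, here is a minimal Python sketch of redaction and audit logging wrapped around a model call. Everything in it is an assumption made for illustration: `AuditedModel` and `redact` are hypothetical names, the two regular expressions are a deliberately crude redaction plan (email addresses and long digit runs only), and `model_fn` is a stub rather than a real model API.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_NUMBER = re.compile(r"\b\d{8,}\b")  # assumed stand-in for account or ID numbers


def redact(text):
    """Mask email addresses and long digit runs before text reaches a model."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_NUMBER.sub("[NUMBER]", text)


class AuditedModel:
    """Wraps a model callable with a role check, redaction, and an audit trail."""

    def __init__(self, model_fn, allowed_roles):
        self._model_fn = model_fn
        self._allowed_roles = set(allowed_roles)
        self.audit_log = []  # JSON strings; a real pilot would use append-only storage

    def ask(self, user, role, prompt):
        allowed = role in self._allowed_roles
        safe_prompt = redact(prompt)
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "allowed": allowed,
            # Log a hash rather than the prompt so the log itself holds no private text.
            "prompt_sha256": hashlib.sha256(safe_prompt.encode("utf-8")).hexdigest(),
        }
        self.audit_log.append(json.dumps(entry))  # denied attempts are logged too
        if not allowed:
            raise PermissionError(f"role {role!r} may not query this model")
        return self._model_fn(safe_prompt)


# Stub model for demonstration; a real pilot would call an actual inference endpoint here.
model = AuditedModel(lambda p: f"(stub) received: {p}", allowed_roles={"claims_adjuster"})
print(model.ask("asha", "claims_adjuster", "Summarize the claim filed by jane@example.com"))
print(len(model.audit_log), "audit entries")
```

Logging every attempt, including refused ones, is what gives the cross-functional owner something concrete to review.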

I don’t think the conversation is about replacing models; it’s about connecting them to the right data, with the right guardrails. As I’ve written before in “A Case of 900 Million Orphans,” the people who produce the data (customers, employees, citizens) deserve both protection and a voice in how that value is realized.

If you’re building AI for the enterprise, start with the question: what private knowledge does the model need to do useful work for us? Then build the pipelines, contracts, and controls that let the model reason — securely — against that knowledge.

Regards, Hemen Parekh


[^1]: See reporting on Larry Ellison’s remarks and Oracle’s positioning on enterprise AI and private-data inference: Times of India and Moneycontrol.



Hello Candidates:

  • For UPSC exams (IAS, IPS, IFS, etc.), you must prepare to answer essay-type questions that test your general knowledge and your sensitivity to current events.
  • If you have read this blog carefully, you should be able to answer the following question:
"Why does training on the same publicly available data make large AI models similar, and how does private enterprise data change the value proposition?"




