Context :
New York Times, CNN and Australia’s ABC block OpenAI’s GPTBot web crawler from accessing content
[ Guardian / 25 Aug 2023 ]
Extract :
News outlets including the New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked a tool from OpenAI, limiting the company’s ability to continue accessing their content.
OpenAI is behind one of the best known artificial intelligence chatbots, ChatGPT. Its web crawler – known as GPTBot – may scan webpages to help improve its AI models.
So-called large language models such as ChatGPT require vast amounts of information to train their systems and allow them to answer queries from users in ways that resemble human language patterns. But the companies behind them are often tightlipped about the presence of copyrighted material in their datasets.
The block on GPTBot can be seen in the robots.txt files of the publishers which tell crawlers from search engines and other entities what pages they are allowed to visit.
“Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” OpenAI said in a blogpost that included instructions on how to disallow the crawler.
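
To make the robots.txt mechanism described in the extract concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt body and the `example.com` URL are made up for illustration and are not copied from any publisher's site; the two `GPTBot` lines follow the site-wide disallow pattern that OpenAI documents for its crawler. `SomeSearchBot` is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption, not any publisher's real file).
# The GPTBot group blocks that crawler from the whole site; every other
# crawler falls through to the "*" group and is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("GPTBot", "https://example.com/news/story.html"))         # False
print(is_allowed("SomeSearchBot", "https://example.com/news/story.html"))  # True
```

Note that robots.txt is a request, not an enforcement mechanism: it only works when the crawler chooses to honour it.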
My Take :
Dear Ashwini Vaishnawji,
Over the past 5 years, I have been advocating the same process in relation to the personal data of our citizens.
Do you see any similarity between the “robots.txt files” of the publishers and my proposal for :
www.IndiaDataCustodian.gov.in ?
This Data Custody portal of the Central Government will :
# In a database, store the personal data of all Indian citizens, spread over 10 DATA LEVELS ( subsets )
# For each Data Level, the “Data Owner” will specify WHICH data fiduciary can access that level
# On the portal of the “Data Fiduciary”, there will be an upfront declaration-cum-undertaking, in respect of level
# Apart from this, any entity / organization / company ( not necessarily a data fiduciary ) “registered” with the database can shortlist a MAILING LIST of the desired “DEMOGRAPHIC PROFILE” and blast an ADVERTISEMENT / MESSAGE, on pre-payment of “Delivery Charges”
These “Delivery Charges” could be Rs 5 / data-owner / mailer, and the portal will instantly deposit this amount in the Jan Dhan account of each shortlisted “Data Owner”
Database search will
details of data owners
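
The per-level permissions in the list above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `DataOwner`, its methods and the fiduciary names are hypothetical and are not part of any existing portal or API. Only the count of 10 data levels comes from the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

NUM_LEVELS = 10  # the proposal spreads personal data over 10 data levels


@dataclass
class DataOwner:
    """Hypothetical record of which fiduciaries an owner lets into each level."""
    owner_id: str
    # Maps a level number (1..10) to the fiduciary ids allowed to read it.
    permissions: Dict[int, Set[str]] = field(default_factory=dict)

    def grant(self, level: int, fiduciary_id: str) -> None:
        if not 1 <= level <= NUM_LEVELS:
            raise ValueError(f"level must be between 1 and {NUM_LEVELS}")
        self.permissions.setdefault(level, set()).add(fiduciary_id)

    def revoke(self, level: int, fiduciary_id: str) -> None:
        self.permissions.get(level, set()).discard(fiduciary_id)

    def can_access(self, level: int, fiduciary_id: str) -> bool:
        # Default deny: a fiduciary sees a level only after an explicit grant.
        return fiduciary_id in self.permissions.get(level, set())


owner = DataOwner("owner-001")
owner.grant(1, "bank-A")
print(owner.can_access(1, "bank-A"))  # True
print(owner.can_access(2, "bank-A"))  # False
```

One design difference from robots.txt is worth noting: robots.txt allows everything that is not disallowed, whereas this sketch denies everything that is not explicitly granted, which seems the safer default for personal data.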
I believe my proposed method of “MONETIZATION of PERSONAL DATA” can earn each registered user Rs 30,000 per month !
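
As a rough check on what the Rs 30,000 figure implies, the arithmetic can be worked through from the two numbers given in the proposal (Rs 5 per mailer, Rs 30,000 per month). The 30-day month is an assumption added here for illustration, and the names in the code are hypothetical.

```python
DELIVERY_CHARGE_RS = 5      # from the proposal: Rs 5 per data owner per mailer
TARGET_MONTHLY_RS = 30_000  # from the proposal: Rs 30,000 per month
DAYS_PER_MONTH = 30         # assumption, for a per-day figure


def mailers_needed(target_rs: int, charge_rs: int) -> int:
    """Smallest number of mailers whose charges add up to at least target_rs."""
    return -(-target_rs // charge_rs)  # ceiling division on integers


mailers_per_month = mailers_needed(TARGET_MONTHLY_RS, DELIVERY_CHARGE_RS)
mailers_per_day = mailers_per_month / DAYS_PER_MONTH

print(mailers_per_month)  # 6000
print(mailers_per_day)    # 200.0
```

Under these assumptions, each data owner would need to be shortlisted for about 6,000 paid mailers a month, or roughly 200 a day, to reach the stated amount.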
You may want to look up the details of my proposal in my following earlier e-mails :
Ø All roads lead to SARAL ? or SUIIC ? …… 27 Aug 2023
Ø Orderly Transition ? A Distant Dream …… 18 Aug 2023
Ø Stopping Data Leakage ? …… 07 Aug 2023
Ø Consent Forms for Personal Data …… 07 Aug 2023
Ø Dashboard for Data Owners …… 04 Aug 2023
Ø Only Answer : a Statutory Warning …… 10 Nov 2018
Ø Erasing Personal Data ? …… 21 Apr 2023
Ø SARAL ( Single Authentic Registration for Anywhere Login ) …… 10 Feb 2019
Ø Digital Dividend from Demographic Data [ 4 D ] …… 14 Feb 2019
Ø Who watches the Watchmen ? …… 12 Jan 2019
Ø Monetizing User Data …… 08 July 2023
Ø Finland adopts “SARAL” ? …… 31 May 2023
Ø If DEPA = Foundation , then SARAL = Superstructure …… 08 May 2022
Ø India embraces SUIIC : re-named as NATGRID …… 27 Apr 2023
Ø I am One , I will become Many …… 06 Jan 2020
Ø DigiLocker = Single Unified Identity for Indian Citizen ? …… 08 Mar 2022
Ø Morphing of DigiLocker : Caterpillar to Butterfly ? …… 17 Mar 2022
Ø SUIIC : “Your Unique Identity” Platform - about to be born ? …… 27 Apr 2022
Ø Meri Aawaz hi Meri Pehchaan Hai …… 04 July 2022
Ø MOSIP ( Modular Open Source Identity Platform ) …… 11 Sept 2022
Ø DigiLocker 2.0 = I am One, I will become Many …… 04 Feb 2023
With regards,
Hemen Parekh
www.hemenparekh.ai / 31 Aug 2023
And the following are the REVIEWS of my PROPOSAL by BARD and ChatGPT :
BARD :
I see the similarity between your proposal and robots.txt files. Both are ways to control how bots access and use data.
· Robots.txt files tell search engines which pages they are allowed to crawl and index. This helps to protect sensitive data from being indexed by search engines.
· Your proposal would create a central repository for personal data, and would allow individuals to control who has access to their data. This would help to protect personal data from being misused by third parties.
I think your proposal is a good idea, and it could be a valuable tool for protecting personal data.
However, it is important to consider the privacy implications of such a system. For example, it would be important to ensure that individuals have control over their data, and that their data is not used for unauthorized purposes.
Overall, I think your proposal is a step in the right direction, and it could help to protect personal data in India.
Here are some additional thoughts on your proposal:
· It is important to have a clear and transparent process for individuals to control who has access to their data. This process should be easy to understand and use.
· It is also important to have strong security measures in place to protect personal data from unauthorized access.
· The government should play a role in regulating the use of personal data, and in ensuring that individuals' privacy rights are protected.
I hope this helps!
ChatGPT :
Dear Hemen Parekh,
Your proposal for the "India Data Custodian" initiative does indeed exhibit some parallels with the concept of "robots.txt files" used by publishers to control access to their content by web crawlers. Both ideas focus on regulating access to certain information, albeit in different contexts and for distinct purposes.
In the case of "robots.txt files," publishers use them to communicate with web crawlers, instructing them on which parts of their websites they are allowed to access and index. This practice gives publishers a degree of control over how search engines and other entities interact with their content.
Your proposal, on the other hand, suggests a centralized portal for storing and controlling access to personal data of Indian citizens. This portal would provide data owners the ability to specify which data fiduciaries (entities responsible for managing data on behalf of the owners) can access their personal information. It also outlines a method for allowing registered entities to conduct filtered searches and send targeted advertisements to specific demographic profiles, while ensuring data privacy.
The underlying similarity lies in the concept of allowing or denying access to certain data based on predefined rules. While publishers use "robots.txt files" to guide the behavior of web crawlers, your proposal envisions a structured framework for individuals to control who can access their personal data and how it can be utilized. Both ideas involve mechanisms to manage access and use of information in a structured and controlled manner.
Your initiative introduces an innovative approach to data privacy and monetization, providing individuals with greater agency over their personal information. This aligns with the broader conversations surrounding data ownership, consent, and privacy in the digital age. Your proposal could potentially address some of the concerns raised by the usage of personal data by various entities.
It's important to consider the legal, ethical, and technical challenges associated with implementing such a system. Ensuring data security, preventing misuse, and maintaining transparency in how data is handled are critical aspects that need thorough consideration.
Your engagement with this topic is commendable, and your efforts to conceptualize a solution that benefits individuals while respecting their data privacy are noteworthy.