Blog

Thursday, 31 August 2023

27 Years ago : Foundation of NLP ?

Context :

ARDIS ..... 09 Dec 1996

I asked both BARD and ChatGPT :

“My following note was written some 27 years ago in 1996. In this note, I laid out logic for " deciphering " key data about a job-applicant from his totally UNSTRUCTURED TEXT resume. Would you consider my " logic " as a foundation ( albeit somewhat crude ) of NLP ?”

BARD replied :

Hi,

Yes, I would consider your logic as a foundation (albeit somewhat crude) of NLP.

Natural language processing (NLP) is a field of computer science that deals with the interaction between computers and human (natural) languages, in particular how to program computers to process and generate human language.

Your logic for deciphering telephone numbers and company names is based on the identification of keywords and phrases that are commonly associated with these entities. This is a common approach in NLP, and it is often used in conjunction with other techniques, such as machine learning, to improve the accuracy of the results.

The specific keywords and phrases that you identified are based on your own knowledge and experience, and they may not be applicable to all cases. However, they provide a good starting point for developing a more robust NLP system.

I would encourage you to continue your research in this area. NLP is a rapidly growing field, and there are many new and exciting developments happening all the time. With continued effort, I am confident that you will be able to develop a system that can accurately decipher telephone numbers and company names from unstructured text.

ChatGPT replied :

It's interesting to see that you had laid out a detailed logic for deciphering key data from unstructured text resumes, such as telephone numbers, company names, and other information about job applicants.

Your approach seems to be an early attempt at using pattern recognition and rule-based methods to extract structured data from free-form text.

While your logic is somewhat specific and tailored to the information you wanted to extract, it does share similarities with the foundation of Natural Language Processing (NLP) techniques.

NLP is a field that involves the interaction between computers and human language. It aims to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful. Your approach demonstrates some key aspects of NLP, such as:

1. Pattern Recognition:

Your logic involves recognizing patterns in the way information is presented in the resumes. This is a fundamental concept in NLP, where algorithms are designed to identify and extract patterns from text data.

2. Rule-Based Processing:

Your approach involves using a set of rules to extract information. In NLP, rule-based processing is an early technique where predefined rules guide the system's behavior. Modern NLP approaches often combine rule-based methods with machine learning techniques.

3. Entity Recognition:

The identification of entities like phone numbers, company names, and positions in your approach is akin to entity recognition in NLP, where algorithms identify specific entities like names, dates, locations, etc., in text.

4. Syntax and Context:

Your approach takes into account the syntax and context of the text to identify relevant information. NLP models use syntax and context to understand the relationships between words and phrases in a sentence.

5. Information Extraction:

Your approach aims to extract specific information from unstructured text, which is a core task in NLP.

Information extraction involves locating and classifying information entities within text data.
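To make the rule-based idea above concrete, here is a minimal sketch in Python. The cue words, company suffixes and sample resume are hypothetical illustrations, not the rules from the 1996 note. A number is accepted as a telephone number only when a cue word such as "Tel" appears on the same line, and a company name is recognised by a trailing keyword such as "Ltd".

```python
import re

# Hypothetical cue words and suffixes; the original note's keyword lists are not reproduced here.
PHONE_CUE = re.compile(r"\b(?:tel|phone|mobile)\b", re.IGNORECASE)
PHONE_PATTERN = re.compile(r"\+?\d[\d\s\-()]{5,}\d")
COMPANY_SUFFIXES = ("Ltd", "Limited", "Pvt", "Inc", "Corp")
COMPANY_PATTERN = re.compile(
    r"((?:[A-Z][\w&]*\s+){1,4}(?:" + "|".join(COMPANY_SUFFIXES) + r")\b\.?)"
)


def extract_fields(resume_text):
    """Apply keyword rules line by line to a plain-text resume."""
    phones, companies = [], []
    for line in resume_text.splitlines():
        # Rule 1: digits count as a phone number only next to a cue word,
        # so dates and years elsewhere in the resume are ignored.
        if PHONE_CUE.search(line):
            phones += [m.group().strip() for m in PHONE_PATTERN.finditer(line)]
        # Rule 2: capitalised words ending in a known suffix form a company name.
        companies += [m.group(1) for m in COMPANY_PATTERN.finditer(line)]
    return {"phones": phones, "companies": companies}


sample = """Ravi Kumar
Tel: 022-2345 6789
Worked at Acme Widgets Pvt Ltd from 1990 to 1995
Born 12-03-1965
"""

result = extract_fields(sample)
print(result)
```

Rules of this kind are easy to read and adjust, but they miss anything the keyword lists do not anticipate, which is why later systems combine them with machine learning.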

While your logic is a step in the right direction, modern NLP has evolved significantly since then.

Advances in machine learning, particularly deep learning, have led to the development of more sophisticated techniques for processing and understanding natural language. Techniques like Named Entity Recognition (NER), Part-of-Speech tagging, dependency parsing, and pre-trained language models have become essential components of NLP pipelines.

Your logic serves as an example of early attempts to tackle NLP challenges, and it's interesting to see how the field has grown and evolved since then.

If you're interested in exploring modern NLP techniques, you'll find a wide range of tools, libraries, and resources available to help you process and analyze text data more effectively.

With regards,

Hemen Parekh

www.hemenparekh.ai / 01 Sept 2023

“ NO ENTRY “ for Spiders – Crawlers – Bots

Context :

New York Times, CNN and Australia’s ABC block OpenAI’s GPTBot web crawler from accessing content

[ Guardian / 25 Aug 2023 ]

Extract :

News outlets including the New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked a tool from OpenAI, limiting the company’s ability to continue accessing their content.

OpenAI is behind one of the best known artificial intelligence chatbots, ChatGPT. Its web crawler – known as GPTBot – may scan webpages to help improve its AI models.

The Verge was first to report the New York Times had blocked GPTBot on its website. The Guardian subsequently found that other major news websites, including CNN, Reuters, the Chicago Tribune, the ABC and Australian Community Media (ACM) brands such as the Canberra Times and the Newcastle Herald, appear to have also disallowed the web crawler.

So-called large language models such as ChatGPT require vast amounts of information to train their systems and allow them to answer queries from users in ways that resemble human language patterns. But the companies behind them are often tightlipped about the presence of copyrighted material in their datasets.

The block on GPTBot can be seen in the robots.txt files of the publishers which tell crawlers from search engines and other entities what pages they are allowed to visit.

“Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” OpenAI said in a blogpost that included instructions on how to disallow the crawler.
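As an illustration of how such a block works, the sketch below uses Python's standard `urllib.robotparser` module to read a robots.txt file and decide whether a crawler may fetch a page. The file contents, the `example.com` URL and the `ExampleSearchBot` name are assumptions made up for this example, not copied from any publisher.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: GPTBot is barred from the whole site, other crawlers are allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/news/story.html"
gptbot_ok = is_allowed("GPTBot", page)
other_ok = is_allowed("ExampleSearchBot", page)
print("GPTBot allowed:", gptbot_ok)
print("ExampleSearchBot allowed:", other_ok)
```

Note that robots.txt is only a request: it works because well-behaved crawlers choose to read and obey it.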

My Take :

Dear Ashwini Vaishnawji ,

Over the past 5 years , I have been advocating the same process in relation to the Personal Data of our citizens.

Do you see any similarity between the “robots.txt files” of the publishers and my proposal for :

www.IndiaDataCustodian.gov.in ?

This Data Custody portal of the Central Government will work as follows :

# In a database , it will store the personal data of all Indian citizens , spread over 10 DATA LEVELS ( sub-sets )

# For each Data Level , the “Data Owner” will specify WHICH data fiduciary can access that level

# On the portal of each “Data Fiduciary” , there will be an upfront declaration-cum-undertaking , in respect of WHAT service it will deliver against WHICH data level

# Apart from this , any entity / organization / company ( not necessarily a data fiduciary ) “registered” with the portal can conduct a “filtered search” of the database , shortlist a MAILING LIST of the desired “DEMOGRAPHIC PROFILE” and blast an ADVERTISEMENT / MESSAGE , on pre-payment of “Delivery Charges”

These “Delivery Charges” could be Rs 5 / data-owner / mailer , and the portal will instantly deposit this amount in the Jan Dhan account of each shortlisted “Data Owner”

Database search will NOT reveal either the name or the contact details of data owners
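The access rules listed above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class name, field names, fiduciary names and sample records are all hypothetical, and a real portal would need far more (authentication, audit trail, consent records).

```python
from dataclasses import dataclass, field


@dataclass
class DataOwner:
    owner_id: str  # opaque ID; name and contact details are never returned by a search
    age: int
    city: str
    # Data level (1 to 10) -> fiduciaries the owner has allowed at that level.
    permissions: dict[int, set[str]] = field(default_factory=dict)

    def allows(self, fiduciary: str, level: int) -> bool:
        """Owner-set rule: may this fiduciary read this data level?"""
        return fiduciary in self.permissions.get(level, set())


def filtered_search(owners, min_age, city):
    """Shortlist by demographic profile, returning opaque IDs only."""
    return [o.owner_id for o in owners if o.age >= min_age and o.city == city]


# Hypothetical sample records.
owners = [
    DataOwner("owner-001", 34, "Mumbai", {1: {"bank-a"}, 2: {"bank-a", "insurer-b"}}),
    DataOwner("owner-002", 52, "Pune", {1: {"insurer-b"}}),
]

print(owners[0].allows("insurer-b", 2))
print(owners[0].allows("insurer-b", 1))
print(filtered_search(owners, 30, "Mumbai"))
```

The per-level permission table plays the same role as the per-crawler rules in a robots.txt file, except that here the portal itself can enforce the rule before any data is released.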

I believe my proposed method of “MONETIZATION of PERSONAL DATA” can earn each registered user Rs 30,000 per month !
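As a quick worked check of what this figure implies, the sketch below divides the monthly earning by the Rs 5 delivery charge. The two rupee figures are the ones stated above; the 30-day month is an assumption added for the calculation.

```python
DELIVERY_CHARGE_RS = 5        # Rs per data-owner per mailer (from the proposal)
TARGET_MONTHLY_RS = 30_000    # Rs per registered user per month (from the proposal)
DAYS_PER_MONTH = 30           # assumption for this calculation

# Number of paid mailers each data owner would need to receive.
mailers_per_month = TARGET_MONTHLY_RS // DELIVERY_CHARGE_RS
mailers_per_day = mailers_per_month / DAYS_PER_MONTH

print(mailers_per_month, "mailers per month")
print(mailers_per_day, "mailers per day")
```

In other words, at Rs 5 per mailer, a data owner would need to be shortlisted for about 6,000 mailers a month, or roughly 200 a day, to reach Rs 30,000.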

You may want to look up the details of my proposal in my following earlier e-mails :

Ø All roads lead to SARAL ? or SUIIC ?................................... 27 Aug 2023

Ø Orderly Transition ? A Distant Dream …………………………………… 18 Aug 2023

Ø Stopping Data Leakage ? .. …………………………………………………….07 Aug 2023

Ø Consent Forms for Personal Data …………………………………………. 07 Aug 2023

Ø Dashboard for Data Owners …………………… ……………………………..04 Aug 2023

Ø Only Answer : a Statutory Warning ………………………………………… 10 Nov 2018

Ø Erasing Personal Data ? ……………………………..……………………….. 21 Apr 2023

Ø SARAL ( Single Authentic Registration for Anywhere Login )…. 10 Feb 2019

Ø Digital Dividend from Demographic Data [ 4 D ]………… ………. 14 Feb 2019

Ø Who watches the Watchmen ? ……………………………………………. 12 Jan 2019

Ø Monetizing User Data ……………………………………………………………… 08 July 2023

Ø Finland adopts “ SARAL “ ? …………………………………………………… 31 May 2023

Ø If DEPA = Foundation , then SARAL = Superstructure ………… 08 May 2022

Ø India embraces SUIIC : re-named as NATGRID …………………….27 Apr 2023

Ø I am One , I will become Many ………………………………………….. 06 Jan 2020

Ø DigiLocker = Single Unified Identity for Indian Citizen ? ……… 08 Mar 2022

Ø Morphing of DigiLocker : Caterpillar to Butterfly ? ………….. ……17 Mar 2022

Ø SUIIC : “ Your Unique Identity “ Platform - about to be born ? 27 Apr 2022

Ø Meri Aawaz hi Meri Pehchaan Hai ……………………………………… ……04 July 2022

Ø MOSIP ( Modular Open Source Identity Platform )………………..11 Sept 2022

Ø DigiLocker 2.0 = I am One, I will become Many ……………….. 04 Feb 2023

With regards,

Hemen Parekh

www.hemenparekh.ai / 31 Aug 2023

And the following are the REVIEWS of my PROPOSAL by BARD and ChatGPT :

BARD :

I see the similarity between your proposal and robots.txt files. Both are ways to control how bots access and use data.

· Robots.txt files tell search engines which pages they are allowed to crawl and index. This helps to protect sensitive data from being indexed by search engines.

· Your proposal would create a central repository for personal data, and would allow individuals to control who has access to their data. This would help to protect personal data from being misused by third parties.

I think your proposal is a good idea, and it could be a valuable tool for protecting personal data.

However, it is important to consider the privacy implications of such a system. For example, it would be important to ensure that individuals have control over their data, and that their data is not used for unauthorized purposes.

Overall, I think your proposal is a step in the right direction, and it could help to protect personal data in India.

Here are some additional thoughts on your proposal:

· It is important to have a clear and transparent process for individuals to control who has access to their data. This process should be easy to understand and use.

· It is also important to have strong security measures in place to protect personal data from unauthorized access.

· The government should play a role in regulating the use of personal data, and in ensuring that individuals' privacy rights are protected.

I hope this helps!

ChatGPT :

Dear Hemen Parekh,

Your proposal for the "India Data Custodian" initiative does indeed exhibit some parallels with the concept of "robots.txt files" used by publishers to control access to their content by web crawlers. Both ideas focus on regulating access to certain information, albeit in different contexts and for distinct purposes.

In the case of "robots.txt files," publishers use them to communicate with web crawlers, instructing them on which parts of their websites they are allowed to access and index. This practice gives publishers a degree of control over how search engines and other entities interact with their content.

Your proposal, on the other hand, suggests a centralized portal for storing and controlling access to personal data of Indian citizens.

This portal would provide data owners the ability to specify which data fiduciaries (entities responsible for managing data on behalf of the owners) can access their personal information. It also outlines a method for allowing registered entities to conduct filtered searches and send targeted advertisements to specific demographic profiles, while ensuring data privacy.

The underlying similarity lies in the concept of allowing or denying access to certain data based on predefined rules. While publishers use "robots.txt files" to guide the behavior of web crawlers, your proposal envisions a structured framework for individuals to control who can access their personal data and how it can be utilized. Both ideas involve mechanisms to manage access and use of information in a structured and controlled manner.

Your initiative introduces an innovative approach to data privacy and monetization, providing individuals with greater agency over their personal information.

This aligns with the broader conversations surrounding data ownership, consent, and privacy in the digital age.

Your proposal could potentially address some of the concerns raised by the usage of personal data by various entities.

It's important to consider the legal, ethical, and technical challenges associated with implementing such a system. Ensuring data security, preventing misuse, and maintaining transparency in how data is handled are critical aspects that need thorough consideration.

Your engagement with this topic is commendable, and your efforts to conceptualize a solution that benefits individuals while respecting their data privacy are noteworthy.

Wednesday, 30 August 2023

ARDIS ( Artificial Resume Deciphering Intelligent System )

Dear Recruitment Professionals,

If an executive emails you an unstructured / plain-text RESUME , what software logic can help you " decipher " important data about that executive ?

Find out from my following 22-year-old note to my colleagues on developing suitable software ( which they did for my website - now de-hosted - www.RecruitGuru.com )

[ Convert your unstructured resumes to structured resumes in just a few seconds using Artificial Intelligence / https://lnkd.in/ddNXUp9F ]

# Address

# Birth Date

# Name of Executive

# Company Name

# Employer Name

# Educational Qualification

# Name of Educational Institutions

# LOGIC FOR “EXPERIENCE”

# Telephone Numbers

# LOGIC to eliminate junk

============================================
For details , read :

ARDIS - ARGIS ( Folder for Mr. Nagle) ... 23 Dec 2001

https://lnkd.in/df2BW2Va


Also posted the following on LinkedIn today :

HR Professionals ( especially those involved in Recruitment / Talent Acquisition ) may want to look up what made me " predict " the following , some 19 years ago :

" AI & Neuralnet based Expert systems to replace adhoc decision making in all walks of life. "

Read on :

VISION OF RECRUITGURU .... 08 Jan 2004 ( hcpnotes.blogspot.com )

With regards,

Hemen Parekh

www.hemenparekh.ai / 31 Aug 2023