Hi Friends,

Even as I launch this today ( my 80th Birthday ), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder,"Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now as I approach my 90th birthday ( 27 June 2023 ) , I invite you to visit my Digital Avatar ( www.hemenparekh.ai ) – and continue chatting with me , even when I am no more here physically

Friday, 20 April 2018

#AI #NLP #NeuralNetwork #Language


Not  a  day  too  soon  !




More than 20 years ago ( in Dec 1996 ) , I sent the following notes to my colleagues :



===============================



Artificial Resume Deciphering Intelligent Software ( ARDIS )


http://hcpreports.blogspot.in/2016/11/artificial-resume-deciphering.html



================================



ARDIS - Some further thoughts !



http://hcpreports.blogspot.in/2016/11/ardis-some-further-thoughts.html


==========================


BASIS FOR WORD RECOGNITION SOFTWARE


http://hcpnotes.blogspot.in/2013/07/basis-for-word-recognition-software.html#.Wts3w8iFPcc


==================================================================







where I wrote :



ARDIS will ,


*   Recognize " Characters "

*   Convert to " WORDS "

*   Compare with 6,258 key words which we have found in 3,500 converted Bio Data ( using ISYS ) .



If a " Word ",

has not already appeared more than 10 times in those 3,500 bio data , then its " chance " ( probability ) of occurring in the next bio data is very small indeed



But even then ,

ARDIS software will store in memory , each " Occurrence " of each Word ( old or new / first time or a thousandth time ) ,


And ,

will continuously calculate its " Probability of Occurrence " as :


P =  [ No of Occurrences of the given word so far ]

       divided by,

      [ Total No of Occurrences of all the words in the entire population so far ]
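
The running calculation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the original ARDIS code: the class name OccurrenceTracker, its methods and the sample phrases are all hypothetical, and words are simply split on whitespace.

```python
from collections import Counter


class OccurrenceTracker:
    """Hypothetical helper: keeps a running count of every word seen so far."""

    def __init__(self):
        self.counts = Counter()  # occurrences of each individual word
        self.total = 0           # occurrences of all words put together

    def scan(self, text):
        """Add the words of one more bio data to the running counts."""
        words = text.lower().split()
        self.counts.update(words)
        self.total += len(words)

    def probability(self, word):
        """P = occurrences of the given word / occurrences of all words."""
        if self.total == 0:
            return 0.0
        return self.counts[word.lower()] / self.total


tracker = OccurrenceTracker()
tracker.scan("managed a team of engineers")
tracker.scan("managed project budgets")
print(tracker.probability("managed"))
```

Each call to scan() updates both the numerator and the denominator , so the estimate for every word is refreshed with each new bio data .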

         

         

So that ,

By the time we have SCANNED , 10,000 bio data , we would have literally covered ALL the words that have , even a small PROBABILITY of OCCURRENCE !


So , with each new bio data " scanned " , the " probability of occurrence " of each word is getting , more and more accurate !


Same logic will hold for,

*  KEY  PHRASES

*  KEY  SENTENCES



The " Name of the Game " is : Probability of Occurrence


As someone once said :


If you allow 1000 monkeys to keep on hammering keys of 1000 type-writers , for 1000 years , you will , at the end  find that , between them , they have " re-produced " , the entire literary works of Shakespeare  !


But  today , if you store into a Super Computer ,


*   all the words appearing in English language ( incl Verbs / Adverbs / Adjectives ..etc )

*  the " Logic " behind construction of English language ,


then ,

I am sure , the Super Computer could reproduce the entire works of Shakespeare , in 3 MONTHS !



And , as you would have noticed , ARDIS is a " SELF  LEARNING " type of software !


The more it reads ( scans ) , the more it learns ( memorizes words , phrases & even sentences )


Because of its SELF LEARNING / SELF CORRECTING / SELF IMPROVING , capability , ARDIS gets better & better equipped to detect , in a scanned bio data ,


*   Spelling  Mistakes  (  wrong WORD )

*   Context  Mistakes  ( wrong Prefix or Suffix )

*   Preposition  Mistakes  ( wrong PHRASE )

*   Verb / Adverb  Mistakes ( wrong SENTENCE )
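
As a rough illustration of the first item only ( spelling mistakes ) , a word that has never been seen before can be compared with the words already memorized. The sketch below is an assumption on my part and not the ARDIS method: the function name flag_suspect_words and the sample text are hypothetical, and the closest-match step uses Python's standard difflib module as a stand-in for whatever matching a real system would use.

```python
from collections import Counter
import difflib


def flag_suspect_words(text, counts, min_count=1):
    """Return words seen fewer than min_count times, with a closest known word.

    counts is a Counter of words memorized from earlier bio data.
    """
    vocabulary = [w for w, c in counts.items() if c >= min_count]
    suspects = {}
    for word in text.lower().split():
        if counts.get(word, 0) < min_count:
            # difflib is a stand-in technique, not part of the original notes
            suspects[word] = difflib.get_close_matches(word, vocabulary, n=1)
    return suspects


memorized = Counter(
    "experience in project management and project planning".split()
)
result = flag_suspect_words("experiance in project planning", memorized)
print(result)
```

The other three kinds of mistakes ( prefix / suffix , phrase , sentence ) would need the counts of phrases and sentences , which this sketch does not attempt .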


With minor variations ,

-  ALL Thoughts , Words ( written ) , Speech ( spoken ) and Actions , keep on " repeating " again and again and again


It is this  REPETITIVENESS  of Words , Phrases , and Sentences in Resumes , that we plan to exploit


In fact ,

by examining & memorizing the several hundred ( or thousand ) " Sequences " in which the words appear , it should be possible to " Construct " the " Grammar " , i.e. the logic behind the sequences
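
One very simple way to memorize such " Sequences " is to count which word follows which. The sketch below is a minimal illustration under that assumption ( counting adjacent word pairs ) ; it is far short of a real grammar , and the function names learn_sequences and most_likely_next , as well as the sample phrases , are hypothetical.

```python
from collections import Counter, defaultdict


def learn_sequences(sentences):
    """Count, for every word, which words have followed it so far."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def most_likely_next(following, word):
    """Return the word that most often followed the given word, or None."""
    options = following.get(word.lower())
    if not options:
        return None
    return options.most_common(1)[0][0]


model = learn_sequences([
    "responsible for project planning",
    "responsible for team management",
    "responsible for project budgets",
])
print(most_likely_next(model, "for"))
```

The more sentences are scanned , the more these counts reflect the usual order of words , which is the " Repetitiveness " the notes propose to exploit .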



I suppose , this is the manner in which the experts were able to unravel the " meaning " of hieroglyphic inscriptions on Egyptian tombs .


They learned a completely strange / obscure language by studying the " Repetitiveness " & " Sequential " occurrence of unknown characters



Today , I came across the following :





How Google’s ‘smart reply’ is getting smarter

 

A significant new hierarchical approach to machine intelligence

May 24, 2017



Extract :


How does it work?

“The content of language is deeply hierarchical, reflected in the structure of language itself, going from letters to words to phrases to sentences to paragraphs to sections to chapters to books to authors to libraries, etc.,” they explained.

So a hierarchical approach to learning “is well suited to the hierarchical nature of language. We have found that this approach works well for suggesting possible responses to emails. We use a hierarchy of modules, each of which considers features that correspond to sequences at different temporal scales, similar to how we understand speech and language.”


21  April  2018





====================================================================

Most people wouldn't understand the significance of your work it's amazing 👌








