Wednesday, March 27, 2024

Attention is all you need

 

That is a groovy title for a research paper that revolutionized natural language processing and introduced a deep learning architecture called the Transformer, which became the foundation for modern AI, generative AI in particular. Earlier neural networks (Recurrent Neural Networks, RNNs) processed data sequentially, one word at a time in the order of appearance. The Transformer was able to capitalize on patterns that exist between words even when they are not close together or in sequence; the ‘attention mechanism’ is what captures these long-distance dependencies between words. So not all words are equally important, only some are, and you work out the pattern and imprint it simultaneously to get the meaning. This is pretty amazing. Amazing because that is how I also read; in fact, all slow readers attempt such tricks.
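To make the idea a little more concrete, here is a minimal sketch in plain Python of the scaled dot-product attention at the heart of the paper. It is an illustration under loud assumptions, not the paper's full architecture: the two-number word vectors are made up by hand, the function names are my own, and each word acts as its own query, key and value, whereas a real Transformer learns separate projections for each.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product: larger when two word vectors point the same way."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Scaled dot-product self-attention with no learned weights.

    Assumption: every vector is used as query, key and value at once.
    """
    dim = len(vectors[0])
    weights, outputs = [], []
    for query in vectors:
        # Score this word against every word in the sentence, near or far.
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        row = softmax(scores)
        weights.append(row)
        # Blend all word vectors according to the attention each one gets.
        outputs.append(
            [sum(w * v[i] for w, v in zip(row, vectors)) for i in range(dim)]
        )
    return weights, outputs

# Hypothetical toy vectors, hand-picked so "bank" and "river" are similar.
words = ["bank", "of", "the", "river"]
vectors = [[1.8, 0.2], [0.0, 0.5], [0.0, 0.5], [2.0, 0.0]]

weights, outputs = self_attention(vectors)

# How much attention "bank" pays to each word in the phrase.
for word, weight in zip(words, weights[0]):
    print(f"{word}: {weight:.2f}")
```

With these made-up vectors, "bank" gives most of its attention to "river", three words away, and little to "of" and "the" sitting right next to it. The rows are independent of one another, which is why the whole sentence can be processed at once rather than word by word.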

There are two steps in trying to comprehend written text. The first is to understand the basic structure of the language, that is, grammar as well as the meaning of words. Since I really couldn’t understand grammar, I worked around it by reading a lot and absorbing the structure: iterative learning, where the more you read the better you become. When you write something wrong you get a feeling of something not quite right, so you pause and evaluate it further to get it right. The meaning of words is always contextual, hence iterative learning works. The more you read the better you become, while quality reading enhances your intellect. This is precisely how AI also learned language, by approximating patterns through large amounts of data and powerful computation. I discussed this a few years back in the context of GPT (https://depalan.blogspot.com/2021/12/human-intelligence-artificial.html).

Now that you understand the language, how you get the meaning is the second step. There was a big gap between wanting to read and my capacity to read. I had racks of books to finish, and I was excited to know more and to understand high-caliber patterns, but my reading was woefully slow. I really couldn’t read more than a few pages in a day, and sometimes I was stuck in a word mesh and sentence loop. It is crazy, but words tend to jump and play around, and it is frustrating to hold the words in their places while you try to get the meaning. I figured that there are two types of reading. One is deliberately slow, for instance when you are reading nuanced writing like a Kafka story or a Dickinson poem, where the unhinged words too add to the depth of meaning. The other is fast reading, for when you are reading nonfiction. The trick for fast reading is to identify signifier words (those that encapsulate much meaning, that 'hold attention', like tokens) and then link them into a pattern to get the meaning. These are fewer words, but the ones that carry a high approximation of the meaning; interestingly, jumping words add unintended complexity to the meaning. This is quite crazy, but the meaning gets into a zone with a high probability of being something not intended, and of an eventual mess-up. The probability of success varies with how comfortable you are with the subject; the approximation is generally good if you are adept at identifying the important signifier words. Indeed, I used to teach fast reading techniques for a few years, the kind of job you finish fast for decent money, leaving enough time to focus on real interests.

Finding signifier words and linking them to get the meaning is, in essence, how the ‘attention mechanism’ works; with Transformer parallelization, powerful AI makes it near-instantaneous. Since the human mind is relatively slow, we take words sequentially. This is the problem with language: we have trapped reality in a linear time sequence, hence the meaning gets dispersed and is rarely able to express the holistic moment. Some time back I discussed (https://depalan.blogspot.com/2021/07/story-of-your-life.html) the amazing short story “Story of Your Life” (Ted Chiang), as well as the movie ‘Arrival’, about aliens who use complex symbols that encapsulate time and are understood as a whole. With generative AI and parallelized computation on powerful GPUs, maybe we are at the cusp of grasping the nuanced meaning of the moment in great writings that has so far eluded us. Van Gogh’s impressions oscillate with our meditative being since they catch the essence.