Time and again, whenever the word ‘transformers’ appears, it evokes memories of Sentinel Prime and Optimus Prime, the Autobots. People remember their heroics, their sacrifices, the civil war with Megatron’s Decepticons, how they lost the AllSpark, the cube made of Energon, the transformers’ lifeblood, destroyed their planet Cybertron and landed on Earth, where they ultimately saved humanity.
In reality, it wasn’t until November 2022 that yet another transformer, the Generative Pre-trained Transformer (GPT-3.5, better known as ChatGPT), was introduced to the world by OpenAI as a tool that could help humanity address its most pressing challenges. Since its release, the technology has given new meaning to the word transformer.
“Comparison to the fictional characters of the alien transformers is nothing new. The characters are so famous that tools have been named after them already. Take for example, the Auto Bot Builder that leverages the power of GPT-3 to automatically build advanced chatbots tailored to the needs of enterprises,” says Phil Hermsen, Solutions Director, Data Science & AI, at HCLTech.
Launched Wednesday, the latest ‘Transformer’ GPT-4 is “more sophisticated, more creative and collaborative, and can solve difficult problems with greater accuracy.”
The company even claimed GPT-4 “performs better than humans on many standardized tests” and will produce fewer factually incorrect answers.
Based on its multimodal abilities, the new model can respond to images, generating captions, classifications and analyses, and can edit and iterate with users on creative and technical writing tasks. It’s powerful enough to handle 25,000 words, enabling content creation and extended conversations, as well as document search and analysis.
“We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails,” OpenAI said in a blog post, which it also tweeted on Tuesday.
And there are some more transformers, including Google’s BERT (Bidirectional Encoder Representations from Transformers) and its T5 (Text-to-Text Transfer Transformer).
Better known as foundation models now, these transformers, along with generative AI, are trending topics that have been around for some time, with many organizations working on their capabilities. In this race, ChatGPT has acted as a catalyst.
The birth of a transformer
What started as an effort by a group of researchers at Google and the University of Toronto to enable machine translation ultimately produced the transformer, a new type of neural network architecture, in 2017. It revolutionized multiple arenas of the machine learning (ML) world.
Because they can be trained in a distributed, parallelized fashion, transformers can process huge amounts of data at once, rather than sequentially as the recurrent neural networks (RNNs) of the past did, which makes it practical to train very large models.
Why these transformers are special
Positional encoding, attention and self-attention are the three concepts that enable transformers to leave RNNs behind.
First, positional encoding stores word order as data rather than in the network’s structure, removing the requirement to process a sentence one word at a time and allowing training to be parallelized across multiple processors.
Attention, the second concept, is how patterns matter in machine translation: during training, the model learns how words are placed in input and output sentences, then reflects or imitates those forms to translate new phrases. In short, sentence-word-position matching.
With self-attention, the final mechanism, features such as object edges and shapes, grammar rules, parts of speech, homonyms, synonyms and antonyms are identified from within unlabeled data and used to better train the neural network for future processing. The same concept underpins computer vision problems, convolutional neural networks (CNNs) and natural language processing (NLP).
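The core of those three concepts can be sketched in a few lines of NumPy. The matrix sizes and random weights below are purely illustrative; in a real transformer the Wq, Wk and Wv projections are learned during training:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each word attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                        # each output mixes all positions at once

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                   # 4 words, 8-dim embeddings (illustrative)
Wq = rng.normal(size=(8, 8))
Wk = rng.normal(size=(8, 8))
Wv = rng.normal(size=(8, 8))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                              # one output vector per input word
```

Because every position is compared with every other in one matrix operation, the whole sentence is processed in parallel rather than word by word.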
In the recent past, many organizations have created large language models (LLMs) that allow these transformers to do some incredible ML tasks related to NLP based on these concepts.
How these transformers work
Designed to replace task-specific models, these transformers are mostly LLMs that undergo large-scale pretraining followed by finetuning. Their deep neural network architecture builds a numerical representation of text from the surrounding words, emphasizing sequences of words. Critically important and applicable to a wide variety of downstream use cases, these transformers are evolving rapidly.
“With this wave of transformers what has changed is the acknowledgement that there needs to be a feedback loop that can improve them. This feedback loop exists in both a formal and informal state: the formal state is via Beta testers and selected clients, the informal is via social media, which is highlighting where the transformers are giving ‘wrong’ answers or inheriting biased answers, due to the training data,” adds Hermsen.
Neural intelligence and the artificial neural network
Inspired by the biological neural networks that constitute animal brains, neural intelligence uses neurons that only perform simple calculations.
While AI creates algorithms that can perform a certain task using a mix of probability and statistics, mathematics and neural networks, an artificial neural network (ANN)—usually called a neural network—is a computing system based on a collection of connected nodes or artificial neurons, which loosely model the neurons in a human brain.
An ML subset designed to imitate the processing power of a human brain, an ANN is an architecture made up of input, output and hidden layers that functions by passing data through layers of artificial neurons.
The main components of such an architecture are input, weight, transfer function, activation function and bias.
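A single artificial neuron ties those components together. The sketch below is minimal and its weights are made up; the sigmoid stands in for the activation function, and the weighted sum plays the role of the transfer function:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of inputs (transfer function)
    plus a bias, passed through a sigmoid activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # transfer function + bias
    return 1 / (1 + math.exp(-z))                           # activation function

out = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.2)
print(round(out, 3))  # → 0.574
```

A full network is simply many such neurons arranged in layers, with each layer’s outputs becoming the next layer’s inputs.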
“There is also the requirement for localized versions of the transformers. It is often said that the US and the UK are two countries divided by a common language, but then let’s add in, for example, Australian English and the need for localized versions become clear," says Hermsen.
“Named after the Decepticons’ leader, Megatron-Turing Natural Language Generation (MT-NLG) is a transformer-based language model with 530 billion parameters, making it the largest and most powerful of its kind and demonstrates unparalleled accuracy in natural language tasks such as completion prediction, commonsense reasoning, reading comprehension, natural language inferences, and word sense disambiguation,” adds Hermsen.
Neural networks are at the core of deep learning, which is rapidly evolving. The global neural network market—valued at $14.35 billion in 2020—is projected to reach $152.61 billion by 2030, registering a CAGR of 26.7%.
The fear factor
ANNs and ChatGPT together have the potential to revolutionize many industries. However, while some experts predict the creation of new jobs and opportunities, others fear automation will eliminate many existing jobs and shrink the workforce in the client service, transport and software development sectors.
With a focus on equity and fairness toward their employees, organizations must invest in retraining and upskilling programs to help workers transition into new roles. A recent Forbes article stated organizations can also opt for flexible schedules or consider a four-day workweek to reduce the impact of job losses, ensure that algorithms are transparent, accountable and auditable in light of the ethical implications of adopting these technologies, and ensure that data is used ethically and with consent.
Types of neural network architectures and their application in industry
To solve ML problems, neural networks offer precise, accurate predictions that increase efficiency in various industries, including the stock market, transportation, weather, security, social media, agriculture, aerospace, defense, healthcare, banking and finance.
Transformer neural network: It has no concept of timesteps and processes multiple inputs at once, handling complex and varied data more efficiently.
- This has shown good results in monitoring crops and weeds in the agriculture sector with the help of drones and vision transformer (ViT) models.
Standard neural network: It begins with the perceptron, which applies weights to input values to produce an output. Next is the feed-forward network, a multi-layered neural network in which information moves only in the forward direction. The third is the residual network, a deep feed-forward network with many layers.
- It provides highly accurate and robust solutions such as fraud detection, business lapse and churn analysis, risk analysis and data mining in the finance, food, energy, medical and health care, science and engineering, transportation and communication industries, as well as the marketing and real estate sectors.
Recurrent neural network (RNN): By remembering what it last learnt, it predicts the future with accuracy. Variants include the long short-term memory (LSTM) network, which adds gates to an RNN to memorize better, and the echo state network, which sparsely connects an RNN’s hidden layers.
- Turkey’s Sakarya province houses many factories and is heavily industrialized, leaving its air quality very poor. Statistical analysis of SO2 and PM10 levels derived by an RNN, when compared to conventionally measured levels, proved accurate and revealed a huge difference in the readings.
- To improve predictions of the share price for the aerospace industry, a hybrid prediction model of Principal Component Analysis (PCA) and RNN showed that PCA could improve both accuracy and efficiency of prediction.
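At its core, an RNN’s “memory” is just a hidden state that is updated as each new value in a sequence arrives. The toy sketch below uses random weights and is not the published air-quality or share-price model; it only shows the recurrence itself:

```python
import numpy as np

def rnn_forecast(series, Wx, Wh, Wy):
    """Minimal RNN: the hidden state h carries memory of past inputs,
    so the prediction depends on everything seen so far."""
    h = np.zeros(Wh.shape[0])
    for x in series:
        h = np.tanh(Wx * x + Wh @ h)   # update memory with the new value
    return float(Wy @ h)               # one-step-ahead forecast

rng = np.random.default_rng(1)
Wx = rng.normal(size=4)                # input weights (illustrative, untrained)
Wh = rng.normal(size=(4, 4)) * 0.5     # recurrent weights
Wy = rng.normal(size=4)                # output weights
print(rnn_forecast([0.1, 0.2, 0.3, 0.4], Wx, Wh, Wy))
```

An LSTM adds gates that decide what to keep in and drop from `h`, which is what makes longer-range memory practical.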
Convolutional neural network (CNN): A type of feed-forward network used for image analysis and language processing, CNNs detect patterns using features like edges, shapes and textures.
- An application based on a deep CNN generates powerful features and demonstrates excellent defect detection results with low false-alarm rates in the manufacturing sector.
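The pattern detection a CNN performs comes from convolution: sliding a small kernel over the image to produce a feature map. The kernel below is hand-written for illustration; a trained CNN learns its kernels from data:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image, producing a feature map."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image: left half dark, right half bright.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
edge_kernel = np.array([[-1.0, 1.0]])   # responds to horizontal brightness changes
feature_map = conv2d(image, edge_kernel)
print(feature_map)  # nonzero only where the dark/bright edge lies
```

Stacking many such kernels, with pooling and activation layers in between, lets a CNN detect edges, then shapes, then whole objects such as surface defects.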
Generative adversarial networks (GAN): Mainly used to train generative AI, a GAN is a form of unsupervised learning in which a generator produces synthetic data by learning patterns from the input data, and a discriminator then decides whether each output is fake or genuine.
- A study of consumers’ evaluations of product consumption values, purchase intentions and willingness to pay for fashion products designed using GANs found that willingness to pay for GAN-generated products was much higher than for non-GAN-generated products.
- It has also been helpful in 3D object generation, medicine, pandemics, image processing, face detection, texture transfer and traffic control.
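The generator/discriminator interplay can be illustrated with a deliberately tiny sketch. Both functions below are stand-ins; in a real GAN each is a neural network, and they are trained against one another:

```python
import random

random.seed(0)

def real_sample():
    """'Real' data for this toy example: numbers near 5.0."""
    return 5.0 + random.gauss(0, 0.1)

def discriminator(x):
    """Toy discriminator: accepts values close to the real data's mean of 5.0."""
    return abs(x - 5.0) < 0.5

def generator(shift):
    """Toy generator: its single parameter `shift` is what training
    would adjust until its output fools the discriminator."""
    return shift

print(discriminator(generator(0.0)))  # False: an untrained generator is caught as fake
print(discriminator(generator(5.0)))  # True: a trained generator passes as genuine
```

Training alternates between improving the discriminator at spotting fakes and improving the generator at producing them, until the synthetic data is statistically indistinguishable from the real input.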
Researchers from the Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR) have recently developed neuromorphic computing using a semiconducting material called scandium nitride, which has excellent stability and complementary metal-oxide-semiconductor (CMOS) compatibility.
With the help of ML, the development of this artificial synapse—the connection between two neurons that serves both as a processor and a memory storage unit—is a revolution in the field of AI.
Neuromorphic computing functions with the help of ANNs. It is based on spiking neural networks (SNNs), which bridge the gap between ML and neuroscience: neurons arranged in layers carry out the computing and pass signals to one another. Using electric spikes as signals, inputs are converted into outputs such as visual recognition and data interpretation.
Application areas for neuromorphic computing and engineering are mainly robotics, self-driving cars, olfaction and chemo-sensation, event vision sensors, neuromorphic audition, biohybrid systems for brain repair, embedded devices for neuromorphic time-series and collaborative autonomous systems.