
How Transformers have revolutionised AI

The Transformers are here and you should thank them for the recent revolution in generative AI.

Here's a really simple explanation of what transformers are (spoiler: these transformers are not really the 1980s toys) and how they've revolutionised AI over the past few years.

Headshot

By Justin Eames | CEO

Transformers are likely to prove as revolutionary to the next generation as the combustion engine was to ours.

Unless you’ve hidden yourself away for the last couple of years (and who could blame you?), you will be very aware that artificial intelligence has exploded in practically every aspect of work and life. There have been no little steps, only giant leaps, and we’ve only just seen the start of the changes that are to come.

If you’ve ever stopped to ponder how those leaps happened, why they came so fast, and what on earth Transformers: Robots in Disguise have to do with all this, then this article is for you.

Transformers (nothing to do with the fantastic 1980s toys, or the recently and decreasingly fantastic movies) are the relatively little-known engineering breakthrough we have to thank for many of the big and sudden advances in AI we’re experiencing.

Google introduced the concept of transformers in AI in the 2017 paper "Attention Is All You Need" (who says computer scientists can’t come up with an amusing paper title?). It offered up a revolutionary advancement in AI.

Unlike earlier sequence models you may have heard of, such as recurrent neural networks, transformers introduced a key difference – they process large amounts of data simultaneously, rather than one piece at a time. Not only can they process lots of things at once, but they can do it much more efficiently than any previous approach to AI.
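To make that difference concrete, here is a tiny sketch in plain Python. It is purely illustrative toy arithmetic, not a real neural network, and every number in it is invented: the point is only that a sequential model must compute each step from the previous one, while a transformer-style computation can treat every position of the input at once.

```python
# Toy contrast between sequential (RNN-style) and parallel (transformer-style)
# processing. The "model" here is just arithmetic on word scores.

words = [0.1, 0.4, 0.3, 0.2]  # pretend embeddings: one number per word

def process_sequentially(xs):
    """RNN-style: each step must wait for the previous step's hidden state."""
    hidden = 0.0
    states = []
    for x in xs:                      # strictly one word at a time
        hidden = 0.5 * hidden + x     # depends on the previous iteration
        states.append(hidden)
    return states

def process_in_parallel(xs):
    """Transformer-style: every position is computed from the whole input at
    once, so no position has to wait for another (and real implementations
    run this on massively parallel hardware)."""
    total = sum(xs)                   # every position sees every other position
    return [x + 0.1 * total for x in xs]

print(process_sequentially(words))
print(process_in_parallel(words))
```

The sequential loop cannot be split up, because step three needs the result of step two; the parallel version has no such dependency, which is what lets transformers exploit modern GPUs.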

So it's all down to transformers. They are the breakthrough technology we have to thank for AI tools like ChatGPT, because they work particularly well at understanding language and complex data patterns, which underpin the huge array of applications AI is now put to.

The authors of the white paper "Attention is all you need"

We've got Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Łukasz Kaiser and Illia Polosukhin to thank for writing "Attention Is All You Need" and bringing transformers to the tech community. (IMAGE CREDIT: FINANCIAL TIMES)

So what happened once “Attention is All You Need” got into the hands of the tech community?

Keep in mind that what the paper proposed is likely to prove as revolutionary to the next generation as the combustion engine was to ours.

Once the tech community got its head around the concept there were a series of light bulb moments resulting in start-ups being founded and staggeringly large rounds of investment being made. All fairly quietly.

It took OpenAI only five years to go from the concept of transformers in a scientific paper to the launch of ChatGPT, which is now generating $2bn of annual revenue.

Transformers: Robots who crave attention

Let’s recap: in 2017 transformers burst onto the AI scene and it took tech people a few years to prove the potential.

The big advantage transformers brought is easiest to see in an AI model like ChatGPT. It’s the ability to process everything you write at once, rather than a word at a time.

Remember the title of that Google paper, “Attention Is All You Need”? Well, the “attention” part refers to the ability of transformers to focus on the most important parts of what you’ve written. This is called the attention mechanism, and it means the AI can make sense of your text in much the same way we humans take in a scene at a glance.
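For the technically curious, the core idea can be sketched in a few lines of plain Python. This is a stripped-down toy version of the scaled dot-product attention the paper describes; real models learn query, key and value projections and run this over enormous matrices, and all the numbers below are made up for illustration.

```python
# A minimal sketch of the attention mechanism: score every "word" against a
# query, turn the scores into weights, and mix the values accordingly.
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Score each key against the query (dot product), scaled by sqrt(dim).
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    weights = softmax(scores)          # how much "attention" each word gets
    # The output is a weighted mix of the values: mostly the important ones.
    out_dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(out_dim)]
    return output, weights

# Three toy "words". The second key lines up with the query most closely,
# so it should receive the largest attention weight.
query  = [1.0, 0.0]
keys   = [[0.1, 0.9], [1.0, 0.1], [0.2, 0.2]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

output, weights = attention(query, keys, values)
print(weights)   # the middle weight comes out biggest
```

Notice that the scores for every word are computed independently of each other, which is exactly why this lends itself to parallel processing.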

This is a game-changer compared to older sequence models, like recurrent neural networks, which worked through data one step at a time, often missing the point of what they were looking at because they weren’t aware of which bits were the important ones.

Transformers, on the other hand, are like Swiss Army knives in the AI toolkit, good at everything from understanding and generating complex language to recognising the nuances in images.

A robot made of eyes

ChatGPT's transformers help it see the content and context of your entire prompt all at once, which is mostly why its responses are so much better than any older AI model could manage.

Your brain is different, but a little bit like generative AI

The truth is that transformers are not just better; they're in a league of their own, revolutionising tasks across the board with their ability to understand the data given to them and their unparalleled versatility and speed.

Now it’s time to understand how transformers actually work.

As a side note, I think the reason people find the concepts behind generative AI hard to grasp is that they work so differently from how we perceive our own human way of generating ideas and information.

When we approach problem solving we’ve been taught to break complex problems down into smaller parts and deal with them one at a time – i.e. sequentially. When ideating in groups we take it in turns to speak and when brainstorming alone, most of us use an inner monologue – one word at a time.

But generative AI works on everything at once – it’s like an ideation session where everyone shouts ideas at once while also simultaneously being able to understand everyone else in the room instantly and focus on the best ideas immediately.

That’s a long way from how our human brains work when problem solving. But interestingly you’re not that different from a generative AI in some ways.

We humans are, in fact, capable of parallel problem-solving, though the way our brains process parallel tasks is quite different from the parallel processing capabilities of a transformer in AI.

Human brains are much more complex than the most complex AI and can handle multiple tasks simultaneously. However, what we often think of as multitasking is actually rapid task-switching, as the brain focuses on one task at a time but switches between them quickly.

But in some respects, our brains genuinely do process sensory information in parallel. For instance, we can simultaneously process visual information, sounds, and tactile sensations. Our brains can also integrate information from different sources to quickly form a coherent picture of what's happening around us, which is a form of parallel processing.

Before you let ChatGPT make you feel inferior, remember that your brain is optimised for deep sequential processing, which means you are capable of rich, nuanced understanding and creativity. In contrast, transformers, despite their fantastic parallel processing abilities, do not possess consciousness or understanding in the way you do; they just simulate understanding by processing large amounts of data at impressive speeds.

An illustration of a brain with vivid colours and shapes emerging from it

Your brain is still better than any AI model. It's optimised for deep sequential processing, which means you are capable of rich, nuanced understanding and creativity.

The crux: how a transformer actually works

So now you know that transformers are responsible for the paradigm shift in AI by altering the very fabric of data processing.

At the heart of this innovation lies their dual-component structure: encoders and decoders. This design is crucial for understanding input and generating responses that are contextually rich and nuanced.

Let me explain…

The encoders in a transformer analyse the input data. They function like an attentive committee, each member considering the data from a different angle, and together they build a comprehensive picture. This is where the parallel processing comes into play, enabling the encoder to process various parts of the data at the same time, unlike older AI models that would trudge through data a single item at a time.

An illustration of a robot driving an early automobile in a sepia 1800s style picture

Transformers are likely to prove as revolutionary to the next generation as the combustion engine was to ours.

Once the encoders have done their part, the job is passed to the decoders.

The decoder's job is to take this richly processed information and craft a response. It does this by predicting the next word in a sequence, given the context of all the previous words, using the patterns it learned from the huge amount of online data it consumed during training.

Decoders also use the attention mechanism, but with a slight twist. They not only look at the encoders' output but also at what they have generated so far. This means they are refining each piece of the response using what came before, resulting in responses that make sense because they have a logical flow and relate closely to the outputs (let’s say words) that came before.

So to recap, let’s use everyone’s favourite AI, ChatGPT, as an example. When you enter a prompt, the transformer looks at everything you wrote at once (parallel processing) and gives attention to the most important parts (the attention mechanism). Once it has ‘understood’ your prompt, that understanding is passed to a decoder. (Strictly speaking, ChatGPT’s GPT models use a decoder-only variant of the transformer, but the same principles apply.) The decoder then crafts ChatGPT’s response to your prompt by selecting the most likely word to come next, based on the data from its training phase, over and over until the reply is complete.
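If you prefer to see things as code, here is a toy sketch of that next-word loop in Python. The little lookup table stands in for the patterns a real transformer learns during training; the table and the words in it are invented purely for illustration.

```python
# A toy sketch of how a decoder builds a reply one word at a time: at each
# step it asks "what word most likely comes next?" and appends the answer.

# Stand-in for a trained model: maps each word to its most likely successor.
next_word = {
    "transformers": "process",
    "process":      "everything",
    "everything":   "at",
    "at":           "once",
}

def generate(prompt_word, max_words=6):
    """Grow a reply word by word until there is no likely continuation."""
    reply = [prompt_word]
    while len(reply) < max_words:
        word = next_word.get(reply[-1])   # "predict" the next word
        if word is None:                  # no continuation known: stop
            break
        reply.append(word)
    return " ".join(reply)

print(generate("transformers"))  # → "transformers process everything at once"
```

Note the irony: while a transformer reads your prompt in parallel, it still writes its answer one word at a time, each new word conditioned on everything generated so far, just as the decoder description above says.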

So, using this intricate setup of encoders and decoders, transformers manage to capture the subtleties of language and data patterns, generating responses that feel remarkably sensible and context-aware.

The result is an AI that can converse, translate, and interpret with a level of sophistication that previous models could not achieve, making transformers a giant leap forward in AI.

