What is ChatGPT?

Everyone is talking about ChatGPT these days. Everyone is asking how it works, how the magic happens. These curious questions come up a lot among people from non-technical backgrounds around me, so I will try to explain here how ChatGPT works without going into technical details.

Since its release last year (November 2022), ChatGPT has been a wildly successful launch, one that forced even a tech giant like Google to rush out its own rival chatbot, Google Bard.

History of ChatGPT's evolution:

Let’s start with a timeline of the evolution of GPT and the different versions of transformer language models.

ChatGPT was launched by OpenAI, a company founded in 2015 to do research in the field of AI. Here is a brief history of the evolution of GPT (Generative Pre-Trained Transformer).

As you can see from the timeline above, GPT evolved from the transformer architecture and gained its abilities through many iterations.

How it works:

What ChatGPT is always fundamentally trying to do is produce a “reasonable continuation” of whatever text it has so far. By “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of webpages, etc.”

A language model uses machine learning to build a probability distribution over words, which it uses to predict the most likely next word in a sentence based on the preceding text.

Language models learn from a library of text (called a corpus) and predict words or sequences of words with probability distributions, i.e. how likely a word or sequence is to occur.

So, let’s say we’ve got the text “AI has the ability to”. Imagine scanning billions of pages of human-written text (say, on the web and in digitized books), finding all instances of this text, and then seeing what word comes next what fraction of the time. ChatGPT effectively does something like this, except that it doesn’t look at literal text; it looks for things that in a certain sense “match in meaning”. The end result is that it produces a ranked list of words that might follow, together with “probabilities”.

Sentence: “AI has the ability to”

Next word                   Probability
Predict                     0.5
Learn                       0.4
Image recognition           0.3
Process natural language    0.2
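To make this concrete, here is a toy sketch of the “scan the text and count what comes next” idea, using a tiny hand-made corpus and a simple bigram (word-pair) count. This is only an illustration of the counting intuition; the corpus, function names, and numbers here are all made up, and real models like GPT do something far more sophisticated than literal counting.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real models learn from billions of pages.
corpus = (
    "AI has the ability to learn . "
    "AI has the ability to predict . "
    "AI has the ability to learn from data ."
).split()

# Count which word follows each word (a "bigram" model, the
# simplest possible stand-in for a learned language model).
follow_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[current_word][next_word] += 1

def next_word_probabilities(word):
    """Ranked list of (next word, probability) following `word`."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common()]

print(next_word_probabilities("to"))
# "learn" follows "to" twice in the corpus, "predict" once,
# so "learn" gets the higher probability.
```

The ranked list this produces plays the same role as the probability table above: for a given context, each candidate next word gets a score based on how often it appeared in that position.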

And the remarkable thing is that when ChatGPT does something like write an essay, what it’s essentially doing is asking over and over again “given the text so far, what should the next word be?”, and each time adding a word.
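That “ask again, add a word, repeat” loop can be sketched in a few lines. The probability table below is entirely hypothetical, standing in for the model’s learned predictions; only the loop structure is the point.

```python
import random

# A hypothetical, hand-made next-word table standing in for the
# model's learned predictions: word -> [(candidate, probability)].
next_word_probs = {
    "AI": [("has", 0.7), ("can", 0.3)],
    "has": [("the", 0.6), ("learned", 0.4)],
    "the": [("ability", 0.8), ("power", 0.2)],
    "ability": [("to", 1.0)],
    "to": [("learn", 0.5), ("predict", 0.5)],
}

def generate(prompt, n_words):
    """Repeatedly ask 'what should the next word be?' and append it."""
    words = prompt.split()
    for _ in range(n_words):
        choices = next_word_probs.get(words[-1])
        if not choices:  # no known continuation: stop early
            break
        candidates, weights = zip(*choices)
        words.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words)

print(generate("AI", 5))
```

Running this might print something like “AI has the ability to learn”, and because the next word is drawn at random according to the probabilities, different runs can produce different continuations.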

But, OK, at each step it gets a list of words with probabilities. Which one should it actually pick to add to the essay (or whatever) it’s writing? One might think it should be the “highest-ranked” word (i.e. the one to which the highest “probability” was assigned). But this is where a bit of magic begins. If we always pick the highest-ranked word, we’ll typically get a very “flat” essay that never seems to “show any creativity” (and sometimes even repeats itself word for word). But if we sometimes (at random) pick lower-ranked words, we get a “more interesting” essay.

The fact that there’s randomness here means that if we use the same prompt multiple times, we’re likely to get different essays each time. And, in keeping with the idea of magic, there’s a particular so-called “temperature” parameter that determines how often lower-ranked words will be used. For essay generation, one has to try different temperature values; it turns out that a “temperature” of 0.5 seems best. (It’s worth emphasizing that there’s no “theory” being used here; it’s just a matter of what’s been found to work in practice.)
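Here is a minimal sketch of what a temperature parameter does to a list of probabilities. The word list and probabilities are made up for illustration, and the reweighting shown (dividing log-probabilities by the temperature before renormalizing) is one common formulation, not necessarily exactly what any particular model uses. A low temperature sharpens the distribution toward the top-ranked word; a high temperature flattens it, giving lower-ranked words more of a chance.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Reweight probabilities: low temperature sharpens the
    distribution (favoring the top word), high temperature
    flattens it (giving lower-ranked words more chance)."""
    scaled = [math.log(p) / temperature for p in probs]
    exp = [math.exp(s) for s in scaled]
    total = sum(exp)
    return [e / total for e in exp]

# Hypothetical probabilities for four candidate next words.
words = ["predict", "learn", "reason", "dream"]
probs = [0.5, 0.3, 0.15, 0.05]

for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])

# Sample the next word at a chosen temperature:
choice = random.choices(words, weights=apply_temperature(probs, 0.5))[0]
```

At temperature 1.0 the probabilities come back unchanged; at 0.2 the top word dominates almost completely; at 2.0 the four candidates end up much closer together, which is why higher temperatures produce more varied, “creative” text.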

I hope the explanation above helps you understand the evolution of ChatGPT and how it works, without too many technical details.