What Is GPT? GPT-3, GPT-4, and More Explained

Written by Jessica Schulze

An overview and comparison of GPT models 1-4, Amazon’s GPT-55X, and more.

[Featured Image] Blue lines of binary code ripple across a black screen in waves.

In recent years, artificial intelligence (AI) has generated more than just content. It’s sparked debate, excitement, criticism, and innovation across a wide range of industries. One of the most notable and buzz-worthy AI technologies today is GPT, which is often incorrectly equated with ChatGPT.

In this article, you'll learn what GPT is, how it works, and what it’s used for. We’ll also compare and contrast different GPT models, starting with the original transformer and ending with today’s most recent and advanced entry in OpenAI’s catalog: GPT-4. 

What does GPT stand for?

GPT is an acronym that stands for "Generative Pre-trained Transformer" and refers to a family of large language models (LLMs) that can understand and generate text in natural language.

Let's break down the acronym:

  • Generative: Generative AI is a technology capable of producing content, such as text and imagery. 

  • Pre-trained: Pre-trained models are saved networks that have already been taught, using a large data set, to resolve a problem or accomplish a specific task.

  • Transformer: A transformer is a deep learning architecture that transforms an input into another type of output. 

Looking at the acronym above helps us remember what GPT does and how it works. GPT is a generative AI technology that has been previously trained to transform its inputs into a different type of output.


What is GPT?

GPT models are general-purpose language prediction models. In other words, they are computer programs that can analyze, extract, summarize, and otherwise use information to generate content.

One of the most famous use cases for GPT is ChatGPT, an artificial intelligence (AI) chatbot app based on the GPT-4 model (formerly based on GPT-3.5) that mimics natural conversation to answer questions and respond to prompts. GPT was developed by the AI research laboratory OpenAI in 2018. Since then, OpenAI has officially released three iterations of the GPT model: GPT-2, GPT-3, and GPT-4. 

Read more: Machine Learning Models: What They Are and How to Build Them

Large language models (LLMs)

The term large language model is used to describe any large-scale language model that was designed for tasks related to natural language processing (NLP). GPT models are a subclass of LLMs. 


GPT-1

GPT-1 is the first version of OpenAI’s language model, released in 2018. It followed Google’s 2017 paper Attention Is All You Need, in which researchers introduced the first general transformer model. Google’s revolutionary transformer architecture underpins Google Search, Google Translate, autocomplete, and today’s large language models, including Gemini and ChatGPT.

GPT-2

GPT-2 is the second transformer-based language model from OpenAI. Its code and model weights are open source, it was trained without supervision on a large corpus of web text, and it has 1.5 billion parameters. GPT-2 was designed specifically to predict and generate the next sequence of text to follow a given sentence.
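To make that next-token objective concrete, here’s a minimal sketch that loads the openly released GPT-2 weights through the Hugging Face transformers library (a third-party tool we’re assuming here, not something the article prescribes) and prints the model’s five most likely next tokens for a prompt:

```python
# Sketch: GPT-2's core task is predicting the next token.
# Assumes: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The best way to learn programming is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Turn the scores at the final position into next-token probabilities.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob.item():.3f}")
```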

GPT-3

The third iteration of OpenAI’s GPT model has 175 billion parameters, a sizable step up from its predecessor. Its training data includes texts such as Wikipedia entries as well as the open-source Common Crawl data set. Notably, GPT-3 can generate computer code and shows improved performance in niche areas of content creation, such as storytelling.

Later versions of GPT-3 are known as GPT-3.5 and GPT-3.5 Turbo.

GPT-4

GPT-4 is the most recent model from OpenAI. It’s a large multimodal model (LMM), meaning it can parse image inputs as well as text. This iteration is the most advanced GPT model, exhibiting human-level performance on a variety of professional and academic benchmarks. For comparison, GPT-3.5 scored in the bottom 10 percent of test-takers on a simulated bar exam, while GPT-4 scored in the top 10 percent.

Newer iterations of the GPT-4 model include GPT-4 Turbo, GPT-4o mini, and GPT-4o.
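Because GPT-4-class models are multimodal, a single request can combine text and an image. Here’s a brief sketch using OpenAI’s official Python library; the model name and image URL are illustrative placeholders, so check OpenAI’s documentation for current options:

```python
# Sketch: sending text plus an image to a multimodal GPT-4-class model.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",  # a multimodal model; availability may vary
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            # Placeholder URL -- substitute a real, publicly reachable image.
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```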

Amazon’s GPT-55X

Amazon’s Generative Pre-trained Transformer 55X (GPT-55X) is a language model based on OpenAI’s GPT architecture and enhanced by Amazon’s researchers. A few key aspects of GPT-55X include its vast amount of training data, its ability to capture contextual dependencies and semantic relationships, and its autoregressive nature (it generates text one token at a time, using what it has already produced to inform what comes next).


How does GPT work?

Let's dive deeper into how generative pre-trained transformers work:

1. Neural networks and pre-training

GPTs are a type of neural network model. As a reminder, neural networks are AI algorithms that teach computers to process information the way a human brain would. Pretraining involves training a neural network on a large data set, such as text from the internet. During this phase, the model learns to predict the next word in a sentence and gains an understanding of grammar and context.
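As a toy illustration of that objective (sketched in PyTorch, which we’re assuming for demonstration; it isn’t named in the article), the training labels are simply the input tokens shifted one position, and the loss measures how well the model predicted each next token:

```python
# Sketch: the next-token pre-training objective with made-up numbers.
import torch
import torch.nn.functional as F

vocab_size = 50257  # GPT-2's vocabulary size, for realism
tokens = torch.tensor([[464, 3290, 318, 257, 922]])  # a toy token sequence

# Stand-in for a language model's output: one score per vocabulary
# entry at every position in the sequence.
logits = torch.randn(1, tokens.shape[1], vocab_size)

# Predict token t+1 from everything up to token t:
# drop the last position's logits and the first label.
shift_logits = logits[:, :-1, :]
shift_labels = tokens[:, 1:]

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size),
    shift_labels.reshape(-1),
)
print(loss)  # the quantity pre-training drives down over billions of tokens
```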

2. Transformers and attention mechanisms

Transformers are based on attention mechanisms, a deep learning technique that simulates human attention by ranking and prioritizing input information by importance. Both in our brains and in machine learning models, attention mechanisms help us filter out irrelevant information that can distract us from the task at hand. They increase model efficiency by gleaning context and relevance from relationships between elements in data.
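Here’s a compact sketch of scaled dot-product attention, the specific mechanism introduced in Attention Is All You Need, written in NumPy for readability:

```python
# Sketch: scaled dot-product attention over a few token vectors.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weigh the values V by how relevant each key K is to each query Q."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    # Softmax turns scores into an importance ranking that sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output mixes the values by relevance

# Three 4-dimensional token representations attending to one another.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V))
```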

3. Contextual embeddings

GPT begins to capture the meaning of words based on their context. A contextual embedding is a dynamic representation of a word that changes according to the surrounding words in a sentence.
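The sketch below makes this visible with GPT-2’s openly available weights (again via the Hugging Face transformers library, our assumption for illustration): the word “bank” ends each sentence, yet its hidden-state vector differs depending on the surrounding words.

```python
# Sketch: the same word gets different contextual embeddings.
# Assumes: pip install torch transformers
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

def last_token_embedding(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden[0, -1]  # the vector for the sentence's final token

a = last_token_embedding("She sat down by the river bank")
b = last_token_embedding("She deposited the check at the bank")

# A similarity of exactly 1.0 would mean identical vectors; context
# pushes the two "bank" embeddings apart.
print(f"cosine similarity: {torch.cosine_similarity(a, b, dim=0).item():.3f}")
```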

4. Fine-tuning

After pretraining, GPT is fine-tuned for specific tasks, such as writing an essay or answering questions, and becomes more skilled at them.
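In code, fine-tuning looks like a continuation of pretraining on a much smaller, task-specific data set. The sketch below fine-tunes GPT-2 on a two-example toy data set; the examples, learning rate, and epoch count are all illustrative assumptions:

```python
# Sketch: fine-tuning a pre-trained model on a tiny task-specific data set.
# Assumes: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# A toy question-answering data set; real fine-tuning uses far more data.
task_examples = [
    "Q: What does GPT stand for? A: Generative Pre-trained Transformer.",
    "Q: What is an LLM? A: A large language model.",
]

model.train()
for epoch in range(3):
    for text in task_examples:
        inputs = tokenizer(text, return_tensors="pt")
        # With labels supplied, the model computes the next-token loss itself.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```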

For hands-on practice using ChatGPT, start with the one-hour course Use Generative AI as Your Thought Partner, taught by Coursera CEO Jeff Maggioncalda.


How to use GPT-3 and GPT-4

Despite the complexity of language models, their interfaces are relatively simple. If you’ve ever used ChatGPT, you’ll find the text-input, text-output interaction intuitive and easy to use. In fact, you can play around with GPT-4 via chat.openai.com as long as you have an OpenAI account. To train your own model or experiment with the GPT-4 application programming interface (API), you’ll need an OpenAI developer account, which you can create on OpenAI’s website. After you’ve signed up and signed in, you’ll gain access to the Playground, a web-based sandbox you can use to experiment with the API.
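Once you have an API key, a basic request looks something like the sketch below, which uses OpenAI’s official Python library; the prompt is our own illustrative example, and model names change over time, so consult OpenAI’s documentation:

```python
# Sketch: a minimal chat request to the GPT-4 API.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what GPT stands for in one sentence."},
    ],
)
print(response.choices[0].message.content)
```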

If you have a ChatGPT Plus subscription, you can access GPT-4o via chat.openai.com. Note that there is a usage cap that depends on demand and system performance.

How to use GPT-2 

GPT-2 is less user-friendly than its successors and requires a sizable amount of processing power. However, it is open source and can be used in conjunction with free resources and tools such as Google Colab. To access the GPT-2 model, start with this GitHub repository. You’ll find a data set, release notes, information about drawbacks to be wary of, and experimentation topics OpenAI is interested in hearing about.
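If you’d rather skip the original repository’s setup, one common shortcut (our assumption, not the route the article describes) is the Hugging Face transformers pipeline, which runs comfortably in a free Google Colab notebook:

```python
# Sketch: generating text with GPT-2 in a few lines.
# Assumes: pip install transformers (plus a backend such as torch)
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampled output reproducible

outputs = generator(
    "Artificial intelligence is",
    max_new_tokens=20,       # how much text to add after the prompt
    num_return_sequences=2,  # produce two different continuations
    do_sample=True,          # sample rather than always picking the top token
)
for out in outputs:
    print(out["generated_text"])
```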


Here are some additional resources to explore:

Build generative AI skills on Coursera 

Take a deeper dive into use cases, benefits, and risks of using the GPT model by enrolling in the intermediate-level online course, Generative Pre-trained Transformers (GPT). Or, learn how to harness the power of AI to revolutionize your productivity across Microsoft's ecosystem in the Microsoft Copilot: Your Everyday AI Companion Specialization.



This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.