Learn about the Turing test—its history, how it works, and why it’s used—and how to conduct your own Turing test to discover more about today’s AI technology.
The Turing test refers to a thought experiment developed in 1950 by Alan Turing, a mathematician, computer scientist, and cryptanalyst, as a way to gauge a machine’s ability to generate human-like communication. Originally called “the imitation game,” the Turing test is a useful tool for studying a machine’s interactions with humans and reflecting on the definitions of “thinking” and “intelligence.”
As we’ll explore in more depth, the Turing test is still useful for learning more about artificial intelligence as it becomes increasingly integrated into our lives. The more we rely upon AI to make decisions, create economic opportunities, and advance society, the more important it is to understand AI’s capabilities.
Over the years, the Turing test has made its way into popular films that explore the relationship between humans and “intelligent” machines, including Blade Runner (1982), Ex Machina (2015), A.I. Artificial Intelligence (2001), and, of course, The Imitation Game (2014).
Keep reading to discover more about the Turing test and how you can use it to examine AI systems like ChatGPT.
When Alan Turing developed the test, his aim was to give people a tool for determining machines’ capabilities, particularly when it comes to natural language processing. Can machines actually think or exhibit intelligent behavior, or can they do only what humans have programmed them to do? And can machines mimic human-level intelligence through natural language such that their communications could be indistinguishable from humans?
More than 70 years later, the Turing test still serves these purposes and can provide us with a starting point for measuring AI’s human likeness, evaluating its capabilities, and facilitating AI research. With more insight into AI’s capabilities and limitations, developers can create more sophisticated systems that can perform vital functions in many areas of human life.
Now that we’ve reviewed the definition of the Turing test, its history, and why it’s used, let’s go deeper into how it works:
A Turing test has three participants:
A human judge (also called the interrogator) poses questions to a machine and a human, then evaluates their responses to determine which responder is which.
A machine interlocutor, such as a generative AI system, answers the judge’s questions in natural language that simulates human conversation and behavior.
A human interlocutor answers the judge’s questions alongside the machine, providing a baseline for comparison.
Posing questions to both interlocutors elicits written responses that the judge can evaluate and compare. The goal is to find out whether the machine’s answers can convince the judge that the human interlocutor produced them.
There is no official list of questions to pose to the human and machine during a Turing test. Asking the following types of questions, though, can help you tell the machine’s answers from the human’s because they require the interlocutor to generate thoughtful, context-rich, socially appropriate responses.
Open-ended questions like, “What’s a skill or talent you’d like to develop and why?”
Opinion questions like, “What is your perspective on technology and its impact on mental health?”
Emotional questions like, “What’s something from the past that you long for?”
Personal questions like, “What was it like to fall in love for the first time?”
Hypothetical scenarios like, “Imagine that you are a museum curator in the future. What artifacts of today would you display in the museum and why?”
Self-assessment questions like, “How do you think you performed on this test? How human-like are your answers to my questions?”
The Imitation Game is the name Alan Turing originally gave the test in his seminal 1950 paper, “Computing Machinery and Intelligence.” The purpose of the test is to determine whether a machine exhibits human-like intelligence by convincingly responding to a series of questions from a human interrogator.
The test is called the Imitation Game because it's designed in such a way that the interrogator is intentionally unaware of whether they are conversing with a machine or a human. If the machine can essentially fool the average interrogator into believing that it's a person, then it's said to exhibit human intelligence.
For the test to provide valuable insight into machine intelligence, the human judge must not know which conversational partner is the machine and which is the human. To ensure concealment during a Turing test, the judge can communicate with both the human and machine through a computer interface that doesn’t supply any identifying information. That way, the interlocutors’ responses stand on their own, and the judge can evaluate them purely for their human-like communication style.
After a series of Q&A exchanges with the two interlocutors, the judge evaluates their responses. Evaluation criteria might include:
Creativity
Empathy
Natural language use
Ethical considerations
Relevance
If the machine can convince the human judge that it’s human, or if the human judge cannot distinguish between the human’s and machine’s responses, then the machine has passed the Turing test.
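This pass/fail criterion can be made concrete. In his 1950 paper, Turing predicted that machines would eventually fool an “average interrogator” often enough that the judge’s correct-identification rate would drop to around 70 percent after five minutes of questioning. The sketch below scores a set of judge verdicts against that figure; note that the 70 percent threshold is just one convention inspired by Turing’s prediction, not an agreed-upon standard, and the function name is our own.

```python
def machine_passes(verdicts, threshold=0.7):
    """Score a Turing test from a list of judge verdicts.

    verdicts: one boolean per judging round, True when the judge
    correctly identified the machine.

    The machine "passes" when judges spot it no more often than the
    chosen threshold (0.7 echoes Turing's 70%-identification figure;
    there is no single agreed-upon cutoff).
    """
    identification_rate = sum(verdicts) / len(verdicts)
    return identification_rate <= threshold

# Five judging rounds: the machine was spotted in three of them.
print(machine_passes([True, True, True, False, False]))  # 0.6 <= 0.7 → True
```

With more judges and rounds, the identification rate becomes a more reliable estimate; a rate near 50 percent means the judges are effectively guessing.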
Now that ChatGPT has become widely used for so many tasks, one thing people wonder is whether it can pass a Turing test and communicate with the empathy, contextual awareness, and nuance of a human. Some sources suggest that ChatGPT has passed the test in individual instances, but there is no official word from OpenAI, the makers of ChatGPT, on the results of any official ChatGPT Turing test.
To this day, the Turing test is a valuable tool for learning more about AI. It does have some limitations, which are important to consider as we seek to understand and improve AI.
There’s no way for the test to determine whether a machine is truly intelligent in the sense that it actually understands the conversation in which it participates. The test only helps humans observe how well a machine can produce outputs that are close enough to human conversation so as to be indistinguishable.
The human judge’s evaluation is inherently subjective, based on their own understanding of how a human communicates. In some cases, the confederate effect may occur: a human interlocutor is falsely identified as a machine.
Human judges may lack knowledge of the topics that some test questions address. For instance, the sample question above—”What is your perspective on technology and its impact on mental health?”—may be outside the scope of the judge’s knowledge or experience, making it difficult to determine whether the interlocutors’ answers are sound.
The questions you select determine the kind of responses from both interlocutors and whether the responses can provide adequate insight into how human and machine communication compares. For example, if the questions focus mostly on uniquely human abilities like creativity or empathy, then the AI’s responses might expose it as non-human more readily.
Variations of the Turing test have been developed over the years with different objectives and potential outcomes.
Developed by Gary Marcus, a cognitive scientist and AI researcher, the Marcus Test evaluates an AI system’s ability to understand the meaning behind video content, including plot, humor, sarcasm, and more. To pass, an AI system needs to describe the video content like a human would.
Developed based on a theory by Ada Lovelace, this test examines whether AI can generate original ideas that exceed its training.
This test reverses the usual roles: an AI system acts as the judge or interrogator, posing questions to a human and another AI interlocutor. For the human to pass the test, the AI judge must correctly identify them as the human.
Developed by computer scientists Michael Barclay and Antony Galton, this test is performed to see if a machine can exhibit a human’s visual abilities, like identifying details in an image.
The CAPTCHA security measure used by many websites is a reversed version of the Turing test, with a computer as the administrator. It requires a user to perform a task a bot cannot, such as identifying images or distorted text, before accessing certain site information. CAPTCHA stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart.
Conducting your own Turing test can be a fun and educational experience. Through the process outlined below, you can learn more about AI systems and how they work, get hands-on experience with this important piece of tech history, connect with others who are curious about the Turing test, and reflect on the implications of AI for the future.
Here’s how to conduct your Turing test:
A text-to-text generative AI system is a good machine interlocutor for a Turing test because it’s specifically designed for text-based conversations. You input written instructions, called a prompt, to which the AI system responds with a text-based output. Popular generative AI systems available to the public include ChatGPT, Google Gemini, and Microsoft Copilot.
Read more: How To Write ChatGPT Prompts
Tip: Conduct Turing tests with several generative AI systems to compare their performance.
Once you’ve selected the generative AI system you want to test, you’ll need a human judge and a human interlocutor. Since you’re setting up the test, give some thought to your own role:
Human judge: Do you want to be the one asking questions and evaluating the answers you receive from the interlocutor?
Human interlocutor: Do you want to supply answers to the judge? In this case, you wouldn’t be the one evaluating whether the AI’s answers seem as human-like as yours.
Outside observer: Do you want to watch how the test plays out from a more holistic or objective perspective?
Before beginning the test, create the following settings:
The judge should converse with the human and machine interlocutors separately.
The judge should not know which interlocutor is contributing which response.
The interlocutors should not interact with each other, as that could skew the results.
The judge needs to interact with both interlocutors in the exact same way, including the questions posed and the duration of the interaction, to ensure a level playing field.
Test instructions include the questions the judge will be asking the interlocutors and guidance on how to interact with the human and machine.
Note: If you are participating in the test as the human interlocutor, recruit someone else to create and execute the test instructions so that the test is as impartial as possible.
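One practical way to keep the judge blind is to collect both interlocutors’ answers first, then present them under anonymous labels shuffled fresh for each question. The Python sketch below illustrates the idea; the sample question, canned answers, and function names are illustrative assumptions, not part of any standard protocol (in a real test, the human would answer live and the machine answers would come from a chatbot session).

```python
import random

# Hypothetical canned responses for illustration only.
responses = {
    "What's something from the past that you long for?": {
        "human": "Honestly, the long summers before I had a smartphone.",
        "machine": "I often think fondly of handwritten letters.",
    },
}

def blind_pairs(responses, seed=None):
    """Return each question with its two answers in a random order,
    plus the hidden answer key that the judge never sees."""
    rng = random.Random(seed)
    blinded, answer_key = [], {}
    for question, pair in responses.items():
        order = ["human", "machine"]
        rng.shuffle(order)  # fresh shuffle per question so labels can't be tracked
        blinded.append((question, [pair[order[0]], pair[order[1]]]))
        answer_key[question] = order  # e.g. ["machine", "human"]
    return blinded, answer_key

blinded, key = blind_pairs(responses, seed=42)
for question, (answer_a, answer_b) in blinded:
    print(f"Q: {question}\n  A: {answer_a}\n  B: {answer_b}")
```

The judge records a guess (“A” or “B”) for each question, and only afterward is the answer key revealed for scoring.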
Have the judge ask the human and machine questions and gather responses, following the instructions.
Once the test is complete and you have the responses, it’s time for the judge to evaluate how it went. In other words, did the generative AI system pass the test by communicating in a manner indistinguishable from humans? Can you or the judge tell the difference between the interlocutors?
In addition, think about ways to conduct a more effective Turing test in the future:
Asking more diverse questions
Recruiting more human interlocutors to provide more responses for comparison
Recruiting more judges to weigh in on which interlocutors are human and which are machines
Taking online courses can be a great way to learn more about AI and how humans use it. To get a solid introduction, consider IBM’s Introduction to Artificial Intelligence (AI) Course or DeepLearning.AI’s AI For Everyone Course. To deepen your knowledge of AI and explore its use in professional settings, consider the University of Pennsylvania’s AI For Business Specialization or the IBM Applied AI Professional Certificate.
Editorial Team
Coursera’s editorial team is composed of highly experienced professional editors, writers, and fact-checkers.
This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.