AI and the Third Age of Human-Computer Interaction
The past, present, and future of how we interact with machines
The First Age: code
The history of computers stretches back to 1937. As we invented these new machines, we needed to create a language to communicate what actions or tasks we wanted them to complete. A "code".
The earliest form of code was punched cards fed into a computer to execute a command.
To put things in perspective, it took 62,500 punched cards to hold 5 MB of data: a standard card stored 80 characters, and 62,500 × 80 bytes works out to exactly 5 million bytes.
This “code” evolved in 1956, when MIT researchers wired a Flexowriter electric typewriter to a computer, creating the first computer keyboard. It’s a shame we didn’t keep that name, because it sounds way more badass than “keyboard”.
Even with a keyboard, we were still talking to machines in code, this time on a command line. If I asked you to imagine early computers, you’d probably picture a monochromatic, green-on-black screen filled with incomprehensible commands and code.
The Second Age: point-and-click
In the late 1970s, researchers at Xerox Palo Alto Research Center (PARC) were imagining an intermediary to bridge the gap between human and machine language. They created the first computer with a graphical user interface (GUI): the Xerox Alto.
Inspired by how we interacted with objects in the physical world, they created the digital equivalent: menus, buttons, desktops with files and icons, draggable objects you can move from one place to another, and typing in plain language.
We didn't have to communicate in an obscure code anymore. We had applications with buttons, menus, text boxes, and even tutorials.
Humans and machines still spoke different languages, but the GUI became a sort of translator of human actions into computer commands, which made them much more accessible for non-programmers.
But the graphical user interface was still locked away in a lab, inaccessible to the general public. Until Steve Jobs came to visit PARC in 1979.
Jobs made an odd proposal: he let Xerox buy 100,000 pre-IPO shares of Apple in return for a detailed tour of PARC and its current projects. During his visit, Jobs was shown the Xerox Alto. He became fascinated with the GUI and mouse that made the Alto easier to use than other computers at the time.
The Alto was the inspiration behind the Apple Lisa, which became one of the first personal computers with a graphical user interface available to the public. It wasn't a success, but it represented an inflection point. A new paradigm was created.
Since then, we've designed interfaces to help people get the most out of computers, phones, and other machines for work, education, art, entertainment, and every other aspect of our lives. And they kept getting better and more sophisticated over time. Command lines and code didn't disappear, but point-and-click interfaces became the dominant way of interacting with computers.
Is it all about to change again?
The Third Age: natural language
ChatGPT's overwhelming success is a signal of change.
Yes, conversational interfaces and chatbots existed long before ChatGPT came out. But they mostly sucked. They were pre-programmed and limited in their responses. Most of the time, they didn't understand us. Almost every time, we were eagerly asking them to redirect us to a human.
ChatGPT was different. It felt smart. It understood us. We could give it unstructured data and instructions, and it would still return an answer. Even when it fails, it does so gracefully. Using it makes you feel like Harry Potter holding the Elder Wand (or Voldemort, if you try to turn it into DAN).
ChatGPT represents another inflection point in how we interact with machines:
- In the First Age, we used code to speak to machines.
- In the Second Age, we used interfaces as an intermediary between human and machine language.
- In the Third Age, we're using natural language because machines finally understand us.
Advances in LLMs like GPT-4 mean that machines don’t need structured data and inputs as much as they did before.
In the Second Age, we had to define the rules and bounds of interaction: you had to type your search query in a specific box, set a price range using a slider, and select filters from a pre-defined list of parameters.
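To make that contrast concrete, here's a minimal sketch of the same search expressed both ways. The field names are made up for illustration, not any real product's API:

```python
# Second Age: the interface defines the bounds. Every parameter has its own
# widget, and the backend receives a rigid, structured query.
structured_query = {
    "keywords": "hotel paris",
    "price_min": 80,
    "price_max": 150,
    "filters": ["free_wifi", "near_transit"],  # picked from a pre-defined list
}

# Third Age: the user writes one sentence, and the model is expected to
# extract the same structure on its own.
natural_query = (
    "Find me a hotel in Paris for 80 to 150 euros a night, "
    "with free wifi and close to the metro."
)
```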
The “boundless” nature of the Third Age has its drawbacks (as I’ll discuss further below), but the numbers speak for themselves: ChatGPT is the fastest-growing consumer product in history. It only took 2 months to reach 100 million users.
I've heard many people, including experts, downplay ChatGPT as nothing innovative, claiming that "all it does" is predict the most likely next word. Maybe that's true from a technical perspective, but who cares? From a user's standpoint, the experience matters more than the technology.
Everyone is embracing the Third Age
ChatGPT has created a new standard for how people want to interact with technology.
Expectations of what makes a good user experience are changing. Companies everywhere were licking their lips at ChatGPT's meteoric rise, and they've all hopped on the train to announce their own AI capabilities. Even staple tech products, like Google Search and Microsoft Office, are getting makeovers.
Other companies are following suit too:
- Duolingo can now role-play as a barista in a foreign country so you can practice your French.
- Khan Academy is introducing an AI-powered tutor.
- OpenAI announced ChatGPT plugins with apps like Expedia and Instacart.
Instead of getting lost in menus, memorizing keyboard shortcuts, and bouncing between different applications, you can type what you need done into a single box. In plain English. Or any of the 26 languages GPT-4 is fluent in.
I've been a UX Designer for 6 years, and I expect this trend to become the standard method of interaction between humans and machines. Yes, we've developed standards, principles, and best practices to create intuitive interfaces in the Second Age. But is anything more intuitive than writing what you want to accomplish in plain language and having a computer "magically" do it all for you?
Let’s examine a couple of examples.
Example #1: Instacart
I’ve used Instacart countless times during the pandemic to order groceries, and I don’t have any major complaints about the app. It had everything I needed to find and order my groceries every week.
But you’ll start to notice inefficiencies in the process once you pause and take a step back. If you were a fly on the wall, you would probably see me do something like this:
1. Plan my meals for the week
2. Write down all the ingredients I need in my notes
3. Open the Instacart app
4. Search for each item individually
5. Find the item I want by scrolling, filtering, and sorting
6. Add the item to my cart
7. Adjust the quantity if needed
8. Repeat steps 4-7 until I’ve added everything I need (each order had 20 items on average)
9. Pick a delivery slot after consulting my personal calendar to check when I’ll be home
In the Third Age, I would use the Instacart ChatGPT plugin instead (see the sketch after this list):
- I can copy and paste that list of ingredients from my Notes app and have it fill up my cart automatically.
- I could even take a picture of a list I wrote by hand, because multimodal language models that accept text and images as inputs will become available soon.
- The list I write will have the quantity I want for each item and even the brand.
- I could even add instructions for it to pick the healthiest or cheapest option for each item, or a delivery time during which I’ll be home.
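Under the hood, that interaction might look something like the sketch below. It's hedged: `call_llm` and `instacart_add_to_cart` are hypothetical stand-ins, since the real ChatGPT plugin protocol has the model read a service's OpenAPI spec and call its endpoints itself.

```python
import json

# A hedged sketch of the plugin flow: free-form text in, structured cart
# actions out. Both helper functions below are hypothetical stand-ins.

GROCERY_LIST = """
2x whole milk (any brand)
1x sourdough bread
3x chicken breast (cheapest option)
"""

PROMPT = (
    "Convert this grocery list into a JSON array of objects with keys "
    '"item", "quantity", and "preference":\n' + GROCERY_LIST
)

def call_llm(prompt: str) -> str:
    # Stand-in for a chat-completion call. A real model would parse the
    # free-form list; a canned reply keeps this sketch self-contained.
    return json.dumps([
        {"item": "whole milk", "quantity": 2, "preference": "any brand"},
        {"item": "sourdough bread", "quantity": 1, "preference": ""},
        {"item": "chicken breast", "quantity": 3, "preference": "cheapest"},
    ])

def instacart_add_to_cart(item: str, quantity: int, preference: str) -> None:
    # Stand-in for the plugin's cart endpoint.
    print(f"Added {quantity} x {item} ({preference or 'no preference'})")

# One paste of the list replaces steps 4-7 for every single item.
for entry in json.loads(call_llm(PROMPT)):
    instacart_add_to_cart(entry["item"], entry["quantity"], entry["preference"])
```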
Which process sounds faster and easier? The second one, by far.
Example #2: Planning a vacation
Like most people, I love travelling and exploring new places. But I hate planning trips.
I recently had to plan my trip to Paris, and the biggest challenge was optimizing multiple variables:
- Finding the cheapest round-trip flight for a specific number of days
- Finding an Airbnb or hotel for those days within my budget, with easy access to public transit, and meeting whatever other conditions I had
- Booking restaurants, events, and tours for those days
I was constantly bouncing between Expedia, Airbnb, OpenTable, and other booking websites to plan my trip. It didn’t take long for me to have 32 tabs open to keep track of all the options I explored.
I was overwhelmed. So I closed my laptop and put off the planning as long as possible. A computer would've done a much better job than me at optimizing these variables and presenting a range of solutions, if only I had written my requirements and conditions out as a list.
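That "requirements as a list" idea is easy to picture in code. This is purely illustrative: `plan_trip` is a hypothetical helper, and today you'd stitch something like it together from plugins like Expedia and OpenTable.

```python
# Purely illustrative: my trip constraints as data a planner could optimize
# over. `plan_trip` is a hypothetical helper, not a real API.

requirements = [
    "round-trip flight to Paris, 5 days, cheapest option",
    "hotel or Airbnb within budget, short walk to a metro stop",
    "restaurant, event, and tour bookings for each day",
]

prompt = (
    "Plan a trip that satisfies all of these constraints and present "
    "three candidate itineraries with total costs:\n- " + "\n- ".join(requirements)
)

def plan_trip(prompt: str) -> list[str]:
    # Stand-in for a model with access to booking plugins; it would search,
    # compare, and rank options instead of me juggling 32 tabs.
    return ["Itinerary A ...", "Itinerary B ...", "Itinerary C ..."]

for option in plan_trip(prompt):
    print(option)
```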
This new shift redefines the relationship between humans and computers. They’re not just “tools”. They’re personalized, product-specific assistants that work for us. Just write down everything you need, and they’ll get most of it done for you. A true white-glove experience.
The problems with AI-powered interfaces
Third Age conversational interfaces still have design gaps that need to be solved.
#1: A gap in communication still exists
Remember the “bounds” we spoke about with Second Age interfaces? They’re important. They guide us on what information to include and in what format. It’s like going to a bowling alley with the bumpers up: no matter how you roll the ball, it’s gonna hit a pin and land where it’s supposed to.
The conversational interfaces of the Third Age eliminate the majority of these bounds. It’s like you took away the bumpers and made everyone wear a blindfold. You’re taking random shots in the dark and hoping one of them lands.
This is why the flurry of “prompt engineering” tips and guides has taken social media by storm. It indicates that there’s still a gap in communication between us and machines, even if we’re speaking the same language.
ChatGPT itself isn’t well positioned to close this gap because it’s not a specialized application. It can assist you in a variety of ways, from planning a trip to creating a marketing strategy.
This is where companies like Duolingo and Khan Academy can differentiate: they can design a unique, richer experience tailored to their specific products and services, as in the sketch below.
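A specialized app can put the bumpers back by wrapping every user message in constraints the product team designed. This is a hypothetical sketch in the spirit of Duolingo's barista role-play, not its actual implementation:

```python
# Hypothetical sketch of how a specialized app restores the "bumpers":
# the user never faces a blank prompt box; their input is wrapped in
# product-designed guardrails. Not Duolingo's actual code.

SYSTEM_PROMPT = (
    "You are a barista in a Paris cafe. Speak only beginner-level French. "
    "Stay in character. If the learner writes in English, gently steer "
    "them back to French."
)

def build_messages(user_text: str) -> list[dict]:
    """Wrap free-form user input inside the product's guardrails."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]

print(build_messages("Bonjour! Un cafe, please?"))
```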
#2: Improving output quality
GPT-4 was trained and improved through a method known as Reinforcement Learning from Human Feedback, or RLHF for short.
In this process, GPT-4 outputs are reviewed and judged by humans. Depending on the quality of the output, it’ll receive a reward or a penalty. This helps “reinforce” desired outputs and improve the quality of future responses. Think of it as a parent teaching a child and rewarding them for good behaviour.
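To make the loop concrete, here's a deliberately toy sketch of that reward-and-penalty cycle. Real RLHF trains a separate reward model on human preference data and updates the LLM with an algorithm like PPO; this only shows the shape of the idea, with canned responses standing in for model outputs.

```python
import random

# Toy sketch of the reward/penalty loop: sample an output, collect a human
# judgment, and nudge the "policy" toward rewarded behaviour.

candidates = ["helpful answer", "vague answer", "off-topic rant"]
weights = [1.0, 1.0, 1.0]  # the "policy": a sampling weight per output

def human_feedback(output: str) -> float:
    """Stand-in for a human reviewer: +1 reward, -1 penalty."""
    return 1.0 if output == "helpful answer" else -1.0

LEARNING_RATE = 0.2

for _ in range(200):
    # 1. The model produces an output by sampling from its policy.
    i = random.choices(range(len(candidates)), weights=weights)[0]
    # 2. A human judges the output.
    reward = human_feedback(candidates[i])
    # 3. Reinforce: raise the weight of rewarded outputs, lower penalized ones.
    weights[i] = max(0.05, weights[i] * (1 + LEARNING_RATE * reward))

print({c: round(w, 2) for c, w in zip(candidates, weights)})
# After many rounds, "helpful answer" dominates the sampling weights.
```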
LLMs are improving, but they’re still prone to hallucinating, making reasoning errors, and going off the rails. It’s impossible to anticipate and solve all of these failures ahead of time, and we keep discovering new prompts that trigger them.
The popularity of AI tools offers an opportunity to implement RLHF at an unprecedented scale. Instead of relying on a small group of employees to give feedback to your AI model, you can rely on users.
The advantage of this approach is collecting feedback from a variety of sources and perspectives, which could help reduce bias. It also allows companies to fine-tune their AI model for their product or service if they rely on general LLMs from OpenAI or Google.
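In practice, that could be as simple as logging the thumbs-up/thumbs-down buttons users already click under AI responses. A hypothetical sketch, where the schema and file format are my assumptions rather than any specific product's implementation:

```python
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical sketch: capturing in-product user ratings so they can later
# feed an RLHF-style fine-tuning pipeline.

@dataclass
class FeedbackEvent:
    prompt: str
    response: str
    rating: int        # +1 for thumbs up, -1 for thumbs down
    timestamp: float

def record_feedback(prompt: str, response: str, rating: int,
                    log_path: str = "feedback.jsonl") -> None:
    """Append one user judgment to a JSONL log for later training."""
    event = FeedbackEvent(prompt, response, rating, time.time())
    with open(log_path, "a") as f:
        f.write(json.dumps(asdict(event)) + "\n")

# A thumbs-down on a hallucinated answer becomes a training penalty later.
record_feedback(
    prompt="When was the Xerox Alto released?",
    response="The Xerox Alto was released in 1995.",
    rating=-1,
)
```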
Traditional interfaces won’t die
Like command lines and code, traditional point-and-click interfaces won’t die. Some of them will be replaced with a ChatGPT-like experience, but many will evolve to accommodate new AI-powered experiences.
This is where the idea of AI being your copilot is relevant: it’s there to help you when you need it, but you can still do things on your own.
Having a chat-only interface on a tool like PowerPoint would be frustrating. But having a chatbot integrated within the familiar PowerPoint experience gives you the flexibility to do things on your own when you want to, and to call on your copilot when you need assistance.
I guess the first inkling of the Third Age was under our noses all along…