AI Storytelling: Crafting Adventures on the Fly

By John Robinson | June 7, 2024

Built in June 2024

Build time: 1 week

OpenAI, OpenAI Assistants API, DALL-E, Flask, Python

This post is part of a series where we document our journey to build one product per week to explore what’s possible with AI.

The idea

This week we explored using generative AI for a location-based augmented reality game. Think generative AI storytelling meets Pokemon Go. Imagine walking down the street, where every step takes you further into an AI-generated fantasy story. Along your journey, you make decisions that influence how the next part of the story unfolds. A decision could take you down a new path, lead you into a conversation with an NPC, or force you to battle an enemy to progress your quest. Read on to find out how we built it and what we learned.
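
A walking game needs some rule for when the story advances. The post doesn’t say how movement triggered new segments, so here is a minimal sketch under an assumed rule: request a new segment every `segment_distance_m` meters walked, measuring the distance between consecutive GPS fixes with the haversine formula. `WalkTracker`, `haversine_m` and the 100 m default are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class WalkTracker:
    """Hypothetical helper: accumulates distance walked between GPS fixes and
    reports when the next story segment should be generated."""

    segment_distance_m: float = 100.0  # assumed trigger distance, not from the post
    last_fix: Optional[Tuple[float, float]] = None
    walked_m: float = 0.0

    def update(self, lat: float, lon: float) -> bool:
        """Record a new GPS fix; return True when a new segment is due."""
        if self.last_fix is not None:
            self.walked_m += haversine_m(self.last_fix[0], self.last_fix[1], lat, lon)
        self.last_fix = (lat, lon)
        if self.walked_m >= self.segment_distance_m:
            self.walked_m = 0.0  # start counting toward the next segment
            return True
        return False
```

In a real app each fix would come from the phone’s location API, and noisy GPS readings would need filtering so that standing still doesn’t slowly “walk” the player into the next segment; this sketch ignores that.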

Why this idea?

AI storytelling was an exploratory project for our team to learn more about building with AI APIs. We wanted to see whether layering generative AI stories on top of the mechanics of a walking game would make for a better experience.

We were especially interested to learn:

How well GPT-4o and DALL-E could keep a story consistent across multiple prompts

Whether text-to-speech (TTS) had enough range to be an engaging storyteller

What drives costs when using TTS, image and text generation together

How it works:

1. It starts with an AI-generated storyline, illustrated with images from DALL-E.
2. A call to an assistant running GPT-4o generates each story segment, then a second call to the TTS API streams it as narrated audio.
3. Your choices shape the next story segment.
4. Sometimes you encounter an enemy and have to battle it to progress.
5. Sometimes you win; sometimes you die.
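
Each segment in steps 1 and 2 comes down to a few API calls. The post doesn’t include its code, so here is a minimal sketch of the media half of a turn: illustrating and narrating a segment whose text has already been generated. It assumes the `openai` Python SDK v1.x. `illustrate_and_narrate`, `SegmentMedia`, the `dall-e-3` and `tts-1` model names, the `alloy` voice and the prompt wording are our assumptions.

```python
from dataclasses import dataclass


@dataclass
class SegmentMedia:
    image_url: str
    audio_path: str


def illustrate_and_narrate(client, story_text: str,
                           audio_path: str = "segment.mp3") -> SegmentMedia:
    """Generate an illustration and streamed narration for one story segment.

    `client` is assumed to be an `openai.OpenAI()` instance (openai>=1.x SDK).
    Model, voice and prompt choices are assumptions, not taken from the post.
    """
    # Illustration: one DALL-E 3 image; the API returns a URL by default.
    image = client.images.generate(
        model="dall-e-3",
        prompt=f"Fantasy storybook illustration of this scene: {story_text}",
        n=1,
        size="1024x1024",
    )
    image_url = image.data[0].url

    # Narration: write the TTS audio to disk chunk by chunk as it streams in,
    # instead of waiting for the whole response first.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=story_text,
    ) as response:
        response.stream_to_file(audio_path)

    return SegmentMedia(image_url=image_url, audio_path=audio_path)
```

Here `client` would be `openai.OpenAI()`, which reads `OPENAI_API_KEY` from the environment. Writing the audio to a file keeps the sketch simple. A Flask app would more likely forward the streamed bytes (for example from `response.iter_bytes()`) straight to the browser.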

What we learned:

In one week, our team built a working storytelling app. But honestly, the storytelling itself left a lot to be desired. Here’s what we learned:

AI-generated stories weren’t that interesting. We’d rather listen to a podcast.

We still have a hard time getting AI to produce consistent output, or even to follow basic instructions (e.g. “give us a 200-word story segment”).
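
The post doesn’t say how, or whether, segment length was enforced. One common workaround, sketched here as an assumption rather than what our team did, is to check the reply’s word count and re-prompt when it misses. `generate` stands in for whatever model call you use (for example a wrapper around an Assistants API run). `generate_with_length_check`, the ±15% window and the three-attempt limit are hypothetical choices.

```python
from typing import Callable


def word_count(text: str) -> int:
    """Approximate word count: number of whitespace-separated tokens."""
    return len(text.split())


def generate_with_length_check(
    generate: Callable[[str], str],
    prompt: str,
    target_words: int = 200,
    tolerance: float = 0.15,
    max_attempts: int = 3,
) -> str:
    """Call `generate` until a reply lands within +/- tolerance of target_words.

    If no attempt lands inside the window, return the closest one.
    """
    low = target_words * (1 - tolerance)
    high = target_words * (1 + tolerance)
    best, best_gap = "", float("inf")
    request = prompt
    for _ in range(max_attempts):
        text = generate(request)
        n = word_count(text)
        if low <= n <= high:
            return text
        gap = abs(n - target_words)
        if gap < best_gap:
            best, best_gap = text, gap
        # Re-prompt with explicit feedback about how far off the last reply was.
        request = (
            f"{prompt}\n\nYour previous answer was {n} words. "
            f"Please answer again in about {target_words} words."
        )
    return best
```

This doesn’t make the model obey, it only bounds how far off a segment can be, at the price of extra calls (and extra cost) whenever a reply misses.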

DALL-E in particular doesn’t follow strict guidance. We frequently had to design our UI around its inconsistencies (e.g. it can’t do basic things like consistently centering our characters in a generated image).

We did get one consistency win, though, by using Threads in the Assistants API. Story segments often referenced earlier events, like a previous encounter with an NPC. That was cool to see.
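
The pattern behind that win is simple: create one thread per quest and append every player choice to it, so each run sees the earlier segments. The post doesn’t show its implementation, so this is a minimal sketch assuming the `openai` Python SDK v1.x (a release recent enough to include `runs.create_and_poll`) and an assistant that already exists. `QuestSession` is a hypothetical name.

```python
class QuestSession:
    """Hypothetical wrapper: one Assistants API thread per quest, so every
    segment is generated with the history of earlier segments and choices."""

    def __init__(self, client, assistant_id: str):
        self.client = client  # assumed to be an openai.OpenAI() instance
        self.assistant_id = assistant_id
        # Create the thread once; every turn of this quest reuses it.
        self.thread_id = client.beta.threads.create().id

    def next_segment(self, player_input: str) -> str:
        threads = self.client.beta.threads
        # Append the player's choice to the shared quest thread.
        threads.messages.create(
            thread_id=self.thread_id, role="user", content=player_input
        )
        # Run the assistant on the whole thread; polls until the run finishes.
        run = threads.runs.create_and_poll(
            thread_id=self.thread_id, assistant_id=self.assistant_id
        )
        if run.status != "completed":
            raise RuntimeError(f"run ended with status {run.status!r}")
        # Newest message first: the assistant's reply to this turn.
        latest = threads.messages.list(thread_id=self.thread_id, order="desc", limit=1)
        return latest.data[0].content[0].text.value
```

The assistant would be created once, e.g. with `client.beta.assistants.create(model="gpt-4o", instructions=...)`, and its `id` reused for every quest. Because each run reads the thread’s history, input-token usage can grow as a quest gets longer.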

Finally, we learned that running this would be an expensive business. Light testing alone, running 20+ five-minute quests, ran up a $10 bill across TTS, image and text generation.
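
Because the $10 covered at least 20 five-minute quests, dividing by 20 gives an upper bound on the average cost per quest and per minute of play:

```latex
\text{average cost per quest} \le \frac{\$10}{20\ \text{quests}} = \$0.50
\qquad
\text{average cost per minute} \le \frac{\$10}{20 \times 5\ \text{min}} = \$0.10
```

The bill isn’t broken down by API, so these figures don’t show which of TTS, image or text generation drove the cost.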

So our team learned a lot this week. The struggle for consistent AI output continues, but the consistency we got from Threads makes them a promising tool to experiment with further.

Want to see more projects like this? We have a new project to share every week.

We’re a small team of Canadian engineers, designers and product folks who previously built TunnelBear VPN. Today we run a product studio, where we build products that fall somewhere between slightly useful and stuff that makes us laugh.