AI storytelling: Crafting Adventures on the fly
By John Robinson | June 7, 2024
Built in June 2024
1 Week
OpenAI, OpenAI Assistants API, DALL-E, Flask, Python
This post is part of a series where we document our journey to build 1 product per week, in an effort to explore what’s possible with AI.
The idea
This week we explored using generative AI for a location-based augmented reality game: think generative AI storytelling meets Pokemon Go. Imagine walking down the street, where every step takes you further into an AI-generated fantasy story. Along your journey, you make decisions that influence how the next part of the story unfolds. A decision could take you down a new path, lead to a conversation with an NPC, or force you to battle an enemy to progress your quest. Read on to find out how we built it and what we learned.
Why this idea?
AI storytelling was an exploratory project for our team to learn more about building with AI APIs. For this project, our idea was to see if layering generative AI stories on top of walking game mechanics would produce a better experience.
We were especially interested to learn:
How good GPT-4o and DALL-E would be at consistent multi-prompt storytelling
Whether text-to-speech (TTS) could show enough range to be an engaging storyteller
What drives costs when using TTS, image, and text generation at the same time
How it works:
It starts with an AI-generated storyline, with images generated by DALL-E
An assistant call to GPT-4o generates the story, then another call to the TTS API streams the response.
Your choices impact the next story segment
Sometimes you encounter an enemy and have to battle to progress
Sometimes you win, sometimes you die
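One step of that loop could look roughly like the sketch below. It is an illustration under assumptions rather than the app's actual code: it assumes the official `openai` Python SDK (v1.x), and the function names `next_segment` and `narrate`, the `assistant_id` and `thread_id` arguments, and the `tts-1` model and `alloy` voice are placeholders chosen for the example. The client is passed in as an argument so the functions can be exercised with a stub instead of a live API key.

```python
def next_segment(client, assistant_id, thread_id, player_choice):
    """Send the player's choice to the thread and return the next story text.

    `client` is assumed to be an `openai.OpenAI()` instance (hypothetical setup).
    """
    # The thread stores earlier messages, so the model sees the story so far.
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=player_choice
    )
    # Run the assistant on the thread and wait for it to finish.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=assistant_id
    )
    if run.status != "completed":
        raise RuntimeError(f"run ended with status {run.status!r}")
    # Messages are listed newest first by default, so the first one is the reply.
    latest = client.beta.threads.messages.list(thread_id=thread_id, limit=1)
    return latest.data[0].content[0].text.value


def narrate(client, text, out_path):
    """Stream TTS audio for a story segment into a file and return its path."""
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",   # assumed model name
        voice="alloy",   # assumed voice
        input=text,
    ) as response:
        response.stream_to_file(out_path)
    return out_path
```

With a real client, a step would be `text = next_segment(client, assistant_id, thread_id, "Go left")` followed by `narrate(client, text, "segment.mp3")`. Keeping the two calls separate means the text can be shown on screen while the audio is still being generated.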
What we learned:
In one week, our team built a functioning storytelling app. But honestly, the storytelling experience left a lot to be desired. Here’s what we learned:
AI-generated stories weren’t really that interesting. We’d rather listen to a podcast.
We’re still having a hard time getting AI to give us consistent output or even follow basic instructions (e.g. give us a 200-word story segment).
DALL-E in particular doesn’t follow strict guidance. We frequently had to design our UI around the inconsistencies we found (e.g. DALL-E can’t consistently center our characters in a generated image).
We did get one consistency win, though, by using Threads in the Assistants API. Story segments often referenced things that happened earlier in the story, like a previous encounter with an NPC. That was cool to see.
Finally, we learned that running this would be an expensive business. Light testing alone, 20+ five-minute quests, ran up a bill of $10 across TTS, image, and text generation.
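To put that bill in perspective, the figures above can be turned into rough unit costs. The sketch below only does the arithmetic on the numbers quoted here ($10, 20 quests, 5 minutes each); because the real count was "20+", the results are upper bounds, and the function names are made up for illustration.

```python
def cost_per_quest(total_bill, quests):
    """Average spend per quest in dollars."""
    return total_bill / quests


def cost_per_minute(total_bill, quests, minutes_per_quest):
    """Average spend per minute of play in dollars."""
    return total_bill / (quests * minutes_per_quest)


bill = 10.00    # total bill from light testing
quests = 20     # "20+" quests, so 20 is the minimum count
minutes = 5     # each quest lasted about five minutes

print(f"at most ${cost_per_quest(bill, quests):.2f} per quest")
# at most $0.50 per quest
print(f"at most ${cost_per_minute(bill, quests, minutes):.2f} per minute of play")
# at most $0.10 per minute of play
```

These are blended averages across TTS, image, and text generation, so they say nothing about which of the three dominates the bill.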
So our team learned a lot this week. The struggle with consistent AI outputs continues, but Threads look like a promising tool for consistency, and we want to experiment with them more.
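One common way to cope with the length problem mentioned above (a segment that ignores the 200-word instruction) is to check each response and ask again when it misses the target. The sketch below is a minimal version of that idea, not something we built this week: the names `within_target` and `generate_with_retry`, the 15% tolerance, and the three-attempt limit are all assumptions, and `generate` stands in for whatever function calls the model.

```python
def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def within_target(text, target=200, tolerance=0.15):
    """True if the text length is within +/- tolerance of the target word count."""
    low = target * (1 - tolerance)
    high = target * (1 + tolerance)
    return low <= word_count(text) <= high


def generate_with_retry(generate, target=200, tolerance=0.15, max_attempts=3):
    """Call generate() until the segment is close to the target length.

    Returns (segment, attempts_used). If no attempt fits, the last segment
    is returned anyway so the game can keep going.
    """
    segment = ""
    for attempt in range(1, max_attempts + 1):
        segment = generate()
        if within_target(segment, target, tolerance):
            return segment, attempt
    return segment, max_attempts
```

Each retry is another paid model call, so the attempt limit trades consistency against cost.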
Want to see more projects like this? We have a new project to share every week.