Amble
A real-time language tuition agent that teaches through conversation, so you can learn a language by living it.
Overview
- Run a real-time voice streaming pipeline across speech-to-text, an LLM, and text-to-speech, stitched together to feel like a conversation with a patient and endlessly adaptive tutor.
- Analyse lessons to evaluate fluency and feed this forward into a personalised, evolving curriculum.
- Compound a user’s interests, ambitions, proficiency, previous lessons and conversation history into a context pipeline that is assembled alongside every conversation.
Stack
- WebRTC
- Pipecat
- STT
- LLM
- TTS
- Mem0
- Modal
- Neon
In 2025, we worked with Page Nineteen to create Amble, a new approach to language learning. We helped take Amble from an initial idea to a running beta in around three months, then spent roughly the same time again refining the product whilst helping to build out a world-class engineering team.
Building the Amble beta was pivotal in shaping our identity as a studio, because it showed us just how big the gap is between the capabilities of frontier AI models and tools that actually improve people’s lives. Engineering a real-time conversational pipeline is a challenge in itself, but shaping that into a natural, intuitive experience proved to be a delicate balance of product research, design and engineering.

Language learning, designed around you
From the beginning, the goal for Amble wasn’t just to create a chatbot but to create a space where users could immerse themselves in culturally rich content they care about. Personalisation could occur through daily content (to read or listen to), conversational lessons with a real-time voice AI, and retention (spaced repetition learning tailored to a user’s history and progress).
Building from the ground up, we started by creating a voice agent with Pipecat that could stream audio in, pass it through STT (Speech-to-Text), an LLM and TTS (Text-to-Speech), and stream audio out again. Whilst end-to-end Speech-to-Speech models were available and growing in popularity, we chose to build our own modular pipeline to benefit from the increased control, observability and flexibility that comes with it.
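As a rough illustration of what a modular pipeline means in practice, the sketch below chains three swappable stages using plain Python async generators. It is not Pipecat’s API and not Amble’s code: `fake_stt`, `fake_llm`, `fake_tts`, `microphone` and `build_pipeline` are hypothetical stand-ins, and real stages would call actual speech and language services.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable

# A stage consumes one async stream and produces another.
Stage = Callable[[AsyncIterator], AsyncIterator]


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Hypothetical speech-to-text: one transcript fragment per audio chunk."""
    async for chunk in audio:
        yield chunk.decode("utf-8")


async def fake_llm(text: AsyncIterator[str]) -> AsyncIterator[str]:
    """Hypothetical LLM: replies to each fragment as soon as it arrives."""
    async for fragment in text:
        yield f"tutor:{fragment}"


async def fake_tts(text: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Hypothetical text-to-speech: turns each reply into audio bytes."""
    async for fragment in text:
        yield fragment.encode("utf-8")


def build_pipeline(source: AsyncIterator, stages: Iterable[Stage]) -> AsyncIterator:
    """Wrap the source in each stage in order; any stage can be swapped out."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


async def microphone(chunks: list[bytes]) -> AsyncIterator[bytes]:
    """Hypothetical audio source that yields chunks one at a time."""
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield chunk


async def run_demo() -> list[bytes]:
    source = microphone([b"hola", b"gracias"])
    pipeline = build_pipeline(source, [fake_stt, fake_llm, fake_tts])
    return [audio async for audio in pipeline]


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

Because each stage only depends on the stream it receives, a single stage can be replaced, logged or measured in isolation, which is the kind of control and observability a single speech-to-speech model makes harder.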
We then built out the infrastructure and frontend that would turn a voice agent into a language learning experience. On the backend this meant scaling compute instances, pre-warming the agent, and adding services for injecting memory, curriculum and lesson context without reintroducing the latency we had worked so hard to remove in development. For the frontend it meant building a minimal and intuitive UI that encouraged natural conversation, using a WebRTC connection to stream audio in real time.
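One common way to keep context injection from adding latency is to fetch every context source concurrently and cap each with a time budget, so a slow service degrades the prompt rather than delaying the conversation. The sketch below assumes that approach; the fetcher functions, their return values and the `timeout` default are hypothetical and do not describe Amble’s actual services.

```python
import asyncio


async def fetch_memory(user_id: str) -> str:
    """Hypothetical memory lookup; the sleep stands in for a network call."""
    await asyncio.sleep(0.02)
    return "enjoys football"


async def fetch_curriculum(user_id: str) -> str:
    """Hypothetical curriculum lookup."""
    await asyncio.sleep(0.02)
    return "past tense verbs"


async def fetch_lesson(user_id: str) -> str:
    """Hypothetical lesson plan lookup."""
    await asyncio.sleep(0.02)
    return "ordering food"


async def assemble_context(user_id: str, timeout: float = 0.5) -> dict[str, str]:
    """Fetch all context sources at once; a source that fails or exceeds
    the time budget contributes an empty string instead of blocking."""
    names = ("memory", "curriculum", "lesson")
    fetchers = (fetch_memory, fetch_curriculum, fetch_lesson)
    results = await asyncio.gather(
        *(asyncio.wait_for(fetch(user_id), timeout) for fetch in fetchers),
        return_exceptions=True,
    )
    return {
        name: "" if isinstance(result, Exception) else result
        for name, result in zip(names, results)
    }


if __name__ == "__main__":
    print(asyncio.run(assemble_context("user-1")))
```

With this shape the total wait is bounded by the slowest source or the timeout, whichever is shorter, rather than by the sum of all lookups.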

As the first few users hit the beta, we realised that language learning presented a unique set of challenges for real-time voice applications. A pause mid-sentence might be interpreted as the end of a turn, a mispronunciation might be transcribed in a way that made the model look like it was hallucinating, or someone might decide to start their lesson during a noisy commute. Our agent had to account for all of this, and with audio being a continuous stream, no component in the pipeline could ever wait for another to fully complete.
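To make the point about components never waiting for one another concrete, here is a minimal sketch of one widely used technique: grouping streamed LLM tokens into sentences so that speech synthesis can begin on the first sentence while later ones are still being generated. The token source and the punctuation rule are assumptions for illustration, not a description of Amble’s implementation.

```python
import asyncio
from typing import AsyncIterator

SENTENCE_ENDINGS = (".", "!", "?")


async def sentences(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Emit a sentence as soon as its closing punctuation arrives,
    instead of waiting for the whole response."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush any trailing text that never received closing punctuation.
        yield buffer.strip()


async def fake_llm_tokens() -> AsyncIterator[str]:
    """Hypothetical token stream standing in for a streaming LLM response."""
    for token in ["Muy ", "bien", "! ", "¿Qué ", "comiste ", "hoy", "?", " Cuéntame"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def run_demo() -> list[str]:
    return [sentence async for sentence in sentences(fake_llm_tokens())]


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

Running the demo yields `Muy bien!`, `¿Qué comiste hoy?` and `Cuéntame` as three separate chunks, each available to a downstream stage the moment it is complete.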
Asynchronous engineering got us a long way with reducing latency and keeping things feeling relevant, but ultimately it was re-examining fundamental assumptions that took Amble from beta to polished product. In early versions we automated turn taking, using a VAD (voice activity detection) model to detect when a user had finished speaking. But this would often confuse a learner pausing to think with one who had just finished their sentence, and rather than tuning the VAD thresholds, the answer was to scrap automated turn taking entirely and move to push-to-talk. Likewise, we spent a lot of engineering resource early on finessing the user’s transcript in real time, but the decision to hide it in favour of more accurate processing and analysis would yield far better results for learners in the long run.
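The difference between the two approaches can be shown with a small sketch of push-to-talk turn handling, where the turn boundary comes from the learner’s button press and release rather than from detected silence. The `PushToTalkTurn` class and its method names are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class PushToTalkTurn:
    """Collects audio frames between an explicit press and release.

    Silence inside the turn is kept as ordinary audio, so a learner
    pausing to think never ends the turn early.
    """

    frames: list[bytes] = field(default_factory=list)
    recording: bool = False

    def press(self) -> None:
        """Start a new turn, discarding anything left over."""
        self.frames.clear()
        self.recording = True

    def on_audio(self, frame: bytes) -> None:
        """Keep incoming audio only while the button is held."""
        if self.recording:
            self.frames.append(frame)

    def release(self) -> bytes:
        """End the turn and return the whole utterance for transcription."""
        self.recording = False
        utterance = b"".join(self.frames)
        self.frames.clear()
        return utterance


if __name__ == "__main__":
    turn = PushToTalkTurn()
    turn.press()
    for frame in (b"je", b"...", b"pense"):  # "..." stands in for a long pause
        turn.on_audio(frame)
    print(turn.release())
```

The trade-off is one extra interaction for the user in exchange for removing an entire class of misjudged turn endings.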

Amble was sunset in June 2026 after helping over 10,000 users practise speaking in seven languages. If you’re interested in learning more about some of the thinking that went into Amble, Will’s essay is well worth the deep dive.
Ultimately, we reached the same conclusion: although the conversation around AI is dominated by the progress of frontier models, the real work to be done is in developing our understanding of how people use them. Creating beautiful, useful products requires a deep focus on how people think, what they need and where they struggle, which will often counter our assumptions or go against the latest available technology.