About Kolokwa

Kolokwa is a crowdsourced voice collection project dedicated to building the first open dataset for Liberian English. Our goal is to enable AI systems -- speech recognition, voice assistants, translation tools -- to understand the way Liberians actually speak.

The Problem

Current AI voice technology is trained primarily on American and British English. When Liberians speak naturally -- using Kolokwa, our everyday English with its unique vocabulary, grammar, and pronunciation -- these systems fail. Try telling Siri "Da my own pekin" or asking Alexa "Wha kinda ting da?" -- they have no idea what you're saying. TV remotes with voice control? Useless for Kolokwa speakers. Transcription tools garble everything. These systems weren't built for the way we talk.

Our Solution

We're building a high-quality, open-source voice dataset by asking Liberians around the world to record themselves speaking common Kolokwa phrases. Each recording is paired with its English translation, creating training data that AI systems can learn from.

More Than Just an Accent

Teaching AI to understand Kolokwa is a two-part challenge. Half of it is the accent -- AI needs to hear how we actually pronounce words so it can recognize them. But the other half is just as important: understanding what we actually mean.

Some Kolokwa phrases are straightforward -- they mean exactly what they say, just spoken with our accent and vocabulary. "I want water" is "I want water." "Open the door" is "Open the door." The AI just needs to learn how those words sound when a Liberian says them.

But other phrases carry a different meaning beneath the surface. "I coming just now" doesn't mean you're arriving right this second -- it means you'll be there soon. "The man can talk-oh" isn't a compliment about eloquence -- it means the person is deceitful. "Leh go small small" isn't about distance -- it means take things slowly. An AI that only hears the words but misses the meaning is still failing us.

That's why our dataset includes both types of phrases, and why contributors who submit their own phrases should feel free to add both kinds -- everyday sentences that are straight up, and expressions where the real meaning lives between the lines. Both are essential for building AI that truly understands Kolokwa.
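
For developers curious how a phrase entry covering both kinds could be represented, here is a minimal sketch. The field names, the "literal" and "idiomatic" labels, and the file paths are illustrative assumptions, not the dataset's published schema.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical labels for the two kinds of phrases described above.
KINDS = ("literal", "idiomatic")


@dataclass(frozen=True)
class PhraseRecord:
    kolokwa: str     # the phrase as a contributor would say it
    english: str     # its English translation or intended meaning
    kind: str        # one of KINDS
    audio_path: str  # hypothetical location of the recording

    def __post_init__(self) -> None:
        # Reject labels outside the two kinds this sketch knows about.
        if self.kind not in KINDS:
            raise ValueError(f"kind must be one of {KINDS}, got {self.kind!r}")


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, one phrase per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


# Phrases taken from the examples above; the paths are made up.
EXAMPLES = [
    PhraseRecord("I want water", "I want water", "literal", "clips/0001.wav"),
    PhraseRecord("I coming just now", "I will be there soon", "idiomatic", "clips/0002.wav"),
    PhraseRecord("Leh go small small", "Let's take things slowly", "idiomatic", "clips/0003.wav"),
]

if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Storing the kind alongside the translation would let a model trainer treat accent-only phrases and meaning-shifted phrases differently, if that turns out to be useful.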

What Your Recordings Unlock

Every recording gets us closer to real milestones. At 1,000 recordings we can start testing with OpenAI's Whisper speech recognition model. At 5,000 we can train a basic Kolokwa speech model. At 10,000 we can fine-tune a production-quality model that actually understands Liberian English. Your few minutes of recording time compound into something that serves millions.
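
The milestone arithmetic can be sketched in a few lines. The thresholds come from the paragraph above; the function name and return shape are assumptions made for this illustration.

```python
# Thresholds and descriptions taken from the milestones listed above.
MILESTONES = [
    (1_000, "start testing with Whisper speech recognition"),
    (5_000, "train a basic Kolokwa speech model"),
    (10_000, "fine-tune a production-quality model"),
]


def milestone_progress(recording_count: int):
    """Return (reached, next_milestone, recordings_remaining).

    `reached` lists the descriptions of milestones already met.
    `next_milestone` is the next (threshold, description) pair, or None
    once every milestone has been met.
    """
    if recording_count < 0:
        raise ValueError("recording_count cannot be negative")
    reached = [label for threshold, label in MILESTONES if recording_count >= threshold]
    upcoming = [m for m in MILESTONES if recording_count < m[0]]
    next_milestone = upcoming[0] if upcoming else None
    remaining = next_milestone[0] - recording_count if next_milestone else 0
    return reached, next_milestone, remaining


if __name__ == "__main__":
    reached, nxt, remaining = milestone_progress(1_200)
    print(f"Reached: {reached}")
    print(f"Next: {nxt}, recordings still needed: {remaining}")
```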

How It Works

  1. You see a Kolokwa phrase with its English translation.
  2. You record yourself saying the phrase naturally.
  3. Your recording is reviewed and added to the dataset.
  4. Researchers and developers use the dataset to train AI models.

Open Data

The Kolokwa dataset will be released as an open resource available to researchers, developers, and organizations working on language technology. Recordings may also be licensed commercially to fund the project. We believe that language data should be a public good, especially for underrepresented languages, and commercial partnerships help sustain that mission.

Privacy First

We take your privacy seriously. Recordings are associated with anonymous session identifiers, not your personal identity. You can request deletion of your data at any time. Read our privacy policy for full details.
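
As a rough illustration of the idea, recordings can be keyed by a random session identifier that carries no personal information, which also makes a deletion request simple to honor. This is a sketch under stated assumptions: the class and function names are hypothetical, the store is in memory only, and it does not describe how the project actually stores data.

```python
import uuid


def new_session_id() -> str:
    """Create a random identifier that is not derived from any personal data."""
    return uuid.uuid4().hex


class RecordingStore:
    """Hypothetical in-memory store that maps session IDs to recording paths."""

    def __init__(self) -> None:
        self._by_session: dict[str, list[str]] = {}

    def add(self, session_id: str, audio_path: str) -> None:
        # Group each recording under the anonymous session that produced it.
        self._by_session.setdefault(session_id, []).append(audio_path)

    def delete_session(self, session_id: str) -> int:
        """Remove every recording for a session; return how many were removed."""
        return len(self._by_session.pop(session_id, []))


if __name__ == "__main__":
    store = RecordingStore()
    sid = new_session_id()
    store.add(sid, "clips/0001.wav")
    store.add(sid, "clips/0002.wav")
    print(store.delete_session(sid), "recordings deleted")
```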

Get Involved

Whether you're in Monrovia or Minnesota, if you speak Kolokwa, we need your voice. Every recording helps, and it only takes a few minutes to make a difference.