The AfricanVoices corpus is a speech corpus of aligned sentences and audio. So far, we have uploaded data for 12 languages to this website. We obtain the datasets in three ways:
- Create and record
- Align audio books from sources like Open.Bible, the ALFFA project, and LLSTI
- Obtain single speaker audio-text pairs from sources like Mozilla CommonVoice
Datasets
Data_id | Lang code | Language | Source | Speaker | No. of sentences | Hrs | MCD* | Quality | Download |
---|---|---|---|---|---|---|---|---|---|
luo_opb | luo | Dholuo | Open.Bible | Male | 11263 | 12.43 | 5.20 | Good | Available |
luo_afv | luo | Dholuo | AfricanVoices | Male | 1516 | 1.79 | 6.35 | Okay | Available |
lug_cmv | lug | Luganda | CommonVoice | Male | 2942 | 4.52 | 6.59 | Okay | Available |
en-ke_afv | en-ke | English (Kenyan) | AfricanVoices | Female | 596 | 0.56 | 4.99 | Good | Available |
sxb_afv | sxb | Suba | AfricanVoices | Male | 1178 | 1.70 | 4.80 | Good | Available |
sxb_bbi | sxb | Suba | Bible.is | Male | 8917 | 18.76 | 5.40 | Good | **Unavailable |
yor_opb | yor | Yoruba | Open.Bible | Male | 9275 | 18.04 | 4.61 | Good | Available |
kik_opb | kik | Kikuyu | Open.Bible | Male | 10877 | 18.04 | 5.19 | Good | Available |
wol_alf | wol | Wolof | ALFFA | Male | 1000 | 1.20 | 4.42 | Good | Available |
swa_lst | swa | Kiswahili | LLSTI | Male | 426 | 0.53 | 5.06 | Good | Available |
ibb_lst | ibb | Ibibio | LLSTI | Female | 125 | 0.32 | 4.91 | Good | Available |
hau_cmv_m | hau | Hausa | CommonVoice | Male | 518 | 0.62 | 6.95 | Okay | Available |
hau_cmv_f | hau | Hausa | CommonVoice | Female | 1938 | 2.30 | 6.29 | Okay | Available |
fon_alf | fon | Fon | ALFFA | Male | 542 | 0.33 | 6.10 | Okay | Available |
lin_opb | lin | Lingala | Open.Bible | Male | 12957 | 27.52 | 5.20 | Good | Available |
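
As a minimal sketch of how the table above could be used programmatically, the Python snippet below parses a few rows copied verbatim from it and sums the recorded hours per language, skipping entries that are not marked as available. The function names `parse_table` and `hours_by_language` are hypothetical helpers introduced for illustration only; they are not part of any AfricanVoices tooling.

```python
# A few rows copied verbatim from the table above (pipe-delimited text).
TABLE = """\
Data_id | Lang code | Language | Source | Speaker | No. of sentences | Hrs | MCD* | Quality | Download |
---|---|---|---|---|---|---|---|---|---|
luo_opb | luo | Dholuo | Open.Bible | Male | 11263 | 12.43 | 5.20 | Good | Available |
luo_afv | luo | Dholuo | AfricanVoices | Male | 1516 | 1.79 | 6.35 | Okay | Available |
sxb_bbi | sxb | Suba | Bible.is | Male | 8917 | 18.76 | 5.40 | Good | **Unavailable |
hau_cmv_f | hau | Hausa | CommonVoice | Female | 1938 | 2.30 | 6.29 | Okay | Available |
"""


def split_row(line):
    """Split one pipe-delimited line into stripped cells, dropping the trailing pipe."""
    return [cell.strip() for cell in line.strip().rstrip("|").split("|")]


def parse_table(text):
    """Return the table rows as a list of dicts keyed by the header names."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = split_row(lines[0])
    # lines[1] is the ---|--- separator row, so data starts at index 2.
    return [dict(zip(header, split_row(ln))) for ln in lines[2:]]


def hours_by_language(rows, available_only=True):
    """Sum the 'Hrs' column per language, optionally keeping only available datasets."""
    totals = {}
    for row in rows:
        if available_only and row["Download"] != "Available":
            continue
        language = row["Language"]
        totals[language] = round(totals.get(language, 0.0) + float(row["Hrs"]), 2)
    return totals


rows = parse_table(TABLE)
print(hours_by_language(rows))
```

Passing `available_only=False` would also count datasets such as `sxb_bbi` that are listed but not currently downloadable.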