Indian Names Corpus

Info: Indian names as parallel data (Indic and Roman spellings)
Author: Alok Parlikar
License: See individual corpus files for license information.


This directory contains a parallel corpus of Indian names. Each name is given in its Indic script representation along with its traditionally used Roman spelling. You can use this corpus to obtain CMUDICT pronunciations for the Roman spellings of Indian names.

File Format

Anything after a # character is treated as a comment and ignored. Blank or whitespace-only lines are also ignored.

Every row contains two columns separated by a TAB character:

    <roman_spelling><TAB><indic_spelling>
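The format rules above can be applied with a small reader. The following is a minimal Python sketch, not part of the corpus itself. It assumes the files are UTF-8 encoded (the encoding is not stated in this README) and that a # character never occurs inside a name. The function name read_name_pairs is hypothetical.

```python
import sys
from typing import Iterable, Iterator, Tuple


def read_name_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (roman_spelling, indic_spelling) pairs from corpus lines.

    Hypothetical helper; assumes '#' never appears inside a name.
    """
    for lineno, raw in enumerate(lines, start=1):
        # Everything from the first '#' onward is a comment.
        text = raw.split("#", 1)[0]
        # Skip blank or whitespace-only lines.
        if not text.strip():
            continue
        # Remove surrounding whitespace, then split on the TAB separator.
        fields = text.strip().split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 TAB-separated columns, got {len(fields)}"
            )
        roman, indic = fields[0].strip(), fields[1].strip()
        yield roman, indic


if __name__ == "__main__":
    # Usage: python read_names.py <corpus_file>  (file name is up to you)
    with open(sys.argv[1], encoding="utf-8") as f:  # UTF-8 is an assumption
        for roman, indic in read_name_pairs(f):
            print(roman, indic, sep="\t")
```

Raising an error on malformed rows, rather than silently skipping them, makes formatting mistakes in a corpus file easy to spot.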