GCIDE to SQL
Generate HTML-useful entries from the Collaborative International Dictionary of English (GCIDE) project — including Webster's Revised Unabridged Dictionary (1913 + 1828).
This bash script takes the raw ASCII GCIDE files, updates the non-standard encoded characters to UTF-8, modifies some tags, and then outputs the entries into SQL (PostgreSQL).
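The character-conversion step can be pictured with a minimal sketch like the one below. It assumes the raw files mark special characters with entities of the form `<eacute/`. The function name `gcide_to_utf8` and the four mappings are illustrative only; they are not the project's actual lookup file, which covers many more entities.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: replace a few GCIDE-style character entities
# with their UTF-8 equivalents. This small mapping is an assumption
# for illustration, not the project's real lookup table.
gcide_to_utf8() {
  sed -e 's|<eacute/|é|g' \
      -e 's|<egrave/|è|g' \
      -e 's|<ccedil/|ç|g' \
      -e 's|<ae/|æ|g'
}

# Reads from stdin and writes to stdout, so it can sit in a pipeline.
printf '%s\n' 'caf<eacute/' | gcide_to_utf8
```

Using `|` as the sed delimiter avoids having to escape the `/` that ends each entity.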
First clone the GCIDE repo into the project folder:
git clone git://git.savannah.gnu.org/gcide.git
Then run the build file:
Once done, you can import the SQL into your Postgres database:
psql -h localhost -U postgres avuncular < CIDE.A-Z.sql
If you get an error about file encoding, you can run
iconv -t utf-8 -c CIDE.A-Z.sql > CIDE.A-Z.utf8.sql to drop any characters that cannot be converted to UTF-8, then import CIDE.A-Z.utf8.sql instead.
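Before stripping characters, it may help to check whether the file really contains invalid UTF-8. The sketch below relies on iconv exiting with a non-zero status when it meets a byte sequence that is not valid in the declared input encoding. The helper name `is_valid_utf8` is an assumption for illustration and is not part of this project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# iconv converts UTF-8 to UTF-8 and fails on the first invalid sequence.
is_valid_utf8() {
  iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

sql_file="CIDE.A-Z.sql"
# Only report when the build output is actually present.
if [ -f "$sql_file" ]; then
  if is_valid_utf8 "$sql_file"; then
    echo "$sql_file is valid UTF-8"
  else
    echo "$sql_file contains invalid UTF-8 sequences"
  fi
fi
```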
There is also a script that runs all the definitions through espeak to generate a pronunciation for each. It generates a second SQL file, which can be run on the same database populated by the build script. It must be run after CIDE.A-Z.sql has been imported into the database, since
espeak.sql runs UPDATE commands keyed on the definition ids.
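As a rough sketch of how such UPDATE statements could be produced, the functions below ask espeak for an IPA transcription and wrap it in SQL. The table name `entries`, the columns `id` and `pronunciation`, and both function names are hypothetical; the project's real schema and script may differ. The sketch assumes an espeak build that supports `--ipa`.

```bash
#!/usr/bin/env bash
# Escape a value for use inside a single-quoted SQL string literal
# by doubling any single quotes it contains.
sql_quote() {
  printf '%s' "$1" | sed "s/'/''/g"
}

# Print one UPDATE statement for a definition id and its headword.
# -q suppresses audio output; --ipa prints the phonemes as IPA.
# Table and column names are assumptions for illustration.
pronunciation_update() {
  local id="$1" word="$2" ipa
  ipa=$(espeak -q --ipa "$word" 2>/dev/null | tr -d '\n' | sed 's/^ *//')
  printf "UPDATE entries SET pronunciation = '%s' WHERE id = %d;\n" \
    "$(sql_quote "$ipa")" "$id"
}

# Example use (requires espeak to be installed):
#   pronunciation_update 42 "avuncular" >> espeak.sql
```

Because each statement targets a row by id, the rows must already exist, which is why the import order matters.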
As with the GCIDE project, this project is licensed under the GNU General Public License as published by the Free Software Foundation; you can redistribute it and/or modify it under the terms of the GPL.
The lookup file is partially based on a table from WebsterParser.