Converting a Toolbox lexical database to LKB format
Summary
Note: Update in progress as of October / November 2012. Hannes Hirzel
The LKB system (Linguistic Knowledge Builder) is a grammar and lexicon development environment for unification-based linguistic formalisms, with a focus on HPSG. This page describes and provides the scripts that convert a lexicon database made with Toolbox to the lexicon format required by LKB. The scripts were developed by Hannes Hirzel.
A presentation in Trondheim, 2005 (File:Toolbox-LKB-Link-slides - version 4.pdf) shows how this was applied to a lexicon file of the Ga language edited by Mary E. Kropp Dakubu.
The scripting language used is called 'Consistent changes' and is built into the Toolbox program, so you can run the conversion from within Toolbox. To do so, make the lexicon file the active window and then choose:
- 'File' menu,
- 'Export'
- 'TBox-LKB-Step1'.
This runs all processing steps. The result is a lexicon file in LKB format.
A working portable setup is available from the author on request. However, the information and files needed to recreate the setup are included on this page. You might need to adapt them to your particular lexicon file.
Implementation
Setup
The files that belong to a Toolbox project may all be kept in the same folder. The following screenshot shows how Toolbox has to be set up to produce an LKB TDL lexicon file. The six 'consistent changes' script files are marked in green. They include a conversion from an 8-bit font to Unicode for the particular setup used for the Ga language as of 2005. As of 2012 most lexicons use a Unicode font, so these steps might be left out or adapted. The LKB lexicon, marked in red, is the result of the sixth step.
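The 8-bit to Unicode conversion mentioned above can be illustrated outside Toolbox. The following Python sketch is not the author's 'consistent changes' code, and the byte values in the mapping table are hypothetical placeholders: the real values depend on the legacy font that was used for the Ga lexicon.

```python
# Hypothetical mapping from legacy 8-bit font positions to Unicode characters.
# The byte values are placeholders; check them against the actual legacy font.
LEGACY_TO_UNICODE = {
    0x8E: "\u025B",  # LATIN SMALL LETTER OPEN E
    0x8F: "\u0254",  # LATIN SMALL LETTER OPEN O
    0x9B: "\u014B",  # LATIN SMALL LETTER ENG
}


def legacy_to_unicode(raw: bytes) -> str:
    """Decode 8-bit text and replace legacy font positions with Unicode."""
    # latin-1 maps every byte to the code point with the same number,
    # so the table keys can be used directly as translation keys.
    text = raw.decode("latin-1")
    return text.translate(LEGACY_TO_UNICODE)
```

Characters that are not in the table pass through unchanged, which matches the case where only a few special letters were stored at non-standard positions.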
Each step of the 'consistent changes' process chain must be defined. The screenshot shows the last dialog. It contains input fields for:
- input file
- 'consistent changes' script
- output file
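The idea of a process chain, where each step reads an input file, applies one script, and writes an output file that feeds the next step, can be sketched in Python as follows. This is only an illustration of the chaining principle: the function name `run_chain` and the file names are assumptions, and in Toolbox the transformations are 'consistent changes' scripts rather than Python functions.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# One step: (transformation to apply, name of the output file it writes).
Step = Tuple[Callable[[str], str], str]


def run_chain(input_path: Path, steps: List[Step]) -> Path:
    """Run the steps in order; each output file is the next step's input."""
    current = input_path
    for transform, out_name in steps:
        text = current.read_text(encoding="utf-8")
        out_path = current.with_name(out_name)  # same folder as the input
        out_path.write_text(transform(text), encoding="utf-8")
        current = out_path
    return current  # path of the file written by the last step
```

Keeping the intermediate files, as the Toolbox setup does, makes it easier to see which step introduced a problem when the final lexicon looks wrong.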
Download
The folder File:Toolbox Project Ga.zip contains the standard files produced by the utility program 'Toolbox New Project Package 1.5.8' from http://www.sil.org/computing/toolbox/downloads.htm.
The file 'Dictionary.txt' has been replaced by the Ga lexicon created by Mary Ester Dakubu (MED), University of Ghana.
There is a folder 'Tbox2LKB-conv-scripts' which has a copy of the cct files of the folder 2005-05-31Ga-for-LKB-Uni-Trondheim-11a.
These cct files are used to convert the Ga lexicon which is in SFM (Toolbox format) to the format LKB (Linguistic Knowledge Builder) needs.
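To give an idea of what such a conversion involves, here is a minimal Python sketch that reads SFM records and writes TDL-style entries. It is not a port of the cct files. The markers `\lx` (lexeme) and `\ps` (part of speech) follow common Toolbox usage, but the type names (`noun-lex`, `verb-lex`, `unknown-lex`), the `STEM` feature, and the identifier scheme are assumptions that would have to match the actual LKB grammar.

```python
import re
from typing import Dict, Iterator

# Hypothetical mapping from Toolbox part-of-speech labels to LKB lexical types.
POS_TO_TYPE = {"n": "noun-lex", "v": "verb-lex"}


def parse_sfm(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per record; a record starts at each \\lx marker."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        match = re.match(r"\\(\w+)\s*(.*)", line)
        if not match:
            continue  # skip blank lines and continuation lines in this sketch
        marker, value = match.group(1), match.group(2).strip()
        if marker == "lx":
            if "lx" in record:
                yield record
            record = {}  # also discards header fields seen before the first \lx
        record.setdefault(marker, value)  # keep the first value of each marker
    if "lx" in record:
        yield record


def to_tdl(record: Dict[str, str]) -> str:
    """Format one record as a TDL-style lexical entry (assumed layout)."""
    lemma = record["lx"]
    pos = record.get("ps", "")
    lex_type = POS_TO_TYPE.get(pos, "unknown-lex")
    ident = re.sub(r"\W+", "_", lemma) + "_" + (pos or "x")
    return f'{ident} := {lex_type} &\n  [ STEM < "{lemma}" > ].'
```

A whole lexicon could then be produced with `"\n\n".join(to_tdl(r) for r in parse_sfm(text))`, where `text` holds the contents of the dictionary file.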
To run the conversion, do the following:
- Make the dictionary window the 'active window' by clicking on the title bar
- Choose menu 'File' / 'Export'
- Select 'TBox-LKB Step1'
- Click 'OK'.
- A new file 'LKBlexicon.tdl' is created.
This folder has been posted to this web site [www.typecraft.org] by permission.
License
The presentation and this wiki page are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. The script code is under the MIT license.