
Converting a Toolbox lexical database to LKB format

Summary

The LKB (Linguistic Knowledge Builder) is a grammar and lexicon development environment for unification-based linguistic formalisms, used mainly for HPSG (Head-driven Phrase Structure Grammar). This page describes, and provides the scripts for, converting a lexicon database created with Toolbox into the lexicon format required by LKB. The scripts were developed by Hannes Hirzel.

A presentation given in Trondheim in 2005 (File:Toolbox-LKB-Link-slides - version 4.pdf) shows how the conversion can be applied to a lexicon file of the Ga language. The dictionary file was created by Mary E. Kropp Dakubu.

The conversion is written in Consistent Changes (CC), a scripting language built into the Toolbox program.

For a working portable setup, see the Download section on this page. The setup may need some adaptation for other languages; all the files involved are plain text files, which you may change (see the License section).


Status

Updated 9 November 2012 by Hannes Hirzel. Known issue: in entries with multiple senses, only the last sense is converted. This needs to be fixed. Contact: dictionaries_gillbt@gillbt.org

Download

The archive File:Toolbox Project Ga.zip contains the standard files produced by the utility program 'Toolbox New Project Package 1.5.8', available from http://www.sil.org/computing/toolbox/downloads.htm.


The file 'Dictionary.txt' has been replaced by the Ga lexicon created by Mary Esther Kropp Dakubu (MED), University of Ghana.

This folder has been posted to this website (www.typecraft.org) with permission.

NOTE (5 December 2012): The archive does not yet contain the correct lexicon file. The conversion requires a lexicon file with only one sense per entry.

How to start Toolbox

The folder 'Settings' contains the Toolbox executable (.exe) file. Double-click it to start Toolbox.


How to create the LKB tdl file

You may run the conversion program from within Toolbox.

To run the conversion, follow these steps:

  1. Make the dictionary window the 'active window' by clicking on the title bar
  2. Choose menu 'File' / 'Export'
  3. Select 'TBox-LKB Step1'
  4. Click 'OK'.
  5. A new file 'LKBlexicon.tdl' is created.

Examples

The examples below still need to be updated to reflect the one-sense-per-entry restriction on the lexicon input file.

lɔ / sneak


is converted to

   lO_2 := verb-lexeme &
   [STEM <"lO">,
   PHON <"lO">,
   ENGL-GLOSS <"sneak", "">,
   SYNSEM.LKEYS.KEYREL.PRED "_lO_v_rel"].
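Every generated entry follows the same fixed pattern. The minimal Python sketch below reproduces that pattern from an already ASCII-converted stem, a part-of-speech value and an English gloss. It only illustrates the output shape and is not the CC scripts themselves: the `POS_TO_LKB` table is a hypothetical mapping inferred from the examples on this page, and the sketch assumes that the `_2` suffix in `lO_2` comes from a homonym number (`\hm`), as in the fee record.

```python
from typing import Optional

# Hypothetical mapping from Toolbox \ps values to (LKB lexical type,
# predicate tag), inferred from the examples on this page only.
POS_TO_LKB = {
    "n": ("noun-lexeme", "n"),
    "v": ("verb-lexeme", "v"),
    "verb": ("verb-lexeme", "v"),
}


def tdl_entry(stem: str, pos: str, gloss: str, homonym: Optional[str] = None) -> str:
    """Build one TDL lexical entry in the shape shown above.

    `stem` must already be plain ASCII (e.g. "lO" for lɔ).
    Raises KeyError for a part of speech missing from POS_TO_LKB.
    """
    lex_type, pred_tag = POS_TO_LKB[pos]
    # Assumption: a homonym number is appended to keep identifiers unique.
    identifier = f"{stem}_{homonym}" if homonym else stem
    return (
        f"{identifier} := {lex_type} &\n"
        f'[STEM <"{stem}">,\n'
        f'PHON <"{stem}">,\n'
        f'ENGL-GLOSS <"{gloss}", "">,\n'
        f'SYNSEM.LKEYS.KEYREL.PRED "_{stem}_{pred_tag}_rel"].'
    )
```

For example, `tdl_entry("lOG", "n", "fibre,_raffia")` yields the lOG entry shown in the next example (without the page's indentation).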

lɔŋ / raffia_palm

   \lx lɔŋ
   \ph lɔ̀ŋ̀
   \ps n
   \sn 1
   \ge raffia_palm
   \xv lɔŋ tso
   \sn 2 
   \ge fibre,_raffia
   \de the fibre of the raffia palm, used for sewing sacks and weaving mats. 
   \xv lɔŋ kɛ abui 
   \xe thread and needle; close association (fig.).
   \et PGD *lɔ-
   \dt 12/Apr/2007


is converted to

   lOG := noun-lexeme &
   [STEM <"lOG">,
   PHON <"lOG">,
   ENGL-GLOSS <"fibre,_raffia", "">,
   SYNSEM.LKEYS.KEYREL.PRED "_lOG_n_rel"].

Only the second (last) sense, fibre,_raffia, is converted; the first sense, raffia_palm, is lost. This needs to be fixed.
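Until the scripts handle multiple senses, the lexicon can be pre-split so that each record has only one sense, as the note in the Download section requires. The following is a minimal Python sketch, assuming records are lists of (marker, value) pairs. Fields before the first \sn are copied into every new record, and \et and \dt are treated as entry-level fields even though they appear after the senses (a hypothetical choice based on the lɔŋ record above). The resulting records share the same \lx, so a later step must still make the TDL identifiers unique, for example by using the sense number.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (marker without the backslash, value)

# Hypothetical choice: markers that belong to the whole entry even when
# they appear after the senses (as the etymology and date do above).
ENTRY_LEVEL_AFTER_SENSES = {"et", "dt"}


def split_senses(record: List[Field]) -> List[List[Field]]:
    """Split one record into one record per sense-number ("sn") field.

    Fields before the first "sn" and trailing entry-level fields are
    copied into every output record. A record without "sn" is returned
    unchanged (as a single-element list).
    """
    header: List[Field] = []
    trailer: List[Field] = []
    senses: List[List[Field]] = []
    for marker, value in record:
        if marker == "sn":
            senses.append([(marker, value)])  # start a new sense group
        elif not senses:
            header.append((marker, value))  # still before the first sense
        elif marker in ENTRY_LEVEL_AFTER_SENSES:
            trailer.append((marker, value))
        else:
            senses[-1].append((marker, value))  # belongs to current sense
    if not senses:
        return [list(record)]
    return [header + sense + trailer for sense in senses]
```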

fee / make

   \lx fee
   \hm 2 
   \ph fèê, fèé, !fé 
   \ps verb
   \sn 1 
   \ge make
   \de make, do, perform
   \sl1 v
   \sl2 tr
   \sl4 suAg_obTh
   \sl6 CREATION
   \xv E-fee flɔɔ, samala
   \xg 3S.AOR-make stew
   \xe she made stew, soap.

is converted to

  .....

Implementation of the conversion

The folder 'Tbox2LKB-conv-scripts' contains a copy of the .cct files from the folder 2005-05-31Ga-for-LKB-Uni-Trondheim-11a mentioned in the 2005 presentation.

These .cct files convert the Ga lexicon from SFM (Standard Format Markers, the Toolbox format) into the format required by LKB (Linguistic Knowledge Builder).
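For inspecting or preprocessing the SFM data outside Toolbox, the record structure can be read with a few lines of Python. This sketch is not part of the CC scripts. It assumes that every record starts with \lx, that lines before the first \lx (such as the \_sh file header) can be skipped, and that a non-empty line not starting with a backslash continues the previous field.

```python
import re
from typing import List, Tuple

Field = Tuple[str, str]  # (marker without the backslash, value)

# A marker line: backslash, marker name, optional space, then the value.
MARKER_LINE = re.compile(r"^\\(\S+)\s?(.*)$")


def parse_sfm(text: str, record_marker: str = "lx") -> List[List[Field]]:
    """Split SFM text into records, each a list of (marker, value) pairs."""
    records: List[List[Field]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate records visually
        m = MARKER_LINE.match(line)
        if m:
            marker, value = m.group(1), m.group(2).strip()
            if marker == record_marker:
                records.append([])  # a record marker starts a new record
            if records:  # skip header lines before the first record
                records[-1].append((marker, value))
        elif records and records[-1]:
            # Continuation line: append it to the previous field's value.
            marker, value = records[-1][-1]
            records[-1][-1] = (marker, f"{value} {line}".strip())
    return records
```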


The Ga alphabet contains three additional characters, which are converted to ASCII letters as follows:

  • ɛ → E
  • ŋ → G
  • ɔ → O

This conversion is defined in the file 'Step1-Unicode.cct', which converts Unicode characters to plain ASCII equivalents. If the LKB version you use can handle certain Unicode characters, adapt this file accordingly; usually this just means deleting the corresponding conversion rules.
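The same substitution can be sketched in Python to check in advance which characters a lexicon still needs mappings for. The table mirrors the three Ga mappings above; `to_ascii` is a hypothetical helper, not part of the Toolbox setup. It also returns any remaining non-ASCII characters, such as the combining tone marks used in \ph fields, so you can decide whether to add or delete conversions in Step1-Unicode.cct.

```python
from typing import Dict, Set, Tuple

# Mirrors the Ga mappings listed above (ɛ→E, ŋ→G, ɔ→O). For another
# language, or an LKB setup that accepts Unicode, edit or delete entries.
GA_TO_ASCII: Dict[str, str] = {"ɛ": "E", "ŋ": "G", "ɔ": "O"}


def to_ascii(text: str, table: Dict[str, str] = GA_TO_ASCII) -> Tuple[str, Set[str]]:
    """Apply the character mapping.

    Returns the converted text and the set of characters that are still
    outside ASCII after conversion (i.e. characters with no mapping yet).
    """
    converted = text.translate(str.maketrans(table))
    leftovers = {ch for ch in converted if ord(ch) > 127}
    return converted, leftovers
```

Returning the leftovers rather than silently dropping them keeps the decision about unmapped characters with the person maintaining the .cct file.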

License

The presentation and this wiki page are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. The script code (program code) is under the MIT license.

License for data (dictionary file): to be determined; contact medakubu@gmail.com