In the near future I intend to make some of the software and data sources from Enigma available on this page. For now I have added only some lemma/inflection mapping tables.
These lemma/inflection tables were derived from the CUVPlus lexicon, which is freely available from the Oxford Text Archive. I extracted them using a guess/check algorithm (described in my thesis) that generates a list of candidate lemmas for each word in the lexicon and then uses the lexicon to filter them. For example, the input putting/VVG must be reduced to a base lexical verb form, so the system tries removing the suffix to form putt/VVB, and removing the suffix and reducing the doubled consonant to form put/VVB. In this case both lemmas are in the lexicon, so both are mapped to putting/VVG. Given the input letting/VVG, the same algorithm returns lett/VVB and let/VVB, but only the latter is listed in the lexicon, so only this mapping is added. I constructed a large list of exceptional cases and irreducible forms using a mixture of heuristic algorithms and hand annotation. The exceptions are included in the main lemma/inflection mapping files, and the irreducible forms are contained in a separate file. Details of the formatting are in a README file in the zip.
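The guess/check idea for the putting/letting example can be sketched roughly as follows. This is only an illustration of the two steps described above (guess candidates, then filter against the lexicon), not the code used to build the tables. The function names, the toy lexicon, and the restriction to the -ing suffix are all assumptions made for the example.

```python
VOWELS = "aeiou"

def candidate_lemmas(word, suffix="ing"):
    """Guess step: propose possible base forms for a suffixed word."""
    if not word.endswith(suffix) or len(word) <= len(suffix):
        return []
    stem = word[:-len(suffix)]
    candidates = [stem]  # e.g. "putting" -> "putt"
    # If the stem ends in a doubled consonant, also try the single form,
    # e.g. "putt" -> "put".
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in VOWELS:
        candidates.append(stem[:-1])
    return candidates

def check_lemmas(word, lexicon, base_tag="VVB"):
    """Check step: keep only the candidates listed in the lexicon."""
    return [(c, base_tag) for c in candidate_lemmas(word)
            if (c, base_tag) in lexicon]

# Toy lexicon standing in for CUVPlus (hypothetical contents).
toy_lexicon = {("putt", "VVB"), ("put", "VVB"), ("let", "VVB")}

print(check_lemmas("putting", toy_lexicon))  # both candidates survive
print(check_lemmas("letting", toy_lexicon))  # only "let" survives
```

A full version would need one guess rule per inflection type, plus the exception and irreducible-form lists, which this sketch leaves out.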
This file contains the keyword terms and title information from the TEI header for each file in the BNC World Edition. It is a tab-delimited text file: the first field is the filename, the second is a comma-delimited list of keywords, and the last is the title. The file is useful if you want to cut subcorpora from the BNC for analysis. Right-click the link below to download it, or follow the link to view the file in your browser.
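As a rough illustration of how the file might be used to pick out a subcorpus, the sketch below reads lines in the three-field layout described above and returns the filenames tagged with a given keyword. The function names are my own, and the sample rows are invented purely to show the layout; they are not taken from the BNC.

```python
def parse_index(lines):
    """Parse lines of the form: filename <TAB> kw1, kw2, ... <TAB> title."""
    records = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        filename, keywords, title = fields
        records.append({
            "filename": filename,
            "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
            "title": title,
        })
    return records

def files_with_keyword(records, keyword):
    """Return the filenames whose keyword list contains the keyword."""
    wanted = keyword.lower()
    return [r["filename"] for r in records
            if wanted in (k.lower() for k in r["keywords"])]

# Invented sample rows, for illustration only.
sample = [
    "FILE1\tmedicine, health\tAn invented title\n",
    "FILE2\tsport\tAnother invented title\n",
]

records = parse_index(sample)
print(files_with_keyword(records, "health"))
```

To run this against the real file, pass an open file handle to parse_index in place of the sample list.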