Classroom:LING2208 - Annotating Norwegian Bokmål/Agreement statistics
Agreement statistics
Massive editing of this section. It would be best to start a new page. --Dorothee Beermann 12:14, 24 February 2014 (UTC)
The following table shows how often each gender is glossed on adjectives in TypeCraft, alongside the distribution of gender tags across all Norwegian Bokmål entries in TypeCraft. Both are compared with the distribution of genders among nouns in the NoWaC corpus. The percentage in parentheses in each count column is that tag's share of the column total (e.g., 56% of all nouns in NoWaC are tagged as masculine). The final column divides each gender's share among TypeCraft entries tagged ADJ by its share among NoWaC nouns, expressed as a percentage. A value above 100% indicates that a gender is glossed on adjectives more often than its natural frequency among nouns would predict, and a value below 100% indicates that it is glossed less often.
Gender | Adjectives in TypeCraft | All tags in TypeCraft | Nouns in NoWaC | Ratio of ADJ share to NoWaC share |
---|---|---|---|---|
FEM | 0 (0%) | 33 (6.33%) | 20358360 (16.47%) | 0% |
MASC | 13 (21%) | 302 (58%) | 69209955 (56%) | 37.5% |
NEUT | 49 (79%) | 186 (35.7%) | 34026414 (27.53%) | 286.96% |
Total: | 62 (100%) | 521 (100%) | 123594729 (100%) | N/A |
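The final column can be recomputed from the raw counts. The following is a minimal sketch, not the procedure originally used for this page: the names `adj_counts`, `nowac_counts`, `shares` and `representation_ratio` are illustrative assumptions. Because it divides unrounded shares, its results differ slightly from the table, which appears to divide the rounded percentages (21% / 56% = 37.5%).

```python
# Counts copied from the table above (TypeCraft ADJ entries and NoWaC nouns).
adj_counts = {"FEM": 0, "MASC": 13, "NEUT": 49}
nowac_counts = {"FEM": 20358360, "MASC": 69209955, "NEUT": 34026414}


def shares(counts):
    """Return each tag's share of the column total (a value between 0 and 1)."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def representation_ratio(sample, reference):
    """Divide each tag's share in `sample` by its share in `reference`.

    A result above 1.0 means the tag is overrepresented in the sample
    relative to the reference; below 1.0 means underrepresented.
    """
    sample_shares = shares(sample)
    reference_shares = shares(reference)
    return {tag: sample_shares[tag] / reference_shares[tag] for tag in sample}


ratios = representation_ratio(adj_counts, nowac_counts)
for gender, ratio in ratios.items():
    # Print as a percentage, matching the format of the table's last column.
    print(f"{gender}: {ratio:.1%}")
```

Run on the counts above, feminine comes out at zero, masculine at well under 100%, and neuter at close to three times its share among NoWaC nouns.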
From these data we can see that neuter gender is overrepresented on adjectives, while feminine and masculine are both underrepresented. This is because feminine and masculine agreement is not marked morphologically on adjectives but is expressed by the uninflected base form, whereas neuter adjectives are inflected with an overt morpheme. The skew therefore reflects a tagging convention that is morphologically oriented.