Difference between revisions of "Classroom:LING2208 - Annotating Norwegian Bokmål"
Are Ormberg (Talk | contribs) |
Revision as of 15:31, 18 February 2014
Agreement
The following phrase contains agreement between the noun kjøttbein and the adjectives fint and saftig:
Han |
Han |
He1SG |
PN |
så |
så |
seePAST |
Vtr |
en |
en |
3SGMASCINDEF |
ART |
slakterbutikk |
slakterbutikk |
butcher.shop |
N |
og |
og |
and |
CONJ |
gikk | |
g | ikk |
walkSG | PAST |
V |
raskt |
raskt |
quickly |
ADVm |
inn |
inn |
in |
ADVplc |
og |
og |
and |
CONJ |
stjal |
stjal |
PAST |
V |
et |
et |
aINDEFNEUTSG |
DET |
fint | |
fin | t |
nice | SGINDEFNEUT |
ADJ |
saftig |
saftig |
juicySGINDEF |
ADJ |
kjøttbein |
kjøttbein |
meat.boneNEUT |
N |
fra |
fra |
fromSRC |
PREP |
hyllen | |
hylle | n |
shelf | DEF |
N |
[[1]]
Both adjectives are tagged as being singular and neuter, which corresponds to the head of the NP in which they are embedded; et fint, saftig kjøttbein. Although kjøttbein is only tagged as neuter, its indefiniteness is given by the determiner et, which also agrees with the noun.
The corpus for Norwegian Bokmål available on TypeCraft contains 182 sentences tagged as adjectives, with 60 of them tagged with gender markings, such as in the adjectives discussed.
Clause Linkage
The phrase mentioned above is also a complex clause, consisting of the simple clauses Han så en slakterbutikk and [og gikk raskt inn [og stjal et fint, saftig kjøttbein fra hyllen]]. The complex clause is an adjoined clause, in which the second simple clause contains a conjunction (og), and is coordinated with the first clause. The syntagms are not in a relation of dependency, as neither occupies a grammatical slot in the other. The second syntagm is therefore not embedded; if it were, it would fill a grammatical slot.
The second syntagm may itself be divided into two coordinate clauses, which in turn form a coordinate clause. All of the clauses in the sentence constitute syndetic parataxis.
The syntagms describe a series of events in temporal order. The first clause contains the head of the sentence (så), and would be grammatical without the rest of the coordinated clauses. Grammatically, the clauses are all linked by tense (past), and the grammaticality would be questionable if they were in different tenses. This may be because all of the clauses share the same subject.
--Are Ormberg 13:31, 17 February 2014 (UTC)
AGREEMENT
den |
den |
3SGCOMMSBJ |
PN |
innså | |
inn | så |
seeVstemPRET | |
V |
sin |
sin |
REFL3PCOMM |
TRUNC |
egen |
egen |
REFLSGCOMM |
DET |
dårskap | |
dår | skap |
foolishNstem | nessN>NCOMM |
N |
for |
for |
tooDEG |
ADVm |
sent | |
sen | t |
lateADJstem | NEUT |
ADJ |
og |
og |
CONJ |
gikk |
gikk |
walkVstemPRET |
V |
avsted | |
av | sted |
aPART | wayN>ADV |
ADVplc |
sulten | |
sult | en |
hungryN>ADJ | SGCOMM |
ADJ |
og |
og |
CONJ |
trist |
trist |
sadCOMMSG |
ADJ |
men |
men |
CONJ |
kanskje | |
kan | skje |
maybeV>ADV | V>ADV |
ADV |
litt |
lit: |
a.littleDEG |
ADVm |
klokere | |
klok | ere |
wiseADJstem | CMPR |
ADJ |
"Den innså sin egen dårskap for sent og gikk avsted sulten og trist men kanskje litt klokere" [[2]]
The pronoun "den" is an anaphor that picks up its antecedent "hunden", which is specified for the value COMMON GENDER for the feature GENDER, and whose suffix "-en" is specified for the value SINGULAR for the feature NUMBER, as well as 3RD PERSON for the feature PERSON. The values spreading from the pronoun "den" to the reflexive determiner "egen", as well as to the adjectives "sulten" and "trist", are SINGULAR and COMMON GENDER. The reflexive pronoun "sin" agrees in these values and also in the value 3RD PERSON.
CLAUSE LINKAGE
den |
den |
3SGCOMMSBJ |
PN |
innså | |
inn | så |
seeVstemPRET | |
V |
sin |
sin |
REFL3PCOMM |
TRUNC |
egen |
egen |
REFLSGCOMM |
DET |
dårskap | |
dår | skap |
foolishNstem | nessN>NCOMM |
N |
for |
for |
tooDEG |
ADVm |
sent | |
sen | t |
lateADJstem | NEUT |
ADJ |
og |
og |
CONJ |
gikk |
gikk |
walkVstemPRET |
V |
avsted | |
av | sted |
aPART | wayN>ADV |
ADVplc |
sulten | |
sult | en |
hungryN>ADJ | SGCOMM |
ADJ |
og |
og |
CONJ |
trist |
trist |
sadCOMMSG |
ADJ |
men |
men |
CONJ |
kanskje | |
kan | skje |
maybeV>ADV | V>ADV |
ADV |
litt |
lit: |
a.littleDEG |
ADVm |
klokere | |
klok | ere |
wiseADJstem | CMPR |
ADJ |
"Den innså sin egen dårskap for sent og gikk avsted sulten og trist men kanskje litt klokere" [[3]]
The complex clause above consists of two simple clauses;
1: "Den innså sin egen dårskap for sent" 2: "Den gikk avsted sulten og trist men kanskje litt klokere"
These two simple clauses are connected with the conjunction "og" (and), which is often used to coordinate two or more clauses. In other words, we are here dealing with an example of parataxis, in which the clauses are independent of each other (even though they share the same subject). A sign of this is the inflection of the verb contained in these clauses, and that the clauses are quite autonomous, as shown in the breakdown into separate clauses 1 and 2 above. However, they agree in tense (both are in the preterite), which suggests that they are linked temporally. From the semantic content it may seem that the clauses are linked causally, which would imply subordination, or hypotaxis: "Because <Den innså sin egen dårskap for sent>, <gikk den avsted sulten og trist men kanskje litt klokere>", but in my view this complex clause seems to be an example of coordination rather than subordination.
--Eirik Zahl 19:16, 16 February 2014 (UTC)
Agreement
In the course of the story we find two cases of agreement that differ with respect to a single feature. They show quite neatly how agreement works in Norwegian and how it affects syntactic composition. In sentence 6 we find this noun phrase [4]:
Han |
han |
he3SGMASC |
PN |
så |
så |
sawPAST |
V |
en |
en |
INDEFMASCSG |
DET |
annen |
annen |
otherMASC |
ADJ |
hund |
hund |
dogMASC |
N |
nøyaktig |
nøyaktig |
exactly |
ADV |
lik |
lik |
likeCMPR |
PRT |
ham |
ham |
himMASC3SGACC |
PN |
som |
som |
which |
CONJS |
holdt | |
hold | t |
holdVstem | PAST |
V |
et |
et |
INDEFNEUTSG |
DET |
bein |
bein |
boneNEUT |
N |
i |
i |
PREP |
munnen | |
munn | en |
mouthMASC | DEFMASCSG |
N |
sin | |
si | n |
MASCSG | |
PNposs |
En annen hund - Another dog (eng)
In sentence 7, however, we find this noun phrase
den |
den |
DEFMASCSG |
DET |
grådige | |
grådig | e |
greedy | AGRMASCSG |
ADJ |
hunden | |
hund | en |
dogMASC | DEFMASCSG |
N |
bestemte | |
bestem | te |
decideVstem | PAST |
V |
seg |
seg |
self3SGREFL |
PNrefl |
for |
for |
PRTv |
at |
at |
that |
COMP |
han |
han |
he3SGMASC |
PN |
ville | |
vil | le |
wouldVstem | PAST |
AUX |
ha |
ha |
haveINF |
V |
dét |
dét |
thatNEUTSG |
DEM |
beinet | |
bein | et |
boneNEUT | DEFNEUTSG |
N |
óg |
óg |
ADV |
så |
så |
Han |
han |
he3SGMASC |
PN |
knurret | |
knurre | t |
growlVstem | PAST |
V |
i |
i |
in |
PREP |
håpet | |
håp | et |
hopeNEUT | DEFNEUTSG |
N |
om |
om |
PREP |
at |
at |
that |
CONJ |
den |
den |
DEFMASCSG |
DET |
andre |
andre |
otherDEF |
ADJ |
hunden | |
hund | en |
dogMASC | DEFMASCSG |
N |
i |
i |
in |
PREP |
elva | |
elv | a |
riverFEM | DEFFEMSG |
N |
skulle |
skulle |
shouldPAST |
AUX |
miste |
miste |
dropINF |
V |
beinet | |
bein | et |
boneNEUT | DEFNEUTSG |
N |
ut |
ut |
PREP |
av |
av |
PREP |
frykt |
frykt |
fearMASC |
N |
Den andre hunden - The other dog (eng)
It should be relatively clear that the only difference between the two noun phrases is one of definiteness. In both cases the controller is the word hund, which means dog and is the head of the phrase. The noun phrase, accordingly, is the domain of agreement. The word hund in itself carries only the feature of masculine (MASC), and definiteness is impossible to determine through this word alone. However, an indefinite article has been chosen, namely en, and thus renders the noun indefinite. En becomes a target for the controller and agrees with the feature MASC. Therefore it carries the two features MASC and indefinite (INDEF). The adjective annen, which means other in English, is also a target for the controller and therefore has to agree in both the features MASC and INDEF.
This can be seen by comparing it to the other noun phrase in sentence 7. Here the word hund has gained the additional morpheme -en. This is the definite article in Norwegian, and so the word now holds two features in itself, namely MASC and DEF. An interesting point is that there is still a preceding article den which also marks definiteness, irrespective of the presence of the definite suffix. This is called double definiteness, and it surfaces when the noun is modified by an adjective. Regardless, this den is affected by the controller and gains the feature MASC. The adjective is also affected by the controller and gathers the features of MASC and DEF. Because of this, it changes form from annen to andre, which is a definite form of the word.
--Anders Lynghaug Haugen 21:49, 16 February 2014 (UTC)
Clause Linkage
There are a number of different forms of clause linkage that can be found throughout the story. Let us first look at sentence number 2 [5]:
han |
han |
he3SGMASC |
PN |
spanet | |
spane | t |
spyVstem | PAST |
V |
et |
et |
INDEFNEUTSG |
DET |
slakterhus |
slakterhus |
slaughterhouseNEUT |
N |
og |
og |
CONJC |
smatt |
smatt |
snuckPAST |
V |
raskt | |
rask | t |
quick | ADJ>ADV |
ADV |
inn |
inn |
PREP |
og |
og |
CONJC |
stjal |
stjal |
stole |
V |
et |
et |
INDEFNEUTSG |
DET |
stort | |
stor | t |
big | AGRNEUTSG |
ADJ |
fint | |
fin | t |
nice | AGRNEUTSG |
ADJ |
saftig |
saftig |
juicy |
ADJ |
bein |
bein |
boneNEUT |
N |
fra |
fra |
PREP |
hyllen | |
hylle | n |
shelfMASC | DEFSG |
N |
[Han spanet et slakterhus] og [smatt raskt inn] og [stjal et stort, fint, saftig bein fra hyllen]
The brackets here mark the boundaries of the clauses, whether independent or embedded in the sentence. The bracketing shows that the sentence contains three clauses. They need to be in this order because of temporal and causal restrictions, but syntactically speaking, they are independent of one another. In this sense, they are parallel to one another, and this is marked through the use of og, which acts as a coordinating conjunction. This is a phenomenon called parataxis.
For another form of clause linkage, let's again look at sentence 6 [6]. Here we find this noun phrase:
En annen hund [[nøyaktig lik ham] som holdt et bein i munnen sin]
The linked clause is the part between the brackets. This clause is complex because the phrase en annen hund works fine on its own. The part in the brackets is thus a modifying element. Nøyaktig lik ham is an adjectival expression which can be overlooked for this purpose. However, the part som holdt et bein i munnen sin is rather important, because it is introduced by the subordinating conjunction som. This results in a downgrading which causes this clause to lack a subject of its own, because the subject is taken to be the head of the noun phrase to which the clause is subordinated.
Yet another form of clause linkage is found in sentence 3 [7]:
[Mens han tygget lykkelig på beinet] sprang han inn i skogen.
In this example, the clause within the brackets is completely adverbial. It is not needed by the main verb, which is sprang, and is therefore not embedded in the sentence. In spite of this, it is introduced by a conjunction that functions as a temporal adverb, because it is subordinate to the main event. This would be a type of clause linkage that could be said to be halfway between parataxis and embedding.
Finally we have sentence 7 [8]. Here we find this portion:
Den grådige hunden bestemte seg for [at han ville ha det beinet óg...]
In this complex clause, the part within the brackets is completely embedded within the sentence. This is because the clause within the brackets is absolutely necessary to fulfill the valency of the main verb bestemte seg for, i.e. it fills a grammatical slot predicated by the main verb. The clause acts as a complement to the verb and is therefore an embedded clause and totally dependent on the main verb in this sentence. It is the subordinating conjunction at that introduces the embedded element.
--Anders Lynghaug Haugen 21:49, 16 February 2014 (UTC)
Reflexive pronouns in Norwegian
We find two different reflexive pronoun forms in Norwegian: one with a pronoun, such as "hans" (English: his) or "hennes" (her), and one without, as in "sin". We assume that the form "hans" consists of the masculine pronoun "han" and the reflexive marker "sin" (which has become cliticized as "s"). This pattern is also found in the feminine form "hennes" and the neuter forms "dens/dets", as well as the plural form "deres" (their).
In terms of agreement, the reflexive marker "sin" takes its values of GENDER and NUMBER from its object, as seen in the examples below:
"Mannen sitt vindu." (the man's window)
[[9]] "Window" spreads its values NEUTER and SINGULAR to "sitt", so that they agree.
"Mannen sine vinduer" (the man's windows)
When the object "vinduer" is in the plural, then so is the reflexive "sine".
"Mennene sitt vindu" (the men's window)
The reflexive "sitt" is in the singular to agree with its object, even though the subject is in the plural. This indicates that it is the object rather than the subject which is the controller that spreads its values of the features GENDER and NUMBER.
The same control relation also goes for the reflexive determiner "egen" (own).
"Mannen sitt eget vindu." (the man's own window)
"Window" spreads its values NEUTER and SINGULAR to "eget", so that they agree.
"Mannen sine egne vinduer" (the man's own windows)
When the object "vinduer" is in the plural, then so is the reflexive "egne".
"Mennene sitt eget vindu" (the men's own window)
This is a purely syntactic analysis of the properties of sin and egen. With it, we try to show that agreement only occurs within the noun phrase.
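The control relation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `agree` and `possessive_np` are hypothetical, and the paradigms are a simplified listing of the Bokmål forms of "sin" and "egen". The point it shows is that the form of the reflexive is selected from the GENDER and NUMBER of the object alone, and the possessor is never consulted.

```python
# Simplified paradigms (an assumption for illustration, not a full grammar).
# Singular forms are selected by gender; the plural form is shared by all genders.
SIN_SINGULAR = {"MASC": "sin", "FEM": "si", "NEUT": "sitt"}
EGEN_SINGULAR = {"MASC": "egen", "FEM": "egen", "NEUT": "eget"}


def agree(singular_forms, plural_form, gender, number):
    """Return the target form that agrees with the controller's features."""
    if number == "PL":
        return plural_form
    return singular_forms[gender]


def possessive_np(possessor, noun, gender, number, with_egen=False):
    """Build 'possessor + sin (+ egen) + noun'.

    gender and number are the features of the possessed noun (the controller);
    the possessor contributes no features to the reflexive.
    """
    parts = [possessor, agree(SIN_SINGULAR, "sine", gender, number)]
    if with_egen:
        parts.append(agree(EGEN_SINGULAR, "egne", gender, number))
    parts.append(noun)
    return " ".join(parts)


print(possessive_np("Mannen", "vindu", "NEUT", "SG"))
print(possessive_np("Mannen", "vinduer", "NEUT", "PL", with_egen=True))
print(possessive_np("Mennene", "vindu", "NEUT", "SG"))
```

Because the possessor is passed through as an unanalysed string, a plural possessor such as "Mennene" still yields "sitt" with a singular neuter object, which is the pattern the examples are meant to demonstrate.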
Agreement statistics
The following table describes the distribution of gender as glossed on adjectives, and the total distribution of gender tags for Norwegian Bokmål in TypeCraft. This is compared to the distribution of genders among nouns in the NoWaC corpus (http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html). The percentages in the first three columns give the share of each tag in the total for that count (e.g. 56% of all nouns in NoWaC are tagged as masculine). The final column divides the share of each gender among entries tagged with ADJ in TypeCraft by the share of the same gender among entries tagged as nouns in NoWaC. This gives us an indication of whether some genders are glossed on adjectives more frequently than they naturally occur.
Gender | Adjectives | Total for all tags in TypeCraft | Total for nouns in NoWaC | Ratio for ADJ to NoWaC |
---|---|---|---|---|
FEM | 0 (0%) | 33 (6.33%) | 20358360 (16.47%) | 0% |
MASC | 13 (21%) | 302 (58%) | 69209955 (56%) | 37.5% |
NEUT | 49 (79%) | 186 (35.7%) | 34026414 (27.53%) | 286.96% |
Total: | 62 (100%) | 521 (100%) | 123594729 (100%) | N/A |
From this data we can see that neuter gender is overrepresented for adjectives. This is due to the feminine and masculine genders (which are both underrepresented) not being indicated morphologically on adjectives, but rather by the uninflected base form, whereas neuter adjectives are inflected with a morpheme (as in fin-t). This reflects a tagging convention that is morphologically oriented.
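The arithmetic behind the final column can be reproduced with a short Python sketch. The counts are taken from the table; the function names `shares` and `ratio_to_reference` are hypothetical. Note that the table's figures appear to be computed from the rounded percentages (21/56 and 79/27.53), so the unrounded results below may differ slightly in the last digits.

```python
# Counts copied from the table above.
adj_counts = {"FEM": 0, "MASC": 13, "NEUT": 49}  # ADJ entries in TypeCraft
nowac_noun_counts = {"FEM": 20358360, "MASC": 69209955, "NEUT": 34026414}


def shares(counts):
    """Share of each tag in the total of its own column."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def ratio_to_reference(observed, reference):
    """Divide each observed share by the corresponding reference share.

    A value above 1 means the tag is overrepresented relative to the
    reference distribution; a value below 1 means it is underrepresented.
    """
    obs, ref = shares(observed), shares(reference)
    return {tag: obs[tag] / ref[tag] for tag in observed}


ratios = ratio_to_reference(adj_counts, nowac_noun_counts)
for tag, value in ratios.items():
    print(f"{tag}: {value:.2%}")
```

Working from raw counts rather than rounded percentages avoids compounding rounding error, which matters little here but would become visible with smaller samples.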