Difference between revisions of "Documenting Lule Sami"
Revision as of 10:00, 15 December 2008
Documenting Lule Sami is a pilot study for an in-depth manual annotation of written Lule Sami (see Annotation_of_representative_texts_from_Lule_Sami_-_An_NTNU_project). The study is conducted by LingLab (see About TypeCraft) at the Department for Language and Communication Studies (http://www.ntnu.no/hf) at the Norwegian University of Science and Technology. Work started in May 2008 and will end in November 2008. The page A Pilot Study in Documenting Lule Sami tells you more about the project itself. Under About Lule Sami you will find a short introduction to the Lule Sami people and their language. Following the latter link, you will also find other relevant links about the Sami and their language.
Lule Sami is a morphologically rich, highly inflected and very often fusional language, which makes its in-depth morpho-syntactic annotation an interesting yet difficult and very time-consuming task. None of the texts we use has been annotated before. With approximately 2000 speakers in Norway and Sweden, of whom only a few speak it as a first language, Lule Sami is one of the highly endangered languages in Europe.
Below we discuss some of the issues that were raised during annotation.
Annotating Lule Sami - Questions and some answers
Categories and Functions
ADJ and N
Anders Kintel writes in his "Veiledning i bruk av ordboka" (foreløpig versjon):
"Vi gjør oppmerksom på at de fleste adjektiv i samisk kan også fungere som substantiv og også motsatt, derfor står det ikke alltid en markering [s. eller adj.] bak ordet som tilsier at dette er et adjektiv eller et substantiv".
A free translation:
We would like to draw attention to the fact that most adjectives in Sami can also function as nouns, and vice versa; therefore a word is not always followed by a label [n. or adj.] indicating whether it is an adjective or a noun.
Reference: Kintel A. Lulesamisk-norsk. Ajluokta /Drag, Biehtsemanon 2005. Unpublished manuscript.
An example of how we annotate de-adjectival and other derived lexemes comes from one of the texts that we have annotated.
In the example below we annotate vuorra, meaning 'old', as N on the POS tier:
Várrá |
várrá |
mountainNOMSG |
N |
vuolgget | |
vuolgge | t |
go | INF |
Vitr |
guollit | |
guolli | t |
fish | N>VINF |
Vitr |
ja |
ja |
and |
CONJC |
vuorrasij | ||
vuorra | si | j |
old | GENPL | |
N |
siegen | |
siege | n |
with | INESSSG |
Nspat |
tjåhkkåhit | ||
tjåhkkå | hi | t |
sit | DUR | INF |
Vitr |
ja |
ja |
and |
CONJC |
gulldalit | ||
gullda | li | t |
listen | DUR | INF |
gå |
gå |
when |
COMP |
subtsasti… | |
subtsa | sti… |
tale | N>V3PLPRES |
Vitr |
In the case of vuorra we do in fact find derivational morphology. The -s in vuorra-s [vuorrasa] marks the noun as derived. The -s is followed by case inflection. Clearly, vuorra functions as a noun here, and accordingly it is inflected for case. (Kristin)
In general we will annotate a word's POS category according to the function it has in its context of use. On the glossing tier, however, we should in addition indicate the word's derivation. The -s reflects nominalization, so we should use the ADJ->N tag in the gloss line to reflect this better. (Dorothee)
We need the gloss tags ATT for the attributive form and PRED for the predicative form of adjectives. Some adjectives have identical attributive and predicative forms; in that case it is perhaps sufficient to mark them only with ADJ on the POS tier. (Kristin)
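The convention discussed in this section (POS assigned by function in context, derivation recorded on the gloss tier) can be sketched as a small data structure. This is only an illustration and not TypeCraft's data model: the class names Morpheme and Word, the helper derivation_tags, the spelling ADJ>N for the proposed tag, and the placement of that tag on -si are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Morpheme:
    form: str                                          # segmented morph, e.g. "si"
    glosses: List[str] = field(default_factory=list)   # meaning and/or gloss tags


@dataclass
class Word:
    surface: str
    pos: str                    # POS tier: category by function in context
    morphemes: List[Morpheme]

    def derivation_tags(self) -> List[str]:
        """Return every gloss tag that records a derivation (contains '>')."""
        return [g for m in self.morphemes for g in m.glosses if ">" in g]


# vuorrasij from the example text. Putting ADJ>N on -si is an assumption.
vuorrasij = Word(
    surface="vuorrasij",
    pos="N",
    morphemes=[
        Morpheme("vuorra", ["old"]),
        Morpheme("si", ["ADJ>N"]),
        Morpheme("j", ["GEN", "PL"]),
    ],
)

print(vuorrasij.pos, vuorrasij.derivation_tags())
```

Keeping the derivation tag on the gloss tier leaves the POS tier free to state only what the word does in the sentence.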
V > N
(Again the issue is how to annotate for derivation)
(Kristin): I think it is NOT enough to annotate V > N, V > V, etc. We should mark every derivation with what kind of N derivation it is. I have written something about it below:
In the phrase below, the word for 'temptation', gähttjalibmáj, is derived from the verb meaning 'watch/look'. When we decompose this word we get: gæhttjat+V+TV+Der1+Der/l+V+Actio+Der2+Der/ibme+N+Sg+Ill
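To see which derivational steps such a decomposition contains, the analysis string can be split mechanically. The sketch below assumes, based only on the single string shown above, that fields are separated by "+", that the first field is the lemma, and that "Der/x" names a derivational suffix. The function names parse_analysis and derivation_suffixes are hypothetical.

```python
from typing import List, Tuple


def parse_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split 'lemma+Tag+Tag...' into the lemma and the list of tags."""
    lemma, *tags = analysis.split("+")
    return lemma, tags


def derivation_suffixes(tags: List[str]) -> List[str]:
    """Collect the suffix part of every 'Der/x' tag, in order."""
    return [t.split("/", 1)[1] for t in tags if t.startswith("Der/")]


lemma, tags = parse_analysis(
    "gæhttjat+V+TV+Der1+Der/l+V+Actio+Der2+Der/ibme+N+Sg+Ill"
)
print(lemma)                      # the verb the word is derived from
print(derivation_suffixes(tags))  # the two derivational suffixes
```

Listing the suffixes in order makes it visible that the word goes through two derivations, which a single V>N tag would hide.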
Consider the following phrase:
Ja |
ja |
and |
CONJC |
ale |
ale |
notIMP2SG |
mijáv | |
mijáv | v |
us | 1PLACC |
PN |
gæhttjalibmáj | |||
gæhttja | li | bmá | j |
watch/look | DIMFREQ | V>N | ILLSG |
N |
lájddi |
lájddi |
leadIMP2SG |
Vtr |
ájnat |
ájnat |
but |
CONJS |
várjjala |
várjjala |
deliverIMP2SG |
mijáv |
mijáv |
us1PLACC |
PN |
bahás | |
bahá | s |
evil | ELATSG |
N |
In descriptive Sami grammars the nominalizer li is called Actio. The nominalizer seems to be internally complex: -li- in gæhttja-li-t = subjunctive-FREQ
(Dorothee) At this point it is not clear which subtypes of nominalizing suffixes we should distinguish. We could, for example, introduce NMLZ.actio and other subtypes of nominalizer. How useful would that be? Perhaps we should wait until we have a clearer overview of which categories are needed.
V > Adj but how about ADJ->V
In the phrase below we need the tag ADJ->V
Dá |
dá |
theseNOMPL |
DEM |
bale |
bale |
timeGENSG |
N |
bessin | |
bessi | n |
be_allowed | 3PLPAST |
Vitr |
oahppe |
oahppe |
pupilNOMPL |
N |
vehi |
vehi |
little |
ADVm |
oahpásmuvvat | |||
oahpá | s | muvva | t |
get_to_be | INF | ||
Vtr |
doarromuseajn | ||
doarro | musea | jn |
warNOMSG | museum | withCOMITSG |
N |
Narvijkan | |
Narvijka | n |
Narvik | atINESSSG |
Np |
ja |
ja |
and |
CONJC |
sáme |
sáme |
SaamiGENSG |
N |
ásadusáj | |
ásadusá | j |
arrangement | withCOMITPL |
N |
Jåhkåmåhken | |
Jåhkåmåhke | n |
river_bend | atINESSSG |
Np |
åvdås |
åvdås |
before |
vádtsájin | |
vádtsá | jin |
leave | 3PLPAST |
Vitr |
oahpásmuvvat: oahpás- is an ADJ ('oahpás-' in compounds, 'oahpes' (ATT) otherwise)
(Kristin): It should be possible to note somewhere that verb(s) can be derived from adjectives. Maybe there should be one more level for derivations only? Also, the translation is of no help: while oahpás- is an ADJ in ATT form, known is a V in PERF.PART form. Translation gives us only sketchy semantics.
(Dorothee): Oops, yes, we need the tag ADJ->V
PRON.POSS vs. PRON
Áhttje |
áhttje |
fatherNOMSG |
N |
mijá |
mij |
ourGENPL |
PN |
guhti |
guhti |
whoNOMSG |
PROint |
le |
le |
is3SGPRES |
COP |
almen | |
alme | n |
heaven | inINESSSG |
N |
Above is a nominal construction where the possessive pronoun follows the noun. Possessive pronouns may also precede the noun.
(Dorothee): Question: Are both syntactic patterns in free variation? Is one of the two constructions preferred? Do we find one of the constructions more often in our texts?
Why should we use only PRON when the possessive is used attributively, but PRON.POSS when it is used as a modifier?
VERBAL FORMS
more verbal tags...
While annotating verb forms in Lule Sami we noticed that TypeCraft did not provide all the tags we needed. In the following we exemplify some of the verb forms and discuss the right use of tags.
GERUND: we need to tag two distinct gerunds:
Gerund I
Expresses 'while ...' or 'at the same time as ...': something happens at the same time as the action expressed by the main verb.
sån oaddá-j bårå-dijn = he fell asleep while eating
jåhte-t -> jåde-dijn = while moving
tjieggi-t -> tjieggi-dijn = while traveling
tjåhkani-t -> tjåhkana-ttjin = while assembling
Note: -dijn is used after the last vowel of the weak stem of a pair-syllabic verb or after the last vowel of the stem of a contracted verb. ...(a)-ttjin is used after the last vowel of an unpair-syllabic verb (the last stem vowel changes to '-a').
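The suffix rule in the note can be written down as a short function. This is a minimal sketch that follows the note literally. It assumes the caller already supplies the correct stem (the weak stem for pair-syllabic verbs) and the verb class; it does not compute consonant gradation or syllable counts. The function name gerund_i and the class labels "pair", "contracted" and "unpair" are invented for the example.

```python
def gerund_i(stem: str, verb_class: str) -> str:
    """Build the Gerund I form from a stem and a verb class.

    'pair' and 'contracted' verbs take -dijn on the given stem.
    'unpair' verbs change the last stem vowel to 'a' and take -ttjin.
    The stem is assumed to end in its stem vowel.
    """
    if not stem:
        raise ValueError("empty stem")
    if verb_class in ("pair", "contracted"):
        return stem + "dijn"
    if verb_class == "unpair":
        return stem[:-1] + "a" + "ttjin"
    raise ValueError(f"unknown verb class: {verb_class}")


# Weak stem jåde- (from jåhte-t), treated here as pair-syllabic.
print(gerund_i("jåde", "pair"))
# tjåhkani-t, treated here as unpair-syllabic.
print(gerund_i("tjåhkani", "unpair"))
```

Because the class is passed in explicitly, the annotator's decision about the verb type stays visible instead of being guessed from the spelling.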
Gerund II
Expresses that someone is doing something, that something is going on, or that something has started but is not finished. Gerund II is formed with the auxiliary liehke-t (to be).
sån la låhkå-min = he is reading
-min - used after the last vowel of the strong stem of a pair-syllabic verb
Ex: sån la goarro-min (= she is sewing)
-min is also used after the last vowel of a contracted verb:
Ex: sån la guolli-min (= she is fishing)
-me - used after the last vowel of an unpair-syllabic verb
Ex: sån la malesti-me (= he is cooking) (all examples from Spiik) (Kristin)
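The Gerund II endings listed above amount to a lookup from verb class to suffix. The sketch below is an illustration only: the table name GERUND_II_SUFFIX, the class labels, and the helper progressive are assumptions, and the stems are taken as given from the examples rather than derived.

```python
# Suffix per verb class, as listed in the text above.
GERUND_II_SUFFIX = {
    "pair": "min",        # after the strong stem of a pair-syllabic verb
    "contracted": "min",  # after the last vowel of a contracted verb
    "unpair": "me",       # after the last vowel of an unpair-syllabic verb
}


def gerund_ii(stem: str, verb_class: str) -> str:
    """Attach the Gerund II suffix for the given verb class."""
    return stem + GERUND_II_SUFFIX[verb_class]


def progressive(subject: str, aux: str, stem: str, verb_class: str) -> str:
    """Combine subject, a form of liehke-t, and the Gerund II form."""
    return f"{subject} {aux} {gerund_ii(stem, verb_class)}"


print(progressive("sån", "la", "goarro", "pair"))
print(progressive("sån", "la", "guolli", "contracted"))
print(progressive("sån", "la", "malesti", "unpair"))
```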
Imperative
Here, too, we need two distinct tags:
IMP.1 which expresses a direct order.
IMP.2 which expresses a strong wish or suggestion
INCHOATIVE
In the gloss tier we need a tag for inchoative verbs. Here is an example:
Hyhto |
hyhto |
cabinGENSG |
N |
sisi |
sisi |
insideILLSG |
Nspat |
manájma | ||
maná | j | ma |
go | PAST | 1PL |
Vitr |
ja |
ja |
and |
CONJC |
jus |
jus |
if |
CONJS |
riekta |
riekta |
right |
ADVm |
de |
de |
then |
CONJS |
oaddát | |
oaddá | t |
sleepINCEP | INF |
Vitr |
galgajma | ||
galga | j | ma |
shall | PAST | 1PL |
AUX |
valla |
valla |
but |
CONJC |
ejma | ||
e | j | ma |
notNEG | PAST | 1PL |
Vtr |
ájn |
ájn |
yet |
ADVtemp |
ájgo |
ájgo |
intendNEG |
PTCP |
oaddá-t is the inchoative of oade-t.
Phonologically, inchoatives are marked by fortition of the consonant cluster and lengthening of the last vowel in the stem.
NEGATIVE VERBS
The tag Vneg in the POS tier is needed.
Supinum
Iŋŋga: |
Iŋŋgá: |
Inggá:NOMSG |
Np |
Mån |
mån |
I1SGNOM |
PN |
dal |
dal |
now |
ADVtemp |
biejav | |
bieja | v |
put | 1SGPRES |
Vtr |
mállásav | |
mállása | v |
dinner | ACCSG |
N |
duoldatjit | ||
duolda | tji | t |
cook | for | INF |
V |
How should one annotate the suffix -tji in the above sentence?
Kristin suggests using 'supinum'. I am not so sure that this is right. As far as I know, the supinum is one of the non-finite forms of LS, next to the infinitive, the gerund, the participle and others. But is -tji in the example above really a non-finite marker, and if so, what is the -t? (Dorothee)
Maybe it is better to say that the 'supinum' is tjit, and as such a non-finite marker. (Kristin)
Ex: "Dån la má smidá váttsá-tjit!" - "You are clever at walking!" (Arnhild/Kristin)
Derivational or inflectional ??
Is it possible to say that the supinum marker is a derivational suffix? It is mentioned among the ordavledninger (word derivations) in descriptive grammars. (Kristin)
Strong and weak verb stems
Verbs in LS can have either a weak or a strong stem. For example, the verb 'wash' has two stem forms, basá and bassi: the first-person present tense is expressed as basá-v, while the first-person past tense is bassi-v.
We will use the tags WEAK and STRONG to distinguish these two forms.
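How the WEAK and STRONG tags could be attached to a form is sketched below. The sketch covers only the two first-person forms given above and is not a full paradigm. It assumes that basá is the weak stem and bassi the strong one (the text does not say which is which), and the names VerbStems, GRADE_BY_TENSE and first_person are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VerbStems:
    weak: str
    strong: str


# Only the two cells attested in the text; other persons are not covered.
GRADE_BY_TENSE = {"PRES": "WEAK", "PAST": "STRONG"}


def first_person(stems: VerbStems, tense: str) -> Tuple[str, str]:
    """Return the segmented first-person form and its stem tag."""
    grade = GRADE_BY_TENSE[tense]
    stem = stems.weak if grade == "WEAK" else stems.strong
    return f"{stem}-v", grade


# Assumption: basá is the weak stem, bassi the strong stem.
wash = VerbStems(weak="basá", strong="bassi")
print(first_person(wash, "PRES"))
print(first_person(wash, "PAST"))
```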
Grammatical Changes
LS is changing...
Dá |
dá |
theseNOMPL |
DEM |
bale |
bale |
timeGENSG |
N |
bessin | |
bessi | n |
be_allowed | 3PLPAST |
Vitr |
oahppe |
oahppe |
pupilNOMPL |
N |
vehi |
vehi |
little |
ADVm |
oahpásmuvvat | |||
oahpá | s | muvva | t |
get_to_be | INF | ||
Vtr |
doarromuseajn | ||
doarro | musea | jn |
warNOMSG | museum | withCOMITSG |
N |
Narvijkan | |
Narvijka | n |
Narvik | atINESSSG |
Np |
ja |
ja |
and |
CONJC |
sáme |
sáme |
SaamiGENSG |
N |
ásadusáj | |
ásadusá | j |
arrangement | withCOMITPL |
N |
Jåhkåmåhken | |
Jåhkåmåhke | n |
river_bend | atINESSSG |
Np |
åvdås |
åvdås |
before |
vádtsájin | |
vádtsá | jin |
leave | 3PLPAST |
Vitr |
oahpásmuvvat doarromuseajn
According to grammars of LS, oahpásmuvvat takes ILL, but in the example sentence above it is used with COMIT case. This leads to a change in meaning:
-muvva-t used with ILL means: get to know (people and concrete things), get accustomed to, get experience with, get familiar with.
(-tuvva-t used with COMIT means: learn to know.) (Kristin)
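Deviations of this kind could be collected automatically by comparing the case found in a text with the case the grammars lead us to expect. The sketch below is an assumption-laden illustration: the table EXPECTED_CASE holds only the one verb discussed here, and the function name check_government and the message format are invented.

```python
from typing import Optional

# Case that descriptive grammars report for the verb's complement.
# Only the verb discussed in this section is listed.
EXPECTED_CASE = {"oahpásmuvvat": "ILL"}


def check_government(verb: str, observed_case: str) -> Optional[str]:
    """Return a note if the observed case differs from the expected one.

    Returns None when the verb is not in the table or the cases agree.
    """
    expected = EXPECTED_CASE.get(verb)
    if expected is None or observed_case == expected:
        return None
    return f"{verb}: expected {expected}, found {observed_case}"


# oahpásmuvvat doarromuseajn: the complement is in COMIT case.
print(check_government("oahpásmuvvat", "COMIT"))
```

A list of such notes across the annotated texts would show whether the COMIT use is an isolated case or a regular pattern.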
Translation of place names
In English (as in other languages too - but not so much in Norwegian..) it is quite normal to translate proper names, e.g. München > Munich, Firenze > Florence, København > Copenhagen, etc.
Lule Sami place names have been translated into Norwegian, such as:
Ájluokta-Drag; Gásluokta-Kjøpsvik; Guovdageaidnu-Kautokeino; Divttasvuodna-Tysfjord, etc.
In Norway, places officially have both a Sami and a Norwegian name. The Sami name is used when writing in Sami, and the Norwegian name when writing in Norwegian.
As for free translation into English this could mean that we either use the Sami name, since we translate from Sami, or that we use the Norwegian name, since the Norwegian name is also internationally better known.
Which one should it be?
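Whichever convention is chosen, it should be applied consistently across all free translations. One way to do that is a single lookup with a selectable policy, sketched below using the name pairs listed above. The names SAMI_TO_NORWEGIAN and render_place are hypothetical, and the "both" policy is an extra option added for illustration, not something proposed in the discussion.

```python
# Sami name -> Norwegian name, from the pairs listed above.
SAMI_TO_NORWEGIAN = {
    "Ájluokta": "Drag",
    "Gásluokta": "Kjøpsvik",
    "Guovdageaidnu": "Kautokeino",
    "Divttasvuodna": "Tysfjord",
}


def render_place(sami_name: str, policy: str = "sami") -> str:
    """Render a place name in a free translation under a given policy."""
    if policy not in ("sami", "norwegian", "both"):
        raise ValueError(f"unknown policy: {policy}")
    norwegian = SAMI_TO_NORWEGIAN.get(sami_name)
    if norwegian is None or policy == "sami":
        return sami_name          # unknown names are left unchanged
    if policy == "norwegian":
        return norwegian
    return f"{sami_name} ({norwegian})"


print(render_place("Ájluokta", "norwegian"))
print(render_place("Ájluokta", "both"))
```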
SÁMI - SAMI - SAAMI
(Kristin): I just talked with Anne Kalstad Mikkelsen, the employee at the museum at Arran who is responsible for the exhibitions. She has checked the spelling of sáme with the Norwegian Sami Parliament. The Sami Parliament has decided that sáme is to be written Sami in English, so the museum has to follow this norm.
So - we should then follow the same norm! (Shouldn't we?)
(Dorothee): Definitely !