"i"DAT
The goal of the initiative is to create a pool of in-depth annotated data to support online knowledge representation, e-research (that is, the free exchange of research data), and natural language processing initiatives that depend on richly annotated linguistic data.
The name iDAT also reflects the four major language families found in India: 'I' for Indo-Aryan, 'D' for Dravidian, 'A' for Austro-Asiatic and 'T' for Tibeto-Burman.
Language Families of India
The languages of India belong to several language families. The major ones are Indo-Aryan, a branch of Indo-European (spoken by 72% of Indians), and Dravidian (spoken by 25% of Indians). Other languages spoken in India belong to the Austro-Asiatic and Tibeto-Burman families, and to a few minor language families and isolates. Individual mother tongues in India number several hundred; the 1961 census recognised 1,652 (SIL Ethnologue lists 415). According to the 2001 Census of India, 30 languages are spoken by more than a million native speakers and 122 by more than 10,000.
Three millennia of language contact have led to significant mutual influence among the four language families in India and South Asia. Two contact languages have played an important role in the history of India: Persian and English.
The northern Indian languages of the Indo-European family evolved from Old Indo-Aryan languages such as Sanskrit, by way of the Middle Indo-Aryan Prakrit languages and the Apabhraṃśa of the Middle Ages. There is no consensus on a specific time when the modern north Indian languages such as Hindi-Urdu, Assamese, Bengali, Gujarati, Marathi, Punjabi, Saraiki, Sindhi and Oriya emerged, but AD 1000 is commonly accepted. Each language was subject to different influences, with Hindi-Urdu (Hindustani) being strongly influenced by Persian.
The Dravidian languages of South India have a history independent of Sanskrit. The major Dravidian languages are Tamil, Telugu, Kannada and Malayalam. The Austro-Asiatic and Tibeto-Burman languages of North-East India also have long independent histories.
Language | Family | Speakers (2001, millions) | State(s) |
---|---|---|---|
Assamese | Indo-Aryan, Eastern | 13 | Assam, Arunachal Pradesh |
Bengali | Indo-Aryan, Eastern | 83 | West Bengal, Tripura, Andaman & Nicobar Islands and a few regions of Assam |
Bodo | Tibeto-Burman | 1.4 | Assam |
Dogri | Indo-Aryan, Northwestern | 2.3 | Jammu and Kashmir |
Gujarati | Indo-Aryan, Western | 46 | Gujarat, Dadra and Nagar Haveli, Daman and Diu |
Standard Hindi | Indo-Aryan, Central | 258-422 | Andaman and Nicobar Islands, Arunachal Pradesh, Bihar, Chandigarh, Chhattisgarh, the national capital territory of Delhi, Haryana, Himachal Pradesh, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh and Uttarakhand |
Kannada | Dravidian | 38 | Karnataka |
Kashmiri | Indo-Aryan, Dardic | 5.5 | Jammu and Kashmir |
Konkani | Indo-Aryan, Southern | 2.5 (7.6 per Ethnologue) | Goa, Karnataka, Maharashtra, Kerala |
Maithili | Indo-Aryan, Eastern | 12 (32 in India in 2000 per Ethnologue) | Bihar |
Malayalam | Dravidian | 33 | Kerala, Andaman and Nicobar Islands, Lakshadweep, Pondicherry |
Manipuri (also Meitei or Meithei) | Tibeto-Burman | 1.5 | Manipur |
Nepali | Indo-Aryan, Northern | 2.9 | Sikkim, West Bengal, Assam |
Oriya | Indo-Aryan, Eastern | 33 | Orissa |
Punjabi | Indo-Aryan, Northwestern | 29 | Chandigarh, Delhi, Haryana, Punjab |
Sanskrit | Indo-Aryan | 0.01 | non-regional |
Santhali | Austro-Asiatic, Munda | 6.5 | Santhal tribals of the Chota Nagpur Plateau (comprising the states of Bihar, Chhattisgarh, Jharkhand, Orissa) |
Sindhi | Indo-Aryan, Northwestern | 2.5 | non-regional |
Tamil | Dravidian | 61 | Tamil Nadu, Andaman & Nicobar Islands, Puducherry |
Telugu | Dravidian | 74 | Andaman & Nicobar Islands, Andhra Pradesh, Puducherry |
Urdu | Indo-Aryan, Central | 52 | Jammu and Kashmir, Andhra Pradesh, Delhi, Bihar, Uttar Pradesh and Uttarakhand |