China builds Mongolian language corpus
Updated: 2016-01-22 13:43
(Xinhua)
HOHHOT -- A Mongolian language database containing 80 million words has been launched, after ten years of collection and research, the Inner Mongolia Academy of Social Sciences said.
The Mongolian corpus is part of a 200-million-word set of corpora covering languages used by ethnic minorities in northern and northeastern China, including the Daur, Ewenk and Oroqen languages. The project is slated for completion in 20 years.
The compilers identified 97 locations across eight Chinese provincial regions that have a Mongolian population, as well as five provinces and cities in Mongolia, the Buryat Republic and the Republic of Kalmykia in Russia. They collected 4,192 hours of oral data from 6,725 Mongolian speakers as well as over 4,000 hours of written data.
The corpus project aims to help protect disappearing ethnic languages and will be a precious linguistic resource, according to the academy.
The project has two stages. The first stage, the Mongolian corpus, is finished and the second stage, the database for the other three languages, is under way.