Institute of Philology of the Siberian Branch of Russian Academy of Sciences
Siberian Journal of Philology

Article

Name: The project of Tomsk dialect corpus in keeping with trends of corpus linguistics development

Authors: Svetlana S. Zemicheva, Ekaterina V. Ivantsova

Tomsk State University, Tomsk, Russian Federation

In the section Linguistics

Issue 3, 2018
Pages 192-205
UDK: 811.161.1; 81-25; 81’322
DOI: 10.17223/18137083/64/18

Abstract: The paper proposes a concept for a dialect corpus representing the Russian dialect speech of the Middle Ob region. The authors show that the Tomsk dialect corpus project follows the key trends of modern corpus linguistics: the use of oral speech materials, attention to regional variation in the language, the study of dialect as part of traditional culture, and multimodality. The novelty of the resource lies in two things. The first is its material: it is one of the few corpora that include the speech of residents of the vast Siberian region, and the archive holds the results of 70 years of expedition surveys covering about 400 villages. The second is its lexicocentric and textocentric orientation, for which access to full texts is fundamentally important. The paper also considers the representativeness and balance of a dialect corpus, a problem that has not been studied in the scientific literature. Today the Tomsk dialect corpus contains approximately 700,000 words, which allows it to be regarded as a fairly representative collection of dialect texts. At the same time, the special characteristics of the material mean that the corpus is not strictly balanced. The texts are presented in orthographic form, retaining some phonetic features of the dialect. The structure of the new electronic resource involves three types of markup: passport, thematic, and text type. Passport meta-markup records extralinguistic data about the texts: the place and date of recording and information about the informant (sex, age, place of birth, level of education, occupation). Thematic meta-markup is derived from an inductive analysis of the discursive practices of old-timers. The list of topics is hierarchical, with each topic at most three levels deep. The principle of "soft" markup is used, so that several themes can be assigned to the same text fragment simultaneously. The first level of the hierarchy has 16 macro-themes (Work, Food, Nature, etc.), and the second has 64 topics. At this stage, markup by text type covers, first, the degree of spontaneity of speech events and, second, the most frequent speech genres. The resource is expected to support the study of Middle Ob dialects in linguocultural, genre, communicative, cognitive, linguopersonological, and other aspects; the creation of new dialect dictionaries; and the investigation of traditional culture and folklore, customs and rituals, and the history of the region.
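
The three markup layers described in the abstract can be pictured as a small data structure. The following is only an illustrative sketch, not the corpus's actual schema: the class names, field names, and the helper with_macro_theme are assumptions introduced here, and all sample values are placeholders rather than corpus data. Only the passport categories, the three-level limit, the possibility of several themes per fragment, and the macro-theme names Work, Food and Nature come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_DEPTH = 3  # the abstract states that a topic is at most three levels deep


@dataclass(frozen=True)
class Passport:
    """Extralinguistic data about a recording (field names are hypothetical)."""
    place: str
    date: str
    sex: str
    age: int
    birthplace: str
    education: str
    occupation: str


@dataclass
class Fragment:
    """A text fragment carrying passport, thematic, and text-type markup."""
    text: str
    passport: Passport
    # "Soft" markup: several topic paths may be attached to one fragment.
    # Each path is a tuple such as ("Work",) or ("Work", "<subtopic>").
    topics: List[Tuple[str, ...]] = field(default_factory=list)
    # Text-type markup: degree of spontaneity and speech genre.
    spontaneity: Optional[str] = None
    genre: Optional[str] = None

    def add_topic(self, *path: str) -> None:
        """Attach one topic path, enforcing the three-level limit."""
        if not 1 <= len(path) <= MAX_DEPTH:
            raise ValueError("a topic path must have one to three levels")
        if path not in self.topics:
            self.topics.append(path)


def with_macro_theme(fragments: List[Fragment], macro: str) -> List[Fragment]:
    """Return the fragments tagged with the given first-level macro-theme."""
    return [f for f in fragments if any(p[0] == macro for p in f.topics)]


# Placeholder values only; none of this is taken from the corpus.
passport = Passport(
    place="(village)", date="(date)", sex="f", age=70,
    birthplace="(village)", education="(level)", occupation="(occupation)",
)
fragment = Fragment(text="(transcribed fragment)", passport=passport)
fragment.add_topic("Work", "(second-level topic)")
fragment.add_topic("Food")  # a second theme on the same fragment
```

Storing each theme as a path rather than a single label keeps the hierarchy explicit, so a query by macro-theme only has to inspect the first element of each path.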

Keywords: corpus linguistics, Tomsk dialect corpus, Russian dialects of Siberia
