Institute of Philology of the Siberian Branch of Russian Academy of Sciences
Siberian Journal of Philology
Article

Title: What and how can a linguist get from digitized texts?

Author: V. I. Belikov

Section: Linguistics

Issue 3, 2016. Pages 17–34
UDC: 811.161.1, 81’33
DOI: 10.17223/18137083/56/2

Abstract: The article examines the limits of applicability of online tools for the automatic processing of digital texts (search engines, corpora, Google Books Ngram Viewer) to linguostatistic research. Despite the widespread belief that results obtained by automatically processing large text collections are objective, such data are subject to limitations and distortions for many reasons. One of them is that the teams developing these tools often include no linguists. Using several automatic text-analysis tools, the article analyzes the frequency of culturally significant names and their spelling variants, of gender forms of verbs, and of prepositional variants of government, and shows how difficult it is to interpret the results of automatic processing of text collections.

Keywords: digital text, linguistic statistics, linguistic online tools, text processing, corpus linguistics, grammatical variation

Bibliography:

Belikov V. I. Internet i orfografiya [Internet and spelling]. In: Komp'juternaja lingvistika i intellektual'nye tehnologii: Trudy mezhdunar. konf. «Dialog’2004» [Computational linguistics and intellectual technologies: Works of Intern. Conf. «Dialog’2004»]. M., Nauka, 2004.

Belikov V. I. K metodike korpusnogo issledovaniya leksiki [To the methodology of the corpus study of vocabulary]. In: Russkiy yazyk i novye tekhnologii [Russian language and new technologies]. M., NLO, 2014.

Belikov V. I. Slovar’ «Yazyki russkikh gorodov»: podbor primerov i Internet [The dictionary «Languages of Russian cities»: selecting examples and the Internet]. In: Komp'juternaja lingvistika i intellektual'nye tehnologii: Trudy mezhdunar. konf. «Dialog’2006» [Computational linguistics and intellectual technologies: Works of Intern. Conf. «Dialog’2006»]. M., IPI RAN, 2006.

Belikov V. I. Yandex kak leksikograficheskij instrument [Yandex as a lexicographic tool]. In: Komp'juternaja lingvistika i intellektual'nye tehnologii: Trudy mezhdunar. konf. «Dialog’2003» [Computational linguistics and intellectual technologies: Works of Intern. Conf. «Dialog’2003»]. M., Nauka, 2003.

Belikov V. I., Kopylov N. Ju., Piperski A. Ch., Selegej V. P., Sharov S. A. Korpus kak jazyk: ot masshtabiruemosti k differencial’noj polnote [Corpus as language: from scalability to differential completeness]. In: Komp'juternaja lingvistika i intellektual'nye tehnologii: Trudy mezhdunar. konf. «Dialog». Vyp. 12 (19). T. 1 [Computational linguistics and intellectual technologies: Materials of Intern. Conf. «Dialog». Iss. 12 (19), vol. 1]. M., RGGU, 2013.

Epshtein M. N. Mysli v chislah: Amerika i Rossija v zerkalah interneta [Thoughts in numbers: America and Russia in the mirror of the Internet]. In: Filosofskiy vek. Al'manakh. Vyp. 32: Bendzhamin Franklin i Rossiya: k 300-letiyu so dnya rozhdeniya. Ch. 2 [Philosophical Age: Alm. Vol. 32: Benjamin Franklin and Russia: the 300th anniversary of his birth. Pt 2]. SPb., Tsentr istorii idey, 2006.

Evgen’eva A. P. (ed.) Slovar’ russkogo yazyka v 4 t. 2-e izd., ispr. i dop. [Dictionary of Russian language in 4 vols. 2nd ed., rev. and ext.]. M., Rus. yaz., 1981–1984, vols. 1–4.

Jepshtejn M. N. Slovo nedeli: numerizm [Word of the week: numerism]. In: Dar slova. Proektivnyy leksikon Mihaila Epshteina [Projective lexicon of Mikhail Epstein]. 2003, 8 sept., no. 71 (111). Available at: http://www.emory.edu/INTELNET/dar71.html

Krongauz M. A. Samouchitel' olbanskogo [A self-study guide to Olbanian]. M., AST, 2013.

Kuznetsov S. A. Yazykovaya norma i pravila rechevoy deyatel'nosti [Language norm and speech activity rules]. In: Kommentarij k Federal'nomu zakonu «O gosudarstvennom jazyke Rossijskoj Federacii». Ch. 1: Doktrinal'nyj i normativno-pravovoj kommentarij [Commentary to the Federal Law «On the state language of the Russian Federation». Pt 1: Doctrinal and legal commentary]. SPb., SPbSU, 2009.

Shherba L. V. O trojakom aspekte jazykovyh javlenij i ob jeksperimente v jazykoznanii [On the threefold aspect of language phenomena and on experiment in linguistics]. In: Jazykovaja sistema i rechevaja dejatel’nost’ [Language system and speech activity]. Leningrad, Nauka, 1974.

Trudy mezhdunarodnoy konferentsii «Korpusnaya lingvistika – 2015» [Proceedings of the international conference «Corpus linguistics – 2015»]. SPb., SPbSU, 2015.

Yazyk i mysl': sovremennaya kognitivnaya lingvistika [Language and thought: modern cognitive linguistics]. M., 2015.
