The Structuralist Tradition Meets Empirical Data: Corpus Data Enhancing the Czech Internet Language Reference Book

Kováříková, Dominika; Beneš, Martin; Smejkalová, Kamila; Kovářík, Oleg

doi:10.3366/word.2023.0230

dc.contributor.author	Kováříková, Dominika
dc.contributor.author	Beneš, Martin
dc.contributor.author	Smejkalová, Kamila
dc.contributor.author	Kovářík, Oleg
dc.date.accessioned	2025-02-07T10:10:58Z
dc.date.available	2025-02-07T10:10:58Z
dc.date.issued	2023
dc.identifier.uri	https://hdl.handle.net/20.500.14178/2878
dc.description.abstract	This paper demonstrates how the corpus grammar tool GramatiKat can be used to improve and refine morphological information in the Internet Language Reference Book (ILRB), which presents complete declension paradigms for 45,632 standard Czech nouns. The paradigm tables are based mainly on morphological types, following structuralist conceptions of language as a fully articulated system. The paper discusses how to update the ILRB and provide users with empirically based grammatical information for individual word forms in each cell of the paradigm. All noun lemmas have been investigated using the GramatiKat tool for research into grammatical categories in Czech. The tool compares the distribution of word forms of a particular lexeme with the standard distribution across the whole word class. It is capable of identifying nouns that have an unusually high occurrence of a certain word form, as well as nouns with unattested word forms. GramatiKat is based on the data from two corpora of Czech written texts, SYN2015 and SYN2020 (200 million word tokens). The paper investigates the relationship between defectiveness and overabundance on the one hand and language variability and potentiality on the other. Based on the unique combination of data from the ILRB and GramatiKat, the paper suggests how information about unusually frequent or overabundant word forms, as well as unattested ones, should be presented, so that the ILRB provides the user with accurate, empirically based data.	en
dc.language.iso	en
dc.relation.url	https://doi.org/10.3366/word.2023.0230
dc.rights	Creative Commons Uveďte původ 4.0 International	cs
dc.rights	Creative Commons Attribution 4.0 International	en
dc.title	The Structuralist Tradition Meets Empirical Data: Corpus Data Enhancing the Czech Internet Language Reference Book	en
dcterms.accessRights	openAccess
dcterms.license	https://creativecommons.org/licenses/by/4.0/legalcode
dc.date.updated	2025-02-07T10:10:58Z
dc.subject.keyword	Czech	en
dc.subject.keyword	codification	en
dc.subject.keyword	corpora	en
dc.subject.keyword	declension paradigm	en
dc.subject.keyword	defectiveness	en
dc.subject.keyword	GramatiKat	en
dc.subject.keyword	Internet Language Reference Book	en
dc.subject.keyword	overabundance	en
dc.subject.keyword	usage	en
dc.subject.keyword	word form variants	en
dc.identifier.eissn	1755-2036
dc.relation.fundingReference	info:eu-repo/grantAgreement/UK/COOP/COOP
dc.relation.fundingReference	info:eu-repo/grantAgreement/AHRC/AHRC/AH/T002859/1
dc.relation.fundingReference	info:eu-repo/grantAgreement/MSM/LM/LM2023044
dc.date.embargoStartDate	2025-02-07
dc.type.obd	73
dc.type.version	info:eu-repo/semantics/publishedVersion
dc.identifier.doi	10.3366/word.2023.0230
dc.identifier.utWos	001099547400005
dc.identifier.eidScopus	2-s2.0-85179302875
dc.identifier.obd	632865
dc.identifier.riv	RIV/00216208:11210/23:10465205
dc.subject.rivPrimary	60000::60200::60203
dcterms.isPartOf.name	Word Structure
dcterms.isPartOf.issn	1750-1245
dcterms.isPartOf.journalYear	2023
dcterms.isPartOf.journalVolume	16
dcterms.isPartOf.journalIssue	2-3
uk.faculty.primaryId	114
uk.faculty.primaryName	Filozofická fakulta	cs
uk.faculty.primaryName	Faculty of Arts	en
uk.department.primaryId	812
uk.department.primaryName	Ústav Českého národního korpusu	cs
uk.department.primaryName	Institute of the Czech National Corpus	en
dc.description.pageRange	233-257
dc.type.obdHierarchyCs	ČLÁNEK V ČASOPISU::článek v časopisu::původní článek	cs
dc.type.obdHierarchyEn	JOURNAL ARTICLE::journal article::original article	en
dc.type.obdHierarchyCode	73::152::206	en
uk.displayTitle	The Structuralist Tradition Meets Empirical Data: Corpus Data Enhancing the Czech Internet Language Reference Book	en

Files in this record

Name: Structuralist tradition printed ...
Size: 1.554Mb
Format: PDF

This record appears in the following collections

Faculty of Arts