The Structuralist Tradition Meets Empirical Data: Corpus Data Enhancing the Czech Internet Language Reference Book

dc.contributor.author: Kováříková, Dominika
dc.contributor.author: Beneš, Martin
dc.contributor.author: Smejkalová, Kamila
dc.contributor.author: Kovářík, Oleg
dc.date.accessioned: 2025-02-07T10:10:58Z
dc.date.available: 2025-02-07T10:10:58Z
dc.date.issued: 2023
dc.identifier.uri: https://hdl.handle.net/20.500.14178/2878
dc.description.abstract: This paper demonstrates how the corpus grammar tool GramatiKat can be used to improve and refine morphological information in the Internet Language Reference Book (ILRB), which presents complete declension paradigms for 45,632 standard Czech nouns. The paradigm tables are based mainly on morphological types, following structuralist conceptions of language as a fully articulated system. The paper discusses how to update the ILRB and provide users with empirically based grammatical information for individual word forms in each cell of the paradigm. All noun lemmas have been investigated using the GramatiKat tool for research into grammatical categories in Czech. The tool observes the distribution of word forms of a particular lexeme in comparison with the standard distribution across the whole word class. It is capable of identifying nouns that have an unusually high occurrence of a certain word form, as well as nouns with unattested word forms. GramatiKat is based on the data from two corpora of Czech written texts, SYN2015 and SYN2020 (200 million word tokens). The paper investigates the relationship between defectiveness and overabundance on one side and language variability and potentiality on the other. Based on the unique combination of data from the ILRB and GramatiKat, the paper suggests how information about unusually frequent or overabundant word forms as well as unattested ones should be pointed out, so that ILRB provides the user with accurate, empirically based data. [en]
dc.language.iso: en
dc.relation.url: https://doi.org/10.3366/word.2023.0230
dc.rights: Creative Commons Uveďte původ 4.0 International [cs]
dc.rights: Creative Commons Attribution 4.0 International [en]
dc.title: The Structuralist Tradition Meets Empirical Data: Corpus Data Enhancing the Czech Internet Language Reference Book [en]
dcterms.accessRights: openAccess
dcterms.license: https://creativecommons.org/licenses/by/4.0/legalcode
dc.date.updated: 2025-02-07T10:10:58Z
dc.subject.keyword: Czech [en]
dc.subject.keyword: codification [en]
dc.subject.keyword: corpora [en]
dc.subject.keyword: declension paradigm [en]
dc.subject.keyword: defectiveness [en]
dc.subject.keyword: GramatiKat [en]
dc.subject.keyword: Internet Language Reference Book [en]
dc.subject.keyword: overabundance [en]
dc.subject.keyword: usage [en]
dc.subject.keyword: word form variants [en]
dc.identifier.eissn: 1755-2036
dc.relation.fundingReference: info:eu-repo/grantAgreement/UK/COOP/COOP
dc.relation.fundingReference: info:eu-repo/grantAgreement/AHRC/AHRC/AH/T002859/1
dc.relation.fundingReference: info:eu-repo/grantAgreement/MSM/LM/LM2023044
dc.date.embargoStartDate: 2025-02-07
dc.type.obd: 73
dc.type.version: info:eu-repo/semantics/publishedVersion
dc.identifier.doi: 10.3366/word.2023.0230
dc.identifier.utWos: 001099547400005
dc.identifier.eidScopus: 2-s2.0-85179302875
dc.identifier.obd: 632865
dc.identifier.riv: RIV/00216208:11210/23:10465205
dc.subject.rivPrimary: 60000::60200::60203
dcterms.isPartOf.name: Word Structure
dcterms.isPartOf.issn: 1750-1245
dcterms.isPartOf.journalYear: 2023
dcterms.isPartOf.journalVolume: 16
dcterms.isPartOf.journalIssue: 2-3
uk.faculty.primaryId: 114
uk.faculty.primaryName: Filozofická fakulta [cs]
uk.faculty.primaryName: Faculty of Arts [en]
uk.department.primaryId: 812
uk.department.primaryName: Ústav Českého národního korpusu [cs]
uk.department.primaryName: Institute of the Czech National Corpus [en]
dc.description.pageRange: 233-257
dc.type.obdHierarchyCs: ČLÁNEK V ČASOPISU::článek v časopisu::původní článek [cs]
dc.type.obdHierarchyEn: JOURNAL ARTICLE::journal article::original article [en]
dc.type.obdHierarchyCode: 73::152::206 [en]
uk.displayTitle: The Structuralist Tradition Meets Empirical Data: Corpus Data Enhancing the Czech Internet Language Reference Book [en]

