Development of a Hot Event News Corpus and a Study of Its Vocabulary

Published: 2018-02-23 19:10

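Keywords: hot events; news register; corpus; word frequency statistics; initiation-persistence pattern　Source: Nanjing Normal University, 2012 master's thesis　Document type: degree thesis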

[Abstract]: Research on news language in China has produced a body of papers and monographs, but it usually starts from writing and rhetoric, discussing how language should adapt to the requirements of news writing and how to strengthen the expressive effect of news language. Studies of news language based on a corpus of social hot events are rare. This study examines news language from a linguistic perspective, applying linguistic theory.
First, the study reviews the Modern Chinese Register Information Database. The completed everyday, legal, business, and sports register databases provide first-hand material for register-specific language research, and corpus-based register-specific research has already achieved some results. This study forms the news register part of that database.
Second, the study builds the "Social Hot Event News Corpus". It collects the social hot events reported in the Yangtse Evening Post (《扬子晚报》) throughout 2009 and, applying the selection criteria, retains 489,000 characters of hot event reports. About 70% of the material is in PDF form and has to be converted to Word format with OCR software, with proofreading during conversion to ensure the accuracy of the texts. To simplify later lookup and proofreading, the texts are also classified and coded. The corpus covers 33 hot events in 365 files; each news item has a code and is accompanied by its headline, report date, reporter, page, and character count. Once the corpus attributes and construction principles are fixed, the corpus is processed in depth following the standard development steps. The study uses automatic word segmentation and part-of-speech tagging, supplemented by manual proofreading. Problems that arise during segmentation and tagging are discussed so that the scheme fits the linguistic characteristics of the news register, laying a foundation for corpus-based research on news language. The result is a tagged corpus.
Finally, the study puts the "Social Hot Event News Corpus" to use. Word frequency statistics over the corpus produce the "Hot Event News Vocabulary Frequency Table", from which the "Basic Vocabulary List of Hot Event News" is compiled. The hot event news word list (high-frequency words, sub-high-frequency words, and some mid-frequency words) is compared with a general word list, and screening yields 216 special words, which are classified with reference to their semantics and their distribution in the corpus. Every word is looked up in the corpus again, and, following the way hot events unfold, the words are divided into six categories: "time expressions", "event description", "online promotion", "media involvement", "judicial involvement", and "event impact". The classification is not a subjective judgment: it is based on the corpus, and the distribution of a word in the corpus determines its category. On the basis of this classification, the study traces the initiation-persistence pattern of hot events. The study combines quantitative and qualitative methods. The completed corpus and the extracted basic vocabulary list provide a reference for journalism teaching, the compilation of news dictionaries, and the development of news linguistics, and the reporting pattern identified offers some guidance for news gathering, editing, and reporting.
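The word frequency statistics and the comparison against a general word list described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual procedure or tooling: it assumes the tagged corpus is stored as lines of space-separated `word/tag` tokens, that punctuation tags begin with `w`, and that the general word list is available as a set. The function names, the `min_freq` threshold, and the sample sentences are all hypothetical.

```python
from collections import Counter


def word_frequencies(tagged_lines):
    """Count words in lines of space-separated word/tag tokens (assumed format)."""
    counts = Counter()
    for line in tagged_lines:
        for token in line.split():
            # Split on the last "/" so a word containing "/" keeps its text.
            word, sep, tag = token.rpartition("/")
            if not sep:
                # Token has no tag: treat the whole token as the word.
                word, tag = token, ""
            if not word or tag.startswith("w"):
                # Skip empty words and (assumed) punctuation tags.
                continue
            counts[word] += 1
    return counts


def special_candidates(counts, general_words, min_freq=2):
    """Return (word, count) pairs at or above min_freq that are absent
    from the general word list, most frequent first."""
    return [
        (word, n)
        for word, n in counts.most_common()
        if n >= min_freq and word not in general_words
    ]


if __name__ == "__main__":
    # Invented sample sentences, for illustration only.
    sample = [
        "网友/n 发帖/v 曝光/v 事件/n 。/w",
        "法院/n 开庭/v 审理/v 事件/n 。/w",
        "网友/n 关注/v 法院/n 判决/n 。/w",
    ]
    general_words = {"事件", "关注"}  # hypothetical general word list
    counts = word_frequencies(sample)
    for word, n in special_candidates(counts, general_words):
        print(word, n)
```

The candidates returned this way are only a starting point. In the study, each special word is then checked against its contexts in the corpus before it is assigned to one of the six categories.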
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: H136

[References]