Development of a Hot-Event News Corpus and a Study of Its Vocabulary
Posted: 2018-02-23 19:10
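Keywords: hot events; news register; corpus; word frequency statistics; initiation-persistence pattern. Source: Nanjing Normal University, master's thesis, 2012. Type: degree thesis.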
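【Abstract】: Research on news language in China has produced a number of results, and studies of it have appeared steadily in journals and as books. That research, however, usually starts from writing and rhetoric. It asks how language can meet the demands of news writing and how news language can be made more expressive. Studies of news language based on a corpus of social hot-event reporting remain rare.

This study examines news language from a linguistic perspective, using linguistic theory. First, it reviews the Modern Chinese Register Information Database. The completed everyday, legal, business and sports register databases have supplied first-hand data for register-specific language research, and corpus-based register studies have already produced results. The present study contributes the news-register part of that database.

Second, it builds the "Social Hot Event News Corpus". The study collected the social hot events reported in the Yangzi Evening News (《扬子晚报》) throughout 2009 and, after applying selection criteria, retained 489,000 characters of hot-event reporting. About 70% of this material was in PDF form. OCR software was used to convert it into Word documents, and the text was proofread during conversion to keep the corpus accurate. To make later retrieval and proofreading easier, the texts were also classified and coded. The resulting news-register corpus covers 33 hot events in 365 files. Every news item carries its own code together with its headline, publication date, reporter, page, and character count. With the data attributes and corpus-construction principles settled, the corpus was processed in depth following the standard corpus-construction steps. Word segmentation and part-of-speech tagging were performed automatically and then corrected by hand. The thesis discusses problems that arose during segmentation and tagging so that the annotation suits the linguistic features of the news register. This lays a foundation for corpus-based research on news language. The end product is an annotated corpus.

Finally, the study uses the Social Hot Event News Corpus for vocabulary research. Word-frequency statistics over the corpus yield the "Hot Event News Vocabulary Frequency Table", from which the "Basic Vocabulary List of Hot Event News" is compiled. The hot-event word list, made up of the high-frequency words, the sub-high-frequency words and part of the mid-frequency words, was then compared with a general-purpose word list. After screening, this comparison produced 216 special words. These words were classified according to their meaning and their distribution in the corpus. Every word was checked by retrieving it in the corpus. Following how hot events unfold, the words fall into six categories: "time expressions", "event description", "online push", "media intervention", "judicial intervention" and "event impact". The classification is not a subjective judgment. It is corpus-based, and each word's category is determined by its distribution in the corpus. From this classification, the study then traces the initiation-persistence pattern of hot events.

The study combines quantitative and qualitative methods. The Social Hot Event News Corpus and the Basic Vocabulary List of Hot Event News can serve as references for journalism teaching, news-dictionary compilation and the development of news linguistics. The reporting pattern the study identifies for hot events also offers some insight for news gathering, editing and reporting.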
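The frequency-list and special-vocabulary steps described above can be illustrated with a short Python sketch. It does not reproduce the author's procedure. It assumes the corpus has already been segmented into word tokens, and it replaces the thesis's high, sub-high and mid-frequency bands with a single hypothetical cut-off, `min_count`. The function names `frequency_table` and `special_candidates` and the toy data are invented for illustration.

```python
from collections import Counter

def frequency_table(docs):
    """Count word frequencies over already-segmented documents.

    `docs` is an iterable of token lists. Returns (word, count) pairs sorted
    by descending count, with ties ordered by the word itself.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

def special_candidates(table, general_words, min_count):
    """Return words frequent enough to keep that are absent from a general word list.

    `min_count` is a hypothetical stand-in for the frequency bands used in the
    thesis; `general_words` stands in for the general-purpose word list.
    """
    return [word for word, count in table
            if count >= min_count and word not in general_words]

if __name__ == "__main__":
    # Invented toy data: three "news items", already segmented.
    docs = [["网友", "曝光", "事件"],
            ["网友", "发帖", "曝光"],
            ["记者", "事件", "网友"]]
    table = frequency_table(docs)
    print(table[0])                                        # ('网友', 3)
    print(special_candidates(table, {"事件", "记者"}, 2))   # ['网友', '曝光']
```

In the thesis the candidates were further screened and checked through corpus retrieval, so a filter like this would only produce a starting list.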
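The per-item coding and the corpus-based check of where a word occurs, both described above, could be represented roughly as follows. This is a hypothetical sketch. The thesis does not publish its coding scheme, so the code format `E<event>-<item>`, the field names and the `distribution` helper are assumptions. Here "distribution" is simplified to the number of events and files in which a word appears.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class NewsItem:
    """One coded news item carrying the metadata listed in the abstract.

    Field names and the code format are hypothetical.
    """
    code: str        # assumed format "E<event>-<item>", e.g. "E07-012"
    headline: str
    date: str        # publication date, e.g. "2009-06-15"
    reporter: str
    page: str        # newspaper page or section
    tokens: list[str] = field(default_factory=list)  # segmented text

    @property
    def event_id(self) -> str:
        # The part of the code before the first hyphen identifies the hot event.
        return self.code.split("-", 1)[0]

    @property
    def char_count(self) -> int:
        # Total characters across all tokens (punctuation too, if kept as tokens).
        return sum(len(token) for token in self.tokens)

def distribution(word: str, items: list[NewsItem]) -> tuple[int, int]:
    """Return (number of distinct events, number of files) containing `word`."""
    hits = [item for item in items if word in item.tokens]
    return len({item.event_id for item in hits}), len(hits)
```

The abstract does not say how such counts map onto the six categories, so that step is left out here.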
【Degree-granting institution】: Nanjing Normal University
【Degree】: Master's
【Year conferred】: 2012
【Classification number】: H136
Article ID: 1527274
Link: https://www.wllwen.com/wenyilunwen/hanyulw/1527274.html