A Contrastive Study of Lexical Knowledge in Modern Chinese and Chinese Interlanguage Based on Whole-Sentence Co-occurrence
Published: 2018-11-26 17:02
【Abstract】: Word co-occurrence is an important component of lexical knowledge and has received considerable attention in previous research, where the co-occurrence window was mostly limited to five words on either side of the target word. Based on the characteristics of Chinese and the needs of this study, this thesis extends the window to the whole sentence and develops a program, "Automatic Extraction of Whole-Sentence Co-occurrence for Chinese Words", which runs on a Modern Chinese corpus and a Chinese interlanguage corpus and provides reliable material for describing lexical knowledge. For a specified word, the program automatically extracts from the Modern Chinese corpus its co-occurring words, their distances from the target, and their senses; from the interlanguage corpus it extracts co-occurring words, their distances, and their parts of speech, together with the learner's Chinese proficiency level, native-language background, and related information. It can also count frequencies and sort the results as the researcher requires. The co-occurrence information the program produces can be used to record a word's lexical knowledge, can serve as part of a word-meaning representation in computational simulation studies, and can support contrastive analysis of interlanguage.
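The abstract does not say how the extraction program is implemented. As a minimal sketch of the whole-sentence idea, assuming the corpora are already word-segmented and that "distance" means the signed token offset from the target word, the Python below pairs every occurrence of a target word with every other word in its sentence, attaches sentence-level metadata, and counts and sorts the results. The function names and the `level` field are hypothetical, not taken from the thesis.

```python
from collections import Counter


def extract_cooccurrences(sentences, target):
    """Return one record per (target occurrence, other word) pair in a sentence.

    `sentences` is a list of (tokens, meta) pairs. `tokens` is assumed to be an
    already word-segmented sentence; `meta` is a dict of sentence-level fields
    (hypothetical keys such as "level" or "l1" for interlanguage data).
    """
    records = []
    for tokens, meta in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            # Whole-sentence window: every other token in the sentence counts,
            # not only those within five positions of the target.
            for j, other in enumerate(tokens):
                if j != i:
                    # Signed distance in tokens: negative = left of the target.
                    records.append({"word": other, "distance": j - i, **meta})
    return records


def rank_by_frequency(records, **filters):
    """Count co-occurring words, keeping only records whose meta fields match
    every keyword filter, and return (word, count) pairs, most frequent first."""
    counts = Counter(
        r["word"] for r in records
        if all(r.get(key) == value for key, value in filters.items())
    )
    # Ties are broken by the word itself so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    corpus = [
        (["我", "看", "书"], {"level": "A"}),
        (["他", "看", "了", "书"], {"level": "B"}),
        (["我们", "一起", "看", "电影"], {"level": "A"}),
    ]
    recs = extract_cooccurrences(corpus, "看")
    print(rank_by_frequency(recs)[0])  # ('书', 2)
    print(rank_by_frequency(recs, level="A"))
```

Per-token annotations such as the sense of a co-occurring word (Modern Chinese corpus) or its part of speech (interlanguage corpus) could be added to each record in the same inner loop; keeping one flat record per pair makes filtering by proficiency level or native language a simple keyword match.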
This thesis compares the co-occurrence information of 看 ("look, see, read") between the Modern Chinese corpus and the interlanguage corpus, and among the proficiency-level subcorpora of the interlanguage corpus. The co-occurring words of 看 in both corpora are classified semantically according to Tongyici Cilin (《同义词词林》), and the thesis measures how far each class is overused or underused in the interlanguage relative to Modern Chinese. It also examines how co-occurrence usage differs across interlanguage proficiency levels. In this way, research on the acquisition of Chinese as a second language moves beyond traditional error analysis to an in-depth, co-occurrence-based examination of usage differences between the interlanguage and Modern Chinese. The main conclusions are as follows. Among the semantic classes of the co-occurring words of 看, the most overused major class in the interlanguage relative to Modern Chinese is "auxiliary words" (助语), and the most underused is "activities" (活动). The three most overused medium classes are, in order, "abstract things / culture and education", "abstract things / society, politics and law", and "things / landforms"; the three most underused are, in order, "people / proper names", "things / whole body", and "activities / administration". Across the four proficiency-level subcorpora, the major-class distribution of the co-occurring words of 看 fluctuates: it differs most from Modern Chinese at one and a half to two years of study, and thereafter converges toward Modern Chinese as proficiency rises.
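The abstract does not state which measure of overuse and underuse the thesis uses. A minimal sketch, assuming a ratio of class shares (learner share divided by native share) for overuse/underuse and total variation distance for comparing whole distributions across proficiency levels, both of which are choices made here for illustration, could look like this. The `classify` callable stands in for a Tongyici Cilin lookup; the toy labels in the usage below are illustrative only and have not been checked against Cilin.

```python
from collections import Counter


def class_distribution(words, classify):
    """Share of co-occurring word tokens that fall into each semantic class.

    `classify` stands in for a Tongyici Cilin lookup: it maps a word to a
    class label (major or medium class) or returns None if the word is unlisted.
    """
    counts = Counter(label for label in map(classify, words) if label is not None)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def usage_ratios(learner, native):
    """Learner share divided by native share for each class, largest first.

    A ratio above 1 suggests overuse and below 1 underuse; a class with no
    native occurrences gets float("inf").
    """
    ratios = {}
    for label in set(learner) | set(native):
        p_native = native.get(label, 0.0)
        p_learner = learner.get(label, 0.0)
        ratios[label] = p_learner / p_native if p_native > 0 else float("inf")
    # Sort by ratio descending, then by label so ties come out in a fixed order.
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))


def total_variation(p, q):
    """Half the L1 distance between two distributions: 0 = identical, 1 = disjoint."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in labels)


def most_divergent_level(level_distributions, native):
    """Proficiency level whose class distribution is farthest from the native one."""
    return max(level_distributions,
               key=lambda level: total_variation(level_distributions[level], native))


if __name__ == "__main__":
    # Toy word-to-class labels, for illustration only.
    toy = {"了": "助语", "吧": "助语", "书": "物", "开会": "活动", "老师": "人"}
    native = class_distribution(["了", "开会", "开会", "老师"], toy.get)
    learner = class_distribution(["了", "吧", "书", "开会"], toy.get)
    print(usage_ratios(learner, native))
    print(total_variation(learner, native))  # 0.5
```

Working with shares rather than raw counts keeps corpora of different sizes comparable; passing major-class or medium-class labels from `classify` gives the two levels of granularity the abstract reports.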
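【Degree-granting institution】: Beijing Language and Culture University
【Degree level】: Master's
【Year awarded】: 2017
【Classification number】: H136
Article ID: 2359126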
Link to this article: https://www.wllwen.com/wenyilunwen/yuyanyishu/2359126.html