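A Study on the Description and Recognition of the Object-Predicate Relation in Mongolian Based on Dependency Grammar
Published: 2018-06-02 22:03
Topics: Mongolian treebank + dependency grammar; Reference: Inner Mongolia University, master's thesis, 2017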
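[Abstract (translated from the Chinese original)]: Syntactic parsing is a key technology in Mongolian information processing. In recent years, as information processing work has deepened, the development of application systems such as text proofreading and machine translation has placed higher demands on parsing results. Taking traditional Mongolian grammar research as its theoretical basis, and building on information-processing results such as Mongolian lexical analysis and dependency parsing, this thesis describes the dynamic characteristics of the object-predicate relation in modern Mongolian from a statistical and quantitative perspective, and designs and implements its automatic recognition. The object-predicate relation is a relatively complex type of dependency relation and accounts for a high proportion of the relations in Mongolian sentences. The complex morphology of Mongolian makes it difficult to raise the recognition accuracy for this relation; the main difficulty lies in recognizing direct object-predicate relations in which the accusative case marker is omitted, and in recognizing indirect object-predicate relations. Correctly recognizing the object-predicate relation is of great significance for Mongolian syntactic parsing, mainly in two respects: (1) for traditional linguistic research, statistical methods provide a means of verification and supporting data for the principles of traditional grammar; (2) for information processing, the work expands the treebank corpus and proposes an innovative model for refining research on Mongolian syntactic parsing. The thesis describes the dynamic characteristics of the object-predicate relation and studies its automatic labeling in the following steps. First, the modern Mongolian dependency treebank was expanded, proofread, and improved; the newly proofread treebank reaches 189,048 words in 13,154 sentences. Second, the lexical, collocational, and dependency-syntactic characteristics of the object-predicate relation were analyzed statistically in detail, which provides the necessary theoretical basis for writing recognition rules by hand and for designing machine-learning feature templates. Third, four groups of recognition experiments were carried out: (1) recognition based on a CRF statistical model; (2) the CRF model with manually written rules added; (3) the CRF model with conditionally restricted rules added; (4) the CRF model after the rules were revised. The accuracy reached 89.81%, 89.80%, 89.80%, and 89.73%, respectively.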
[Abstract]: Syntactic parsing is a key technology in Mongolian information processing. In recent years, as information processing has advanced, application systems such as text proofreading and machine translation have placed higher demands on parsing results. Grounded in traditional Mongolian grammar research and in existing results in Mongolian lexical analysis and dependency parsing, this thesis takes a statistical and quantitative approach to describing the dynamic characteristics of the object-predicate relation in modern Mongolian and implements its automatic recognition. The object-predicate relation is a complex dependency relation type that accounts for a high proportion of the relations in Mongolian sentences. The complexity of Mongolian morphology makes it difficult to improve recognition accuracy for this relation. The main difficulty lies in recognizing direct object-predicate relations whose accusative case marker is omitted and in recognizing indirect object-predicate relations. Correct recognition of the object-predicate relation matters for Mongolian syntactic parsing in two ways: (1) for traditional linguistic research, the statistical method provides a means of verification and supporting data for traditional grammar principles; (2) for information processing, the treebank corpus is expanded and an innovative model is proposed for refining Mongolian syntactic analysis. The work proceeds in the following steps. First, the modern Mongolian dependency treebank was expanded, proofread, and improved. The newly proofread treebank reaches 189,048 words and 13,154 sentences.
Second, a detailed statistical analysis was made of the lexical features, collocation features, and dependency-syntactic features of the object-predicate relation in Mongolian, which provides the necessary theoretical basis for hand-writing recognition rules and for designing machine-learning feature templates. Third, four groups of recognition experiments were carried out: (1) recognition based on a CRF statistical model; (2) the CRF model with manually written rules added; (3) the CRF model with conditionally restricted rules added; (4) the CRF model after the rules were revised. The accuracy was 89.81%, 89.80%, 89.80%, and 89.73%, respectively.
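The abstract mentions machine-learning feature templates for the CRF model but does not list them. As a rough illustration only, the sketch below shows what a window-based feature template for token-level labeling could look like. The token fields (word form, part-of-speech tag, case marker), the tag values, and the function names are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical token representation: (word form, POS tag, case marker).
# The tag inventory used here is a placeholder, not the thesis's tag set.
Token = Tuple[str, str, str]


def token_features(sentence: List[Token], i: int) -> Dict[str, str]:
    """Build a feature dict for token i from a +/-1 context window."""
    word, pos, case = sentence[i]
    feats = {"word": word, "pos": pos, "case": case}
    if i > 0:
        feats["prev_pos"] = sentence[i - 1][1]
    else:
        feats["BOS"] = "1"  # beginning of sentence
    if i < len(sentence) - 1:
        next_pos = sentence[i + 1][1]
        feats["next_pos"] = next_pos
        # Combined feature: a noun with no overt case marker directly
        # before a verb is one cue such a model might learn from.
        feats["pos+next_pos"] = pos + "|" + next_pos
    else:
        feats["EOS"] = "1"  # end of sentence
    return feats


def sentence_features(sentence: List[Token]) -> List[Dict[str, str]]:
    """Apply the template to every token in the sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Placeholder tokens standing in for a three-word sentence.
example = [("w1", "N", "NOM"), ("w2", "N", "ZERO"), ("w3", "V", "NONE")]
features = sentence_features(example)
```

Each sentence becomes a list of feature dictionaries, one per token, which a sequence labeler would pair with per-token labels during training.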
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: H212
Article ID: 1970363
Article link: https://www.wllwen.com/shoufeilunwen/zaizhiboshi/1970363.html