Student Theses
| Thesis No.: | 11684 | |
| Author ID: | 1120170912 | |
| Uploaded: | 2020/6/19 17:22:18 | |
| Chinese title: | 认知计算下的政府公文篇章语义理解研究 | |
| English title: | Discourse Semantic Understanding of Government Documents under Cognitive Computing | |
| Supervisor: | Wang Fang (王芳) | |
| Keywords: | Discourse Semantic Understanding; Government Documents; Cognitive Computing; Deep Learning; Text Mining | |
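| Chinese abstract: | As the highest-level text unit, a discourse consists of a series of consecutive sentences and paragraphs linked by specific semantic structural relations, forming a linguistic whole with coherent meaning. Discourse semantic understanding computationally analyzes the semantic structure of a discourse and the semantic relations among its components. This enables machines to understand text content in depth and helps raise both the intelligent processing of literature information resources and the technical capacity to develop and use them. At present, however, research on discourse semantic understanding concentrates mainly on English discourse, and its results are difficult to apply directly to Chinese discourse because the languages differ. Moreover, Chinese discourse semantics is complex and hard to study, so research results on Chinese discourse semantic understanding remain seriously scarce and cannot effectively meet practical application needs.
Government documents are an important literature information resource, and developing and using them has strong practical significance. Their current level of exploitation is low, however. Intelligent processing and mining based on their discourse structure and semantic features is especially weak, so discourse semantic understanding of government documents is urgently needed. Yet government documents exhibit distinctive discourse semantic phenomena and regularities. Existing discourse semantic understanding theories cannot formally represent these semantics in sufficient depth. Existing computing methods also underperform and are not tailored to the discourse semantic characteristics of government documents. In view of this, this paper studies formal representation and computing methods for discourse semantic understanding of government documents, and explores strategies for applying them.
This paper addresses three major research questions.
- Formal representation: how can a discourse semantic system for government documents be built from their discourse semantic characteristics?
- Computing methods: cognitive computing is the most important development direction in related fields, especially cognitive informatics. How can cognitive computing for discourse semantic understanding of government documents be designed and implemented so that it better meets, or comes closer to meeting, practical application needs?
- Application strategies: what concrete value does discourse semantic understanding have in related application tasks, and what are its application strategies and effects?
To address these questions, the main work of this paper is as follows.
- First, it builds a discourse semantic structure representation and relation system for government documents. This comprises a tree representation and element system for their discourse functional structure, and a tree representation and relation system for the logical semantic structure oriented to functional elements.
- Second, based on this system, it builds a corpus for computational research on discourse semantic understanding of government documents. The corpus contains the body texts of 2,201 government documents, with 46,878 natural paragraphs, 121,835 natural sentences and 6,334,913 characters. It is annotated with 50,004 discourse functional structure element labels, 21,346 logical semantic structure relation labels for paragraph groups and 52,078 logical semantic structure relation labels for sentence groups.
- Third, it proposes, designs and implements a cognitive computing framework, with models and methods, for discourse semantic understanding of government documents. The framework comprises a discourse semantic deep learning model based on multi-head self-attention, a discourse semantic deep reinforcement learning method based on policy gradient, and a discourse semantic active learning method based on three-way decisions and multi-strategy fusion. Their effectiveness is verified experimentally.
- Fourth, it proposes and clarifies basic application strategies for discourse semantic understanding of government documents, and their value. It takes keyword extraction and information extraction, two text mining tasks commonly used in literature and information analysis, as examples, and studies the strategies and effects in unsupervised and supervised machine learning, respectively.
This paper makes the following two innovations:
1. It builds a discourse semantic structure representation and relation system for government documents and, based on this system, a relatively large corpus for their discourse semantic understanding. This provides an effective formal representation basis and annotation support for discourse semantic understanding and intelligent processing of government documents.
2. It designs and implements a discourse semantic deep learning model based on multi-head self-attention, a discourse semantic deep reinforcement learning method based on policy gradient, and a discourse semantic active learning method based on three-way decisions and multi-strategy fusion. Together these provide a more effective set of computational models and methods for discourse semantic understanding of government documents:
(1) In the deep learning model, the multi-head self-attention mechanism is improved into a multi-head partitioned self-attention mechanism computed over feature partitions. Raw feature embeddings are designed for BERT pre-trained language features and for the discourse content, form and position features of government documents. Experimental results show that the model performs excellently.
(2) The deep reinforcement learning method builds on the model above. It adapts the shift-reduce transition system and the policy gradient method to the discourse semantic understanding task. It then constructs a deep policy reinforcement learning model and trains it by policy gradient, obtaining a better transition parsing policy for discourse semantics. Experimental results show that it performs slightly below the deep learning model on some subtasks but improves overall performance more effectively.
(3) The active learning method uses the uncertainty, semi-supervised, representativeness and diversity strategies designed in this paper. It combines them with the sequential idea of three-way decision theory from granular cognitive computing to realize deep active learning of discourse semantics. This lets the model learn actively and optimize iteratively more efficiently on incremental samples, bringing it closer to, or meeting, practical application needs.
It is worth noting that this paper's research approach and workflow for the formal representation, computing methods and applications of discourse semantic understanding of government documents are highly portable and extensible. They can be applied to discourse semantic understanding research and practice in other genres or related fields. 56 figures, 39 tables, 341 references. | |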
| English abstract: | As the top-level text unit, a discourse consists of a series of consecutive sentences and paragraphs connected by specific semantic structures and relations, and forms a linguistic whole with coherent meaning. By analyzing the semantic structure and relations of a discourse, discourse semantic understanding enables machines to understand text content in depth. It is a necessary technical basis for deeper analysis of literature information. It also plays an important role in improving the intelligent mining, processing and use of literature information resources. At present, studies on discourse semantic understanding are mainly limited to English discourse, and their results are difficult to apply directly to Chinese discourse because the languages differ. Moreover, Chinese discourse semantics is complex and hard to study, so research results remain seriously lacking even as practice increasingly needs them. Government documents are an important government information resource, and exploiting them has great practical significance. At present, however, that exploitation is still insufficient, especially in intelligent processing and mining based on discourse structure and semantic features. Discourse semantic understanding technology is therefore urgently needed. Yet government documents have special discourse semantic phenomena and regularities. Existing theories of discourse semantic understanding cannot represent these characteristics deeply and effectively. Existing computing methods also fall short in performance and effectiveness when applied to government documents. In view of this, this paper studies formalized representation and computing methods for the discourse semantics of government documents, and explores strategies for applying them. 
This paper addresses three major research questions.
- Formalized representation: how can a discourse semantic system for government documents be built from their discourse semantic characteristics?
- Computing methods: cognitive computing is the most important development direction in related fields, especially Cognitive Informatics. How can a cognitive computing method for discourse semantic understanding of government documents be designed and implemented so as to better meet actual application needs?
- Application strategies: what specific value does discourse semantic understanding have in related application tasks, and what are its application strategies and effects?
To answer these questions, the main work of this paper is as follows.
- Firstly, this paper builds a discourse semantic structure representation and relation system for government documents. This includes a tree representation and element system for their discourse functional structure, and a tree representation and relation system for the logical semantic structure oriented to discourse functional elements.
- Secondly, based on this system, a corpus for computational research on discourse semantic understanding of government documents is built. It contains 2,201 government documents, 46,878 natural paragraphs, 121,835 natural sentences and 6,334,913 characters. It is annotated with 50,004 discourse functional structure element labels, 21,346 logical semantic structure relation labels for paragraph groups and 52,078 logical semantic structure relation labels for sentence groups.
- Thirdly, a cognitive computing framework for discourse semantic understanding of government documents is proposed and implemented. 
The framework includes a deep learning model for discourse semantic understanding based on a multi-head self-attention mechanism, a deep reinforcement learning method based on policy gradient, and an active learning method based on three-way decision theory and multi-strategy fusion. Their effectiveness is verified experimentally.
- Fourthly, this paper puts forward basic application strategies for discourse semantic understanding of government documents and clarifies their value. It takes keyword extraction and information extraction, two text mining tasks commonly used in literature information analysis, as examples. It then analyzes the application strategies and basic effects in unsupervised and supervised machine learning, respectively.
The innovations of this paper are as follows:
1. This paper builds a discourse semantic structure representation and relation system for government documents and, based on it, a relatively large corpus. This provides an effective formal representation basis and annotation support for discourse semantic understanding and intelligent processing of government documents.
2. This paper designs and implements a deep learning model for discourse semantic understanding based on a multi-head self-attention mechanism, a deep reinforcement learning method based on policy gradient, and an active learning method based on three-way decision theory and multi-strategy fusion. 
Together these provide a more effective set of computational models and methods for discourse semantic understanding of government documents:
(1) In the deep learning model, this paper improves multi-head self-attention into a multi-head partitioned self-attention mechanism computed over feature partitions. It also designs embeddings of raw features: BERT pre-trained language features and the content, form and position features of government documents. The experimental results show the effectiveness of the model.
(2) In the deep reinforcement learning method, this paper adapts the shift-reduce transition system and the policy gradient method to the discourse semantic understanding task. It then builds a deep policy reinforcement learning model and trains it by policy gradient, obtaining a better transition policy for discourse semantic parsing. The experimental results show that it performs slightly below the deep learning model on some subtasks but improves overall performance more effectively.
(3) In the deep active learning method, this paper designs multiple strategies: uncertainty, semi-supervised, representativeness and diversity strategies. It integrates these with the sequential idea of three-way decision theory from granular cognitive computing to implement a deep active learning process for discourse semantic understanding. This lets the model learn actively and optimize iteratively more efficiently on incremental samples, bringing it closer to meeting practical application needs. 
It is worth noting that this paper's research approach and implementation workflow for the formalized representation, computing methods and applications of discourse semantic understanding of government documents are highly portable and extensible. They are also suitable for discourse semantic understanding research and practice in other genres or related fields. 56 figures, 39 tables, 341 references | |
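The abstract says the deep learning model improves standard multi-head self-attention into a multi-head partitioned self-attention mechanism computed over feature partitions. The model is fed by BERT, content, form and position feature embeddings, but the abstract gives no formulas. The Python sketch below assumes one plausible reading, not the thesis's actual design:

- Each head attends only over its own contiguous slice of the feature vector, one feature group per head.
- Each head uses plain scaled dot-product attention with no learned projections.
- The head outputs are concatenated.

The partition sizes and toy feature values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output row is a weighted average of the input rows.
        out.append([sum(w * row[c] for w, row in zip(weights, x)) for c in range(d)])
    return out


def partitioned_multihead_attention(x: Matrix, partition_sizes: List[int]) -> Matrix:
    """One head per contiguous feature partition; head outputs are concatenated.

    Hypothetical reading: each partition holds one feature group
    (e.g. BERT, content, form, position), so heads never mix groups.
    """
    if sum(partition_sizes) != len(x[0]):
        raise ValueError("partition sizes must cover the feature dimension")
    heads = []
    start = 0
    for size in partition_sizes:
        heads.append(self_attention([row[start:start + size] for row in x]))
        start += size
    return [[v for head in heads for v in head[i]] for i in range(len(x))]


# Three discourse units; hypothetical layout: 2 "BERT" dims, then 1 dim
# each for content, form and position features.
units = [[1.0, 0.0, 2.0, 0.5, 0.0], [0.0, 1.0, 1.0, 0.5, 1.0], [1.0, 1.0, 0.0, 0.5, 2.0]]
out = partitioned_multihead_attention(units, [2, 1, 1, 1])
```

No head sees another head's features, so changing the BERT slice leaves the attention over the form or position slices untouched. The abstract does not say whether the thesis adds learned projections or cross-partition mixing.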
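The deep reinforcement learning method is described as adapting a shift-reduce transition system to discourse parsing. Below is a minimal sketch of such a system. It assumes binary trees over discourse units (paragraphs or sentences) and a REDUCE action that carries a relation label. The relation names `Elaboration` and `Sequence`, and the helpers `legal_actions`, `parse` and `render`, are hypothetical placeholders, not the thesis's relation inventory or code.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Node:
    relation: str
    left: "Tree"
    right: "Tree"


Tree = Union[str, Node]  # a leaf is just the unit's id or text


def legal_actions(stack_size: int, queue_size: int) -> List[str]:
    """Actions a policy may choose from in the current configuration."""
    actions = []
    if queue_size > 0:
        actions.append("SHIFT")
    if stack_size >= 2:
        actions.append("REDUCE")
    return actions


def parse(units: List[str], actions: List[Tuple[str, str]]) -> Tree:
    """Apply SHIFT / REDUCE(relation) actions and return the single resulting tree."""
    stack: List[Tree] = []
    queue = list(units)
    for name, relation in actions:
        if name not in legal_actions(len(stack), len(queue)):
            raise ValueError(f"illegal action {name!r}")
        if name == "SHIFT":
            # Move the next discourse unit from the queue onto the stack.
            stack.append(queue.pop(0))
        else:
            # Combine the top two subtrees under the given relation label.
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(relation, left, right))
    if queue or len(stack) != 1:
        raise ValueError("action sequence does not yield a complete tree")
    return stack[0]


def render(tree: Tree) -> str:
    """Bracketed string form of a tree, e.g. (Rel left right)."""
    if isinstance(tree, str):
        return tree
    return f"({tree.relation} {render(tree.left)} {render(tree.right)})"


tree = parse(
    ["P1", "P2", "P3"],
    [("SHIFT", ""), ("SHIFT", ""), ("REDUCE", "Elaboration"),
     ("SHIFT", ""), ("REDUCE", "Sequence")],
)
print(render(tree))  # (Sequence (Elaboration P1 P2) P3)
```

In a learned parser, a policy would pick one of `legal_actions(...)` at each step instead of following a scripted action list.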
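The abstract states that the transition parsing policy is trained with policy gradient. To illustrate that technique only, the sketch below implements a REINFORCE update for a linear softmax policy over transition actions. The thesis's deep policy network is replaced by a linear model to keep the example self-contained. The state features, the reward (for instance +1 when the parsed tree matches the gold tree), the action indices and the learning rate are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Episode = List[Tuple[Sequence[float], int, float]]  # (state features, action, reward)


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta: List[List[float]], phi: Sequence[float]) -> List[float]:
    """Linear softmax policy: one weight row per action, phi = state features."""
    return softmax([sum(w * f for w, f in zip(row, phi)) for row in theta])


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Returns-to-go G_t = r_t + gamma * G_{t+1}, computed backwards."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def reinforce_update(theta: List[List[float]], episode: Episode,
                     gamma: float = 1.0, lr: float = 0.1) -> List[List[float]]:
    """One REINFORCE step: theta += lr * sum_t G_t * grad log pi(a_t | s_t).

    For a linear softmax policy,
    d log pi(a|s) / d theta[b] = (1[b == a] - pi(b|s)) * phi(s).
    Gradients use the pre-update theta; the input is not mutated.
    The gamma**t factor of the discounted objective is omitted, as is common.
    """
    returns = discounted_returns([r for _, _, r in episode], gamma)
    grad = [[0.0] * len(row) for row in theta]
    for (phi, action, _), g in zip(episode, returns):
        probs = action_probs(theta, phi)
        for b in range(len(theta)):
            coef = g * ((1.0 if b == action else 0.0) - probs[b])
            for i, f in enumerate(phi):
                grad[b][i] += coef * f
    return [[w + lr * d for w, d in zip(row, drow)] for row, drow in zip(theta, grad)]


# Hypothetical episode: action 0 = SHIFT, 1 = REDUCE; reward only at the end.
episode = [([1.0, 0.0], 0, 0.0), ([0.0, 1.0], 1, 1.0)]
theta = reinforce_update([[0.0, 0.0], [0.0, 0.0]], episode, gamma=0.9)
```

Positive returns raise the probability of the actions taken, and negative returns lower it. Shaping rewards from tree-level evaluation is one common choice, but the abstract does not say which reward the thesis uses.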
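The active learning method combines uncertainty, semi-supervised, representativeness and diversity strategies with the sequential idea of three-way decisions. The sketch below shows one way these pieces could fit together. It rests on assumptions not taken from the thesis:

- Confidence is the model's maximum class probability.
- Hypothetical thresholds `alpha` and `beta` split the unlabeled pool into three regions:
  - accept: keep a pseudo-label, the semi-supervised strategy;
  - query: send the sample to an annotator;
  - defer: revisit it next round.
- Query candidates are ranked greedily by three scores, combined with equal hypothetical weights:
  - normalized entropy (uncertainty);
  - mean cosine similarity to the pool (representativeness);
  - distance from already chosen samples (diversity).

```python
import math
from typing import List, Sequence, Tuple

Probs = List[Sequence[float]]


def three_way_partition(probs: Probs, alpha: float,
                        beta: float) -> Tuple[List[int], List[int], List[int]]:
    """Split unlabeled samples by confidence (max class probability).

    confidence >= alpha -> accept: use the model's label as a pseudo-label
    confidence <= beta  -> query: send to a human annotator
    otherwise           -> defer: re-examine in the next round (sequential step)
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("need 0 <= beta < alpha <= 1")
    accept, query, defer = [], [], []
    for i, p in enumerate(probs):
        c = max(p)
        if c >= alpha:
            accept.append(i)
        elif c <= beta:
            query.append(i)
        else:
            defer.append(i)
    return accept, query, defer


def normalized_entropy(p: Sequence[float]) -> float:
    """Entropy / log(#classes): 1.0 means a uniform prediction (needs >= 2 classes)."""
    return -sum(x * math.log(x) for x in p if x > 0) / math.log(len(p))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def select_for_annotation(probs: Probs, feats: List[Sequence[float]],
                          candidates: List[int], k: int,
                          weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[int]:
    """Greedily pick up to k candidates by uncertainty + representativeness + diversity."""
    w_unc, w_rep, w_div = weights
    chosen: List[int] = []

    def representativeness(i: int) -> float:
        # Mean similarity to every other sample: typical samples score high.
        sims = [cosine(feats[i], feats[j]) for j in range(len(feats)) if j != i]
        return sum(sims) / len(sims) if sims else 0.0

    def score(i: int) -> float:
        # Diversity: 1 minus similarity to the closest sample already chosen.
        div = 1.0 - max((cosine(feats[i], feats[j]) for j in chosen), default=0.0)
        return (w_unc * normalized_entropy(probs[i])
                + w_rep * representativeness(i) + w_div * div)

    pool = list(candidates)
    while pool and len(chosen) < k:
        best = max(pool, key=score)
        chosen.append(best)
        pool.remove(best)
    return chosen


probs = [[0.95, 0.05], [0.5, 0.5], [0.7, 0.3], [0.55, 0.45]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.1, 1.0]]
accept, query, defer = three_way_partition(probs, alpha=0.9, beta=0.6)
to_annotate = select_for_annotation(probs, feats, query, k=1)
```

Deferring boundary-region samples, rather than forcing a decision, is what makes the procedure sequential. A deferred sample is re-scored after the model is retrained on the newly labeled and pseudo-labeled data.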
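For the unsupervised application, the abstract uses keyword extraction as its example but does not describe how discourse structure enters the scoring. One simple possibility, sketched here purely as an assumption, is structure-weighted term frequency. Each occurrence of a term counts with the weight of the functional structure element it appears in. The element labels (`purpose`, `measures`, `closing`), the weights and the tokens are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def structure_weighted_keywords(segments: List[Tuple[str, List[str]]],
                                element_weights: Dict[str, float],
                                top_n: int) -> List[str]:
    """Score each term by the summed weight of the functional elements it occurs in.

    segments: (functional element label, tokens) pairs, one per sentence or paragraph.
    Elements missing from element_weights get weight 1.0.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores: Counter = Counter()
    for element, tokens in segments:
        w = element_weights.get(element, 1.0)
        for token in tokens:
            scores[token] += w
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


segments = [
    ("purpose", ["digital", "government"]),
    ("measures", ["platform", "digital"]),
    ("closing", ["notice", "government"]),
]
weights = {"purpose": 2.0, "measures": 1.5, "closing": 0.5}
print(structure_weighted_keywords(segments, weights, 3))  # ['digital', 'government', 'platform']
```

With all weights equal, this reduces to plain term frequency. The functional structure labels are what let sections such as a document's stated purpose count more than boilerplate closing text.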