Student Thesis
| Thesis ID: | 9320 | |
| Author ID: | 1120140830 | |
| Upload time: | 2017/6/20 8:43:50 | |
| Chinese title: | 面向食品安全管理的风险信息元素抽取研究 | |
| English title: | Study on Risk Information Elements Extraction for Food Safety Management | |
| Supervisor: | Wang Fang (王芳) | |
| Chinese keywords: | 风险信息;信息抽取;食品安全;社交媒体;风险管理 | |
| English keywords: | risk information; information extraction; food safety; social media; risk management | |
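| Chinese abstract (English translation): | Risk management is one of the major challenges facing modern society. Against the backdrop of Internet development and big data applications, how to identify and control risk more effectively has drawn attention from both management practice and academic research. This thesis aims to offer an information analysis perspective on risk management, to reveal the main elements of risk information, and to explore effective data collection and information extraction methods, so as to enable fine-grained mining and use of Internet information resources such as social media and industry websites in support of risk management decisions. The study takes food quality and safety risk management as its case. It first uses theoretical deduction together with survey methods such as interviews and questionnaires to define risk information and describe it in a standardized way, to reveal its main elements, and to explain the characteristics of and interactions among different categories of risk information. It then examines the characteristics of Internet information resources with reference to open data standards, builds a word list, collects social media and other Internet data, and uses natural language processing methods to explore strategies for extracting information elements. Throughout, data observation and comparative experiments are used to optimize the extraction methods. The main conclusions are as follows. First, risk information differs in content, source, updating, and nature, so it should be divided into static risk information and dynamic risk information. The element set of static risk information can be represented as a quintuple (hazard factor name, condition, loss type, probability, loss coefficient), and dynamic risk information as a quadruple (subject name, action, object, relationship). Second, Internet risk information resources related to food safety are rich in content, searchable, accessible, and promptly updated, but their data structures vary, information credibility differs widely, the density of useful information is low, and redundancy is high. Combining a search word list with a web crawler makes it possible to collect risk information data. The experiment built a word list of 100,000 entries and an experimental food safety risk information corpus of 80,000 microblog posts. Third, a method combining dependency semantic role templates with filtering can extract the main elements of dynamic risk information from microblog text in a single pass, with a recall of 70.77%. After improvement with a microblog fusion method based on character representation and dependency semantic role similarity, the accuracy rate rises to 67.63% with no significant drop in recall. | |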
| English abstract: | Risk management is one of the difficult problems of modern society and also a multidisciplinary research field. With the development of the Internet and big data, how to identify and control risk more effectively has attracted growing attention in management practice and academic research. The purpose of this paper is to provide an information resource management perspective for risk management research, to reveal the connotation and characteristics of risk information, and to develop information collection and analysis methods for mining social media and other Internet information resources in support of risk management decisions. The paper takes the risk management of food quality and safety as its research case. It uses theoretical deduction, interviews, and questionnaire surveys to reveal the characteristics of risk information and establish a risk information management model, and then uses open data standards to examine the characteristics of Internet information resources. Finally, web crawler techniques and natural language processing methods are applied and developed for collecting social media and other Internet data, for information extraction, and for information fusion. In this process, data observation and controlled experiments are used to optimize the information extraction methods. The main conclusions are as follows. Firstly, risk information is divided into two categories: static risk information and dynamic risk information. The two types differ in sources, main elements, and updating speed. The set of static risk information elements is a quintuple (hazard factor name, condition, loss type, probability, loss coefficient); dynamic risk information is a quadruple (entity name, action, object, relationship).
Secondly, Internet information resources related to food safety risk are rich, accessible, and promptly updated; however, the information varies in data structure and credibility. The density of useful information is low, while redundancy is high. A method based on keyword searching and a web crawler is effective for risk information collection. A word list of 100 thousand items and a corpus of 80 thousand microblog posts were built in the experiment. Thirdly, this paper develops a method that can extract the main risk information elements at the same time. The method uses dependency grammar analysis and a vocabulary selection strategy. The recall rate is 70.77%. When improved by a microblog fusion strategy using character features and the similarity of dependency roles, the accuracy rate reaches 67.63% with no significant drop in recall. | |
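The quintuple and quadruple described in the abstract can be pictured as plain records. The following is a minimal sketch only, not the thesis's implementation: the class names, field names, the `precision_recall` helper, and the sample values are all hypothetical, and treating the abstract's "accuracy rate" as precision over extracted tuples is an assumption.

```python
from typing import NamedTuple, Set, Tuple


class StaticRisk(NamedTuple):
    """Quintuple for static risk information (field names are illustrative)."""
    hazard_name: str
    condition: str
    loss_type: str
    probability: float
    loss_coefficient: float


class DynamicRisk(NamedTuple):
    """Quadruple for dynamic risk information (field names are illustrative)."""
    subject_name: str
    action: str
    obj: str
    relationship: str


def precision_recall(extracted: Set[DynamicRisk],
                     gold: Set[DynamicRisk]) -> Tuple[float, float]:
    """Exact-match precision and recall of extracted tuples against a gold set."""
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Made-up sample data, for illustration only.
a = DynamicRisk("vendor A", "sold", "expired milk", "seller-product")
b = DynamicRisk("factory B", "added", "illegal additive", "producer-ingredient")
c = DynamicRisk("shop C", "recalled", "rice", "seller-product")

gold = {a, b}
extracted = {a, c}
precision, recall = precision_recall(extracted, gold)
```

Exact tuple matching is the strictest possible scoring choice; an evaluation that credits partially correct tuples would need a per-field comparison instead.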