Student Theses

Thesis Search Result

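Thesis ID: 8424
Author ID: 2120142543
Uploaded: 2016/6/12 0:22:09
Chinese title: 基于互联网搜索数据的流感预警模型比较与优化 (Comparison and Optimization of Influenza Early-Warning Models Based on Internet Search Data)
English title: Detecting influenza epidemics by comparing
Advisor: Li Pei (李培)
Chinese keywords: influenza, search engine, Baidu Index, early-warning model, model comparison
English keywords: influenza; search engine; Baidu Index; early-warning model; model comparison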
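Chinese abstract: Influenza is an acute respiratory infection caused by influenza viruses and an epidemic disease that is highly contagious and spreads quickly. Traditional epidemic surveillance systems rely on manually collected clinical data, which introduces some delay in reporting emerging diseases, whereas for sudden outbreaks real-time feedback and rapid response are critical. Monitoring epidemic trends with openly available web data is an effective complement to traditional surveillance: it can provide early warning of outbreak severity and lower surveillance costs. In 2008, Google found a close relationship between the number of people searching for influenza-related topics and the number of people who actually had influenza symptoms, and used this quantitative relationship to build Google Flu Trends, which attracted wide attention. In China, search engine usage among internet users reaches 80%, and Baidu is the general-purpose search engine brand with the highest domestic penetration. In influenza early warning, previous studies have made some use of search engine data, but most drew their data from Google; the few that used Baidu search data to predict influenza in China were not systematic, and few scholars have specifically compared the predictive performance of different models or optimized them. This thesis therefore uses Baidu search data to analyze the correlation between Chinese-language search keywords and China's epidemic surveillance results, fits and compares a range of prediction models, and explores the potential of web search data to support epidemic surveillance. The main research content and results are as follows: (1) Starting from theoretical concepts such as information behavior and information-seeking behavior, the thesis examines the logical relationship between web search data and influenza case data and builds a theoretical framework, which holds that an individual's health status may trigger a need for health information and in turn drive health-information-seeking behavior. (2) Based on the framework linking search data to influenza case counts, a range-based keyword selection method is used to screen keywords along four dimensions (influenza prevention, influenza symptoms, influenza treatment, and common influenza terms), yielding 79 initial search terms; cross-correlation analysis is then used to determine the correlation and lead-lag relationship between each initial keyword and influenza case counts, leaving 22 keywords for model building. In the empirical study, the lead-lag results broadly match the proposed framework: keywords leading by about ten weeks all relate to influenza vaccines, keywords leading by about one week mostly concern influenza symptoms, and synchronous keywords are mostly common search terms or treatments, which supports the theoretical basis to some extent. (3) Eight models are fitted according to differences in lead-lag relationships and model principles. Comparing goodness of fit and predictive performance shows that multiple linear regression and artificial neural network models fit better, but a good fit does not necessarily mean accurate prediction; principal component regression can in theory reduce collinearity among variables, yet in practice both its fit and its predictive performance are worse than those of multiple regression. (4) The models based only on search data are optimized by introducing historical influenza surveillance information, producing a combined model with two variables, historical information and web data. Comparing this optimized model with a time series model based only on historical information and with the best search-data-only model shows that historical data and search data carry partly complementary information, and that using the two together gives the best predictive performance.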
English abstract: Influenza (flu) is an acute respiratory infectious disease caused by influenza viruses, characterized by strong infectivity and rapid transmission. Conventional influenza surveillance monitors influenza-like illness (ILI) and influenza virus infections reported by clinics and laboratories. As a result, reported data lag well behind the actual course of an epidemic. Epidemiologists have therefore been investigating alternative data sources and real-time tools for influenza surveillance. One emerging data source is internet search queries. In 2008, Google found that some influenza-related search queries are good indicators of influenza activity and developed Google Flu Trends (GFT), which is based on the quantitative relationship between the number of flu-related search queries and the number of ILI cases. In China, alternative search engines such as Baidu are more widely used than Google: Google's market share in China is less than 20%, while Baidu's is more than 80%. In influenza early warning, previous research has made some use of search engine data, but few scholars have presented a systematic method for preprocessing Baidu search data and comparing models. In this paper, we therefore collect search query data from Baidu to investigate the relationship between online information searches and conventional surveillance data in China. By developing and comparing early-warning models, the paper explores the possibility of detecting influenza epidemics from Internet data. The contents and results of this study are as follows: (1) The paper first explores the logical relationship between online information searches and conventional surveillance data, drawing on concepts such as information behavior and information-seeking behavior. A theoretical framework is established which holds that an individual's health condition may create a demand for health information and in turn drive health-information-seeking behavior. (2) Following the theoretical framework, we use a range selection method to select keywords from four areas: influenza prevention, influenza symptoms, influenza treatments, and frequent terms related to influenza. In the first step 79 keywords are selected, and after cross-correlation analysis 22 keywords are used to build the models. The empirical results support the logic of the theoretical framework: keywords that reflect flu trends about ten weeks in advance relate to influenza vaccines; those about one week in advance refer to influenza symptoms; and most simultaneous keywords are frequent terms related to influenza. (3) Eight models are established according to differences in time correlations and model theories. Results indicate that the multiple linear regression model and the artificial neural network model have better goodness of fit, but a good fit does not necessarily mean accurate forecasts. In addition, the principal component regression model can in theory reduce collinearity among the variables, yet in practice both its fit and its prediction accuracy are lower than those of the multiple linear regression model. (4) Finally, historical ILI cases are introduced to optimize the models. Comparing the model based on historical ILI cases, the model based on search queries, and the model based on both shows that the two kinds of information are complementary in influenza surveillance and that combining them achieves better monitoring results.
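The cross-correlation step in item (2) of the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual procedure, thresholds, or data layout, so everything here is an assumption: the function names `pearson`, `lagged_correlation`, and `best_lead` are hypothetical, both series are assumed to be weekly and aligned, and the "best" lead is simply taken to be the one with the highest Pearson correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(search, cases, lead):
    """Correlate search volume in week t with case counts in week t + lead."""
    if lead == 0:
        return pearson(search, cases)
    # Drop the last `lead` search weeks and the first `lead` case weeks
    # so that search[t] is paired with cases[t + lead].
    return pearson(search[:-lead], cases[lead:])


def best_lead(search, cases, max_lead):
    """Return (lead, correlation) for the lead in 0..max_lead with the highest correlation."""
    scores = {lead: lagged_correlation(search, cases, lead)
              for lead in range(max_lead + 1)}
    lead = max(scores, key=scores.get)
    return lead, scores[lead]


# Illustrative, made-up weekly data: the search series is the case series
# shifted three weeks earlier, so the detected lead should be 3.
signal = [1, 3, 2, 8, 5, 9, 4, 7, 6, 2, 10, 3, 5, 8, 1, 4, 9, 2, 6, 7]
cases = signal[:17]
search = signal[3:]
lead, r = best_lead(search, cases, max_lead=5)
print(f"best lead: {lead} week(s), correlation {r:.3f}")
```

Under these assumptions, a keyword whose best lead is large would fall into the "leading" group described in the abstract, and one whose best lead is 0 would be treated as synchronous. How the thesis narrowed 79 candidates to 22 is not stated, so no selection threshold is shown.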