Student Theses

Thesis Query Result


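Thesis ID: 8065
Author ID: 2120132501
Upload time: 2015/12/11 12:08:13
Chinese title: Web宏观结构特性检验研究
English title: An Examination of the Macro-Structure Attributes of Web
Advisor: 李培
Chinese keywords (translated): macro-structural properties of the Web; small-world property; scale-free property; complex networks; Web mining
English keywords: whole structure attributes of web; small-world attribute; scale-free attribute; complicated networks; web mining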
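Chinese abstract (translated): Web mining applies data mining, text mining, machine learning, and related techniques to discover interesting, latent, and useful rules, patterns, and domain knowledge from Web page data, log data, and hyperlink relationships. By the object being mined, Web mining is divided into content mining, structure mining, and log mining. The Web is itself a complex network; at the end of the last century, researchers successively discovered its scale-free and small-world properties. Research on scale-free and small-world networks then surged, with many researchers from different disciplines analyzing the properties of these two kinds of network, and this line of research developed and found applications quickly. As the field matured, however, the great majority of studies leaned toward building theoretical models detached from actual observation, and few carried out real measurements with collection and analysis tools. Against this background, this study draws on Web mining to test the macro-structural properties of the Web, namely the small-world and scale-free properties. It collected the clubweb09 sample dataset from Datatang (数据堂) in China and extracted the link data that fit the research design. After generating .net files readable by Pajek, a tool for analyzing large-scale complex networks, it computed the core indicators of scale-free and small-world networks (node degree distribution, clustering coefficient, and average path length) and analyzed the characteristics of two sample sets from dataset 6. The results show that the Web graph represented by dataset 6 has the scale-free and small-world properties of complex networks. Specifically, the node degree statistics show that both the Chinese set and the English set follow a power-law distribution rather than a random one; the average path length is about 2.48 for the Chinese set and about 2.95 for the English set, indicating that both datasets have pronounced small-world characteristics. On the basis of the data analysis and the summarized results, the study points out two shortcomings, the "outsourced" data in the sample collection method and the inefficiency of the data processing method, and then outlines directions for future research. The study also explains the distinctive value of verification research in itself and the novelty of this work in combining complex networks with Web structure mining. The thesis contains 21 figures, 2 tables, and 78 references.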
English abstract: Using data mining, text mining, machine learning, and related techniques, Web mining discovers interesting, latent, and useful rules, patterns, and domain knowledge from Web page data and hyperlink relationships. According to what is mined, Web mining can be divided into content mining, structure mining, and usage mining. The Web, a kind of complex network, was found in the late 1990s to have two properties: the small-world property and the scale-free property. Interest in these two kinds of network then surged, and many researchers from different disciplines began to observe and analyze small-world and scale-free networks, which led to rapid development of this line of research. As the work progressed, however, most studies tended to build models rather than collect real-world data with proper collection and analysis methods. Under these circumstances, this study re-examines the two properties with the help of Web mining. We obtained a dataset, "clubweb09", from Datatang and extracted the links that fit the research design. After creating ".net" files readable by Pajek, a tool for analyzing large-scale complex networks, we calculated the core indicators for judging whether a network is scale-free or small-world (degree distribution, clustering coefficient, and average path length) and analyzed the characteristics of the two samples of dataset 6. The results show that the Web graph represented by dataset 6 does have the scale-free and small-world properties of complex networks. The node degree statistics indicate that the degree distribution of both the Chinese link set and the English link set follows a power law rather than a random distribution. The average path length (APL) of the Chinese link set is about 2.48 and that of the English set is about 2.95, which indicates that both sets clearly show small-world characteristics. Building on the data analysis and the summarized results, the research points out two main shortcomings, the "outsourced" data collection and the inefficiency of data processing, which suggest directions for future research. The research also explains the distinctive value of verification research and the novelty of combining complex networks with Web structure mining.
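
The measurement steps named in the abstract (a Pajek-readable .net file, then degree distribution, clustering coefficient, and average path length) can be illustrated with a minimal Python sketch. This is not the thesis's own code, which relied on Pajek itself. The toy edge list, the function names, and the choice to treat hyperlinks as undirected edges are all assumptions made for illustration.

```python
from collections import Counter, deque

# Hypothetical toy link data: each pair stands for one hyperlink between two pages.
SAMPLE_EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]


def to_pajek(edges):
    """Return the text of a Pajek .net file for an undirected edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {name: i + 1 for i, name in enumerate(nodes)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[name]} "{name}"' for name in nodes]
    lines.append("*Edges")  # assumption: undirected; directed links would use *Arcs
    lines += [f"{index[u]} {index[v]}" for u, v in edges]
    return "\n".join(lines) + "\n"


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(nbrs) for nbrs in adj.values())


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each edge between two neighbours is seen twice (once from each end).
        twice_links = sum(1 for a in nbrs for b in adj[a] if b in nbrs)
        total += twice_links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A small-world reading typically rests on a short average path length together with a clustering coefficient well above that of a random graph of the same size, while a scale-free reading rests on the degree counts falling roughly on a straight line in a log-log plot. The all-pairs BFS here is only practical for small samples; a dataset at Web scale would need sampling or a dedicated tool such as Pajek.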