Student Theses
Thesis Search Results
| Thesis ID: | 994 | |
| Author ID: | 2120071900 | |
| Uploaded: | 2009/5/25 16:15:11 | |
| Chinese title: | 粗糙集在网络图片广告过滤中的应 | |
| English title: | The Application of Rough Set i | |
| Advisor: | 安利平 | |
| Chinese keywords: | 粗糙集 分组约简 广告过滤 文 | |
| English keywords: | Rough Set; Grouping Reduct;/ | |
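| Chinese abstract: | The rapid development and growing popularity of the Internet have given it a vast store of information resources, and it has become one of the main ways people obtain information. At the same time, however, online advertising has become increasingly rampant and has had many negative effects. On the one hand, online advertisements interfere with users who are browsing or searching for information, and users' aversion to them rises year by year; on the other hand, advertisements drown out the useful information in web pages and seriously degrade the quality of Web mining. Against this background, filtering online advertisements is especially necessary. Because brand picture advertisements account for half of the entire online advertising market, research on filtering Internet picture advertisements has real practical value. This thesis converts the picture advertisement filtering problem into a text classification problem, which can be handled with a rough set model. Existing studies have various shortcomings. To address them, and in light of the characteristics of Internet image data sets, this thesis builds on the general rough set model and proposes a “Rough Set Grouping Reduct model” for Internet picture advertisement filtering. The model performs feature selection on the text data set by grouping reduct, which improves the time efficiency of the algorithm. Comparative classification experiments were run on the Internet Advertisements data set from the UCI Machine Learning Repository, using both the Rough Set Grouping Reduct model and the general rough set model. The main work and contributions of this thesis are as follows. First, it gives a fairly comprehensive summary of domestic and international research on data-mining-based filtering of Internet picture advertisements. Second, it is the first to apply rough set theory to Internet picture advertisement filtering; the empirical study shows that rough set theory handles the problem well, with classification accuracy that matches or even exceeds earlier results. Third, based on the characteristics of Internet image data, it improves the general rough set model and proposes the new Rough Set Grouping Reduct model. Finally, it designs comparative experiments and analyzes the two models empirically on real data, showing that the Rough Set Grouping Reduct model further improves classification quality. The proposed model generalizes well: it can be applied by analogy to data sets whose feature items can be divided into several groups. Such data sets are fairly common in text classification, for example in spam filtering. | |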
| English abstract: | The Internet has become an important way of acquiring information, because its rapid expansion and growing popularity have given it a vast store of information resources. At the same time, Internet advertisements have become increasingly rampant and have had negative effects. On the one hand, Internet users’ resentment of advertisements is rising year by year. On the other hand, advertisements drown out useful information and seriously damage the quality of web mining. In this context, filtering Internet advertisements is particularly necessary. Research on filtering Internet picture advertisements is of particular practical significance, because picture advertisements occupy half of the entire Internet advertising market. This thesis converts picture advertisement filtering into a text categorization problem that can be handled with Rough Set theory. A Rough Set Grouping Reduct model is proposed, based on traditional Rough Set theory and on the characteristics of Internet images. The model carries out feature selection using a “grouping reduct” method, which can improve the time efficiency of the algorithm. Experiments compare the classification quality of the Rough Set Grouping Reduct model and the traditional Rough Set model on the UCI Internet Advertisements data set. The thesis makes four contributions. First, it gives a comprehensive summary of domestic and international research on Internet picture advertisement filtering. Second, it applies Rough Set theory to Internet advertisement filtering for the first time; empirical studies show that Rough Set theory handles the problem well. Third, it improves traditional Rough Set theory to fit the characteristics of Internet images, proposing a new model called “the Rough Set Grouping Reduct model”. Finally, it designs an experiment to compare the two models; comparative analysis of several indicators shows that the new model achieves a better classification result, with classification accuracy that meets or even exceeds the results of previous studies. The Rough Set Grouping Reduct model can be applied to other data sets with a similar characteristic: feature items that can be divided into several groups. Such data sets are relatively common in text mining, for example in spam filtering. | |
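The abstract describes feature selection by "grouping reduct", but this record does not give the algorithm itself. As a rough illustration only, the Python sketch below assumes one plausible reading: run a greedy, dependency-based reduct (in the style of QuickReduct) inside each feature group, then run it once more over the attributes that survive. The function names `dependency`, `greedy_reduct`, and `grouped_reduct`, the grouping into two blocks, and the toy data are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of rows in the positive region."""
    if not rows:
        return 0.0
    # Partition row indices into equivalence classes by their values on attrs.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    # A class belongs to the positive region if all of its rows share a label.
    consistent = sum(
        len(idx) for idx in classes.values()
        if len({labels[i] for i in idx}) == 1
    )
    return consistent / len(rows)


def greedy_reduct(rows, labels, candidates):
    """Greedy attribute selection restricted to `candidates` (assumed unique)."""
    target = dependency(rows, labels, candidates)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        # Add the attribute that raises the dependency degree the most.
        best = max(
            (a for a in candidates if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return chosen


def grouped_reduct(rows, labels, groups):
    """Assumed grouping scheme: reduce each group, then reduce the survivors."""
    survivors = []
    for group in groups:
        survivors.extend(greedy_reduct(rows, labels, group))
    return greedy_reduct(rows, labels, survivors)


# Toy data (hypothetical): four binary term features in two groups.
rows = [
    (1, 0, 1, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (0, 1, 0, 1),
]
labels = ["ad", "ad", "nonad", "nonad"]
groups = [[0, 1], [2, 3]]

selected = grouped_reduct(rows, labels, groups)
print(selected)
```

Under this assumed scheme, each greedy search scans only the attributes of one group rather than the full attribute set, which is one way a grouped approach could save time. The trade-off is that attributes which are only informative in combination across groups can be dropped in the first pass, so the result is not guaranteed to preserve the dependency degree of the full attribute set.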