Student Thesis
| Thesis ID: | 5200 | |
| Author ID: | 2120112370 | |
| Upload time: | 2013/6/5 11:53:04 | |
| Chinese title: | 基于粗糙集理论的协同过滤推荐算法研究 | |
| English title: | Research of Collaborative Filtering Recommendation Algorithm based on Rough Sets Theory | |
| Advisor: | 安利平 | |
| Chinese keywords: | 推荐系统,协同过滤,属性约简,粗糙K-means聚类,ROUSTIDA算法 | |
| English keywords: | Recommender Systems, Collaborative Filtering, Attribute Reduction, Rough K-means Clustering, ROUSTIDA algorithm | |
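| Abstract (translated from Chinese): | With the arrival of the information society, people have become more closely connected, and resources are now allocated on a global scale. On the other hand, people must cope with information overload: limited attention is scattered across a vast, disordered mass of information, which directly reduces productivity. Search engines offer one way to obtain needed information efficiently. However, a search engine returns only non-personalized output for the keywords a user enters, and it works only when users can describe their needs very precisely. Recommender systems address information overload at a more fundamental level by organically connecting individual users with massive amounts of information. The recommendation algorithm is undoubtedly the core of a recommender system, and collaborative filtering is currently the most thoroughly studied recommendation algorithm in academia and the most widely applied in industry. It finds neighbor users with similar preferences, or neighbor items similar to a user's interests, and then uses these neighbors for rating prediction and Top-N recommendation. However, as the numbers of users and items in recommender systems grow exponentially, collaborative filtering, which is essentially a “statistical” method, faces major challenges in sparsity, cold start, and scalability. Taking the sparsity problem as its starting point, this thesis combines a theoretical review, algorithm research, and experimental validation; drawing on the strengths of rough set theory in handling uncertain data, it proposes an improved collaborative filtering algorithm based on rough sets and demonstrates the effect of the improvement experimentally. Specifically, the improvements to the traditional collaborative filtering algorithm cover three aspects: first, for the high-dimensional user-item rating matrix, an attribute reduction algorithm based on Pawlak attribute significance is used to reduce dimensionality, and rough K-means clustering restricts the nearest-neighbor search for the target user to the few clusters whose interests are similar, yielding a reduced rating matrix; second, for the sparsity that remains in the reduced rating matrix, an improved ROUSTIDA algorithm based on the variable precision rough set model is proposed, so that users in the filled matrix share more co-rated items, which benefits similarity computation; third, drawing on the idea of “equivalence partition” in rough set theory, user ratings are treated as knowledge for classifying items, and the approximation classification quality is used to measure the similarity between users. | |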
| English abstract: | With the advent of the information era, people are more closely connected with each other, and global resource allocation driven by the Internet greatly improves productivity in all sectors. On the other hand, human attention becomes even scarcer under information overload, and the large amount of redundant information directly hinders operational efficiency. Search engines are one ingenious solution to this dilemma. However, they provide non-personalized suggestions in response to the input keywords, and they help only when users know exactly what to ask for. In contrast, recommender systems take a more fundamental approach, organically associating individual users with numerous items. The recommendation algorithm is undoubtedly the core of a recommender system, and collaborative filtering (CF) is the most widely used one in research and practice. It is mainly based on the notion that users with similar rating behaviors should have similar preferences. But the exponential growth of users and items makes it suffer from several bottlenecks, such as sparsity, cold start, and scalability. Its poor performance is primarily attributed to data sparsity, which is the main issue addressed in this thesis. Rough set theory was proposed by Z. Pawlak as a mathematical tool for dealing with vague concepts. To address the above drawbacks, a refined hybrid recommender system based on rough set theory is proposed, and empirical experiments reveal that the new algorithm outperforms most of the state-of-the-art ones. Specifically, the main contributions of this thesis are as follows: the high dimensionality of the user-item rating matrix is reduced by an attribute reduction algorithm based on Pawlak attribute significance, and rough K-means clustering shrinks the search scope and generates a pruned rating matrix; the unknown values of the pruned rating matrix are further smoothed by an improved ROUSTIDA algorithm based on Variable Precision Rough Sets, so that neighbors share enough co-rated items to compute their similarities; drawing on the “approximation classification quality” of rough set theory, a novel similarity measure is proposed. The idea of rough set theory is incorporated into the entire procedure to alleviate the sparsity problem and make accurate recommendations. | |
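The abstract's third contribution, measuring user similarity through the approximation classification quality, can be illustrated with a small sketch. This is not the thesis's actual formula, which the record does not give. It assumes the standard Pawlak definition of approximation quality (the fraction of objects lying in the lower approximation of some decision class), applied to the items two users have both rated. The function names `partition`, `approximation_quality`, and `rough_similarity`, and the choice to average the two directions, are illustrative assumptions.

```python
from collections import defaultdict


def partition(ratings, items):
    """Split `items` into equivalence classes: items with the same rating."""
    blocks = defaultdict(set)
    for item in items:
        blocks[ratings[item]].add(item)
    return list(blocks.values())


def approximation_quality(cond_ratings, dec_ratings):
    """Pawlak approximation quality of dec_ratings' partition, using
    cond_ratings' partition as knowledge, over the co-rated items.

    A condition block counts toward the positive region only if it lies
    entirely inside one decision block (i.e. in a lower approximation).
    """
    common = set(cond_ratings) & set(dec_ratings)
    if not common:
        return 0.0  # assumption: no co-rated items means no evidence
    cond_blocks = partition(cond_ratings, common)
    dec_blocks = partition(dec_ratings, common)
    positive = sum(
        len(block)
        for block in cond_blocks
        if any(block <= target for target in dec_blocks)
    )
    return positive / len(common)


def rough_similarity(a, b):
    """Assumed symmetrization: average the quality in both directions."""
    return 0.5 * (approximation_quality(a, b) + approximation_quality(b, a))


# Hypothetical ratings, keyed by item id.
alice = {"i1": 5, "i2": 5, "i3": 3, "i4": 1}
bob = {"i1": 4, "i2": 4, "i3": 2, "i4": 2, "i5": 5}

print(approximation_quality(alice, bob))  # 1.0
print(approximation_quality(bob, alice))  # 0.5
print(rough_similarity(alice, bob))       # 0.75
```

The measure depends only on how each user groups items, not on the rating values themselves, so two users who partition items identically score 1.0 even if one rates systematically lower. Whether the thesis corrects for this is not stated in the abstract.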