Distributed ItemCF Recommendation Algorithm Based on the Combination of MapReduce and Hive

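Computer science; Bottleneck; Big data; Algorithm; Recommender system; Distributed database; Data mining; Database; Distributed computing; Information retrieval; Embedded system
Authors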
Yakai Feng,Lei Wang
Source
Journal: Electronics [Multidisciplinary Digital Publishing Institute]
Volume (issue): 12 (16): 3398-3398 Citations: 1
Identifier
DOI:10.3390/electronics12163398
Abstract

The ItemCF algorithm is currently the most widely used recommendation algorithm in commercial applications. In the early days of recommender systems, most recommendation algorithms ran on a single machine rather than in parallel. Combined with the rapid growth of user behavior data in the big data era, this has become a bottleneck for the execution efficiency of recommender systems. With the rapid development of distributed technology, distributed ItemCF algorithms have become a research hotspot. Hadoop is a very popular distributed system infrastructure. MapReduce, which provides large-scale data computing, and Hive, a data warehousing tool, are two core components of the Hadoop ecosystem, each with its own advantages and applicable scenarios. Scholars have already used MapReduce and Hive to parallelize the ItemCF algorithm. However, these studies use either MapReduce or Hive alone, without fully leveraging the strengths of both. As a result, it has been difficult for parallel ItemCF recommendation algorithms to combine a simple implementation with high running efficiency. To address this issue, we propose a distributed ItemCF recommendation algorithm based on the combination of MapReduce and Hive, named HiMRItemCF. The algorithm divides ItemCF into six steps: (1) deduplication; (2) obtaining the preference matrices of all users; (3) obtaining the co-occurrence matrices of all items; (4) multiplying the two matrices to generate a three-dimensional matrix; (5) aggregating the data of the three-dimensional matrix to obtain the recommendation scores of all users for all items; and (6) sorting the scores in descending order. Hive carries out steps 1 and 6, and MapReduce carries out the other four steps, which involve more complex calculations and operations. The Hive jobs and MapReduce jobs are linked through Hive's external tables.
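The six steps can be sketched on a single machine as follows. This is only an in-memory illustration of the data flow, not the authors' Hive and MapReduce implementation (which the abstract says is written in Java). The function name `himr_itemcf_sketch`, the `(user, item, preference)` record layout, the use of plain co-occurrence counts, and the exclusion of an item's co-occurrence with itself are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import permutations


def himr_itemcf_sketch(records):
    """Single-machine sketch of the six ItemCF steps (hypothetical helper).

    records: iterable of (user, item, preference) tuples.
    Returns {user: [(item, score), ...]} sorted by score, descending.
    """
    # Step 1: deduplication (exact duplicate rows are dropped).
    unique = set(records)

    # Step 2: user preference matrix, stored as user -> {item: preference}.
    prefs = defaultdict(dict)
    for user, item, pref in unique:
        prefs[user][item] = pref

    # Step 3: item co-occurrence matrix. cooc[i][j] counts the users who
    # interacted with both i and j (assumption: self-pairs are excluded).
    cooc = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        for i, j in permutations(items, 2):
            cooc[i][j] += 1

    # Step 4: "multiply" the two matrices into a three-dimensional structure
    # keyed by (user, candidate item i, source item j).
    partial = {}
    for user, items in prefs.items():
        for j, pref in items.items():
            for i, count in cooc[j].items():
                partial[(user, i, j)] = count * pref

    # Step 5: aggregate over the source-item axis to get one score per
    # (user, item) pair.
    scores = defaultdict(lambda: defaultdict(float))
    for (user, i, _j), value in partial.items():
        scores[user][i] += value

    # Step 6: sort each user's scores in descending order
    # (ties broken by item id so the output is deterministic).
    return {
        user: sorted(item_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        for user, item_scores in scores.items()
    }


# Toy data (made up for illustration); the third row is a duplicate.
records = [
    ("u1", "A", 5.0), ("u1", "B", 3.0), ("u1", "A", 5.0),
    ("u2", "A", 4.0), ("u2", "B", 2.0), ("u2", "C", 1.0),
    ("u3", "B", 5.0), ("u3", "C", 4.0),
]
result = himr_itemcf_sketch(records)
print(result["u1"])  # [('C', 11.0), ('B', 10.0), ('A', 6.0)]
```

In this toy run, item C ranks first for u1 even though u1 never interacted with it, because C co-occurs with both A and B. In the distributed setting described above, steps 1 and 6 would be Hive jobs and steps 2 to 5 would be MapReduce jobs.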
After implementing the proposed algorithm in Java and running the program on three publicly available user shopping behavior datasets, we found that, compared to algorithms that use only MapReduce jobs, the program implementing the proposed algorithm has fewer lines of source code and lower cyclomatic and Halstead complexity, and achieves a higher speedup ratio and parallel computing efficiency on all datasets. These experimental results indicate that the parallel and distributed ItemCF algorithm proposed in this paper, which combines MapReduce and Hive, offers both concise, easy-to-understand code and high time efficiency.
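The abstract does not define the speedup ratio or parallel computing efficiency. A minimal sketch, assuming the conventional definitions (speedup = baseline time / parallel time, efficiency = speedup / number of nodes), is shown below. The function names and the timings are hypothetical and are not taken from the paper's experiments.

```python
def speedup(t_baseline, t_parallel):
    """Speedup ratio under the conventional definition (assumed here)."""
    if t_baseline <= 0 or t_parallel <= 0:
        raise ValueError("running times must be positive")
    return t_baseline / t_parallel


def parallel_efficiency(t_baseline, t_parallel, nodes):
    """Speedup divided by the number of nodes (assumed definition)."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup(t_baseline, t_parallel) / nodes


# Hypothetical timings: 120 s on one node, 40 s on four nodes.
print(speedup(120.0, 40.0))                 # 3.0
print(parallel_efficiency(120.0, 40.0, 4))  # 0.75
```

An efficiency of 1.0 would mean perfectly linear scaling. Values below 1.0 reflect overheads such as job scheduling and data shuffling.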