Two Stage Job Title Identification System for Online Job Advertisements

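Computer science, Identification (biology), Machine learning, Matching (statistics), Artificial intelligence, Cluster analysis, Data mining, Embedding, Similarity (geometry), Information retrieval, Statistics, Botany, Mathematics, Image (mathematics), Biology
Authors
Ibrahim Rahhal, Kathleen M. Carley, Ismail Kassou, Mounir Ghogho
Source
Journal: IEEE Access [Institute of Electrical and Electronics Engineers]
Volume: 11, pages 19073-19092; cited by: 1
Identifier
DOI:10.1109/access.2023.3247866
Abstract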

Data science techniques are powerful tools for extracting knowledge from large datasets. Analyzing the job market by classifying online job advertisements (ads) has recently received much attention. Various approaches for multi-label classification (e.g., self-supervised learning and clustering) have been developed to identify the occupation from a job advertisement and have achieved satisfactory performance. However, these approaches require labeled datasets with hundreds of thousands of examples and focus on specific databases, such as the Occupational Information Network (O*NET), which are better suited to the US job market. In this paper, we present a two-stage job title identification methodology to address the case of small datasets. We first use Bidirectional Encoder Representations from Transformers (BERT) to classify job ads according to their corresponding sector (e.g., Information Technology, Agriculture). Then, we use unsupervised machine learning algorithms and similarity measures to find the closest matching job title from the list of occupations within the predicted sector. We also propose a novel document embedding strategy to address the issues of processing and classifying job ads. Our experimental results show that the proposed two-stage approach improves job title identification accuracy by 14%, reaching more than 85% in some sectors. Moreover, we found that incorporating document embedding-based approaches such as weighting strategies and noise removal improves classification accuracy by 23.5% compared to approaches based on the bag-of-words model. Further evaluations verify that the proposed methodology either outperforms or performs at least as well as state-of-the-art methods. Applying the proposed methodology to Moroccan job market data has helped identify emerging and high-demand occupations in Morocco.
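
The two-stage flow described in the abstract (predict a sector, then pick the closest job title within that sector) can be illustrated with a minimal sketch. This is not the authors' implementation: the sector classifier below is a keyword-overlap stand-in for the paper's BERT model, the similarity measure is plain bag-of-words cosine similarity rather than the proposed document embedding, and the names `OCCUPATIONS`, `SECTOR_KEYWORDS`, `predict_sector`, and `identify_job_title` as well as all toy data are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data (not from the paper): sector -> candidate job titles.
OCCUPATIONS = {
    "Information Technology": [
        "software developer",
        "data scientist",
        "network administrator",
    ],
    "Agriculture": ["farm manager", "agricultural technician"],
}

# Hypothetical keyword sets standing in for a trained BERT sector classifier.
SECTOR_KEYWORDS = {
    "Information Technology": {"software", "python", "data", "network", "developer"},
    "Agriculture": {"farm", "crop", "harvest", "agricultural", "livestock"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def predict_sector(ad_text):
    """Stage 1 stand-in: choose the sector whose keywords overlap the ad most."""
    tokens = set(tokenize(ad_text))
    return max(SECTOR_KEYWORDS, key=lambda s: len(tokens & SECTOR_KEYWORDS[s]))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_job_title(ad_text):
    """Stage 2: rank only the titles of the predicted sector by similarity."""
    sector = predict_sector(ad_text)
    ad_vec = Counter(tokenize(ad_text))
    title = max(
        OCCUPATIONS[sector],
        key=lambda t: cosine(ad_vec, Counter(tokenize(t))),
    )
    return sector, title
```

Calling `identify_job_title("We are hiring a software developer with Python experience")` restricts the title search to the Information Technology list before ranking. Narrowing the candidate set this way is the design idea the abstract credits for the accuracy gain; in practice each stage would be replaced by a trained model and a stronger embedding.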