Using weighted features to predict questions’ answerability in question and answer communities

Epistemology, Data science, Computer science, Philosophy

Authors

Lucas Viana Knochenhauer, Carina F. Dorneles, Daniel Dalip Hasan

Source

Journal: Data technologies and applications [Emerald Publishing Limited]
Date: 2025-07-01

Identifiers

DOI: 10.1108/dta-11-2024-1088

Abstract

Purpose: This study addresses the challenge of unanswered questions in community question answering (CQA) platforms, often caused by the high daily influx of new questions. It aims to develop WANQA, a portable classification model that determines whether a question is answerable at the time of submission, enabling real-time user feedback for question refinement.

Design/methodology/approach: WANQA is a classification model leveraging 20 features commonly present in most CQA platforms, focusing exclusively on submission-time features to ensure portability across diverse communities. The study evaluates 360 configurations by combining feature sets, classification algorithms, question categories, weights and subset sizes, benchmarking the results against three baseline models.

Findings: The research demonstrates that Naive Bayes, combined with 50% of features selected using the Pearson correlation method, achieves optimal performance by balancing computational efficiency and classification accuracy. This approach effectively detects unanswerable questions at submission, improving user experience.

Research limitations/implications: The study provides a methodological basis for developing portable classification models in CQAs, though the reliance on specific datasets and feature assumptions may limit generalizability to niche or specialized platforms.

Practical implications: By identifying unanswerable questions at submission, WANQA enables CQAs to notify users to refine their queries, enhancing content quality and user engagement while maintaining computational efficiency suitable for real-time applications.

Social implications: Improving question-answering workflows in CQAs fosters more inclusive and accessible knowledge-sharing environments, benefiting diverse user groups by reducing frustration and increasing the relevance of responses.

Originality/value: WANQA introduces a novel solution to overcome the limitations of prior models that rely heavily on community-specific features such as votes and user reputation. By focusing solely on submission-time features, it offers a portable and adaptable framework for diverse CQA environments.
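The configuration reported in the findings (Naive Bayes on the half of the features ranked highest by Pearson correlation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the 20 features, the Naive Bayes variant, or the weighting scheme, so the feature names, the toy data, the Gaussian likelihood, and the helper names (`pearson`, `select_top_half`, `TinyGaussianNB`) below are all assumptions made for illustration.

```python
import math
from statistics import mean, pvariance

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_top_half(X, y):
    """Indices of the 50% of features with the largest |correlation| to y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[: max(1, n_features // 2)])

def project(X, columns):
    """Keep only the selected feature columns."""
    return [[row[j] for j in columns] for row in X]

class TinyGaussianNB:
    """Gaussian Naive Bayes (an assumed variant; the abstract names none)."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [r for r, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            self.stats[label] = (
                math.log(len(rows) / len(X)),        # log prior
                [mean(c) for c in cols],             # per-feature mean
                [pvariance(c) + 1e-9 for c in cols], # smoothed variance
            )
        return self

    def _log_posterior(self, label, row):
        prior, means, variances = self.stats[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )

    def predict(self, X):
        return [max(self.stats, key=lambda lab: self._log_posterior(lab, row))
                for row in X]

# Hypothetical submission-time features:
# [title_words, body_chars, has_code, tag_count]; label 1 = answered.
X = [[12, 300, 1, 5], [10, 280, 1, 2], [11, 320, 1, 7], [13, 310, 1, 3],
     [3, 40, 0, 6], [4, 35, 0, 2], [2, 50, 0, 4], [5, 45, 0, 5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

selected = select_top_half(X, y)                 # [1, 2] on this toy data
model = TinyGaussianNB().fit(project(X, selected), y)
print(selected, model.predict([[290, 1], [42, 0]]))  # [1, 2] [1, 0]
```

Because every feature here is available at submission time, the same pipeline could in principle score a question before it is posted, which is the property the abstract credits for WANQA's portability.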
