A Proactive Approach to Fault Tolerance Using Predictive Machine Learning Models in Distributed Systems

Computer science, Fault tolerance, Distributed computing, Artificial intelligence, Machine learning
Authors
Mohammad Haroon,Zeeshan Ali Siddiqui,Mohammad Husain,Arshad Ali,Tameem Ahmad
Source
Journal: International Journal of Experimental Research and Review [International Academic Publishing House]
Volume/Issue: 44: 208-220 Citations: 5
Identifier
DOI:10.52756/ijerr.2024.v44spl.018
Abstract

In the era of cloud computing and large-scale distributed systems, ensuring uninterrupted service and operational reliability is crucial. Conventional fault tolerance techniques are usually reactive, addressing problems only after they arise, which can result in performance degradation and downtime. This research offers a proactive approach to fault tolerance for distributed systems that uses predictive machine learning models to prevent significant failures before they occur. Our research combines machine learning algorithms with real-time analysis of large streams of operational data to predict system anomalies and possible breakdowns. We employ supervised learning algorithms such as Random Forests and Gradient Boosting to predict faults with high accuracy. The predictive models are trained on historical data and capture the patterns and correlations that precede system faults. The early fault detection this enables allows preventive and remedial measures to be taken, reducing downtime and preserving system integrity. To validate our approach, we designed and implemented a fault prediction framework within a simulated distributed system environment that mirrors contemporary cloud architectures. Our experiments demonstrate that the predictive models can forecast a wide range of faults, from hardware failures to network disruptions, with significant lead time, providing a critical window for implementing preventive measures. Additionally, we assessed the impact of these pre-emptive actions on overall system performance, highlighting improved reliability and a reduction in mean time to recovery (MTTR). We also analyse the scalability and adaptability of our proposed solution within diverse and dynamic distributed environments.
Because it integrates with existing monitoring and management tools, our framework significantly enhances fault tolerance capabilities without requiring extensive restructuring of current systems. Unlike traditional reactive methods that respond to failures after they occur, this work focuses on anticipating faults before they happen.
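
The abstract does not say how the historical data are labelled for training. One common way to set up fault prediction with lead time is to mark each telemetry sample as positive when a fault follows it within a fixed horizon. The sketch below illustrates that idea only; it is not the authors' method. The function name `label_with_lead_time`, the `horizon` parameter, and the sample and fault timestamps are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def label_with_lead_time(
    sample_times: Sequence[float],
    fault_times: Sequence[float],
    horizon: float,
) -> List[int]:
    """Return 1 for each sample followed by a fault within `horizon` seconds, else 0.

    Assumption (not from the paper): a fault at exactly the sample time does
    not count, because it leaves no lead time for a preventive action.
    """
    faults = sorted(fault_times)
    labels = []
    for t in sample_times:
        # Index of the first fault strictly after the sample time t.
        i = bisect_right(faults, t)
        upcoming = i < len(faults) and faults[i] - t <= horizon
        labels.append(1 if upcoming else 0)
    return labels


# Hypothetical telemetry: one sample every 60 s, faults at t=200 s and t=500 s.
sample_times = [0, 60, 120, 180, 240, 300, 360, 420, 480]
fault_times = [500, 200]
labels = label_with_lead_time(sample_times, fault_times, horizon=120)
print(labels)
```

Labels produced this way could then be paired with per-sample feature vectors and passed to a supervised classifier such as a Random Forest or Gradient Boosting model. A longer horizon gives more time for preventive action but usually makes the prediction task harder.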
