Federated learning (FL) has recently emerged as a promising paradigm for training large language models (LLMs) across distributed clients without centralizing raw data, thereby enhancing privacy and regulatory compliance. However, the distributed and heterogeneous nature of FL also introduces critical security vulnerabilities. Among these, data poisoning attacks constitute a particularly insidious threat: malicious clients can inject carefully crafted samples into their local datasets, undermining the safety alignment of LLMs or embedding backdoors triggered by specific prompts. While prior research has proposed robust aggregation, anomaly detection, and post-hoc sanitization methods, existing defenses remain of limited effectiveness in non-independent and identically distributed (non-IID) settings, especially when adversaries mount stealthy clean-label attacks that directly target safety objectives. This paper introduces SafeFedPoisonDef, a multi-layer defense framework that combines client-update anomaly detection, reliability-weighted robust aggregation, and post-hoc fine-tuning on trusted safety datasets. We formalize the threat model, present a mathematical formulation, and provide a reproducible evaluation on federated instruction-tuning scenarios for LLMs. Results demonstrate that SafeFedPoisonDef reduces safety violation rates by up to 65% relative to state-of-the-art baselines while preserving utility across diverse poisoning intensities. We also examine compliance with the LGPD, the GDPR, and emerging AI safety standards. The findings highlight the feasibility of resilient and legally responsible federated LLM training in critical applications.