As healthcare costs rise due to aging populations and chronic illnesses, optimized care solutions are urgently needed. Gesture recognition and fall detection are critical capabilities for intelligent companion robots in healthcare. However, current deep learning models struggle to deliver both accuracy and real-time performance against complex backgrounds because of their high computational demands. To address this, we propose an improved RT-DETR R18 model tailored for companion robots. This lightweight, efficient design integrates YOLOv9's ADown and RepNCSPELAN4 modules together with custom attention-based AdaptiveGateUpsample and AdaptiveGateDownsample modules for enhanced multi-scale feature fusion, reducing model size and complexity while optimizing real-time detection. Experiments show that, compared with the baseline RT-DETR R18, our model achieves a 51.7% reduction in parameters, a 46.7% decrease in GFLOPs, and higher FPS, with mAP@0.5, mAP@0.5-0.95, precision, and recall improving to 99.4%, 86.4%, 99.6%, and 99.4%, respectively. Testing in complex indoor environments confirms its high accuracy for gesture recognition and fall detection, reducing caregiver workload and offering a novel solution for human behavior recognition in intelligent companion care.
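To make the attention-based fusion idea concrete, the sketch below shows one plausible form of a gated upsampling block in the spirit of the AdaptiveGateUpsample module named above. The class name, channel sizes, and layer layout are assumptions for illustration only, not the authors' implementation; the exact module design is given later in the paper.

```python
# Hypothetical sketch of an attention-gated upsampling block (assumed design,
# not the paper's AdaptiveGateUpsample implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveGateUpsampleSketch(nn.Module):
    """Upsample a deep feature map and fuse it with a higher-resolution skip
    feature through a learned per-pixel sigmoid gate."""

    def __init__(self, low_channels: int, skip_channels: int, out_channels: int):
        super().__init__()
        # Project the upsampled deep features to the fusion width.
        self.proj = nn.Conv2d(low_channels, out_channels, kernel_size=1)
        # Align the skip connection to the same width.
        self.skip_proj = nn.Conv2d(skip_channels, out_channels, kernel_size=1)
        # Gate predicts fusion weights from the concatenated streams.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * out_channels, out_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Bilinearly upsample the deep features to the skip resolution.
        up = F.interpolate(low, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        up = self.proj(up)
        skip = self.skip_proj(skip)
        # Attention-style gating: weight the two streams before summing.
        g = self.gate(torch.cat([up, skip], dim=1))
        return g * up + (1.0 - g) * skip


if __name__ == "__main__":
    block = AdaptiveGateUpsampleSketch(low_channels=256, skip_channels=128, out_channels=128)
    low = torch.randn(1, 256, 20, 20)   # deep, low-resolution feature map
    skip = torch.randn(1, 128, 40, 40)  # shallower, higher-resolution feature map
    print(block(low, skip).shape)       # torch.Size([1, 128, 40, 40])
```

A corresponding AdaptiveGateDownsample would mirror this pattern with strided convolution in place of interpolation, which is why the pair can replace heavier fusion layers while keeping multi-scale information.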