The object of this study is the process of improving the efficiency and accuracy of delivering personalized recommendations to users in reinforcement-learning-based systems. The principal task addressed is improving recommendation adaptation and personalization by assigning a dedicated agent to each user. This approach reduces the influence of other users' activity and allows more precise modeling of individual preferences. The proposed approach employs an Actor–Critic model implemented with the Deep Deterministic Policy Gradient (DDPG) algorithm to achieve more stable training and to maximize long-term reward in sequential decision-making. Recommendations are generated from the unique characteristics of items derived from users' historical interactions. The neural networks are trained with separate parameter configurations for the single-agent and multi-agent models. Experimental results on the MovieLens dataset demonstrate the superiority of the multi-agent model over the single-agent baseline across key evaluation metrics. For top-5 recommendations, the multi-agent model achieved improvements of +4% in Precision@5, +0.32% in Recall@5, and +2.92% in Normalized Discounted Cumulative Gain (NDCG@5). For top-10 recommendations, the gains were +1% in Precision@10, +0.18% in Recall@10, and +1.14% in NDCG@10. Per-user simulations showed that the multi-agent model outperformed the single-agent baseline in cumulative reward in 66 of 100 cases. The proposed system is effective at capturing user preferences, improving recommendation quality, and adapting to evolving user preferences over time. The main area of practical application for the results includes dynamic online environments such as e-commerce systems, media platforms, social networks, and news aggregators.
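For reference, the evaluation metrics reported above can be sketched as follows. This is a minimal illustration of Precision@K, Recall@K, and NDCG@K under a binary-relevance assumption; the function names, item IDs, and relevance sets are hypothetical and are not taken from the paper's experimental setup.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items captured in the top-k list."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Normalized Discounted Cumulative Gain with binary relevance."""
    # DCG: each hit is discounted by log2 of its (1-indexed) rank + 1.
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-indexed here
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # IDCG: the DCG of an ideal ranking with all hits at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical example: 5 recommended item IDs vs. a user's relevant set.
recommended = [10, 42, 7, 99, 3]
relevant = {42, 3, 55}
print(precision_at_k(recommended, relevant, 5))  # → 0.4 (2 of 5 hits)
print(recall_at_k(recommended, relevant, 5))     # → 0.666... (2 of 3 relevant)
print(ndcg_at_k(recommended, relevant, 5))
```

A "+4% in Precision@5" improvement then means the multi-agent model places, on average, proportionally more relevant items in each user's top-5 list than the single-agent baseline does.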