Abstract
The efficacy of tumor treatment depends heavily on precise control of drug dosage, yet traditional methods often rely on empirical formulas that cannot adapt to dynamic tumor microenvironments and therefore risk overdosing. This study adopts a deep reinforcement learning (DRL) approach to real-time drug dosage control. Specifically, we use the Advantage Actor-Critic (A2C) algorithm to adjust the dosage in real time, with the goal of eliminating target cells while minimizing drug usage. Experimental results show that our approach effectively eradicates the target cells and that a novel reward function further reduces the required drug dosage. Supplementary experiments also show that the model is robust across different tumor microenvironments. Finally, we find that models trained in noisy environments cope with noise more effectively than models trained in noise-free settings.