Intelligent robots rely on machine vision to accomplish tasks in industrial production environments. As one of the key technologies in robot arm vision, the target detection model can accurately detect the position of the target object and guide the robotic arm to grasp it. However, detection performance depends heavily on the quality of the dataset, and it is difficult for industrial robots to collect effective datasets while performing tasks, so the robotic arm faces the challenge of inaccurate localization when grasping the target object. This paper presents a method for generating a synthetic dataset specifically designed for training robotic arms in grasping tasks. The method uses digital twin technology to generate a large dataset for robotic arm grasping. The target detection model trained on the synthetic dataset can precisely identify the real target object. The object's coordinates are then converted into the robotic arm's coordinate system to guide the arm in completing the grasp. The experimental results demonstrate that synthetic datasets can effectively substitute for manually annotated real datasets in the robotic arm's grasping and object localization tasks. This method shortens the development cycle of the robotic arm vision system and reduces the associated development costs.
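The conversion of a detected object's coordinates into the robotic arm's coordinate system can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a known depth for the detected pixel, and a hand-eye calibration result given as a 4x4 homogeneous matrix `T_base_cam`. All function names and numeric values below are hypothetical and chosen only for illustration.

```python
def bbox_center(x_min, y_min, x_max, y_max):
    """Return the pixel at the center of a detection bounding box."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a known depth into the camera frame
    using the pinhole model (assumed; lens distortion is ignored)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_base(point_cam, T_base_cam):
    """Map a 3D point from the camera frame to the robot base frame with a
    4x4 homogeneous transform given as row-major nested lists."""
    p = (point_cam[0], point_cam[1], point_cam[2], 1.0)
    return tuple(
        sum(T_base_cam[r][c] * p[c] for c in range(4)) for r in range(3)
    )


# Hypothetical values: a camera mounted 0.6 m above the base origin,
# offset 0.4 m along x, looking straight down (180 degree rotation about x).
T_BASE_CAM = [
    [1.0, 0.0, 0.0, 0.4],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.6],
    [0.0, 0.0, 0.0, 1.0],
]

u, v = bbox_center(300, 220, 380, 300)            # detection box in pixels
p_cam = pixel_to_camera(u, v, depth=0.5,          # depth in meters
                        fx=600.0, fy=600.0, cx=320.0, cy=240.0)
p_base = camera_to_base(p_cam, T_BASE_CAM)        # grasp target in base frame
print(p_base)
```

In practice the intrinsics and the camera-to-base transform would come from camera calibration and hand-eye calibration, and the depth from a depth sensor or a known working plane.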