Convolutional neural networks have recently brought impressive
improvements to object detection. However, detecting tiny objects in
large-scale remote sensing images remains challenging. First, the extremely
large input size makes existing object detection solutions too slow for
practical use. Second, the massive and complex backgrounds cause serious false
alarms. Moreover, the ultratiny objects increase the difficulty of accurate
detection. To tackle these problems, we propose a unified and self-reinforced
network called remote sensing region-based convolutional neural network
($\mathcal{R}^2$-CNN), composed of a backbone Tiny-Net, an intermediate global
attention block, and a final classifier and detector. Tiny-Net is a lightweight
residual structure that enables fast and powerful feature extraction from
inputs. The global attention block is built upon Tiny-Net to inhibit false
positives. The classifier then predicts whether targets exist in each
patch, and the detector follows to locate them accurately when they are present. The
classifier and detector are mutually reinforced through end-to-end training, which
further speeds up the process and avoids false alarms. The effectiveness of
$\mathcal{R}^2$-CNN is validated on hundreds of GF-1 and GF-2 images,
which are 18 000 $\times$ 18 192 pixels at 2.0-m resolution and 27 620 $\times$
29 200 pixels at 0.8-m resolution, respectively. Specifically, we can process a
GF-1 image in 29.4 s on a Titan X using only a single thread. To our
knowledge, no previous solution can detect tiny objects on such huge remote
sensing images gracefully. We believe that it is a significant step toward
practical real-time remote sensing systems.
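
The patch-wise flow described above, in which a classifier first decides whether a patch contains targets and a detector runs only on the patches that do, can be illustrated with a minimal sketch. It is not the network itself. The helper names (`axis_offsets`, `detect_large_image`, `toy_classifier`, `toy_detector`), the patch size, and the stride are all assumptions made for illustration, and the toy classifier and detector are simple stand-ins for the learned branches that share Tiny-Net features.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max exclusive
Patch = List[Sequence[int]]       # rows of pixel values


def axis_offsets(size: int, patch: int, stride: int) -> List[int]:
    """Start offsets along one axis so that patches cover [0, size)."""
    if size <= patch:
        return [0]
    offsets = list(range(0, size - patch + 1, stride))
    if offsets[-1] != size - patch:
        offsets.append(size - patch)  # extra patch flush with the border
    return offsets


def detect_large_image(
    image: Sequence[Sequence[int]],
    patch: int,
    stride: int,
    classify: Callable[[Patch], bool],
    detect: Callable[[Patch], List[Box]],
) -> Tuple[List[Box], int]:
    """Run `detect` only on patches that `classify` flags as containing targets.

    Returns the boxes in full-image coordinates and the number of patches
    that were actually passed to the detector.
    """
    height, width = len(image), len(image[0])
    boxes: List[Box] = []
    detector_calls = 0
    for y in axis_offsets(height, patch, stride):
        for x in axis_offsets(width, patch, stride):
            crop = [row[x:x + patch] for row in image[y:y + patch]]
            if not classify(crop):
                continue  # background patch: skip the costly detector
            detector_calls += 1
            for (x0, y0, x1, y1) in detect(crop):
                # Shift patch-local coordinates back to image coordinates.
                boxes.append((x0 + x, y0 + y, x1 + x, y1 + y))
    return boxes, detector_calls


def toy_classifier(crop: Patch) -> bool:
    """Hypothetical stand-in: a patch 'has a target' if any pixel is nonzero."""
    return any(v != 0 for row in crop for v in row)


def toy_detector(crop: Patch) -> List[Box]:
    """Hypothetical stand-in: one box around all nonzero pixels in the patch."""
    pts = [(cx, cy) for cy, row in enumerate(crop)
           for cx, v in enumerate(row) if v != 0]
    if not pts:
        return []
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]
```

Calling `detect_large_image(image, 4, 4, toy_classifier, toy_detector)` on an 8 by 8 image with a single nonzero pixel sends only one of the four patches to the detector. This is the kind of saving the classifier gate is meant to provide when most of a scene is background. With overlapping strides the same object may be reported from several patches, so a real pipeline would need a merging step such as non-maximum suppression, which this sketch omits.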