Salient object detection in optical remote sensing images (RSI-SOD) is a critical task, yet it depends on labor-intensive pixel-level annotations, and low-cost weakly supervised alternatives remain under-explored. These challenges are compounded by the difficulty existing CNN-based methods have in handling complex backgrounds and the varied appearance of salient objects. We introduce the Global-Local Semantic Interaction Network (GLSIN), a high-performance, cost-effective RSI-SOD approach based on scribble supervision. GLSIN adopts an encoder-decoder framework in which a Dual Branch Encoder blends a Transformer and a CNN to capture both the global and local features of an image. A Global-Local Affinity Block (GLAB) and a Feature Shrinkage Decoder equipped with Global-Local Fusion Blocks (GLFBs) are integrated to strengthen feature interaction and sharpen the generated saliency maps. Experimental results on two public datasets show that our method achieves $F_\beta^{max}$, $E_\xi^{max}$, $S_\alpha$, and $\mathcal{M}$ scores of 86.6%, 96.5%, 91.8%, and 0.7% on the EORSSD dataset, and 90.1%, 97.2%, 91.7%, and 1.1% on the ORSSD dataset, respectively. This performance surpasses existing weakly supervised and unsupervised SOD methods and even some fully supervised models.
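To make the dual-branch idea concrete, the following is a minimal PyTorch sketch of a Transformer-plus-CNN encoder whose two streams are merged by a gated fusion module. Everything here is an illustrative assumption: the class names (DualBranchEncoder, GlobalLocalFusion), channel widths, patch size, and depths are not taken from the paper, and the fusion module is only a hypothetical stand-in for the GLAB/GLFB design, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualBranchEncoder(nn.Module):
    """Conceptual dual-branch encoder: a CNN branch for local detail and a
    Transformer branch for global context. Layer choices are illustrative."""
    def __init__(self, in_ch=3, dim=64, patch=8):
        super().__init__()
        # CNN branch: two conv stages preserve local texture cues.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer branch: patch embedding + one encoder layer models
        # long-range (global) dependencies between image patches.
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.transformer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=dim * 2, batch_first=True
        )

    def forward(self, x):
        b, _, h, w = x.shape
        local_feat = self.cnn(x)                    # (B, dim, H, W)
        tokens = self.patch_embed(x)                # (B, dim, H/p, W/p)
        hp, wp = tokens.shape[-2:]
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        global_feat = self.transformer(tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, -1, hp, wp)
        # Upsample global features back to the local branch's resolution.
        global_feat = nn.functional.interpolate(
            global_feat, size=(h, w), mode="bilinear", align_corners=False
        )
        return local_feat, global_feat

class GlobalLocalFusion(nn.Module):
    """Hypothetical fusion module (stand-in for GLAB/GLFB): a learned gate
    lets global context modulate local detail before saliency prediction."""
    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(dim * 2, dim, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(dim, 1, 1)  # single-channel saliency logit

    def forward(self, local_feat, global_feat):
        g = self.gate(torch.cat([local_feat, global_feat], dim=1))
        fused = g * global_feat + (1 - g) * local_feat
        return self.proj(fused)

if __name__ == "__main__":
    x = torch.randn(1, 3, 256, 256)
    enc, fuse = DualBranchEncoder(), GlobalLocalFusion()
    saliency = fuse(*enc(x))
    print(saliency.shape)  # torch.Size([1, 1, 256, 256])
```

The gated sum is one simple way to let the two streams interact; the paper's actual GLAB/GLFB modules and multi-stage shrinkage decoder are more involved than this single-stage sketch.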