Current Transformer-based image steganography methods embed data without fully considering the correlation between the cover image and the secret image. In addition, to reduce computational cost, spatial self-attention is often restricted to small spatial windows, which limits the extraction of global features. To address these limitations, we present a channel-wise attention Transformer model for image steganography (CAISFormer), which constructs long-range dependencies to identify inconspicuous positions for data embedding. A channel self-attention module (CSAM) is deployed to focus on the feature channels suitable for data hiding by establishing inter-channel relationships. Meanwhile, a non-linear enhancement (NLE) layer is employed to enhance beneficial features while weakening irrelevant ones. To build feature coupling between the cover image and the secret image, a channel-wise cross-attention module (CCAM) is designed to fine-tune cover image features by capturing their cross-dependencies. Furthermore, to conceal data properly, a global-local aggregation module (GLAM) is deployed to adjust the fused features by combining global and local attention, which focus on inconspicuous and textured regions, respectively. Experimental results demonstrate that, for single image hiding, CAISFormer achieves PSNR gains of more than 0.36 dB and 0.90 dB for the cover/stego and secret/recovery image pairs, respectively, and decreases the detection ratio by 3.43% compared with the state of the art. Moreover, its generalization ability is demonstrated across a variety of datasets. The code will be made publicly available at https://github.com/YuhangZhouCJY/CAISFormer.
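To make the channel-wise attention idea concrete, the sketch below computes self-attention where the tokens are feature channels rather than spatial positions, so the attention matrix is C x C and its size does not grow with the spatial resolution H*W. This is a minimal pure-Python illustration with identity Q/K/V projections, not the authors' actual CSAM implementation; all function names here are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def matmul(a, b):
    # a: (n, k), b: (k, m), both as lists of lists -> (n, m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def channel_self_attention(x):
    """x: C x N feature map (C channels, N = H*W flattened positions).
    Scores are channel-to-channel affinities, so the attention matrix
    is C x C regardless of spatial size (identity Q/K/V for the sketch)."""
    n = len(x[0])
    scores = matmul(x, transpose(x))                        # (C, C)
    scaled = [[s / math.sqrt(n) for s in row] for row in scores]
    attn = [softmax(row) for row in scaled]                 # rows sum to 1
    return matmul(attn, x)                                  # (C, N)

# 3 channels over 4 spatial positions; output keeps the same shape,
# but each channel now aggregates global spatial context from all others.
x = [[0.1, 0.2, 0.3, 0.4],
     [0.4, 0.3, 0.2, 0.1],
     [0.0, 1.0, 0.0, 1.0]]
y = channel_self_attention(x)
```

Because each output channel is a convex combination of input channels, the module can reweight channels toward those better suited for hiding data while retaining full-image (global) spatial support.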