Efficient and accurate prediction of indoor radio signal strength is essential for the design and operation of wireless communication systems. Recently, attempts have been made to combine radio propagation prediction with deep learning. Inspired by recent advances in computer vision, we propose a prediction model that uses a convolutional encoder-decoder structure fused with a Swin Transformer module. Specifically, we embed the Swin Transformer into the U-Net structure to enhance its global modeling capability; the resulting network is trained to predict the received signal strength in a given indoor environment. More importantly, once trained on a sufficient number of scenarios, the model can directly predict the signal strength in previously unseen indoor environments. Simulation results show that the model outperforms the conventional U-Net, reducing the validation error by about 40%.