Single image super-resolution (SISR) is a classic ill-posed problem in computer vision. In recent years, deep-learning-based (DL-based) models have achieved promising results on the SISR problem. However, most existing methods suffer from an intrinsic trade-off between distortion and perceptual quality, and balancing the two to meet the requirements of different real-world applications remains a critical issue. In DL-based models, two common approaches to balancing the distortion and perceptual quality of super-resolved images are a hybrid loss (i.e., a combination of a distortion loss and a perceptual loss) and network interpolation. However, both approaches lack flexibility and impose strict constraints on the network architecture. In this paper, we propose an image-fusion interpolation method for image super-resolution, based on optimal transport theory in the wavelet domain, that balances the distortion and perceptual quality of super-resolved images. The advantage of the proposed method is that it can be applied to any pretrained DL-based model, without any requirements on the network architecture or parameters. In addition, the proposed method is parameter-free and runs fast without a GPU. Experimental results show that, compared with existing state-of-the-art SISR methods, the proposed method achieves a better balance between distortion and perceptual quality in super-resolved images.
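The abstract does not spell out the fusion algorithm, so purely as an illustration, the minimal sketch below shows one way a wavelet-domain, optimal-transport-style interpolation between a distortion-oriented and a perception-oriented SR output could look. It assumes grayscale inputs and the PyWavelets library; the function names (`fuse_sr`, `ot_interp_1d`), the choice to keep the low-pass band from the distortion-oriented output, and the 1-D quantile interpolation per detail subband are illustrative assumptions, not necessarily the paper's exact method.

```python
import numpy as np
import pywt  # PyWavelets; assumed dependency for this sketch


def ot_interp_1d(a, b, alpha):
    """Displacement interpolation between the empirical distributions of two
    equally sized coefficient arrays (the 1-D Wasserstein geodesic), mapped
    back to the spatial rank order of `a`."""
    flat_a, flat_b = a.ravel(), b.ravel()
    order = np.argsort(flat_a)            # spatial rank order taken from a
    qa = flat_a[order]                    # sorted a = quantile function of a
    qb = np.sort(flat_b)                  # quantile function of b
    qi = (1.0 - alpha) * qa + alpha * qb  # interpolated quantiles (1-D OT geodesic)
    out = np.empty_like(flat_a)
    out[order] = qi                       # place interpolated values at a's ranks
    return out.reshape(a.shape)


def fuse_sr(img_distortion, img_perceptual, alpha=0.5, wavelet="haar", level=2):
    """Fuse a distortion-oriented and a perception-oriented SR result.
    The low-frequency band is kept from the distortion-oriented output; each
    high-frequency detail band is OT-interpolated toward the perceptual one."""
    ca = pywt.wavedec2(img_distortion, wavelet, level=level)
    cb = pywt.wavedec2(img_perceptual, wavelet, level=level)
    fused = [ca[0]]                       # low-pass band from the distortion branch
    for da, db in zip(ca[1:], cb[1:]):    # detail bands: (cH, cV, cD) per level
        fused.append(tuple(ot_interp_1d(xa, xb, alpha)
                           for xa, xb in zip(da, db)))
    return pywt.waverec2(fused, wavelet)
```

With this sketch, `alpha = 0` reproduces the distortion-oriented image exactly (every detail band and the low-pass band come from it), while increasing `alpha` moves each detail band's coefficient distribution toward that of the perceptual output, giving a parameter-free, GPU-free post-processing knob that never touches the networks themselves.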