Satellite-derived high-resolution land surface temperature (LST) observations are essential for environmental studies. However, the tradeoff between spatial and temporal resolution largely restricts the application of current LST products. As a consequence, many spatial downscaling and spatiotemporal fusion methods have been proposed to overcome this limitation. In this paper, we design a novel empirical weighting method that combines the results of two popular methods: the thermal sharpening algorithm (TsHARP) for downscaling and the spatial and temporal adaptive reflectance fusion model (STARFM) for fusion. Specifically, the errors of the two methods are first estimated, and the two predictions are then blended with weights inversely proportional to the corresponding errors. Our method is tested with Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and Moderate Resolution Imaging Spectroradiometer (MODIS) data over Beijing. Validated against the actual ASTER LST, the combined results improve both accuracy and structural similarity, because our method exploits spatial, temporal, and spectral information together. Moreover, our method has the potential to generate more accurate daily high-resolution LSTs.
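The inverse-error blending step can be illustrated with a minimal sketch. The exact weighting formula and error estimator are not given here, so the code below assumes per-pixel absolute error estimates for each method and normalized inverse-error weights; the function name `blend_inverse_error`, its parameters, and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def blend_inverse_error(pred_a, pred_b, err_a, err_b, eps=1e-6):
    """Blend two LST predictions with weights inversely proportional to error.

    Assumed form (not taken from the paper):
        w_a = (1/e_a) / (1/e_a + 1/e_b),  w_b = 1 - w_a
    eps guards against division by zero where an estimated error is 0.
    """
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    inv_a = 1.0 / (np.abs(np.asarray(err_a, dtype=float)) + eps)
    inv_b = 1.0 / (np.abs(np.asarray(err_b, dtype=float)) + eps)
    w_a = inv_a / (inv_a + inv_b)          # weight of the first prediction
    return w_a * pred_a + (1.0 - w_a) * pred_b

# Hypothetical 1x2 LST patches in kelvin, e.g. from TsHARP and STARFM.
tsharp_lst = np.array([[300.0, 302.0]])
starfm_lst = np.array([[304.0, 302.0]])
# Hypothetical estimated absolute errors in kelvin for each method.
tsharp_err = np.array([[1.0, 2.0]])
starfm_err = np.array([[3.0, 2.0]])

blended = blend_inverse_error(tsharp_lst, starfm_lst, tsharp_err, starfm_err)
```

In this toy case the first pixel is pulled toward the prediction with the smaller estimated error (a weight of about 0.75 for the first method), while equal errors in the second pixel give an equal-weight average.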