Despite the remarkable success of large-scale text-to-image diffusion models in image generation and editing, existing methods still struggle to edit the layout of real-world images. Although a few works have been developed to address this issue, they either fail to adjust the image layout effectively or have difficulty preserving the visual appearance of objects after layout adjustment. To bridge this gap, this paper proposes a novel image layout editing method that not only rearranges a real-world image into a specified layout, but also keeps the visual appearance of the objects consistent with their original state prior to editing. Concretely, a Multi-Concept Learning scheme is developed to learn the concepts of different objects from a single image, which can be viewed as a novel inversion scheme tailored to image layout editing. We then leverage the semantic consistency within the intermediate features of diffusion models to project the appearance information of objects onto the target regions, improving the fidelity of objects after editing. Additionally, a novel initialization noise design is adopted to improve the convergence and success rate of layout re-arrangement. Finally, the phenomenon of concept entanglement is analyzed and resolved with a novel asynchronous editing strategy. Extensive experiments demonstrate that the proposed method outperforms existing methods in both layout alignment and visual consistency on the task of image layout editing.
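The abstract does not detail the initialization noise design. As a rough, hedged illustration of the general idea of layout-aware noise initialization, the following minimal sketch relocates the latent-noise patch of an object from its source box to its target box before sampling; the function name `rearrange_init_noise`, the box format, and the patch-copy strategy are all assumptions for illustration, not the paper's actual method.

```python
import torch

def rearrange_init_noise(noise, src_box, tgt_box):
    """Copy the latent-noise patch of an object from its source box to the
    target box, so sampling starts from a layout-consistent seed.
    Boxes are (top, left, height, width) in latent coordinates; for
    simplicity, both boxes are assumed to share the same size."""
    edited = noise.clone()
    st, sl, h, w = src_box
    tt, tl, _, _ = tgt_box
    patch = noise[..., st:st + h, sl:sl + w]
    # Refill the vacated source region with fresh Gaussian noise so the
    # background can be re-synthesized there.
    edited[..., st:st + h, sl:sl + w] = torch.randn_like(patch)
    edited[..., tt:tt + h, tl:tl + w] = patch
    return edited

# Hypothetical usage: move a 16x16 latent patch from (8, 8) to (32, 40)
# in a Stable-Diffusion-style 4x64x64 latent.
noise = torch.randn(1, 4, 64, 64)
moved = rearrange_init_noise(noise, (8, 8, 16, 16), (32, 40, 16, 16))
```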