The development of Large Language Models (LLMs) has sparked significant debate regarding their capacity for Theory of Mind (ToM), the ability to attribute mental states to oneself and others. This systematic review examines the extent to which LLMs exhibit Artificial ToM (AToM) by evaluating their performance on ToM tasks and comparing it with human responses. While LLMs, particularly GPT-4, perform well on first-order false belief tasks, they struggle with more complex reasoning, such as second-order beliefs and recursive inferences, where humans consistently outperform them. Moreover, the review underscores the variability in ToM assessments, as many studies adapt classical tasks for LLMs, raising concerns about comparability with human ToM. Most evaluations remain confined to text-based tasks, overlooking the embodied and multimodal dimensions crucial to human social cognition. The review attributes the "illusion of understanding" in LLMs to two primary factors: first, the absence of the developmental and cognitive mechanisms necessary for genuine ToM, and second, methodological biases in test designs that favor LLMs' strengths and limit direct comparison with human performance. These findings highlight the need for more ecologically valid assessments and interdisciplinary research to better delineate the limitations and potential of AToM. These issues are highly relevant to psychology, where language is generally considered just one component in the broader development of human ToM, a perspective that contrasts with the dominant approach in AToM studies. This discrepancy raises critical questions about the extent to which human ToM and AToM are comparable.