Large polynomial multiplications are crucial for Post-Quantum Cryptography standards like Module-Lattice-based Key Encapsulation Mechanism (ML-KEM) and Module-Lattice-based Digital Signature (ML-DSA). These multiplications, being complex, are often accelerated using the Number Theoretic Transform (NTT). This work presents a novel architecture of a high-performance NTT accelerator capable of performing both NTT and inverse NTT operations using a single set of hardware resources. The design makes use of a single butterfly configuration unit to reduce resource requirements and improve critical path. The Multi-path Delay Commutator (MDC) strategy is employed to enable fully pipelined and parallel processing of multiple coefficients, supporting both MLKEM and ML-DSA computations. Practical results show that our proposed NTT engine requires 3,821 LUTs, 2970 FFs, 20 DSPs, and 5 BRAMs on an AMD Zynq UltraScale+ FPGA, and can run up to 322 MHz. Our design provides the best Area-Time Product (ATP) among current NTT architectures.