Memristive computing-in-memory and near-threshold computing are two unconventional computing paradigms that can potentially enhance the energy efficiency and real-time performance of edge devices. However, their scalability is limited, primarily by process variation. Here, we report a 1-Mb, 16-macro near-threshold memristive computing-in-memory engine. Its two-transistor-one-resistor cells provide strong cell current modulation, amplifying the resistance ratio by more than 120 times. To mitigate variation, we compensate for transistor mismatches by leveraging the intrinsic variations in memristors. Additionally, we propose a charge stacking technique between multiple analog-to-digital converters to perform analog weight-and-combine operations with low energy and area overhead. Moreover, we introduce an inter-macro hybrid control scheme to reduce the task-level inference power. The fabricated chip performs highly parallel analog computing over 256 input channels with a 2.4% relative standard deviation. It achieves a throughput of up to 10.49 tera-operations per second and an energy efficiency of up to 88.51 tera-operations per second per watt.
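
As a purely numerical illustration of what a weight-and-combine operation computes, the following sketch splits multi-bit weights into binary columns, accumulates each column over 256 input channels, and then combines the per-column partial sums with power-of-two weights. It is an idealized arithmetic model, not the charge stacking circuit itself: the 4-bit weight width, the binary inputs, and all function names are assumptions made for illustration and are not taken from the chip. The `relative_std` helper shows the metric (standard deviation divided by mean) behind a figure such as the 2.4% quoted above.

```python
import random
import statistics

N_CHANNELS = 256   # from the abstract: number of parallel input channels
WEIGHT_BITS = 4    # assumption: the abstract does not state a weight precision


def bit_slices(weights, n_bits):
    """Split unsigned integer weights into n_bits binary columns (LSB first)."""
    return [[(w >> b) & 1 for w in weights] for b in range(n_bits)]


def column_sum(inputs, column):
    """Ideal per-column accumulation of inputs against one binary weight column."""
    return sum(x * c for x, c in zip(inputs, column))


def weight_and_combine(partial_sums):
    """Combine per-bit partial sums (LSB first) with power-of-two weights."""
    return sum(p << b for b, p in enumerate(partial_sums))


def relative_std(samples):
    """Relative standard deviation: population std divided by mean (a fraction)."""
    return statistics.pstdev(samples) / statistics.mean(samples)


# Hypothetical test vectors: binary inputs and random unsigned weights.
rng = random.Random(0)
inputs = [rng.randint(0, 1) for _ in range(N_CHANNELS)]
weights = [rng.randrange(2 ** WEIGHT_BITS) for _ in range(N_CHANNELS)]

partials = [column_sum(inputs, col) for col in bit_slices(weights, WEIGHT_BITS)]
combined = weight_and_combine(partials)

# Full-precision dot product computed directly, for comparison.
reference = sum(x * w for x, w in zip(inputs, weights))
print(combined == reference)
```

In this idealized model the combined result matches the direct dot product exactly. In an analog implementation the per-column sums would carry variation, and `relative_std` applied to repeated readouts is one way to quantify it.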