January 2025
IEEE Micro
Interconnection networks are crucial in data centers and supercomputers, where they must sustain high communication bandwidth and low latency under the demanding traffic patterns of data-intensive applications. These patterns can cause congestion, which degrades system performance if it is not handled efficiently. Current congestion control techniques, such as DCQCN, struggle to identify precisely which packets cause congestion, leading to false positives. To address this, we propose the Enhanced Congestion Point (ECP) mechanism, which accurately identifies congesting packets. ECP monitors the packets at the head of switch ingress queues and flags them as congesting when queue occupancy exceeds a threshold and packet requests are rejected. ECP also introduces a re-evaluation mechanism that cancels the identification of congesting packets if they no longer contribute to congestion after rerouting. We evaluated ECP using a network simulator that models various configurations and realistic traffic patterns. Results show that ECP significantly improves congestion detection accuracy with a low error margin, enhancing DCQCN performance.
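The detection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Packet`, `IngressQueue`, and `evaluate_head`, the fractional `threshold`, and the way a rejected request is passed in as a boolean are all assumptions made for illustration. The sketch flags the head-of-queue packet only when both conditions hold (occupancy above the threshold and a rejected request) and clears the flag when a later check finds that the conditions no longer hold, for example after rerouting.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    flow_id: int
    out_port: int             # output port the packet currently requests
    congesting: bool = False  # flag set by the ECP-style check (hypothetical field)


@dataclass
class IngressQueue:
    capacity: int             # maximum number of packets the queue can hold
    threshold: float          # occupancy fraction above which flagging is allowed (assumed form)
    packets: deque = field(default_factory=deque)

    def occupancy(self) -> float:
        # Fraction of the queue currently in use.
        return len(self.packets) / self.capacity


def evaluate_head(queue: IngressQueue, request_granted: bool) -> None:
    """Flag or unflag the packet at the head of an ingress queue.

    request_granted says whether the head packet's request for its
    output port was accepted in this cycle (an assumed interface).
    """
    if not queue.packets:
        return
    head = queue.packets[0]
    if queue.occupancy() > queue.threshold and not request_granted:
        # Both conditions hold: the head packet is treated as congesting.
        head.congesting = True
    elif head.congesting:
        # Re-evaluation: the conditions no longer hold, so cancel the flag.
        head.congesting = False


# Usage: a queue of capacity 4 with a 50% threshold, holding 3 packets.
queue = IngressQueue(capacity=4, threshold=0.5)
queue.packets.extend(Packet(flow_id=i, out_port=1) for i in range(3))

evaluate_head(queue, request_granted=False)  # head is flagged
flagged_after_reject = queue.packets[0].congesting

queue.packets[0].out_port = 2                # head is rerouted to another port
evaluate_head(queue, request_granted=True)   # flag is cancelled on re-evaluation
flagged_after_reroute = queue.packets[0].congesting
```

Only the head packet is examined, so packets queued behind it are never flagged merely because the queue is full. Combining occupancy with a rejected request is what the abstract credits for avoiding false positives; how the mechanism interacts with DCQCN marking is outside this sketch.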