TPR: Traffic Pattern-Based Adaptive Routing for Dragonfly Networks
The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but can make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly, which improves on UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to take both the inferred traffic pattern and the link loads into consideration when making adaptive routing decisions. Our performance evaluation on a diverse set of traffic conditions indicates that, by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective in making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than the underlying UGAL.
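To make the load-based decision described above concrete, the following is a minimal sketch of a UGAL-style choice between a minimal and a non-minimal path. It assumes the commonly described form of the rule, in which each candidate's delay is estimated as queue occupancy times hop count. The names `PathCandidate`, `ugal_prefers_minimal`, and the `bias` parameter are hypothetical and introduced only for illustration. The `bias` term shows one possible way an inferred traffic pattern could tilt the decision; the abstract does not specify how TPR actually does this.

```python
from dataclasses import dataclass


@dataclass
class PathCandidate:
    """One routing option as seen at the source router (hypothetical type)."""
    queue_occupancy: int  # load estimate at the first-hop output port
    hop_count: int        # number of hops along this candidate path


def ugal_prefers_minimal(minimal: PathCandidate,
                         non_minimal: PathCandidate,
                         bias: int = 0) -> bool:
    """Return True if the minimal path should be taken.

    UGAL-style comparison: estimated delay = queue occupancy * hop count.
    `bias` is an assumed knob, not taken from the paper: a positive value
    favors the minimal path, a negative value favors the non-minimal path.
    """
    minimal_cost = minimal.queue_occupancy * minimal.hop_count
    non_minimal_cost = non_minimal.queue_occupancy * non_minimal.hop_count
    return minimal_cost <= non_minimal_cost + bias


# Example: a congested 3-hop minimal path versus a lightly loaded 5-hop detour.
minimal = PathCandidate(queue_occupancy=4, hop_count=3)      # cost 12
non_minimal = PathCandidate(queue_occupancy=1, hop_count=5)  # cost 5

take_minimal_plain = ugal_prefers_minimal(minimal, non_minimal)
take_minimal_biased = ugal_prefers_minimal(minimal, non_minimal, bias=10)
```

With these illustrative numbers, the unbiased comparison routes non-minimally because the minimal path's estimated delay is higher, while a bias of 10 toward the minimal path reverses the decision. This shows how a pattern-derived adjustment could change an outcome that link loads alone would dictate.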
IEEE Transactions on Multi-Scale Computing Systems
Digital Object Identifier (DOI): 10.1109/TMSCS.2018.2877264
Faizian, P., Alfaro, J. F., Rahman, M. S., Mollah, M. A., Yuan, X., Pakin, S., & Lang, M. (2018). TPR: Traffic Pattern-Based Adaptive Routing for Dragonfly Networks. IEEE Transactions on Multi-Scale Computing Systems, 4(4), 931–943. https://doi.org/10.1109/TMSCS.2018.2877264