TPR: Traffic Pattern-Based Adaptive Routing for Dragonfly Networks
Document Type
Article
Publication Date
10-1-2018
Abstract
The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly that improves UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics that are collected in performance counters to infer the traffic pattern, and to take the inferred traffic pattern plus link loads into consideration when making adaptive routing decisions. Our performance evaluation results on a diverse set of traffic conditions indicate that by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective in making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than the underlying UGAL.
Publication Title
IEEE Transactions on Multi-Scale Computing Systems
Volume
4
Issue
4
First Page
931
Last Page
943
Digital Object Identifier (DOI)
10.1109/TMSCS.2018.2877264
E-ISSN
2332-7766
Citation Information
Faizian, P., Alfaro, J. F., Rahman, M. S., Mollah, M. A., Yuan, X., Pakin, S., & Lang, M. (2018). TPR: Traffic Pattern-Based Adaptive Routing for Dragonfly Networks. IEEE Transactions on Multi-Scale Computing Systems, 4(4), 931–943. https://doi.org/10.1109/TMSCS.2018.2877264