TPR: Traffic Pattern-Based Adaptive Routing for Dragonfly Networks

Document Type

Article

Publication Date

10-1-2018

Abstract

The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly, which improves UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to take both the inferred traffic pattern and the link loads into consideration when making adaptive routing decisions. Our performance evaluation on a diverse set of traffic conditions indicates that, by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective in making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than its underlying UGAL.
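
The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the inference rule, the threshold, the bias values, and all names below (PathCandidate, infer_concentration, pattern_bias, choose_path) are assumptions made for illustration. The sketch assumes a UGAL-style comparison of queue length times hop count for a minimal and a non-minimal path, with a bias term chosen from a traffic pattern inferred from per-link usage counters.

```python
from dataclasses import dataclass


@dataclass
class PathCandidate:
    queue_len: int  # load estimate: packets queued on the path's first link
    hops: int       # path length in hops


def infer_concentration(link_counts):
    """Hypothetical pattern inference: share of traffic on the busiest link.

    A value near 1.0 suggests concentrated (adversarial-like) traffic;
    a value near 1/len(link_counts) suggests uniform-like traffic.
    """
    total = sum(link_counts)
    if total == 0:
        return 0.0
    return max(link_counts) / total


def pattern_bias(concentration, threshold=0.5, uniform_bias=4, concentrated_bias=-4):
    """Assumed policy: favor minimal paths for uniform-like traffic and
    non-minimal paths for concentrated traffic. All values are illustrative."""
    return concentrated_bias if concentration >= threshold else uniform_bias


def choose_path(minimal, non_minimal, bias):
    """UGAL-style decision: compare estimated delays (queue length x hops).

    The bias is added to the non-minimal side, so a positive bias makes the
    minimal path more likely and a negative bias makes it less likely.
    """
    min_cost = minimal.queue_len * minimal.hops
    non_min_cost = non_minimal.queue_len * non_minimal.hops
    return "minimal" if min_cost <= non_min_cost + bias else "non_minimal"


if __name__ == "__main__":
    minimal = PathCandidate(queue_len=3, hops=2)
    non_minimal = PathCandidate(queue_len=1, hops=5)
    for counts in ([30, 35, 35], [90, 5, 5]):
        bias = pattern_bias(infer_concentration(counts))
        print(counts, "->", choose_path(minimal, non_minimal, bias))
```

In this sketch the same pair of link loads leads to different routing decisions depending on the inferred pattern, which is the general idea the abstract attributes to TPR; the actual inference and decision rules are those given in the article itself.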

Publication Title

IEEE Transactions on Multi-Scale Computing Systems

Volume

4

Issue

4

First Page

931

Last Page

943

Digital Object Identifier (DOI)

10.1109/TMSCS.2018.2877264

E-ISSN

2332-7766
