EASTER: Learning to Split Transformers at the Edge Robustly

Abstract

Prevalent large transformer models present significant computational challenges for resource-constrained devices at the Edge. While distributing the workload of deep learning models across multiple edge devices has been studied extensively, existing works typically overlook the impact of edge-device failures. Unpredictable failures, caused by, e.g., connectivity issues or discharged batteries, can compromise the reliability of inference serving at the Edge. In this paper, we introduce EASTER, a novel methodology that learns distribution strategies for transformer models that are robust to device failures, considering the trade-off between robustness (i.e., maintaining model functionality under failures) and resource utilization (i.e., memory usage and computation). We evaluate EASTER with three representative transformers – ViT, GPT-2, and Vicuna – under device failures. Our results demonstrate EASTER's memory efficiency and potential end-to-end inference latency improvements across multiple edge devices, while preserving model accuracy as much as possible under device failures.
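
To make the robustness/resource trade-off concrete, below is a minimal, purely illustrative Python sketch. It is not the EASTER algorithm from the paper: the layer costs, device budgets, failure probabilities, and the weighted scoring function (`alpha`) are all invented assumptions. The sketch brute-forces contiguous layer-to-device splits of a small transformer and scores each split by expected surviving layers under independent device failures, combined with memory headroom.

```python
# Hypothetical sketch (not the EASTER method itself): scoring contiguous
# layer-to-device splits by a robustness/resource trade-off.
# All numbers, names, and the scoring function are illustrative assumptions.
from itertools import combinations

NUM_LAYERS = 12                      # e.g., ViT-Base has 12 encoder blocks
LAYER_MEM_MB = [38.0] * NUM_LAYERS   # assumed per-block memory footprint
DEVICES = [                          # (memory budget in MB, failure probability)
    (256.0, 0.05),
    (192.0, 0.10),
    (256.0, 0.02),
]

def contiguous_splits(num_layers, num_devices):
    """Yield all ways to cut num_layers into num_devices contiguous chunks."""
    for cuts in combinations(range(1, num_layers), num_devices - 1):
        bounds = (0, *cuts, num_layers)
        yield [list(range(bounds[i], bounds[i + 1])) for i in range(num_devices)]

def score(split):
    """Combine expected surviving layers (robustness) with memory headroom."""
    robustness = 0.0
    headroom = 0.0
    for chunk, (mem_budget, p_fail) in zip(split, DEVICES):
        chunk_mem = sum(LAYER_MEM_MB[layer] for layer in chunk)
        if chunk_mem > mem_budget:   # infeasible: chunk exceeds device memory
            return float("-inf")
        robustness += (1.0 - p_fail) * len(chunk) / NUM_LAYERS
        headroom += (mem_budget - chunk_mem) / mem_budget
    alpha = 0.8                      # assumed trade-off weight
    return alpha * robustness + (1 - alpha) * headroom / len(DEVICES)

best = max(contiguous_splits(NUM_LAYERS, len(DEVICES)), key=score)
print("best split sizes:", [len(chunk) for chunk in best],
      "score:", round(score(best), 4))
```

In practice, a learned approach such as the one the paper describes would replace this exhaustive search and hand-tuned score with a trained policy; the sketch only illustrates why splits that pack fewer layers onto failure-prone or memory-tight devices can score higher.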

Publication
In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE TCAD), Vol. 43, No. 11, 2024