Excited to share that our paper, “TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations,” will be presented by Jiexiong Guan (UTH, University of William & Mary) at the International Conference on Supercomputing (ICS), which will take place in Salt Lake City, USA, on June 8–11, 2025. This work is a collaboration between UTH, the University of William & Mary, and the University of Georgia.

Mainstream mobile GPUs (such as Qualcomm’s Adreno) typically feature a 2.5D L1 texture cache whose throughput exceeds that of ordinary on-chip memory. However, the performance characteristics of such a 2.5D cache remain poorly understood, which limits its optimization potential. TMModel introduces a novel performance modeling framework for mobile GPUs that combines micro-benchmarking, an analytical performance model, and a lightweight compiler to optimize DNN execution based on memory access patterns and GPU parameters. TMModel delivers up to 66× speedup for end-to-end on-device DNN training with significantly lower tuning cost than existing frameworks. As mobile devices grow more powerful, this work is a step towards efficient, real-time deep learning training directly on such devices.
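To give a flavor of what an analytical model over access patterns and cache parameters can look like, here is a minimal, hypothetical sketch: it estimates average cycles per texel for a 2D tile access against a cache with 2D ("2.5D-style") cache lines. All parameter names and latency numbers are illustrative assumptions, not values from the paper.

```python
import math

# Hypothetical cost model (not TMModel's actual formulation): the first
# access to each 2D cache line is a miss; subsequent accesses within that
# line are hits. Latencies below are made-up placeholder values.
def estimate_cycles(tile_w, tile_h, line_w, line_h,
                    hit_cycles=4, miss_cycles=100):
    """Average cycles per texel for a tile_w x tile_h access tile on a
    cache whose lines cover line_w x line_h texels."""
    # Number of distinct 2D cache lines the tile touches.
    lines_touched = math.ceil(tile_w / line_w) * math.ceil(tile_h / line_h)
    total_accesses = tile_w * tile_h
    misses = lines_touched
    hits = total_accesses - misses
    return (hits * hit_cycles + misses * miss_cycles) / total_accesses

# A tile shaped like the cache line amortizes misses better than a
# skinny row-major tile touching the same number of texels.
aligned = estimate_cycles(tile_w=8, tile_h=4, line_w=8, line_h=4)
skinny = estimate_cycles(tile_w=32, tile_h=1, line_w=8, line_h=4)
```

Even this toy model shows why a compiler that picks tile shapes from cache geometry can matter: the aligned tile averages far fewer cycles per texel than the skinny one.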