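Speaker: Ting Cao, Researcher, Microsoft Research Asia (MSRA)
Venue: Room 409, Shenghua Rear Building (升华后楼), Main Campus
Time: 10:00 a.m., Friday, April 2, 2021
Title: Empower Efficient DNN Inference on Edge Devices
Biography: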
Ting Cao is a Senior Researcher in the HEX Group at MSRA. Her research interests include big data and deep learning systems, hardware/software co-design, high-level language implementation, and energy-efficient hardware design.
She received her PhD from the Research School of Computer Science at the Australian National University. Before joining MSRA, she worked in the Compiler and Computing Language Lab at 2012 Labs, Huawei Technologies. She has published in top-tier conferences such as PLDI, ISCA, ASPLOS, MobiCom, and MobiSys. She has also served as a PC or EPC member for a range of conferences, including PLDI, OOPSLA, VEE, and ISMM.
Abstract:
The boom in edge AI applications has spawned many innovations in efficient neural network (NN) algorithms and inference frameworks. Compared with inference on servers, inference on edge devices faces specific challenges, such as diverse edge AI platforms, limited hardware resources, and conflicting requirements for energy and performance. However, existing work does not adequately address these challenges, which leads to suboptimal performance.
This talk will introduce our recent progress, which aims to empower efficient edge inference by incorporating edge hardware features into the design of NN algorithms and frameworks. To enable efficient NN design for diverse edge AI platforms, as a first stage we propose a characterization method that quantifies the hardware behaviour of the NN design space. The results are used as heuristics for NN design. In a case study that applies these heuristics to a real NN design task, the resulting models achieve higher accuracy and efficiency on every experimental platform, at much lower design cost. Furthermore, to improve edge CPU and GPU utilization for inference, we propose several novel techniques for edge inference frameworks, including cost-model directed block partition, asymmetry-aware scheduling, and rule-based work group size selection. On average, inference performance improves by 87% on the CPU and 15% on the GPU compared with state-of-the-art methods.