Acceleration through Fusion of AvgPool2D and Silu Kernels

Authors

Andreas Falkenberg, Dr Falkenberg Technology Consulting Inc, USA

Abstract

Accelerating large language models (LLMs) requires continually advancing compiler technologies. Operator fusion is one of the promising techniques for considerably improving LLM throughput. This paper discusses the impact of operator fusion on the performance of the operators themselves. It compares the throughput of three implementations of AvgPool2D combined with Silu: a pure CPU implementation, a two-kernel implementation, and a fused single-kernel implementation.
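To make the idea of fusion concrete, the following is a minimal CPU-side sketch in plain Python and is not the paper's implementation. It assumes that pooling is applied first and Silu second, that the pooling window is a non-overlapping k x k block (stride equal to k, no padding), and that Silu(x) = x * sigmoid(x). The function names `avg_pool2d`, `unfused`, and `fused` are hypothetical and chosen only for illustration. The unfused path mirrors a two-kernel design that writes an intermediate pooled buffer; the fused path applies the activation as soon as each window average is available, so no intermediate buffer is needed.

```python
import math


def silu(x):
    # Assumed definition: Silu(x) = x * sigmoid(x).
    # Two algebraically equal branches avoid overflow in exp() for large |x|.
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def window_mean(img, i, j, k):
    # Mean of the k x k window whose top-left corner is (i, j).
    total = sum(img[i + di][j + dj] for di in range(k) for dj in range(k))
    return total / (k * k)


def avg_pool2d(img, k):
    # Assumption: non-overlapping windows (stride == k), no padding.
    h, w = len(img), len(img[0])
    return [[window_mean(img, i, j, k) for j in range(0, w - k + 1, k)]
            for i in range(0, h - k + 1, k)]


def unfused(img, k):
    # Two passes, analogous to two kernels: the pooled result is
    # materialized first, then read back for the activation.
    pooled = avg_pool2d(img, k)
    return [[silu(v) for v in row] for row in pooled]


def fused(img, k):
    # One pass, analogous to a single fused kernel: the activation is
    # applied to each window mean before it is stored.
    h, w = len(img), len(img[0])
    return [[silu(window_mean(img, i, j, k)) for j in range(0, w - k + 1, k)]
            for i in range(0, h - k + 1, k)]


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0, 4.0],
             [5.0, 6.0, 7.0, 8.0],
             [9.0, 10.0, 11.0, 12.0],
             [13.0, 14.0, 15.0, 16.0]]
    print(unfused(image, 2))
    print(fused(image, 2))
```

Both paths perform the same arithmetic, so they should agree on every output element. On a GPU, the potential benefit of the fused form would come from avoiding the extra write and read of the intermediate buffer and the second kernel launch, not from doing less computation.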

Keywords

AvgPool2D, Silu, Kernel, AI, LLM, GPU, CPU

Full Text  Volume 15, Number 2