Authors
Andreas Falkenberg, Dr Falkenberg Technology Consulting Inc, USA
Abstract
Accelerating large language models (LLMs) requires continually advancing compiler technologies. Operator fusion is one of the promising techniques for considerably improving LLM throughput. This paper discusses the impact of operator fusion on operator-level performance. It compares the throughput of three implementations of AvgPool2D followed by ReLU: a pure CPU implementation, an implementation using two separate kernels, and a single fused kernel.
Keywords
AvgPool2D, ReLU, Kernel, AI, LLM, GPU, CPU