Authors
Andreas Falkenberg, Dr Falkenberg Technology Consulting Inc, USA
Abstract
Accelerating large language models (LLMs) requires continually advancing compiler technology. Operator fusion is a promising technique for considerably improving LLM throughput. This paper examines the effect of operator fusion on the performance of the fused operators themselves. It compares the throughput of three implementations of AvgPool2D followed by Silu: a pure CPU implementation, an implementation using two separate kernels, and a single fused kernel.
Keywords
AvgPool2D, Silu, Kernel, AI, LLM, GPU, CPU