CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 ...
Abstract: While the Karatsuba algorithm reduces the complexity of large integer multiplication, the extra additions required minimize its benefits for smaller integers of more commonly-used bitwidths.
Abstract: An improved variant of the precise-integration time-domain (PITD) method is proposed to eliminate the inverse matrix calculation and optimize the storage burden with the help of sparse ...