Cointime


Hugging Face Launches Kernels, GPU Operators Installed with One Line of Code

According to Dongcha Beating, Hugging Face CEO Clem Delangue has announced the official launch of Kernels on the Hub. GPU operators (kernels) are pieces of low-level optimized code that let graphics cards run at full speed, accelerating inference and training by 1.7x to 2.5x. Installing them, however, has long been a nightmare: the widely used FlashAttention, for example, needs roughly 96 GB of memory and several hours to compile locally, and even a slight mismatch in PyTorch or CUDA versions triggers errors, so most developers get stuck at this step.

Kernels moves the compilation to the cloud. Hugging Face pre-compiles operators for a range of graphics cards and system environments; a developer writes one line of code, and the Hub automatically matches the local hardware environment and downloads the pre-compiled files, ready to use within seconds. The same process can load multiple versions of an operator side by side and is compatible with torch.compile.

Kernels launched in a test form in June of last year and has now been promoted to a first-class repository type on the Hub, alongside Models, Datasets, and Spaces. There are currently 61 pre-compiled operators covering common scenarios such as attention mechanisms, normalization, mixture-of-experts routing, and quantization, with support for four hardware acceleration platforms: NVIDIA CUDA, AMD ROCm, Apple Metal, and Intel XPU. Kernels is already integrated into Hugging Face's inference framework TGI and the Transformers library.
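The matching step described above, where the Hub selects a pre-compiled build for the local accelerator and framework version instead of compiling from source, can be sketched as a toy lookup. This is an illustration only: the registry, artifact names, and `resolve_kernel` function below are hypothetical, not the real Kernels API.

```python
# Toy registry of pre-compiled kernel builds, keyed by
# (accelerator backend, torch major.minor version). The real Kernels
# Hub stores many such variants per operator; all names here are
# hypothetical stand-ins.
PREBUILT = {
    ("cuda", "2.4"): "flash_attn-cuda-torch2.4.so",
    ("cuda", "2.5"): "flash_attn-cuda-torch2.5.so",
    ("rocm", "2.5"): "flash_attn-rocm-torch2.5.so",
    ("metal", "2.5"): "flash_attn-metal-torch2.5.dylib",
}

def resolve_kernel(backend: str, torch_version: str) -> str:
    """Pick the pre-compiled artifact matching the local environment,
    instead of the hours-long local compilation the article describes."""
    # Match on major.minor only, since patch releases share ABI here (toy rule).
    key = (backend, ".".join(torch_version.split(".")[:2]))
    try:
        return PREBUILT[key]
    except KeyError:
        raise RuntimeError(
            f"No pre-compiled build for {backend} / torch {torch_version}; "
            "would have to fall back to compiling from source."
        )

print(resolve_kernel("cuda", "2.5.1"))  # flash_attn-cuda-torch2.5.so
```

The point of the design is that version mismatches become a fast, explicit lookup failure rather than a multi-hour compile that errors out halfway through.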
