Image generation AI 'Stable Diffusion' runs about 30% faster with TensorFlow and KerasCV
Keras is a deep learning API written in Python that runs on the machine learning platform TensorFlow. 'KerasCV' is a library of modular building blocks that extends Keras for computer vision tasks such as image classification, object detection, image segmentation, and image data augmentation. According to a report from the KerasCV developers' research team, implementing the image generation AI Stable Diffusion with KerasCV makes it about 30% faster.
High-performance image generation using Stable Diffusion in KerasCV
Stable Diffusion is an open-source image generation AI released to the public in August 2022 that automatically generates photos and pictures matching an entered keyword (prompt).
AI 'Stable Diffusion' that creates pictures and photos that look like they were drawn by humans according to keywords has been released to the public, so I tried using it - GIGAZINE
Stable Diffusion works by learning how images from a dataset degrade into noise, then generating an image by progressively removing noise while being guided by a given prompt or image. You can learn about the general mechanism by reading the following article.
An easy-to-understand illustration of 'how to draw a picture' that you can understand if you know how to master the image generation AI 'Stable Diffusion' - GIGAZINE
Stable Diffusion has its roots in super-resolution technology. When a low-resolution image is simply enlarged, the resolution stays the same even though the size increases, so 'roughness' becomes noticeable. Machine learning algorithms have therefore long been used to remove this 'roughness' noise and increase the effective resolution.
Latent diffusion models, including Stable Diffusion, take this super-resolution technology a step further, starting from the question: 'What would be generated if a super-resolution model were fed pure noise?'
The research team implemented Stable Diffusion, which ordinarily runs on PyTorch with Hugging Face Diffusers, using TensorFlow and KerasCV, then measured and compared the execution time for generating an image with 50 diffusion steps. The warm-start results are as follows: the KerasCV implementation is faster in both cases, and on an NVIDIA Tesla T4 GPU it is about 30% faster than Hugging Face Diffusers.
| Warm start | NVIDIA Tesla T4 (VRAM 16GB GDDR6) | NVIDIA Tesla V100 (VRAM 32GB HBM2) |
| --- | --- | --- |
| KerasCV | 28.97 seconds | 12.45 seconds |
| Hugging Face Diffusers | 41.33 seconds | 12.72 seconds |
The cold-start measurements are as follows; here KerasCV takes longer to process. However, since the cold start occurs only the first time, the research team says it can be ignored in a production environment where Stable Diffusion actually runs many times.
| Cold start | NVIDIA Tesla T4 (VRAM 16GB GDDR6) | NVIDIA Tesla V100 (VRAM 32GB HBM2) |
| --- | --- | --- |
| KerasCV | 83.47 seconds | 76.43 seconds |
| Hugging Face Diffusers | 46.27 seconds | 13.90 seconds |
TensorFlow also has a built-in XLA (Accelerated Linear Algebra) compiler, and using it improves speed significantly further: image generation that normally took about 8.17 seconds dropped to about 6.26 seconds with the XLA compiler alone.
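The XLA speedup is not specific to Stable Diffusion: any TensorFlow computation can be JIT-compiled with XLA by passing `jit_compile=True` to `tf.function`. A minimal sketch on a toy function:

```python
import tensorflow as tf

# jit_compile=True compiles the traced graph with XLA instead of
# running it op-by-op, allowing kernel fusion.
@tf.function(jit_compile=True)
def dense_relu(x, w):
    # A small matmul + ReLU; XLA can fuse these into a single kernel.
    return tf.nn.relu(tf.matmul(x, w))

x = tf.ones((4, 8))
w = tf.ones((8, 2))
y = dense_relu(x, w)  # first call triggers XLA compilation; later calls reuse it
```

The first call pays a one-time compilation cost, which is consistent with the slower cold-start numbers above. (KerasCV's `StableDiffusion` constructor also accepts a `jit_compile` flag for the same purpose, though that parameter name is stated here as an assumption.)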
In addition, KerasCV makes it easy to enable 'mixed precision', which performs computations in float16 while storing weights in float32. Combining mixed precision with the XLA compiler further reduces the image generation time to about 4.25 seconds, the research team reports.
A repository of Stable Diffusion that can be run with TensorFlow and KerasCV is published on GitHub.
GitHub - divamgupta/stable-diffusion-tensorflow: Stable Diffusion in TensorFlow / Keras