🧨 Generate a Diffusers inference code snippet tailored to your machine

Enter a Hugging Face Hub repo_id and your system specs, and the tool uses Gemini to generate inference code suited to your settings. It builds on sayakpaul/auto-diffusers-docs.

Gemini Model

Select the model used to generate the analysis.

Optimization settings (a sketch of the kind of snippet these map to follows the list):

  • Compute in 32-bit precision (caution ⚠️)
  • Consider applying caching for speed
  • Consider 8-bit/4-bit quantization
  • Model is compatible with torch.compile
  • Model and hardware support FP8 precision
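
A minimal sketch of the basic generated snippet when BF16 is left enabled, assuming a hypothetical repo id and a CUDA GPU with enough free VRAM (illustrative only, not actual tool output):

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical repo id; replace with the repo you entered in the form.
repo_id = "black-forest-labs/FLUX.1-dev"

# BF16 roughly halves memory versus FP32; pick torch.float32 only when the
# "Compute in 32-bit precision" toggle is checked and fidelity matters most.
dtype = torch.bfloat16

pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=dtype)
pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("output.png")
```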

Examples (click to try)

Each example pre-fills the inputs: Hugging Face Repo ID, Gemini Model, Disable BF16 (Use FP32), Enable lossy caching, Allow Lossy Quantization, Free System RAM (GB), Free GPU VRAM (GB), torch.compile() friendly, and fp8 friendly.
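
When "Allow Lossy Quantization" is enabled and VRAM is tight, the generated code typically loads the heaviest component in 4-bit. A minimal sketch, assuming a recent diffusers with bitsandbytes installed and the same hypothetical Flux repo (parameter values are illustrative):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

repo_id = "black-forest-labs/FLUX.1-dev"  # hypothetical example repo

# NF4 4-bit quantization of the transformer, the largest component of the pipeline.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    repo_id,
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(repo_id, transformer=transformer, torch_dtype=torch.bfloat16)
# Offload idle components to system RAM to lower peak VRAM further.
pipe.enable_model_cpu_offload()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("output.png")
```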
  • Try switching the model from Flash to Pro if the results are poor.
  • Provide the free VRAM and RAM figures accurately, as the suggestions depend on them (see the snippet after this list for a quick way to read them off your machine).
  • As a rule of thumb, GPUs from the RTX 4090 onward are generally good candidates for torch.compile().
  • When lossy quantization isn't acceptable, try enabling caching instead; note that caching can still be lossy.
  • To leverage FP8, the GPU needs a compute capability of at least 8.9.
  • Check out the Diffusers documentation on optimization for further tuning options.
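
A quick way to read off the numbers the form asks for (free RAM, free VRAM, FP8 capability), a sketch assuming psutil is installed and a single CUDA GPU:

```python
import psutil
import torch

# Free system RAM in GB -> "Free System RAM (GB)" field.
free_ram_gb = psutil.virtual_memory().available / 1024**3

# Free VRAM in GB on the current CUDA device -> "Free GPU VRAM (GB)" field.
free_vram_bytes, _total_bytes = torch.cuda.mem_get_info()
free_vram_gb = free_vram_bytes / 1024**3

# FP8 requires compute capability >= 8.9 (e.g. RTX 4090, L40S, H100).
major, minor = torch.cuda.get_device_capability()
fp8_friendly = (major, minor) >= (8, 9)

print(f"Free RAM: {free_ram_gb:.1f} GB | Free VRAM: {free_vram_gb:.1f} GB | FP8 friendly: {fp8_friendly}")

# On a torch.compile-friendly GPU, the generated code often adds a line like:
# pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)
```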


⛔️ Disclaimer: Large Language Models (LLMs) can make mistakes. The information provided is an estimate and should be verified. Always test the model on your target hardware to confirm actual memory requirements.