🧨 Generate a Diffusers inference code snippet tailored to your machine

Enter a Hugging Face Hub repo_id and your system specs to get a tailored inference snippet. This tool uses Gemini to generate the code based on your settings. It is based on sayakpaul/auto-diffusers-docs.

Gemini Model

Select the Gemini model used to generate the analysis.

Disable BF16 (Use FP32)

Calculate using 32-bit precision (FP32) instead of 16-bit (BF16).
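
As a rough illustration of what this toggles, here is how the dtype choice typically shows up in Diffusers code. The repo_id and flag name below are placeholders, not the tool's actual output:

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder values standing in for the tool's inputs.
repo_id = "stabilityai/stable-diffusion-xl-base-1.0"
disable_bf16 = False  # the "Disable BF16 (Use FP32)" checkbox

# FP32 roughly doubles weight memory relative to BF16, but sidesteps
# half-precision issues on GPUs with poor BF16 support.
dtype = torch.float32 if disable_bf16 else torch.bfloat16

pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=dtype).to("cuda")
```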

Allow Lossy Quantization

Consider 8-bit/4-bit quantization to reduce memory use at some cost in output quality.
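
For reference, quantization in Diffusers is applied through a quantization config. The sketch below loosely follows the bitsandbytes NF4 example from the Diffusers docs; the repo_id is just an example, and the `bitsandbytes` package must be installed:

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

repo_id = "stabilityai/stable-diffusion-3.5-large"  # example model

# Quantize the transformer (the largest component) to 4-bit NF4.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = SD3Transformer2DModel.from_pretrained(
    repo_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    repo_id, transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
```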

torch.compile() friendly

The model is compatible with torch.compile().
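
As a point of reference, enabling torch.compile() on a pipeline usually looks like the minimal sketch below, shown with an example SDXL checkpoint; transformer-based pipelines compile pipe.transformer instead of pipe.unet:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example model
    torch_dtype=torch.float16,
).to("cuda")

# Compile the denoiser once; the first call pays the compilation cost,
# later calls reuse the compiled graph and run noticeably faster.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("an astronaut riding a horse on the moon").images[0]
```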

fp8 friendly

The model and hardware support FP8 precision.
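
One FP8 technique available in recent Diffusers releases is layerwise casting, which stores weights in FP8 and upcasts each layer at compute time. A minimal sketch, with FLUX.1-dev as a placeholder repo_id:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example model
    torch_dtype=torch.bfloat16,
).to("cuda")

# Keep the transformer's weights in FP8 storage and upcast each layer
# to BF16 just in time for compute, roughly halving weight memory.
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)
```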

Examples (Click to try)
Each example pre-fills these inputs: Hugging Face Repo ID, Gemini Model, Disable BF16 (Use FP32), Allow Lossy Quantization, Free System RAM (GB), Free GPU VRAM (GB), torch.compile() friendly, and fp8 friendly.

  • Try changing the model from Flash to Pro if the results are bad.
  • Be as specific as possible about your local machine.
  • As a rule of thumb, GPUs from the RTX 4090 onward are generally good for using torch.compile().
  • To leverage FP8, the GPU needs a compute capability of at least 8.9 (see the check after this list).
  • Check out the Diffusers documentation on optimization.
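
A quick way to check your GPU's compute capability with PyTorch:

```python
import torch

# Returns (major, minor); FP8 support generally requires (8, 9) or
# higher, e.g. RTX 4090 (8.9) or H100 (9.0).
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability {major}.{minor}; FP8-capable: {(major, minor) >= (8, 9)}")
```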

Generated Code

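For illustration only, here is the kind of snippet the tool might produce for a machine with roughly 8 GB of free VRAM; the model, prompt, and settings are placeholders, not real output:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder model
    torch_dtype=torch.float16,
)
# Offload submodules to system RAM and move each to the GPU only while
# it runs, keeping peak VRAM usage well under 8 GB.
pipe.enable_model_cpu_offload()

image = pipe("a cozy cabin in a snowy forest at golden hour").images[0]
image.save("result.png")
```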

⛔️ Disclaimer: Large Language Models (LLMs) can make mistakes. The information provided is an estimate and should be verified. Always test the model on your target hardware to confirm actual memory requirements.