
[feat] IP Adapters (author @okotaku) #5713

Merged into main on Nov 21, 2023 (161 commits)
Conversation

@yiyixuxu (Collaborator) commented Nov 8, 2023

The author of this PR is @okotaku, and the original PR is #4944.

This is a demo of an alternative design (alternative to #4944) that adds the image_projection layer to the UNet.

It works with SD and SDXL.

It works with text-to-image, image-to-image, and inpainting; see the text-to-image example below, and you can find examples for img2img here and inpaint here.
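
For context, here is a rough conceptual sketch of what this wires into the UNet (not the exact diffusers code; the class, dimensions, and helper below are illustrative only): the pooled CLIP image embedding is projected into a few extra tokens, and each cross-attention layer attends to those tokens with separate key/value projections whose output is added to the text branch.

import torch.nn as nn
import torch.nn.functional as F

class ImageProjection(nn.Module):
    """Maps a pooled CLIP image embedding to a few extra cross-attention tokens."""
    def __init__(self, clip_dim=1024, cross_attention_dim=768, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(clip_dim, num_tokens * cross_attention_dim)
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, image_embeds):                        # (batch, clip_dim)
        tokens = self.proj(image_embeds)
        tokens = tokens.reshape(image_embeds.shape[0], self.num_tokens, -1)
        return self.norm(tokens)                            # (batch, num_tokens, dim)

def decoupled_cross_attention(q, text_tokens, image_tokens, to_k, to_v, to_k_ip, to_v_ip, scale=1.0):
    """Text cross-attention plus a separately projected image cross-attention, summed."""
    text_out = F.scaled_dot_product_attention(q, to_k(text_tokens), to_v(text_tokens))
    image_out = F.scaled_dot_product_attention(q, to_k_ip(image_tokens), to_v_ip(image_tokens))
    return text_out + scale * image_out

The text-to-image example mentioned above: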

from diffusers import StableDiffusionPipeline
import torch
from diffusers.utils import load_image

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality', 
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_out.png")

(output image: yiyi_test_3_out)

It works with LCM-LoRA out of the box.

from diffusers import DiffusionPipeline, LCMScheduler
import torch
from diffusers.utils import load_image

model_id = "sd-dreambooth-library/herge-style"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "best quality, high quality"
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = pipe(
    prompt=prompt,
    ip_adapter_image=image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]

(output image: yiyi_test_2_out)

It works with ControlNet.

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
from diffusers.utils import load_image

controlnet_model_path = "lllyasviel/control_v11f1p_sd15_depth"
controlnet = ControlNetModel.from_pretrained(controlnet_model_path, torch_dtype=torch.float16)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality', 
    image=depth_map,
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_2_out.png")
(ip_image: statue, condition: depth, output: yiyi_test_2_out)

It works with AnimateDiff.

# animate diff + ip adapter
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif, load_image

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "Lykon/DreamShaper"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)

# scheduler
scheduler = DDIMScheduler(
    clip_sample=False,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="linear",
    timestep_spacing="trailing",
    steps_offset=1
)
pipe.scheduler = scheduler

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

# load ip_adapter
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-tilt-up", adapter_name="tilt-up")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-pan-left", adapter_name="pan-left")

seed = 42
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = [image] * 3
prompts = ["best quality, high quality"] * 3
negative_prompt = "bad quality, worst quality"
adapter_weights = [[0.75, 0.0, 0.0], [0.0, 0.0, 0.75], [0.0, 0.75, 0.75]]

output_frames = []
for prompt, image, adapter_weight in zip(prompts, images, adapter_weights):
    pipe.set_adapters(["zoom-out", "tilt-up", "pan-left"], adapter_weights=adapter_weight)
    output = pipe(
        prompt=prompt,
        num_frames=16,
        guidance_scale=7.5,
        num_inference_steps=30,
        ip_adapter_image=image,
        generator=torch.Generator("cpu").manual_seed(seed),
    )
    frames = output.frames[0]
    output_frames.extend(frames)

export_to_gif(output_frames, "test_out_animation.gif")

(output animation: yiyi_test_2_out_animation)

@yiyixuxu (Collaborator, Author)

@marianbastiUNRN

I think it is fine for now
we would very much like to support face models. It won't take much work and we will ask some help from the community soon :)

let me know if you're interested in working on this

@patrickvonplaten (Contributor) left a comment

Great job! Let's merge 🚀

@yiyixuxu yiyixuxu merged commit ba352ae into main Nov 21, 2023
22 checks passed
@yiyixuxu yiyixuxu deleted the ip-adapter branch November 21, 2023 17:34
affromero pushed a commit to affromero/diffusers that referenced this pull request Nov 24, 2023
@alexblattner

I've been working on this for 2 weeks and now it's built in.... Thanks haha

@sayakpaul (Member)

Open a new issue for this. It's not ideal for users to comment on PRs after they have been merged.

@TonyLianLong (Contributor)

This PR seems to break positional arguments in super() calls, as it adds an image_encoder parameter before requires_safety_checker. An example of what breaks after the change (with the commit fixing it) is shown here: #5993. A sketch of the pitfall is below.

We might want to clarify this in the release notes for the next release.
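
A hedged illustration of the pitfall (the MyPipeline subclass and its argument list below are hypothetical, not taken from #5993): a subclass that forwards constructor arguments positionally silently shifts them once a new parameter is inserted upstream, while forwarding by keyword keeps working.

from diffusers import StableDiffusionPipeline

class MyPipeline(StableDiffusionPipeline):
    def __init__(self, vae, text_encoder, tokenizer, unet, scheduler,
                 safety_checker, feature_extractor, image_encoder=None,
                 requires_safety_checker=True):
        # Fragile: positional forwarding breaks once image_encoder is inserted
        # before requires_safety_checker in the parent signature:
        # super().__init__(vae, text_encoder, tokenizer, unet, scheduler,
        #                  safety_checker, feature_extractor, requires_safety_checker)

        # Robust: forward by keyword so inserted parameters cannot shift positions.
        super().__init__(
            vae=vae,
            text_encoder=text_encoder,
            tokenizer=tokenizer,
            unet=unet,
            scheduler=scheduler,
            safety_checker=safety_checker,
            feature_extractor=feature_extractor,
            image_encoder=image_encoder,
            requires_safety_checker=requires_safety_checker,
        )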

@yiyixuxu (Collaborator, Author)

Hi @TonyLianLong,
I looked at #5993. I think it's because it used StableDiffusionPipeline as the base class instead of DiffusionPipeline.

@MackorLab

Hello, I’m just starting to program in Python and I still don’t understand exactly how to do this correctly.
Please tell me: can I save the finished file in MP4 format?

@okaris commented Dec 7, 2023

@marianbastiUNRN

I think it is fine for now we would very much like to support face models. It won't take much work and we will ask some help from the community soon :)

let me know if you're interested in working on this

@yiyixuxu I'm interested in implementing this, can you guide me to the steps necessary please?

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
@xhinker (Contributor) commented Dec 28, 2023

oh img2img is really cool

from transformers import CLIPVisionModelWithProjection, CLIPImageProcessor
from diffusers import StableDiffusionImg2ImgPipeline
import torch
from diffusers.utils import load_image
from PIL import Image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", 
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
).to("cuda")

pipeline = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    image_encoder=image_encoder, torch_dtype=torch.float16, safety_checker=None)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/vermeer.jpg")
ip_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/river.png")


pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality', 
    image=image,
    ip_adapter_image=ip_image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_images_per_prompt=1, 
    num_inference_steps=50,
    generator=generator,
    strength=0.6,
).images
images[0].save("yiyi_test_3_out.png")

(ip_image: river, input image: vermeer, output: yiyi_test_3_out)

Hi @yiyixuxu,

Could you also provide an img2img IP-Adapter sample for SDXL? I always get the error below when using SDXL. Thanks!

RuntimeError: mat1 and mat2 shapes cannot be multiplied (514x1664 and 1280x1280)

@xhinker (Contributor) commented Dec 28, 2023

(quoting the previous comment and the SDXL error)

Never mind, I figured it out: I need to load the SD models' image encoder (from models/image_encoder) explicitly, like this:

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    <IP-Adapter Model Path>,
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
).to("cuda")

pipeline = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    <pretrain model path>,
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.to("cuda")

@thibaudart

Is it possible to load multiple images as reference for the IP adapter?

@patrickvonplaten (Contributor)

Hey @thibaudart,

Hope you're doing well - we've just recently opened the Discussion tab on the Diffusers' repo: https://github.com/huggingface/diffusers/discussions
Would you mind posting your question there?

@thibaudart

(quoting the previous comment)

of course

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
@xingyouxin

For ControlNet and IP-Adapter, I have a question about batched computation, e.g. batch_size = 4. I tried passing lists of images, prompts, generators, etc. into the pipeline, but it failed with: ValueError: ip_adapter_image must have same length as the number of IP Adapters. Got 4 images and 1 IP Adapters.

So maybe batched computation with different reference images is not supported in this project; I am not sure. Could anyone help me? Thanks.

@asomoza (Member) commented May 13, 2024

It would be better if you open a new issue for this; you will also need to provide us with minimal reproducible code.

Without it, I can only say that the error message says it all: you are passing 4 images to the IP adapters, but you're only loading one IP adapter.

Probably the error lies in how you are passing the images for the batch.

@xingyouxin

(quoting the previous comment)

Hello, Mr. asomoza. Thanks for your reply. With your help I ran some tests but still failed, so I opened an issue with the details.

@xingyouxin commented May 14, 2024

(quoting the previous comment)

Dear asomoza, it seems I have figured out my problem. It turns out the IP-Adapter embedding does not handle a batch of reference images separately: it treats all the images in one batch uniformly. The better way is to embed the adapter images one by one, concatenate the embeddings with torch.cat, and pass the concatenated embeddings into the pipeline so each sample in the batch gets its own reference image (a rough sketch is below). The details can be seen in issue #7933. Thank you very much.
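
For reference, a rough sketch of that approach (not the tested code from #7933): it assumes a diffusers release that provides prepare_ip_adapter_image_embeds and accepts ip_adapter_image_embeds in the pipeline call, and the exact embedding shapes and negative/positive handling may differ between versions, so treat it only as an outline.

import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

# Four different reference images, one per prompt in the batch (URLs reused from this thread).
ref_urls = [
    "https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png",
    "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png",
    "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/river.png",
    "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/vermeer.jpg",
]
ref_images = [load_image(url) for url in ref_urls]
prompts = ["best quality, high quality"] * len(ref_images)

neg_chunks, pos_chunks = [], []
for ref in ref_images:
    # Returns a list with one tensor per loaded IP adapter; with classifier-free
    # guidance the tensor stacks [negative, positive] embeddings along dim 0.
    embeds = pipeline.prepare_ip_adapter_image_embeds(
        ip_adapter_image=ref,
        ip_adapter_image_embeds=None,
        device="cuda",
        num_images_per_prompt=1,
        do_classifier_free_guidance=True,
    )[0]
    neg, pos = embeds.chunk(2)
    neg_chunks.append(neg)
    pos_chunks.append(pos)

# Keep the [all negatives, all positives] ordering expected under classifier-free guidance.
batched_embeds = [torch.cat(neg_chunks + pos_chunks)]

images = pipeline(
    prompt=prompts,
    ip_adapter_image_embeds=batched_embeds,
    num_inference_steps=30,
).images
for i, img in enumerate(images):
    img.save(f"batch_out_{i}.png")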
