Latte: Latent Diffusion Transformer for Video Generation #8404

Merged 87 commits on Jul 11, 2024
Commits
73f42b7
add Latte to diffusers
maxin-cn Jun 5, 2024
d19cb1d
Merge branch 'main' of https://github.com/huggingface/diffusers into …
maxin-cn Jun 5, 2024
cab96fd
remove print
maxin-cn Jun 5, 2024
7051b29
remove print
maxin-cn Jun 5, 2024
a9f4158
remove print
maxin-cn Jun 5, 2024
cda00ac
Merge branch 'main' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jun 5, 2024
4131d80
remove unuse codes
maxin-cn Jun 6, 2024
e1800ca
remove layer_norm_latte and add a flag
maxin-cn Jun 6, 2024
4c69b5d
remove layer_norm_latte and add a flag
maxin-cn Jun 6, 2024
5d470ea
update latte_pipeline
maxin-cn Jun 6, 2024
241f1c1
update latte_pipeline
maxin-cn Jun 6, 2024
464f36d
remove unuse squeeze
maxin-cn Jun 6, 2024
aa83959
Merge branch 'main' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jun 7, 2024
78dc60e
add norm_hidden_states.ndim == 2: # for Latte
maxin-cn Jun 7, 2024
da5767f
Merge branch 'main' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jun 10, 2024
f05b329
Merge branch 'main' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jun 11, 2024
443a1da
fixed test latte pipeline bugs
maxin-cn Jun 11, 2024
b11424e
Merge branch 'main' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jun 18, 2024
5c34446
fixed test latte pipeline bugs
maxin-cn Jun 18, 2024
8535e55
delete sh
maxin-cn Jun 19, 2024
5c62255
Merge branch 'main' into Latte
sayakpaul Jun 21, 2024
8afed27
add doc for latte
maxin-cn Jun 21, 2024
0eed6df
Merge branch 'Latte' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jun 21, 2024
e25a8ad
add licensing
maxin-cn Jun 21, 2024
26faa51
Move Transformer3DModelOutput to modeling_outputs
maxin-cn Jun 21, 2024
dc27f80
give a default value to sample_size
maxin-cn Jun 21, 2024
014d848
remove the einops dependency
maxin-cn Jun 21, 2024
c07ae15
change norm2 for latte
maxin-cn Jun 23, 2024
0a76232
modify pipeline of latte
maxin-cn Jun 24, 2024
e3fff64
update test for Latte
maxin-cn Jun 24, 2024
278a993
modify some codes for latte
maxin-cn Jun 25, 2024
1583b5d
modify for Latte pipeline
maxin-cn Jun 27, 2024
f69064c
modify for Latte pipeline
maxin-cn Jun 28, 2024
5508be1
modify for Latte pipeline
maxin-cn Jul 2, 2024
2c58134
modify for Latte pipeline
maxin-cn Jul 2, 2024
de3fa71
modify for Latte pipeline
maxin-cn Jul 2, 2024
38191c8
modify for Latte pipeline
maxin-cn Jul 2, 2024
ba031e6
modify for Latte pipeline
maxin-cn Jul 3, 2024
0e0ab56
modify for Latte pipeline
maxin-cn Jul 3, 2024
61ecb4e
modify for Latte pipeline
maxin-cn Jul 3, 2024
cd23ece
modify for Latte pipeline
maxin-cn Jul 3, 2024
82fde6a
Merge branch 'main' into Latte
yiyixuxu Jul 8, 2024
413b2bd
modify for Latte pipeline
maxin-cn Jul 8, 2024
001b8d0
Merge branch 'Latte' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jul 8, 2024
81a3388
modify for Latte pipeline
maxin-cn Jul 9, 2024
7c44a73
modify for Latte pipeline
maxin-cn Jul 9, 2024
870c04f
modify for Latte pipeline
maxin-cn Jul 9, 2024
6c788ae
modify for Latte pipeline
maxin-cn Jul 9, 2024
8457e8b
modify for Latte pipeline
maxin-cn Jul 9, 2024
79b4b75
modify for Latte pipeline
maxin-cn Jul 9, 2024
4703233
modify for Latte pipeline
maxin-cn Jul 9, 2024
aebb542
modify for Latte pipeline
maxin-cn Jul 9, 2024
cf0ce36
Merge branch 'main' into Latte
a-r-r-o-w Jul 9, 2024
a131b83
modify for Latte pipeline
maxin-cn Jul 9, 2024
ab050a1
Merge branch 'Latte' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jul 9, 2024
a0b1778
modify for Latte pipeline
maxin-cn Jul 9, 2024
0dd25f3
modify for Latte pipeline
maxin-cn Jul 9, 2024
85fadf4
modify for Latte pipeline
maxin-cn Jul 9, 2024
180b9d0
modify for Latte pipeline
maxin-cn Jul 9, 2024
250c6c6
modify for Latte pipeline
maxin-cn Jul 10, 2024
78e1d57
modify for Latte pipeline
maxin-cn Jul 10, 2024
0acce55
modify for Latte pipeline
maxin-cn Jul 10, 2024
4f12b4b
video_length -> num_frames; update prepare_latents copied from
a-r-r-o-w Jul 10, 2024
83ed40b
make fix-copies
a-r-r-o-w Jul 10, 2024
d504add
make style
a-r-r-o-w Jul 10, 2024
3eb11f1
typo: videe -> video
a-r-r-o-w Jul 10, 2024
9790fcd
update
a-r-r-o-w Jul 10, 2024
3477028
Merge pull request #1 from a-r-r-o-w/latte
maxin-cn Jul 10, 2024
d8e8750
modify for Latte pipeline
maxin-cn Jul 10, 2024
6028ca9
modify latte pipeline
maxin-cn Jul 10, 2024
76e4f1d
modify latte pipeline
maxin-cn Jul 10, 2024
c85deea
modify latte pipeline
maxin-cn Jul 10, 2024
fcb8f21
modify latte pipeline
maxin-cn Jul 10, 2024
53d721f
modify for Latte pipeline
maxin-cn Jul 10, 2024
29968ff
Merge branch 'Latte' of github.com:maxin-cn/diffusers into Latte
maxin-cn Jul 10, 2024
2ea37c9
Delete .vscode directory
maxin-cn Jul 10, 2024
260bb5c
Merge branch 'Latte' of https://github.com/maxin-cn/diffusers into la…
a-r-r-o-w Jul 11, 2024
6ffc968
make style
a-r-r-o-w Jul 11, 2024
1f09a16
make fix-copies
a-r-r-o-w Jul 11, 2024
034bcdc
add latte transformer 3d to docs _toctree.yml
a-r-r-o-w Jul 11, 2024
521ed5c
update example
a-r-r-o-w Jul 11, 2024
7988119
Merge pull request #2 from a-r-r-o-w/latte-2
maxin-cn Jul 11, 2024
12c71bd
reduce frames for test
maxin-cn Jul 11, 2024
5455ea9
fixed bug of _text_preprocessing
maxin-cn Jul 11, 2024
e54faa4
set num frame to 1 for testing
maxin-cn Jul 11, 2024
e76f5ab
remove unuse print
maxin-cn Jul 11, 2024
95dc0d1
add text = self._clean_caption(text) again
maxin-cn Jul 11, 2024
2 changes: 1 addition & 1 deletion .gitignore
@@ -175,4 +175,4 @@ tags
.ruff_cache

# wandb
wandb
wandb
a-r-r-o-w marked this conversation as resolved.
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -249,6 +249,8 @@
title: DiTTransformer2DModel
- local: api/models/hunyuan_transformer2d
title: HunyuanDiT2DModel
- local: api/models/latte_transformer3d
title: LatteTransformer3DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/transformer_temporal
19 changes: 19 additions & 0 deletions docs/source/en/api/models/latte_transformer3d.md
@@ -0,0 +1,19 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

## LatteTransformer3DModel

A Diffusion Transformer model for 3D data from [Latte](https://github.com/Vchitect/Latte).

## LatteTransformer3DModel

[[autodoc]] LatteTransformer3DModel
4 changes: 4 additions & 0 deletions src/diffusers/__init__.py
@@ -88,6 +88,7 @@
"HunyuanDiT2DMultiControlNetModel",
"I2VGenXLUNet",
"Kandinsky3UNet",
"LatteTransformer3DModel",
"LuminaNextDiT2DModel",
"ModelMixin",
"MotionAdapter",
@@ -269,6 +270,7 @@
"KandinskyV22PriorPipeline",
"LatentConsistencyModelImg2ImgPipeline",
"LatentConsistencyModelPipeline",
"LattePipeline",
"LDMTextToImagePipeline",
"LEditsPPPipelineStableDiffusion",
"LEditsPPPipelineStableDiffusionXL",
@@ -513,6 +515,7 @@
HunyuanDiT2DMultiControlNetModel,
I2VGenXLUNet,
Kandinsky3UNet,
LatteTransformer3DModel,
LuminaNextDiT2DModel,
ModelMixin,
MotionAdapter,
@@ -672,6 +675,7 @@
KandinskyV22PriorPipeline,
LatentConsistencyModelImg2ImgPipeline,
LatentConsistencyModelPipeline,
LattePipeline,
LDMTextToImagePipeline,
LEditsPPPipelineStableDiffusion,
LEditsPPPipelineStableDiffusionXL,
Expand Down
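With `LattePipeline` exported from the top-level package, a text-to-video call could look like the sketch below. The checkpoint id `maxin-cn/Latte-1`, the helper name `generate_latte_gif`, and the output filename are assumptions for illustration and are not part of this diff. The heavy imports sit inside the function so that defining it does not require `torch` or `diffusers` to be installed.

```python
def generate_latte_gif(
    prompt: str,
    output_path: str = "latte.gif",
    checkpoint: str = "maxin-cn/Latte-1",  # assumed checkpoint id, not taken from this diff
) -> str:
    """Generate a short clip with LattePipeline and save it as a GIF."""
    # Imported lazily: these need torch, diffusers and (for offloading) accelerate.
    import torch
    from diffusers import LattePipeline
    from diffusers.utils import export_to_gif

    pipe = LattePipeline.from_pretrained(checkpoint, torch_dtype=torch.float16)
    # Move sub-models to the GPU only while they are in use, to reduce peak memory.
    pipe.enable_model_cpu_offload()

    # `.frames` holds one list of frames per prompt; take the first video.
    frames = pipe(prompt).frames[0]
    export_to_gif(frames, output_path)
    return output_path
```

Calling `generate_latte_gif("A small cactus with a happy face in the Sahara desert.")` downloads the weights on first use and needs a CUDA-capable GPU, so treat it as a usage outline rather than a tested recipe.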
2 changes: 2 additions & 0 deletions src/diffusers/models/__init__.py
@@ -41,6 +41,7 @@
_import_structure["transformers.dit_transformer_2d"] = ["DiTTransformer2DModel"]
_import_structure["transformers.dual_transformer_2d"] = ["DualTransformer2DModel"]
_import_structure["transformers.hunyuan_transformer_2d"] = ["HunyuanDiT2DModel"]
_import_structure["transformers.latte_transformer_3d"] = ["LatteTransformer3DModel"]
_import_structure["transformers.lumina_nextdit2d"] = ["LuminaNextDiT2DModel"]
_import_structure["transformers.pixart_transformer_2d"] = ["PixArtTransformer2DModel"]
_import_structure["transformers.prior_transformer"] = ["PriorTransformer"]
@@ -86,6 +87,7 @@
DiTTransformer2DModel,
DualTransformer2DModel,
HunyuanDiT2DModel,
a-r-r-o-w marked this conversation as resolved.
LatteTransformer3DModel,
LuminaNextDiT2DModel,
PixArtTransformer2DModel,
PriorTransformer,
Expand Down
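The `_import_structure` entries above map a submodule path to the names it exports, so that the submodule is only imported when one of those names is first accessed. The sketch below illustrates that idea with a hypothetical `LazyModule` class and stdlib modules standing in for the transformer files. It is not the `_LazyModule` helper that diffusers itself uses, only a minimal model of the pattern.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical stand-in: resolve exported names on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert {"submodule": ["Name", ...]} into {"Name": "submodule"}.
        self._name_to_module = {
            attr: module
            for module, attrs in import_structure.items()
            for attr in attrs
        }

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. before the name is cached.
        mapping = self.__dict__.get("_name_to_module", {})
        if attr not in mapping:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(mapping[attr]), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Stdlib modules stand in for e.g. "transformers.latte_transformer_3d".
import_structure = {
    "json.decoder": ["JSONDecoder"],
    "collections": ["OrderedDict"],
}
lazy = LazyModule("demo", import_structure)
```

Adding a model then amounts to one new key in the mapping, which is the shape of the single-line change in this file.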
7 changes: 5 additions & 2 deletions src/diffusers/models/attention.py
@@ -359,7 +359,10 @@ def __init__(
out_bias=attention_out_bias,
) # is self-attn if encoder_hidden_states is none
else:
self.norm2 = None
if norm_type == "ada_norm_single": # For Latte
self.norm2 = nn.LayerNorm(dim, norm_eps, norm_elementwise_affine)
else:
self.norm2 = None
self.attn2 = None

# 3. Feed-forward
@@ -439,7 +442,6 @@ def forward(
).chunk(6, dim=1)
norm_hidden_states = self.norm1(hidden_states)
norm_hidden_states = norm_hidden_states * (1 + scale_msa) + shift_msa
norm_hidden_states = norm_hidden_states.squeeze(1)
else:
raise ValueError("Incorrect norm used")

@@ -456,6 +458,7 @@ def forward(
attention_mask=attention_mask,
**cross_attention_kwargs,
)

if self.norm_type == "ada_norm_zero":
attn_output = gate_msa.unsqueeze(1) * attn_output
elif self.norm_type == "ada_norm_single":
Expand Down
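The `ada_norm_single` branch in this hunk normalizes the hidden states with `norm1` and then applies `norm_hidden_states * (1 + scale_msa) + shift_msa`. The dependency-free sketch below reproduces that arithmetic for a single token, with plain Python lists standing in for tensors. The function names `layer_norm` and `modulate` are hypothetical, and the layer norm is assumed to have no learned affine parameters.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def modulate(x, scale, shift):
    """Apply norm(x) * (1 + scale) + shift element-wise, as in the hunk above."""
    return [n * (1 + s) + b for n, s, b in zip(layer_norm(x), scale, shift)]


hidden = [1.0, 2.0, 3.0, 4.0]
normed = layer_norm(hidden)
# Zero scale and shift leave the normalized values unchanged.
identity = modulate(hidden, [0.0] * 4, [0.0] * 4)
# A scale of 1 doubles each normalized value before the shift is added.
shifted = modulate(hidden, [1.0] * 4, [0.5] * 4)
```

With zero scale and shift the modulation reduces to plain layer normalization, which is why the block can still behave sensibly when the conditioning contributes nothing.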
1 change: 1 addition & 0 deletions src/diffusers/models/transformers/__init__.py
@@ -5,6 +5,7 @@
from .dit_transformer_2d import DiTTransformer2DModel
from .dual_transformer_2d import DualTransformer2DModel
from .hunyuan_transformer_2d import HunyuanDiT2DModel
from .latte_transformer_3d import LatteTransformer3DModel
from .lumina_nextdit2d import LuminaNextDiT2DModel
from .pixart_transformer_2d import PixArtTransformer2DModel
from .prior_transformer import PriorTransformer