
How to run the code with a certain batch? #53

Open

Cram3r95 opened this issue Oct 10, 2024 · 1 comment

Comments

Cram3r95 commented Oct 10, 2024

This code is working:

import torch
import pdb

from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="cuda",
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],
)

xlstm_stack = xLSTMBlockStack(cfg)

x = torch.randn(4, 256, 128).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
pdb.set_trace()
assert y.shape == x.shape

But the network continuously reports an error if you try to add a batch dimension to the input, e.g.:

x = torch.randn(32, 4, 256, 128).to("cuda") # (where 32 is the batch size)

You get the following error:

File "/home/carlosgomezh/.local/lib/python3.10/site-packages/xlstm/blocks/mlstm/layer.py", line 102, in forward
B, S, _ = x.shape
ValueError: too many values to unpack (expected 3)
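The unpack failure can be reproduced without the library at all (a minimal sketch): the layer's forward does `B, S, _ = x.shape`, which requires exactly three dimensions, so any 4-D tensor raises the same `ValueError`:

```python
import torch

# 3-D input: the first dimension is already treated as the batch size
x3 = torch.randn(4, 256, 128)
B, S, _ = x3.shape  # unpacks cleanly: B=4 (batch), S=256 (sequence)

# 4-D input: four dimensions cannot be unpacked into three names
x4 = torch.randn(32, 4, 256, 128)
try:
    B, S, _ = x4.shape
except ValueError as e:
    print(e)  # too many values to unpack (expected 3)
```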

In your case the stack is a backbone that processes a single 3-D tensor.

Is it possible to process something like this:

import torch  # assumed; xLSTM is the wrapper class asked about below

if __name__ == "__main__":
    # Define model hyperparameters
    input_dim = 6        # number of input features
    hidden_dim = 128     # hidden / embedding dim of the backbone
    output_dim = 1       # dimension of the prediction
    num_layers = 2       # number of blocks
    context_length = 10  # sequence length

    # Instantiate the model
    model = xLSTM(input_dim, hidden_dim, output_dim, num_layers, context_length).to('cuda')
    
    # Print the model structure
    print(model)
    
    # Example dummy input (batch_size=32, sequence_length=10, input_dim=6)
    dummy_input = torch.randn(32, context_length, input_dim).to('cuda')
    
    # Forward pass through the model
    output = model(dummy_input)
    print(output.shape)

Where you have 6 input features, a hidden dim of 128 (for example), an output dim of 1, and a context length of 10? Obviously 32 represents the batch size.

If I run that code, I get the following error:

File "/home/carlosgomezh/.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2573, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[128], expected input with shape [*, 128], but got input of size[32, 10, 6]

@kpoeppel @maximilianmbeck

kpoeppel (Collaborator) commented

@Cram3r95 I think you have the wrong approach here: the size 4 in your example above is already the batch size, as the heads are only internal and not exposed.
