
How to run the code with a certain batch? #53

Open

Cram3r95 opened this issue Oct 10, 2024 · 1 comment

Comments

Cram3r95 commented Oct 10, 2024

This code is working:

import torch
import pdb

from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="cuda",
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],
)

xlstm_stack = xLSTMBlockStack(cfg)

x = torch.randn(4, 256, 128).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
pdb.set_trace()
assert y.shape == x.shape

But the network continuously reports an error if you try to add a batch dimension to the input, e.g.:

x = torch.randn(32, 4, 256, 128).to("cuda") # (where 32 is the batch size)

You get the following error:

File "/home/carlosgomezh/.local/lib/python3.10/site-packages/xlstm/blocks/mlstm/layer.py", line 102, in forward
B, S, _ = x.shape
ValueError: too many values to unpack (expected 3)
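The unpack failure can be reproduced without the library at all (a minimal sketch): the layer's forward does `B, S, _ = x.shape`, which requires exactly three dimensions, so any 4-D tensor raises the same `ValueError`:

```python
import torch

# 3-D input: the first dimension is already treated as the batch size
x3 = torch.randn(4, 256, 128)
B, S, _ = x3.shape  # unpacks cleanly: B=4 (batch), S=256 (sequence)

# 4-D input: four dimensions cannot be unpacked into three names
x4 = torch.randn(32, 4, 256, 128)
try:
    B, S, _ = x4.shape
except ValueError as e:
    print(e)  # too many values to unpack (expected 3)
```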

In your case the stack is a backbone that processes a single 3-D tensor.

Is it possible to process something like this:

import torch  # assumed; xLSTM is the wrapper class asked about below

if __name__ == "__main__":
    # Define model hyperparameters
    input_dim = 6        # number of input features
    hidden_dim = 128     # hidden / embedding dim of the backbone
    output_dim = 1       # dimension of the prediction
    num_layers = 2       # number of blocks
    context_length = 10  # sequence length

    # Instantiate the model
    model = xLSTM(input_dim, hidden_dim, output_dim, num_layers, context_length).to('cuda')
    
    # Print the model structure
    print(model)
    
    # Example dummy input (batch_size=32, sequence_length=10, input_dim=6)
    dummy_input = torch.randn(32, context_length, input_dim).to('cuda')
    
    # Forward pass through the model
    output = model(dummy_input)
    print(output.shape)

Where you have 6 input features, a hidden dim of 128 (for example), an output dim of 1, and a context length of 10? Obviously 32 represents the batch size.

If I run that code, I get the following error:

File "/home/carlosgomezh/.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2573, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[128], expected input with shape [*, 128], but got input of size[32, 10, 6]

@kpoeppel @maximilianmbeck

kpoeppel (Collaborator) commented

@Cram3r95 I think you have the wrong approach here: the size 4 in your example above is already the batch size, as the heads are only internal and not exposed.
