remove skip norms

shubhambhokare1 committed Jan 30, 2025
1 parent bbd33c3 commit c590aee

Showing 134 changed files with 5 additions and 1,605 deletions.
129 changes: 1 addition & 128 deletions docs/Changelog.md

<dl>
<dt><tt>X</tt> : T</dt>
<dd>The input tensor to be normalized. In general, the shape is (N, C, D1, D2, ... , Dn) for n-dimensional data, where D1 to Dn are the spatial dimension sizes, N is the batch size, and C is the number of channels. The root mean squared norm is taken over the last D dimensions, where D is determined by the axis attribute.</dd>
<dt><tt>scale</tt> : V</dt>
<dd>Scale tensor. Scale tensor shape should be broadcastable to the normalized shape ([axis, .., Dn]).</dd>
</dl>
<dd>Constrain output to int64 tensor, which should be a scalar though.</dd>
</dl>

### <a name="SkipLayerNormalization-23"></a>**SkipLayerNormalization-23**

Applies LayerNormalization to an expanded skip connection, as described in the paper https://arxiv.org/pdf/2105.07205v1
The expanded skip connection is defined as follows:
```
xSkip = (scaling_factor * input) + F(input) + Bias
```
where:
F(input) denotes the output of a particular layer.
scaling_factor is a modulating scalar that adjusts the importance of the skip connection.
Bias is a bias term added to the output of the skip connection.

LayerNorm is then applied to xSkip as follows:
```
output = LayerNormalization(xSkip)
```
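The two formulas above can be combined into a minimal NumPy reference sketch. This is an illustration of the computation the spec describes, not the official ONNX implementation; the function name and the NumPy-based reductions are choices made here for clarity. Input names (`X`, `S`, `gamma`, `beta`, `B`) follow the operator's inputs listed below, with `S` playing the role of `input` and `X` the role of `F(input)`.

```python
import numpy as np

def skip_layer_norm(X, S, gamma, beta=None, B=None,
                    axis=-1, epsilon=1e-5, scaling_factor=1):
    """Illustrative sketch of SkipLayerNormalization (not the official kernel).

    X: layer output F(input); S: skip input; B: optional skip bias.
    Returns (Y, InputSkipBiasSum)."""
    # xSkip = (scaling_factor * input) + F(input) + Bias
    x_skip = scaling_factor * S + X
    if B is not None:
        x_skip = x_skip + B
    # LayerNormalization over dimensions [axis, ..., rank-1]
    axes = tuple(range(axis % X.ndim, X.ndim))
    mean = x_skip.mean(axis=axes, keepdims=True)
    var = x_skip.var(axis=axes, keepdims=True)
    Y = (x_skip - mean) / np.sqrt(var + epsilon) * gamma
    if beta is not None:
        Y = Y + beta
    return Y, x_skip
```

With the default `axis=-1`, each row of the last dimension is normalized to zero mean and unit variance before `gamma`/`beta` are applied.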

#### Version

This version of the operator has been available since version 23 of the default ONNX operator set.

#### Attributes

<dl>
<dt><tt>axis</tt> : int (default is -1)</dt>
<dd>The dimension for layer normalization. If rank(X) is r, axis' allowed range is [-r, r). Negative value means counting dimensions from the back.</dd>
<dt><tt>epsilon</tt> : float (default is 1e-05)</dt>
<dd>The epsilon value to use to avoid division by zero.</dd>
<dt><tt>scaling_factor</tt> : int (default is 1)</dt>
<dd>Modulating scalar by which the skip input is multiplied.</dd>
</dl>

#### Inputs (3 - 5)

<dl>
<dt><tt>X</tt> : T</dt>
<dd>The output of the layer for which the skip connection is being created. In general, the shape is (N, C, D1, D2, ... , Dn) for n-dimensional data, where D1 to Dn are the spatial dimension sizes, N is the batch size, and C is the number of channels.</dd>
<dt><tt>S</tt> : T</dt>
<dd>Skip input with same shape as X. This is the input to the layer for which the skip connection is being created.</dd>
<dt><tt>gamma</tt> : T</dt>
<dd>1D tensor representing scale input of layer normalization with shape of the spatial dimension along which layer normalization is applied.</dd>
<dt><tt>beta</tt> (optional) : T</dt>
<dd>1D tensor representing bias input of layer normalization with shape of the spatial dimension along which layer normalization is applied.</dd>
<dt><tt>B</tt> (optional) : T</dt>
<dd>1D bias tensor for the skip connection with shape of the spatial dimension along which layer normalization is applied.</dd>
</dl>

#### Outputs (1 - 2)

<dl>
<dt><tt>Y</tt> : T</dt>
<dd>Output tensor with same shape as X</dd>
<dt><tt>InputSkipBiasSum</tt> (optional) : T</dt>
<dd>Sum of the input and skip inputs (and bias if it exists). Same shape as X</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T</tt> : tensor(float), tensor(float16)</dt>
<dd>Constrain input and output types to float or half tensors.</dd>
<dt><tt>U</tt> : tensor(float)</dt>
<dd>Constrain mean and inv_std_var to float tensors.</dd>
</dl>

### <a name="SkipRMSNormalization-23"></a>**SkipRMSNormalization-23**

Applies RMSNormalization to an expanded skip connection, similar to SkipLayerNormalization.
The expanded skip connection is defined as follows:
```
xSkip = (scaling_factor * input) + F(input) + Bias
```
where:
F(input) denotes the output of a particular layer.
scaling_factor is a modulating scalar that adjusts the importance of the skip connection.
Bias is a bias term added to the output of the skip connection.

RMSNorm is then applied to xSkip as follows:
```
output = RMSNormalization(xSkip)
```

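As with SkipLayerNormalization, the combined computation can be sketched in NumPy. This is an illustrative reference under the spec's definitions, not the official implementation; RMS normalization divides by the root mean square over the normalized dimensions without subtracting the mean.

```python
import numpy as np

def skip_rms_norm(X, S, gamma, B=None,
                  axis=-1, epsilon=1e-5, scaling_factor=1):
    """Illustrative sketch of SkipRMSNormalization (not the official kernel).

    X: layer output F(input); S: skip input; B: optional skip bias.
    Returns (Y, InputSkipBiasSum)."""
    # xSkip = (scaling_factor * input) + F(input) + Bias
    x_skip = scaling_factor * S + X
    if B is not None:
        x_skip = x_skip + B
    # RMS over dimensions [axis, ..., rank-1]; no mean subtraction.
    axes = tuple(range(axis % X.ndim, X.ndim))
    rms = np.sqrt(np.mean(np.square(x_skip), axis=axes, keepdims=True) + epsilon)
    Y = (x_skip / rms) * gamma
    return Y, x_skip
```

Unlike layer normalization, the output is scaled to unit root mean square rather than centered, which is why this operator takes no `beta` input.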
#### Version

This version of the operator has been available since version 23 of the default ONNX operator set.

#### Attributes

<dl>
<dt><tt>axis</tt> : int (default is -1)</dt>
<dd>The dimension for rms normalization. If rank(X) is r, axis' allowed range is [-r, r). Negative value means counting dimensions from the back.</dd>
<dt><tt>epsilon</tt> : float (default is 1e-05)</dt>
<dd>The epsilon value to use to avoid division by zero.</dd>
<dt><tt>scaling_factor</tt> : int (default is 1)</dt>
<dd>Modulating scalar by which the skip input is multiplied.</dd>
</dl>

#### Inputs (3 - 4)

<dl>
<dt><tt>X</tt> : T</dt>
<dd>The output of the layer for which the skip connection is being created. In general, the shape is (N, C, D1, D2, ... , Dn) for n-dimensional data, where D1 to Dn are the spatial dimension sizes, N is the batch size, and C is the number of channels.</dd>
<dt><tt>S</tt> : T</dt>
<dd>Skip input with same shape as X. This is the input to the layer for which the skip connection is being created.</dd>
<dt><tt>gamma</tt> : T</dt>
<dd>1D tensor representing scale input of rms normalization with shape of the spatial dimension along which rms normalization is applied.</dd>
<dt><tt>B</tt> (optional) : T</dt>
<dd>1D bias tensor for the skip connection with shape of the spatial dimension along which rms normalization is applied.</dd>
</dl>

#### Outputs (1 - 2)

<dl>
<dt><tt>Y</tt> : T</dt>
<dd>Output tensor with same shape as X</dd>
<dt><tt>InputSkipBiasSum</tt> (optional) : T</dt>
<dd>Sum of the input and skip inputs (and bias if it exists). Same shape as X</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T</tt> : tensor(float), tensor(float16)</dt>
<dd>Constrain input and output types to float or half tensors.</dd>
<dt><tt>U</tt> : tensor(float)</dt>
<dd>Constrain mean and inv_std_var to float tensors.</dd>
</dl>

### <a name="Squeeze-23"></a>**Squeeze-23**

Remove single-dimensional entries from the shape of a tensor.