A while back I made a neural network from scratch because I was curious about the kind of black magic they are. That got me started on the track of recreating some common machine learning structures, and so here we have my implementation of a Transformer.
It's worth mentioning that I've leveraged PyTorch here to handle most of the matrix operations for me, which I avoided in my neural network implementation.
Before anyone points it out: yes, the project should probably be named transformer.net, but llm.net sounds cooler so I'm keeping it.
To keep true to the name however, the `main.py` file uses the transformer to create a GPT. Realistically, it could be used to create any sort of machine learning utility based on a transformer.
Creating a Transformer
```python
# import statement
from src.core.transformer import Transformer

# create a transformer
transformer = Transformer(vector_in, block_count, attention_space, ml_perceptron_space)
```
- `vector_in`: The length of a 1D token vector after embedding; a tensor input is a sequence of such 1D embeddings.
- `block_count`: A block is defined as an attention block plus a multilayer perceptron block. This parameter controls the number of blocks in the transformer.
- `attention_space`: The dimensions of the attention vector space created in the attention blocks of the transformer.
- `ml_perceptron_space`: Refers to the number of tokens in the input tensor (or the number of 1D embeddings in a single input tensor).
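For a concrete feel of what those arguments might look like, here's a small sketch. The numbers below are made up and not tuned for anything; they only show what each parameter could be set to.

```python
from src.core.transformer import Transformer

# Made-up hyperparameters, just to illustrate the constructor arguments.
transformer = Transformer(
    64,   # vector_in: each embedding is a 64-dimensional vector
    4,    # block_count: 4 attention + MLP blocks stacked together
    32,   # attention_space: dimension of the attention vector space
    16,   # ml_perceptron_space: number of 1D embeddings in a single input tensor
)
```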
Training
```python
transformer.train(input_tensor, expected)
```
- `input_tensor`: A tensor representing the vector inputs to the transformer.
- `expected`: A tensor representing the expected output as a normalized row matrix.
Testing
```python
result = transformer.test(input_tensor)
```
- `input_tensor`: A PyTorch tensor representing the vector inputs to the transformer.
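Putting the calls together, a rough end-to-end sketch might look like the following. The tensor shapes here are my own assumptions based on the parameter descriptions above, not something the project guarantees.

```python
import torch
from src.core.transformer import Transformer

transformer = Transformer(64, 4, 32, 16)   # same made-up hyperparameters as above

# Assumed shape: 16 token embeddings, each 64-dimensional.
input_tensor = torch.rand(16, 64)
# Assumed target: rows normalized to sum to 1, matching the "normalized row matrix" description.
expected = torch.softmax(torch.rand(16, 64), dim=-1)

transformer.train(input_tensor, expected)
result = transformer.test(input_tensor)
```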
Now here's the thing: I don't have the resources to train a model that produces anything, especially considering my track record of starting projects a day before the deadline. Regardless, I'm gonna try to explain what goes on under the hood of a transformer so it makes sense to anyone using this or going through my code.
Embeddings refer to vectors (usually of a higher dimension) that are fed into a transformer. Essentially, we convert human-readable input into a bunch of numbers:
`'hello' -> [104, 101, 108, 108, 111]`
Of course, in the above example I've simply used the ASCII values, but that gets the point across. Essentially we take input, tokenise it (the process of splitting it into smaller sub-parts, or tokens), and embed those tokens as vectors of higher dimensions. This is done through a transformer or a neural network. Usually though, this requires some input from a programmer (well, to gather data).
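To make that concrete, here's a toy character-level sketch using PyTorch's `nn.Embedding`. It's only a stand-in: real models use learned subword tokenisers, and the embedding dimension of 8 is an arbitrary choice.

```python
import torch
import torch.nn as nn

text = "hello"
tokens = [ord(c) for c in text]                  # 'hello' -> [104, 101, 108, 108, 111]

embed = nn.Embedding(num_embeddings=256, embedding_dim=8)  # one 8-dim vector per possible byte
embeds = embed(torch.tensor(tokens))             # shape [5, 8]: a sequence of 1D embeddings
print(embeds.shape)                              # torch.Size([5, 8])
```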
The attention block of a transformer is what handles context. Think of it in terms of 3 matrices:
- The Query Matrix
- The Key Matrix
- The Value Matrix
The query matrix encodes all sorts of relevant queries as, you guessed it, vectors! So consider the word `dog`. We can describe it using a bunch of words: big, small, cute, adorable, etc. A query matrix stores these queries as a matrix: essentially we multiply an embedding with the matrix to obtain the result of the query, and we do this for all the embeds to see how all the embeds are affected by the queries.
In practice, a query matrix "stores" multiple questions within it. These can include "Am I golden?" or "Do I have fur?".
The matrix product of the key matrix and an embed gives us the answers to our queries; the key matrix helps an embed answer our query.
```python
queries = query_matrix * tensor_in
keys = key_matrix * tensor_in
```
Now we have queries and answers, but we need to map out which queries correspond to which answers. This is done by taking the dot product between `queries` and `keys`. The vector dot product tells us how much of one vector another vector contains.
Consider the query "Am I golden?". The key for the embed `retriever` should have a higher correspondence to this query than the key for the embed `pug`. In essence, our dot product would be higher for the key of the embed `retriever` than it would be for `pug`.
This is what transformers define as context, or rather, this is how transformers obtain context.
For good measure, we take the softmax of the dot products to ensure that these values are normalized.
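Here's a tiny made-up numeric version of the retriever/pug example. The vectors are arbitrary; they're only chosen so the dot products behave the way described above.

```python
import torch

query = torch.tensor([1.0, 0.0, 1.0])          # "Am I golden?" as a vector
key_retriever = torch.tensor([0.9, 0.1, 0.8])  # key produced for the 'retriever' embed
key_pug = torch.tensor([0.1, 0.9, 0.0])        # key produced for the 'pug' embed

scores = torch.stack([query @ key_retriever, query @ key_pug])
print(scores)                        # tensor([1.7000, 0.1000]) -> retriever matches better
print(torch.softmax(scores, dim=0))  # normalized weights, still favouring retriever
```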
The value matrix is used to amplify the effect of context. Since any dot product after softmax can have a maximum value of 1, we use the value matrix to make context "more intense".
Now, my implementation is slightly different. I don't allow embeds to gain context from embeds that follow them (just a procedure that lets me reuse training examples), so I set `dot` to an upper triangular matrix. I also split the value matrix into two matrices: `value_up` and `value_down`.
```python
queries = query_matrix * tensor_in
keys = key_matrix * tensor_in
dot = Tensor.softmax(Tensor.upper(Tensor.dot(keys, queries)))
result = input + value_matrix * input * dot
return result
```
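For comparison, here's roughly what that pattern looks like in plain PyTorch. This is a sketch and not the code in this repo: it uses a single value matrix rather than the `value_up`/`value_down` split, and the mask is written as "hide everything after the current token".

```python
import torch

def masked_attention(x, query_matrix, key_matrix, value_matrix):
    # x: [tokens, vector_in]; the projection matrices map into/out of the attention space.
    queries = x @ query_matrix                 # [tokens, attention_space]
    keys = x @ key_matrix                      # [tokens, attention_space]

    scores = queries @ keys.T                  # how strongly each query matches each key
    future = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))  # embeds can't see what follows them

    weights = torch.softmax(scores, dim=-1)    # each row is a normalized attention pattern
    return x + weights @ (x @ value_matrix)    # add the gathered context back onto the embeds
```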
The multilayer perceptron follows the attention layer, and here the embeds do not communicate with each other, so they obtain no context from one another.
```python
result = Tensor.relu(up_project * input) * down_projection
```
This is the code implementation of an MLP. It's quite straightforward, even though there's no neat way of explaining exactly what it does. Essentially this layer helps embeds stand on their own. Suppose that after contextualising, the embed `dog` maps out to "large dog of golden colour" as a result of the context. We would want this to map to "is known as a golden retriever". That's where an MLP comes in: it encodes context that is based on certain facts.
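In plain PyTorch, keeping the same shape as the formula above, a sketch could look like this. The names and the hidden size of 256 are my assumptions, not values from the repo.

```python
import torch
import torch.nn.functional as F

def mlp_block(x, up_projection, down_projection):
    # x: [tokens, vector_in]; up_projection: [vector_in, hidden]; down_projection: [hidden, vector_in]
    hidden = F.relu(x @ up_projection)   # project each embed up and keep only positive activations
    return hidden @ down_projection      # project back down to the embedding dimension

# Example with made-up sizes: 16 embeds of dimension 64, hidden dimension 256.
x = torch.rand(16, 64)
out = mlp_block(x, torch.rand(64, 256), torch.rand(256, 64))
print(out.shape)  # torch.Size([16, 64])
```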
And well, that's it. Generally we have multiple layers of attention and MLP, but each of them provides small bits of context until we obtain an output vector that hopefully encodes the most appropriate response to our input. We then de-embedify this vector and present it to the user.
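The de-embedify step isn't something I can show from the repo, but conceptually it looks like this sketch: project the final vector onto the vocabulary and pick the most likely token. The sizes are made up.

```python
import torch

vector_in, vocab_size = 64, 256
unembed = torch.rand(vector_in, vocab_size)   # learned alongside the embedding in a real model

output_vector = torch.rand(vector_in)    # the transformer's final vector for the last position
logits = output_vector @ unembed
probs = torch.softmax(logits, dim=-1)
next_token = torch.argmax(probs).item()  # e.g. an ASCII code in the toy example from earlier
print(next_token)
```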
I hope that made sense! Also, thanks for checking out my project :D
Have a good day!