Setup guide for smooth-brain noob dummies from base installation of ubuntu to working inference #2900
Unanswered · iculverr asked this question in Community | Q&A · 6 comments, 3 replies
Description: I am running Ubuntu (the latest release available as of Feb 24, 2023) with an RTX 3090. I want to load up, for example, GPT-2 or GPT-3 for simple text generation to get things running, so that I can then start experimenting and learning. There is no better way to start than to have a working product that you can modify to figure out what the functions do. The current "quick demo" does not clearly explain, from beginning to end, which commands a beginner should enter and why, where to enter them, how to get the output, and what the expected output looks like.
What I have done so far: first, I ran `git clone` on the ColossalAI repository, and I also installed ColossalAI with `pip install colossalai`.
Then I ran `colossalai run --nproc_per_node 1 train.py`, but it looks in /home/dev/ and cannot find a train.py. I imagine I need to run the command from within the specific example directory (resnet or one of the other examples), but nowhere does it say this. Now that I look at the examples repository, it says it is deprecated and archived. So why does the "quick demo" point to it? How do I quickly demo?
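For what it's worth, the error suggests `train.py` is resolved relative to the current working directory, so the missing step is probably changing into the directory that contains the script before launching. A hedged sketch of the sequence that may have been intended (the example subdirectory name below is a guess, not taken from the docs; check `ls examples` in your clone):

```shell
# Assumed sequence -- the example path is a guess, adjust to your checkout
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
pip install colossalai

# colossalai run resolves the script relative to the current directory,
# so change into the directory that actually contains train.py first:
cd examples/images/resnet   # hypothetical path; verify it exists in your clone
colossalai run --nproc_per_node 1 train.py
```

If the path differs in your checkout, the key point is unchanged: launch from the directory holding `train.py`, or pass the full path to the script.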
Location: I am testing the "quick demo" from https://colossalai.org/docs/get_started/run_demo and the information at https://github.com/hpcaitech/ColossalAI-Examples/tree/main/image/resnet
The README.md within /ColossalAI/examples/tutorial tells me to use a conda virtual environment, but neither the main ColossalAI installation page nor the colossalai.org website mentions this.
After digging through the files, I found the README.md in /home/dev/ColossalAI/examples/tutorial/auto_parallel
and the README.md in /home/dev/ColossalAI/examples/tutorial/opt/opt, which contain something closer to what I am looking for. But why is this not on the front page of the GitHub repository? Why is it not on the "quick demo" page?
https://github.com/hpcaitech/ColossalAI#Installation
Expectations:
What I am looking for is a full page of step-by-step commands that I can enter into the Linux terminal sequentially, from base installation to running my first model and getting text-generation output. The exact model doesn't matter much; anything GPT-like, BERT-like, or BLOOM-like will do.
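As a rough starting point, here is a hedged sketch of such a sequence on a fresh Ubuntu install. The driver version and package choices are assumptions, and the generation step uses plain Hugging Face `transformers` rather than anything ColossalAI-specific, just to prove the GPU stack works end to end:

```shell
# Hedged end-to-end sketch for a fresh Ubuntu box with an RTX 3090.
# Versions and package names are assumptions; adapt as needed.
sudo apt update
sudo apt install -y git python3 python3-pip python3-venv
sudo apt install -y nvidia-driver-525   # example driver; any 3090-capable one works
# (reboot after installing the driver, then verify the GPU with: nvidia-smi)

python3 -m venv ~/llm-env
source ~/llm-env/bin/activate
pip install torch transformers          # the torch wheel bundles its own CUDA runtime

# Minimal GPT-2 text generation (downloads the model on first run):
python - <<'EOF'
from transformers import pipeline
gen = pipeline("text-generation", model="gpt2", device=0)  # device=0 -> first GPU
print(gen("The quick brown fox", max_new_tokens=20)[0]["generated_text"])
EOF
```

Once something like this works, swapping the last step for a ColossalAI example script is a much smaller jump than starting from zero.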
Guidance on next steps after this first demo works would also be helpful. For example, once GPT-2 or GPT-3 is loaded successfully and I want to branch into making a ChatGPT replica or something with more persistent memory, describe conceptually the actions I would need to take in terms of scripting, directory tinkering, and commands of interest.
I expect the tutorial to contain a list of all the installation requirements needed to load and run the model. The front page says something about NOT installing transformers or PyTorch at the beginning, which is also confusing, because it suggests these (or the CUDA files) are pulled in when ColossalAI itself is installed.
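One way to resolve this kind of confusion is to check which of the relevant packages actually ended up importable after the install. The package names below are just the obvious candidates, nothing more:

```shell
# Report which relevant packages are importable in the current environment.
# Package names are the obvious candidates; adjust if your setup differs.
python3 - <<'EOF'
import importlib.util

for pkg in ("torch", "transformers", "colossalai"):
    spec = importlib.util.find_spec(pkg)
    print(pkg, "installed" if spec else "MISSING")
EOF
```

If `torch` shows as installed after nothing but `pip install colossalai`, then the dependency really is pulled in automatically and the front-page note means "don't install it separately first".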
Screenshots: (four screenshots were attached to the original post; not reproduced here)
What is the purpose of the Docker image? Do I need it to run inference? It is not mentioned anywhere in the tutorial!
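For context, Docker is optional here: a Docker image just packages a prebuilt environment (CUDA toolkit, PyTorch, the library itself) so you don't have to install those on the host, and a native pip install works fine for inference. A hypothetical invocation might look like the following; the image name and tag are assumptions, so check the project's Docker documentation for the real ones:

```shell
# Hypothetical -- the image name/tag is an assumption, not taken from the docs.
# --gpus all exposes the host GPUs inside the container (requires the
# NVIDIA Container Toolkit to be installed on the host).
docker run -it --rm --gpus all --ipc=host hpcaitech/colossalai:latest bash
```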
Suggestions:
What I am looking for is a full page of step-by-step commands that I can enter into the Linux terminal sequentially, from base installation to running my first model and getting an output, explained like I am five years old and smooth-brained.