
The generated Python code contains redundant "]" #575

Closed

willwu1984 opened this issue Oct 17, 2023 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@willwu1984

Describe the bug
The generated Python code has syntax errors: it contains redundant "]" characters.
# sort array by bubble sort
(WeCom screenshot of the generated code)

Information about GPU

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+

Additional context
compose.yaml:

version: '3.5'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model TabbyML/CodeLlama-13B --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    ports:
      - 8080:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

VS Code Version: 1.83.1

@willwu1984 willwu1984 added the bug Something isn't working label Oct 17, 2023
@icycodes
Member

Hi, @willwu1984,

I am trying to reproduce the issue. However, I'm unsure about the position where the completion is triggered.
Could you please provide a screenshot that shows the ghost text? Alternatively, if you could share the server log, it would be helpful. You can locate the server log at ~/.tabby/events/{date}.json.
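For example, with the default data directory, something like the following would print the day's events (adjust the date as needed):

cat ~/.tabby/events/2023-10-17.json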

@willwu1984
Author

The ghost text screenshot:
(WeCom screenshot of the ghost text)
server log: 2023-10-17.json

@wsxiaoys
Member

wsxiaoys commented Oct 17, 2023

Could you please provide the version information? You can obtain it by executing the following command: curl -X POST http://localhost:8080/v1/health

Based on the logs, it appears that the version is not the latest release (v0.3.0). It is advisable to test with the latest release (tabbyml/tabby:0.3.0) to check whether the issue still persists.

@willwu1984
Author

I have upgraded the image, but the problem still exists.

{"model":"TabbyML/CodeLlama-13B","device":"cuda","arch":"x86_64","cpu_info":"Intel(R) Xeon(R) Platinum 8338C CPU @ 2.60GHz","cpu_count":16,"cuda_devices":["NVIDIA A30"],"version":{"build_date":"2023-10-14","build_timestamp":"2023-10-14T18:53:34.992103202Z","git_sha":"00c91854884ac735ddd1c9db855cb17624ec92ec","git_describe":"v0.3.0"}}
(WeCom screenshot of the completion after upgrading)

The server log:
2023-10-17.json

@icycodes
Member

icycodes commented Oct 17, 2023

The server log:
2023-10-17.json

It seems that after updating the server (beginning at line 56 of the server log), the two completion requests resulted in empty responses. The screenshot may show a cached completion on the client side.


@wsxiaoys
I also reproduced this issue in my local environment.
Request:

curl -X 'POST' \
  'http://localhost:8080/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
      "segments": {
        "prefix": "# find max element in array\ndef find_max(arr):\n    max_element = arr[0]\n    for i in range(1, len(arr)) :\n        if arr[i] > max_element :\n            max_element = arr[i]\n    return max_element\n\n# sort array by bubble sort\ndef",
        "suffix": "\n\narr = [3, 4, 0, 9]\nprint(find_max(arr))"
      }
}'

Response:

{"id":"cmpl-5316a07a-c334-4525-8627-38dfee21b672","choices":[{"index":0,"text":" bubble_sort(arr):\n    for i in range(len(arr)) :\n        for j in range(len(arr)) :\n            if arr[j] > arr[j + 1]] :\n                arr[j], arr[j + 1]] = arr[j + 1]]], arr[j]]\n    return arr"}]}

Server state:

{"model":"/data/models/TabbyML/CodeLlama-13B","chat_model":"/data/models/TabbyML/Mistral-7B","device":"cuda","arch":"x86_64","cpu_info":"13th Gen Intel(R) Core(TM) i7-13700KF","cpu_count":24,"cuda_devices":["NVIDIA GeForce RTX 4090"],"version":{"build_date":"2023-10-14","build_timestamp":"2023-10-14T02:15:20.128129335Z","git_sha":"3dd4233dd79e8375288bdd3bda6634ee784ebcd2","git_describe":"v0.3.0"}}

Others:

Ubuntu 22.04.3 LTS
Linux 5.15.0-86-generic
Docker version 24.0.6, build ed223bc

@wsxiaoys
Member

The model simply generates the wrong output. Surprisingly, this only happens on the ctranslate2 inference engine, though.

This isn't something that can be fixed immediately from tabby's side. However, over the long run, we might consider implementing grammar constraint sampling (ggerganov/llama.cpp#1773) to eliminate cases like this.
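As a rough illustration of the idea (a hypothetical sketch; the function names and the simple bracket-balance rule are made up here, not Tabby's or llama.cpp's actual API): at each decoding step the sampler would discard candidate tokens that violate the grammar, for example a "]" with no matching "[".

# Hypothetical sketch of grammar-constrained sampling; illustrative only,
# not Tabby's or llama.cpp's implementation.
def violates_bracket_balance(token, open_brackets):
    """Return True if appending `token` would close more '[' than are open."""
    balance = open_brackets
    for ch in token:
        if ch == "[":
            balance += 1
        elif ch == "]":
            balance -= 1
            if balance < 0:
                return True
    return False

def constrained_next_token(candidates, generated_text):
    """Pick the best-scoring (token, score) candidate that keeps brackets balanced."""
    open_brackets = generated_text.count("[") - generated_text.count("]")
    allowed = [c for c in candidates
               if not violates_bracket_balance(c[0], open_brackets)]
    # Fall back to the unconstrained candidates if everything is rejected.
    return max(allowed or candidates, key=lambda c: c[1])[0]

Applied to the example above, a candidate token like "]] :" would be rejected once all open "[" are already closed.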

@wsxiaoys wsxiaoys self-assigned this Oct 17, 2023
@willwu1984
Author

@wsxiaoys How can I configure to use ggml models?

@wsxiaoys
Member

The ggml (llama.cpp) inference engine is exclusively designed for the Metal backend. For more detailed information, please visit https://github.com/TabbyML/tabby/blob/main/MODEL_SPEC.md.

Are there more bad cases generated solely from CodeLlama-7B?

@willwu1984
Author

So far we've found that most code that contains arrays has problems, and the language isn't limited to Python; JavaScript behaves the same way. By the way, llama.cpp also supports CUDA environments. Is it possible to add a configuration option to use it?

@wsxiaoys
Member

If this duplication occurs in more than just this simple case, I would say it's likely a bug. Let us investigate it further and get back to you.

In the meantime, if you come across such cases, please consider posting a screenshot or log record to this thread. It would be very helpful for us to debug and pinpoint the issue. Thank you!

@wsxiaoys
Member

wsxiaoys commented Nov 6, 2023

In 0.5.0 we've fully switched to GGUF for CUDA; this should fix the issue. Hi @willwu1984, could you test it?

@willwu1984
Author

@wsxiaoys This issue has been fixed using version v0.5.4. Great, thank you!

@wsxiaoys wsxiaoys closed this as completed Nov 9, 2023