
The generated Python code contains redundant "]" #575

Closed

willwu1984 opened this issue Oct 17, 2023 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@willwu1984

Describe the bug
The generated Python code has syntax errors: it contains redundant "]" characters.
# sort array by bubble sort
(WeCom screenshot of the generated code)

Information about GPU

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+

Additional context
compose.yaml:

version: '3.5'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model TabbyML/CodeLlama-13B --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    ports:
      - 8080:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

VS Code Version: 1.83.1

@willwu1984 willwu1984 added the bug Something isn't working label Oct 17, 2023
@icycodes
Member

Hi, @willwu1984,

I am trying to reproduce the issue. However, I'm unsure about the position where the completion is triggered.
Could you please provide a screenshot that shows the ghost text? Alternatively, if you could share the server log, it would be helpful. You can locate the server log at ~/.tabby/events/{date}.json.
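For example, with the default data directory, something like the following would print the day's events (adjust the date as needed):

cat ~/.tabby/events/2023-10-17.json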

@willwu1984
Author

The ghost text screenshot:
(WeCom screenshot of the ghost text)
server log: 2023-10-17.json

@wsxiaoys
Member

wsxiaoys commented Oct 17, 2023

Could you please provide the version information? You can obtain it by executing the following command: curl -X POST http://localhost:8080/v1/health

Based on the logs, it appears that the version is not the latest release (v0.3.0). It is advisable to test with the latest release (tabbyml/tabby:0.3.0) to check whether the issue still persists.

@willwu1984
Author

I have upgraded the image, but the problem still exists.

{"model":"TabbyML/CodeLlama-13B","device":"cuda","arch":"x86_64","cpu_info":"Intel(R) Xeon(R) Platinum 8338C CPU @ 2.60GHz","cpu_count":16,"cuda_devices":["NVIDIA A30"],"version":{"build_date":"2023-10-14","build_timestamp":"2023-10-14T18:53:34.992103202Z","git_sha":"00c91854884ac735ddd1c9db855cb17624ec92ec","git_describe":"v0.3.0"}}
(WeCom screenshot of the completion after upgrading)

The server log:
2023-10-17.json

@icycodes
Member

icycodes commented Oct 17, 2023

The server log:
2023-10-17.json

It seems that after updating the server (beginning at line 56 of the server log), the two completion requests resulted in empty responses. The screenshot may show a cached completion on the client side.


@wsxiaoys
I also reproduced this issue in my local environment.
Request:

curl -X 'POST' \
  'http://localhost:8080/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
      "segments": {
        "prefix": "# find max element in array\ndef find_max(arr):\n    max_element = arr[0]\n    for i in range(1, len(arr)) :\n        if arr[i] > max_element :\n            max_element = arr[i]\n    return max_element\n\n# sort array by bubble sort\ndef",
        "suffix": "\n\narr = [3, 4, 0, 9]\nprint(find_max(arr))"
      }
}'

Response:

{"id":"cmpl-5316a07a-c334-4525-8627-38dfee21b672","choices":[{"index":0,"text":" bubble_sort(arr):\n    for i in range(len(arr)) :\n        for j in range(len(arr)) :\n            if arr[j] > arr[j + 1]] :\n                arr[j], arr[j + 1]] = arr[j + 1]]], arr[j]]\n    return arr"}]}

Server state:

{"model":"/data/models/TabbyML/CodeLlama-13B","chat_model":"/data/models/TabbyML/Mistral-7B","device":"cuda","arch":"x86_64","cpu_info":"13th Gen Intel(R) Core(TM) i7-13700KF","cpu_count":24,"cuda_devices":["NVIDIA GeForce RTX 4090"],"version":{"build_date":"2023-10-14","build_timestamp":"2023-10-14T02:15:20.128129335Z","git_sha":"3dd4233dd79e8375288bdd3bda6634ee784ebcd2","git_describe":"v0.3.0"}}

Others:

Ubuntu 22.04.3 LTS
Linux 5.15.0-86-generic
Docker version 24.0.6, build ed223bc

@wsxiaoys
Member

The model simply generates the wrong output. Surprisingly, this only happens on the ctranslate2 inference engine, though.

This isn't something that can be fixed immediately from tabby's side. However, over the long run, we might consider implementing grammar constraint sampling (ggerganov/llama.cpp#1773) to eliminate cases like this.
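As a rough illustration of the idea (a hypothetical sketch; the function names and the simple bracket-balance rule are made up here, not Tabby's or llama.cpp's actual API): at each decoding step the sampler would discard candidate tokens that violate the grammar, for example a "]" with no matching "[".

# Hypothetical sketch of grammar-constrained sampling; illustrative only,
# not Tabby's or llama.cpp's implementation.
def violates_bracket_balance(token, open_brackets):
    """Return True if appending `token` would close more '[' than are open."""
    balance = open_brackets
    for ch in token:
        if ch == "[":
            balance += 1
        elif ch == "]":
            balance -= 1
            if balance < 0:
                return True
    return False

def constrained_next_token(candidates, generated_text):
    """Pick the best-scoring (token, score) candidate that keeps brackets balanced."""
    open_brackets = generated_text.count("[") - generated_text.count("]")
    allowed = [c for c in candidates
               if not violates_bracket_balance(c[0], open_brackets)]
    # Fall back to the unconstrained candidates if everything is rejected.
    return max(allowed or candidates, key=lambda c: c[1])[0]

Applied to the example above, a candidate token like "]] :" would be rejected once all open "[" are already closed.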

@wsxiaoys wsxiaoys self-assigned this Oct 17, 2023
@willwu1984
Author

@wsxiaoys How can I configure to use ggml models?

@wsxiaoys
Member

The ggml (llama.cpp) inference engine is exclusively designed for the Metal backend. For more detailed information, please visit https://github.com/TabbyML/tabby/blob/main/MODEL_SPEC.md.

Are there more bad cases generated solely from CodeLlama-7B?

@willwu1984
Author

So far we've found that most code that contains arrays has problems, and the language isn't limited to Python; JavaScript behaves the same way. By the way, llama.cpp also supports CUDA environments. Is it possible to add a configuration option to use it?

@wsxiaoys
Member

If this duplication occurs in more than just this simple case, I would say it's likely a bug. Let us investigate it further and get back to you.

In the meantime, if you come across such cases, please consider posting a screenshot or log record to this thread. It would be very helpful for us to debug and pinpoint the issue. Thank you!

@wsxiaoys
Member

wsxiaoys commented Nov 6, 2023

In 0.5.0 we've fully switched to GGUF for CUDA; this should fix the issue. Hi @willwu1984, could you test it?

@willwu1984
Author

@wsxiaoys This issue has been fixed using version v0.5.4. Great, thank you!

@wsxiaoys wsxiaoys closed this as completed Nov 9, 2023