Fix cost_estimate.py #1810

Open · wants to merge 1 commit into main
Conversation

@xksteven (Contributor) commented May 8, 2024

Fixes several bugs to get cost_estimate.py working again. Bugs include:

```
python scripts/cost_estimate.py
...
  File "/data/steven_test/workspace/lm-evaluation-harness/scripts/cost_estimate.py", line 91, in main
    task_dict={taskname: tasks.get_task(taskname)()},
AttributeError: module 'lm_eval.tasks' has no attribute 'get_task'
```

```
Traceback (most recent call last):
...
  File "/data/steven_test/workspace/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
TypeError: simple_evaluate() got an unexpected keyword argument 'lm'
```

```
Traceback (most recent call last):
...
  File "/data/steven_test/workspace/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
TypeError: simple_evaluate() got an unexpected keyword argument 'task_dict'
```

```
  File "/data/steven_test/workspace/lm-evaluation-harness/scripts/cost_estimate.py", line 24, in loglikelihood
    for ctx, cont in requests:
TypeError: cannot unpack non-iterable Instance object
```

Needed to add

```
res.append(-random.random())
```

Otherwise wikitext evaluation fails and crashes the program.
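
For context, the shape of change the last two items point at looks roughly like the sketch below. It assumes the newer `Instance` API where each request exposes its arguments via `.args`; the class name `DryrunLM` follows the existing script, but the bodies here are only illustrative (with a GPT-2 tokenizer as a stand-in), not the exact patch.

```python
import random

import transformers
from lm_eval.api.model import LM


class DryrunLM(LM):
    """Counts tokens instead of calling a real model (illustrative sketch only)."""

    def __init__(self):
        super().__init__()
        self.tokencost = 0
        self.tokenizer = transformers.AutoTokenizer.from_pretrained("gpt2")

    def loglikelihood(self, requests):
        res = []
        for request in requests:
            # Newer lm-eval passes Instance objects, so the (context, continuation)
            # pair comes from request.args instead of unpacking the request itself.
            ctx, cont = request.args
            self.tokencost += len(self.tokenizer.encode(ctx + cont))
            res.append((-random.random(), False))  # dummy (logprob, is_greedy)
        return res

    def loglikelihood_rolling(self, requests):
        res = []
        for request in requests:
            (string,) = request.args
            self.tokencost += len(self.tokenizer.encode(string))
            # Perplexity-style tasks such as wikitext consume these values,
            # so an empty result crashes the run; return a dummy logprob.
            res.append(-random.random())
        return res

    def generate_until(self, requests):
        res = []
        for request in requests:
            ctx, _gen_kwargs = request.args
            self.tokencost += len(self.tokenizer.encode(ctx))
            res.append("")  # no actual generation in a dry run
        return res
```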

@CLAassistant commented May 8, 2024

CLA assistant check
All committers have signed the CLA.

@sepiatone (Contributor)

@xksteven @haileyschoelkopf @lintangsutawika I wonder whether lm-eval can usefully provide this costing functionality.

  1. Models do not all use the same tokenizer. OpenAI models use the tiktoken library, Anthropic has not yet publicly provided a tokenization library for Claude 3, Cohere uses its own, etc.
  2. Pricing information is frequently updated. To date these have been downward revisions, so that may be less of a problem (unless it stops someone from running an eval :)). Amusingly, the costs the estimator reports for some models are two or three times their final cost at the time of their deprecation.
  3. Even within the same organization, every model has a slightly different cost structure. For example, LangChain has to manage this just for OpenAI.

Perhaps it would be better to treat this functionality as a token count estimator (rather than a token cost estimator); users could then determine their cost from the current pricing of the model they intend to use. Yes, this still runs up against point 1, but as an estimate it may still be useful.
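
The user-side arithmetic for that is small. Here's a rough sketch assuming tiktoken and an OpenAI-style model name; the price figure is a placeholder, not a real rate.

```python
import tiktoken

# Placeholder rate in USD per million input tokens; this figure is made up
# for the example, so look up current pricing before relying on it.
USD_PER_MILLION_INPUT_TOKENS = 5.00


def estimate_cost(prompts, model_name="gpt-4o"):
    """Count prompt tokens with the model's tokenizer and apply a user-supplied rate."""
    enc = tiktoken.encoding_for_model(model_name)
    n_tokens = sum(len(enc.encode(p)) for p in prompts)
    return n_tokens, n_tokens / 1_000_000 * USD_PER_MILLION_INPUT_TOKENS


tokens, usd = estimate_cost(["Question: What is the capital of France?\nAnswer:"])
print(f"{tokens} tokens, about ${usd:.6f} at the assumed rate")
```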

@xksteven (Contributor, Author) commented May 9, 2024

@sepiatone I am not sure I have enough time to make a proper abstraction from this into what you're suggesting.

I wanted the base code to work so that I could then edit the model pricing myself and get an estimate of the cost. After doing that, I made this PR just to get the previous code working again. I'm happy to have someone else take over and turn it into a general cost estimator for running different models.

Also, as a personal preference, I like small PRs as they're easier to review.
As an aside, once the code is working it isn't difficult to swap out the pricing or the tokenizer for whatever custom one you'd like.
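
As a hypothetical illustration of that point (none of the names below are part of the script or this PR): anything with an `encode()` method works as the tokenizer, and the per-token rate is just a number the user supplies.

```python
import transformers


class TokenCounter:
    """Minimal stand-in for the dry-run model: only a tokenizer and a running count."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer  # any object with an encode() method
        self.tokencost = 0

    def add(self, text: str) -> None:
        self.tokencost += len(self.tokenizer.encode(text))


# Swap in whichever tokenizer matches the API you actually plan to call.
counter = TokenCounter(transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b"))
counter.add("Question: What is 2 + 2?\nAnswer:")

usd_per_token = 1e-6  # made-up rate; substitute whatever your provider charges today
print(counter.tokencost, counter.tokencost * usd_per_token)
```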

@sepiatone (Contributor)

> @sepiatone I am not sure I have enough time to make a proper abstraction from this into what you're suggesting.

Perhaps I should've been clearer - I'm not suggesting anything specific with this PR other than what has already been done. What is being done in this PR is required whatever further direction is taken.


Perhaps it's better to take this discussion (having a token count estimator) to a separate issue.

@StellaAthena (Member) commented May 16, 2024

@xksteven @sepiatone At the time we wrote this code, there were just the OpenAI ada/babbage/curie/davinci APIs and computing the costs was quite easy. Now this is a much bigger lift, and I think it would probably make sense to delete it, because anything that works would need to be very bespoke and could be broken by policy changes at companies.

Maybe we can add a warning somewhere that says "this might be really expensive, enter y if you are sure you want to do it" whenever someone runs a sufficiently large eval task.
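
That warning could be as small as the sketch below; the function name, threshold, and wording are made up for illustration, and nothing like it is implied to exist in the harness today.

```python
def confirm_expensive_run(estimated_tokens: int, threshold: int = 10_000_000) -> None:
    """Hypothetical guard: ask before launching an eval above some token budget."""
    if estimated_tokens <= threshold:
        return
    answer = input(
        f"This run is estimated at {estimated_tokens:,} tokens and might be really "
        "expensive. Enter y if you are sure you want to do it: "
    )
    if answer.strip().lower() != "y":
        raise SystemExit("Aborted by user.")
```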
