[Feature Request]: Add Groq to model providers #1432

Closed · 1 task done
Palvr opened this issue Jul 8, 2024 · 1 comment
@Palvr commented Jul 8, 2024

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

Add Groq to model providers

Describe implementation you've considered

Groq could be integrated the same way Ollama or XInference are used to serve and manage model workloads (see the sketch below).
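
A minimal sketch of what such an integration could build on: Groq exposes an OpenAI-compatible endpoint, so a provider could be wired up much like an existing OpenAI-style client. This is not RAGFlow's actual provider code; the model name, prompt, and `GROQ_API_KEY` environment variable are assumptions for illustration.

```python
import os

from openai import OpenAI  # pip install openai

# Sketch only: Groq serves an OpenAI-compatible API, so a standard
# OpenAI-style client can point at it by swapping the base URL.
# GROQ_API_KEY is assumed to be set in the environment.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

response = client.chat.completions.create(
    model="llama3-8b-8192",  # example model; see the supported-models table below
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```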

Documentation, adoption, use case

No response

Additional information

No response

@JinHai-CN (Contributor) commented

#1447

@KevinHuSh mentioned this issue Jul 10, 2024
KevinHuSh added a commit that referenced this issue Jul 12, 2024
#1432  #1447 
This PR adds support for the GROQ LLM (Large Language Model).

Groq is an AI solutions company delivering ultra-low-latency inference
with the first-ever LPU™ Inference Engine. The Groq API enables
developers to integrate state-of-the-art LLMs, such as Llama-2 and
llama3-70b-8192, into low-latency applications with the request limits
specified below. Learn more at [groq.com](https://groq.com/).
Supported Models

| ID                 | Requests per Minute | Requests per Day | Tokens per Minute |
|--------------------|---------------------|------------------|-------------------|
| gemma-7b-it        | 30                  | 14,400           | 15,000            |
| gemma2-9b-it       | 30                  | 14,400           | 15,000            |
| llama3-70b-8192    | 30                  | 14,400           | 6,000             |
| llama3-8b-8192     | 30                  | 14,400           | 30,000            |
| mixtral-8x7b-32768 | 30                  | 14,400           | 5,000             |

---------

Co-authored-by: paresh0628 <[email protected]>
Co-authored-by: Kevin Hu <[email protected]>
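
Since the commit message calls out per-minute and per-day request limits, here is a hedged sketch of respecting them client-side with the official groq Python SDK. The helper name and backoff policy are illustrative, not part of this PR.

```python
import os
import time

from groq import Groq, RateLimitError  # pip install groq

# Assumes GROQ_API_KEY is set in the environment.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

def chat_with_backoff(model: str, prompt: str, retries: int = 3) -> str:
    """Call the Groq chat API, backing off when a rate limit is hit."""
    for attempt in range(retries):
        try:
            completion = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return completion.choices[0].message.content
        except RateLimitError:
            # The limits in the table above are per minute/day,
            # so wait with exponential backoff before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("rate limit not cleared after retries")

print(chat_with_backoff("mixtral-8x7b-32768", "Say hello in one sentence."))
```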
@yingfeng mentioned this issue Aug 6, 2024
Halfknow pushed a commit to Halfknow/ragflow that referenced this issue Nov 11, 2024
infiniflow#1432  infiniflow#1447