08. Sampling
Samplers alter the raw token probabilities during response generation. Users can tune them to adjust the outputs they get.
Note
Sampling is not a catch-all solution if your generations are behaving the wrong way! The cause can also lie with the prompt, frontend, model, etc. Please do not set arbitrary sampler values without first understanding what they do!
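As a rough illustration, the sketch below collects the API request fields and defaults listed on this page into a single payload. The `prompt` key, the helper name `build_request`, and the overall request shape are assumptions made for illustration, not a confirmed schema.

```python
import json

# Field names and defaults are copied from the sections of this page.
SAMPLER_DEFAULTS = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "penalty_range": -1,
    "top_p": 1.0,
    "min_p": 0.0,
    "top_k": 0,  # listed as 0.0 on this page; written as an integer here
    "top_a": 0.0,
    "temperature": 1.0,
    "temp_last": False,
    "typical": 1.0,
    "tfs": 1.0,
    "logit_bias": None,
    "mirostat_mode": 0,
    "mirostat_tau": 1.5,
    "mirostat_eta": 0.1,
}


def build_request(prompt, **overrides):
    """Hypothetical helper: merge sampler overrides into the defaults."""
    unknown = set(overrides) - set(SAMPLER_DEFAULTS)
    if unknown:
        # Reject misspelled sampler names instead of silently sending them.
        raise ValueError(f"unknown sampler fields: {sorted(unknown)}")
    # "prompt" is an assumed key; check your endpoint's schema.
    return {"prompt": prompt, **SAMPLER_DEFAULTS, **overrides}


payload = build_request("Once upon a time", temperature=0.8, min_p=0.05, temp_last=True)
print(json.dumps(payload, indent=2))
```

Starting from the defaults and overriding one or two values at a time makes it easier to see what each sampler actually changes.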
Repetition Penalty
- API request field: repetition_penalty
- Default: 1.0 (off)
- Description: Multiplicative method of preventing repetition of previous tokens in the context.
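To make "multiplicative" concrete, here is a minimal sketch of one widely used form of repetition penalty: positive logits of already-seen tokens are divided by the penalty and negative ones are multiplied by it. This is an assumption about the general technique; the backend's exact implementation may differ, and `apply_repetition_penalty` is a hypothetical name.

```python
def apply_repetition_penalty(logits, context_ids, penalty=1.0):
    """Return a copy of `logits` with previously seen tokens penalized.

    Assumed convention: divide positive logits, multiply negative ones,
    so the token always becomes less likely when penalty > 1.
    """
    out = list(logits)
    for tok in set(context_ids):  # each seen token is penalized once
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


logits = [2.0, -1.0, 0.5, 3.0]
penalized = apply_repetition_penalty(logits, context_ids=[0, 1, 1], penalty=1.25)
print(penalized)
```

With a penalty of 1.0 the logits are returned unchanged, which is why that value counts as "off".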
Frequency Penalty
- API request field: frequency_penalty
- Default: 0.0 (off)
- Description: A constant penalty applied each time a token is sampled, reducing the probability of that specific token.
Presence Penalty
- API request field: presence_penalty
- Default: 0.0 (off)
- Description: Additive method of preventing repetition of previous tokens in the context. Encourages new ideas to be generated. Unlike frequency penalty, this is applied once per token, no matter how often the token has appeared. TL;DR: repetition penalty, but additive.
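The difference between the two additive penalties can be sketched as follows: the frequency penalty scales with how many times a token has appeared, while the presence penalty is subtracted once for any token that has appeared at all. The formula follows the commonly documented convention for these two parameters and is an assumption about this backend; `apply_additive_penalties` is a hypothetical name.

```python
from collections import Counter


def apply_additive_penalties(logits, context_ids,
                             frequency_penalty=0.0, presence_penalty=0.0):
    """Subtract count-scaled and one-off penalties from seen tokens' logits."""
    counts = Counter(context_ids)
    out = list(logits)
    for tok, n in counts.items():
        # frequency: grows with every repeat; presence: flat, applied once
        out[tok] -= frequency_penalty * n + presence_penalty
    return out


logits = [1.0, 1.0, 1.0]
out = apply_additive_penalties(
    logits, context_ids=[0, 0, 0, 1],
    frequency_penalty=0.5, presence_penalty=0.25,
)
print(out)
```

Token 0 appeared three times and token 1 once, so the frequency term separates them while the presence term hits both equally; token 2 never appeared and is left alone.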
Penalty Range
Note
Unlike other backends, 0 disables penalties entirely!
- API request field: penalty_range, repetition_range, or repetition_penalty_range
- Default: -1
  - When frequency OR presence penalty is enabled, a penalty_range value of -1 applies the penalty to only the output tokens. A lower range is advised.
  - Otherwise, a penalty_range value of -1 = max sequence length.
- Description: Number of tokens to look behind when applying penalties.
  - For frequency and presence penalty, this should be a low value to avoid "backing the model into a corner" when selecting similar tokens, resulting in large amounts of synonym repeats (aka "thesaurus mode").
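The rules above can be read as a function that selects which tokens the penalties look at. The sketch below is one interpretation of those rules, not the backend's code: `penalty_window` is a hypothetical helper, and treating a positive range as "the last N tokens of prompt plus output" is an assumption.

```python
def penalty_window(prompt_ids, output_ids, penalty_range=-1, additive_enabled=False):
    """Return the token IDs that penalties would consider.

    additive_enabled: True when frequency or presence penalty is non-zero.
    """
    context = list(prompt_ids) + list(output_ids)
    if penalty_range == 0:
        return []  # 0 disables penalties entirely
    if penalty_range < 0:
        # -1: output tokens only for additive penalties,
        # otherwise the whole context (up to max sequence length)
        return list(output_ids) if additive_enabled else context
    return context[-penalty_range:]  # assumed: last N tokens of the context


prompt, output = [1, 2, 3], [4, 5]
print(penalty_window(prompt, output, 0))
print(penalty_window(prompt, output, -1))
print(penalty_window(prompt, output, -1, additive_enabled=True))
print(penalty_window(prompt, output, 3))
```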
Top-P
- API request field: top_p
- Default: 1.0 (off)
Min-P
- API request field: min_p
- Default: 0.0 (off)
Top-K
- API request field: top_k
- Default: 0.0 (off)
Top-A
- API request field: top_a
- Default: 0.0 (off)
Temperature
- API request field: temperature
- Default: 1.0 (off)
- Description: A constant that divides the logits before the softmax calculation. A higher temperature means more randomness when choosing the next token.
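A minimal sketch of temperature in the softmax, using only the standard library: the logits are divided by the temperature before being normalized. The function name is hypothetical, and a real backend operates on tensors rather than Python lists.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [3.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
print(cold)
print(hot)
```

The low-temperature distribution concentrates on the top token, while the high-temperature one is flatter, which is where the extra randomness comes from.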
Temp last
- API request field: temp_last
- Default: false (off)
- Description: Places temperature application last in the sampling stack. Necessary for min-P sampling.
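Why the order matters can be seen in a small sketch. The min-P rule used here (keep tokens whose probability is at least min_p times the top token's probability) is the commonly described definition and is an assumption about this backend; all function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(logits, min_p):
    """Mask tokens whose probability is below min_p * (top probability)."""
    probs = softmax(logits)
    cutoff = min_p * max(probs)
    return [x if p >= cutoff else float("-inf") for x, p in zip(logits, probs)]


def final_distribution(logits, temperature, min_p, temp_last):
    if temp_last:
        # Filter on the model's original distribution, then apply temperature.
        logits = [x / temperature for x in min_p_filter(logits, min_p)]
    else:
        # Temperature flattens the distribution before min-P sees it.
        logits = min_p_filter([x / temperature for x in logits], min_p)
    return softmax(logits)


logits = [4.0, 2.0, 0.0]
temp_first = final_distribution(logits, temperature=2.0, min_p=0.2, temp_last=False)
temp_after = final_distribution(logits, temperature=2.0, min_p=0.2, temp_last=True)
print(sum(p > 0 for p in temp_first), "tokens survive with temperature first")
print(sum(p > 0 for p in temp_after), "tokens survive with temperature last")
```

With temperature applied first, a high temperature lifts weaker tokens above the min-P cutoff; applying it last keeps the cutoff tied to the model's unmodified probabilities.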
Typical
- API request field: typical
- Default: 1.0 (off)
Tail-free Sampling
- API request field: tfs
- Default: 1.0 (off)
Logit bias
- API request field: logit_bias
- Default: None (off)
- Example: [{"1": 50}, {"2": 75}] - An array of bias objects
- Description: Adds a positive or negative value to change the occurrence of a specific token. Format: {"token": bias}, where bias is from -100 to 100.
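A minimal sketch of how an array of bias objects in the format above could be applied to a list of logits. It assumes the `"token"` key is a token ID given as a string, as the example suggests; `apply_logit_bias` is a hypothetical helper, not the backend's function.

```python
def apply_logit_bias(logits, logit_bias=None):
    """Add each bias to the logit of its token; None leaves logits unchanged."""
    out = list(logits)
    for entry in logit_bias or []:
        for token, bias in entry.items():
            if not -100 <= bias <= 100:
                raise ValueError(f"bias for token {token} must be within [-100, 100]")
            out[int(token)] += bias  # assumed: keys are token IDs as strings
    return out


logits = [0.0, 0.0, 0.0]
biased = apply_logit_bias(logits, [{"1": 50}, {"2": 75}])
print(biased)
```

A large positive bias makes a token dominate the distribution, and a large negative one effectively bans it.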
Mirostat mode
- API request field: mirostat_mode
- Default: 0 (off)
  - Exllamav2 only applies mirostat when mirostat_mode = 2
Mirostat tau
- API request field: mirostat_tau
- Default: 1.5 (off unless mirostat_mode = 2)
Mirostat eta
- API request field: mirostat_eta
- Default: 0.1 (off unless mirostat_mode = 2)
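To show how tau and eta interact, here is a rough sketch of a single step of the commonly described Mirostat 2.0 procedure: tokens whose surprise exceeds a running threshold `mu` are dropped, one token is sampled from the rest, and `mu` is nudged toward the target surprise tau at learning rate eta. This is an illustration of the general algorithm under those assumptions, not Exllamav2's code; the function name and the way `mu` is passed in are hypothetical.

```python
import math
import random


def mirostat_v2_step(probs, mu, tau=1.5, eta=0.1, rng=random):
    """Sample one token and return (token, updated_mu)."""
    # Surprise of each token in bits; zero-probability tokens are never allowed.
    surprise = [-math.log2(p) if p > 0 else math.inf for p in probs]
    allowed = [i for i, s in enumerate(surprise) if s <= mu]
    if not allowed:
        # Always keep at least the least surprising token.
        allowed = [min(range(len(probs)), key=lambda i: surprise[i])]
    weights = [probs[i] for i in allowed]
    token = rng.choices(allowed, weights=weights, k=1)[0]
    # Move mu so that observed surprise drifts toward the target tau.
    mu = mu - eta * (surprise[token] - tau)
    return token, mu


probs = [0.5, 0.25, 0.125, 0.125]
token, mu = mirostat_v2_step(probs, mu=1.5, tau=1.5, eta=0.1)
print(token, mu)
```

A higher tau tolerates more surprising tokens on average, and a higher eta makes the threshold react faster to each sampled token.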