Implementing discrete variables #524

fpmw · 2024-01-16T14:45:24Z

All,

In mathematics, terms can include a factorial (for example, the Poisson distribution) or discrete exponents (for example, a Taylor expansion). Sometimes you know or suspect that a solution should include such a discrete term.

For example, to implement the factorial you could include the gamma function, combine it with nested constraints, and properly define the complexity of constants and variables.

However, would it be possible to tell PySR to consider only discrete values for a certain variable while others remain continuous?

Regards,
Frank

MilesCranmer · 2024-01-16T16:59:12Z

Maintainer

You have a few options. Also check out the following discussions: #469 and #485.

First, the easiest is to write down custom operators that simply map their input to integers before calculation (just make sure they map back to the same type after - using convert(typeof(x), ...)). For example,

    # Define for Julia (for the search)
    unary_operators=["fact(x) = convert(typeof(x), factorial(round(Int, x)))"],
    # Define for SymPy (for exporting to Python)
    extra_sympy_mappings={"fact": lambda x: sympy.factorial(sympy.ceiling(x - 0.5))},

would define a factorial that rounds its input to the nearest integer, so that when PySR proposes continuous values for the constants, they are mapped to integers at evaluation time.

There is an example of this here: https://astroautomata.com/PySR/examples/#7-julia-packages-and-types for discovering a prime number relationship.

Another option, if you know the constants are going to be drawn from a fairly small set of integers (say, 1-10) or well-known constants (e.g., pi), is to simply define these as additional features.

For example:

import numpy as np

X = np.random.randn(100, 3)  # Your dataset, say
variable_names = ["a", "b", "c"]  # Your regular variable names

for i in range(1, 11):
    # axis=1 adds a new column; without it, np.append flattens X to 1-D
    X = np.append(X, np.ones((100, 1)) * i, axis=1)
    variable_names.append(f"_{i}")  # Give the variable name like _1 for 1, _2 for 2, etc.

# Here, model is your existing PySRRegressor and y is your target array.
# Prevent it from finding regular constants, so it can only use the passed variables:
model.complexity_of_constants = 100

model.fit(X, y, variable_names=variable_names)
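
As a minimal sketch of the feature-augmentation step on its own, the helper below appends one constant column per integer. The name `add_integer_features` is hypothetical and not part of PySR. It uses `np.hstack`, since `np.append` without `axis=1` would flatten the array rather than add columns.

```python
import numpy as np


def add_integer_features(X, variable_names, integers):
    """Append one constant column per integer and a matching `_<i>` name."""
    n_rows = X.shape[0]
    # One column per integer, each filled with that integer's value.
    constant_columns = [np.full((n_rows, 1), float(i)) for i in integers]
    X_augmented = np.hstack([X, *constant_columns])
    names = list(variable_names) + [f"_{i}" for i in integers]
    return X_augmented, names


X = np.random.randn(100, 3)
X_aug, names = add_integer_features(X, ["a", "b", "c"], range(1, 11))
print(X_aug.shape, names[-1])  # (100, 13) _10
```

The augmented array and name list can then be passed to `model.fit(X_aug, y, variable_names=names)` as in the snippet above.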

2 replies

fpmw Jan 17, 2024
Author

Thank you very much.

I tried variants of custom functions, one of which is given below. However, I remember (I need to check again) getting a response along the lines of "cannot apply function round to symbolic expression". Based on your suggestions, I will revisit it.

BTW: I find PySR one of the best tools out there!

Sympy Mapping:
"myfac": lambda x: factorial(round(x))

Regards,
Frank

MilesCranmer Jan 17, 2024
Maintainer

Oh right, the issue is that round is a Python built-in rather than a SymPy function. The SymPy mappings need to use SymPy-compatible functions. Some Python operators work, like + and *, but only because Python allows classes to overload them and SymPy does so. For round this does not work on symbolic expressions (sadly).

SymPy weirdly doesn't have a round that I know of, but it does have a ceiling operation. So I just do sympy.ceiling(x - 0.5), which gives the same result except at exact half-integers.
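
To see how close the two rounding rules are, here is a plain-Python sketch with `math.ceil` standing in for `sympy.ceiling` (an assumption made so the snippet runs without SymPy). The helper name `round_via_ceiling` is illustrative only. The two rules agree everywhere except at exact half-integers, where `ceiling(x - 0.5)` rounds down while Python's and Julia's default `round` go to the nearest even integer.

```python
import math


def round_via_ceiling(x):
    """Round using the ceiling(x - 0.5) trick from the SymPy mapping."""
    return math.ceil(x - 0.5)


# Columns: input, built-in round, ceiling-based round.
for x in [2.4, 2.6, 3.5, 4.5]:
    print(x, round(x), round_via_ceiling(x))
```

Only the `3.5` row differs (4 versus 3), which rarely matters when the input is meant to be an integer anyway.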

And thanks!!

fpmw · 2024-01-18T11:54:42Z

Author

Hello Miles,

For values above 20, the Julia function factorial complains that the value is too large to look up in a table, and suggests combining big with factorial. After some playing and tweaking I added the following to unary_operators, and it seems to work. However, I noticed that if you leave out the upper limit in the definition below, compiling the backend takes forever, as if the Julia compiler tries to create an unbounded lookup table when no upper limit is given. But that's me guessing. So my conclusion is to use some upper limit. In my case 1000 is more than enough.

"myfac(x::T) where {T} = x < 0 || x > 1000 ? T(NaN) : T(factorial(big(convert(Int, round(x)))))"

with the following added to extra_sympy_mappings (import sympy as sp):

"myfac": lambda x: sp.factorial(sp.ceiling(x - 0.5)),
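
For reference, here is a plain-Python analogue of the bounded operator, which can be handy for checking an exported expression by hand. It is a sketch only: the `upper` parameter is illustrative, and Python's arbitrary-precision integers play the role of Julia's `big`.

```python
import math


def myfac(x, upper=1000.0):
    """Factorial of x rounded to the nearest integer; NaN outside [0, upper]."""
    if math.isnan(x) or x < 0 or x > upper:
        return math.nan
    n = round(x)  # ties go to the nearest even integer, as in Julia's round
    try:
        return float(math.factorial(n))
    except OverflowError:
        # The exact integer exists but is too large for a 64-bit float.
        return math.inf


print(myfac(4.7), myfac(-1.0), myfac(500.0))  # 120.0 nan inf
```

As in the Julia version, out-of-range inputs give NaN, while in-range inputs whose factorial exceeds the float range give infinity.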

Regards,
Frank

3 replies

MilesCranmer Jan 19, 2024
Maintainer

I think it's probably just computing some factorials that are much too large, and this is slowing the evaluation to a standstill. I think your solution is a good idea.

If you are using precision=32 (the default), you can actually set the upper bound to 34 rather than 1000. This is because anything from factorial(35) onward will overflow a Float32:

julia> convert(Float32, factorial(big(35)))
Inf32

MilesCranmer Jan 19, 2024
Maintainer

Float64 has a higher limit: factorial(170) is the largest factorial it can represent, and factorial(171) overflows. If you wish to use that instead, you should set precision=64 in the PySR params so it will use Float64 during evaluation.
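
These overflow thresholds can be checked without Julia. The sketch below uses Python's exact integers and the standard IEEE 754 maxima; `largest_finite_factorial` is a hypothetical helper name.

```python
import math
import sys

FLOAT32_MAX = (2 - 2**-23) * 2.0**127  # largest finite IEEE 754 single
FLOAT64_MAX = sys.float_info.max       # largest finite IEEE 754 double


def largest_finite_factorial(max_value):
    """Return the largest n such that n! does not exceed max_value."""
    n = 0
    while math.factorial(n + 1) <= max_value:
        n += 1
    return n


print(largest_finite_factorial(FLOAT32_MAX))  # 34
print(largest_finite_factorial(FLOAT64_MAX))  # 170
```

So a guard of `x > 34` for Float32 or `x > 170` for Float64 is the tightest bound that still keeps every result finite.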

If needed we can also discuss enabling BigFloat which would give arbitrary precision. I'm happy to point you to where to add it to the codebase if interested.

fpmw Jan 19, 2024
Author

I changed to precision=64 in Python right from the start :-)
It seems that Julia is complaining about the size of the integers passed to factorial. Below is a snippet of the output I got:

julia> Int
Int64

julia> factorial(23)
ERROR: OverflowError: 23 is too large to look up in the table; consider using factorial(big(23)) instead

So, the 'big issue' is to use big :-) :-)
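
The table-lookup error is a 64-bit integer overflow. A quick check in Python, whose integers are arbitrary-precision (much like wrapping the argument in `big` in Julia), shows where the limit sits; `fits_in_int64` is an illustrative helper name.

```python
import math

INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def fits_in_int64(n):
    """True if n! can be stored in a signed 64-bit integer."""
    return math.factorial(n) <= INT64_MAX


print(fits_in_int64(20), fits_in_int64(21))  # True False
```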

Anyway, I think it is all working so far.
BTW: The 1000 is large; in my case 200 may be enough. For now I will leave it as is.

Thanks for your input!

On the integer constants, I will think about a solution...
