Support for time-based integer IDs (e.g. snowflake) using encode/decode #729

Open
calebj opened this issue Jan 6, 2025 · 1 comment · May be fixed by #730

calebj commented Jan 6, 2025

Creating a new issue from a comment on #528.

Currently, an integer control column can only be used in two ways: as a timestamp directly representing seconds, ms, us or ns since the UNIX epoch, or as a generic, non-timestamp ID range. #683 adds the ability to partition ranges of UUID and text values via conversion functions, and I want to add the same support for integers.

The primary example is "snowflake" IDs, used by X, Discord, Instagram, Mastodon, and others. I brought them up in #388, and the resulting changes let me use snowflake IDs as ordinary integers, with integer offsets for the interval and retention. This works well enough, but it leaves gaps when no data is added for a while: premake only looks at the current maximum value, so no new partitions are created during the quiet period, and the IDs generated once inserts resume (which encode the current time) can land past the last premade partition.
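
For a sense of scale, the query below is a minimal sketch assuming a Discord-style layout (milliseconds shifted left by 22 bits, as in the functions further down) and daily partitions. It computes the integer interval that covers one day of IDs and how far the IDs jump after a quiet period; the premake of 4 and the 10-day gap are illustrative values, not settings from this issue.

```sql
-- Assumes a Discord-style snowflake: milliseconds since a custom epoch,
-- shifted left by 22 bits. premake and quiet_days are illustrative values.
WITH params AS (
    SELECT 86400000::bigint << 22 AS ids_per_day,  -- one day of IDs as an integer interval
           4                      AS premake,      -- partitions kept ahead of max(id)
           10                     AS quiet_days    -- days with no inserts
)
SELECT ids_per_day,                                -- 362387865600000
       quiet_days * ids_per_day AS id_jump,        -- how far past max(id) the next ID lands
       quiet_days > premake     AS past_premade    -- true: the jump outruns the premade partitions
FROM params;
```

Since nothing moves max(id) during the quiet period, maintenance has no reason to create partitions, so the first rows afterward fall outside the premade range.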

For background, snowflake values fit in 63 bits (for a signed int64) and encode a timestamp plus some combination of worker and sequence information. Since each implementation uses a different bit range and epoch for the timestamp, partman can't cover every variant with a single p_epoch value such as 'snowflake'.
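
As an illustration of why one setting cannot cover every generator, the sketch below decodes a single bigint under two layouts that both store milliseconds above bit 22 but use different custom epochs: Discord's (1420070400000 ms, the start of 2015 UTC) and Twitter's original snowflake epoch (1288834974657 ms). The sample value is the example snowflake from Discord's developer documentation.

```sql
-- One integer, two snowflake layouts: only the custom epoch differs.
SELECT
    to_timestamp(((id >> 22) + 1420070400000) / 1000.0) AS as_discord,  -- 2016-04-30 11:18:25.796 UTC
    to_timestamp(((id >> 22) + 1288834974657) / 1000.0) AS as_twitter   -- about 4.2 years earlier
FROM (VALUES (175928847299117063::bigint)) AS t(id);
```

Layouts that also move the bit boundary would diverge even further, which is why a per-application decoder is more practical than a fixed p_epoch value.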

I propose a new p_epoch value, 'func', which tells partman to use the encoder/decoder functions for integer IDs the same way it does for UUIDs and text. This should be a straightforward change, since most of the work was already done in #683. Also, the parameter documentation for the encoder/decoder already mentions snowflakes, but they are currently only allowed when stored as text rather than as a bigint.
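
Roughly, 'func' would mean applying the encoder to each time boundary of the partition set to get integer bounds. The query below is a sketch of that idea, not pg_partman's internal code: it inlines the same arithmetic as the Discord encoder shown below and uses three arbitrary daily boundaries.

```sql
-- Encode daily time boundaries into Discord-style snowflake bounds.
-- A daily partition would then hold IDs in [lower_id, next day's lower_id).
SELECT ts AS lower_time,
       CAST((EXTRACT(epoch FROM ts) - 1420070400) * 1000 AS bigint) << 22 AS lower_id
FROM generate_series('2025-01-06 00:00+00'::timestamptz,
                     '2025-01-08 00:00+00'::timestamptz,
                     interval '1 day') AS g(ts);
```

Consecutive bounds differ by 86400000 << 22, the same step as a plain integer interval, but each bound is derived from a point in time rather than from the current maximum ID, which should let premake keep up during quiet periods.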

Example Discord ID encoder and decoder:
```sql
-- Discord snowflakes use a 41-bit timestamp and 22 bits of worker/process/sequence.
-- The timestamp is the number of milliseconds since the start of 2015 UTC (1420070400 s after the UNIX epoch).

-- Decoder: drop the low 22 bits, convert ms to seconds, then add the epoch offset.
CREATE FUNCTION discord_snowflake_to_timestamp(id bigint)
    RETURNS TIMESTAMPTZ LANGUAGE SQL IMMUTABLE PARALLEL SAFE AS
    $$SELECT TO_TIMESTAMP((id >> 22)/1000.0 + 1420070400)$$;

-- Encoder: the inverse; the low 22 bits are left at zero.
CREATE FUNCTION timestamp_to_discord_snowflake(ts TIMESTAMPTZ)
    RETURNS BIGINT LANGUAGE SQL IMMUTABLE PARALLEL SAFE AS
    $$SELECT CAST((EXTRACT(epoch FROM ts) - 1420070400)*1000 AS BIGINT) << 22$$;
```
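
As a check against a known value, the query below splits the example snowflake from Discord's developer documentation into its fields, using the same shift and epoch as discord_snowflake_to_timestamp. The worker (bits 17 to 21), process (bits 12 to 16) and increment (bits 0 to 11) positions follow Discord's published layout; partman itself would only need the timestamp bits.

```sql
-- Split a Discord snowflake into its fields.
SELECT
    (id >> 22) + 1420070400000 AS unix_ms,           -- 1462015105796 (2016-04-30 11:18:25.796 UTC)
    (id >> 17) & 31            AS worker_id,         -- 1
    (id >> 12) & 31            AS process_id,        -- 0
    id & 4095                  AS increment_id,      -- 7
    (id >> 22) << 22           AS lowest_id_same_ms  -- 175928847298985984
FROM (VALUES (175928847299117063::bigint)) AS t(id);
```

The last column is what timestamp_to_discord_snowflake returns for that millisecond: the low 22 bits are zero, so the encoder always yields the smallest ID for a given timestamp, which suits an inclusive lower partition bound.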

keithf4 (Collaborator) commented Jan 6, 2025

@akulapid
Just tagging you here on this one in case you might want to take a pass at it. Otherwise, I'll try and take a look at this in the future when I have some time.

keithf4 added this to the Future milestone Jan 6, 2025
calebj added a commit to calebj/pg_partman that referenced this issue Jan 7, 2025
This PR allows users to specify the special value of 'func' for p_epoch
to use custom functions to encode/decode time-ordered integers other than
the classic seconds, ms, us or ns since the UNIX epoch.

Resolves pgpartman#729.
calebj linked a pull request Jan 7, 2025 that will close this issue
calebj added a commit to calebj/pg_partman that referenced this issue Jan 8, 2025