This is a simple, schedulable script that lets us remove outdated Segment Warehouse data from Redshift.
It's useful for automatically archiving data outside of a given retention window (e.g. anything older than 6 months).
The script relies on a handful of environment variables:
REDSHIFT_URL - URL to use to connect to the target Redshift instance/database
ONLY_TABLES - (optional) comma-separated list of fully-qualified (i.e. schema.tablename) tables to which to limit pruning operations
RETENTION_INTERVAL - (optional, defaults to 365 days) prune data older than this interval, as specified by a Postgres interval value
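As a rough illustration of how these variables could be read, here is a minimal sketch. It is not taken from the script itself: the helper name `load_prune_config`, the constant `DEFAULT_RETENTION_INTERVAL`, and the shape of the returned hash are all hypothetical. Only the variable names and the 365-day default come from the description above.

```ruby
# Hypothetical helper (not part of the script): gathers the settings
# described above from ENV, or from any Hash with the same keys.
DEFAULT_RETENTION_INTERVAL = '365 days'

def load_prune_config(env = ENV)
  # REDSHIFT_URL is the only required variable.
  url = env['REDSHIFT_URL'].to_s.strip
  raise ArgumentError, 'REDSHIFT_URL must be set' if url.empty?

  # ONLY_TABLES is optional; an empty list is taken to mean "all tables".
  only_tables = env['ONLY_TABLES'].to_s.split(',').map(&:strip).reject(&:empty?)

  # RETENTION_INTERVAL is optional and falls back to the documented default.
  interval = env['RETENTION_INTERVAL'].to_s.strip
  interval = DEFAULT_RETENTION_INTERVAL if interval.empty?

  { url: url, only_tables: only_tables, retention_interval: interval }
end
```

Passing a plain Hash instead of `ENV` makes the parsing easy to try out without touching the real environment.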
With those set, you can run the following Rake tasks:
rake redshift:list_tables - list all tables in the database along with the count of prunable rows in each
rake redshift:prune_all - prune data older than the specified threshold from all tables in the database
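The two tasks above boil down to counting and deleting rows older than the retention interval. The sketch below shows one way such statements might be built. It is an assumption, not the script's actual SQL: the helper names are hypothetical, and the choice of `received_at` as the timestamp column is a guess based on Segment's warehouse schema.

```ruby
# Hypothetical SQL builders (not part of the script). The timestamp
# column defaults to received_at, which is an assumption.
QUALIFIED_TABLE = /\A[a-z_][a-z0-9_]*\.[a-z_][a-z0-9_]*\z/i

def prunable_condition(interval, column)
  # Double any single quotes so the interval stays inside its SQL literal.
  safe_interval = interval.gsub("'", "''")
  "#{column} < GETDATE() - INTERVAL '#{safe_interval}'"
end

def check_table!(table)
  # Accept only schema.tablename identifiers, as ONLY_TABLES expects.
  return if table =~ QUALIFIED_TABLE
  raise ArgumentError, "expected schema.tablename, got #{table.inspect}"
end

# Roughly what a per-table prunable-row count could look like.
def count_prunable_sql(table, interval, column: 'received_at')
  check_table!(table)
  "SELECT COUNT(*) FROM #{table} WHERE #{prunable_condition(interval, column)}"
end

# Roughly what a per-table prune could look like.
def prune_sql(table, interval, column: 'received_at')
  check_table!(table)
  "DELETE FROM #{table} WHERE #{prunable_condition(interval, column)}"
end
```

Running the count statement before the delete gives a cheap preview of how many rows a prune would remove.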
You'll need a working Ruby 2.3 setup with Bundler.
Once those are set up, you can install dependencies with bundle install.
Redshift Unloader is released under the MIT License.
Ello was created by idealists who believe that the essential nature of all human beings is to be kind, considerate, helpful, intelligent, responsible, and respectful of others. To that end, we will be enforcing the Ello rules within all of our open source projects. If you don’t follow the rules, you risk being ignored, banned, or reported for abuse.