Anony is a small library that defines how ActiveRecord models should be anonymised for deletion purposes.
class User < ActiveRecord::Base
  include Anony::Anonymisable
  anonymise do
    overwrite do
      hex :first_name
    end
  end
end
irb(main):001:0> user = User.find(1)
=> #<User id="1" first_name="Alice">
irb(main):002:0> user.anonymise!
=> #<Anony::Result status="overwritten" fields=[:first_name] error=nil>
For our policy on compatibility with Ruby and Rails versions, see COMPATIBILITY.md.
This library is distributed as a Ruby gem, and we recommend adding it to your Gemfile:
gem "anony"
The library injects itself using a mixin. To add this to a model class, you should include Anony::Anonymisable:
class User < ActiveRecord::Base
  include Anony::Anonymisable
  # ...
end
Alternatively, if you have a Rails application, you might wish to expose this behaviour for all of your models, in which case you can instead add it to ApplicationRecord once:
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  include Anony::Anonymisable
end
There are two primary ways to use this library: to either overwrite existing fields on a record, or to destroy the record altogether.
First, you should establish an anonymise block in your model class:
class Employee < ActiveRecord::Base
  include Anony::Anonymisable
  anonymise do
  end
end
If you want to overwrite certain fields on the model, you should use the overwrite DSL. There are many different ways (known as "strategies") to overwrite your fields (see Field strategies below). For now, let's use the hex and nilable strategies, which overwrite fields using SecureRandom.hex or set them to nil, respectively:
anonymise do
  overwrite do
    hex :field_name
    nilable :nullable_field
  end
end
Alternatively, you may wish to simply destroy the record altogether when #anonymise! is called (this is useful if you're anonymising a collection of different models together, only some of which need to be destroyed). This can be configured like so:
anonymise do
  destroy
end
Please note that the overwrite and destroy strategies cannot be used together.
Now, given a model instance, we can use the #anonymise! method to apply our strategies:
irb(main):001:0> model = Model.find(1)
=> #<Model id="1" field_name="Previous value" nullable_field="Previous">
irb(main):002:0> model.anonymise!
=> #<Anony::Result status="overwritten" fields=[:field_name, :nullable_field] error=nil>
Or, if you were using the destroy strategy:
irb(main):002:0> model.anonymise!
=> #<Anony::Result status="destroyed" fields=nil error=nil>
When a model is anonymised, an Anony::Result is returned. This allows the library to detail the changes it made and the strategy it used. The result object also contains any error that was raised within Anony, allowing you to handle it without using exceptions for flow control.
The result object has 4 attributes:
- status - Whether the model was destroyed, overwritten or skipped, or whether the operation failed
- fields - In the event the model was overwritten, the fields that were updated (excludes timestamps)
- error - In the event the anonymisation failed, the associated error. Note that Anony only rescues the following errors: ActiveRecord::RecordNotSaved, ActiveRecord::RecordNotDestroyed. Anything else is raised.
- record - The model instance that was anonymised to produce this result.
For convenience, the result object can also be queried with destroyed?, overwritten?, skipped? and failed?, so that it can be directly interrogated or used in a case statement with the status property.
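As a rough sketch of how a caller might branch on the result, the snippet below uses a hypothetical ResultStandIn struct in place of Anony::Result so that it runs without the gem or a database. It assumes status is exposed as a string, as the console output above suggests; the summarise helper is illustrative and not part of Anony.

```ruby
# Hypothetical stand-in for Anony::Result, so this sketch runs without the gem.
# Assumption: status is a string such as "overwritten", as in the console output.
ResultStandIn = Struct.new(:status, :fields, :error, :record, keyword_init: true)

# Turn a result into a log-friendly message without using exceptions for flow control.
def summarise(result)
  case result.status
  when "overwritten" then "overwrote #{result.fields.join(', ')}"
  when "destroyed"   then "destroyed the record"
  when "skipped"     then "skipped anonymisation"
  when "failed"      then "failed: #{result.error.message}"
  else raise ArgumentError, "unexpected status #{result.status.inspect}"
  end
end

overwritten = ResultStandIn.new(status: "overwritten", fields: %i[first_name])
failed = ResultStandIn.new(status: "failed", fields: [], error: RuntimeError.new("boom"))

puts summarise(overwritten) # prints "overwrote first_name"
puts summarise(failed)      # prints "failed: boom"
```

With a real result, the same case statement would apply to the value returned by #anonymise!.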
This library ships with a number of built-in strategies:
- nilable overwrites the field with nil
- hex overwrites the field with random hexadecimal characters
- email overwrites the field with an email
- phone_number overwrites the field with a dummy phone number
- current_datetime overwrites the field with Time.zone.now (using ActiveSupport's TimeWithZone)
You can override the default strategies, or add your own ones to make them available everywhere, using the Anony::FieldLevelStrategies.register(name, &block) method somewhere after your application boots (e.g. in a Rails initializer):
Anony::FieldLevelStrategies.register(:reverse) do |original|
  original.reverse
end
class Employee < ApplicationRecord
  include Anony::Anonymisable
  anonymise do
    overwrite do
      reverse :first_name
    end
  end
end
One strategy you might want to override is :email. At GoCardless we use an @gocardless.com domain so we can ensure any emails accidentally sent to this address would be quickly identified and fixed. :phone_number is another strategy that you might wish to replace (depending on your primary location).
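A minimal sketch of overriding the :email strategy with the register method described above. The EMAIL_STRATEGY constant and the anonymised.example.com domain are illustrative assumptions, and the registration is guarded so the snippet also runs where Anony is not loaded.

```ruby
require "securerandom"

# Hypothetical replacement for the built-in :email strategy: ignore the original
# value and generate a random address on a domain you control.
EMAIL_STRATEGY = proc { |_original| "#{SecureRandom.uuid}@anonymised.example.com" }

# Register it globally, e.g. in a Rails initializer. Guarded so that this sketch
# can be run on its own without the gem installed.
if defined?(Anony::FieldLevelStrategies)
  Anony::FieldLevelStrategies.register(:email, &EMAIL_STRATEGY)
end

puts EMAIL_STRATEGY.call("alice@example.com") # a random address on the chosen domain
```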
You can also use strategies on a case-by-case basis, by honouring the .call(existing_value) signature:
module OverwriteUUID
  def self.call(_existing_value)
    SecureRandom.uuid
  end
end
require "overwrite_uuid"
class Manager < ApplicationRecord
  include Anony::Anonymisable
  anonymise do
    overwrite do
      with_strategy OverwriteUUID, :id
    end
  end
end
If your strategy doesn't respond to .call, then it will be used as a constant value whenever the field is anonymised.
class Manager < ApplicationRecord
  include Anony::Anonymisable
  anonymise do
    overwrite do
      with_strategy 123, :id
    end
  end
end
irb(main):001:0> manager = Manager.first
=> #<Manager id=42>
irb(main):002:0> manager.anonymise!
=> #<Anony::Result status="overwritten" fields=[:id] error=nil>
irb(main):003:0> manager
=> #<Manager id=123>
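The rule above (call the strategy if it responds to .call, otherwise use it as a constant) can be sketched in plain Ruby. The resolve helper is hypothetical and only illustrates the documented behaviour; it is not Anony's implementation.

```ruby
require "securerandom"

module OverwriteUUID
  def self.call(_existing_value)
    SecureRandom.uuid
  end
end

# Hypothetical helper mirroring the documented rule for with_strategy.
def resolve(strategy, existing_value)
  if strategy.respond_to?(:call)
    strategy.call(existing_value) # callable: compute the new value
  else
    strategy                      # anything else: use it as a constant
  end
end

puts resolve(123, 42)           # prints 123
puts resolve(OverwriteUUID, 42) # prints a random UUID
```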
You can also use a block, which is executed in the context of the model so it can access local properties & methods. Blocks take the existing value of the column as the only argument:
class Manager < ApplicationRecord
  include Anony::Anonymisable
  anonymise do
    overwrite do
      with_strategy(:first_name) { |name| Digest::SHA2.hexdigest(name) }
      with_strategy(:last_name) { "previous-name-of-#{id}" }
    end
  end
end
irb(main):001:0> manager = Manager.first
=> #<Manager id=42>
irb(main):002:0> manager.anonymise!
=> #<Anony::Result status="overwritten" fields=[:first_name, :last_name] error=nil>
irb(main):003:0> manager
=> #<Manager first_name="e9ab2800-d4b9-4227-94a7-7f81118d8a8a" last_name="previous-name-of-42">
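To illustrate what "executed in the context of the model" means, here is a small sketch using Ruby's instance_exec. ManagerStandIn and anonymised_value are hypothetical names for illustration; this shows one way such a block could be evaluated, and Anony's internals may differ.

```ruby
require "digest"

# Hypothetical plain-Ruby stand-in for a model, used only for illustration.
class ManagerStandIn
  attr_reader :id, :first_name, :last_name

  def initialize(id:, first_name:, last_name:)
    @id = id
    @first_name = first_name
    @last_name = last_name
  end

  # Evaluate the block with self set to this instance, passing the current
  # value of the field as the only argument.
  def anonymised_value(field, &block)
    instance_exec(public_send(field), &block)
  end
end

manager = ManagerStandIn.new(id: 42, first_name: "Alice", last_name: "Smith")

digest = manager.anonymised_value(:first_name) { |name| Digest::SHA2.hexdigest(name) }
label  = manager.anonymised_value(:last_name) { "previous-name-of-#{id}" }

puts digest.length # prints 64 (a SHA-256 hex digest)
puts label         # prints "previous-name-of-42"
```

Because the block runs with the record as self, it can call methods such as id directly, while the existing column value arrives as the block argument.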
Note: This is an experimental feature and has not been tested widely in production environments.
You can use selectors to anonymise multiple records. You first define a block for a specific subject that returns a list of anonymisable records.
anonymise do
  selectors do
    for_subject(:user_id) { |user_id| find_all_users(user_id) }
  end
end
You can also use scopes, where, etc. when defining your selectors:
anonymise do
  selectors do
    for_subject(:user_id) { |user_id| where(user_id: user_id) }
  end
end
This can then be used to anonymise all records for that subject using this API:
ModelName.anonymise_for!(:user_id, "user_1234")
If you attempt to anonymise records with a selector that has not been defined, an error will be raised.
When anonymising models using selectors, an array of Anony::Result objects will be returned, one result per anonymised record in the model. These results contain a reference to the record that was anonymised to produce that result, so that changes made or failures can easily be linked back to the specific record.
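As a sketch of how the returned array might be processed, the snippet below collects the failures together with the records that produced them. SelectorResult is a hypothetical stand-in for Anony::Result so the example runs without the gem, and failed_records is an illustrative helper.

```ruby
# Hypothetical stand-in for Anony::Result, limited to the attributes used here.
SelectorResult = Struct.new(:status, :record, :error, keyword_init: true) do
  def failed?
    status == "failed"
  end
end

# Collect [record, error] pairs for every failed result, so each failure can
# be traced back to the record that produced it.
def failed_records(results)
  results.select(&:failed?).map { |result| [result.record, result.error] }
end

results = [
  SelectorResult.new(status: "overwritten", record: :user_1),
  SelectorResult.new(status: "failed", record: :user_2, error: "could not save"),
]

puts results.count(&:failed?) # prints 1
p failed_records(results)     # prints [[:user_2, "could not save"]]
```

In an application, results would be the array returned by a call such as ModelName.anonymise_for!(:user_id, "user_1234").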
If your model has an anonymised_at column, Anony will automatically set that value when calling #anonymise! (similar to how Rails will modify the updated_at timestamp).
This means you could automatically filter out anonymised records without matching on the
anonymised values.
Here is an example of adding this column when creating a new table:
class AddEmployees < ActiveRecord::Migration[6.0]
  def change
    create_table(:employees) do |t|
      # ... the rest of your columns
      t.column :anonymised_at, :datetime, null: true
    end
  end
end
Here is an example of adding this column to an existing table:
class AddAnonymisedAtToEmployees < ActiveRecord::Migration[6.0]
  def change
    add_column(:employees, :anonymised_at, :datetime, null: true)
  end
end
Records can then be filtered out like so:
class Employee < ApplicationRecord
  scope :without_anonymised, -> { where(anonymised_at: nil) }
end
There is also a helper, #anonymised?, defined on model instances when Anony::Anonymisable is included:
employee.anonymised?
You might need to preserve model data in some (or all) circumstances. Anony exposes the skip_if DSL for expressing this preference, which runs the given block before attempting any strategy.
- If the block returns truthy, anonymisation is skipped.
- If the block returns falsey, anonymisation continues.
class Manager < ApplicationRecord
  include Anony::Anonymisable
  def should_not_be_anonymised?
    id == 1 # The first manager must be kept
  end
  anonymise do
    skip_if { should_not_be_anonymised? }
  end
end
The result object will indicate the model was skipped:
irb(main):001:0> manager = Manager.find(1)
=> #<Manager id=1>
irb(main):002:0> manager.anonymise!
=> #<Anony::Result status="skipped" fields=[] error=nil>
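The order of operations described above (evaluate the skip_if block first, and only apply strategies when it returns falsey) can be sketched in plain Ruby. SkippableStandIn and anonymise_with are hypothetical names; this illustrates the control flow only and is not Anony's implementation.

```ruby
# Hypothetical stand-in showing the order of operations only: the skip_if
# block is evaluated first, and no strategy runs when it returns truthy.
class SkippableStandIn
  attr_reader :id, :first_name

  def initialize(id:, first_name:)
    @id = id
    @first_name = first_name
  end

  def anonymise_with(skip_if:)
    # Run the block in the context of the record so it can call its methods.
    return "skipped" if instance_exec(&skip_if)

    @first_name = nil # stand-in for applying the configured strategies
    "overwritten"
  end
end

keep_first = proc { id == 1 }

first  = SkippableStandIn.new(id: 1, first_name: "Alice")
second = SkippableStandIn.new(id: 2, first_name: "Bob")

puts first.anonymise_with(skip_if: keep_first)  # prints "skipped"
puts second.anonymise_with(skip_if: keep_first) # prints "overwritten"
```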
One of the goals of this library is to ensure that your field strategies are complete, i.e. that the anonymisation behaviour of the model is always correct, even when database columns are added or removed, or the contents of those columns change.
As such, Anony will validate your model configuration when you try to anonymise the model (unfortunately this cannot be safely done at boot as the database might not be available). If your configuration is incomplete, calling #anonymise! will raise an Anony::FieldException and will not return an Anony::Result object. This is treated as a critical error, as Anony cannot safely anonymise the model.
irb(main):001:0> manager = Manager.find(1)
=> #<Manager id=1>
irb(main):002:0> manager.anonymise!
Anony::FieldException (Invalid anonymisation strategy for field(s) [:username])
We recommend adding a test for each model that you anonymise (see Testing below).
Anony will fail if you try to anonymise a model without specifying a strategy for all of the columns (to ensure that anonymisation rules aren't missed over time). However, it's fine to define a strategy for a column that hasn't yet been added.
This means that, in order to add a new column, you should:
- Define a strategy for the new column (e.g. nilable :new_column)
- Add the column in a database migration.
At GoCardless we do zero-downtime deploys so we would deploy the first change before then deploying the migration.
Rails applications typically have an id, created_at and updated_at column on all new tables by default. To avoid anonymising these fields (and thus prevent an Anony::FieldException), they can be globally ignored:
# config/initializers/anony.rb
Anony::Config.ignore_fields(:id, :created_at, :updated_at)
By default, Config.ignore_fields is an empty array and all fields are considered anonymisable.
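The completeness rule can be pictured as a set difference: every column that is neither ignored nor covered by a strategy is a problem, while a strategy for a column that does not exist yet is harmless. The sketch below is a simplified illustration with hypothetical names (unhandled_columns and its arguments), not Anony's implementation.

```ruby
# Simplified illustration of the completeness rule; all names are hypothetical.
# Returns the columns that have no strategy and are not globally ignored.
def unhandled_columns(columns:, configured:, ignored: [])
  columns.map(&:to_sym) - ignored - configured
end

columns    = %w[id created_at updated_at first_name username]
ignored    = %i[id created_at updated_at]
configured = %i[first_name new_column] # new_column has no database column yet

p unhandled_columns(columns: columns, configured: configured, ignored: ignored)
# prints [:username]
```

An empty result corresponds to a configuration that is complete; with nothing ignored, id, created_at and updated_at would also be reported here.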
This library ships with a set of useful RSpec examples for your specs. Just require them somewhere before running your spec:
require "anony/rspec_shared_examples"
# spec/models/employee_spec.rb
RSpec.describe Employee do
  # We use FactoryBot at GoCardless, but
  # however you set up a model instance is fine
  subject { FactoryBot.build(:employee) }
  # If you just anonymise fields normally
  it_behaves_like "overwritten anonymisable model"
  # Or, if your anonymised model should be skipped
  it_behaves_like "skipped anonymisable model"
  # Or, if you anonymise by destroying the record
  it_behaves_like "destroyed anonymisable model"
end
You can also override the subject inside the shared example if it helps (e.g. if you need to persist the record before anonymising it):
RSpec.describe Employee do
  it_behaves_like "destroyed anonymisable model" do
    subject { FactoryBot.create(:employee) }
  end
end
If you're not using RSpec, or want more control over the tests, Anony also exposes an instance method called #valid_anonymisation?. A simple spec would be:
RSpec.describe Employee do
  subject { described_class.new }
  it { is_expected.to be_valid_anonymisation }
end
At GoCardless, we use Rubocop heavily to ensure consistency in our applications. This library includes some Rubocop cops, which can be used by adding anony/cops to the require list in your .rubocop.yml:
require:
  - anony/cops
The Lint/DefineDeletionStrategy cop ensures that all models in your application have defined an anonymise block.
The output looks like this:
app/models/employee.rb:7:1: W: Lint/DefineDeletionStrategy:
Define .anonymise for Employee, see https://github.com/gocardless/anony/blob/master/README.md for details:
class Employee < ApplicationRecord ...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If your models do not inherit from ApplicationRecord, you can specify their superclass in your .rubocop.yml:
Lint/DefineDeletionStrategy:
  ModelSuperclass: Acme::Record
If your models use multiple superclasses, you can specify a list of superclasses in your .rubocop.yml. Note that you will have to specify ApplicationRecord explicitly in this list should you want to lint all models which inherit from ApplicationRecord.
Lint/DefineDeletionStrategy:
  ModelSuperclass:
    - Acme::Record
    - UmbrellaCorp::Record
- Anony is available as open source under the terms of the MIT License.
- Bug reports and pull requests are welcome on GitHub at https://github.com/gocardless/anony.
GoCardless ♥ open source. If you do too, come join us.