DSGVO-compatible server configuration #118

Open
Mischback opened this issue Mar 16, 2024 · 1 comment
Labels
area/dependencies (Affects the dependencies), area/repository (Affects the repository structure), type/feature (New feature / feature request)
Milestone
Crawl

Comments

@Mischback
Owner

Mischback commented Mar 16, 2024

This will still need more research!

OK, so the DSGVO / GDPR considers even a user's IP address to be personal data, which means that storing and processing it is strictly regulated.

On the other hand, the tools that protect my own server (e.g. fail2ban) work by analyzing the log files and need the real IP addresses to work properly.

Resources

Related

Mischback added the labels area/dependencies (Affects the dependencies), area/repository (Affects the repository structure), type/feature (New feature / feature request) Mar 16, 2024
Mischback added this to the Crawl milestone Mar 16, 2024
@Mischback
Owner Author

Mischback commented Mar 17, 2024

Idea

nginx logs the actual IP address into a dedicated, short-term log file (rotated after 24h at most? Might even be shorter, like 2h or 6h?).

fail2ban can work on this log file and do its magic, banning the IP addresses behind suspicious activity.
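To illustrate why fail2ban needs the real address, here is a rough sketch (not fail2ban's actual code) of what a jail does: count matching log lines per IP inside a sliding time window. `max_retry` and `find_time` mirror fail2ban's `maxretry` and `findtime` options; counting 404 responses in nginx's default `combined` log format and the function name `find_offenders` are assumptions for this example.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Start of a line in nginx's default "combined" log format:
# 203.0.113.7 - - [17/Mar/2024:10:00:00 +0100] "GET /wp-login.php HTTP/1.1" 404 ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" (?P<status>\d{3}) '
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_offenders(lines, max_retry=5, find_time=timedelta(minutes=10)):
    """Return the IPs with at least max_retry 404s inside a find_time window."""
    hits = defaultdict(deque)  # IP -> timestamps of its recent 404s
    offenders = set()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m.group("status") != "404":
            continue
        ts = datetime.strptime(m.group("time"), TIME_FORMAT)
        window = hits[m.group("ip")]
        window.append(ts)
        # Forget hits that have slid out of the time window.
        while ts - window[0] > find_time:
            window.popleft()
        if len(window) >= max_retry:
            offenders.add(m.group("ip"))  # fail2ban would ban this IP now
    return offenders


sample = [
    f'203.0.113.7 - - [17/Mar/2024:10:00:0{i} +0100] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"'
    for i in range(6)
] + ['198.51.100.2 - - [17/Mar/2024:10:00:00 +0100] "GET / HTTP/1.1" 200 512 "-" "Firefox"']
print(find_offenders(sample))  # {'203.0.113.7'}
```

The counter is keyed on the address itself; without the real IP in the log there is nothing to count or ban, which is exactly the conflict with the DSGVO described above.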

The short-term access log is then processed to strip the actual IP addresses, and the result can be transferred to the regular server logs.
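A minimal sketch of that post-processing step, assuming nginx's default `combined` log format (client address as the first field) and that the short-term log has already been rotated away from nginx (e.g. by logrotate) before it is processed. `strip_ips`, `move_to_server_log` and `drop_ip` are hypothetical names. The default simply drops the address; a smarter function (such as the truncate, salt and hash combination discussed under "Getting rid of actual IP addresses") could be passed as `anonymize`.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator


def drop_ip(ip: str) -> str:
    """Simplest possible anonymizer: discard the address completely."""
    return "-"


def strip_ips(lines: Iterable[str],
              anonymize: Callable[[str], str] = drop_ip) -> Iterator[str]:
    """Replace the first field (the client address in the combined format)."""
    for line in lines:
        if not line.strip():
            yield line  # keep empty lines untouched
            continue
        ip, sep, rest = line.partition(" ")
        yield anonymize(ip) + sep + rest


def move_to_server_log(rotated: Path, server_log: Path,
                       anonymize: Callable[[str], str] = drop_ip) -> None:
    """Append the anonymized lines to the long-term log, then delete the source."""
    with rotated.open(encoding="utf-8") as src, \
            server_log.open("a", encoding="utf-8") as dst:
        dst.writelines(strip_ips(src, anonymize))
    rotated.unlink()  # the file holding the real IP addresses is removed


sample = '203.0.113.7 - - [17/Mar/2024:10:00:00 +0100] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"\n'
print(next(strip_ips([sample])), end="")
# - - - [17/Mar/2024:10:00:00 +0100] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
```

Processing only the already rotated file avoids racing with nginx while it is still writing to the live short-term log.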

Other Idea

The idea above assumes that there is an actual requirement to let fail2ban (or similar tools) work for the website. Is this really the case?

The website mischback.de is statically generated, pure HTML. There is no dynamic web technology attached. Hopefully this means there is no attack vector against the website itself. The only (assumed) attack surface is the webserver.

Possible Attack Vectors:

  • Enumeration Attack to find the CMS in use
    • There is no CMS
    • There is no dynamic admin interface or stuff like that
    • There is a brief description of the setup; the whole project is Open Source anyway
    • Does not apply, but will most likely result in a high number of 404 responses. nginx will probably handle the expected load.
  • (D)DoS Attack
    • Yeah, if nginx or the actual server can't handle the load, we're f*****. Nothing to do about this. fail2ban might help mitigate this, but as there is no business interest in the website, nobody will care anyway.
  • Highly sophisticated Attack to gain root access to the actual server
    • has to be an attack against nginx while serving static content (which seems unlikely)
    • I doubt that my website/webserver would be a primary target for something like this
    • nginx is open source software, so a fix would probably be released quickly

Summary: Storing the actual IP address is not required. A generalized and anonymized identifier is sufficient for on-page optimization (which has to be implemented during further development). To comply with the DSGVO, not storing the IP address at all seems like the easiest way (see the privacy statement of C4).

Getting rid of actual IP addresses

  • hashing the IP addresses is not enough! While a hash is not reversible, the actual IP address can still be determined using a brute-force attack (or a rainbow table attack), because the set of possible inputs is small: all 2^32 IPv4 addresses can simply be hashed and compared.
  • reducing the resolution of the IP address is tricky: dropping too many octets makes further analysis difficult or impossible, while dropping too few still allows identification of the user (especially for IPv6, which is highly relevant for the assumed IPv6-based host).
  • combination of both: drop some resolution of the IP address (e.g. 40 bits), add a salt (generated randomly when the short-term log is processed and never stored anywhere), and then apply a hash function. This should allow tracking a user's movement on the page without allowing identification of that user.
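A sketch of the three points above, under stated assumptions: SHA-256 as the hash function, 40 dropped bits for IPv6 (taken from the example above), 8 dropped bits for IPv4 (an assumption, since IPv4 only has 32 bits in total), and a 16-byte in-memory salt. `truncate`, `make_anonymizer` and `brute_force` are hypothetical helper names, not part of any existing tool.

```python
import hashlib
import ipaddress
import secrets
from typing import Callable, Optional

# Hypothetical parameters: 40 bits for IPv6 follows the example above,
# 8 bits for IPv4 is an assumption.
DROP_BITS_V4 = 8
DROP_BITS_V6 = 40


def truncate(ip: str) -> str:
    """Reduce the resolution of an address by zeroing its lowest bits."""
    addr = ipaddress.ip_address(ip)
    drop = DROP_BITS_V4 if addr.version == 4 else DROP_BITS_V6
    prefix = addr.max_prefixlen - drop
    net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(net.network_address)


def make_anonymizer(salt: Optional[bytes] = None) -> Callable[[str], str]:
    """Build a truncate + salt + hash function for one processing run."""
    # The salt only lives in memory while the short-term log is processed.
    if salt is None:
        salt = secrets.token_bytes(16)

    def anonymize(ip: str) -> str:
        digest = hashlib.sha256(salt + truncate(ip).encode("ascii"))
        return digest.hexdigest()[:16]  # shortened, still opaque identifier

    return anonymize


def brute_force(target: str, candidates) -> Optional[str]:
    """Why plain hashing fails: hash every candidate address and compare."""
    for ip in candidates:
        if hashlib.sha256(str(ip).encode("ascii")).hexdigest() == target:
            return str(ip)
    return None


# An unsalted hash of an IPv4 address is recovered by enumeration
# (only a /24 here for speed; the whole IPv4 space is still within reach).
leaked = hashlib.sha256(b"192.0.2.42").hexdigest()
print(brute_force(leaked, ipaddress.ip_network("192.0.2.0/24")))  # 192.0.2.42

anonymize = make_anonymizer()
print(anonymize("192.0.2.42") == anonymize("192.0.2.7"))  # True: same /24
```

Because the salt is thrown away after each run, identifiers are only stable within one processing run: movement on the page can be followed inside that window, but the same visitor gets a new identifier next time, and the hash cannot be brute-forced without the salt.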
