Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning logic for discovery_nodes table - using database timestamp #4049

Merged
merged 5 commits into from
Sep 4, 2023

Conversation

arcusfelis
Copy link
Contributor

@arcusfelis arcusfelis commented Jun 30, 2023

Adds logic to remove nodes from the disco table, so we don't try to contact them if they are not available.
This also requires a way to signal that the node is still alive (we update timestamp column for that).

Proposed changes include:

  • Add mongoose_rdbms_timestamp.erl to get timestamp from the RDBMS server.
  • We don't use datetime() datatype to store updated_timestamp, because each database has its own functions to work with time. Even just getting the current timestamp is different in different DBs.
  • We remove all records for the cluster which are not updated the last 24 hours.
  • We allow to override timestamp for tests (avoids adding mocks).
  • We store timestamp in seconds - microseconds are overkill for what we are using it.

This PR addresses MIM-1942:

We put new node names into that table, so they are available for discovery.
Which could contain a lot of them if we use dynamic node names (k8s non-stateful set or autoscaling groups tend to produce these names). But we never remove these names automatically.

Issues of not removing:

  • This makes reviewing that table content hard.
  • It returns dead nodes to CETS clustering logic and net_kernel tries to contact them.
  • It uses node_nums but never releases them, so we could overflow the 255 maximum value (nothing critical will happen, but still a bit dirty).

We need some mechanism to clean these nodes.
We can use DB time to calculate TTL and remove (but each DB uses their own time functions for that, especially if you wanna get an integer from the DB).

Alternative PR #4045 that uses just timestamp from Erlang node (but we have to ensure that all nodes have the correct timestamp. Using DB in this PR is better, because DB is already available, if we want to remove something from DB).

Test with Helm:

# Setup DB using MIM test scripts:
DB="pgsql" ./tools/setup-db.sh
# Ensure MIM stopped...
killall beam.smp
# Get helm scripts
git clone https://github.com/esl/MongooseHelm.git
cd MongooseHelm
# Ensure old hem cluster is dead
helm uninstall mim-test
# Test with these changes
helm install mim-test MongooseIM --set replicaCount=3 --set image.tag=PR-4049 --set persistentDatabase=rdbms --set rdbms.username=ejabberd --set rdbms.password=mongooseim_secret --set rdbms.database=ejabberd --set volatileDatabase=cets
# Check that all is fine in logs:
kubectl get pod
kubectl logs mongooseim-0
kubectl exec -it mongooseim-0 -- mongooseimctl cets systemInfo

@mongoose-im
Copy link
Collaborator

mongoose-im commented Jun 30, 2023

small_tests_24 / small_tests / 4c7ee8e
Reports root / small


small_tests_25_arm64 / small_tests / 4c7ee8e
Reports root / small


small_tests_25 / small_tests / 4c7ee8e
Reports root / small


ldap_mnesia_24 / ldap_mnesia / 4c7ee8e
Reports root/ big
OK: 2224 / Failed: 0 / User-skipped: 840 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 4c7ee8e
Reports root/ big
OK: 4200 / Failed: 0 / User-skipped: 90 / Auto-skipped: 0


ldap_mnesia_25 / ldap_mnesia / 4c7ee8e
Reports root/ big
OK: 2224 / Failed: 0 / User-skipped: 840 / Auto-skipped: 0


dynamic_domains_mysql_redis_25 / mysql_redis / 4c7ee8e
Reports root/ big
OK: 4174 / Failed: 0 / User-skipped: 116 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / 4c7ee8e
Reports root/ big
OK: 4200 / Failed: 0 / User-skipped: 90 / Auto-skipped: 0


internal_mnesia_25 / internal_mnesia / 4c7ee8e
Reports root/ big
OK: 2370 / Failed: 0 / User-skipped: 694 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / 4c7ee8e
Reports root/ big
OK: 4573 / Failed: 0 / User-skipped: 99 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / 4c7ee8e
Reports root/ big
OK: 4573 / Failed: 0 / User-skipped: 99 / Auto-skipped: 0


elasticsearch_and_cassandra_25 / elasticsearch_and_cassandra_mnesia / 4c7ee8e
Reports root/ big
OK: 2730 / Failed: 0 / User-skipped: 673 / Auto-skipped: 0


pgsql_cets_25 / pgsql_cets / 4c7ee8e
Reports root/ big
OK: 4549 / Failed: 0 / User-skipped: 123 / Auto-skipped: 0


mysql_redis_25 / mysql_redis / 4c7ee8e
Reports root/ big
OK: 4559 / Failed: 0 / User-skipped: 113 / Auto-skipped: 0


mssql_mnesia_25 / odbc_mssql_mnesia / 4c7ee8e
Reports root/ big
OK: 4570 / Failed: 0 / User-skipped: 102 / Auto-skipped: 0

@codecov
Copy link

codecov bot commented Jun 30, 2023

Codecov Report

Patch coverage: 94.73% and project coverage change: +0.01% 🎉

Comparison is base (e6cbd30) 83.89% compared to head (d1dade1) 83.91%.
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4049      +/-   ##
==========================================
+ Coverage   83.89%   83.91%   +0.01%     
==========================================
  Files         551      552       +1     
  Lines       33596    33624      +28     
==========================================
+ Hits        28187    28215      +28     
  Misses       5409     5409              
Files Changed Coverage Δ
src/rdbms/mongoose_rdbms_timestamp.erl 88.88% <88.88%> (ø)
src/mongoose_cets_discovery_rdbms.erl 88.05% <96.55%> (+2.64%) ⬆️

... and 11 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@arcusfelis arcusfelis marked this pull request as draft July 6, 2023 08:43
Base automatically changed from feature/cets to master August 8, 2023 14:57
@arcusfelis arcusfelis force-pushed the feature-cets-rdbms-disco-cleaning-use-db-time branch from 4c7ee8e to d6223aa Compare August 30, 2023 08:11
@mongoose-im
Copy link
Collaborator

mongoose-im commented Aug 30, 2023

elasticsearch_and_cassandra_25 / elasticsearch_and_cassandra_mnesia / d6223aa
Reports root/ big
OK: 369 / Failed: 0 / User-skipped: 38 / Auto-skipped: 0


small_tests_24 / small_tests / d6223aa
Reports root / small


small_tests_25 / small_tests / d6223aa
Reports root / small


small_tests_25_arm64 / small_tests / d6223aa
Reports root / small


dynamic_domains_mysql_redis_25 / mysql_redis / d6223aa
Reports root/ big
OK: 4204 / Failed: 0 / User-skipped: 116 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / d6223aa
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 833 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / d6223aa
Reports root/ big
OK: 4236 / Failed: 0 / User-skipped: 84 / Auto-skipped: 0


ldap_mnesia_25 / ldap_mnesia / d6223aa
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 833 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / d6223aa
Reports root/ big
OK: 4236 / Failed: 0 / User-skipped: 84 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_25 / odbc_mssql_mnesia / d6223aa
Reports root/ big
OK: 4233 / Failed: 0 / User-skipped: 87 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / d6223aa
Reports root/ big
OK: 4625 / Failed: 0 / User-skipped: 91 / Auto-skipped: 0


internal_mnesia_25 / internal_mnesia / d6223aa
Reports root/ big
OK: 2421 / Failed: 0 / User-skipped: 687 / Auto-skipped: 0


pgsql_cets_25 / pgsql_cets / d6223aa
Reports root/ big
OK: 4595 / Failed: 0 / User-skipped: 121 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / d6223aa
Reports root/ big
OK: 4625 / Failed: 0 / User-skipped: 91 / Auto-skipped: 0


mssql_mnesia_25 / odbc_mssql_mnesia / d6223aa
Reports root/ big
OK: 4622 / Failed: 0 / User-skipped: 94 / Auto-skipped: 0

@arcusfelis arcusfelis marked this pull request as ready for review August 30, 2023 08:44
Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good in general. I added some comments.

src/mongoose_cets_discovery_rdbms.erl Outdated Show resolved Hide resolved
big_tests/tests/cets_disco_SUITE.erl Show resolved Hide resolved
big_tests/tests/cets_disco_SUITE.erl Outdated Show resolved Hide resolved
big_tests/tests/cets_disco_SUITE.erl Outdated Show resolved Hide resolved
@mongoose-im
Copy link
Collaborator

mongoose-im commented Sep 1, 2023

elasticsearch_and_cassandra_25 / elasticsearch_and_cassandra_mnesia / d1dade1
Reports root/ big
OK: 369 / Failed: 0 / User-skipped: 38 / Auto-skipped: 0


small_tests_24 / small_tests / d1dade1
Reports root / small


small_tests_25 / small_tests / d1dade1
Reports root / small


small_tests_25_arm64 / small_tests / d1dade1
Reports root / small


ldap_mnesia_24 / ldap_mnesia / d1dade1
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 833 / Auto-skipped: 0


dynamic_domains_mysql_redis_25 / mysql_redis / d1dade1
Reports root/ big
OK: 4204 / Failed: 0 / User-skipped: 116 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / d1dade1
Reports root/ big
OK: 4236 / Failed: 0 / User-skipped: 84 / Auto-skipped: 0


ldap_mnesia_25 / ldap_mnesia / d1dade1
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 833 / Auto-skipped: 0


internal_mnesia_25 / internal_mnesia / d1dade1
Reports root/ big
OK: 2421 / Failed: 0 / User-skipped: 687 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / d1dade1
Reports root/ big
OK: 4236 / Failed: 0 / User-skipped: 84 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_25 / odbc_mssql_mnesia / d1dade1
Reports root/ big
OK: 4233 / Failed: 0 / User-skipped: 87 / Auto-skipped: 0


mysql_redis_25 / mysql_redis / d1dade1
Reports root/ big
OK: 4605 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / d1dade1
Reports root/ big
OK: 4625 / Failed: 0 / User-skipped: 91 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / d1dade1
Reports root/ big
OK: 4625 / Failed: 0 / User-skipped: 91 / Auto-skipped: 0


pgsql_cets_25 / pgsql_cets / d1dade1
Reports root/ big
OK: 4595 / Failed: 0 / User-skipped: 121 / Auto-skipped: 0


mssql_mnesia_25 / odbc_mssql_mnesia / d1dade1
Reports root/ big
OK: 4622 / Failed: 0 / User-skipped: 94 / Auto-skipped: 0

@arcusfelis arcusfelis requested a review from chrzaszcz September 1, 2023 13:21
Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment from me.

EnsureMocked = fun() ->
Timestamp = rpc(Node, mongoose_rdbms_timestamp, select, [])
end,
EnsureMocked(),
Copy link
Member

@chrzaszcz chrzaszcz Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why execute and ignore result? Seems to do nothing.

Copy link
Contributor Author

@arcusfelis arcusfelis Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am matching Timestamp variable here.
So, EnsureMocked will crash if select() returns something different.

Copy link
Contributor Author

@arcusfelis arcusfelis Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. if meck fails to mock for some reason, we would get:

=== ERROR! init_per_testcase crashed!
	Location: [{cets_disco_SUITE,'-mock_timestamp/2-fun-0-',[127](file:///Users/mikhailuvarov/erlang/esl/MongooseIM/big_tests/ct_report/[email protected]_17.25.22/big_tests.tests.cets_disco_SUITE.logs/run.2023-09-01_17.25.23/cets_disco_suite.src.html#127)},
              {cets_disco_SUITE,mock_timestamp,[129](file:///Users/mikhailuvarov/erlang/esl/MongooseIM/big_tests/ct_report/[email protected]_17.25.22/big_tests.tests.cets_disco_SUITE.logs/run.2023-09-01_17.25.23/cets_disco_suite.src.html#129)},
              {cets_disco_SUITE,init_per_testcase,[50](file:///Users/mikhailuvarov/erlang/esl/MongooseIM/big_tests/ct_report/[email protected]_17.25.22/big_tests.tests.cets_disco_SUITE.logs/run.2023-09-01_17.25.23/cets_disco_suite.src.html#50)},
              {test_server,do_init_per_testcase,1552},
              {test_server,run_test_case_eval1,1253},
              {test_server,run_test_case_eval,1223}]
	Reason: {{badmatch,1693581924},
 [{cets_disco_SUITE,'-mock_timestamp/2-fun-0-',2,
                    [{file,"cets_disco_SUITE.erl"},{line,127}]},
  {cets_disco_SUITE,mock_timestamp,2,
                    [{file,"cets_disco_SUITE.erl"},{line,129}]},
  {cets_disco_SUITE,init_per_testcase,2,
                    [{file,"cets_disco_SUITE.erl"},{line,50}]},
  {test_server,do_init_per_testcase,2,[{file,"test_server.erl"},{line,1552}]},
  {test_server,run_test_case_eval1,6,[{file,"test_server.erl"},{line,1253}]},
  {test_server,run_test_case_eval,9,[{file,"test_server.erl"},{line,1223}]}]}

I could remove EnsureMocked though, it is just to verify and mocking is working.

@chrzaszcz chrzaszcz merged commit c140a5b into master Sep 4, 2023
@chrzaszcz chrzaszcz deleted the feature-cets-rdbms-disco-cleaning-use-db-time branch September 4, 2023 08:42
@chrzaszcz chrzaszcz added this to the 6.2.0 milestone Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants