Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-11675 fixup: MDEV-35474 Start Alter GTID Error Message Can Use Wrong Server_Id #3662

Open
wants to merge 1 commit into
base: 10.11
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions mysql-test/suite/rpl/r/rpl_start_alter_restart_master.result
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,8 @@ set global slave_parallel_mode=optimistic;
set global gtid_strict_mode=1;
start slave;
connection master;
call mtr.add_suppression("ALTER query started at .+ could not be completed");
SET @@session.server_id= 11;
call mtr.add_suppression("ALTER query started at 0-11-\\d+ could not be completed");
SET @old_debug_master= @@global.debug;
set binlog_alter_two_phase=true;
create table t3( a int primary key, b int) engine=innodb;
Expand All @@ -37,7 +38,7 @@ t3 CREATE TABLE `t3` (
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000001 # Gtid # # BEGIN GTID #-#-#
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test obfuscates GTIDs. Checking GTIDs would make the test more assertive.

I checked locally and confirm that entries are 0-11-x before the primary crashes and 0-1-x after it restarts.

master-bin.000001 # Query # # use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'ALTER query started at .+ could not be completed' COLLATE 'latin1_swedish_ci'))
master-bin.000001 # Query # # use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'ALTER query started at 0-11-\\d+ could not be completed' COLLATE 'latin1_swedish_ci'))
master-bin.000001 # Query # # COMMIT
master-bin.000001 # Gtid # # GTID #-#-#
master-bin.000001 # Query # # use `test`; create table t3( a int primary key, b int) engine=innodb
Expand All @@ -56,7 +57,7 @@ connection slave;
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
slave-bin.000001 # Gtid # # BEGIN GTID #-#-#
slave-bin.000001 # Query # # use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'ALTER query started at .+ could not be completed' COLLATE 'latin1_swedish_ci'))
slave-bin.000001 # Query # # use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'ALTER query started at 0-11-\\d+ could not be completed' COLLATE 'latin1_swedish_ci'))
slave-bin.000001 # Query # # COMMIT
slave-bin.000001 # Gtid # # GTID #-#-#
slave-bin.000001 # Query # # use `test`; create table t3( a int primary key, b int) engine=innodb
Expand Down
7 changes: 6 additions & 1 deletion mysql-test/suite/rpl/t/rpl_start_alter_restart_master.test
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,12 @@ set global gtid_strict_mode=1;
start slave;

--connection master
call mtr.add_suppression("ALTER query started at .+ could not be completed");
# MDEV-35474 Start Alter GTID Error Message Can Use Wrong Server_Id
#
# When replicating, the GTID the slave warns should be of this sessional
# server_id rather than 1 from reconnecting to the restarted master.
SET @@session.server_id= 11;
call mtr.add_suppression("ALTER query started at 0-11-\\d+ could not be completed");

SET @old_debug_master= @@global.debug;
--let $binlog_alter_two_phase= `select @@binlog_alter_two_phase`
Expand Down
5 changes: 3 additions & 2 deletions sql/log.cc
Original file line number Diff line number Diff line change
Expand Up @@ -626,8 +626,9 @@ bool write_bin_log_start_alter(THD *thd, bool& partial_alter,
start_alter_info *info= thd->rgi_slave->sa_info;
bool is_shutdown= false;

info->sa_seq_no= start_alter_id;
info->domain_id= thd->variables.gtid_domain_id;
info->gtid.domain_id= thd->variables.gtid_domain_id;
info->gtid.server_id= thd->variables.server_id;
info->gtid.seq_no= start_alter_id;
Comment on lines +629 to +631
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start_alter_id originates from get_start_alter_id(), which gives thd->variables.gtid_seq_no in a (configured) replica and 0 otherwise.
And there’re code that branches on this 0 or proceeds otherwise.

Perhaps the replica-checking logic should be split away from start_alter_id, so THD::variables can hold a GTID multi-member like start_alter_info in this PR?
Implementation’s out of this ticket’s scope, of course.

mysql_mutex_lock(&mi->start_alter_list_lock);
// possible stop-slave's marking of the whole alter state list is checked
is_shutdown= mi->is_shutdown;
Expand Down
14 changes: 7 additions & 7 deletions sql/log_event_server.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1682,8 +1682,9 @@ static start_alter_info *get_new_start_alter_info(THD *thd)
sql_print_error("Failed to allocate memory for ddl log free list");
return 0;
}
info->sa_seq_no= 0;
info->domain_id= 0;
info->gtid.domain_id= 0;
info->gtid.server_id= 0;
info->gtid.seq_no= 0;
info->direct_commit_alter= false;
info->state= start_alter_state::INVALID;
mysql_cond_init(0, &info->start_alter_cond, NULL);
Expand Down Expand Up @@ -1769,8 +1770,8 @@ int Query_log_event::handle_split_alter_query_log_event(rpl_group_info *rgi,
List_iterator<start_alter_info> info_iterator(mi->start_alter_list);
while ((info= info_iterator++))
{
if(info->sa_seq_no == rgi->gtid_ev_sa_seq_no &&
info->domain_id == rgi->current_gtid.domain_id)
if(info->gtid.seq_no == rgi->gtid_ev_sa_seq_no &&
info->gtid.domain_id == rgi->current_gtid.domain_id)
{
info_iterator.remove();
break;
Expand Down Expand Up @@ -2705,10 +2706,9 @@ static void check_and_remove_stale_alter(Relay_log_info *rli)
{
DBUG_ASSERT(info->state == start_alter_state::REGISTERED);

sql_print_warning("ALTER query started at %u-%lu-%llu could not "
sql_print_warning("ALTER query started at %u-%u-%llu could not "
"be completed because of unexpected master server "
"or its binlog change", info->domain_id,
mi->master_id, info->sa_seq_no);
"or its binlog change", PARAM_GTID(info->gtid));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe real reason I swapped to a nested struct is so I can write this PARAM_GTID? 🙃

info_iterator.remove();
mysql_mutex_lock(&mi->start_alter_lock);
info->state= start_alter_state::ROLLBACK_ALTER;
Expand Down
3 changes: 1 addition & 2 deletions sql/rpl_rli.h
Original file line number Diff line number Diff line change
Expand Up @@ -654,8 +654,7 @@ struct start_alter_info
/*
ALTER id is defined as a pair of GTID's seq_no and domain_id.
*/
decltype(rpl_gtid::seq_no) sa_seq_no; // key for searching (SA's id)
uint32 domain_id;
rpl_gtid gtid; // rpl_gtid::seq_no is the key for searching (SA's id)
bool direct_commit_alter; // when true CA thread executes the whole query
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (as opposed to rpl_gtid *gtid) will put the GTID in the memory next to sibling members like direct_commit_alter (at least in C). This also matches sa_seq_no and domain_id.

I’m not sure about this “key” designation in the comments, though – I didn’t encounter any “keying” while migrating affected code.

/*
0 prepared and not error from commit and rollback
Expand Down
2 changes: 1 addition & 1 deletion sql/sql_table.cc
Original file line number Diff line number Diff line change
Expand Up @@ -7628,7 +7628,7 @@ write_bin_log_start_alter_rollback(THD *thd, uint64 &start_alter_id,
start_alter_info *info= thd->rgi_slave->sa_info;
Master_info *mi= thd->rgi_slave->rli->mi;

if (info->sa_seq_no == 0)
if (info->gtid.seq_no == 0)
{
/*
Error occurred before SA got to processing incl its binlogging.
Expand Down