Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zephyr does not store a new IRK when another device re-bonds with a Zephyr device #46798

Closed
jilu-ot opened this issue Jun 22, 2022 · 27 comments
Closed
Assignees
Labels
area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@jilu-ot
Copy link
Contributor

jilu-ot commented Jun 22, 2022

Describe the bug

This issue has been observed when pairing an Android 13 device with a Zephyr 2.2.1 device using privacy and secure connections. The problem is probably also present in Android 12 devices with a security patch from June 2022.

The Zephyr function ull_filter_rl_find in ull_filter.c searches the controller's resolve list for an identity address and returns the resolve list index for it if it is found. It determines that an identity already exists if only bit 0 of the address type and the address itself is a match.

It is used by ll_rl_add to see if an entry for a specified identity address is already present in the controller's resolve list or to give the index for the next available index.

ll_rl_add returns BT_HCI_ERR_INVALID_PARAM if the identity address is already found in the controller's resolve list. This return value is passed up to hci_id_add which is called from bt_id_add.

bt_id_add does nothing if hci_id_add returns BT_HCI_ERR_INVALID_PARAM, probably because the key bt_id_add was called to store was determined by ull_filter_rl_find to already exist in the controller's resolve list.

This creates a problem if a device that re-bonds with a Zephyr device is using a new IRK.

This new IRK will not be stored in the controller's resolve list because ull_filter_rl_find says it is already there based only on a comparison of bit 0 of the address type and the address itself. The IRK is not compared.

Re-bonding with a Zephyr device using an IRK, that is different from the last bonding, will therefore result in the new IRK not being stored in the controller's resolve list.

The device bonding with the Zephyr device using a new IRK will therefore not be able to re-connect to the zephyr device after it lost the BLE connection to it, since the Zephyr device's controller will not be able to resolve the device's random address because the Zephyr device's controller's resolve list still contains the previous IRK.

A potential work around would be to reboot the Zephyr device in order to make it repopulate the Zephyr device's controller's resolve list with identities from the persisted bonding list obtained through Zephyr's settings system (setting load).

To Reproduce

You need the following two devices.

Device 1: A device that changes its IRK when its bonding list is empty with a Zephyr device. This could perhaps be an Android 13 based device with the right beta version or an Android 12 device with a sufficiently new security patch. See this issue's description.

Device 2: A Zephyr based device.

Steps to reproduce the behavior:

  1. Bond device 1 with device 2.
  2. Forget the bonding with device 2 in device 1 and make sure the bonding list in device 1 is now empty so it changes its IRK.
  3. Bond with device 1 with device 2 again.
  4. Disconnect device 2 from device 1.
  5. Try to reconnect to device 1 to device 2 and see that you cannot.

Expected behavior

Device 1 should be able to re-connect successfully in step 5. in the steps to reproduce.

Impact

This creates confusion for users with little or no technical skills, such as elderly hearing aid users, since they would likely not understand why their phone cannot re-connect to their hearing aids after, for instance, having brought their phone out of BLE range with their hearing aid.

Potentially even worse if rebooting the Zephyr device does not make reconnection possible again.

Logs and console output

None.

Environment (please complete the following information):

  • Zephyr 2.2.1.
  • Android 12 and 13.

Additional context

None.

@jilu-ot jilu-ot added the bug The issue is a bug, or the PR is fixing a bug label Jun 22, 2022
@kruithofa kruithofa added the area: Bluetooth Host Bluetooth Host (excluding BR/EDR) label Jun 23, 2022
@alwa-nordic
Copy link
Collaborator

The controller is behaving correctly according to Core 5.3 Vol 4 Part E 7.8.38. Quote follows.

If there is an existing entry in the resolving list with the same
Peer_Identity_Address and Peer_Identity_Address_Type, or with the same
Peer_IRK, the Controller should return the error code Invalid HCI Command
Parameters (0x12).

The fix should be done on the host side.

@alwa-nordic
Copy link
Collaborator

I suggest the fix to be performing an HCI_LE_Remove_Device_From_Resolving_List as part of bt_id_add before adding the device again.

@alwa-nordic alwa-nordic self-assigned this Jun 23, 2022
@Thalley
Copy link
Collaborator

Thalley commented Jun 23, 2022

If I understand this correctly, we have
Device 1: Which bonds with Device 2 (Zephyr), and then disconnects. After the disconnect, Device 1 then removes its local bonding information, and attempts to pair (and bond) with Device 2 again, but with same identity address and a new IRK. Is that correct?

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 23, 2022

If I understand this correctly, we have Device 1: Which bonds with Device 2 (Zephyr), and then disconnects. After the disconnect, Device 1 then removes its local bonding information, and attempts to pair (and bond) with Device 2 again, but with same identity address and a new IRK. Is that correct?

Yes, that is correct.

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 23, 2022

It is a new behavior in Android 12 and 13 phones, which is implemented to prevent tracking. A security issue.

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 23, 2022

I suggest the fix to be performing an HCI_LE_Remove_Device_From_Resolving_List as part of bt_id_add before adding the device again.

I tried that as a quick fix by deleting the old key and adding the new if hci_id_add returns an error. This seemed to work. Needs to be investigated further though. Something like this.

	err = hci_id_add(&keys->addr, keys->irk.val);
	if (err) {
		BT_ERR("Failed to add IRK to controller");
		/* Quick fix for overwriting the duplicate entry with the new since the IRK could have been changed. */
		bt_id_del(keys);
		err = hci_id_add(&keys->addr, keys->irk.val);
		/* What to do if it still returns an error? */
		goto done;
	}

@Thalley
Copy link
Collaborator

Thalley commented Jun 23, 2022

@jilu-ot You mentioned that this issue exist for Zephyr 2.2.1 (which is fairly old at this point).

I assume the issue also exists for Zephyr 3.1?

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 23, 2022

I heard that @ibaz-nordic has already worked on a an issue (issue 45827) that sounds like this one. Is this right?

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 23, 2022

@jilu-ot You mentioned that this issue exist for Zephyr 2.2.1 (which is fairly old at this point).

I assume the issue also exists for Zephyr 3.1?

Yes, it looks that way, since nothing seems to be done in bt_id_add if hci_id_add returns an error.

	err = hci_id_add(keys->id, &keys->addr, keys->irk.val);
	if (err) {
		BT_ERR("Failed to add IRK to controller");
		goto done;
	}

@ibaz-nordic
Copy link
Contributor

I heard that @ibaz-nordic has already worked on a an issue (issue 45827) that sounds like this one. Is this right?

The issues are quite similar. Mine has the same IRK. The fix was reverted because it breaks PTS. So it's still an open issue.

@Thalley
Copy link
Collaborator

Thalley commented Jun 24, 2022

@ibaz-nordic Any work going on right now to fix it (again)?

@ibaz-nordic
Copy link
Contributor

@ibaz-nordic Any work going on right now to fix it (again)?

I don't have time to look into it (for the next ~1 mo). Should I re-open the issue in case someone has the capacity to look into it sooner? What's the procedure?

@Thalley
Copy link
Collaborator

Thalley commented Jun 24, 2022

Reopened the issue #45827

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 24, 2022

Regarding issue #45827. This is not the same problem as I see it.

Issue #45827 is about preventing the same identity from being added to the controller's resolve list, and it seems like the IRK is not being considered in that issue.

This issue is about a new IRK from the same device (device 1 in the example) that is NOT stored in the controller's resolve list because the the controller is only looking at bit 0 of the address type and the address bytes.

This makes the controller unable to resolve random addresses from the device that changed its IRK.

@Thalley
Copy link
Collaborator

Thalley commented Jun 24, 2022

Agreed, it seems to be a separate issue (although somewhat related).

In any case, the other issue is still present after the revert of the PR :)

As @alwa-nordic also pointed out, it seems like this probably needs to be fixed in the host

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 24, 2022

Yes. This issue should also be fixed in the host as the controller is allowed by the spec. to return invalid parameter, even if the IRKs are not a match.

I have the opportunity to work on this, so I would like to make PR for this.

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 28, 2022

Could one of you assign me to this issue? I don't think I can assign myself. Maybe you can @alwa-nordic ?

@fabiobaltieri fabiobaltieri added the priority: medium Medium impact/importance bug label Jun 28, 2022
@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 29, 2022

I ran into some problems with MIC errors when deleting the key before adding it to make sure the new IRK is stored. I am currently working on finding out why that is happening. I have so far discovered that it (in my system) is because it ends up using the LTK from the previous bonding.

Do any of you have an idea as to why this is happening?

@Thalley
Copy link
Collaborator

Thalley commented Jun 29, 2022

I ran into some problems with MIC errors when deleting the key before adding it to make sure the new IRK is stored. I am currently working on finding out why that is happening. I have so far discovered that it (in my system) is because it ends up using the LTK from the previous bonding.

Do any of you have an idea as to why this is happening?

Without seeing the changes you've made, it'll be difficult to provide input :) I guess it's easier to get input if you create a draft PR

@carlescufi
Copy link
Member

@jilu-ot is main affected as well, or just v2.2.x?

@carlescufi
Copy link
Member

I ran into some problems with MIC errors when deleting the key before adding it to make sure the new IRK is stored. I am currently working on finding out why that is happening. I have so far discovered that it (in my system) is because it ends up using the LTK from the previous bonding.

Do any of you have an idea as to why this is happening?

So you are implementing the solution proposed by @alwa-nordic in here?

@Thalley
Copy link
Collaborator

Thalley commented Jun 29, 2022

@jilu-ot is main affected as well, or just v2.2.x?

#46798 (comment)

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jun 29, 2022

The IRK issue should be in main as well. I will make a draft PR, with the IRK fix.

@alwa-nordic
Copy link
Collaborator

alwa-nordic commented Jun 29, 2022

I'll copy my post on Discord here for reference:

Relevant issues:
#46798
#45827

Two bonds conflict if either (or both)

  • the LE-Addresses are equal
  • the IRKs are equal

HCI_LE_Add_Device_To_Resolving_List will fail with Invalid HCI Command Parameters (0x12) when there is a conflicting bond already on the list.

I have ideas for two solutions. We might want to implement both in separate PRs.

  • For correctness, the host should keep the Controller's Resolve List (RL) in sync with the Host's bond management. At logical level, as part of a bonding, if there is a conflicting bond, there should be an event for a bond removal of the old bond and an event for the addition of the new bond. Both the removal and the addition should be propagated to the Controllers RL at some point.
    • Note that the Controllers RL may be too small to fit all bonds, and this should also be somehow handeled.
  • bt_id_add can react to Invalid HCI Command Parameters (0x12) and either..
    • try to remove the entry with HCI_LE_Remove_Device_From_Resolving_List.
      • This will not work if the error was due to duplicate IRK. This can happen if a device changes address but not IRK. This will happen in Zephyr if the controller provides an IR but no address from HCI Read_Static_Addresses.
      • It will work for equal addresses, but it indicates a bug in the host if the host is supposed to keep the Controller's Resolve List (RL) in sync with the Host's bond management.
    • perform a HCI_LE_Clear_Resolving_List and re-add all bonds.
      • This is a good way to recover from a situation where the Controller's Resolve List (RL) and the Host's bond management somehow went out of sync.

jilu-ot added a commit to jilu-ot/zephyr that referenced this issue Jul 14, 2022
Fixes zephyrproject-rtos#46798

Delete the new key in the controller's resolve list,
if it exists, and then save the key. This will ensure
that a a potentially new IRK is save upon rebonding.

The IRK fix will cause a problem in smp_ident_addr_info
because it will copy the identity address into the key
pool entry with the re-bonding device's random address
and thereby creating a second entry with the same
identity address in the key pool. This will make
bt_smp_request_ltk use bt_keys_find to get the LTK for
the re-bonding device, but it will return the LTK for
the previous bonding which is still in the key pool and
is stored before the new entry. This is fixed by creating
a new function in keys.c that will clear all keys with
the re-bonding device's identity address, except for the
keys entry referred to in the parameter req in
smp_ident_addr_info.

Signed-off-by: Jim Benjamin Luther <[email protected]>
@alwa-nordic
Copy link
Collaborator

@jilu-ot You closed your PR with comment #47043 (comment), saying the problem is already solved by #33266. Can this issue be closed? Or are we missing a back port to a still-supported Zephyr version?

@jilu-ot
Copy link
Contributor Author

jilu-ot commented Jul 18, 2022

Oh, I am not sure if there are missing backports. I can see on PR #33266 's page that some older versions also have this fix. Zephyr 1.14, 2.4 and 2.5. But I don't know if there are other versions that should have this fix.

@alwa-nordic
Copy link
Collaborator

According to https://github.com/zephyrproject-rtos/zephyr/blob/ae0297e8de1f94b050de0afecc9e2150006e3ec6/.github/SECURITY.md, only Zephyr v2.7 and v3.0 are currently supported. Those tags contain the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants