Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add connection status to the connection info #28009

Closed
hanselfberg opened this issue Sep 3, 2020 · 41 comments · Fixed by #42540
Closed

Add connection status to the connection info #28009

hanselfberg opened this issue Sep 3, 2020 · 41 comments · Fixed by #42540
Assignees
Labels
area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth area: Documentation Enhancement Changes/Updates/Additions to existing features

Comments

@hanselfberg
Copy link

Since the bt_conn_get_info() function can be called successfully even when not connected, the returned info can be confusing.

Adding connection status (i.e. the connection status at the time bt_conn_get_info() was called) would greatly improve understanding the connection info. Also some clarification in documentation particularly for the case when connection status is "not connected" could be useful.

@hanselfberg hanselfberg added the Enhancement Changes/Updates/Additions to existing features label Sep 3, 2020
@carlescufi
Copy link
Member

carlescufi commented Oct 1, 2020

Let's document the fact that connection objects are usable before and after the connection is alive. And that application callbacks should be used to track connection state.

@hanselfberg
Copy link
Author

Let's document the fact that connection objects are usable before and after the connection is alive. And that application callbacks should be used to track connection state.

I think it will be impossible/difficult for the users of the connection info to take on that responsibility due to the "asynchronous-ness", won't it? Because you can have got connected/disconnected without knowing it at the time when you get the info, i think.

@Thalley
Copy link
Collaborator

Thalley commented Jan 24, 2022

@joerchan You closed the PR that supposedly fixed this enhancement. Why was that closed, and should we fix or close this enhancement?

@joerchan
Copy link
Contributor

@Thalley I only wanted to document the behavior, as we agreed above. I closed the PR because I wasn't going to address the review comments. I'll leave it up to you what you think is the best way forwards with this issue.

@Thalley
Copy link
Collaborator

Thalley commented Jan 25, 2022

Been thinking about this.

Documenting which values are valid in which states doesn't really help much, since we don't expose the state. The BT_CONN_CONNECTED and BT_CONN_DISCONNECTED states are exposed via callbacks, but that's about it, and the application may want to request this information at any state.

What I suggest is that we expose bt_conn_state_t and then thoroughly document which values are valid in which states, but I am not a fan of that as that is error prone and hard to keep up to date.

The alternative is that we simply close this request, as we haven't seen or heard any more about it besides this one, unless @hanselfberg still wants this to go in.

@hanselfberg
Copy link
Author

@Thalley Your suggestion is in line with my initial suggestion and i still think that would be a good improvement to include the status. Of course the status can at any time be outdated, but at least it will be in sync with the received info and should be better in sync than having a local connection status variable updated by the application callbacks. So if the reason to not include the connection status is that the status is available by the application callback, I still disagree with that reason. If there are other reason, it would be good to know about them i think.

@Thalley
Copy link
Collaborator

Thalley commented Jan 25, 2022

@hanselfberg One issue with exposing the state, is that the state isn't as simple as the application may think. It isn't just disconnected -> Connecting -> Connected, but rather a more complex enum:

typedef enum __packed {
	BT_CONN_DISCONNECTED,
	BT_CONN_DISCONNECT_COMPLETE,
	BT_CONN_CONNECT_SCAN,
	BT_CONN_CONNECT_AUTO,
	BT_CONN_CONNECT_ADV,
	BT_CONN_CONNECT_DIR_ADV,
	BT_CONN_CONNECT,
	BT_CONN_CONNECTED,
	BT_CONN_DISCONNECT,
} bt_conn_state_t;

and I don't feel like exposing these values will be that helpful to the application.
We could cut it down to e.g. a boolean connected value, but there are values that are not valid in the connecting state, so even what won't provide what you are really looking for, unless we introduce yet another enum with e.g. disconnected, connecting and connected states. Even after being connected, some of the value may change during the phy update or data length update.

Can you give me a use case or a scenario where the current implementation doesn't provide what you want/need, and why the connection state will be helpful?

@joerchan
Copy link
Contributor

From what I recall I discussed exposing the state with @jhedberg who thought this was a bad idea, which is why we instead wanted to document it.

@Thalley
Copy link
Collaborator

Thalley commented Jan 25, 2022

From what I recall I discussed exposing the state with @jhedberg who thought this was a bad idea, which is why we instead wanted to document it.

I would also like to avoid that. We could however add a simplified public state enum, maybe similar to the ISO state enum:

enum {
	/** Channel disconnected */
	BT_ISO_DISCONNECTED,
	/** Channel in connecting state */
	BT_ISO_CONNECT,
	/** Channel ready for upper layer traffic on it */
	BT_ISO_CONNECTED,
	/** Channel in disconnecting state */
	BT_ISO_DISCONNECT,
};

But let's see what we really need, and if we can avoid exposing the state altogether.

@jhedberg
Copy link
Member

So, general comment is that we should only expose stuff for which we have a well justified use case.

As for what @joerchan is referring to wrt something being a "bad idea", I believe that would be a public get_state() function, since it's inherently racy (the state may change between getting it and doing something with it).

@hanselfberg
Copy link
Author

hanselfberg commented Jan 26, 2022

@Thalley I found that this status would have been useful while I was working on a test stressing the system and got values that I didn't expect, and I think that would not have been an issue for me if I knew the status at the time the parameters were sent by the function. I think there should not be any race conditions (at least no complicated ones) since the purpose of including the state is just to know what the state was at the time the function was called (i.e. I'd think the worst case would be that the return parameters would have to be prepared again in case the status changed while preparing them). I don't think the simplified enum is necessary, but I think it should be enough to solve the issue and probably easy to make as well (i.e. either of the existing status and a simplified status should be ok imho). Just possibly there should be an unknown/unspecified status in addition for the simplified status?

@jhedberg I didn't intend this as get_status() functionality, so probably it should be mentioned that it is the status at the time when the parameters were provided by the function and may shortly after be different.

I think the best would be if the status could be added to the info, and second best would be simplified status added. An additional possibility would be to only provide info if in the connected state. I think that would make sense from the function name point of view ( bt_conn_get_info() ) but from the suggestion @Thalley had about the simplified state it looks like the function is meant for use when not in a connection as well?

@Thalley
Copy link
Collaborator

Thalley commented Jan 26, 2022

the function is meant for use when not in a connection as well?

It certainly can be. There's is currently one use of it in a disconencted callback. The bt_conn's are statically declared, so the pointers are always usable.

@hanselfberg I'm sorry, but I fail to see how a connection state would help the case you mention. Can you elaborate on

got values that I didn't expect

@hanselfberg
Copy link
Author

@Thalley It's almost one and a half years back now, but if i recall correctly it was the phy info that was unexpected and I concluded that I'd probably need the connection state to be able to know what to expect.

@Thalley
Copy link
Collaborator

Thalley commented Jan 26, 2022

@Thalley It's almost one and a half years back now, but if i recall correctly it was the phy info that was unexpected and I concluded that I'd probably need the connection state to be able to know what to expect.

Hmm, the PHY can change during connection IIRC, so the connection state wouldn't actually provide any additional information there. One of the first things the Zephyr stack does after connection established is to update the PHY. Did you by any chance use the le_phy_updated callback to update your PHY information?

@hanselfberg
Copy link
Author

@Thalley I think so, it looks like I went for that approach instead (from what I can see revisiting the test now).

@Thalley
Copy link
Collaborator

Thalley commented Jan 27, 2022

So that issue would not be fixed by including the connection status in this.

Did you experience other issues where the connection status would be helpful?

@hanselfberg
Copy link
Author

@Thalley No, I don't have a concrete case for it. I think it seems likely to be out of sync sometimes unless the function is only called from the connection-related callbacks, but that should be ok I think. However, a flag for the CONNECTED state would probably be quite useful to avoid caching connection info about active connection(s) in the application (i.e. just call the connection info function instead).

@jhedberg
Copy link
Member

However, a flag for the CONNECTED state would probably be quite useful to avoid caching connection info about active connection(s) in the application (i.e. just call the connection info function instead).

Wouldn't the connected() and disconnected() callbacks as part of the bt_conn_cb struct suffice for that?

@hanselfberg
Copy link
Author

@jhedberg I think it's possible to use those connection callbacks and maintain a local cache for the values, but the flag would make it easier and better since no cache is needed and I can just call the connection info function from a thread to get information about the connection and as well know that it is actually connected.

@Thalley
Copy link
Collaborator

Thalley commented Jan 27, 2022

@jhedberg I think it's possible to use those connection callbacks and maintain a local cache for the values, but the flag would make it easier and better since no cache is needed and I can just call the connection info function from a thread to get information about the connection and as well know that it is actually connected.

Not really sure why you mention caching here, unless you mean caching of the connect state of the bt_conn, which arguably the application should in most cases know what state it is in at all times anyways. Basically whenever you need some information, you should call the get_info function. Regardless of connection state, caching the returned value may give you outdated values later on as several of the fields can change during a connection.

@hanselfberg
Copy link
Author

@Thalley Should I assume the info returned by the function is consistent, so just updating a local connected flag in the connected and disconnected callbacks and call the get info function from a priority lower than the callbacks would be enough to have the info and also know that the connection is connected (i.e. no point using the callbacks (connected, disconnected, le_param_updated, le_param_req, security_changed, identity_resolved and le_phy_updated) to keep a local cache up to date)?

@Thalley
Copy link
Collaborator

Thalley commented Jan 27, 2022

@Thalley Should I assume the info returned by the function is consistent, so just updating a local connected flag in the connected and disconnected callbacks and call the get info function from a priority lower than the callbacks would be enough to have the info and also know that the connection is connected (i.e. no point using the callbacks (connected, disconnected, le_param_updated, le_param_req, security_changed, identity_resolved and le_phy_updated) to keep a local cache up to date)?

No, quite the opposite. You cannot rely on (at least quite a few of the the fields) to be consistent, as they made change over time.
If you want to keep your cached data up to date, you'll either need to update it manually in the callbacks, or call get_info in the callbacks to have the host stack update the struct for you.

call the get info function from a priority lower than the callbacks

I don't see any reason why not to call the get_info in the callback itself, rather than doing it in a separate thread/function.

@hanselfberg
Copy link
Author

@Thalley I think your answer seems to indicate that the get_info function is meant to be called only from the callbacks. Is that how it's meant to be? If so, I think there is no issue except that I think this is important to mention specifically in the documentation for the function.

In addition I think that if this function is meant to be called only from the connection callbacks, it would actually be a good idea to deprecate the function to eventually remove it and instead provide the connection info as a parameter in the callback. This with the intention to simplify and avoid calls to it when it is in an inconsistent state.

@Thalley
Copy link
Collaborator

Thalley commented Jan 28, 2022

@Thalley I think your answer seems to indicate that the get_info function is meant to be called only from the callbacks. Is that how it's meant to be? If so, I think there is no issue except that I think this is important to mention specifically in the documentation for the function.

It is not meant to be called only from callbacks. My point is that it can be called in any context, and there's no reason not to call it from the callbacks, if you need the data there.

My other point is that is isn't a good idea to cache this in the application, rather than calling it whenever you need any current specific information (e.g. the PHY, the connection interval, peer address, etc.), and caching the result can lead to it being out of date.

@hanselfberg
Copy link
Author

@Thalley Ok. Then it should be fine doing what I wrote before ("updating a local connected flag in the connected and disconnected callbacks and call the get info function from a priority lower than the callbacks would be enough to have the info and also know that the connection is connected")? I.e. there is no issue where I'll get the info with partly old values and the other part with new values due to for instance connection parameter update happened somewhere during the get_info call? It's for this case I thought I might have to use all the connection callbacks to maintain a local copy up to date. Values should then not be out of date provided that there are callbacks for all updates to the connection info, is what I would expect.

@Thalley
Copy link
Collaborator

Thalley commented Jan 28, 2022

@Thalley Ok. Then it should be fine doing what I wrote before ("updating a local connected flag in the connected and disconnected callbacks and call the get info function from a priority lower than the callbacks would be enough to have the info and also know that the connection is connected")? I.e. there is no issue where I'll get the info with partly old values and the other part with new values due to for instance connection parameter update happened somewhere during the get_info call? It's for this case I thought I might have to use all the connection callbacks to maintain a local copy up to date. Values should then not be out of date provided that there are callbacks for all updates to the connection info, is what I would expect.

If you call get_info in the callback, then you can (probably) be sure that the data won't change until you get the result. This of course comes from the fact that you are in the RX thread, and the RX thread can thus not change anything while in that function.

If you call get_info in another thread with lower priority, then yes, the values may change (although somewhat unlikely). But this is the case regardless of whether we add connection state in the struct or not.

We could consider looking into adding a mutex for the function to avoid any race conditions, but I reckon that if we go down that road, there's many functions that would have similar race conditions.

@hanselfberg
Copy link
Author

@Thalley I think I'll definitely try to remember to avoid calling the function other than in the connection callbacks next time, so I think I should have enough "state awareness" as well by doing it that way.

It sounds like a good idea to look into similar race conditions, i think. It will of course not be good if the app works on the wrong assumptions due to wrong values from the controller.

@Thalley
Copy link
Collaborator

Thalley commented Jan 28, 2022

@Thalley I think I'll definitely try to remember to avoid calling the function other than in the connection callbacks next time, so I think I should have enough "state awareness" as well by doing it that way.

It sounds like a good idea to look into similar race conditions, i think. It will of course not be good if the app works on the wrong assumptions due to wrong values from the controller.

Do you think this issue can be considered closed then?
It seems like there are indeed some... Challenges... But they are not fixable by adding the connection state the function or struct.

@hanselfberg
Copy link
Author

@Thalley Since you say get_info is intended to be called not just from the connection callbacks, I'd say the state should be included as well, but now knowing that the data can be corrupted (even though the risk might be low) makes me change my intention to only call it from the connection callbacks from now on. So since doing it this way also implicitly gives me the state information as well, I can let it be up to you guys to decide. :-)

@Thalley
Copy link
Collaborator

Thalley commented Jan 31, 2022

@Thalley Since you say get_info is intended to be called not just from the connection callbacks, I'd say the state should be included as well, but now knowing that the data can be corrupted (even though the risk might be low) makes me change my intention to only call it from the connection callbacks from now on. So since doing it this way also implicitly gives me the state information as well, I can let it be up to you guys to decide. :-)

I would argue that the state doesn't help, as the state is provided via the callbacks in any case, and there are more intermediary states that we don't want to add to the public API, so we would also need to add a new (simpler) state enum for this specific function.

The larger issue at hand, i.e. the race conditions, is outside the scope of this PR, as that may happen in many scenarios I'm afraid, and we need to fix that, it needs to be fixed properly and not just for a single function.

In my opinion, adding the state here doesn't provide anything new, although it would be a way for the application to not keep track of the state itself. @jhedberg I'll let you decide whether to keep this open, or close it then.

@hanselfberg
Copy link
Author

I would argue that the state doesn't help, as the state is provided via the callbacks in any case

But that is only if it's called from a lower priority thread.

@Thalley
Copy link
Collaborator

Thalley commented Feb 1, 2022

I would argue that the state doesn't help, as the state is provided via the callbacks in any case

But that is only if it's called from a lower priority thread.

Not completely sure I follow, can you elaborate?

@hanselfberg
Copy link
Author

Not completely sure I follow, can you elaborate?

I mean if you call the info_get function from a priority that blocks your callback when for instance the peer disconnects the link, then the state in the local state variable will say connected and the actual state is disconnected.

@Thalley
Copy link
Collaborator

Thalley commented Feb 1, 2022

Not completely sure I follow, can you elaborate?

I mean if you call the info_get function from a priority that blocks your callback when for instance the peer disconnects the link, then the state in the local state variable will say connected and the actual state is disconnected.

At the time info_get is called, the state is connected, and you'd get the right values, so I don't see the issue here.

If you have blocked the RX thread (that calls the callback) for a longer time before calling info_get then you got much worse issues than out-of-data values I reckon :D

@hanselfberg
Copy link
Author

If you have blocked the RX thread (that calls the callback) for a longer time before calling info_get then you got much worse issues than out-of-data values I reckon :D

Ok. but it doesn't have to be very long time of blocking the RX thread. Just that you blocked it slightly before the state changed to disconnected. Maybe I have misunderstood something here, but I would definitely have thought that for this case the internal state variable and the stat variable in the app would not be in sync.

@Thalley
Copy link
Collaborator

Thalley commented Feb 1, 2022

If you have blocked the RX thread (that calls the callback) for a longer time before calling info_get then you got much worse issues than out-of-data values I reckon :D

Ok. but it doesn't have to be very long time of blocking the RX thread. Just that you blocked it slightly before the state changed to disconnected. Maybe I have misunderstood something here, but I would definitely have thought that for this case the internal state variable and the stat variable in the app would not be in sync.

True, there is a very small time between e.g. the disconnect event in the host, and until the callback is called, but that will always exist regardless of whether the connection state exists in this struct, as it is not an atomic operation.

@hanselfberg
Copy link
Author

True, there is a very small time between e.g. the disconnect event in the host, and until the callback is called, but that will always exist regardless of whether the connection state exists in this struct, as it is not an atomic operation.

I see what you mean, I think. Possibly how good it gets depends a bit on how it's done as well. I think a "dirty flag" could work to make it kind of atomic, so that "undefined state" can be returned while updating the variables (including the state).

@Thalley
Copy link
Collaborator

Thalley commented Feb 1, 2022

I see what you mean, I think. Possibly how good it gets depends a bit on how it's done as well. I think a "dirty flag" could work to make it kind of atomic, so that "undefined state" can be returned while updating the variables (including the state).

If we want it to be atomic, we should implement that properly. One such way is to add a mutex as mentioned in #28009 (comment), but that opens up a whole can of worms, as this issue exists throughout the BT subsystem.

@hanselfberg
Copy link
Author

If we want it to be atomic, we should implement that properly. One such way is to add a mutex as mentioned in #28009 (comment), but that opens up a whole can of worms, as this issue exists throughout the BT subsystem.

If you think it's not a good idea to make a simple fix to get a similar state behavior as I suggested, I think the best temporary solution (until the proper fix is implemented) would be to add a comment to the function interface telling that it should only be called from the connection callbacks to avoid inconsistent/corrupted information. I think that kind of comment would be very helpful, and also implementing with that limitation in mind does not require any change when the proper fix comes. Also I think calling the function from the connection callbacks means a state return parameter is not necessary. It would be nice to have for other functions as well where you have identified the same problem.

@Thalley
Copy link
Collaborator

Thalley commented Feb 2, 2022

If you think it's not a good idea to make a simple fix to get a similar state behavior as I suggested,

Adding the state does not solve anything regarding the race condition.

I think the best temporary solution (until the proper fix is implemented) would be to add a comment to the function interface telling that it should only be called from the connection callbacks to avoid inconsistent/corrupted information

I am not opposed to that, but that comment needs to apply to almost every function in the BT subsystem as you are reading or writing to values that may be changed by higher priority threads. As I've mentioned, the issue you are referring to is not about this single function, but rather the entire BT subsystem. We can add the comment for just this one function, but that seems like patching a very small thing in a large issue.

I think calling the function from the connection callbacks means a state return parameter is not necessary

Arguably the return value doesn't provide much in any case, as the only time it will not return 0 is if the conn pointer is invalid (i.e. not allocated/initialized, or a pointer to a different struct, or something like that).

@jhedberg Do you have any input here? Correct me if I am wrong, but aside from the use of atomic_* function for setting/testing flags, we don't really have a thread secure BT subsystem, right?

@jhedberg
Copy link
Member

jhedberg commented Feb 2, 2022

Correct me if I am wrong, but aside from the use of atomic_* function for setting/testing flags, we don't really have a thread secure BT subsystem, right?

That's correct. We use those since similar non-atomic helpers for bitmaps don't exist. We'll need to add locking for a bunch of things if/when the stack is made safe for pre-emptive multithreading (as opposed to cooperative multithreading, which it now relies on and gets away with a lot of locking because of it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth area: Documentation Enhancement Changes/Updates/Additions to existing features
Projects
None yet
6 participants