Allow mixing latched and unlatched publishers. #1991
Just commented in #1966 (comment): Allowing mixed latched and volatile (non-latching) publishers in a single node is error-prone. The singleton publication reports itself as either latching or not in the TCPROS connection header. Normal subscribers do not consider this flag, and mixed publishers indeed behave as intended. But [...] Overall, I agree that removing the check is a viable solution for [...]
@meyerj Thanks for the in-depth explanation. Given these situations:
My understanding is that prior to #1544, case 1 would work initially but not play back from bag properly, and case 2 would not even work initially (you'd get whichever was the most recent message, rather than one per latched publisher). With #1544, case 1 can now be played back from bag (even historical bags, curiously, since the limitation was always just in rosbag's ability to create multiple latched publishers at playback time). Case 2 now works initially, but still can't be accurately bagged since the TCPROS protocol only sees processes, not individual publishers within them. And since rosbag playback of case 1 is itself case 2, rebagging that will also mess it up (I believe [...]). Anyway, I think all that makes sense, but this is the bit that I'm hazy on:
As you say, the only/main case where the flag matters is for special subscribers like [...] This obviously leaves a race where a subscriber may see the topic as unlatched because the latched publisher in that same process hasn't been instantiated yet. But if I'm correct about the old behaviour then that at least wouldn't be a regression.
@mikepurvis Yes, that's a concise summary of the situation, with the following correction:
The first publisher created in a roscpp process on a given topic decides whether the publication is advertised as latching or not in the connection header:
From topic_manager.cpp:
bool TopicManager::advertise(const AdvertiseOptions& ops, const SubscriberCallbacksPtr& callbacks)
{
  // [...]
  pub = lookupPublicationWithoutLock(ops.topic);
  // [...]
  if (pub)
  {
    // [...]
-   if (pub->isLatched() != ops.latch)
-   {
-     ROS_ERROR("Tried to advertise on topic [%s] with latch=%d but the topic is already advertised with latch=%d",
-               ops.topic.c_str(), ops.latch, pub->isLatched());
-     return false;
-   }
    pub->addCallbacks(callbacks);
    return true;
  }
  pub = PublicationPtr(boost::make_shared<Publication>(ops.topic, ops.datatype, ops.md5sum, ops.message_definition, ops.queue_size, ops.latch, ops.has_header));
  pub->addCallbacks(callbacks);
  advertised_topics_.push_back(pub);
}
So if there is already a publisher on the given topic and the check is removed as suggested in this PR, then [...] As you also suggested, maybe it would be best to remove or deprecate the [...] The standard example for a broken situation is a node/process that has multiple [...]
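The "first publisher decides" behaviour described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `ToyPublication`, `ToyTopicManager` and `headerLatching` are hypothetical names invented here, not roscpp APIs, and the model ignores locking, message types and everything else the real `TopicManager::advertise` handles.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical, heavily simplified stand-in for roscpp's per-topic
// Publication singleton. The latch flag is fixed at construction.
class ToyPublication {
 public:
  explicit ToyPublication(bool latch) : latch_(latch) {}
  bool isLatched() const { return latch_; }

 private:
  const bool latch_;
};

// Hypothetical stand-in for the topic manager with the latch check removed.
class ToyTopicManager {
 public:
  bool advertise(const std::string& topic, bool latch) {
    auto it = topics_.find(topic);
    if (it != topics_.end()) {
      // Existing publication is reused; the new publisher's latch option
      // is accepted but does not change the stored flag.
      return true;
    }
    topics_[topic] = std::make_shared<ToyPublication>(latch);
    return true;
  }

  // What a subscriber would read from the connection header in this model.
  bool headerLatching(const std::string& topic) const {
    auto it = topics_.find(topic);
    assert(it != topics_.end() && "topic must be advertised first");
    return it->second->isLatched();
  }

 private:
  std::map<std::string, std::shared_ptr<ToyPublication>> topics_;
};

// Advertises an unlatched publisher first and a latched one second, then
// returns the flag the model would report to subscribers.
inline bool unlatchedThenLatched() {
  ToyTopicManager tm;
  tm.advertise("/tf_static", false);
  tm.advertise("/tf_static", true);
  return tm.headerLatching("/tf_static");
}
```

In this model the reported flag depends only on creation order, which is the inconsistency the discussion is concerned with.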
I agree. The issue cannot be fixed in a nice and sane way anyway without changing the TCPROS protocol. So if removing the check at least resolves the use case of [...] On the other hand, I still cannot think of a good use case where mixing latching and non-latching publishers on the same topic, even in different nodes, is actually required. If that were strictly forbidden (and maybe even enforced by the ROS master), then there would be no problem with an inconsistently reported latching state in the connection header, and [...]
It should also be noted that all of the above is only valid for roscpp. Other client libraries will behave differently. For rospy, @mgrrx proposed a patch in #146 (comment) (magazino@7c6f5b0), which originally motivated #1544. I still think that this commit should be applied to rospy to achieve at least consistent behavior between rospy and roscpp nodes that have multiple latched publishers (e.g. multiple tf2_ros.StaticTransformBroadcaster).
@meyerj Have implemented now the behaviour where the singleton is marked as latched if any of the instances are. Ideally this would have been just done in the isLatched accessor, but since that accessor was implemented in the header, it would have been inlined and thus broken ABI. So instead I leave the accessor as-is and update the existing member. Let me know if this approach is acceptable to you. Regarding the question of valid use-cases, I'm not sure that there are any either, and I've cleaned up the ones I'm aware of in our codebase, but we still have historical bags where a rogue node was publishing rosout as latched or whatever. We also have a situation where bags that had been processed for use with webviz included collapsing together the tf (unlatched) and tf_static (latched) topics, which is now a source of error. At the end of the day though, I just don't like the idea of an arbitrary runtime incompatibility on the level of MD5 breakage that's buried in code and has to be navigated like this, especially when it's something that bites later on in development (when playing a bag, when combining nodelets, etc).
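A minimal sketch of the "latched if any instance is" rule, assuming a stored flag that is recomputed whenever a publisher's callbacks are added or removed. The types `AnyLatchCallbacks` and `AnyLatchPublication` and the `latched` field are hypothetical and chosen for illustration; the actual patch may track this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical per-publisher callbacks; `latched` marks a latched publisher.
struct AnyLatchCallbacks {
  bool latched = false;
};
using AnyLatchCallbacksPtr = std::shared_ptr<AnyLatchCallbacks>;

// Hypothetical per-topic singleton shared by all publishers in a process.
class AnyLatchPublication {
 public:
  void addCallbacks(const AnyLatchCallbacksPtr& cb) {
    callbacks_.push_back(cb);
    refreshLatch();
  }

  void removeCallbacks(const AnyLatchCallbacksPtr& cb) {
    callbacks_.erase(std::remove(callbacks_.begin(), callbacks_.end(), cb),
                     callbacks_.end());
    refreshLatch();
  }

  // Accessor stays a trivial read of the member, as in the quoted approach.
  bool isLatched() const { return latch_; }

 private:
  // The stored flag is true exactly when at least one registered
  // publisher is latched.
  void refreshLatch() {
    latch_ = std::any_of(
        callbacks_.begin(), callbacks_.end(),
        [](const AnyLatchCallbacksPtr& c) { return c->latched; });
  }

  std::vector<AnyLatchCallbacksPtr> callbacks_;
  bool latch_ = false;
};
```

Keeping the accessor trivial and updating the member on the (rarer) add/remove path means the accessor's body does not change, which is the ABI concern raised in the comment.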
@meyerj Have implemented now the behaviour where the singleton is marked as latched if any of the instances are. Ideally this would have been just done in the isLatched accessor, but since that accessor was implemented in the header, it would have been inlined and thus broken ABI. So instead I leave the accessor as-is and update the existing member.
Let me know if this approach is acceptable to you.
Yes, thanks for the update. It is a reasonable choice to keep the existing latch_ flag up-to-date in addCallbacks() and removeCallbacks() instead of removing it. This likely happens less frequently than new incoming connections from subscribers, where isLatching() is checked for the connection header in transport_subscriber_link.cpp:114.
Strictly speaking, latch_ must now be protected by the callbacks_mutex_, because the flag is no longer immutable after the construction of a Publication instance. Alternatively it would need to be of type std::atomic<bool>, but that would break the ABI, too. I guess for now it is acceptable that isLatched() and isLatching() may occasionally report an outdated state due to cache coherency issues, because they do not reflect the full truth anyway if there are mixed latching and non-latching publishers. But it would be possible to move the implementation to the source file and to add a scoped lock of the callbacks_mutex_ without breaking the ABI. There is just no guarantee that code compiled against a previous version of the header did not inline the method and therefore still uses the implementation without the lock.
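The suggestion to move the accessor out of the header and guard it with a scoped lock could look roughly like the following. This is a sketch, not the roscpp code: `LockedPublication` and `setLatched` are hypothetical, and `std::mutex` with `std::lock_guard` is used here in place of whatever mutex type the real class holds.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical class; in a real project the declaration would live in the
// header and the definitions below in the source file.
class LockedPublication {
 public:
  bool isLatched() const;   // declared only, so callers cannot inline the body
  void setLatched(bool v);  // stands in for updates made when publishers
                            // are added or removed

 private:
  mutable std::mutex callbacks_mutex_;  // mutable so const readers can lock
  bool latch_ = false;
};

// Out-of-line definition: the read happens under the same mutex as writes.
bool LockedPublication::isLatched() const {
  std::lock_guard<std::mutex> lock(callbacks_mutex_);
  return latch_;
}

void LockedPublication::setLatched(bool v) {
  std::lock_guard<std::mutex> lock(callbacks_mutex_);
  latch_ = v;
}
```

As noted in the comment, callers compiled against an older header that defined the accessor inline would keep using their inlined, unlocked copy; only newly compiled callers pick up the locked version.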
@meyerj Thanks for pointing that out; I've made the suggested change. @dirk-thomas I think this is good to go in, when you're able to look.
Thanks for the patch!
Because DEPEND_ABI.ros-comm.noetic?= ros-comm>=1.15

1.15.9 (2020-10-16)
-------------------
* Fix deadlock when service connection is dropped (ros/ros_comm#2074)
* Update maintainers (ros/ros_comm#2075)
* Fix case where accessing cached parameters shuts down another node (ros/ros_comm#2068)
* Fix spelling (ros/ros_comm#2066)
* Fix Lost Wake Bug in ROSOutAppender (ros/ros_comm#2033)
* Fix compatibility issue with boost 1.73 and above (ros/ros_comm#2023)
* Gracefully stop recording upon SIGTERM and SIGINT (ros/ros_comm#2038)
* Use heapq.merge instead of custom merge sort code (ros/ros_comm#2017)
* Fix handling of single quotes in command arguments on Windows (ros/ros_comm#2051)
* Clearer error message (ros/ros_comm#2035)
* Ignore underscores when parsing literal numeric values for Python 3 compatibility (ros/ros_comm#2022)
* Clear cached URI for a node that has gone offline (ros/ros_comm#2010)
* Add skip_cache parameter to rosnode_ping() (ros/ros_comm#2009)
* Install advertisetest (ros/ros_comm#2046)
* Use range instead of xrange for Python 3 compatibility (ros/ros_comm#2013)
* Fix to address CVE-2020-16124 (ros/ros_comm#2065)
* Fix XmlRpcValue::_doubleFormat being unused (ros/ros_comm#2003)

1.15.8 (2020-07-23)
-------------------
* change is_async_connected to use epoll when available (ros/ros_comm#1983)
* allow mixing latched and unlatched publishers (ros/ros_comm#1991)
* remove not existing NodeProxy from rospy __all__ (ros/ros_comm#2007)
* fix typo in topics.py (ros/ros_comm#1977)
* fix bad relative import (still Python 2 style) (ros/ros_comm#1973)
* improve shutdown message with duplicate node name (ros/ros_comm#1992)
* remove dependency on rostopic from rostest package (ros/ros_comm#2002)
* fix missing reload() function in Python 3 (ros/ros_comm#1968)
* add latch param to throttle (ros/ros_comm#1944)
* add const versions of XmlRpcValue converting operators (ros/ros_comm#1978)

1.15.7 (2020-05-28)
-------------------
* fix Windows build break (ros/ros_comm#1961)
* fix NameError in launch error handling (ros/ros_comm#1965)

1.15.6 (2020-05-21)
-------------------
* fix a bug that using a destroyed connection object (ros/ros_comm#1950)

1.15.5 (2020-05-15)
-------------------
* check if async socket connect is success or failure before TransportTCP::read() and TransportTCP::write() (ros/ros_comm#1954)
* fix bug that connection drop signal related funtion throw a bad_weak exception (ros/ros_comm#1940)
* multiple latched publishers per process on the same topic (ros/ros_comm#1544)
* fix negative numbers in ros statistics (ros/ros_comm#1531)
* remove extra \n in ROS_DEBUG (ros/ros_comm#1925)
* add option to repeat latched messages at the start of bag splits (ros/ros_comm#1850)
* fix bag migration failures caused by typo in connection_header assignment (ros/ros_comm#1952)
* fix brief description comments after members (ros/ros_comm#1920)
* add --sigint-timeout and --sigterm-timeout parameters (ros/ros_comm#1937)
* roslaunch-check: search dir recursively (ros/ros_comm#1914)
* sort printed nodes by namespace alphabetically (ros/ros_comm#1934)
* remove pycrypto import (not used) (ros/ros_comm#1922)
* avoid infinite recursion in rosrun tab completion when rosbash is not installed (ros/ros_comm#1948)
* fix bare pointer in topic_tools::ShapeShifter (ros/ros_comm#1722)
* clear message queue on simtime jumping back (ros/ros_comm#1518)
* use undefined dynamic_lookup on macOS (ros/ros_comm#1923)
* check if enough FDs are free, instead counting the total free FDs (ros/ros_comm#1929)

1.15.4 (2020-03-19)
-------------------
* restrict boost dependencies to components used (ros/ros_comm#1871)
* add exception for ConnectionAbortedError (ros/ros_comm#1908)
* fix mac trying to use epoll instead of kqueue (ros/ros_comm#1907)
* fix AttributeError: __exit__ (ros/ros_comm#1915, regression from 1.14.4)

1.15.3 (2020-02-28)
-------------------
* remove Boost version check since Noetic only targets platforms with 1.67+ (ros/ros_comm#1903)

1.15.2 (2020-02-25)
-------------------
* export missing Boost dependency (ros/ros_comm#1898)
* add timestamp formatting for rosconsole (ros/ros_comm#1892)

1.15.1 (2020-02-24)
-------------------
* fix missing boost dependencies (ros/ros_comm#1895)
* use setuptools instead of distutils (ros/ros_comm#1870)
* increase time limit of advertisetest/publishtest.test to reduce flakyness (ros/ros_comm#1897)

1.15.0 (2020-02-21)
-------------------
* fix dictionary changed size during iteration (ros/ros_comm#1894)
* update test to pass with old and new yaml (ros/ros_comm#1893)

Packaging changes:
- removed patch-an, as there are no more boost version checks
- updated patch-ao
This is a naive change (so far) that just removes the check against mixing latched and unlatched publishers and updates the test added in #1544 so that it covers this scenario (it still passes). As far as I can tell, this should be totally safe and in line with the design intent of the original change:
- Each publisher has its own last_message_ pointer, rather than this being at the TopicManager/singleton level.
- The Publication::peerConnect method iterates its list of callbacks, which will call the push_latched_message_ std::function for any that have it defined.
Would love to hear from @meyerj if there's something lurking here that I've missed.
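The two design points above can be sketched as a small model: each publisher owns its last message, and the shared publication replays latched messages on connect by walking its callbacks. All type names here (`ReplayLink`, `ReplayCallbacks`, `ReplayPublisher`, `ReplayPublication`) are hypothetical, messages are plain strings, and `publish` only records the last message rather than delivering to already-connected subscribers.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subscriber link that just records what it receives.
struct ReplayLink {
  std::vector<std::string> received;
};

// Per-publisher callbacks. Only latched publishers set the function.
struct ReplayCallbacks {
  std::function<void(ReplayLink&)> push_latched_message_;
};

struct LastMessage {
  bool valid = false;
  std::string data;
};

// One publisher handle, owning its own last message.
class ReplayPublisher {
 public:
  explicit ReplayPublisher(bool latch)
      : callbacks_(std::make_shared<ReplayCallbacks>()),
        last_message_(std::make_shared<LastMessage>()) {
    if (latch) {
      std::shared_ptr<LastMessage> last = last_message_;
      callbacks_->push_latched_message_ = [last](ReplayLink& link) {
        if (last->valid) link.received.push_back(last->data);
      };
    }
  }

  void publish(const std::string& msg) {
    last_message_->valid = true;
    last_message_->data = msg;
  }

  const std::shared_ptr<ReplayCallbacks>& callbacks() const {
    return callbacks_;
  }

 private:
  std::shared_ptr<ReplayCallbacks> callbacks_;
  std::shared_ptr<LastMessage> last_message_;
};

// Per-topic singleton shared by all publishers in the process.
class ReplayPublication {
 public:
  void addCallbacks(std::shared_ptr<ReplayCallbacks> cb) {
    callbacks_.push_back(std::move(cb));
  }

  // On a new subscriber connection, call the replay function of every
  // publisher that defined one; unlatched publishers are skipped.
  void peerConnect(ReplayLink& link) const {
    for (const auto& cb : callbacks_) {
      if (cb->push_latched_message_) cb->push_latched_message_(link);
    }
  }

 private:
  std::vector<std::shared_ptr<ReplayCallbacks>> callbacks_;
};

// Two latched publishers and one unlatched publisher on the same topic;
// returns what a late-joining subscriber receives on connect.
inline std::vector<std::string> connectLateSubscriber() {
  ReplayPublication publication;
  ReplayPublisher latched_a(true), latched_b(true), unlatched(false);
  publication.addCallbacks(latched_a.callbacks());
  publication.addCallbacks(latched_b.callbacks());
  publication.addCallbacks(unlatched.callbacks());
  latched_a.publish("a");
  latched_b.publish("b");
  unlatched.publish("c");
  ReplayLink link;
  publication.peerConnect(link);
  return link.received;
}
```

In this model a late subscriber gets one message per latched publisher and nothing from the unlatched one, without the shared publication needing a single latch flag to drive the replay.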
Fixes #1966.