Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed crash when raycasting while map is updated. #1793

Conversation

StefanFabian
Copy link
Contributor

When using the scene manager raycast in one of our tool rviz crashed when the tool touched the map area while a new map was received.
Apparently, the connection is not queued (anymore?) and therefore the update does not happen on the main thread.
This small change should fix this and is fully ABI compatible.

Fixes #1792

Possibly due to map not being updated on main thread when not explicitly using QueuedConnection.
Copy link
Contributor

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not yet convinced that the described bug is related to a threading issue.
The topic subscriber uses update_nh_:

map_sub_ = update_nh_.subscribe(topic_property_->getTopicStd(), 1, &MapDisplay::incomingMap,

which already should ensure that ROS callbacks of that subscriber will be handled in the main GUI thread:

ros::spinOnce();

Can you confirm (via a backtrace) that the crash doesn't occur in the main thread?

@rhaschke
Copy link
Contributor

rhaschke commented May 4, 2023

Just to clarify: I trust you stating that this PR reliably fixes your issue, but I don't yet understand how/why.
Maybe, using a queued connection, the update of map data structures is postponed after the current VisualizationManager::onUpdate() call, such that your tool can either still access the old data structures or will access the new ones in the next cycle? In any case, it would be helpful to see the actual backtrace. Thanks.

@simonschmeisser
Copy link
Contributor

Do you use standalone rviz or some tool build upon it?

@StefanFabian
Copy link
Contributor Author

StefanFabian commented May 8, 2023

I've used a blank default rviz config on my desktop where I don't have the fix. Just added a map display and our waypoint tool which uses the SelectionManager to get a raycasted 3D point from the mouse position in the processMouseEvent method and uses that 3D point to show a mesh at the location of the raycast.

Here's the backtrace:

Thread 1 "rviz" received signal SIGSEGV, Segmentation fault.
0x00000000400b0bf0 in ?? ()
(gdb) bt
#0  0x00000000400b0bf0 in  ()
#1  0x00007fffca651bf9 in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#2  0x00007fffca660985 in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#3  0x00007fffca2b93ce in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#4  0x00007fffc871ba7b in Ogre::GLRenderSystem::_render(Ogre::RenderOperation const&) ()
    at /usr/lib/x86_64-linux-gnu/OGRE-1.9.0/RenderSystem_GL.so.1.9.0
#5  0x00007ffff655a16e in Ogre::SceneManager::renderSingleObject(Ogre::Renderable*, Ogre::Pass const*, bool, bool, Ogre::HashedVector<Ogre::Light*> const*) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#6  0x00007ffff650a679 in Ogre::QueuedRenderableCollection::acceptVisitorGrouped(Ogre::QueuedRenderableVisitor*) const ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#7  0x00007ffff650a78c in Ogre::QueuedRenderableCollection::acceptVisitor(Ogre::QueuedRenderableVisitor*, Ogre::QueuedRenderableCollection::OrganisationMode) const () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#8  0x00007ffff654c93a in Ogre::SceneManager::renderBasicQueueGroupObjects(Ogre::RenderQueueGroup*, Ogre::QueuedRenderableCollection::OrganisationMode) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#9  0x00007ffff654c7d7 in Ogre::SceneManager::renderVisibleObjectsDefaultSequence() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#10 0x00007ffff655e4a7 in Ogre::SceneManager::_renderScene(Ogre::Camera*, Ogre::Viewport*, bool) ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#11 0x00007ffff6398c3a in Ogre::Camera::_renderScene(Ogre::Viewport*, bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#12 0x00007ffff6524805 in Ogre::RenderTarget::_updateViewport(Ogre::Viewport*, bool) ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#13 0x00007ffff6524612 in Ogre::RenderTarget::_updateAutoUpdatedViewports(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#14 0x00007ffff6524363 in Ogre::RenderTarget::updateImpl() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#15 0x00007ffff6524d18 in Ogre::RenderTarget::update(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#16 0x00007ffff650e90f in Ogre::RenderSystem::_updateAllRenderTargets(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#17 0x00007ffff654551e in Ogre::Root::_updateAllRenderTargets() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#18 0x00007ffff6545610 in Ogre::Root::renderOneFrame() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#19 0x00007ffff7f3e120 in rviz::VisualizationManager::onUpdate() () at /opt/ros/noetic/lib/librviz.so
#20 0x00007ffff73d41d0 in QMetaObject::activate(QObject*, int, int, void**) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff73e13ee in QTimer::timeout(QTimer::QPrivateSignal) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00007ffff73d4bc5 in QObject::event(QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#23 0x00007ffff77d5a66 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#24 0x00007ffff77df0f0 in QApplication::notify(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#25 0x00007ffff73a880a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007ffff73ff780 in QTimerInfoList::activateTimers() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x00007ffff740006c in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#28 0x00007ffff45f817d in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#29 0x00007ffff45f8400 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#30 0x00007ffff45f84a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#31 0x00007ffff7400435 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
    at /lib/x86_64-linux-gnu/libQt5Core.so.5
#32 0x00007ffff73a73ab in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#33 0x00007ffff73af116 in QCoreApplication::exec() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#34 0x00005555555585e2 in main ()

Copy link
Contributor

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the backtrace. This proves my assumption that everything is handled within the same (main) thread.

I'm still hesitating to merge the PR as is, because I still don't understand why the crash only occurs with your tool. Obviously, normal rendering of an immediately updated map works w/o issues. So, why does it fail when your tools is called in between?
How does your tool modify the internal data structures to cause the crash?
Could you please provide your tool's source for further analysis with address sanitizer tools?

This PR essentially introduces a one-frame delay between receiving map updates and displaying them. If you have a very slowly progressing ROS time (e.g. from sim) this should become noticeable.

src/rviz/default_plugin/map_display.cpp Show resolved Hide resolved
@StefanFabian
Copy link
Contributor Author

StefanFabian commented May 8, 2023

I can upload the whole code but the relevant portion should be this:

void WaypointTool::onInitialize()
{
  rviz::InteractionTool::onInitialize();
  preview_mesh_ = rviz::loadMeshFromResource( WAYPOINT_PREVIEW_RESOURCE );
  if ( preview_mesh_.isNull())
  {
    ROS_ERROR( "Failed to load waypoint tool preview mesh." );
    return;
  }

  arrow_mesh_ = rviz::loadMeshFromResource( WAYPOINT_ARROW_RESOURCE );
  if ( arrow_mesh_.isNull())
  {
    ROS_ERROR( "Failed to load waypoint tool orientation preview mesh." );
    return;
  }

  waypoint_preview_node_ = createWaypointNode();
  waypoint_preview_node_->setVisible( false );
}

Ogre::SceneNode *WaypointTool::createWaypointNode()
{
  Ogre::SceneNode *node = scene_manager_->getRootSceneNode()->createChildSceneNode();
  Ogre::Entity *entity = scene_manager_->createEntity( preview_mesh_ );
  node->attachObject( entity );
  node->setScale( WAYPOINT_PREVIEW_SCALE, WAYPOINT_PREVIEW_SCALE, WAYPOINT_PREVIEW_SCALE );
  Ogre::SceneNode *arrow_node = node->createChildSceneNode();
  entity = scene_manager_->createEntity( arrow_mesh_ );
  arrow_node->attachObject( entity );
  arrow_node->setVisible( false );
  arrow_node->setScale( 30 * WAYPOINT_PREVIEW_SCALE, 30 * WAYPOINT_PREVIEW_SCALE, 30 * WAYPOINT_PREVIEW_SCALE );
  return node;
}

int WaypointTool::processMouseEvent( rviz::ViewportMouseEvent &event )
{
  if ( waypoint_preview_node_ == nullptr ) return rviz::InteractionTool::processMouseEvent( event );

  // Perform raycast
  waypoint_preview_node_->setVisible( false ); // Disable for raycast
  Ogre::Vector3 intersection;
  if ( !raycast3d_property_->getBool() || !context_->getSelectionManager()->get3DPoint( event.viewport, event.x, event.y, intersection ))
  {
    // Fall back to plane intersection
    Ogre::Plane ground_plane( Ogre::Vector3::UNIT_Z, 0.f );
    if ( !rviz::getPointOnPlaneFromWindowXY( event.viewport, ground_plane, event.x, event.y, intersection ))
    {
      return Render;
    }
  }

  // Show preview node at current intersection unless we are in the process of placing a waypoint
  waypoint_preview_node_->setVisible( active_waypoint_ == nullptr, false );
  waypoint_preview_node_->setPosition( intersection );
  return rviz::InteractionTool::processMouseEvent( event );
}

I've removed the branches that are definitely not being hit from the processMouseEvent for easier readability.

@rhaschke
Copy link
Contributor

rhaschke commented May 8, 2023

I modified the PlantFlagTool example of rviz_plugin_tutorials to call context_->getSelectionManager()->get3DPoint instead of getPointOnPlaneFromWindowXY and this works w/o any issues.
I'm afraid your issue is rooted in your source code.

@StefanFabian
Copy link
Contributor Author

I just did the same and it crashes just like with our tool.

diff --git a/rviz_plugin_tutorials/src/plant_flag_tool.cpp b/rviz_plugin_tutorials/src/plant_flag_tool.cpp
index ea2689b..a8e73d0 100644
--- a/rviz_plugin_tutorials/src/plant_flag_tool.cpp
+++ b/rviz_plugin_tutorials/src/plant_flag_tool.cpp
@@ -38,6 +38,7 @@
 #include <rviz/mesh_loader.h>
 #include <rviz/geometry.h>
 #include <rviz/properties/vector_property.h>
+#include <rviz/selection/selection_manager.h>
 
 #include "plant_flag_tool.h"
 
@@ -169,9 +170,8 @@ int PlantFlagTool::processMouseEvent( rviz::ViewportMouseEvent& event )
   }
   Ogre::Vector3 intersection;
   Ogre::Plane ground_plane( Ogre::Vector3::UNIT_Z, 0.0f );
-  if( rviz::getPointOnPlaneFromWindowXY( event.viewport,
-                                         ground_plane,
-                                         event.x, event.y, intersection ))
+  moving_flag_node_->setVisible(false);
+  if( context_->getSelectionManager()->get3DPoint( event.viewport, event.x, event.y, intersection ))
   {
     moving_flag_node_->setVisible( true );
     moving_flag_node_->setPosition( intersection );

Do you also have an updating map? 1Hz should be sufficient.
And moving the mouse across the map.

Same backtrace

Thread 1 "rviz" received signal SIGSEGV, Segmentation fault.
0x00000000400b0bf0 in ?? ()
(gdb) bt
#0  0x00000000400b0bf0 in  ()
#1  0x00007fffca651bf9 in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#2  0x00007fffca660985 in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#3  0x00007fffca2b93ce in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#4  0x00007fffc871ba7b in Ogre::GLRenderSystem::_render(Ogre::RenderOperation const&) ()
    at /usr/lib/x86_64-linux-gnu/OGRE-1.9.0/RenderSystem_GL.so.1.9.0
#5  0x00007ffff655a16e in Ogre::SceneManager::renderSingleObject(Ogre::Renderable*, Ogre::Pass const*, bool, bool, Ogre::HashedVector<Ogre::Light*> const*) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#6  0x00007ffff650a679 in Ogre::QueuedRenderableCollection::acceptVisitorGrouped(Ogre::QueuedRenderableVisitor*) const ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#7  0x00007ffff650a78c in Ogre::QueuedRenderableCollection::acceptVisitor(Ogre::QueuedRenderableVisitor*, Ogre::QueuedRenderableCollection::OrganisationMode) const () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#8  0x00007ffff654c93a in Ogre::SceneManager::renderBasicQueueGroupObjects(Ogre::RenderQueueGroup*, Ogre::QueuedRenderableCollection::OrganisationMode) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#9  0x00007ffff654c7d7 in Ogre::SceneManager::renderVisibleObjectsDefaultSequence() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#10 0x00007ffff655e4a7 in Ogre::SceneManager::_renderScene(Ogre::Camera*, Ogre::Viewport*, bool) ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#11 0x00007ffff6398c3a in Ogre::Camera::_renderScene(Ogre::Viewport*, bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#12 0x00007ffff6524805 in Ogre::RenderTarget::_updateViewport(Ogre::Viewport*, bool) ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#13 0x00007ffff6524612 in Ogre::RenderTarget::_updateAutoUpdatedViewports(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#14 0x00007ffff6524363 in Ogre::RenderTarget::updateImpl() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#15 0x00007ffff6524d18 in Ogre::RenderTarget::update(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#16 0x00007ffff650e90f in Ogre::RenderSystem::_updateAllRenderTargets(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#17 0x00007ffff654551e in Ogre::Root::_updateAllRenderTargets() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#18 0x00007ffff6545610 in Ogre::Root::renderOneFrame() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#19 0x00007ffff7f3e120 in rviz::VisualizationManager::onUpdate() () at /opt/ros/noetic/lib/librviz.so
#20 0x00007ffff73d41d0 in QMetaObject::activate(QObject*, int, int, void**) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff73e13ee in QTimer::timeout(QTimer::QPrivateSignal) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00007ffff73d4bc5 in QObject::event(QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#23 0x00007ffff77d5a66 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#24 0x00007ffff77df0f0 in QApplication::notify(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#25 0x00007ffff73a880a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007ffff73ff780 in QTimerInfoList::activateTimers() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x00007ffff740006c in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#28 0x00007ffff45f817d in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#29 0x00007ffff45f8400 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#30 0x00007ffff45f84a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#31 0x00007ffff7400435 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#32 0x00007ffff73a73ab in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#33 0x00007ffff73af116 in QCoreApplication::exec() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#34 0x00005555555585e2 in main ()

@rhaschke
Copy link
Contributor

rhaschke commented May 8, 2023

I used the moving (but planar) map from map_test.py. Does your map strongly changes size between updates? Could you please provide a bagfile of your map topic?

@StefanFabian
Copy link
Contributor Author

StefanFabian commented May 8, 2023

Here's a 30s bag file 2023-05-08-19-21-33.zip.
I've also created a new workspace only overlaying the system ros path to exclude the possibility of our other packages interfering. Same result, after a few seconds of moving the mouse over the map it crashes.

@rhaschke
Copy link
Contributor

rhaschke commented May 8, 2023

Hm. Still can't reproduce your segfault. I guess you use the latest rviz from source?
Please also provide your OpenGL hardware details as printed by rviz on startup.

@rhaschke
Copy link
Contributor

rhaschke commented May 8, 2023

Still can't reproduce your segfault. I guess you use the latest rviz from source?

Same result with latest Noetic release. I suggest you prepare a self-contained environment (e.g. a docker image) to reliably reproduce the problem.

@StefanFabian
Copy link
Contributor Author

Hm. Still can't reproduce your segfault. I guess you use the latest rviz from source? Please also provide your OpenGL hardware details as printed by rviz on startup.

The issue contains the details for my laptop.
My desktop:

[ INFO] [1683566557.838896663]: rviz version 1.14.20
[ INFO] [1683566557.838937414]: compiled against Qt version 5.12.8
[ INFO] [1683566557.838947344]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1683566557.845395627]: Forcing OpenGl version 0.
[ INFO] [1683566558.512378558]: Stereo is NOT SUPPORTED
[ INFO] [1683566558.512423668]: OpenGL device: NVIDIA GeForce RTX 3070/PCIe/SSE2
[ INFO] [1683566558.512435728]: OpenGl version: 4.6 (GLSL 4.6).

Nvidia driver versions are 530 on the laptop 3060, and 515 on my desktop.

Still can't reproduce your segfault. I guess you use the latest rviz from source?

Same result with latest Noetic release. I suggest you prepare a self-contained environment (e.g. a docker image) to reliably reproduce the problem.

That seems like a lot of work for a minor change, I can also just continue using our fork if you think this change could have negative consequences.

@rhaschke
Copy link
Contributor

rhaschke commented May 8, 2023

We have seen many segfaults due to OpenGL rendering issues on NVIDIA hardware:
https://github.com/ros-visualization/rviz/issues?q=is%3Aissue+is%3Aopen+label%3Aopengl

On your laptop, you probably also have an integrated Intel GPU. Could you please check whether the bug persists on this hardware? To this end use the environment variables:

__NV_PRIME_RENDER_OFFLOAD=1
__GLX_VENDOR_LIBRARY_NAME=(mesa|nvidia)

Logically, the proposed change should not be necessary. If it is for you, I would like to understand why before merging.
As I said, this change introduces a one-frame delay, which should be avoided if possible.

@rhaschke
Copy link
Contributor

rhaschke commented May 9, 2023

I can reproduce the segfault with NVIDIA hardware. I will try to understand the problem now...

@StefanFabian
Copy link
Contributor Author

Any success at finding the cause?
If you think a 33ms delay (at the default 30fps limit) is that much of an issue, we could also switch to the nh_ that is not running on the main thread.

@rhaschke
Copy link
Contributor

I didn't find the root cause of the issue. I agree to merge this work-around. Please merge the suggestion I added before:
#1793 (comment)

@rhaschke rhaschke merged commit 93ac44b into ros-visualization:noetic-devel May 25, 2023
rhaschke added a commit to rhaschke/rviz that referenced this pull request May 15, 2024
The queued connection update introduced in ros-visualization#1793 caused the map display
to first update the pose of the map and then the map itself (in the next update cycle).
This can be perfectly seen when throttling the rviz frame rate.

Updates to the visualization should be handled in update() only.
Thus, now, if a map update is received, a flag is set to perform the costly map update.
If that flag is not set, only the transform is updated.
rhaschke added a commit to rhaschke/rviz that referenced this pull request May 15, 2024
The queued connection update introduced in ros-visualization#1793 caused the map display
to first update the pose of the map and then the map itself (in the next update cycle).
This can be perfectly seen when throttling the rviz frame rate.

Updates to the visualization should be handled in update() only.
Thus, now, if a map update is received, a flag is set to perform the costly map update.
If that flag is not set, only the transform is updated.
rhaschke added a commit that referenced this pull request May 15, 2024
The queued connection update introduced in #1793 caused the map display
to first update the pose of the map and then the map itself (in the next update cycle).
The resulting jittering can be perfectly seen when throttling the rviz frame rate.

Updates to the visualization should be handled in update() only.
Thus, now, if a map update is received, a flag is set to perform the costly map update.
If that flag is not set, only the transform is updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash when using raycast in tool while map is updated.
3 participants