Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concerns about performance and FixedUpdate #66

Closed
AlgoryxMartinT opened this issue May 23, 2018 · 11 comments
Closed

Concerns about performance and FixedUpdate #66

AlgoryxMartinT opened this issue May 23, 2018 · 11 comments

Comments

@AlgoryxMartinT
Copy link

AlgoryxMartinT commented May 23, 2018

Hi!

Earlier I had some thoughts about ROS - Unity communication performance (see #55), and I've now done some more testing and want to share the results. I also have some concerns about one test which I'd like some feedback on if possible.

Scripts

I am appending the scripts used in this issue: 2 C# Unity scripts using ROS#, 1 C++ and 1 CMakeLists.txt file to create a ROS package.

C#/Unity

Files: unity.zip

  • SimplifiedPublisher - Publishes a string. The publication can be set to occur either every time FixedUpdate is called; or to be related to a call to a trigger method. If the PublicationType enum is set to FixedUpdateTrigger, the trigger method will make sure the message is published the next time a FixedUpdate callback is called. If it is set to FastTrigger, the message will be published immediately.
  • SimplifiedPublishSubscriber - Calls the trigger method of the SimplifiedPublisher script when a string message is received.

C++/ROS

Files: ros.zip

  • listen_and_talk.cpp - ROS node that subscribes to the /unity_talker topic, and publishes "ros" to the /ros_talker topic each time a message is received.
  • CMakeLists.txt - CMakeLists

To compile into a ROS node (execute at the top level of a Catkin workspace):

cd src
catkin_create_pkg unity_listener std_msgs roscpp
cp [path to unity_listener.cpp] unity_listener/src
cp [path to CMakeLists.txt] unity_listener
cd ..
catkin_make

and run using

rosrun unity_listener unity_listener

Tests

I think all of these tests should be simple to reproduce using the scripts provided above. All test cases use a Unity scene with a single GameObject, which is described under each subsection. Test cases 2 and 3 also uses the ROS node provided.

All frequencies have been measured using the ROS command:

rostopic hz /topic_name

Test case 1: One-way communication

performance test cases - page 1

Unity scene

RosConnector GameObject with the following components

  • RosConnector
  • SimplifiedPublisher publishing the string "unity" to the /unity_talker topic. Configured to use the FixedUpdate publication type.
    case1

Results

I ramped down the time step using Edit -> Project Settings -> Time, and the maximum stable frequency achieved was about 4000 Hz.

Test case 2: Two-way communication outside of FixedUpdate

performance test cases - page 1 1

Unity scene

RosConnector GameObject with the following components

  • RosConnector
  • SimplifiedPublishSubscriber subscribed to the /ros_talker topic.
  • SimplifiedPublisher publishing the string "unity" to the /unity_talker topic. Configured to use the FastTrigger publication type.
    case2

ROS node

Uses the unity_listener node described under the Scripts section.

Notes

You might have to start the communication by manually sending a message to either topic, using for example:

rostopic pub /unity_talker std_msgs/String "data: 'xxx'" --once

Results

The frequency lies around 1000 Hz.

Test case 3: Two-way communication using FixedUpdate

performance test cases - page 1 2

Unity scene

Same as Test case 2, but with the SimplifiedPublisher configured to use the FixedUpdateTrigger publication type.
case3

ROS node

See Test case 2.

Results

The frequency never goes above 60 hz, even when the time step is ramped down. Setting a time step above 1/60 makes the frequency go down accordingly.

Analysis

Test case 1 shows that FixedUpdate is able to publish at an approximate rate of about 4000 messages/second. Test case 2 shows that it is possible to communicate between ROS and ROS# at a rate of about 1000 message/second. Therefore I do not understand why the results look like they do for Test case 3. I also tried to alter the SimplifiedPublisher script to publish to a second topic during Test case 3, and it was able to publish at a rate similar to Test case 1. Do you have any thoughts on this, or if there are other test cases I can make to shed some light?

@AlgoryxMartinT
Copy link
Author

Noticed I posted this on my work account. My private account is: https://github.com/hassanbot

@AlgoryxMartinT
Copy link
Author

Another observation: I tried changing the refresh rate of my screen during Test case 3, and the frequency of the messages changed accordingly. It seems like there's a connection to the Update() callback somewhere.

@MartinBischoff
Copy link
Collaborator

MartinBischoff commented May 25, 2018

The only idea I have is that physic timing is deactivated as you are having an empty scene.
Would you put a rigidbody in with isKinematic disabled and check if results are the same?

You can also try and simplify the test: Remove ROS# and call a Debug.Log message every FixedUpdate() instead. It should also have that 1/60 frequency if there's a connection to the Update() callback somewhere.

@gris-martin
Copy link
Contributor

I don't think it has to do with FixedUpdate() being called too infrequently, since in Test case 1 I am able to get to about 4000 Hz using the FixedUpdate() callback. I also tried to advertise a second topic in Test case 3 during FixedUpdate, that published as fast as possible without waiting for something to be returned, and it was able to publish at at least 1000 Hz.

I also tried to alter Test case 2 to publish sensor_msgs/Image, and when the data byte length was less than 457 bytes, the rate was about 850 Hz (slowly declining with increased size), but when going above 457, the rate immediately decreased to about 22.7 Hz. Something's off here...

@gris-martin
Copy link
Contributor

gris-martin commented May 30, 2018

I have a theory about the problem now: Nagle's algorithm of TcpClient (can be disabled with the NoDelay property). It seems that others have had similar problems when using WebSockets (eclipse-mosquitto/mosquitto#433). There does not seem to be an option in the WebSocketSharp library to enable the NoDelay property, but there is an open issue: sta/websocket-sharp#327. Maybe the option is available in one of the new libraries discussed in #59.

@gris-martin
Copy link
Contributor

gris-martin commented May 30, 2018

Seems like part of the problem was on the ROS side of things. When I altered my ROS node subscriber to use the tcpNoDelay() transport hint (http://wiki.ros.org/roscpp/Overview/Publishers%20and%20Subscribers#Transport_Hints) I got results almost as expected. At least when either sending small byte arrays (that fits within one WebSocket package of length 1078 bytes), or large ones (that requires three or more such WebSocket packages). When sending packages that require exactly two WebSocket packages, however, I still get the drop in frequency.

So even though I'm now able to work around the problem, I think it might still be worth looking into the possibility of using the NoDelay property on the ROS# side too somehow, but maybe as another issue.

@MartinBischoff If you have nothing more to add I consider this issue solved.

EDIT: Sorry, I spoke too soon. These last two comments has nothing to do with Test case 3, which is what this issue is really about. That problem is still present, even after altering the ROS side subscriber. It would be interesting to see if the RosBridgeClient refactoring and using other communication libraries will have any effect.

@MartinBischoff
Copy link
Collaborator

@hassanbot please feel free to adjust your test scripts to use the new RosBridgeClient code.
Though we have not started updating the Unity3D project yet, the refactoring of RosBridgeClient itself is completed (#59).

The ClientWebSocket ReceiveAsync method requires to specify a buffer size for the answer. This parameter is missing in websocket-sharp (at least I never set it). Message buffer size might be the reason for this issue we are having here.

@MartinBischoff
Copy link
Collaborator

MartinBischoff commented Oct 8, 2018

We also performed some performance tests and noticed that the WebSocketNetProtocol has ratehr strong speed fluctuations. WebSocketSharpProtocol by contrast has a relatively constant perfomance with transfer rates which are qual to the best rates of WebSocketNetProtocol.

We want to learn more about the reason of these speed fluctuations: Is it due to our WebSocketClient implementation? Or due to the ClientWebSocket class?

@viliwonka
Copy link

viliwonka commented Oct 3, 2019

We also performed some performance tests and noticed that the WebSocketNetProtocol has ratehr strong speed fluctuations. WebSocketSharpProtocol by contrast has a relatively constant perfomance with transfer rates which are qual to the best rates of WebSocketNetProtocol.

We want to learn more about the reason of these speed fluctuations: Is it due to our WebSocketClient implementation? Or due to the ClientWebSocket class?

Lots of Garbage Collection going on, this is what can cause enormous performance spikes, since C# program has to be stopped in order to collect all the garbage.

It's produced by JSON serializator.

Or maybe I missed the point, since it's not about serializator but socket client.

@gris-martin
Copy link
Contributor

gris-martin commented Mar 26, 2020

Hi again! Almost 2 years later and I now have a pretty good understanding of what's going on in the examples. The main issue is that Unity's FixedUpdate does not run at a fixed interval; i.e. there is not a fixed time between the start of one FixedUpdate and the next one. Instead the FixedUpdate callbacks are run in batches at the start of each frame, enough times to make the mean frequency of the callback calls correspond to the desired FixedUpdate frequency (1/fixedDeltaTime unless timeScale is used). It should work something like this:

float timer = 0;
while(true)
{
     while(timer > fixedStep)
    {
         FixedUpdate();
         timer -= fixedStep;
     }
     Update();
     timer += deltaTime;
}

The code is taken from this forum post, which also has more discussion on the topic.

Test cases

I have drawn some pictures to try and visualize what's going on in the different test cases. Each picture corresponds to 3 frames, and show the activity in different threads and processes.

Legend

Test case 1: One-way communication

In the first example, each FixedUpdate call sends a message which is picked up by rostopic hz. As long as fixedDeltaTime is high enough (i.e. the desired FixedUpdate frequency is low enough), all FixedUpdate calls will happen at the start of the frame on the main thread. At the end of the frame Unity will perform rendering of active cameras.

In the picture, if the fixedDeltaTime is set to 1 / 180 s (180 Hz), each fram would take 1 / 60 s (60 Hz). In reality the length of each frame varies slightly (or a lot), and the deltaTime might not be evenly divided by fixedDeltaTime, so the number of FixedUpdate calls vary between frames (but the mean frequency of FixedUpdate calls is still the same).

example_1

In this example, rostopic hz sees 9 ROS messages, but it is dependent on the number of FixedUpdate calls (i.e. the fixedDeltaTime).

Test case 2: Two-way communication outside of FixedUpdate

In this example, FixedUpdate is not involved at all, but all of the communication is done on a separate thread. The ROS# client listens for messages on a separate thread, and then calls the Receive callback (from that thread) when a message is received. The Receive callback then calls the ExternalTrigger method, which publishes a new message (still on the same thread).

The Unity main thread is never involved, and the FixedUpdate callback only consists of overhead and some if-statements.

example_2

In this highly hypothetical scenario, rostopic hz receives 11 messages during the same time, and it is completely decoupled from fixedDeltaTime.

Test case 3: Two-way communication using FixedUpdate

In this example, the Unity side Receive method (yellow) just sets a variable that signals that the Publisher should send a message the next time FixedUpdate is called. However, since all the FixedUpdate calls happen in a row (and since they do practically nothing), once a message has been published from Unity, all of the remaining FixedUpdate calls for a particular frame happens before the response from the ROS node reaches the Receive function. Therefore the next publish will not happen until the start of the next frame.

This is where I misinterpreted things, because I thought the FixedUpdate callbacks would be spread out evenly, and not executed in a batch at the start of each frame.

example_3

Here, rostopic hz only see 3 messages, and that they're dependent on the number of frames for the time span. This is consistent with the results from the original post in this thread.

Solutions

I have found some solutions/workarounds that can be used to get around this problem, which are presented below.

Solution 1: Increase frame rate

If the frame rate is increased enough to only have 1 FixedUpdate callback for each frame, the setup in test case 3 can be used. This can be done by disabling v-Sync:

QualitySettings.vSyncCount = 0;

This is the method I have been using, even though I'm not fond of it.

Solution 2: Run simulation on a separate thread

One solution would be to run the physics simulation on a separate thread, and then do the communication similarly to test case 2. This could work for me, since I'm using an external physics engine (AGX Dynamics), but I don't think it's feasible for Unity's built-in physics (PhysX). The built-in physics is executed automatically on the main thread at the same rate as FixedUpdate (see Unity's description of the order of execution for event functions).

Solution 3: Make FixedUpdate wait for a new signal

By making FixedUpdate lock the main thread while waiting for a new message, we can make sure that all the FixedUpdates for a particular frame aren't executed before an answer is received (as in test case 3). This is probably the best approach, if you can't run the simulation as a separate thread.

I made an image of what this scenario could look like as well:

solution

Here the number of messages that rostopic hz sees is again dependent on the fixedDeltaTime (as in test case 1).

Conclusion

The reason I did these test cases was that I wanted to make a simulation in Unity, with a controller running in a separate process, and get a synchronous loop (i.e. step Unity simulation, send sensor signals to controller, step controller, send control signals to Unity simulation, step Unity simulation, and so on). I now have a pretty good understanding of how this can be done, and why my first attempts were unsuccessful, so I think this issue can now be closed.

Thanks again for a great library!

//Martin (AlgoryxMartinT/hassanbot)

@MartinBischoff
Copy link
Collaborator

Hi @hassanbot ,

thank you for reporting your studies and your progress to this issue!

I'd suggest to go for Option 2. It works very well with Unity's built-in physics engine as follows:

  • set Physics.autoSimulation = false
  • trigger the next physics step via Physics.Simulate().
  • use an own timer to tigger physics similar to your code snipplet above

I did that in a non-public project to set up a co-simulation with some automation development software of my company. On the unity side the 3D Mechanics are calculated. If the AGX dynamics works similar to NVidia PhysX in Unity, we could easily switch between both physics engines depending on the application-specific needs.

Though I find this topic very interesting I agree that we have resolved this issue.

Thank you again for supporting this project!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants