-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Ringbuffer for Input&output Buffer #157
Conversation
The perf is not good after this change. Maybe we need change output to a ringbuffer as well. Not sure. please review. |
i am thinking we may want to use callback (IOCompletionPort) in USB stack. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look fine to me. I tested the code on I7-3770 . It runs faster by some units%
There is still the #158 problem that does not depend on this code
unfortunately, it is slower than before on my laptop, which is a I7-8665. I
need more testings. and also waiting the fix of dynamic extio_len.
Oscar Steila <[email protected]>于2021年1月19日 周二下午5:30写道:
… ***@***.**** approved this pull request.
Look fine to me. I tested the code on I7-3770 . It runs faster by some
units%
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#157 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAF3GRFHYMQGXFM3OIQWXTDS2VGNBANCNFSM4WCOO5UQ>
.
|
Please review again. Now it doesn't use dynamic ext_blocklen. so it should work with the current HDSDR. I need help to validate the performance. On my laptop, i cannot see much difference as 32M cannot work well anyway. |
I made some test on I7-3770 ( my Laptop has still temperature problem :-) |
I made a test on my laptop with SR 8 M. The v1.1.0 is faster 23-24 % vs 29-30% of this branch |
no intention to commit as perf regression. |
Thank you for the testing. The result is actually expected. The goal for
this PR is having the input and output decoupled so that I can add the
functions to support multi channels in a cleaner way.
Since we have more threads here and in order to coordinate between threads,
I added some busy loop in the code. I defined the following: The number may
need to adjust to reduce CPU usage. As far as we don't see the overall perf
slow down, it is fine to use a bit more CPU.
const int spin_count= 1000000;
…On Mon, Apr 26, 2021 at 12:24 AM Oscar Steila ***@***.***> wrote:
I made a comparison of the action compiled windows vs the master. I'm
using the old I7-3770 :-)
[image: justarun]
<https://user-images.githubusercontent.com/9883800/116001110-3c2ab580-a5f3-11eb-8719-479d76dcb6c9.jpg>
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#157 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAF3GRHXXIQVZIBDUY6MYK3TKQ64DANCNFSM4WCOO5UQ>
.
--
-Howard
|
Hi Oscar,
I added one more change to reduce spin_count to 100 which seems to help CPU
usage a bit. It will be great if you can test the 64M bandwidth to see if
there is any regression. The focus here is getting the current
implementation to a pipeline solution so that I can add more processing
into the pipeline without complicating the code too much.
I plan to add more channels as the next step so that one channel will be
processed by one thread for iFFT. and also supporting the sample rate down
to 48Khz with some software decimate after the current fft approach. The
decimate will be processed in another thread as well. This requires the
whole process in the pipeline fashion.
…On Mon, Apr 26, 2021 at 8:43 AM Howard Su ***@***.***> wrote:
Thank you for the testing. The result is actually expected. The goal for
this PR is having the input and output decoupled so that I can add the
functions to support multi channels in a cleaner way.
Since we have more threads here and in order to coordinate between
threads, I added some busy loop in the code. I defined the following: The
number may need to adjust to reduce CPU usage. As far as we don't see the
overall perf slow down, it is fine to use a bit more CPU.
const int spin_count= 1000000;
On Mon, Apr 26, 2021 at 12:24 AM Oscar Steila ***@***.***>
wrote:
> I made a comparison of the action compiled windows vs the master. I'm
> using the old I7-3770 :-)
> [image: justarun]
> <https://user-images.githubusercontent.com/9883800/116001110-3c2ab580-a5f3-11eb-8719-479d76dcb6c9.jpg>
>
> —
> You are receiving this because you modified the open/close state.
> Reply to this email directly, view it on GitHub
> <#157 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAF3GRHXXIQVZIBDUY6MYK3TKQ64DANCNFSM4WCOO5UQ>
> .
>
--
-Howard
--
-Howard
|
Please also focus on the performance in additional to CPU usage. This version still cannot play 64M well on my laptop. If no objection, i will first commit this version and start break current r2iq into 3 stages (or maybe 3). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Bravo
thank you for all your testing. My next PR will be even bigger in terms of changes. |
No description provided.