cnetthinner seg faults #4883
Comments
I have a somewhat older version of the failing network, with more images and points and slightly fewer (-13k) measures than the failing network in the original post, that cnetthinner ran on successfully under isis7.1.0 (using ~75G over 7.5 hours). In theory the network causing the seg fault should run, so I'm wondering if it is corrupt in some way despite other programs running on it (cnetstats, cnetcheck, jigsaw w/ limited solve-for parameters set). If there is interest in the successful network let me know. |
Hi @lwellerastro, I was able to recreate a segfault error with the network referenced in the initial post, but not necessarily the same one, as I only received a segfault error without these print statements:
Possibly due to a memory issue as I tried running this on my machine. Could I see the successful network you mentioned above? |
Hi @chkim-usgs, I created a subdirectory named SuccessfulNetwork/ in the directory mentioned above (will edit that to remove some detail). It appears I ran cnetthinner a little differently as far as min/maxpoints are concerned, but I don't think that's what caused the other version of the network to fail. The input and output networks are under SuccessfulNetwork/, as well as the print.prt. Here's my command: I had to send this to the cluster to get adequate memory to run and you will have to do the same. This successful run used about 75G of memory and ran for over 7 hours. Please see proc.scr in the directory to see how to send it there in a single command. I think you should have access even without a directory on /scratch, which is not needed here. You will need to be on an astro machine such as astrovm4 or astrovm5 in order to use the cluster. Those machines themselves have limited/insufficient memory and are shared resources, so they should not be used to run this particular program and network directly. |
Based on comments in #5354, I ran cnetedit on the failing network and cnetthinner now runs successfully on it (74G of memory, 8.5 hours). It seems the bug has been identified; cleaning up invalid points worked around the issue. A clean network exists under my user work area Isis3Tests/CnetThinner/CleanNet/SouthPole_2020_Merged_Lidar2Image_redo12_Edit.net |
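The workaround described above (clean the network with cnetedit, then thin the cleaned network) can be sketched as a two-step command sequence. This is only a minimal sketch: the exact commands are not shown in the thread, so the `cnet=`/`onet=` parameter names, the lack of any other options (including min/maxpoints), and all file names below are assumptions for illustration. The helper only builds and prints the commands; it does not run them.

```python
import shlex


def workaround_commands(failing_net, cleaned_net, thinned_net):
    """Build the two-step sequence: cnetedit to clean, then cnetthinner.

    Assumption: both programs take the input network as cnet= and the
    output network as onet=. The options actually used are not given in
    the thread, so none are added here.
    """
    clean = ["cnetedit", f"cnet={failing_net}", f"onet={cleaned_net}"]
    thin = ["cnetthinner", f"cnet={cleaned_net}", f"onet={thinned_net}"]
    return [clean, thin]


# Hypothetical file names, for illustration only.
commands = workaround_commands("failing.net", "failing_edit.net", "failing_thin.net")
for cmd in commands:
    print(shlex.join(cmd))
```

The key point is the ordering: the output of the cleaning step is the input of the thinning step, and a job of this size would still need to be sent to the cluster for memory.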
I am preparing to submit a PR for this with the app converted to a callable
function and old Makefile tests converted to gtests as well.
|
I agree that the network should be tested as is when the fix for #5354 becomes available, just in case the seg fault I encountered is not related to that bug. |
I tested the originally posted network under the newly released isis8.0.2 and cnetthinner continues to segfault and dump a core despite the changes via #5354, so my recent success was a fluke. This is still low priority and perhaps not worth the effort, since there is a workaround (running cnetedit on the network resulted in a successful cnetthinner run) and this particular network has a complicated past while new, improved versions currently exist. I personally think it's ok to close this post and re-open it in the future if a different network runs into a similar issue. I'll leave it open for a bit in case there are any objections to closing. |
There is no longer a need to keep this open, given the workaround and having moved away from using this particular network for any products. |
ISIS version(s) affected: 5.0.2, 6.0.0
Description
cnetthinner segfaults and dumps an enormous core (20 G+) after "reading control points" and "adding control points to network".
This is a very large network, so I initially sent the job to the cluster and asked for all of mem1 since it has the most memory (365 G). The program segfaults in about 1-1.5 hours and only maxes out at about 25 G. I have also run the program directly on astrovm4, which was using the same amount of memory when the program seg faulted, so at this point it does not appear to be a memory problem. That being said, if and when this program is able to operate properly on this particular network, it must be run on mem1, and I think we will run up against memory issues because a smaller yet very large Kaguya TC network of the south pole used 265 G for cnetthinner to run.
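One way to separate "killed by a segmentation fault" from "ran out of memory" is to record the child's exit signal and its peak memory together. The following is a minimal sketch, assuming a POSIX system and Python's standard `resource` module; the child process here is a stand-in that deliberately sends itself SIGSEGV, not cnetthinner, and `run_and_report` is a hypothetical helper name.

```python
import resource
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (segfaulted, peak_rss) for the finished child."""
    proc = subprocess.run(cmd)
    # On POSIX, a negative return code means the child was killed by that signal.
    segfaulted = proc.returncode == -signal.SIGSEGV
    # Peak resident set size among waited-for children
    # (reported in KiB on Linux, in bytes on macOS).
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return segfaulted, peak_rss


# Stand-in child that raises SIGSEGV on itself, for illustration only.
demo = [sys.executable, "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
segfaulted, peak_rss = run_and_report(demo)
print(segfaulted, peak_rss)
```

If the peak memory at the time of the crash is far below what the machine offers, as reported above (about 25 G on a 365 G node), memory exhaustion becomes a less likely explanation.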
How to reproduce
The network is available under my user work directory Isis3Tests/CnetThinner/.
Additional context
This network was recently made available via #3871 as a semi-successful LROC network where jigsaw can solve for camera velocities but is unable to solve for camera velocities (or acceleration) and spacecraft position. I was trying to troubleshoot the network outside of jigsaw and was running various programs when I tried cnetthinner and found it does not work (cnetstats, cnetcheck and some other network-oriented programs run fine). Cnetthinner fails on both the redo12 and redo13 networks that have been recently listed in the jigsaw post.
I am hoping it might be easier to find what the problem is with this network via cnetthinner, since it gets right to the point and might be easier to debug for problems than jigsaw.