Extra characters on the end of menu.json cause Home Page to be blank #3487

Closed
Pseudonym77 opened this issue Mar 8, 2023 · 31 comments

@Pseudonym77

Everything looked fine until I put my Raspberry Pi 4 IIAB into a new case. When I reconnected afterward, content would not display on the home page. Everything on the admin pages appears to be functional and accessible, but the home page content is empty regardless of device or browser.

Diagnostics link: http://sprunge.us/4HdW7J

I’ve rebooted, updated, upgraded, git pulled, hard reset, changed and saved settings via the admin portal, downloaded content, and tried various USB slots for my SSD, but nothing changes.

(two screenshots attached)

holta added this to the 8.1 milestone Mar 8, 2023
@holta
Member

holta commented Mar 8, 2023

@Pseudonym77 http://sprunge.us/4HdW7J?en#n-570 suggests there may be 3 extraneous lines at the bottom of your /library/www/html/home/menu.json

Was this file manually edited possibly? (Any idea where Lines 570-572 came from?)

 566     "en-edison",
 567     "en-credits"
 568   ]
 569 }
 570 "
 571   ]
 572 }
 573 
 574 
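A quick way to confirm this kind of damage, as a minimal sketch: Python's standard `json` module rejects anything that follows the first complete value and reports where it starts. The helper name `check_menu_json` is hypothetical and not part of IIAB; only the default path comes from this ticket.

```python
import json

def check_menu_json(path="/library/www/html/home/menu.json"):
    """Return None if the file parses as JSON, else a short error description."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        # e.lineno and e.colno are 1-based and point at the first offending character
        return f"{e.msg} at line {e.lineno}, column {e.colno}"
    return None
```

On a file damaged like the listing above, this would be expected to report extra data starting at the line that holds the stray double-quote.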

@Pseudonym77
Author

@holta no idea. I haven't edited anything. The only other thing I left out was it may have been unplugged during startup. Any thoughts on what could have caused that?

I removed those lines and it fired right up. THANK YOU!

@holta
Member

holta commented Mar 8, 2023

@Pseudonym77 priceless feedback, profound thanks for writing.

(I hope @tim-moody can look into possible root causes this week, and if so hopefully you can help him corroborate what exactly may have caused this!)

@holta
Member

holta commented Mar 8, 2023

@tim-moody can you see anything in the Admin Console CMDSRV log below, that might have caused this?

(Lines 2068 to 2887 here...)

http://sprunge.us/4HdW7J?en#n-2068

@tim-moody
Contributor

Two lines look relevant:

2164 Feb 23 15:53:52 box IIAB-CMDSRV[1115]: IIAB-CMDSRV : Received CMD Message INST-PRESETS {"preset_id":"en-school-256-base"}.
2165 Feb 23 15:53:54 box IIAB-CMDSRV[1115]: IIAB-CMDSRV : Error: Map Tile Set is not in catalog in Command INST-OSM-VECT-SET.

I would speculate that the en-school-256-base preset has an error in its list of osm tiles, and the extra junk got added by a failed update menu. I'll have to investigate further.

But I don't understand how the menu ever worked before you moved to a new case.

@tim-moody
Contributor

The map tiles in the en-school-256-base preset are fine.

Looking a little further I find it hard to reconstruct what went on here. Two presets were installed plus a great deal of other content. I notice that modules en-osm-omt-central-am and en-worldmap were requested. I think these should be removed from the catalog as they are really covered by osm maps and are probably outdated.

I wonder if there were uncompleted jobs when the case was changed that started up again and caused the problem, but I only see RUN-ANSIBLE-ROLES restarted, and it failed, followed by reboot.

7 other jobs failed as well, 3 map satellite tiles and 4 modules.

Not sure I would be able to reproduce this.

@holta
Member

holta commented Mar 8, 2023

At first I was wondering if an Admin Console CMDSRV job was interrupted, by a loss of power or similar? But that idea doesn't seem to explain... extra lines in menu.json 🤔

The dangling double-quote on Line 570 strongly suggests... a string parsing error... possibly while handling a poorly formed string ???

@jvonau
Contributor

jvonau commented Mar 8, 2023

Things to note:
The hash 25e2319effa91eb95edd9d9209eb9f8a584d67be is part of the arm64 branch of pi-gen, yet there are apt packages with 'armhf' during 'install_prereq install' of asterisk. That is surely wrong: the arch should be arm64.

install_date = 2023-02-23T14:29:45Z is way before the git hash of 0292fad, upgrades are not officially supported according to Mr Holt.

/etc/iiab/iiab_local.yml looks to be edited or removed at some point as there are only entries present from admin-console
Edit: I should have said local_vars.yml. iiab_state looks fine, as its order appears to match the order of installation in the logs.

iiab-apps-to-be-installed :
munin
pbx
samba
There is a failure to install asterisk & samba in iiab-debug.log and summarized at line 3457 COMMAND: /usr/bin/grep -B2 "SEE ERROR ABOVE" /opt/iiab/iiab/*.log # for skip_role_on_error

Errors related to dpkg during asterisk install which I presume to be during RUN-ANSIBLE-ROLES, job_id: 127

@tim-moody
Contributor

> Errors related to dpkg during asterisk install which I presume to be during RUN-ANSIBLE-ROLES, job_id: 127

I think that assumes ansible is installed and does not try to install it.

But the other observations are relevant at least to the point that this would be very difficult to reproduce.

My best candidate for messing up menu.json would be update menus, which runs at the end of many downloads and was also run twice from Adm Cons.

@holta
Member

holta commented Mar 8, 2023

> there are apt packages with 'armhf' during 'install_prereq install' of asterisk, that is surely wrong as the arch is wrong, should be arm64

A separate ticket should likely be opened to investigate http://sprunge.us/4HdW7J?en#n-3460

(Even if Asterisk & FreePBX don't really belong on Raspberry Pi, they did install successfully in the past, so this might turn out to be very easily fixable.)

Just FYI Asterisk & FreePBX were installed successfully on Debian 11 as part of IIAB on x86_64 just last week (i.e. on a regular PC/laptop/VM).

@tim-moody
Contributor

Re: asterisk, I read it as ansible, so disregard my comment.

@jvonau
Contributor

jvonau commented Mar 8, 2023

> Errors related to dpkg during asterisk install which I presume to be during RUN-ANSIBLE-ROLES, job_id: 127

> I think that assumes ansible is installed and does not try to install it.

That doesn't compute for me: ansible is installed before admin-console is run, and the entire iiab-install step uses ansible. Never mind, I see the clarification.

> there are apt packages with 'armhf' during 'install_prereq install' of asterisk, that is surely wrong as the arch is wrong, should be arm64

> But the other observations are relevant at least to the point that this would be very difficult to reproduce.
> A separate ticket should likely be opened to investigate.

A scrape of /etc/apt/sources.list(.d)/ might be revealing; something is not right.

> My best candidate for messing up menu.json would be update menus, which runs at the end of many downloads and was also run twice from Adm Cons.

I'll leave that on your plate.

> (Even if Asterisk & FreePBX don't really belong on Raspberry Pi, they did install successfully in the past, so this might turn out to be very easily fixable.)

Until then put some user protection to prevent the installation on RasPiOS, or perhaps fiddle with version 18 or 20 to see how that goes.

> Just FYI Asterisk & FreePBX were installed successfully on Debian 11 as part of IIAB on x86_64 just last week (i.e. on a regular PC/laptop/VM).

Post the logs. The difference would be in the result of install_prereq install; http://sprunge.us/4HdW7J?en#n-3460 has the armhf entries noted above.

@holta
Member

holta commented Mar 8, 2023

> Until then put some user protection to prevent the installation on RasPiOS

After investigation yes, that might be necessary.

> Just FYI Asterisk & FreePBX were installed successfully on Debian 11 as part of IIAB on x86_64 just last week (i.e. on a regular PC/laptop/VM).

> Post the logs. The difference would be in the result of install_prereq install; http://sprunge.us/4HdW7J?en#n-3460 has the armhf entries noted above.

I don't have the logs, as this was done by a friend overseas. But definitely worth looking into, on this separate ticket:

@tim-moody
Contributor

I don't find anywhere that the menu.json file is written without using the json library to serialize, so I don't think there is any code that would corrupt the file.

However, more than one instance of update menus can run at the same time, and this could possibly corrupt the file. So adding file locking as in make kiwix lib might help.
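A minimal sketch of what such locking could look like, assuming a POSIX system and Python's standard `fcntl` module. The function name `write_menu_locked` and the `.lock` sidecar file are hypothetical and are not taken from IIAB's code; how the kiwix library script actually locks is not shown in this ticket. Advisory locks like this only help if every writer takes the same lock.

```python
import fcntl
import json

def write_menu_locked(menu, path, lock_path=None):
    """Serialize `menu` to `path` while holding an exclusive advisory lock."""
    lock_path = lock_path or path + ".lock"   # hypothetical sidecar lock file
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            with open(path, "w", encoding="utf-8") as f:
                json.dump(menu, f, indent=2)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

In practice the lock would probably need to cover the whole read-modify-write cycle of an update, not just the final write, so that two updates cannot both start from the same stale copy.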

@holta
Member

holta commented Mar 11, 2023

> However, more than one instance of update menus can run at the same time, and this could possibly corrupt the file. So adding file locking as in make kiwix lib might help.

Great that we have a pattern to potentially watch for 🕵️

Thanks for having explained!

(Interesting that locking isn't built in as part of the json library primitives that edit /library/www/html/home/menu.json.)

@jvonau
Contributor

jvonau commented Apr 3, 2023

2023-03-07 20:28:22,373 p=6124 u=root n=ansible | TASK [samba : Install 4 packages: samba, samba-common, smbclient, cifs-client] ***
2023-03-07 20:28:34,272 p=6124 u=root n=ansible | fatal: [127.0.0.1]: FAILED! => {"cache_update_time": 1678228358, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"       install 'samba=2:4.13.13+dfsg-1~deb11u5' 'samba-common=2:4.13.13+dfsg-1~deb11u5' 'smbclient=2:4.13.13+dfsg-1~deb11u5'' failed: E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem. \n", "rc": 100, "stderr": "E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem. \n", "stderr_lines": ["E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem. "], "stdout": "", "stdout_lines": []}
2023-03-07 20:28:34,277 p=6124 u=root n=ansible | PLAY RECAP *********************************************************************
2023-03-07 20:28:34,278 p=6124 u=root n=ansible | 127.0.0.1                  : ok=82   changed=18   unreachable=0    failed=1    skipped=26   rescued=2    ignored=0   

The above error from 4HdW7J is in iiab-debug.log, recorded during an admin-console run. The samba and cups roles need the same 'rescue' treatment as the rest of the roles have.

@madlaa

madlaa commented Apr 11, 2023

First of all, I have had a lot of fun with IIAB. Thank you for creating and maintaining such a great repository! 😄

I have experienced the same issue that @Pseudonym77 reports with my most recent IIAB install.

The /library/www/html/home/menu.json file contained some extra lines that broke the /home page in an identical manner as reported. I have not edited the file manually as part of the install. See screenshot.

image

After editing the file everything works as expected.

I hope file locking fixes the issue for future installs.

Keep up the good work. 👍

@tim-moody
Contributor

The problem is that locking the file as a solution is at best a wild guess. I need to figure out what was different the two times the file was corrupted from the hundreds of times it wasn't. Can you run iiab-diagnostics from a terminal prompt and paste the link into this ticket?

Was there a point in time when the menu worked, and then some event you can associate with it no longer working? A number of the zims you have are not commonly used and have no menu definition out of the box (several en-zimgit, wikiwel, zdoom). Did you download those using Admin Console? And did they display properly on the home menu? (The menu definition would have been manually generated and automatically added to the menu.)

@holta
Member

holta commented Apr 13, 2023

> Can you run iiab-diagnostics from a terminal prompt and paste the link into this ticket?

@madlaa can you run iiab-diagnostics ?

To help us understand, Thanks if possible!

🩺

@tim-moody
Contributor

Thinking about this some more. In the first example it looked like extra lines were being added to the json file. In the second example it is clear that there are extra bytes on the end, but not complete lines. So I am now wondering if rather than junk being written on the end, it is junk not being truncated when a shorter json string is written over a longer one. Usually the menu grows rather than shrinks, so this problem would only happen occasionally. Perhaps a file open statement has the wrong mode, or the file is not being properly closed.

@tim-moody
Contributor

Well, I don't find anywhere that this file is written without using with open, so close should be implicit, and the mode is always 'w', so truncation is supposed to be done automatically.
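One way the two ideas in this ticket could fit together, offered as a sketch and not as a confirmed root cause: mode 'w' truncates only at open time, so if two writers both open the file before either writes, the shorter document can land on top of the longer one and leave its tail behind. The snippet simulates that ordering in a single process on a POSIX system; the function name and the sample JSON are made up for illustration.

```python
import os
import tempfile

def overlapping_writes(path, first_text, second_text):
    """Simulate two writers that both open `path` with mode 'w' before either writes."""
    writer_a = open(path, "w", encoding="utf-8")  # truncates to 0 bytes
    writer_b = open(path, "w", encoding="utf-8")  # truncates again, still 0 bytes
    writer_a.write(first_text)
    writer_a.close()   # flushes first_text starting at offset 0
    writer_b.write(second_text)
    writer_b.close()   # writer_b's own offset is still 0, and nothing truncates now
    with open(path, encoding="utf-8") as f:
        return f.read()

long_json = '{"items": ["en-edison", "en-credits"]}\n'
short_json = '{"items": ["en-credits"]}\n'
path = os.path.join(tempfile.mkdtemp(), "menu.json")
result = overlapping_writes(path, long_json, short_json)
# result starts with short_json, followed by the leftover tail of long_json
```

If something like this is what happened, it would also explain why the problem is rare: it needs two overlapping runs and a menu that shrank between them.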

@holta
Member

holta commented Apr 14, 2023

I just posted this issue to "Known Issues" here:

https://github.com/iiab/iiab/wiki/IIAB-8.1-Release-Notes#known-issues

(To make its manual workaround more visible, and to encourage anyone affected to run iiab-diagnostics to help solve this for everyone hopefully soon.)

@tim-moody
Contributor

Here's what I propose, at least for now.

  1. Add utility to Repair Menu File - reads file, tries to strip off extra bytes, does json loads to see if OK and if so writes to file.
  2. Display message on menu that file is corrupted.
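The first step could be sketched roughly as follows, assuming the damage is always trailing junk after one complete JSON value. The name `repair_menu_file` is hypothetical; `json.JSONDecoder().raw_decode` is the standard-library call that parses a single value and reports where it ended.

```python
import json

def repair_menu_file(path):
    """Keep only the first complete JSON value in `path`.

    Returns True if the file was rewritten, False if it was already valid.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        json.loads(text)
        return False  # nothing to repair
    except json.JSONDecodeError:
        pass
    stripped = text.lstrip()
    # raw_decode parses one JSON value and reports where it ended,
    # ignoring whatever follows; it raises if the start itself is damaged
    _menu, end = json.JSONDecoder().raw_decode(stripped)
    cleaned = stripped[:end]
    json.loads(cleaned)  # final check before touching the file
    with open(path, "w", encoding="utf-8") as f:
        f.write(cleaned + "\n")
    return True
```

Slicing the original text keeps the existing formatting of the file instead of re-serializing it. If the file is damaged in some other way (for example cut off in the middle), the function raises and leaves the file untouched.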

@holta
Member

holta commented Apr 16, 2023

Why not?

Can simple validation of /library/www/html/home/menu.json as a legal JSON file be automated in the interim?

@tim-moody
Contributor

> Can simple validation of /library/www/html/home/menu.json as a legal JSON file be automated in the interim?

where and to what end? silently fixing the current problem is pretty easy, though I am not fond of the silent part.

@tim-moody
Contributor

except json.JSONDecodeError as e:
    fixed_str = e.doc[:e.pos-1]
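For context on that slice, a small self-contained sketch (the function name `strip_extra_data` is hypothetical): when `json.loads` fails because of trailing data, `e.pos` is the index of the first non-whitespace character after the valid value. With a newline between the closing brace and the junk, as in the first report, cutting at `e.pos-1` or at `e.pos` both leave valid JSON. If the junk followed the brace directly, `e.pos-1` would also cut off the brace, so this sketch cuts at `e.pos` and re-validates.

```python
import json

def strip_extra_data(text):
    """Return `text` cut at the position where json.loads reports a problem."""
    try:
        json.loads(text)
        return text
    except json.JSONDecodeError as e:
        # e.pos indexes the first non-whitespace character after the valid value
        candidate = e.doc[:e.pos]
        json.loads(candidate)  # raises again if the damage was something else
        return candidate
```

The second `json.loads` call means a file that is broken in some other way still raises instead of being silently "fixed".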

@holta
Member

holta commented Apr 16, 2023

> Can simple validation of /library/www/html/home/menu.json as a legal JSON file be automated in the interim?

> where and to what end? silently fixing the current problem is pretty easy, though I am not fond of the silent part.

I'm not quite following you.

Is automatic detection of the problem (i.e. validation of some kind) possible or not really?

@tim-moody
Contributor

I can now patch (not repair) menu.json on the fly. Not sure how loud to make the warning. could print something on the home page or do an alert. The latter could get tiresome, but an alert would direct the user to Adm Cons to do a repair.

@holta
Member

holta commented Apr 16, 2023

> print something on the home page or do an alert

Whatever's simplest/loudest is probably fine.

i.e. If the home page is confirmed to be broken[*] then any set of tips-or-instructions explaining how to recover will do.

[*] Or am I misunderstanding, e.g. are you working towards a rough heuristic instead, indicating that home page might be broken?

@holta
Member

holta commented Apr 17, 2023

@holta
Member

holta commented Apr 18, 2023

We might want to re-open this ticket to make it a bit more discoverable?

(But most importantly this is now mitigated with @tim-moody's workaround/advisory, for folks in need!)


5 participants