Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully defined apt repository package checking #2554

Merged
merged 5 commits into from
Feb 27, 2024
Merged

Conversation

theofficialgman
Copy link
Collaborator

@theofficialgman theofficialgman commented Feb 24, 2024

resolves 27f5fe1#r139018751

these modified and new functions fully define the checking of an apt repository for installed packages.

This prevents false positives that did occur before if multiple apt repositories shared a URI and SUITE but had different COMPONENTS

This also greatly improves support for .sources files where multiple stanzas are now properly supported and parsed

@theofficialgman theofficialgman marked this pull request as draft February 24, 2024 22:50
@theofficialgman
Copy link
Collaborator Author

theofficialgman commented Feb 25, 2024

just some examples where the output before is wrong

gman@gman-jam:~$ anything_installed_from_repo https://Pi-Apps-Coders.github.io/box64-debs/debian
gman@gman-jam:~$ echo $?
1
gman@gman-jam:~$ anything_installed_from_repo https://Pi-Apps-Coders.github.io/box64-debs/debian/
gman@gman-jam:~$ echo $?
1

no output, there should be debian-tegrax1

correct output now:

gman@gman-jam:~$ anything_installed_from_uri_suite_component https://Pi-Apps-Coders.github.io/box64-debs/debian/ ./
gman@gman-jam:~$ echo $?
0

0 is a package was found

@theofficialgman theofficialgman marked this pull request as ready for review February 25, 2024 00:07
theofficialgman referenced this pull request Feb 25, 2024
…if_unused

In order to run remove_repofile_if_unused on a backports repo, the
search algorithms needed to be redone to consider dist information and
installed package versions. Otherwise backports repos would never be
removed due to false positives.
…ed apt repository for installed packages

instead of checking the apt-cache policy on each package individually, check all installed packages that exist in a repo at once. this results in a multiple factor of magnitude speedup on large repositories (eg: debian/ubuntu main repositories) from minutes to seconds
@theofficialgman
Copy link
Collaborator Author

there were also false negatives. I had sublime text and sublime merge installed. I then uninstalled sublime text and it removed the shared apt repo while sublime merge apt package was still installed.

all these issues are fixed with this new function and its use

api Show resolved Hide resolved
api

local packages_in_repo="$(grep '^Package' "$repofile" | awk '{print $2}' | sort)"
local installed_packages="$(apt list --installed 2>/dev/null | tail -n +2 | awk -F/ '{print $1}')"
local apt_cache_policy_output="$(echo "$packages_in_repo" | list_intersect "$installed_packages" | tr '\n' ' ' | xargs -r apt-cache policy)"
Copy link
Owner

@Botspot Botspot Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With tr used here, it seems to me like the output from this would always be a single line, meaning all matching packages would be passed to apt-cache in one run.
So it seems like the only use for xargs is shorthand to avoid running apt-cache if no packages from the repo are installed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-cache policy needs the input as individual arguments. it also does not read from stdin. these are the reasons why xargs is used.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained poorly.
If apt-cache is only being run once, then something like this should work if I understand this correctly

local apt_cache_policy_output="$(apt-cache policy $(echo "$packages_in_repo" | list_intersect "$installed_packages" | tr '\n' ' '))"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I correctly interpreted what you said.

apt-cache policy needs the input as individual arguments.

What you just sent puts all output as one input argument I think. Can't check now.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No quotes around the inner $() will make each package into a separate argument.
If nothing else, this is simply something useful to know. I'm not necessarily requesting any changes here.

@Botspot
Copy link
Owner

Botspot commented Feb 26, 2024

I have changed the how the list of installed packages is determined, directly using the dpkg status file which is much faster.
This should be safe because on my system the original and new command return exactly the same output.

pi@raspberrypi:~ $ grep -xF 'Status: install ok installed' /var/lib/dpkg/status -B 2 | grep '^Package: ' | sed 's/^Package: //g' | sort | sha256sum
3ac304de04b498fe60eaaa74122744f08c5318c96eea28523fa860d16bfc40ca  -
pi@raspberrypi:~ $ apt list --installed 2>/dev/null | tail -n +2 | awk -F/ '{print $1}' | sort | sha256sum
3ac304de04b498fe60eaaa74122744f08c5318c96eea28523fa860d16bfc40ca  -

Time improvement: before=0m0.670s after=0m0.014s
After having trouble getting the outputs to match and finding out why, I realized that -B2 is necessary instead of -B1 for the handful of packages that add Essential: yes between the Package: line and the Status: install ok installed line. Meaning that our package_installed() function has been flawed this whole time.

@Botspot
Copy link
Owner

Botspot commented Feb 26, 2024

Also fixed setting installed_packages multiple times. Not sure why you had that inside the loop.

@theofficialgman
Copy link
Collaborator Author

Also fixed setting installed_packages multiple times. Not sure why you had that inside the loop.

that was just an oversight. no other reason

@theofficialgman
Copy link
Collaborator Author

After having trouble getting the outputs to match and finding out why, I realized that -B2 is necessary instead of -B1 for the handful of packages that add Essential: yes between the Package: line and the Status: install ok installed line. Meaning that our package_installed() function has been flawed this whole time.

Yup I discovered that yesterday as well before opening this PR. I intended to fix it in pi-apps but hadn't gotten around to it yet. I see you have already done so.

@theofficialgman
Copy link
Collaborator Author

theofficialgman commented Feb 26, 2024

I have changed the how the list of installed packages is determined, directly using the dpkg status file which is much faster. This should be safe because on my system the original and new command return exactly the same output.

I may change this back. The intention was to only get output for all and the current dpkg architecture. Currently any architecture package (including a non-native architecture package) will return. Pi-Apps package apps install the current dpkg architecture package exclusively (and that is by design) so it would be incorrect to mark one as "installed" if the user had a non-native architecture version of that package installed. Same issue with the package_installed() function.

In order to obtain the architecture information from dpkg, we would need to parse 8 lines for each package, instead of the current 2.

There is actually (at least one) case where the output from the grep command and the apt command differs once "correctly" sorted for architecture

using your dpkg command, it now becomes

grep -xE 'Architecture: ((arm64)|(all))$' /var/lib/dpkg/status -B 7 | grep -xF 'Status: install ok installed' -B 2 | grep '^Package: ' | sed 's/^Package: //g' | sort

using the apt-cache policy, it now becomes

apt list --installed 2>/dev/null | grep -E "/* * arm64|all " | awk -F/ '{print $1}' | sort

the one difference between the output is the dpkg command incorrectly filters out any manually held back packages while the apt command does not. I am currently holding back the qemu-user-static package on my main desktop linux install because of QEMU regressions that cause ARM64 segfaults https://gitlab.com/qemu-project/qemu/-/issues/1913. The output of such a held back package is Status: hold ok installed

to correct that and achieve the same result, now the command needs to be

grep -xE 'Architecture: ((arm64)|(all))$' /var/lib/dpkg/status -B 7 | grep -xE 'Status: ((install)|(hold)) ok installed$' -B 2 | grep '^Package: ' | sed 's/^Package: //g' | sort

now that is getting quite lengthy

I feel like the dpkg status grep, although faster and I knew that, might be prone to breakage in the future.

@Botspot
Copy link
Owner

Botspot commented Feb 27, 2024

I feel like the dpkg status grep, although faster and I knew that, might be prone to breakage in the future.

ok understand

api Outdated Show resolved Hide resolved
@Botspot
Copy link
Owner

Botspot commented Feb 27, 2024

OK once you have uris sorted out I think this is ready to be merged.

theofficialgman and others added 3 commits February 27, 2024 17:05
…from_uri_suite_component`

handles .sources files and .list files with multiple components and multiple suites (as allowed) and filters out disabled sources.

in deb822, it is convention to follow a field separator colon by a space and then the package name but the specification allows for no space or any number of spaces or tab characters. trim all tab and space characters to a single space character for components and suites that can have multiple strings of output
improve readibility, optimize list of installed packages

Co-Authored-By: Botspot <[email protected]>
also replace function in brave install script with `anything_installed_from_uri_suite_component`
@theofficialgman theofficialgman merged commit 7e47d18 into master Feb 27, 2024
3 checks passed
@theofficialgman theofficialgman deleted the repo-use-check branch February 27, 2024 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants