ansible-galaxy fails to install collection from GalaxyNG when there are many versions #77911
Files identified in the description:

If these files are incorrect, please update the `component name` in the description of the issue.
I think the big difference between the more modern client and the old one shows in the URLs they call.

5.8.0:

```
Calling Galaxy at https://kneawxalp311.kiewitplaza.com/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/?limit=100&offset=100
```

2.9.12:

```
Calling Galaxy at https://kneawxalp311.kiewitplaza.com/api/galaxy/content/published/v3/collections/community/vmware/versions/2.5.0/
```

It seems the newer versions use an HTTP query string limiting the results, whereas the older version does not. I cannot find a way to coerce `ansible-galaxy` into behaving otherwise.
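For illustration, here is a minimal sketch (my addition, not code from ansible-galaxy; the server `hub.example.com` is hypothetical and anonymous access is assumed) of how a v3-style client pages through a versions endpoint by following `links.next`, which is what the `?limit=100&offset=100` URLs above correspond to:

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

def fetch_all_versions(server, versions_path, limit=100):
    """Walk a paginated v3 versions endpoint until links.next is null."""
    url = urljoin(server, "%s?limit=%d" % (versions_path, limit))
    versions = []
    while url:
        with urlopen(url) as resp:  # assumes no auth is required
            page = json.load(resp)
        versions.extend(entry["version"] for entry in page["data"])
        next_link = page["links"]["next"]  # e.g. ".../versions/?limit=100&offset=100"
        url = urljoin(server, next_link) if next_link else None
    return versions

# Hypothetical usage against a GalaxyNG instance:
# fetch_all_versions("https://hub.example.com",
#                    "/api/galaxy/content/published/v3/collections/community/vmware/versions/")
```

The `data` and `links.next` field names match the JSON responses shown later in this thread.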
I've worked around this by limiting how many versions get synced. My specific refinement uses a version qualifier for the community.vmware collection like so:

```yaml
---
collections:
  - name: community.vmware
    version: '>= 1.17.0'
```

This greatly pared down the number of versions of the collection synced to GalaxyNG (I don't use anything older than 1.17.0), and now `ansible-galaxy` is able to install it.
@watsonb Would you be able to `curl` the versions endpoint and share the JSON you get back?

needs_info
@s-hertel whilst using `curl` against the `community.vmware` versions endpoint, this is what I get back:

```json
{
"meta": {
"count": 9
},
"links": {
"first": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/?limit=100&offset=0",
"previous": null,
"next": null,
"last": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/?limit=100&offset=0"
},
"data": [
{
"version": "2.5.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/2.5.0/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.11.0"
},
{
"version": "2.4.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/2.4.0/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.11.0"
},
{
"version": "2.3.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/2.3.0/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.11.0"
},
{
"version": "2.2.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/2.2.0/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.11.0"
},
{
"version": "2.1.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/2.1.0/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.11.0"
},
{
"version": "2.0.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/2.0.0/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.11.0"
},
{
"version": "1.18.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/1.18.0/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.9.10"
},
{
"version": "1.17.1",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/1.17.1/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.9.10"
},
{
"version": "1.17.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/community/vmware/versions/1.17.0/",
"created_at": "2022-05-25T19:30:47.878318Z",
"updated_at": "2022-05-25T19:30:47.878337Z",
"requires_ansible": ">=2.9.10"
}
]
}
```

That being said, I do have the full `ansible.netcommon` collection synced, and it has more than 100 versions:

```json
{
"meta": {
"count": 152
},
"links": {
"first": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/ansible/netcommon/versions/?limit=100&offset=0",
"previous": null,
"next": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/ansible/netcommon/versions/?limit=100&offset=100",
"last": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/ansible/netcommon/versions/?limit=100&offset=52"
},
"data": [
{
"version": "3.0.0",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/ansible/netcommon/versions/3.0.0/",
"created_at": "2022-05-13T15:55:58.913107Z",
"updated_at": "2022-05-13T15:55:58.913121Z",
"requires_ansible": ">=2.9.10"
},
{
"version": "2.6.1",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/ansible/netcommon/versions/2.6.1/",
"created_at": "2022-05-13T15:55:58.913107Z",
"updated_at": "2022-05-13T15:55:58.913121Z",
"requires_ansible": ">=2.9.10"
},
{
"href": "snipped for brevity"
},
{
"version": "1.0.1-dev7",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/ansible/netcommon/versions/1.0.1-dev7/",
"created_at": "2022-05-13T15:55:58.913107Z",
"updated_at": "2022-05-13T15:55:58.913121Z",
"requires_ansible": ">=2.9.10,<2.11"
},
{
"version": "1.0.1-dev6",
"href": "/api/galaxy/content/community/v3/plugin/ansible/content/community/collections/index/ansible/netcommon/versions/1.0.1-dev6/",
"created_at": "2022-05-13T15:55:58.913107Z",
"updated_at": "2022-05-13T15:55:58.913121Z",
"requires_ansible": ">=2.9.10,<2.11"
}
]
}
```

This 100 limit "feels" to me like one tool not quite agreeing with the other.

Now I feel like I should mention that I went back and did some experimenting with my older GalaxyNG 4.2.1 installation. I installed GalaxyNG 4.5.0 in a new VM "side-by-side" with my old install, because I could never get GalaxyNG 4.2.1 to sync my requirements.yml from galaxy.ansible.com. But I went back to 4.2.1 and started with smaller/simpler requirements.yml files. Lo and behold, with 1 or 2 collections in requirements.yml, GalaxyNG 4.2.1 was able to sync. However, if requirements.yml grew to include several collections (and I don't know what the real number is here), GalaxyNG 4.2.1 would fail to fully sync, often throwing some cryptic timeout and/or SSL errors. I began resource monitoring the GalaxyNG 4.2.1 VM during syncs.

Now here's the important bit that might help with root cause: GalaxyNG 4.2.1 apparently doesn't honor the `limit` query string.

Below is a little table showing my success/failures across versions of things:
So this is a very version-dependent issue and I'm not sure which tool is actually causing the problem; I'm merely reporting my experiences. Unfortunately, I cannot find a way to open an issue with the GalaxyNG team directly, as they use a different issue tracker. Thanks for your time in looking!
For any issues with galaxy.ansible.com, go to https://github.com/ansible/galaxy/issues (which also hosts the older galaxy code, but it is still the main interface for issues with the site). https://github.com/ansible/galaxy_ng is the repo for the new code, but as you can see, issues are disabled on GitHub; the README has a link to https://issues.redhat.com/projects/AAH/issues. In any case, this does look like an issue with the client, which means this is the correct repo to open it in.
Based on the debug output, the `next` link from your curl seems to match what is failing:

I believe, based on what the galaxy team provided us with, that it should have instead been:
For the record, I've just launched galaxy_ng via this container: https://hub.docker.com/r/pulp/pulp-galaxy-ng

I edited the …

This is using …

Can you try adding …?
Hmmm, interesting. I'll have to consult my playbook/notes. I'm installing onto a Rocky Linux 8.5 box using the pulp installer Ansible role. It was a challenge getting it to install, finding all of the versions of components that work together without dependency conflicts. Here's what I ran to perform the install:

```shell
ansible-playbook playbooks/galaxy_ng_gist/enduser-install.yml \
  -i ~/workspace/kiewit/ansible/inventories/ae_kiewitplaza/awx3_ng.yml \
  --limit kneawxalp311.kiewitplaza.com \
  --extra-vars "@playbooks/galaxy_ng_gist/enduser-install-vars.yml" \
  -e pulp_version=3.18.5
```

This is using the playbooks/vars from the latest version of the GH gist. Here are the vars:

```yaml
pulp_default_admin_password: super_secret_password
pulp_install_source: pip
pulp_settings:
  secret_key: secret
  content_origin: "https://{{ inventory_hostname }}"
  x_pulp_api_host: 127.0.0.1
  x_pulp_api_port: 24817
  x_pulp_api_user: "admin"
  x_pulp_api_password: "{{ pulp_default_admin_password }}"
  x_pulp_api_prefix: "pulp_ansible/galaxy/automation-hub/api"
  galaxy_require_content_approval: "False"
  pulp_token_auth_disabled: "True"
  galaxy_api_default_distribution_base_path: "published"
pulp_install_plugins:
  pulp-ansible: {}
  galaxy-ng: {}
  pulp-container:
    version: 2.10.3
pulp_api_workers: 4
galaxy_importer_settings: {}
```

Using the pulp.pulp_installer collection version 3.19.1. I had to hack `pulp.pulp_installer/roles/pulp_common/vars` and change:

```yaml
# __pulpcore_version: '{{ pulpcore_version | default("3.19") }}'
__pulpcore_version: '{{ pulpcore_version | default("3.18.5") }}'
```

That being said, I hadn't considered a container-based installation.
This extra debugging may indicate what the issue is:

Also, here are the versions installed in the container:

needs_info
Hi, I am the assigned Red Hat consultant for a customer, and we recently hit this bug after an upgrade to AAP 2.2. We have narrowed this issue down to the following stack trace:

As far as I can tell, the issue lies with the caching combined with the fact that Automation Hub 4.5.0 has changed the path for the collections. So when traversing more than 100 versions, the dependency resolver fails and returns an empty list of candidates. Possible workarounds we have found:
@ephracis if you clear the cache, either by calling `ansible-galaxy` with `--clear-response-cache` or by deleting the cache file, does the problem go away? Or does it only happen after upgrading to a version that uses a new URL, after already having a cache for an old URL?

I guess I'm curious whether this is just a result of an out-of-date cache, or whether there is an issue with our caching in general. It's not trivial to set up galaxy_ng to test this, so I haven't been able to reproduce.
Resetting the cache has no effect. I believe the bug is when the cache is set, not when it is read. But fortunately `--no-cache` works around it.
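To make the suspected failure mode concrete, here is a toy model (a sketch based on the descriptions in this thread and the patch further down, not the actual `GalaxyAPI` code; the host name is illustrative): the cache is keyed by URL path alone, so when the hub starts emitting `next` links under a different path form, the pages of one collection end up under different keys:

```python
from urllib.parse import urlparse

# Toy response cache keyed the way the client keys it: by URL *path*
# only, discarding the ?limit/?offset query string.
cache = {}

def cache_key(url):
    return urlparse(url).path

# Path form the client builds itself for page 1:
requested = ("https://hub.example.com/api/galaxy/content/published"
             "/v3/collections/community/vmware/versions/?limit=100")
# Path form newer hubs use in links.next for page 2:
next_page = ("https://hub.example.com/api/galaxy/content/community"
             "/v3/plugin/ansible/content/community/collections/index"
             "/community/vmware/versions/?limit=100&offset=100")

cache[cache_key(requested)] = {"data": ["...first 100 versions..."]}
cache[cache_key(next_page)] = {"data": ["...remaining versions..."]}

# One collection's version list is now split across two entries, so logic
# that consolidates pages under the initially requested path never sees
# page 2, and invalidating the requested path leaves page 2 stale.
print(len(cache))  # -> 2
```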
@sivel I've been able to reproduce this in my labs with Private Automation Hub 4.5.0 and ansible-galaxy 2.13.0.

The first 2 iterations are not failing:
This might be a little painful to fix, as we're likely going to have to rewrite the caching to not consolidate paginated requests under the initial path; instead of short-circuiting, we'll have to keep all URL + query string caches and return those. That may also change how cache invalidation is handled, since we don't have an easy way to nuke all URL caches for a collection, as they aren't very predictable. I'm not specifically working on this issue, just trying to provide info for the engineer that does end up working on this.
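A minimal sketch of that direction (my illustration, assuming we simply stop discarding the query string when building cache keys; not the eventual implementation):

```python
from urllib.parse import urlparse

def cache_key(url):
    """Key cache entries by path plus query so paginated pages don't collide."""
    parts = urlparse(url)
    return parts.path + ("?" + parts.query if parts.query else "")

page1 = "https://hub.example.com/api/.../versions/?limit=100"
page2 = "https://hub.example.com/api/.../versions/?limit=100&offset=100"
assert cache_key(page1) != cache_key(page2)  # each page cached separately
```

The trade-off is exactly the one noted above: invalidating "all pages for a collection" now means matching many unpredictable keys instead of deleting a single path entry.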
Another option is we have to inspect the `next` link that the server returns.
I don't know if this is a viable solution in the end, but it potentially requires the least number of changes to the caching mechanism.

Patch:

```diff
diff --git a/lib/ansible/galaxy/api.py b/lib/ansible/galaxy/api.py
index 55434f6d94..3935b188d3 100644
--- a/lib/ansible/galaxy/api.py
+++ b/lib/ansible/galaxy/api.py
@@ -329,25 +329,27 @@ class GalaxyAPI:
         should_retry_error=is_rate_limit_exception
     )
     def _call_galaxy(self, url, args=None, headers=None, method=None, auth_required=False, error_context_msg=None,
-                     cache=False):
+                     cache=False, cache_key=None):
         url_info = urlparse(url)
         cache_id = get_cache_id(url)
+        if not cache_key:
+            cache_key = url_info.path
         query = parse_qs(url_info.query)
         if cache and self._cache:
             server_cache = self._cache.setdefault(cache_id, {})
             iso_datetime_format = '%Y-%m-%dT%H:%M:%SZ'
 
             valid = False
-            if url_info.path in server_cache:
-                expires = datetime.datetime.strptime(server_cache[url_info.path]['expires'], iso_datetime_format)
+            if cache_key in server_cache:
+                expires = datetime.datetime.strptime(server_cache[cache_key]['expires'], iso_datetime_format)
                 valid = datetime.datetime.utcnow() < expires
 
             is_paginated_url = 'page' in query or 'offset' in query
             if valid and not is_paginated_url:
                 # Got a hit on the cache and we aren't getting a paginated response
-                path_cache = server_cache[url_info.path]
+                path_cache = server_cache[cache_key]
                 if path_cache.get('paginated'):
-                    if '/v3/' in url_info.path:
+                    if '/v3/' in cache_key:
                         res = {'links': {'next': None}}
                     else:
                         res = {'next': None}
@@ -367,7 +369,7 @@ class GalaxyAPI:
             # The cache entry had expired or does not exist, start a new blank entry to be filled later.
             expires = datetime.datetime.utcnow()
             expires += datetime.timedelta(days=1)
-            server_cache[url_info.path] = {
+            server_cache[cache_key] = {
                 'expires': expires.strftime(iso_datetime_format),
                 'paginated': False,
             }
@@ -392,7 +394,7 @@ class GalaxyAPI:
                                % (resp.url, to_native(resp_data)))
 
         if cache and self._cache:
-            path_cache = self._cache[cache_id][url_info.path]
+            path_cache = self._cache[cache_id][cache_key]
 
             # v3 can return data or results for paginated results. Scan the result so we can determine what to cache.
             paginated_key = None
@@ -807,6 +809,7 @@ class GalaxyAPI:
         page_size_name = 'limit' if 'v3' in self.available_api_versions else 'page_size'
         versions_url = _urljoin(self.api_server, api_path, 'collections', namespace, name, 'versions', '/?%s=%d' % (page_size_name, COLLECTION_PAGE_SIZE))
         versions_url_info = urlparse(versions_url)
+        cache_key = versions_url_info.path
 
         # We should only rely on the cache if the collection has not changed. This may slow things down but it ensures
         # we are not waiting a day before finding any new collections that have been published.
@@ -826,7 +829,7 @@ class GalaxyAPI:
                 if cached_modified_date != modified_date:
                     modified_cache['%s.%s' % (namespace, name)] = modified_date
                     if versions_url_info.path in server_cache:
-                        del server_cache[versions_url_info.path]
+                        del server_cache[cache_key]
 
                     self._set_cache()
@@ -834,7 +837,7 @@ class GalaxyAPI:
               % (namespace, name, self.name, self.api_server)
 
         try:
-            data = self._call_galaxy(versions_url, error_context_msg=error_context_msg, cache=True)
+            data = self._call_galaxy(versions_url, error_context_msg=error_context_msg, cache=True, cache_key=cache_key)
         except GalaxyError as err:
             if err.http_code != 404:
                 raise
@@ -868,7 +871,7 @@ class GalaxyAPI:
                 next_link = versions_url.replace(versions_url_info.path, next_link)
 
             data = self._call_galaxy(to_native(next_link, errors='surrogate_or_strict'),
-                                     error_context_msg=error_context_msg, cache=True)
+                                     error_context_msg=error_context_msg, cache=True, cache_key=cache_key)
 
         self._set_cache()
 
         return versions
```
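If I read the patch right, the idea is to thread one stable `cache_key` (the path of the initially requested versions URL) through every paginated `_call_galaxy()` call, so all pages of a collection's version list keep consolidating under a single cache entry even when the server's `next` links use a different path form, and invalidating that one key still covers them.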
@sivel I am also able to reproduce the issue with Ansible Automation Platform Controller 4.1.2 and Automation Hub 4.5.0 (Operator based), with a project sync from Controller.

```yaml
collections:
```

Setting the specific version and syncing the collection using requirements.yml fixes the issue.
@watsonb This issue is waiting for your response. Please respond or the issue will be closed.

@watsonb You have not responded to information requests in this issue so we will assume it no longer affects you. If you are still interested in this, please create a new issue with the requested information.

!needs_info
Summary

Given I have installed GalaxyNG, published custom content, and synchronized content from galaxy.ansible.com: when I try to install collections listed in a `requirements.yml` via a modern version of `ansible-galaxy` configured to use GalaxyNG, `ansible-galaxy` fails when encountering the `community.vmware` collection.

If you comment out `community.vmware` from `requirements.yml`, all other collections (custom published and synchronized) install fine.

If you uncomment `community.vmware` and use an older version of `ansible-galaxy` (e.g., Ansible 2.9.12), all collections install fine.

Issue Type
Bug Report
Component Name
ansible-galaxy
Ansible Version
Configuration
OS / Environment

Ubuntu 20.04 running Ansible installed via `pip` into a virtual environment.

Steps to Reproduce
GalaxyNG installed, custom collection published, remote collections synchronized to GalaxyNG via the following requirements.yml
Notice there are 316 community.vmware versions:
Expected Results
I expect all listed requirements to be forcefully re-installed.
Actual Results
Code of Conduct