Double forward slashes in some Gemini records for files migrated from 7.x #1014
Comments
I don't think they are missing path information. The AUDIT one has path
So we add the sub-path to the repository base in Fedora. So for
it is
Probably we have a case where, with a sub-path, there is no trailing slash, and we didn't include smart enough logic to test for a double |
@whikloj thanks for the clarification, my report should have been more specific ("Double forward slashes in some Gemini records..."). This extra forward slash would be in one of the migrate YAML config files? |
@mjordan because the extra forward slash only appears in the Fedora URI, I would guess this is something in Gemini that needs to be addressed. |
I'm hacking around in
Even when I change the
Where should I be looking? |
@mjordan I wonder if it is here
|
Hrm... that |
Dunno what's going on. When I added an unexpected character to
I am still getting entries in Gemini that have fedora_uris that don't end in random numbers:
Which doesn't make sense. |
@mjordan ok so I think (having not tested) I have the issue. First
So the base_url always has a following
If
and you have a double slash. |
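The concatenation problem described above can be sketched as follows. This is a minimal illustration, not code from Gemini, Milliner, or the islandora module: the helper name `join_uri` and the sample base URL and path are hypothetical.

```php
<?php
// Hypothetical helper: join a base URL and a path with exactly one slash
// at the boundary, whatever slashes either side already carries.
function join_uri(string $base, string $path): string {
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

// Assumed sample values, for illustration only.
$base = 'http://localhost:8080/fcrepo/rest/';
$path = '/2019-01/example.tif';

// Naive concatenation keeps both slashes at the boundary.
$naive = $base . $path;

// The helper normalizes the boundary to a single slash.
$joined = join_uri($base, $path);

echo $naive, PHP_EOL;
echo $joined, PHP_EOL;
```

Trimming only at the boundary leaves the `://` after the protocol untouched, which a blanket search-and-replace would not.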
@whikloj yeah, I had code to check for an empty |
OK, I see it now. It's because we're talking about files, whose paths are essentially mirrored in Fedora.

When a file is created, its Fedora URI is converted from the Drupal file URI and put into a message here: https://github.com/Islandora-CLAW/islandora/blob/8.x-1.x/src/Plugin/Action/EmitFileEvent.php#L107

That message is put onto the queue using Context, and Alpaca picks it up and indexes it in Gemini here: https://github.com/Islandora-CLAW/Alpaca/blob/master/islandora-indexing-fcrepo/src/main/java/ca/islandora/alpaca/indexing/fcrepo/FcrepoIndexer.java#L189-L210

It's inconsistent with Milliner because it bypasses Gemini altogether, but it is consistent with Drupal and respects the tokens you give it for a destination path when uploading a file, etc. It'd be nice to find an elegant way to take the best of both worlds and just have one approach, but that's quite a bit larger in scope than just fixing the extra
Anyway, pretty sure it's the |
@dannylamb thanks, I'll take a look at fixing the extra |
I am pleased to report that simply removing the double
$data = parent::generateData($entity);
if (isset($flysystem_config[$scheme]) && $flysystem_config[$scheme]['driver'] == 'fedora') {
  $fedora_uri = str_replace("$scheme://", $flysystem_config[$scheme]['config']['root'], $uri);
  $data['fedora_uri'] = str_replace('//', '/', $fedora_uri);
}
return $data;
Rerunning the migration with this code in place produces the expected entries in Gemini:
If you're OK with that simple fix, I'll open a PR, but I'll wait until we are OK with the suspicious |
@mjordan nice...except
you have |
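One pitfall of a blanket `str_replace('//', '/', ...)` on a full URL is that it also collapses the `//` after the protocol. The sketch below shows that behavior under assumed values: the root URL and file URI are made up for illustration, and the real root would come from the Flysystem configuration.

```php
<?php
// Assumed sample values; not taken from the thread.
$root = 'http://localhost:8080/fcrepo/rest/';
$uri = 'fedora:///2019-01/example.tif';

// Swapping the scheme for the root leaves a double slash after 'rest'.
$fedora_uri = str_replace('fedora://', $root, $uri);

// Replacing every '//' fixes that slash but also damages 'http://'.
$collapsed = str_replace('//', '/', $fedora_uri);

echo $fedora_uri, PHP_EOL;
echo $collapsed, PHP_EOL;
```

With these values the second string begins with `http:/localhost`, so the replacement has to be limited to the part of the URI that actually carries the extra slash.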
Well, that was embarrassing, thanks for seeing that @whikloj. Think I got it this time with
if (isset($flysystem_config[$scheme]) && $flysystem_config[$scheme]['driver'] == 'fedora') {
  // $uri for files may contain 'fedora:///' so we need to replace the three / with two.
  if (strpos($uri, '///') !== FALSE) {
    $uri = str_replace('///', '//', $uri);
  }
  $data['fedora_uri'] = str_replace("$scheme://", $flysystem_config[$scheme]['config']['root'], $uri);
}
I'll rebuild a fresh VM and test this to see if the |
On a clean box, I am getting
But the good news is, the following code addresses the double
$data = parent::generateData($entity);
if (isset($flysystem_config[$scheme]) && $flysystem_config[$scheme]['driver'] == 'fedora') {
  // $uri for files may contain 'fedora:///' so we need to replace the three / with two.
  if (strpos($uri, 'fedora:///') !== FALSE) {
    $uri = str_replace('fedora:///', 'fedora://', $uri);
  }
  $data['fedora_uri'] = str_replace("$scheme://", $flysystem_config[$scheme]['config']['root'], $uri);
}
return $data;
Should I open a PR to include that or do you want me to wait/help figure out where that |
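For anyone who wants to try the approach outside Drupal, the snippet above can be reduced to a standalone function. This is a sketch under assumptions: the function name `to_fedora_uri` and the sample root are hypothetical, and the scheme is passed as a parameter here rather than hard-coded as `fedora` the way the snippet does.

```php
<?php
// Hypothetical standalone version of the targeted fix.
function to_fedora_uri(string $uri, string $scheme, string $root): string {
    // Collapse 'scheme:///' to 'scheme://' first, so the root's own
    // trailing slash ends up as the only separator.
    $uri = str_replace("$scheme:///", "$scheme://", $uri);
    // Then swap the stream wrapper prefix for the repository root.
    return str_replace("$scheme://", $root, $uri);
}

// Assumed sample root, for illustration only.
$root = 'http://localhost:8080/fcrepo/rest/';

echo to_fedora_uri('fedora:///2019-01/example.tif', 'fedora', $root), PHP_EOL;
echo to_fedora_uri('fedora://2019-01/example.tif', 'fedora', $root), PHP_EOL;
```

Both calls should yield the same URI, since a file URI that already has only two slashes passes through the first replacement unchanged.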
@mjordan Just the |
Closing since this PR has been merged. |
Looking at the Gemini database records for some files created during a migration from 7.x, we can see that some of the "fedora_uri" fields are missing path information. For example, below we see this in the entries for the OBJ and MODS files:
However, the entry for the AUDIT file appears to be complete. Queries like
select * from Gemini where drupal_uri like "%MODS%"\G
show this pattern exists for all MODS files, etc. Can anyone else replicate this? Files that are created by manually ingesting repository objects do not show this behavior (that is, they have complete paths in Gemini).