Consider saving content publication date in the ZIM #9

Open
kelson42 opened this issue Feb 19, 2022 · 9 comments

Comments

@kelson42
Contributor

And specify it in the ZIM specification.

For the moment, we have only the ZIM creation date, but this might be really different from the content publication date, in particular if the content is really old.

Following a comment from https://github.com/veloman-yunkan at kiwix/libkiwix#702 (comment)

@stale

stale bot commented Apr 25, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Apr 25, 2022
@stale stale bot removed the stale label Sep 11, 2022
@kelson42
Contributor Author

kelson42 commented Sep 11, 2022

@Popolechien @rgaudin @mgautierfr @veloman-yunkan I would like to move forward with this. We need a way to store the original content date as ZIM metadata. As a reminder, we already have the « Date » metadata, which is set automatically and represents the ZIM creation date, see https://wiki.openzim.org/wiki/Metadata

To me, the only open questions are about naming:

  • Should we rename « Date » metadata to be more clear?
  • What would be the name of the new metadata?

IMO, we would benefit from using:

  • « Published » in place of « Date », which could act as the « default » publishing date. It would still be the ZIM creation date but could, if needed, be superseded by another value at the discretion of the CMS (so in library.xml).
  • « Issued » as a new optional metadata, which would be the content publishing date. For most of our scrapers, this would be the same as « Published ». But for very old content we could set it differently; the same goes for zimrecreate, for example, which would keep the value from the original ZIM file.

What do you think?

@kelson42 kelson42 self-assigned this Sep 11, 2022
@rgaudin
Member

rgaudin commented Sep 12, 2022

I agree with the proposal, although I think an explicit suffix would be better: PublishedOn, IssuedOn for instance.

I am not sure about the implications of renaming Date (to PublishedOn if I understood correctly) given it's a mandatory metadata but if it's not an issue, I'd say yes.

@mgautierfr

Why not Created instead of Date? To me it better conveys that the ZIM file was created at that time.
I'm not sure about Published and Issued.
There are two different dates of "publication":

  • The date on which the content has been initially published (this issue is about that)
  • The date on which the ZIM file itself was initially published (added to a catalog). This date obviously cannot be stored in the ZIM file, but it could be added to library.xml and handled by libkiwix.

It would be nice to define all those terms now, even if we don't implement them. Apart from that, I don't have a strong opinion on which term to use for what.
Other potential candidates could be distributed, released or reissued/republished.

> I am not sure about the implications of renaming Date (to PublishedOn if I understood correctly) given it's a mandatory metadata but if it's not an issue, I'd say yes.

The creator side should be updated, but there is no urgency, as readers will have a lot of "old" ZIM files to handle anyway.
On the reader side, Date is handled as an optional metadata: if the date is missing, we use an empty string value. I don't know how it behaves when we try to sort books by date...
We would have to adapt the reader to look up <newName> and fall back to Date otherwise, but it seems like a small change.
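
A minimal sketch of that reader-side fallback, written in Python over plain dictionaries rather than any real libzim or libkiwix API. The names `creation_date` and `sort_by_date` are hypothetical, "Created" stands in for `<newName>` only because it is one of the candidates discussed here, and the sketch assumes dates are ISO `YYYY-MM-DD` strings so that string order matches chronological order:

```python
from typing import List, Mapping

# Assumption: "Created" is only one candidate name from this thread.
NEW_DATE_KEY = "Created"
LEGACY_DATE_KEY = "Date"


def creation_date(metadata: Mapping[str, str]) -> str:
    """Return the new key if set, else the legacy key, else an empty string."""
    for key in (NEW_DATE_KEY, LEGACY_DATE_KEY):
        value = metadata.get(key, "")
        if value:
            return value
    return ""


def sort_by_date(books: List[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Sort oldest first; books without any date are placed last."""
    return sorted(books, key=lambda m: (creation_date(m) == "", creation_date(m)))
```

Placing undated books last is one possible policy for the sorting question raised above, not a description of what the reader currently does.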

@kelson42
Contributor Author

"Date" is not optional says https://wiki.openzim.org/wiki/Metadata, @mgautierfr please open the necessary issues.

I have no strong opinion on "Created" versus "Published". @rgaudin Any opinion?

@mgautierfr Honestly, I believe I have defined "Published" and "Issued" clearly. What exactly is unclear?

@mgautierfr

> "Date" is not optional says https://wiki.openzim.org/wiki/Metadata, @mgautierfr please open the necessary issues.

libzim never enforces the "Mandatory" property on metadata (except for Counter, as it is created by libzim). The mandatory property is enforced at the scraper level (zimwriterfs does it; python-libzim and python-scraperlib don't, but it may be enforced in other tools). And I'm not sure we need to enforce it. What do we (libzim) do if the user doesn't provide it? We may refuse to create the ZIM file, but we can detect this only at the end of the process. Do we really want to wait until the end of the creation to fail because Date was not provided, when "everything" would work if we accepted to create it?
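
To illustrate what scraper-level enforcement could look like, here is a hedged Python sketch that checks for mandatory keys before any expensive work starts, instead of failing at the end of ZIM creation. It uses plain dictionaries and does not reflect how zimwriterfs or any other tool actually implements the check; `missing_metadata`, `check_before_build` and the default `required` tuple are hypothetical:

```python
from typing import Iterable, List, Mapping


def missing_metadata(metadata: Mapping[str, str], required: Iterable[str]) -> List[str]:
    """List the required keys that are absent or empty."""
    return [key for key in required if not metadata.get(key)]


def check_before_build(metadata: Mapping[str, str],
                       required: Iterable[str] = ("Date",)) -> None:
    """Fail early, before content is processed, if a required key is missing."""
    missing = missing_metadata(metadata, required)
    if missing:
        raise ValueError("missing mandatory metadata: " + ", ".join(missing))
```

A scraper would call `check_before_build` once, right after collecting its metadata and before adding any entry.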

On the reader side, we don't expect Date to be mandatory. But the same question applies here: do we refuse to read a working ZIM only because Date (or any metadata) is missing?

> @mgautierfr Honestly, I believe I have defined "Published" and "Issued" clearly. What exactly is unclear?

I agree with your definitions. My point is that there is another notion of date (when the ZIM itself is published). This third notion cannot (and will not) be added to the ZIM file, but we should integrate it into our nomenclature, as we may add it to the catalog (where it will have to be consistent with the metadata stored in the ZIM file).
This is why I propose Created for the creation date of the ZIM file and Published for when the ZIM file has been added to the catalog.

@kelson42
Contributor Author

> I agree with your definitions. My point is that there is another notion of date (when the ZIM itself is published). This third notion cannot (and will not) be added to the ZIM file, but we should integrate it into our nomenclature, as we may add it to the catalog (where it will have to be consistent with the metadata stored in the ZIM file).
> This is why I propose Created for the creation date of the ZIM file and Published for when the ZIM file has been added to the catalog.

I don't think we need an additional one. "Published" is there and can be (and will be) superseded in the CMS. Once the CMS takes the lead on metadata, the time of the technical creation of the ZIM no longer really matters.

@kelson42
Contributor Author

  • Replace "Date" with "Created"
  • The "Created" value can be used as the "Published" value (by default) by Zimfarm, which can otherwise set another value.
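
The defaulting rule in these two points can be sketched as follows. This is an illustration over plain Python values, not Zimfarm or CMS code; `published_date` and its `catalog_override` parameter are hypothetical names, and falling back to the legacy "Date" key for older files is an added assumption:

```python
from typing import Mapping, Optional


def published_date(zim_metadata: Mapping[str, str],
                   catalog_override: Optional[str] = None) -> str:
    """Pick the "Published" value for a catalog entry.

    An explicit catalog value wins; otherwise default to the ZIM's
    "Created" value, or the legacy "Date" for older files (assumption).
    """
    if catalog_override:
        return catalog_override
    return zim_metadata.get("Created") or zim_metadata.get("Date", "")
```

For example, `published_date({"Created": "2022-09-11"})` yields the creation date, while passing `catalog_override="2023-01-15"` returns the overriding value instead.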

@stale

stale bot commented May 26, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

4 participants