Create a Basic Image microservice. #179

Closed
whikloj opened this issue Apr 6, 2016 · 46 comments

Comments

@whikloj
Member

whikloj commented Apr 6, 2016

We need a basic image microservice.

This will generate a pcdm:Object with a UUID and hasURN like the existing Collection microservice.

Questions:

  1. Do we want to create a sub-class of pcdm:Object for basic image? This should be done first.
  2. Will it create any indirect objects to hold the actual binaries, or should this be delayed (possibly only for seconds) until the binary is uploaded?
  3. What interactions do we want?
    1. CRUD a basic image resource
    2. CRUD a binary image on the basic image resource (should this be separate and linked via proxies?)
    3. CRUD metadata about a binary (
@whikloj
Member Author

whikloj commented Apr 6, 2016

Regarding 2ii above, by separate I mean two separate actions. So you can create the basic image resource, and then later you can add the binary.

@ruebot
Member

ruebot commented Apr 6, 2016

  1. Are our sub-classes our "solution-packs"? If so, yeah. This was our game plan.
  2. Well, we're going to need to create the pcdm:File container, right? Should that be a service this service calls? Assuming we're still going with this model.
  3. ...
    • 👍
    • 👍 (proxies - I guess that depends on 2?)
    • Would this be a separate service?

@whikloj
Member Author

whikloj commented Apr 6, 2016

  1. I would say yes, we can let @DiegoPino do some data modelling for us.
  2. I guess; see, I'm not up to speed on the whole PCDM thing. But okay, we need a files indirect container that will work like the pcdm:Collection service's members indirect container.

About that image though, here is the thing I am wondering about.

If we put a JPG up, then we automatically have a place in Fedora 4 to store metadata about that JPG (the /fcr:metadata endpoint). So every ldp:NonRdfSource also has an ldp:RdfSource associated with it.

Then each binary can have its FITS (or whatever) metadata applied on its RdfSource, and we don't have to have a separate FITS file. Unless you really want to add a file of metadata, but I'd just store the metadata as properties.
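A minimal sketch of what "metadata as properties" could look like, assuming Fedora 4's REST API, where a binary's RDF description lives at `<binary>/fcr:metadata` and accepts `PATCH` with a SPARQL Update body. The ebucore property names and the example URI are illustrative choices, not anything CLAW has settled on. The function only builds the request; it does not send it.

```python
from urllib.request import Request

EBUCORE = "http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#"


def describe_binary_request(binary_uri: str, props: dict) -> Request:
    """Build (but do not send) a PATCH that adds properties to a binary's
    RDF description at <binary>/fcr:metadata.

    `props` maps ebucore local names to values that are already valid
    SPARQL terms, e.g. '1024' or '"image/jpeg"'.
    """
    binary_uri = binary_uri.rstrip("/")
    # Use the binary's own URI as the subject of every triple.
    triples = "\n".join(
        f"  <{binary_uri}> ebucore:{name} {value} ." for name, value in props.items()
    )
    body = f"PREFIX ebucore: <{EBUCORE}>\nINSERT DATA {{\n{triples}\n}}\n"
    return Request(
        binary_uri + "/fcr:metadata",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="PATCH",
    )


req = describe_binary_request(
    "http://localhost:8080/rest/images/photo.jpg",  # hypothetical binary URI
    {"width": "1024", "hasMimeType": '"image/jpeg"'},
)
```

Sending it against a running repository would be a plain `urllib.request.urlopen(req)`; the subject is spelled out explicitly rather than relying on how a given Fedora version resolves `<>` inside `fcr:metadata`.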

@ruebot
Member

ruebot commented Apr 6, 2016

  1. Cool. That was the idea back at the Islandora Conference.
  2. Sounds about right.

FITS... yeah. That is complicated. I'd honestly leave it as a NonRdfSource: an XML file on the file system. But! BUT! @acoburn and I lead this working-group sub-group, which created a Technical Metadata Application Profile. Maybe we're finally there? 😄

@whikloj
Member Author

whikloj commented Apr 27, 2016

@DiegoPino you mentioned you are working on this, yes?

@DiegoPino
Contributor

@whikloj, yes I am, so I'm open to suggestions.

@DiegoPino DiegoPino self-assigned this Apr 27, 2016
@whikloj
Member Author

whikloj commented Apr 27, 2016

Nope, was just going to assign it to you. All good.

@DiegoPino
Contributor

@ruebot & @whikloj can you give https://www.ebu.ch/metadata/ontologies/ebucore/ a look and give me some guidance on whether this is enough to describe tech metadata for now? We can add as many extra NonRdfSources as needed (like FITS), but I would like to have some base ones as RDF too.

@ruebot
Copy link
Member

ruebot commented Apr 27, 2016

@ruebot
Member

ruebot commented Apr 27, 2016

...and my preference would still be to have the FITS XML file stored (wearing my preservationista hat). So, I'm sure I'll be creating a FITS service sooner or later for that.

@DiegoPino
Contributor

@ruebot, yeah, that is fine (the Hydra links), but there is no agreement there right now, right? And this limits us to a few ebucore entities? I mean, if we have the whole ontology available, why limit ourselves to a subset and then bring in other namespaces to complement it? 100% with you on FITS.

@DiegoPino
Copy link
Contributor

Also, what about EXIF?

@ruebot
Member

ruebot commented Apr 27, 2016

@DiegoPino the second link is the agreement. That's what the Hydra folks are implementing.

@ruebot
Member

ruebot commented Apr 27, 2016

EXIF is in FITS

@DiegoPino
Contributor

@ruebot, OK, the agreement references things that don't exist, e.g. pcdm:Document can't be the domain of (not a class?). Do they mean https://github.com/duraspace/pcdm/blob/master/pcdm-ext/file-format-types.rdf?
PRONOM also?
OK, I would vote for this in CLAW: let's make FITS happen. Also, cool: http://projects.iq.harvard.edu/fits/news/fits-web-service-v111-released
We need a way to extract the data anyway.
Then transform FITS to ebucore and EXIF (EXIF RDF, not the XML). We will still have the Hydra stuff, plus our own extras.
Sounds good?
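A rough sketch of the "transform FITS to ebucore" step, assuming FITS's usual XML output layout (`identification/identity@mimetype`, `fileinfo/size`, `metadata/image/imageWidth` and `imageHeight`, all in the `fits_output` namespace). The mapping onto `ebucore:hasMimeType`, `ebucore:fileSize`, `ebucore:width` and `ebucore:height`, and the `xsd:integer` datatype, are assumptions for illustration, not an agreed profile; EXIF is left out.

```python
import xml.etree.ElementTree as ET

FITS = {"f": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}
EBUCORE = "http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def fits_to_ebucore(fits_xml: str, subject: str) -> list:
    """Map a few FITS fields onto ebucore properties, returned as N-Triples lines."""
    root = ET.fromstring(fits_xml)  # root element is <fits>
    triples = []

    def add(prop: str, literal: str) -> None:
        triples.append(f"<{subject}> <{EBUCORE}{prop}> {literal} .")

    identity = root.find("f:identification/f:identity", FITS)
    if identity is not None and identity.get("mimetype"):
        add("hasMimeType", f'"{identity.get("mimetype")}"')

    size = root.findtext("f:fileinfo/f:size", namespaces=FITS)
    if size:
        add("fileSize", f'"{int(size)}"^^<{XSD_INTEGER}>')

    # Image dimensions only appear in FITS output for image formats.
    for fits_tag, prop in (("imageWidth", "width"), ("imageHeight", "height")):
        value = root.findtext(f"f:metadata/f:image/f:{fits_tag}", namespaces=FITS)
        if value:
            add(prop, f'"{int(value)}"^^<{XSD_INTEGER}>')
    return triples
```

Fields FITS didn't report are simply skipped, so the same function works for non-image binaries; the full FITS XML could still be kept alongside as a NonRdfSource.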

@ruebot
Member

ruebot commented Apr 27, 2016

Sorry, I'm confused: is this meant to be in the image service? It seems to me like there should be a standalone/complementary file characterization/identification service, because more than images are going to need it. Might be getting ahead of ourselves?

@DiegoPino
Contributor

DiegoPino commented Apr 27, 2016

#212 was just keeping the conversation here, out of scope of course but related to any pcdm:Object like

@DiegoPino
Contributor

@ruebot and @whikloj, @acoburn, @br2490, @nigelgbanks, @edf, @dltj. Last intervention before coding: do we really need a distinction between basic image, large image, or any other type of visual/2D non-moving content modelling? In the end it's all about the derivatives and the viewers. So what about "Let's do a Still Image Microservice"?

@ruebot
Member

ruebot commented Apr 27, 2016

http://pcdm.org/2015/10/14/file-format-types#Image

That would be my vote.

@acoburn
Contributor

acoburn commented Apr 27, 2016

@ruebot: you mean something like, trigger on:

<> dcterms:format pcdmformat:Image

@ruebot
Member

ruebot commented Apr 27, 2016

@acoburn I think it would have to be a combination of pcdm:Image plus mime-type. But, then we hit the problem @daniel-dgi and I talked about early on in the project. An image could be just a plain old jpg like we have now with the Basic Image SP. Or a tiff/jp2, which could be a still image (digitized photograph), or digitized page of a book. So, then, does it get OCR'd as well? Or maybe that can be solved with another predicate... like what we were considering our SPs to become. A combination of predicates on objects, plus services.

@DiegoPino
Contributor

@ruebot I will follow your advice (mm... I feel PCDM is redefining so much of what is already in place... exact matches everywhere!), but in this case I would go with anything. I see the FITS service digesting almost anything.

@DiegoPino
Contributor

DiegoPino commented Apr 27, 2016

@ruebot definitely a predicates + rdf:type match. I would say let's process only what is a preservation master or is marked to be processed somehow, and avoid having derivatives trigger this. Also: OCR and this type of further processing would (in my preference) not be a data modelling issue, but a decision based on formats. So an image could have OCR triggered if the user wants it. This brings us closer to the reality of RDF versus our old, fixed content models.
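A minimal sketch of that gate, with triples represented as plain `(subject, predicate, object)` string tuples. `pcdmuse:PreservationMasterFile` comes from the PCDM use ontology; the opt-in predicate `PROCESS_FLAG` is hypothetical, since nothing in CLAW defines one yet.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PRESERVATION_MASTER = "http://pcdm.org/use#PreservationMasterFile"
# Hypothetical opt-in predicate for "marked to be processed somehow".
PROCESS_FLAG = "http://example.org/claw#process"


def should_process(triples: set, subject: str) -> bool:
    """Process preservation masters, plus anything explicitly flagged.

    Derivatives (e.g. pcdmuse:ServiceFile, pcdmuse:ThumbnailImage) fall
    through and return False, so they never re-trigger processing.
    """
    if (subject, RDF_TYPE, PRESERVATION_MASTER) in triples:
        return True
    return (subject, PROCESS_FLAG, "true") in triples
```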

@acoburn
Contributor

acoburn commented Apr 27, 2016

@ruebot sure, the matching predicate for this can be arbitrarily complex. But remember, we can follow Borges's pattern with https://en.wikipedia.org/wiki/The_Garden_of_Forking_Paths

Binary (mime/type = x) -> endpoint a, b
Binary (mime/type = y) -> endpoint c
Binary (mime/type = z) -> endpoint a, c, d
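The forking-paths table maps directly onto a lookup from mime type to a list of endpoints. In this sketch the mime types `image/x`, `image/y`, `image/z` and endpoint names `a` to `d` are placeholders carried over from the table, not real services.

```python
from typing import Dict, List, Mapping, Sequence

# Placeholder routing table mirroring the forking paths above.
ROUTES: Dict[str, List[str]] = {
    "image/x": ["a", "b"],
    "image/y": ["c"],
    "image/z": ["a", "c", "d"],
}


def endpoints_for(mime_type: str, routes: Mapping[str, Sequence[str]] = ROUTES) -> List[str]:
    """Return every endpoint a binary of this mime type should be sent to."""
    # Drop parameters such as "; charset=..." and normalise case before lookup.
    base = mime_type.split(";", 1)[0].strip().lower()
    return list(routes.get(base, ()))
```

Keeping the table as data means adding a new path is a one-line change rather than new branching logic; an unknown type simply routes nowhere.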

@ruebot
Member

ruebot commented Apr 27, 2016

I guess this is my cue for further complicating things by suggesting we make use of the Archivematica Format Policy Registry... https://www.archivematica.org/en/docs/fpr/

@DiegoPino
Contributor

@ruebot, no further complication at all. This is what we talked about a few months ago.
I think this is fine, so I will add the Archivematica Format Policy Registry to my workflow and research that side (I don't have a good enough Archivematica background for this stuff, but will learn).

@ruebot
Member

ruebot commented Apr 27, 2016

paging @jhsimpson -- you might be interested in where this conversation is going.

@jhsimpson

Is the PCDM use ontology relevant here?
https://github.com/duraspace/pcdm/blob/master/pcdm-ext/use.rdf

The Archivematica Format Policy Registry currently uses PRONOM IDs as the key for identifying file formats, and there is no RDF version of PRONOM (yet). So that is a problem. The current version of the FPR also does not understand linked data.

I am not sure if the timing will work out, but I think work on a new, linked data based version of the FPR will be getting under way soon. There is a short video I made a year ago describing the idea:
https://www.youtube.com/watch?v=dfRtZFiRp6U&feature=youtu.be

We made a format policy registry mailing list last year also, which never really got off the ground:
https://groups.google.com/forum/#!forum/format-policy-registry

@DiegoPino @ruebot I would encourage you to ask questions about the fpr on that list.

@mjordan wrote a proof of concept FPR module for Islandora last year:
https://github.com/mjordan/islandora_fpr

That might be one place to start?

@ruebot
Member

ruebot commented Apr 28, 2016

@jhsimpson would you be willing to join us on CLAW Call in the next couple weeks to flesh this out a bit more? I'm happy to devote an entire call to it if need be.

@jhsimpson

Yes for sure definitely. Let me know a time.

@ruebot
Member

ruebot commented Apr 28, 2016

How about the May 18th call?

@jhsimpson

jhsimpson commented Apr 28, 2016

May 18th - yep, I will be there.

@ruebot
Member

ruebot commented Apr 28, 2016

@jhsimpson awesome! I've updated the agenda, and have a FITS Web Service item on there -- @DiegoPino @acoburn -- since it might be a good overlap discussion.

@jhsimpson

Here are some details about how to interact with the Archivematica FPR Server, as it exists right now:

This should give you a list of all the end points:
https://fpr.archivematica.org/fpr/api/v2/?format=json

Let's say you have a file that you have already identified as a GIF, and you know the PRONOM ID:
https://fpr.archivematica.org/fpr/api/v2/format-version/?pronom_id=fmt/4&format=json
This returns info about GIF 1989a.

Take the UUID from that and put it in this URL:

https://fpr.archivematica.org/fpr/api/v2/fp-rule/?purpose=access&format=json&fmt=6370b72f-4caa-4d90-abc6-4816c8a0a603

Take the UUID of that fp-rule and plug it into this:

https://fpr.archivematica.org/fpr/api/v2/fp-command/?format=json&uuid=6957fdac-a1ed-470f-89f7-fb00be42ea13

Now you have the command line for convert, the ImageMagick utility that converts the GIF to a JPG.

For additional info, you can get details about the version of the tool (convert) with this:
https://fpr.archivematica.org/fpr/api/v2/fp-tool/?format=json&uuid=8d81cd4f-20ee-4a82-9eca-455699509cd5
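A small Python sketch of the four lookups in that walkthrough. Only the endpoints, parameter names and example UUIDs come from the URLs above; `first_uuid` assumes each listing comes back as `{"meta": ..., "objects": [{"uuid": ...}, ...]}`, which should be checked against the live API, as should exactly which UUID from the fp-rule response feeds the fp-command query. Fetching is left out (e.g. `json.load(urllib.request.urlopen(url))`).

```python
from urllib.parse import urlencode

FPR_API = "https://fpr.archivematica.org/fpr/api/v2"


def fpr_url(resource: str, **params: str) -> str:
    """Build a JSON query URL for one FPR API resource."""
    # safe="/" keeps PRONOM ids like "fmt/4" readable; parameter order does not matter.
    query = urlencode({"format": "json", **params}, safe="/")
    return f"{FPR_API}/{resource}/?{query}"


def first_uuid(response: dict) -> str:
    """Pull the uuid of the first result out of a list response (assumed shape)."""
    objects = response.get("objects") or []
    if not objects:
        raise LookupError("FPR query returned no results")
    return objects[0]["uuid"]


# The four lookups from the walkthrough, in order, using its example UUIDs:
format_q = fpr_url("format-version", pronom_id="fmt/4")  # GIF 1989a
rule_q = fpr_url("fp-rule", purpose="access", fmt="6370b72f-4caa-4d90-abc6-4816c8a0a603")
command_q = fpr_url("fp-command", uuid="6957fdac-a1ed-470f-89f7-fb00be42ea13")
tool_q = fpr_url("fp-tool", uuid="8d81cd4f-20ee-4a82-9eca-455699509cd5")
```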

@DiegoPino
Contributor

@jhsimpson++. This is very cool. Thanks a lot

@axfelix

axfelix commented Jun 16, 2016

Since I'm assuming we never ever ever want to do derivative generation at ingest time ever again for scalability reasons, what would be the simplest way to implement this currently? Something like:

  1. Using Drupal's cron(?), query the FPR using Justin's example for any objects which don't already have the "DIP created" flag (or something along those lines, not sure if it makes sense to adopt OAIS language for Islandora derivatives), get command, run command. Run command where? If not wanting to use any Archivematica components other than the FPR, easiest way to ensure all derivative creation utilities are installed on Islandora server would be to pull in Archivematica tools metapackage (which will soon exist for RPM as well as Deb) when installing Islandora. Could scale this all out to different machines but at that point we have to justify why we're not pulling in all of Archivematica as they've sort of worked out this "different pipelines on different servers" scalability issue already. Generated DIP/AIP then added as Fedora datastreams just like in 1.x?
  2. So that we avoid having to ship a bunch of different solution packs for different content types in 2.x (as I understand we want to get away from this), what if we have a master library written in pure JS that uses the pcdmFormat of DIP datastreams to define and render a default "viewer" for each format type? It would be easily embeddable (thus of interest to AtoM/Sufia/others) and could be overridden for individual content types by anyone wanting to customize the frontend. This way we could have the 1.x viewer functionality like OpenSeadragon, video.js, etc. all available "out of the box" with minimal overhead.
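Item 1 boils down to one periodic batch pass, whatever ends up scheduling it. In this sketch `lookup_command` is assumed to wrap an FPR query like Justin's and `run_command` to execute the result; both callables, the `RepoObject` type and the `"dip-created"` flag name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Set

DIP_FLAG = "dip-created"  # hypothetical flag name


@dataclass
class RepoObject:
    uri: str
    pronom_id: str
    flags: Set[str] = field(default_factory=set)


def derivative_pass(
    objects: Iterable[RepoObject],
    lookup_command: Callable[[str], Optional[str]],
    run_command: Callable[[str, RepoObject], None],
) -> List[str]:
    """One batch run: derive anything not yet flagged, then flag it."""
    done = []
    for obj in objects:
        if DIP_FLAG in obj.flags:
            continue  # derivatives already exist
        command = lookup_command(obj.pronom_id)  # e.g. an FPR access rule
        if command is None:
            continue  # no rule for this format; leave it for a later pass
        run_command(command, obj)
        obj.flags.add(DIP_FLAG)  # reached only if run_command did not raise
        done.append(obj.uri)
    return done
```

Because the flag is set only after the command returns, a failed conversion is retried on the next pass instead of being silently marked done.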

@ruebot
Member

ruebot commented Jun 16, 2016

@axfelix Apache Camel 😄

...we can expand more on this later... since I'm about to give another presentation here at OR.

@axfelix

axfelix commented Jun 16, 2016

OK, sure. I do not know anything about Camel at this point so I'm not sure how much of my spitballing is already handled :)

@DiegoPino
Contributor

@axfelix we are giving Fedora 4 API-X a look also (we had a CLAW/API-X/Fedora 4 meeting here at OR2016). And guess what: the first prototype will be based on our PHP microservices idea. Async, cross-platform, and based on existing good practices (Archivematica). I'm looking to have a talk about this in the next CLAW call, and you are very welcome to join us. Thanks for bringing this up!

@axfelix

axfelix commented Jun 16, 2016

I'll be there! I'm not sure how many cycles I'll have for this and I'm neither qualified for nor interested in "low-level" PHP architecting (which is why I've stayed out of CLAW discussions so far) but I do want to see how this part shapes up.

@acoburn
Contributor

acoburn commented Jun 16, 2016

@ruebot I have already written such a service in camel/OSGi. More on that soon.

@acoburn
Contributor

acoburn commented Jun 16, 2016

@ruebot Here's my implementation of the image service: https://gitlab.amherst.edu/acdc/repository-extension-services/tree/master/acrepo-image-service

It streams binary data directly from Fedora through ImageMagick and then back out. It handles format conversions, resizes, etc. Basically, anything convert can handle.
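This is not the Camel/OSGi implementation linked above, just a minimal Python analogue of the same idea: pipe image bytes through ImageMagick's `convert` on stdin/stdout. It assumes `convert` is on the PATH (on ImageMagick 7 the preferred entry point is `magick`), and unlike a true streaming route it buffers the whole binary in memory.

```python
import subprocess
from typing import List, Optional


def convert_args(output_format: str, resize: Optional[str] = None) -> List[str]:
    """ImageMagick `convert` arguments that read stdin and write stdout."""
    args = ["convert", "-"]            # "-" = read the source image from stdin
    if resize:
        args += ["-resize", resize]    # geometry such as "800x600" or "50%"
    args.append(f"{output_format}:-")  # e.g. "jpg:-" = write JPEG to stdout
    return args


def convert_bytes(data: bytes, output_format: str, resize: Optional[str] = None) -> bytes:
    """Run convert over an in-memory image; raises CalledProcessError on failure."""
    result = subprocess.run(
        convert_args(output_format, resize),
        input=data,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Usage would be something like `convert_bytes(tiff_bytes, "jpg", "800x600")`, with the bytes fetched from and written back to Fedora by whatever service wraps it.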

@acoburn
Contributor

acoburn commented Jun 16, 2016

Oh, and you can completely ignore the owl inference stuff (i.e. the OPTIONS endpoint). It's completely half-baked and quite possibly wrong -- I'd like to get some feedback from @DiegoPino on that. The idea is that for API-X, services should provide a set of OWL restrictions so that services can be dynamically bound to certain resources, but none of that inference piece has actually been implemented.
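A deliberately simplified stand-in for that binding idea: each service declares required property/value pairs (roughly an OWL `hasValue` restriction), and a resource binds to every service whose pairs it satisfies. There is no reasoner here, so this falls well short of OWL inference; the service names and restrictions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class HasValue:
    """Simplified stand-in for an OWL hasValue restriction: `prop` must have `value`."""
    prop: str
    value: str


def bindable_services(services: Dict[str, List[HasValue]], triples: Set[Triple], subject: str) -> List[str]:
    """Names of services whose every restriction is satisfied by the resource."""
    facts = {(p, o) for s, p, o in triples if s == subject}
    return [
        name
        for name, restrictions in services.items()
        if all((r.prop, r.value) in facts for r in restrictions)
    ]
```

For example, an image service might require `rdf:type fedora:Binary` plus `ebucore:hasMimeType "image/jpeg"`, while a FITS service requires only `rdf:type fedora:Binary`.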

@acoburn
Contributor

acoburn commented Jun 16, 2016

...and also, we have a FITS metadata extraction service in the works -- should be ready this week. It exposes a REST endpoint and then pipes fedora:Binary resources through the FITS-servlet web application (which must be running somewhere) and returns the FITS xml document.

@ruebot
Member

ruebot commented Jun 16, 2016

@acoburn y'all are pretty awesome 😄

...skimming through the other services and issues you have there...

@DiegoPino
Contributor

@jhsimpson and @ruebot, I just got these from @rosiel (thanks a lot, Rosie!). Not sure if this is the right spot for new ideas on provenance ontologies, but since there is still no Archivematica GitHub repo for discussing them, I will just copy them here for posterity: http://www.ics.forth.gr/isl/index_main.php?l=e&c=656 and http://doc.objectspace.org/cidoc/


7 participants