Create a Basic Image microservice. #179

whikloj · 2016-04-06T03:27:31Z

We need a basic image microservice.

This will generate a pcdm:Object with a UUID and hasURN like the existing Collection microservice.

Questions:

Do we want to create a sub-class of pcdm:Object for basic image? This should be done first.
Will it create any indirect objects to hold the actual binaries, or should this be delayed (possibly only for seconds) until the binary is uploaded?
What interactions do we want?
1. CRUD a basic image resource
2. CRUD a binary image on the basic image resource (should this be separate and linked via proxies?)
3. CRUD metadata about a binary (

whikloj · 2016-04-06T03:29:37Z

Regarding 2ii above, by separate I mean two separate actions. So you can create the basic image resource, and then later you can add the binary.

ruebot · 2016-04-06T03:32:34Z

Are our sub-classes our "solution-packs"? If so, yeah. This was our game plan.
Well, we're going to need to create the pcdm:File container, right? Should that be a service this service calls? Assuming we're still going with this model.
...
- 👍
- 👍 (proxies - I guess that depends on 2?)
- Would this be a separate service?

whikloj · 2016-04-06T03:55:21Z

I would say yes, we can let @DiegoPino do some data modelling for us.
I guess, see I'm not up-to-speed on the whole PCDM thing. But okay, we need a files indirect container that will work like the pcdm:Collection service's members indirect container.

About that image though, here is the thing I am wondering about.

If we put a JPG up, then we automatically have a place in Fedora 4 to store metadata about that JPG (the /fcr:metadata endpoint). So every ldp:NonRdfSource also has an ldp:RdfSource associated with it.

Then each binary can have its FITS (or whatever) metadata applied on its RdfSource and we don't have to have a separate FITS file. Unless you really want to add a file of metadata, but I'd just stick the metadata as properties.
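To make the /fcr:metadata idea concrete, here is a minimal sketch that derives the description URL for a binary and builds a SPARQL Update body for attaching properties to it. The repository URL and the `http://example.org/width` predicate are hypothetical placeholders, and sending the body (a PATCH with `Content-Type: application/sparql-update`, as I understand Fedora 4's API) is left out.

```python
def description_url(binary_url: str) -> str:
    """Return the URL of the RDF description paired with a Fedora 4 binary."""
    return binary_url.rstrip("/") + "/fcr:metadata"


def sparql_insert(properties: dict) -> str:
    """Build a SPARQL Update body that adds literal properties to <>.

    Keys are full predicate URIs; values are plain string literals.
    """
    lines = []
    for predicate, value in sorted(properties.items()):
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  <> <{predicate}> "{escaped}" .')
    return "INSERT {\n" + "\n".join(lines) + "\n} WHERE {}"


if __name__ == "__main__":
    # Hypothetical binary location, for illustration only.
    print(description_url("http://localhost:8080/fcrepo/rest/image1/obj"))
    print(sparql_insert({"http://example.org/width": "640"}))
```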

ruebot · 2016-04-06T04:46:52Z

Cool. That was the idea back at the Islandora Conference.
Sounds about right.

FITS... yeah. That is complicated. I'd honestly leave it as a NonRdfSource; an XML file on the file system. But! BUT! @acoburn and I led this working-group-sub-group, which created a Technical Metadata Application Profile. Maybe we're finally there? 😄

whikloj · 2016-04-27T20:17:48Z

@DiegoPino you mentioned you are working on this, yes?

DiegoPino · 2016-04-27T20:26:58Z

@whikloj, yes I am. So, open to suggestions.

whikloj · 2016-04-27T20:28:12Z

Nope, was just going to assign it to you. All good.

DiegoPino · 2016-04-27T20:42:26Z

@ruebot & @whikloj can you give https://www.ebu.ch/metadata/ontologies/ebucore/ a look and tell me whether this is enough to describe tech metadata for now? We can add as many extra NonRdfSources as needed (like FITS), but I would like to have some base ones as RDF too.

ruebot · 2016-04-27T20:45:29Z

@DiegoPino how about this?

That's the group @acoburn and I led last year.

ruebot · 2016-04-27T20:46:31Z

...and my preference would still be to have the FITS XML file stored (wearing my preservationista hat). So, I'm sure I'll be creating a FITS service sooner or later for that.

DiegoPino · 2016-04-27T20:48:45Z

@ruebot, yeah that is fine (Hydra links), but there is no agreement there right now, right? And this is limiting us to a few ebucore entities? I mean, if we have the whole ontology available, why limit ourselves to a subset and also put other namespaces in place to complement it? 100% with you on FITS.

DiegoPino · 2016-04-27T20:50:42Z

Also, what about EXIF?

ruebot · 2016-04-27T20:58:45Z

@DiegoPino the second link is the agreement. That's what the Hydra folks are implementing.

ruebot · 2016-04-27T20:59:13Z

EXIF is in FITS

DiegoPino · 2016-04-27T21:06:07Z

@ruebot, ok, the agreement has non-existent stuff, e.g. pcdm:Document can't be the domain of (not a class?), do they mean https://github.com/duraspace/pcdm/blob/master/pcdm-ext/file-format-types.rdf?
PRONOM also?
Ok, I would vote for this on CLAW: let's make FITS happen. Also, cool: http://projects.iq.harvard.edu/fits/news/fits-web-service-v111-released
We need a way to extract the data anyway.
Then transform FITS to ebucore and EXIF (EXIF RDF, not the XML). We will still have the Hydra stuff, plus our own extras.
Sounds good?

ruebot · 2016-04-27T21:11:20Z

Sorry, I'm confused, is this meant to be in the image service? It seems to me like there should be a standalone/complementary file characterization/identification service, because more than image is going to need it. Might be getting ahead of ourselves?

DiegoPino · 2016-04-27T21:12:52Z

#212 was just keeping the conversation here, out of scope of course but related to any pcdm:Object like

DiegoPino · 2016-04-27T21:20:56Z

@ruebot and @whikloj, @acoburn, @br2490, @nigelgbanks, @edf, @dltj. Last intervention before coding: Do we really need a difference between basic image, large image, or any other type of visual/2D non-moving content modelling? It's all about the derivatives and the viewers in the end. So what about "Let's do a Still Image Microservice"?

ruebot · 2016-04-27T21:23:45Z

http://pcdm.org/2015/10/14/file-format-types#Image

That would be my vote.

acoburn · 2016-04-27T21:25:32Z

@ruebot: you mean something like, trigger on:

<> dcterms:format pcdmformat:Image

ruebot · 2016-04-27T21:29:10Z

@acoburn I think it would have to be a combination of pcdm:Image plus mime-type. But, then we hit the problem @daniel-dgi and I talked about early on in the project. An image could be just a plain old jpg like we have now with the Basic Image SP. Or a tiff/jp2, which could be a still image (digitized photograph), or digitized page of a book. So, then, does it get OCR'd as well? Or maybe that can be solved with another predicate... like what we were considering our SPs to become. A combination of predicates on objects, plus services.

DiegoPino · 2016-04-27T21:30:21Z

@ruebot I will follow your advice (mm... I feel pcdm is redefining so much of what is already in place... exact matches everywhere!), but in this case I would accept anything. As far as I can see, the FITS service digests almost anything.

DiegoPino · 2016-04-27T21:32:02Z

@ruebot definitely a predicates + rdf:type match. I would say, let's process only what is a preservation master or is marked to be processed somehow. Avoid having derivatives trigger this. Also: OCR and this type of further processing would (in my preference) not be a data modelling issue, but a decision based on formats. So an image could have OCR triggered if the user wants it. This brings us closer to the reality of RDF versus our old/fixed Content models.
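A rough sketch of what such a "predicates + rdf:type" match could look like, with triples held as plain tuples. The `dcterms:format pcdmformat:Image` test comes from the trigger suggested above; using `pcdmuse:PreservationMasterFile` as the "preservation master" marker is an assumption made for illustration, as are the example URIs.

```python
DCTERMS_FORMAT = "http://purl.org/dc/terms/format"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PCDM_IMAGE = "http://pcdm.org/2015/10/14/file-format-types#Image"
# Assumption: the preservation master is marked with this PCDM use class.
PRESERVATION_MASTER = "http://pcdm.org/use#PreservationMasterFile"


def objects_of(triples, subject, predicate):
    """Collect every object for the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def should_process(triples, subject):
    """True only for image-format resources typed as preservation masters."""
    is_image = PCDM_IMAGE in objects_of(triples, subject, DCTERMS_FORMAT)
    is_master = PRESERVATION_MASTER in objects_of(triples, subject, RDF_TYPE)
    return is_image and is_master


if __name__ == "__main__":
    # Hypothetical resources: one master and one derivative thumbnail.
    triples = {
        ("http://example.org/obj", DCTERMS_FORMAT, PCDM_IMAGE),
        ("http://example.org/obj", RDF_TYPE, PRESERVATION_MASTER),
        ("http://example.org/tn", DCTERMS_FORMAT, PCDM_IMAGE),
    }
    print(should_process(triples, "http://example.org/obj"))
    print(should_process(triples, "http://example.org/tn"))
```

Because the derivative carries the image format but not the master type, it does not trigger processing, which is the loop-avoidance behaviour asked for above.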

acoburn · 2016-04-27T21:33:32Z

@ruebot sure, the matching predicate for this can be arbitrarily complex. But remember, we can follow Borges's pattern with https://en.wikipedia.org/wiki/The_Garden_of_Forking_Paths

Binary (mime/type = x) -> endpoint a, b
Binary (mime/type = y) -> endpoint c
Binary (mime/type = z) -> endpoint a, c, d
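The forking pattern above could be sketched as a plain lookup table. The MIME types and endpoint names here are placeholders standing in for x, y, z and a, b, c, d; nothing in this thread fixes the real values.

```python
# Hypothetical routing table: each MIME type fans out to one or more endpoints.
ROUTES = {
    "image/jpeg": ["endpoint-a", "endpoint-b"],
    "image/tiff": ["endpoint-c"],
    "image/jp2": ["endpoint-a", "endpoint-c", "endpoint-d"],
}


def endpoints_for(mime_type: str) -> list:
    """Return the endpoints a binary of this MIME type should be sent to."""
    # Drop parameters such as "; charset=..." and normalise case before lookup.
    key = mime_type.split(";")[0].strip().lower()
    return list(ROUTES.get(key, []))


if __name__ == "__main__":
    for mime in ("image/jpeg", "IMAGE/TIFF", "application/pdf"):
        print(mime, "->", endpoints_for(mime))
```

An unknown type simply routes nowhere, so adding a new format is a one-line change to the table rather than a new content model.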

ruebot · 2016-04-27T21:39:24Z

I guess this is my cue for further complicating things by suggesting we make use of the Archivematica Format Policy Registry... https://www.archivematica.org/en/docs/fpr/

DiegoPino · 2016-04-27T21:43:47Z

@ruebot, no further complication at all. This is what we talked about a few months ago:
I think this is fine. So I will add the Archivematica Format Policy Registry to my workflow and research that side (my Archivematica background isn't good enough for this stuff, but I will learn).

ruebot · 2016-04-27T21:44:59Z

paging @jhsimpson -- you might be interested in where this conversation is going.

jhsimpson · 2016-04-27T22:47:07Z

Is the PCDM use ontology relevant here:
https://github.com/duraspace/pcdm/blob/master/pcdm-ext/use.rdf

The Archivematica Format Policy Registry currently uses PRONOM IDs as the key for identifying file formats, and there is no RDF version of PRONOM (yet). So that is a problem. The current version of the FPR also does not understand linked data.

I am not sure if the timing will work out, but I think work on a new, linked data based version of the FPR will be getting under way soon. There is a short video I made a year ago describing the idea:
https://www.youtube.com/watch?v=dfRtZFiRp6U&feature=youtu.be

We made a format policy registry mailing list last year also, which never really got off the ground:
https://groups.google.com/forum/#!forum/format-policy-registry

@DiegoPino @ruebot I would encourage you to ask questions about the fpr on that list.

@mjordan wrote a proof of concept FPR module for Islandora last year:
https://github.com/mjordan/islandora_fpr

That might be one place to start?

ruebot · 2016-04-28T01:55:23Z

@jhsimpson would you be willing to join us on CLAW Call in the next couple weeks to flesh this out a bit more? I'm happy to devote an entire call to it if need be.

jhsimpson · 2016-04-28T02:20:32Z

Yes for sure definitely. Let me know a time.

ruebot · 2016-04-28T10:58:09Z

How about the May 18th call?

jhsimpson · 2016-04-28T19:23:20Z

May 18th - yep, I will be there.

ruebot · 2016-04-28T19:25:49Z

@jhsimpson awesome! I've updated the agenda, and have a FITS Web Service item on there -- @DiegoPino @acoburn -- since it might be a good overlap discussion.

jhsimpson · 2016-05-18T17:50:04Z

Here are some details about how to interact with the Archivematica FPR Server, as it exists right now:

This should give you a list of all the end points:
https://fpr.archivematica.org/fpr/api/v2/?format=json

Let's say you have a file that you have already identified as a GIF, and you know the PRONOM ID:
https://fpr.archivematica.org/fpr/api/v2/format-version/?pronom_id=fmt/4&format=json
this returns info about GIF 1989a

take the uuid from that and put it in this url:

https://fpr.archivematica.org/fpr/api/v2/fp-rule/?purpose=access&format=json&fmt=6370b72f-4caa-4d90-abc6-4816c8a0a603

take the uuid of that fp-rule and plug it into this:

https://fpr.archivematica.org/fpr/api/v2/fp-command/?format=json&uuid=6957fdac-a1ed-470f-89f7-fb00be42ea13

now you have the command line for convert, the utility in ImageMagick that converts the GIF to a JPG.

for additional info, you can get details about the version of the tool (convert) with this:
https://fpr.archivematica.org/fpr/api/v2/fp-tool/?format=json&uuid=8d81cd4f-20ee-4a82-9eca-455699509cd5
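The chain of lookups above can be sketched as a handful of URL builders. Only the endpoint names and query parameters shown in the URLs above are used; the shape of the JSON responses is not described in this thread, so fetching and parsing is deliberately left out, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

FPR_BASE = "https://fpr.archivematica.org/fpr/api/v2"


def fpr_url(resource: str, **params: str) -> str:
    """Build an FPR API URL that asks for JSON output."""
    query = {"format": "json", **params}
    # safe="/" keeps PRONOM IDs such as "fmt/4" readable in the query string.
    return f"{FPR_BASE}/{resource}/?{urlencode(query, safe='/')}"


def format_version_url(pronom_id: str) -> str:
    """Step 1: look up a format version by its PRONOM ID."""
    return fpr_url("format-version", pronom_id=pronom_id)


def access_rule_url(format_uuid: str) -> str:
    """Step 2: find the access rule for a format version uuid."""
    return fpr_url("fp-rule", purpose="access", fmt=format_uuid)


def command_url(uuid: str) -> str:
    """Step 3: fetch the command using the uuid taken from step 2."""
    return fpr_url("fp-command", uuid=uuid)


def tool_url(uuid: str) -> str:
    """Optional: details about the tool version behind the command."""
    return fpr_url("fp-tool", uuid=uuid)


if __name__ == "__main__":
    print(format_version_url("fmt/4"))
    print(access_rule_url("6370b72f-4caa-4d90-abc6-4816c8a0a603"))
    print(command_url("6957fdac-a1ed-470f-89f7-fb00be42ea13"))
```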

DiegoPino · 2016-05-18T18:02:07Z

@jhsimpson++. This is very cool. Thanks a lot

axfelix · 2016-06-16T09:31:31Z

Since I'm assuming we never ever ever want to do derivative generation at ingest time ever again for scalability reasons, what would be the simplest way to implement this currently? Something like:

Using Drupal's cron(?), query the FPR using Justin's example for any objects which don't already have the "DIP created" flag (or something along those lines, not sure if it makes sense to adopt OAIS language for Islandora derivatives), get command, run command. Run command where? If not wanting to use any Archivematica components other than the FPR, easiest way to ensure all derivative creation utilities are installed on Islandora server would be to pull in Archivematica tools metapackage (which will soon exist for RPM as well as Deb) when installing Islandora. Could scale this all out to different machines but at that point we have to justify why we're not pulling in all of Archivematica as they've sort of worked out this "different pipelines on different servers" scalability issue already. Generated DIP/AIP then added as Fedora datastreams just like in 1.x?
So that we avoid having to ship a bunch of different solution packs for different content types in 2.x (as I understand we want to get away from this), what if we have a master library written in pure js that uses pcdmFormat of DIP datastreams to define+render a default "viewer" for each format type? Would be easily embeddable (thus of interest to Atom/Sufia/others) and could be overridden for individual content types for anyone wanting to customize frontend. This way we could have the 1.x viewer functionality like OpenSeadragon, video.js, etc. all available "out of the box" with minimal overhead.
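The polling idea in the first paragraph above might look roughly like this. Everything here is hypothetical: the `derivative_created` flag, the object dictionaries, and the `lookup_command` and `run` callables stand in for whatever the FPR lookup and command execution would really be.

```python
def run_pending(objects, lookup_command, run):
    """Generate derivatives for objects not yet flagged as processed.

    objects: list of dicts with "id" and "pronom_id" keys (hypothetical shape).
    lookup_command: callable mapping a PRONOM ID to a command, or None.
    run: callable that executes a command for an object.
    Returns the ids of the objects processed in this pass.
    """
    processed = []
    for obj in objects:
        if obj.get("derivative_created"):
            continue  # already handled on an earlier pass
        command = lookup_command(obj["pronom_id"])
        if command is None:
            continue  # no rule for this format, try again next pass
        run(command, obj)
        obj["derivative_created"] = True
        processed.append(obj["id"])
    return processed


if __name__ == "__main__":
    queue = [
        {"id": "a", "pronom_id": "fmt/4"},
        {"id": "b", "pronom_id": "fmt/4", "derivative_created": True},
    ]
    rules = {"fmt/4": "convert"}
    print(run_pending(queue, rules.get, lambda cmd, obj: None))
```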

ruebot · 2016-06-16T10:02:00Z

@axfelix Apache Camel 😄

...we can expand more on this later... since I'm about to give another presentation here at OR.

axfelix · 2016-06-16T10:02:52Z

OK, sure. I do not know anything about Camel at this point so I'm not sure how much of my spitballing is already handled :)

DiegoPino · 2016-06-16T10:12:18Z

@axfelix we are giving Fedora 4 API-X a look also (had a CLAW API-X FEDORA 4 meeting here at OR2016). And guess what. First prototype will be based on our PHP MicroServices idea. Async, cross platform and based on existing good practices (Archivematica). I'm looking to have a talk about this in the next CLAW call and you are very welcome to join us. Thanks for bringing this up!

axfelix · 2016-06-16T10:19:35Z

I'll be there! I'm not sure how many cycles I'll have for this and I'm neither qualified for nor interested in "low-level" PHP architecting (which is why I've stayed out of CLAW discussions so far) but I do want to see how this part shapes up.

acoburn · 2016-06-16T10:47:09Z

@ruebot I have already written such a service in camel/OSGi. More on that soon.

acoburn · 2016-06-16T15:44:17Z

@ruebot Here's my implementation of the image service: https://gitlab.amherst.edu/acdc/repository-extension-services/tree/master/acrepo-image-service

It streams binary data directly from Fedora through ImageMagick and then back out. It handles format conversions, resizes, etc. Basically, anything convert can handle.
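For readers unfamiliar with the streaming approach, here is a minimal Python sketch of piping bytes through ImageMagick's `convert` via stdin and stdout. This is not the Camel/OSGi implementation linked above, only an illustration of the same idea; it assumes `convert` is installed and on the PATH, and the function names are made up.

```python
import subprocess


def convert_command(output_format, resize=None):
    """Build a convert argv that reads stdin ("-") and writes to stdout."""
    command = ["convert", "-"]
    if resize:
        command += ["-resize", resize]
    # "jpg:-" style output tells ImageMagick the format and to use stdout.
    command.append(f"{output_format}:-")
    return command


def convert_stream(data: bytes, output_format: str, resize=None) -> bytes:
    """Pipe image bytes through convert and return the converted bytes."""
    result = subprocess.run(
        convert_command(output_format, resize),
        input=data,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if convert fails
    )
    return result.stdout


if __name__ == "__main__":
    print(convert_command("jpg", resize="50%"))
```

Keeping the command construction separate from the subprocess call makes it easy to swap in a command string obtained elsewhere, for example from a format policy lookup.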

acoburn · 2016-06-16T15:46:16Z

Oh, and you can completely ignore the owl inference stuff (i.e. the OPTIONS endpoint). It's completely half-baked and quite possibly wrong -- I'd like to get some feedback from @DiegoPino on that. The idea is that for API-X, services should provide a set of OWL restrictions so that services can be dynamically bound to certain resources, but none of that inference piece has actually been implemented.

acoburn · 2016-06-16T15:55:53Z

...and also, we have a FITS metadata extraction service in the works -- should be ready this week. It exposes a REST endpoint and then pipes fedora:Binary resources through the FITS-servlet web application (which must be running somewhere) and returns the FITS xml document.

ruebot · 2016-06-16T15:56:59Z

@acoburn y'all are pretty awesome 😄

...skimming through the other services and issues you have there...

DiegoPino · 2016-07-20T20:31:47Z

@jhsimpson and @ruebot just got this one from @rosiel (thanks a lot Rosie!). Not sure if this is the right spot for new ideas for provenance ontologies, but since there is still no Archivematica GitHub repo for discussing them, I will just copy it here for posterity. http://www.ics.forth.gr/isl/index_main.php?l=e&c=656 and http://doc.objectspace.org/cidoc/

ruebot added the PHP Services label Apr 6, 2016

This was referenced Apr 7, 2016

Content model objects in Islandora/F4 #31

Closed

Basic Image does not validate OBJ's mimetype #92

Closed

Implement pcdm:memberOf for Collection and Basic Image #105

Closed

ruebot added the pcdm label Apr 7, 2016

ruebot mentioned this issue Apr 7, 2016

Better thumbnails #73

Closed

DiegoPino self-assigned this Apr 27, 2016

whikloj mentioned this issue May 16, 2016

May sprint; Kick-off call notes #227

Closed

dannylamb closed this as completed Sep 8, 2016

rosiel mentioned this issue Aug 28, 2020

Document using base box and dev branch on the playbook instructions. #1598

Merged

7 tasks
