
YouTube exporter #82

Open · Daniel-Mietchen opened this issue Jun 28, 2013 · 13 comments

@Daniel-Mietchen
Member

This issue serves to bundle work on a pipeline to export videos from PMC to YouTube.

@wrought
Member

wrought commented Jun 28, 2013

It seems YouTube accepts the .ogg format and automatically converts URLs in the description into anchor tags, so it should be possible to easily link back to Wikimedia Commons: http://www.youtube.com/watch?v=JP4hd_PVFSE

@wrought
Member

wrought commented Jun 28, 2013

So, a rough spec for this might be:

  1. Connect to Google/YouTube with OAuth2.
  2. Upload the video and write metadata through the YouTube API v3 (https://developers.google.com/youtube/v3/); a sketch of steps 1–2 follows below.
  3. Benchmark and throttle if necessary to avoid hitting API limits, which might look something like "200 video uploads, 7000 write operations, and 200,000 read operations that each retrieve three resource parts" per day, totalling approximately 5,000,000 "units" across the API.
  4. Potentially communicate with YouTube/Google about lifting the throttle if it becomes an impediment.
  5. Create a manual procedure to redo an upload that has been updated, has failed, or is otherwise incorrect.

It seems to me that there are two important components if this is to be integrated with the current service:

  1. Upload to YouTube after the video is deposited in Commons, so that the proper URL can be used to link back, and provide the DOI and a direct-download link as well.
  2. Be able to update either the YouTube or the Commons media independently of one another. We need to investigate how the application is currently used in case of error, update, etc.
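A minimal sketch of steps 1–2 above, assuming today's google-api-python-client and google-auth-oauthlib packages (the client libraries available in 2013 differed); the client-secrets file name, category ID and privacy/license settings are placeholders, not project decisions:

```python
# Sketch only: OAuth2 installed-app flow plus a resumable upload via the
# YouTube Data API v3. "client_secrets.json", the category ID and the
# privacy/license values are placeholder assumptions.
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

SCOPES = ["https://www.googleapis.com/auth/youtube.upload"]

def get_youtube_client():
    # Step 1: run the OAuth2 flow in a browser and build an API v3 client.
    flow = InstalledAppFlow.from_client_secrets_file("client_secrets.json", SCOPES)
    credentials = flow.run_local_server(port=0)
    return build("youtube", "v3", credentials=credentials)

def upload_video(youtube, path, title, description):
    # Step 2: resumable upload; "snippet" and "status" are the metadata parts
    # written on insert.
    body = {
        "snippet": {"title": title, "description": description, "categoryId": "27"},
        "status": {"privacyStatus": "public", "license": "creativeCommon"},
    }
    request = youtube.videos().insert(
        part="snippet,status",
        body=body,
        media_body=MediaFileUpload(path, chunksize=-1, resumable=True),
    )
    response = None
    while response is None:
        _status, response = request.next_chunk()
    return response["id"]  # the new video's YouTube ID
```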

@Daniel-Mietchen
Member Author

This looks good to me so far.

Some background on what I have in mind with this YouTube exporter:

  1. outreach to the YouTube community (and potentially that of other video sharing sites) about (a) Wikimedia Commons, (b) research and (c) Open Access;
  2. checking the OAMI workflows and readying them for routine operation (keeping track of articles that have suitable materials, and of what has been uploaded when and where) and possibly further plugins;
  3. for those videos that failed to convert through GStreamer, there is a good chance that YouTube does have a way to ingest them, and we could then import the WebM from there into Commons;
  4. outreach to scholarly authors and editors about the benefits (and pitfalls, if any) of reuse-friendly licenses through more comprehensive inclusion of reuse in altmetrics;
  5. testing whether YouTube's "related" material can be of any use in improving categorization of the videos on Commons;
  6. testing the technical and community aspects of sharing media from Commons with other sites and their respective communities (e.g. Flickr, sound archives);
  7. testing the legal ground of such multi-layer reuse in a commercial context (e.g. https://twitter.com/EvoMRI/status/350790898092752896 ).

@wrought
Member

wrought commented Jul 1, 2013

After reading through more of the tool, it seems like it would make the most sense to add:

  1. A few fields to model.py, including especially an uploaded_to_youtube bit (an illustrative sketch follows this list).
  2. An upload_media_to_youtube action in oa_put, akin to upload_media.
  3. The requisite helper functions for YouTube, akin to mediawiki.py.
  4. Possibly a second plot to view stats on successful uploads to YouTube.
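For item 1, a sketch of the kind of model.py change this implies, assuming the SQLAlchemy declarative style; the class and column names here are illustrative stand-ins, not the actual schema:

```python
# Illustrative only -- the real model.py classes and columns will differ; this
# just shows the per-file bookkeeping an uploaded_to_youtube bit implies.
from sqlalchemy import Boolean, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class SupplementaryMaterial(Base):  # hypothetical stand-in for the real model
    __tablename__ = "supplementary_material"
    id = Column(Integer, primary_key=True)
    url = Column(String)                      # source file in PMC
    mediawiki_page = Column(String)           # set once deposited on Commons
    # proposed additions for the YouTube exporter:
    uploaded_to_youtube = Column(Boolean, default=False)
    youtube_id = Column(String, nullable=True)  # lets oa_put link back or update later
```

An upload_media_to_youtube action in oa_put could then select rows where uploaded_to_youtube is false, in the same way upload_media presumably selects files not yet on Commons.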


@Daniel-Mietchen
Member Author

Yes, this makes good sense for a start, but what I have in mind is more complicated.

  1. I see no easy way to tell precisely which articles on PMC have
    (1) already been checked by the two OAMI crawlers (cf. Cache incomplete? #85 and Naming of cached files #83 ), or
    (2) caused problems with conversion or upload (cf. File not uploaded #22 and ERROR: GStreamer encountered a general stream error. #21 and https://github.com/erlehmann/open-access-media-importer/issues?labels=GStreamer&page=1&state=open ).
    Perhaps it's worth checking and fixing the workflows here first before we expand to YouTube.
  2. YouTube does not accept sound-only files, nor videos below or above certain sizes.
  3. The option to upload under CC licenses is only available to trusted users, but we do not even have an account yet (I would prefer to have a separate one, perhaps named "WikiProject Open Access" or so).
  4. I would like the YouTube entry to link back to the file and the paper on
    (1) the journal's website,
    (2) Wikimedia Commons (if both conversion and upload worked), and
    (3) PMC,
    and to the original license, which is not always CC BY 3.0, nor even CC BY (cf. https://twitter.com/invisiblecomma/status/345101287580385280 ). A sketch of such a description is below.
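A sketch of assembling such a YouTube description (YouTube auto-links bare URLs, as noted above); the argument names are placeholders for whatever metadata the importer already tracks per supplementary file:

```python
# Sketch only: the arguments are placeholders for metadata the importer already
# tracks; Commons is included only when conversion and upload both succeeded.
def build_description(title, journal_url, pmc_url, doi, license_url,
                      commons_page=None):
    lines = [
        title,
        "",
        "Original article: %s" % journal_url,
        "PubMed Central: %s" % pmc_url,
        "DOI: http://dx.doi.org/%s" % doi,
        "License: %s" % license_url,
    ]
    if commons_page:
        lines.append("Wikimedia Commons: %s" % commons_page)
    return "\n".join(lines)
```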

@wrought
Member

wrought commented Jul 2, 2013

From my point of view, while there are some bugs and edge cases, there are no technical "blockers" for developing the feature to deposit these videos on YouTube as well as on a MediaWiki instance. While there are some issues with identifying materials and whether they have been uploaded, I think that is an existing bug that requires its own development. I would rather not go too far down that path now, but I would be interested in helping later on. For the time being, I think it is reasonable and technically within scope to extend to YouTube as I mentioned above.

As for audio, this should be caught by error handling of some fashion, so no worries. The same goes for video sizing: these cases should be logged as errors, and there should be a queue of backlogged uploads. A rough sketch of such a check follows.
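A sketch of that pre-upload check, under the assumption that anything that is not a video within plausible size bounds gets logged and queued for later; the mimetype test and size limit are placeholders, not YouTube's exact rules:

```python
# Sketch only: reject audio-only and out-of-bounds files before upload,
# log them as errors and append them to a backlog queue.
import logging
import mimetypes
import os

MAX_BYTES = 128 * 1024 ** 3  # placeholder upper bound

def check_uploadable(path, backlog):
    mimetype, _ = mimetypes.guess_type(path)
    if mimetype is None or not mimetype.startswith("video/"):
        logging.error("Skipping %s: not a video (%s).", path, mimetype)
        backlog.append(path)
        return False
    size = os.path.getsize(path)
    if size == 0 or size > MAX_BYTES:
        logging.error("Skipping %s: size %d bytes is outside the accepted range.", path, size)
        backlog.append(path)
        return False
    return True
```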

@Daniel-Mietchen
Member Author

Agreed.

There are a number of design decisions we made early on that make life difficult with the OAMI now that we are handling thousands of files from hundreds of journals. Some of these decisions would have been different if we had known about the problems in the XML (see also http://chrismaloney.org/notes/OAMI%20JatsCon%20Submission,%202013 - accepted by now).

So I think the point for this weekend is to get a demo of a PMC-to-YouTube workflow going (possibly via Commons) with a few files (say, on the order of 100), then fine-tune that workflow over the coming weeks, with the goal of having the channel in full operation by OA Week.

@wrought
Member

wrought commented Jul 5, 2013

Some work put in so far on this branch: https://github.com/wrought/open-access-media-importer/tree/youtube

Will see about throttling; this is currently being done with a single sleep() function, so we can do the same ;) A sketch of that approach is below.
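A minimal sketch of that single-sleep() throttle, spacing uploads out to stay under a daily cap; the 200-uploads-per-day figure is the one quoted in the spec discussion earlier in this thread and would need to be confirmed against the real quota:

```python
# Sketch only: space uploads evenly so no more than MAX_UPLOADS_PER_DAY
# are attempted per day; the cap is the figure quoted earlier in this thread.
import time

SECONDS_PER_DAY = 24 * 60 * 60
MAX_UPLOADS_PER_DAY = 200

def throttled(items):
    delay = SECONDS_PER_DAY / MAX_UPLOADS_PER_DAY  # roughly 432 seconds apart
    for item in items:
        yield item
        time.sleep(delay)
```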

Thought about metadata: all of it will be posted in the YouTube description, which automatically converts links. Are there any cases where supplementary files are given their own DOI? If so, we don't seem to be accessing that information currently. If it's a fringe case, it's not worth it anyhow.

@Daniel-Mietchen
Member Author

The PLOS ONE YouTube channel has 1.2M views from ca. 100 files
http://www.youtube.com/user/channelplosone/videos?view=0&sort=p&flow=grid .
1.1M of these are for a single video:
http://www.youtube.com/watch?v=g1y7ASI3ZkQ

The Pensoft YouTube channel has 500k views from ca. 20 videos:
http://www.youtube.com/user/PensoftPublishers/videos?flow=grid&view=0&sort=p

BMC: 100k views on ca. 200 videos, mostly about open access
http://www.youtube.com/user/BioMedCentral/videos?sort=p&view=0&flow=grid

@Daniel-Mietchen
Member Author

There is currently no tool to expose how often a video or audio file embedded in a Wikipedia article has actually been played.

@Daniel-Mietchen
Member Author

A snapshot of the files uploaded by the bot to Commons that get the most views via Wikipedia:
http://www.webcitation.org/6HvxubbRD ,
calculated via
http://tools.wmflabs.org/glamtools/glamorous.php?doit=1&category=Uploaded+with+Open+Access+Media+Importer&use_globalusage=1&ns0=1&show_details=1

@wrought
Member

wrought commented Jul 7, 2013

The tests should show up at http://youtube.com/wikiprojectoatest and the live channel will be at http://youtube.com/wikiprojectoa
