Upload-by-URL for test.wikipedia.org

Since we increased the upload filesize limit to 100MB on the main wikis a few months ago, it’s been easier to upload large images and medium-size video clips, but there’s always something that’s just a leeeeetle over the limit… MediaWiki’s upload form does have an option for pulling a file from an external web site, which wouldn’t be restricted by the HTTP POST size limits in the Squid->Apache->PHP chain.

We hadn’t been able to deploy it initially on Wikimedia sites because the web servers are walled off and don’t have direct access to the internet; further, we were worried about safety, given security reports that the CURL library can follow malicious redirects to local filesystem resources.

On investigation, Tim found that CURL is safe in the default case — you need to explicitly enable redirect following to be exploited, which we don’t. We also have an HTTP proxy which our internal servers can use to reach outside files… I’ve made some tweaks to Special:Upload to support the proxy setting, and it’s now enabled on test.wikipedia.org:
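The two safety properties described above (redirects are never followed, and outbound requests go only through the HTTP proxy) can be sketched in a few lines. MediaWiki’s real fetch code is PHP built on CURL, so this is not its implementation: it is a minimal Python illustration using the standard library’s `urllib`, and the proxy address, the 100MB cap, and the timeout are assumptions chosen for the example. It also rejects non-HTTP schemes up front, since the original worry was about reaching local filesystem resources.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import BinaryIO, Optional

# Assumed limits for this sketch only (decimal megabytes).
MAX_BYTES = 100 * 1000 * 1000   # 100MB, matching the form-upload limit
TIMEOUT_S = 30.0                # seconds per blocking socket operation
CHUNK = 64 * 1024


class RefuseRedirects(urllib.request.HTTPRedirectHandler):
    """Never follow 3xx redirects, like CURL with redirect-following left off."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib fall through to its default error
        # handler, which raises HTTPError for the 3xx response.
        return None


def check_scheme(url: str) -> None:
    """Accept only http(s) URLs, so file:// and friends never reach the fetcher."""
    scheme = urllib.parse.urlsplit(url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")


def read_capped(stream: BinaryIO, max_bytes: int) -> bytes:
    """Read a body in chunks, aborting as soon as it grows past max_bytes."""
    chunks, total = [], 0
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if total > max_bytes:
            raise ValueError(f"file is larger than {max_bytes} bytes")
        chunks.append(chunk)


def make_opener(proxy: Optional[str]) -> urllib.request.OpenerDirector:
    """Send all traffic via `proxy`; with None, connect directly and ignore env proxies."""
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    return urllib.request.build_opener(
        urllib.request.ProxyHandler(proxies), RefuseRedirects()
    )


def fetch_for_upload(url: str, proxy: Optional[str],
                     max_bytes: int = MAX_BYTES) -> bytes:
    """Fetch a remote file for upload: http(s) only, via proxy, no redirects, size-capped."""
    check_scheme(url)
    with make_opener(proxy).open(url, timeout=TIMEOUT_S) as resp:
        return read_capped(resp, max_bytes)
```

For example, `fetch_for_upload("https://example.org/photo.jpg", proxy="http://proxy.example:8080")` would route the request through that (hypothetical) proxy and raise `urllib.error.HTTPError` if the server answered with a redirect instead of the file.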

[Image: upload-url-form]

My very first URL-uploaded file was a screenshot from one of my blog posts. Spiffy!

The default configuration limits URL uploads to sysops, so for now you’ll need to be a sysop on Test Wikipedia to try it out. If everything seems fairly problem-free, we’ll start rolling this out a bit more widely for Commons and other sites.
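The sysop-only default amounts to a per-group permission check. In MediaWiki itself this is PHP configuration (the `$wgAllowCopyUploads` switch plus the `upload_by_url` user right); the Python below is only a hypothetical model of that lookup, with a made-up group table, to show why widening the rollout is a configuration change rather than a code change.

```python
from typing import Dict, Iterable, Set

# Hypothetical group -> rights table modelled on the default described in
# the post: regular users may upload files, only sysops may upload by URL.
GROUP_RIGHTS: Dict[str, Set[str]] = {
    "user": {"upload"},
    "sysop": {"upload", "upload_by_url"},
}


def can_upload_by_url(groups: Iterable[str], url_uploads_enabled: bool = True) -> bool:
    """Allowed only if the site switch is on and some group grants the right."""
    if not url_uploads_enabled:
        return False
    return any("upload_by_url" in GROUP_RIGHTS.get(g, set()) for g in groups)
```

Opening the feature to another group, say autoconfirmed users, would then just be `GROUP_RIGHTS.setdefault("autoconfirmed", set()).add("upload_by_url")`.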

The upload-by-URL functionality is also needed for upcoming work by Michael Dale on an on-wiki media picker that can fetch freely licensed files from Flickr, Archive.org, and other places.


12 Comments on Upload-by-URL for test.wikipedia.org

purodha 6 years

Having this feature available to the ordinary user on commons will of course make it much more likely that people move files from their respective wikis to commons, which is imho good & desirable.

Jake Wartenberg 6 years

So, will the limit still be 100MB? Or is it raised if this method is used?

brion 6 years

Currently the experimental limit for upload-by-URL is 500MB … but I think it has to download in 30 seconds. :) Needs more work.
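The 500MB / 30-second combination in the reply above is demanding. A quick back-of-the-envelope check, assuming decimal megabytes (1 MB = 10^6 bytes) and that the whole transfer must fit in the 30-second window; the 10 Mbit/s source speed is a hypothetical figure for comparison:

```python
MB = 1_000_000  # assumption: decimal megabytes


def required_rate(size_bytes: float, seconds: float) -> float:
    """Average throughput in bytes/second needed to finish within `seconds`."""
    return size_bytes / seconds


def max_size(rate_bits_per_s: float, seconds: float) -> float:
    """Largest file (in bytes) that fits in `seconds` at a given line rate."""
    return rate_bits_per_s / 8 * seconds


rate = required_rate(500 * MB, 30)
print(f"500MB in 30s needs {rate / MB:.1f} MB/s ({rate * 8 / 1e6:.0f} Mbit/s)")
# -> 500MB in 30s needs 16.7 MB/s (133 Mbit/s)

# A source host serving at a hypothetical 10 Mbit/s:
print(f"10 Mbit/s for 30s moves only {max_size(10e6, 30) / MB:.1f} MB")
# -> 10 Mbit/s for 30s moves only 37.5 MB
```

Under these assumptions, unless the remote server is quite fast, the time limit rather than the 500MB cap is what actually constrains the upload.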

lucasbfr 6 years

I’m afraid it would make the problem of uploading copyrighted pictures from news sources and blogs worse, though…

brion 6 years

With the source URL recorded as metadata, it would probably actually help in identifying those: if people are more likely to plop in a URL instead of saving and uploading, we know the source right off.

Diaa 6 years

Would this allow multiple uploads? Instead of clicking Upload many times, it would be nice if this URL image fetcher had an upload queue that uploaded the images one after another and reported back to the user once they were all done.

kduke 6 years

Instead of wasting server space with duplicate data, why not just transclude the image from the URL onto Wikipedia?

brion 6 years

Simply transcluding the remote file might sound nice in a web-centric way, but there are a few big problems:

* We want to be able to redistribute files used in our articles, including for offline use; if we don’t have them, that’s a lot harder.
* The source site might not be able to handle the level of hotlinking traffic, causing performance problems or cost to the site owner. (Some sites disallow hotlinking outright.)
* The file might be changed without warning by the owner of the remote site; we need to retain a change history.
* The file might get deleted later, or the site might be rearranged or cease to exist, leaving no compatible URL.
* The URL may be explicitly transitory: a temporary staging location for the file which will cease to exist after the upload is complete.

And of course, to get file metadata, check for duplicates, and create thumbnails, we need to fetch the file; it makes no sense to throw it away at that point.
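Brion’s last point, that duplicate checks need the actual bytes, can be illustrated with a content-hash index. MediaWiki does record a SHA-1 hash for each uploaded file, but the class below is a hypothetical in-memory stand-in, not its database schema or API.

```python
import hashlib
from typing import Dict, Optional


class DuplicateIndex:
    """Hypothetical in-memory map from content hash to an existing file name."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}

    @staticmethod
    def digest(data: bytes) -> str:
        # Identical bytes always give the same SHA-1, whatever URL they came from.
        return hashlib.sha1(data).hexdigest()

    def find_duplicate(self, data: bytes) -> Optional[str]:
        """Return the name of an already-stored file with the same content, if any."""
        return self._by_hash.get(self.digest(data))

    def add(self, name: str, data: bytes) -> None:
        # Keep the first file registered for a given hash.
        self._by_hash.setdefault(self.digest(data), name)


index = DuplicateIndex()
index.add("Example.jpg", b"fake image bytes")
print(index.find_duplicate(b"fake image bytes"))  # Example.jpg
print(index.find_duplicate(b"other bytes"))       # None
```

The hash can only be computed from the file’s contents, so a file that was merely hotlinked could not be checked this way.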

pfctdayelise 6 years

Yay! That’s exciting news…

Church of emacs 6 years

Is the URL saved and displayed after the upload? That would be a nice feature, as it simplifies the search for the source if no source is explicitly specified. It may even be possible to automatically fill out parts of the Information template…

brion 6 years

@Church of emacs:

No, currently the source URL isn’t retained as metadata. Would be a good thing to add, especially if we open it to general usage!
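Pre-filling the Information template from a recorded source URL, as suggested in this exchange, could look roughly like the sketch below. The function is hypothetical and fills only a few common fields (`description`, `date`, `source`, `author`, `permission`); everything except the source is left for the uploader to complete.

```python
from datetime import date
from typing import Optional


def information_template(source_url: str, description: str = "",
                         author: str = "", when: Optional[date] = None) -> str:
    """Hypothetical helper: pre-fill an {{Information}} block from the fetch URL."""
    fields = [
        ("description", description),
        ("date", when.isoformat() if when else ""),
        ("source", source_url),   # the URL the file was actually fetched from
        ("author", author),
        ("permission", ""),       # licensing still has to be supplied by a human
    ]
    lines = ["{{Information"]
    lines += [f"|{name}={value}" for name, value in fields]
    lines.append("}}")
    return "\n".join(lines)


print(information_template("https://example.org/photo.jpg"))
```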
