Photo by Stefan Krause, Free Art License.

More than a year after we first raised the issue, a problematic proposal for a new copyright directive is coming to a vote in the European Parliament next week.

Even in its amended version, this proposed directive would require websites that host large amounts of user-uploaded content to add mandatory upload filters. In practice, these websites would have to run algorithms that check every user upload against a database of content and block any upload detected as infringing copyright from appearing online.
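To make the mechanism concrete, here is a minimal sketch of the simplest kind of upload filter, one that matches exact file hashes. It is an illustration under stated assumptions, not a description of any real system: the `UploadFilter` class, the `fingerprint` helper, and the sample data are all hypothetical, and deployed filters typically use fuzzier matching than an exact SHA-256 comparison.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest used as the file's identifier (assumed scheme)."""
    return hashlib.sha256(data).hexdigest()


class UploadFilter:
    """Hypothetical filter: blocks any upload whose hash is in a reference database."""

    def __init__(self, claimed_works):
        # The "database of content": digests of works submitted by rightsholders.
        self.blocked = {fingerprint(work) for work in claimed_works}

    def allows(self, upload: bytes) -> bool:
        # The decision depends only on the bytes, never on who uploads or why.
        return fingerprint(upload) not in self.blocked


# Hypothetical sample data for illustration only.
claimed = b"bytes of a protected photograph"
upload_filter = UploadFilter([claimed])

print(upload_filter.allows(claimed))                   # identical bytes are blocked
print(upload_filter.allows(b"an unrelated new file"))  # unknown bytes pass
```

Note that the check never asks who is uploading or for what purpose. A licensed copy, or a use covered by a copyright exception, would be blocked in exactly the same way as an infringing one.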

We are very concerned about the potential impact of the proposal on Wikipedia, which largely addresses infringing content through community mechanisms. Such proposals place too much faith in technologies for automatic content detection, whether artificial intelligence (AI), machine learning, or hash-based file identification, without considering the impact on human-driven models of content moderation.

In addition to mandating content filtering in Article 13 of the proposed copyright directive, the European Commission is also proposing a new Initiative Against Illegal Content. In the corresponding announcement, “automatic detection and filtering technologies” are presented as an important factor in fighting extremist and other illegal content online. Both proposals demonstrate an increasing reliance on technology to make decisions about the legality of online content. As AI becomes more ubiquitous and machine learning improves, we expect these calls for automatic content detection will only continue to grow in volume.

This belief in AI and mandatory automatic content detection falls short in one important respect: it treats technology as the only solution to the issues facing platforms today. Automatic detection is neither the best nor the only way to deal with illegal content, let alone with content that is controversial or problematic in other ways.

The Wikimedia Foundation believes that technology, including AI, will be a useful tool in the future of content evaluation, but it should not be mistaken for a catch-all solution that fixes every problem. The volunteer editors on Wikipedia and its sister projects currently use a machine learning tool called the Objective Revision Evaluation Service (ORES) to flag vandalism on the projects and to predict article quality. Importantly, ORES itself does not make final decisions; instead, it provides a service that helps humans and bots improve Wikipedia and its sister projects. Once content is flagged by ORES, its review and removal are handled entirely through community processes. This relationship acknowledges the limitations of machine learning while harnessing its strengths.
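The division of labor described above, where a model flags and people decide, can be sketched in a few lines. This is an illustrative sketch only and does not reflect the actual ORES API or its models: `Edit`, `ReviewQueue`, `toy_score`, and the 0.5 threshold are hypothetical names and values chosen for the example, and the scoring function is a toy stand-in for a trained classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Edit:
    edit_id: int
    text: str


@dataclass
class ReviewQueue:
    """Hypothetical queue: a score only flags an edit; humans make the final call."""

    score: Callable[[Edit], float]  # stand-in for a machine learning model
    threshold: float = 0.5          # assumed cut-off, chosen for illustration
    pending: List[Edit] = field(default_factory=list)

    def submit(self, edit: Edit) -> bool:
        # Flagging adds the edit to a human review list. Nothing is removed here.
        flagged = self.score(edit) >= self.threshold
        if flagged:
            self.pending.append(edit)
        return flagged


def toy_score(edit: Edit) -> float:
    """Toy heuristic (not a real model): the share of letters that are upper case."""
    letters = [c for c in edit.text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


queue = ReviewQueue(score=toy_score)
queue.submit(Edit(1, "THIS PAGE IS DUMB"))   # flagged for human review
queue.submit(Edit(2, "Added a citation."))   # not flagged
print([e.edit_id for e in queue.pending])
```

The design choice that matters is that `submit` has no code path that deletes or blocks content. The model's output is only a signal for reviewers.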

In general, volunteer contributors monitor new contributions for compliance with the community’s rules as well as copyright and other laws, and collaboratively resolve disputes over content. This system is effective, and issues with content on the Wikimedia projects are rarely elevated to the Wikimedia Foundation. Of the small number of copyright complaints that are directed at the Foundation, an even smaller number are valid. The large, crowdsourced community processes allow the Wikimedia projects to be dynamic and flexible in handling a consistent flow of edits, about 10 per second. Importantly, the community governance system provides crucial safeguards for participation, freedom of expression, and collaboration. Not every platform that hosts content uploaded by its users is like Wikimedia, but human-centered moderation allows for less arbitrary decision-making.

In contrast, any law that mandates the deployment of automatic filters to screen all uploaded content using AI or related technologies leaves no room for the types of community processes that have been so effective on the Wikimedia projects. As previously mentioned, upload filters as they exist today view content through a broad lens that can miss many of the nuances that are crucial for reviewing content and assessing its legality or veracity. Even where improvements have been made, such as YouTube’s Content ID system for identifying copyrighted works and Google’s Cloud Vision API, which detects “inappropriate” content, they have cost a significant amount of money. These systems still often produce false positives because they fail to take into account context or nuances in the law such as fair use or copyright exceptions. While further improvements can be made, these problems highlight the need for caution when attempting to apply automatic detection technologies as a blanket solution for even more complicated types of content, such as terrorism or misinformation.

As we continue to explore new ways to use machine learning technology to improve Wikimedia projects, the Wikimedia Foundation recognizes that this growth must leave room for the participation of all internet users and respect human rights. We therefore signed on to the Toronto Declaration on Machine Learning, which uses the framework of international human rights law as a guide for the development of future machine learning technology. As the Wikimedia movement looks toward 2030, we do so with the knowledge that progress must be measured and inclusive, and must protect freedom of expression. We urge policymakers in the EU to uphold these values and human rights as they consider the proposal for a copyright directive for the digital single market.

Allison Davenport, Technology Law and Policy Fellow, Wikimedia Foundation
Anna Mazgal, EU Policy Adviser, Wikimedia Germany (Deutschland)