Using deep learning to predict article quality

Reviewed by Morten Warncke-Wang

A short paper presented at the Joint Conference on Digital Libraries titled “Quality Assessment of Wikipedia Articles Without Feature Engineering”[1] uses deep learning to predict the quality of articles in the English Wikipedia. As the paper’s title alludes, previous research on article quality has used a specific set of features to represent the articles, whereas the promise of deep learning is that the machine learner will determine the best representation on its own.

Some representation of the articles still needs to be chosen, and the paper uses “Doc2Vec”, an extension of Word2vec that uses unsupervised machine learning to learn vector representations of the articles. A benefit of this approach is that it is language-neutral, whereas other approaches might rely on language-specific features. The vectors are learned from a training set based on the Wikimedia Foundation’s dataset of 30,000 English articles. A deep neural network built with Google’s TensorFlow library is then trained on these vectors to predict which of the English Wikipedia’s assessment classes an article belongs to.

The performance of the classifier is compared to the current state of the art, which at the time of writing is the WMF’s own Objective Revision Evaluation Service (ORES) (disclaimer: the reviewer is the primary author of the research upon which ORES’ article quality classifier is built). Since the number of articles in each class is fairly balanced, the proportion of correctly classified instances (accuracy) is used as the performance measure. ORES is reported to be 60% accurate (it currently reports 61.9% accuracy), and the deep neural network was found to be 55% accurate. As pointed out in the paper, this work is a first step towards using deep learning for this task, meaning that slightly lower performance is acceptable. The authors describe a couple of changes that will most likely improve the classifier and aim to do so in future work. Deep learning is an area where interesting things are happening, and if it can be used to improve our ability to automatically assess Wikipedia articles, a service that is already useful to many Wikipedians through services like WikiProject X and SuggestBot, that is only for the better!
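To make the performance measure concrete, here is a minimal Python sketch of accuracy as described above (the proportion of correctly classified instances), alongside the accuracy a trivial "always guess the most common class" baseline would reach. The toy labels and the function names are hypothetical illustrations, not data or code from the paper or from ORES.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Proportion of instances whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty sample")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def majority_baseline(true_labels):
    """Accuracy obtained by always predicting the most frequent class."""
    if not true_labels:
        raise ValueError("cannot compute a baseline for an empty sample")
    most_common_count = Counter(true_labels).most_common(1)[0][1]
    return most_common_count / len(true_labels)


# Hypothetical toy sample: one article per assessment class (perfectly balanced).
true_labels = ["FA", "GA", "B", "C", "Start", "Stub"]
predicted = ["FA", "B", "B", "C", "Stub", "Stub"]

print("classifier accuracy:", accuracy(true_labels, predicted))
print("majority baseline:  ", majority_baseline(true_labels))
```

With balanced classes the majority baseline is low, so accuracy is an informative measure; on a heavily skewed sample the baseline alone could look deceptively good.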

Taiwanese researchers develop tool using Chinese Wikipedia to help primary and secondary schoolchildren

By Liang (WMTW) and Tsung-Ho Liang (Tainan, Taiwan)

Dr. Tsung-Ho Liang (梁宗賀)[supp 1] is a systems analyst in the information center at the Tainan City Government’s Bureau of Education. He currently studies big data in education, especially dealing with unstructured data and natural language processing techniques. In 2013, he started a project to integrate the contents of Chinese Wikipedia with the Chinese Knowledge and Information Processing (CKIP) technology and established a new search engine for Chinese Wikipedia,[supp 2] WikiSeeker (維基嬉客).

WikiSeeker is a tailor-made search system based on the Wikipedia corpus. It aims to improve search effectiveness by answering students’ queries in Chinese with structured association graphs of related Wikipedia articles. First, it produces a knowledge map with clear relationships among the fields of knowledge, so students can easily identify the most important keywords in the content. Second, the WikiSeeker search bar accepts natural-language queries, so students do not have to type keywords. You can see a tour of WikiSeeker on YouTube.

These two features make WikiSeeker intuitive and easy to use for K-12 students. The research essay “WikiSeeker─The Study of the Impact of a Search System with Structured Association Graphs on Learning Effectiveness”[2] by the researcher Sheng-Nan Cheng (鄭盛南) compared two experimental groups: one group of students was asked to use Chinese Wikipedia directly to answer questions, and the other was asked to use the WikiSeeker website to answer the same questions. The results showed that the students who used WikiSeeker answered 10.8% more of the questions correctly (on average, 15.8 out of 19 questions, compared to 13.73 out of 19). Moreover, girls and middle-achieving students showed the largest learning improvement when using WikiSeeker. The study concludes that WikiSeeker is a suitable way for students to acquire knowledge from Chinese Wikipedia.

Sentiment analysis applied to adminship votes and talk page comments

Reviewed by Tilman Bayer

Sentiment analysis – the automated extraction of subjective information expressed in text – has been applied to Wikipedia research in several recent papers.

Four researchers from Stanford University analyzed[3] all (non-neutral) votes in the English Wikipedia’s request for adminship process cast from its inception in 2003 until 2013. These form a directed, signed graph with around 11,000 nodes (users) and 160,000 edges (votes). They removed the actual vote text (“support” and “oppose”) and tried to reconstruct the vote by applying sentiment analysis to the remaining comment text (where e.g. “I’ve no concerns, will make an excellent addition to the admin corps” indicates a positive vote). The performance of the resulting prediction model is described as “remarkably high, […] as a consequence of the highly indicative, sometimes even formulaic, language used in the comments”. It performed much better than a model trying to predict votes based on network characteristics alone (patterns of other support/oppose votes, using e.g. ideas from balance theory like “an enemy of my enemy is my friend”).
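As a rough illustration of the network-only idea mentioned above, the following Python sketch guesses the sign of a missing vote from triads in a signed graph, in the spirit of balance theory ("an enemy of my enemy is my friend"). It is not the model from the paper: the function name, the toy votes, and the simplification of ignoring edge direction when forming triads are all assumptions made for this example.

```python
from collections import defaultdict


def predict_sign(votes, voter, candidate):
    """Guess the sign of a voter -> candidate vote from two-step paths.

    votes maps (source, target) -> +1 (support) or -1 (oppose).
    Assumption: edge direction is ignored when forming triads, and if both
    (a, b) and (b, a) are present, the one seen last wins.
    Returns +1, -1, or 0 when the triads give no evidence or cancel out.
    """
    sign = {}
    neighbours = defaultdict(set)
    for (src, dst), s in votes.items():
        sign[frozenset((src, dst))] = s
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    # Each common neighbour w contributes sign(voter, w) * sign(w, candidate):
    # two negative edges multiply to +1 ("enemy of my enemy is my friend").
    score = 0
    for w in neighbours[voter] & neighbours[candidate]:
        score += sign[frozenset((voter, w))] * sign[frozenset((w, candidate))]

    return (score > 0) - (score < 0)


# Hypothetical toy votes, not taken from the adminship dataset.
votes = {
    ("alice", "bob"): -1,   # alice opposed bob
    ("bob", "carol"): -1,   # bob opposed carol
    ("alice", "dave"): +1,  # alice supported dave
    ("dave", "erin"): -1,   # dave opposed erin
}

print(predict_sign(votes, "alice", "carol"))  # enemy of my enemy
print(predict_sign(votes, "alice", "erin"))   # enemy of my friend
```

A predictor of this kind only sees who voted how on whom, which is why a model that also reads the comment text has much more to work with.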

Is the editing frequency of Wikipedians influenced by negative or positive comments they receive on their user talk pages?

A student course project at the same university[4] tried to examine this question by analyzing the user talk pages of all users (around 620,000) who signed up in 2013 and made at least one article edit on the English Wikipedia, together with “thanks” messages received via the new software feature introduced during that year. They related this data to the number of article edits per week. The authors report that “while we found some predictive value for future behavior in the sentimental content of messages received by Wikipedia editors, we do not have evidence to establish a causal relationship between these variables… we were able to detect macro-level patterns of behavior that appear to discredit the hypothesis that the sentimental content of user talk pages is a main driver of user churn on Wikipedia”. As a limitation of their application of sentiment analysis in this situation, they note that “Most messages exchanged through user talk pages are not sentimentally-loaded, but rather talk about the Wikipedia guidelines and policies in a neutral manner”, calling for the use of more sophisticated natural language processing techniques.

These results are somewhat in contrast to those of a paper titled “The Impact of Sentiment-driven Feedback on Knowledge Reuse in Online Communities”,[5] which investigated “whether affective communication […] in form of sentiment-driven feedback in discussions between Wikipedia editors motivates collaborative work”, by analyzing a complete history dump of the Simple English Wikipedia (until 2011). The researchers focus on the “knowledge reuse” aspect of this collaborative work, quantified for “any two consecutive revisions of the same article page as the ratio of the number of words reused from the previous revision (e.g., copied, moved elsewhere, or restored) to the number of words newly created in the current revision.” By relating the positivity or negativity of article talk page comments to editing activity in the article itself, the authors found that:

“receiving (especially positive, rather than negative) feedback in form of sentiments that are expressed in inter-editor conversations is beneficial in terms of sustaining knowledge reuse in Wikipedia; moreover, giving either positive or negative feedback appears to be more effective than providing no feedback at all.”

Besides observing that public positive feedback may have a positive effect on editor motivation, they also note that “non-public negative peer feedback could increase one’s likelihood to engage in online social production by correcting inherent problems, behaviors, and attitudes in private peer conversations, which also strongly suggests that mechanisms for providing non-public negative feedback should be designed, incorporated, and tested in collaborative platforms such as wikis.”
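The “knowledge reuse” ratio quoted above (words reused from the previous revision divided by words newly created in the current revision) can be sketched in a few lines of Python. This is only an approximation under stated assumptions: words are split on whitespace and compared as a lowercase bag of words against the immediately preceding revision, whereas the paper's own tokenization and its handling of moved or restored text are not described here. The function name and sample sentences are hypothetical.

```python
from collections import Counter


def knowledge_reuse_ratio(previous_text, current_text):
    """Ratio of reused words to newly created words between two revisions.

    Assumption: a word counts as reused if it also occurs in the previous
    revision (multiset intersection), regardless of its position.
    """
    prev = Counter(previous_text.lower().split())
    curr = Counter(current_text.lower().split())

    reused = sum((prev & curr).values())   # words carried over
    created = sum(curr.values()) - reused  # words that are new in this revision

    if created == 0:
        # Nothing new was written: the ratio is unbounded if anything was kept.
        return float("inf") if reused else 0.0
    return reused / created


# Hypothetical pair of consecutive revisions.
previous = "the cat sat on the mat"
current = "the cat sat on the red mat today"

print(knowledge_reuse_ratio(previous, current))
```

Because the comparison ignores word order, text that was merely moved within the article still counts as reused, which matches the "moved elsewhere" case in the quoted definition.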

See also our earlier coverage of sentiment analysis research, and a current research collaboration of the Wikimedia Foundation and other researchers that aims “to use machine learning and statistics to understand how attacking or ‘toxic’ language affects the contributor community on Wikipedia. The focus of our analysis is initially on talk page comments that exhibit harassment, personal attacks and aggressive tone.”


Conferences and events

Wikimania 2016, the annual global Wikimedia conference, took place in June in Esino Lario, Italy. The programme contained various research-related sessions, including the annual “State of Wikimedia Research” presentation highlighting some of the most interesting scholarship from the past year (slides).

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

  • “50/50 Norm in Massive Online Public Good: The Case of Wikipedia”[6] From the abstract: “This paper shows the existence of a strong social norm in [the Wikipedia community], demonstrated by the choice of the equal split in the Dictator Game (DG). With the help of the French Wikimédia Foundation [sic], we questioned a large sample of Wikipedia users and contributors on their practices, and then asked them to play the DG. The results are statistically significant and show how people respect (or not) social norms. […] Regular, long-term users, who declare a strong attachment to the platform, are more likely to choose the 50/50 split in the DG.”
  • “Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in Online Collaboration” [7] From the abstract: “We analyze sequences of reverts of contributions to Wikipedia […]. We find evidence that individuals systematically attack the same person and attack back their attacker […]. We also establish that individuals come to defend an attack victim but we do not find evidence that attack victims ‘pay it forward’ or that attackers collude to attack the same individual. We further find that high-status contributors are more likely to attack many others serially, status equals are more likely to revenge attacks back, while attacks by lower-status contributors trigger attacks forward; yet, it is the lower-status contributors who also come forward to defend third parties.”
  • “Wikipedia: Access and participation in an open encyclopaedia”[8] From the abstract: “This thesis […] found participation is shaped by different understandings of openness, where it is constructed as either a libertarian ideal where ‘anyone’ is free to edit the encyclopaedia, or as an inclusive concept that enables ‘everyone’ to participate in the platform. The findings therefore problematise the idea of single user community, and serve to highlight the different and sometimes competing approaches actors employ to enable and constrain participation in Wikipedia.”
  • “Cultural Anthropology through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-based Sentiment”[9] From the abstract: “we study the differences in historical World View between Western and Eastern cultures, represented through the English, the Chinese, Japanese, and German Wikipedia. In particular, we analyze the historical networks of the World’s leaders since the beginning of written history […]. We also identify the most influential female leaders of all times in the English, German (both Elizabeth II), Spanish (Michelle Bachelet), and Portuguese (Maria II of Portugal) Wikipedia. As an additional lens into the soul of a culture we compare top terms, sentiment, emotionality, and complexity of the English, Portuguese, Spanish, and German Wikinews.” (cf. earlier coverage of a related paper coauthored by some of the same authors: “Most important people of all times, according to four Wikipedias”)
  • “Prediction of influenza outbreaks by integrating Wikipedia article access logs and Google flu trend data”[10]
  • “Is There a Weekly Pattern for Health Searches on Wikipedia and Is the Pattern Unique to Health Topics?”[11] (TL;DR: Yes/No)
  • “Automatic linking of wikipedia pages by their semantic similarity”[12] From the abstract: “In this study, by using the Natural Language Processing techniques, the linking system [between Wikipedia articles] has been tried to be automatized. Initially, the approach has been designed for Turkish Wikipedia, then in the second step, it has been tried for English Wikipedia and the results have been compared, evaluations are promising.” (cf. m:Research:Improving_link_coverage)
  • “Studying the Role of Diversity in Open Collaboration Network: Experiments on Wikipedia”[13] From the abstract: “We introduce a concept of diversity of interests or versatility of a Wikipedia editor and Wikipedia teams and examine how it is correlated with the quality of their production. Our experiments indicate that editor’s and team’s diversity seems to have bigger impact on quality of their work than other properties.”

Other student project writeups from the fall 2015 CS229 course at Stanford (see also above):

  • “Relevance Analyses and Automatic Categorization of Wikipedia Articles”[14] From the abstract: “Given a set of article pairs, we found a linear correlation between similarity index outputted from [the author’s machine learning] algorithm and human’s ratings on similarity. We then applied hierarchical clustering on a group of articles based on their similarity indices and construct a categorization binary tree. We evaluated the tree by asking humans to play odd-one-out games – given an instance of a triplet of articles, choosing one article that is the most different, and we found that the tree correctly classified approximately 80% of the odd-one-out instance compared to the data from humans.”
  • “An AI for the Wikipedia Game”[15]
  • “Predicting and Identifying Hypertext in Wikipedia Articles”[16] (about a machine learning algorithm to decide which words in an article should be “turned blue”)


  1. Dang, Quang Vinh; Ignat, Claudia-Lavinia (2016). “Quality Assessment of Wikipedia Articles Without Feature Engineering”. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries. JCDL ’16. New York, NY, USA: ACM. pp. 27–30. doi:10.1145/2910896.2910917. ISBN 9781450342292. 
  2. 鄭盛南 (Sheng-Nan Cheng): 維基嬉客(WikiSeeker) 一個結構化關聯圖之搜尋系統對於學生學習成效之研究 (WikiSeeker─The Study of the Impact of a Search System with Structured Association Graphs on Learning Effectiveness). Thesis, National University of Tainan 2015 (in Chinese)
  3. West, Robert; Paskov, Hristo S.; Leskovec, Jure; Potts, Christopher (2014). “Exploiting Social Network Structure for Person-to-Person Sentiment Analysis” (PDF). Transactions of the Association for Computational Linguistics 2: 297–310. Supplementary proofs and dataset
  4. Martinez-Ortuno, Sergio; Menghani, Deepak; Roemheld, Lars. Sentiment as a Predictor of Wikipedia Editor Activity (PDF). Stanford University. p. 4. 
  5. Grigore, Mihai; Rosenkranz, Christoph; Sutanto, Juliana (2015-12-01). “The Impact of Sentiment-driven Feedback on Knowledge Reuse in Online Communities”. AIS Transactions on Human-Computer Interaction 7 (4): 212–232. ISSN 1944-3900. 
  6. Dang Nguyen, Godefroy; Dejean, Sylvain; Jullien, Nicolas (2016-01-19). 50/50 Norm in Massive Online Public Good: The Case of Wikipedia. Rochester, NY: Social Science Research Network. 
  7. Tsvetkova, Milena; García-Gavilanes, Ruth; Yasseri, Taha (2016-02-04). “Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in Online Collaboration”. arXiv:1602.01652 [physics]. 
  8. Osman, Kim Y. (2015). “Wikipedia: Access and participation in an open encyclopaedia”. Queensland University of Technology.  PhD thesis
  9. Gloor, Peter A.; Marcos, Joao; de Boer, Patrick M.; Fuehres, Hauke; Lo, Wei; Nemoto, Keiichi. “Cultural Anthropology through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-based Sentiment”. 
  10. Bardak, Batuhan; Tan, Mehmet (November 2015). “Prediction of influenza outbreaks by integrating Wikipedia article access logs and Google flu trend data”. 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE). pp. 1–6. doi:10.1109/BIBE.2015.7367640. 
  11. Gabarron, Elia; Lau, Annie YS; Wynn, Rolf (2015-12-22). “Is There a Weekly Pattern for Health Searches on Wikipedia and Is the Pattern Unique to Health Topics?”. Journal of Medical Internet Research 17 (12): e286. doi:10.2196/jmir.5038. ISSN 1438-8871. 
  12. Ikikat, F.Y.; Gurhan, B.; Diri, B. (September 2015). “Automatic linking of wikipedia pages by their semantic similarity”. 2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA). pp. 1–5. doi:10.1109/INISTA.2015.7276789.  Closed access
  13. Baraniak, Katarzyna; Sydow, Marcin; Szejda, Jacek; Czerniawska, Dominika (2016-01-11). “Studying the Role of Diversity in Open Collaboration Network: Experiments on Wikipedia”. In Adam Wierzbicki, Ulrik Brandes, Frank Schweitzer, Dino Pedreschi (eds.). Advances in Network Science. Lecture Notes in Computer Science. Springer International Publishing. pp. 97–110. ISBN 978-3-319-28360-9.  Closed access
  14. Supaniratisai, George Pakapol; Bhumiwat, Pakapark; Pongsiri, Chayakorn (2015-12-11). Relevance Analyses and Automatic Categorization of Wikipedia Articles (PDF). Stanford University. p. 5. 
  15. Barron, Alex; Swafford, Zack. An AI for the Wikipedia Game (PDF). Stanford University. p. 5. 
  16. Guha, Neel; Hu, Annie; Wang, Cindy. Predicting and Identifying Hypertext in Wikipedia Articles (PDF). Stanford University. p. 6. 
Supplementary references:

Wikimedia Research Newsletter
Vol: 6 • Issue: 6 • June 2016
This newsletter is brought to you by the Wikimedia Research Committee and The Signpost