<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Announcing the WikiChallenge Winners</title>
	<atom:link href="http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/</link>
	<description>News from the Wikimedia Foundation and about the Wikimedia movement</description>
	<lastBuildDate>Fri, 24 May 2013 22:56:42 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.6-beta2-24176</generator>
	<item>
		<title>By: Teh IP</title>
		<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/comment-page-1/#comment-35280</link>
		<dc:creator>Teh IP</dc:creator>
		<pubDate>Fri, 18 Nov 2011 15:53:10 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wikimedia.org/?p=7083#comment-35280</guid>
		<description><![CDATA[thanks.  That was at least an insight.  ;-)

I wonder what input factors you had to consider?  If it is just a giant number-crunching exercise based on the &quot;easy to get out of the database&quot; aspects, then you may not be considering factors that would give you the strongest insights.  Demographics, for instance.  Also perhaps interaction (are there certain people who are &quot;attractors/motivators&quot; and others who are &quot;driving people away&quot;)?

I would not give up on finding this out either.  Perhaps fancy shmancy neural nets are not the way to get key insights; doing some semi-quantitative work based on cases, interviews, etc. might lead to better ones.  I also find that doing this sort of thing can often generate new hypotheses and ideas, because you just learn things and generate new &quot;factors&quot; from getting a little dirty looking up close and personal at some cases.

It&#039;s comparable to doing a manufacturing optimization problem.  We could just use what data we have (now, easy, and in big numbers) from the shop floor computer system.  That will give us some things...it will probably be robust (if you do out-of-sample confirmations).  And you MIGHT learn something counter-intuitive, just because tribal wisdom doesn&#039;t match the numbers.  But there is a pretty decent chance you are not tracking a key feature.  There is also a good chance that some problems in data input/tracking etc. get missed...if it is all an arm&#039;s-length math game.

The next step up from that might be to get on the whiteboard and brainstorm, list ideas, diagram the process, make those fishbone causation charts, etc. and just list some possible factors.  Then, after doing that, consider which ones have easy data to check on and which don&#039;t.  If there are some that you think may be very critical...but lack data on...then you figure out some workarounds (e.g. cases) to at least improve your Bayesian hunches and try to validate/invalidate the hypotheses.

Perhaps even more of a step up is to do a walkthrough of the assembly line (or even take a shift or two and build the product personally!)  This will probably give a few new ideas...and it may give insights on how to prioritize which factors to investigate, spark ideas for how to do workarounds where the factors are not tracked in the database, etc.

Of course, all the processes are important.  And really it is a feedback loop as well.

That said...I&#039;m still blown away by how remote from actionable insights the work here was.  I went to the winner page and the discussion was all matrices and rows and set theory and such...nothing like...F=MA or V=IR or the like.  ;-)

Sorry...if this is all motherhood and apple pie, but I felt it needed to be said.  And I DO appreciate one person (you) boiling down some of the learning.]]></description>
		<content:encoded><![CDATA[<p>thanks.  That was at least an insight.  ;-)</p>
<p>I wonder what input factors you had to consider?  If it is just a giant number-crunching exercise based on the &#8220;easy to get out of the database&#8221; aspects, then you may not be considering factors that would give you the strongest insights.  Demographics, for instance.  Also perhaps interaction (are there certain people who are &#8220;attractors/motivators&#8221; and others who are &#8220;driving people away&#8221;)?</p>
<p>I would not give up on finding this out either.  Perhaps fancy shmancy neural nets are not the way to get key insights; doing some semi-quantitative work based on cases, interviews, etc. might lead to better ones.  I also find that doing this sort of thing can often generate new hypotheses and ideas, because you just learn things and generate new &#8220;factors&#8221; from getting a little dirty looking up close and personal at some cases.</p>
<p>It&#8217;s comparable to doing a manufacturing optimization problem.  We could just use what data we have (now, easy, and in big numbers) from the shop floor computer system.  That will give us some things&#8230;it will probably be robust (if you do out-of-sample confirmations).  And you MIGHT learn something counter-intuitive, just because tribal wisdom doesn&#8217;t match the numbers.  But there is a pretty decent chance you are not tracking a key feature.  There is also a good chance that some problems in data input/tracking etc. get missed&#8230;if it is all an arm&#8217;s-length math game.</p>
<p>The next step up from that might be to get on the whiteboard and brainstorm, list ideas, diagram the process, make those fishbone causation charts, etc. and just list some possible factors.  Then, after doing that, consider which ones have easy data to check on and which don&#8217;t.  If there are some that you think may be very critical&#8230;but lack data on&#8230;then you figure out some workarounds (e.g. cases) to at least improve your Bayesian hunches and try to validate/invalidate the hypotheses.</p>
<p>Perhaps even more of a step up is to do a walkthrough of the assembly line (or even take a shift or two and build the product personally!)  This will probably give a few new ideas&#8230;and it may give insights on how to prioritize which factors to investigate, spark ideas for how to do workarounds where the factors are not tracked in the database, etc.</p>
<p>Of course, all the processes are important.  And really it is a feedback loop as well.</p>
<p>That said&#8230;I&#8217;m still blown away by how remote from actionable insights the work here was.  I went to the winner page and the discussion was all matrices and rows and set theory and such&#8230;nothing like&#8230;F=MA or V=IR or the like.  ;-)</p>
<p>Sorry&#8230;if this is all motherhood and apple pie, but I felt it needed to be said.  And I DO appreciate one person (you) boiling down some of the learning.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Yang</title>
		<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/comment-page-1/#comment-35157</link>
		<dc:creator>Yang</dc:creator>
		<pubDate>Fri, 18 Nov 2011 10:53:27 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wikimedia.org/?p=7083#comment-35157</guid>
		<description><![CDATA[Teh IP: That&#039;s a thoughtful question. It&#039;s generally the case that more complex models (which many of these winning entries use) tend to be less explainable/understandable. More important for the WikiChallenge, though, is that prediction is almost entirely dominated by features based on the times/rates/etc. at which these editors have historically made edits, essentially capturing (obvious) intuitions like, &quot;editors who have been making edits at a steady rate in recent months are more likely to stay.&quot; (I&#039;m a competitor who finished in 8th place and read the winners&#039; writeups.)]]></description>
		<content:encoded><![CDATA[<p>Teh IP: That&#8217;s a thoughtful question. It&#8217;s generally the case that more complex models (which many of these winning entries use) tend to be less explainable/understandable. More important for the WikiChallenge, though, is that prediction is almost entirely dominated by features based on the times/rates/etc. at which these editors have historically made edits, essentially capturing (obvious) intuitions like, &#8220;editors who have been making edits at a steady rate in recent months are more likely to stay.&#8221; (I&#8217;m a competitor who finished in 8th place and read the winners&#8217; writeups.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Teh IP</title>
		<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/comment-page-1/#comment-33748</link>
		<dc:creator>Teh IP</dc:creator>
		<pubDate>Wed, 16 Nov 2011 04:19:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wikimedia.org/?p=7083#comment-33748</guid>
		<description><![CDATA[So wait.  What did we learn about the editors?  This post really just details all the fancy shmancy models and such.  But what were the key insights about &quot;types of editors likely to leave/stay&quot;?  Imagine I am a CEO and want the &quot;takeaway&quot;.  Is there something that you can express in terms of easy-to-understand descriptions of behavior?  And perhaps it gives us even deeper insights about why X leads to Y?]]></description>
		<content:encoded><![CDATA[<p>So wait.  What did we learn about the editors?  This post really just details all the fancy shmancy models and such.  But what were the key insights about &#8220;types of editors likely to leave/stay&#8221;?  Imagine I am a CEO and want the &#8220;takeaway&#8221;.  Is there something that you can express in terms of easy-to-understand descriptions of behavior?  And perhaps it gives us even deeper insights about why X leads to Y?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wikimedia blog &#187; Blog Archive &#187; Wikimedia Foundation Report, October 2011</title>
		<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/comment-page-1/#comment-31415</link>
		<dc:creator>Wikimedia blog &#187; Blog Archive &#187; Wikimedia Foundation Report, October 2011</dc:creator>
		<pubDate>Thu, 10 Nov 2011 15:41:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wikimedia.org/?p=7083#comment-31415</guid>
		<description><![CDATA[[...] [1] http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/ [...]]]></description>
		<content:encoded><![CDATA[<p>[...] [1] <a href="http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/" rel="nofollow">http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: neitway</title>
		<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/comment-page-1/#comment-28011</link>
		<dc:creator>neitway</dc:creator>
		<pubDate>Sat, 29 Oct 2011 08:08:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wikimedia.org/?p=7083#comment-28011</guid>
		<description><![CDATA[“I have a lot of respect for what Wikipedia has done for the accessibility of information. Any small contribution I can make to that cause is in my opinion time well spent.” Good!]]></description>
		<content:encoded><![CDATA[<p>“I have a lot of respect for what Wikipedia has done for the accessibility of information. Any small contribution I can make to that cause is in my opinion time well spent.” Good!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Announcing the WikiChallenge Winners &#124; My Blog</title>
		<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/comment-page-1/#comment-28002</link>
		<dc:creator>Announcing the WikiChallenge Winners &#124; My Blog</dc:creator>
		<pubDate>Sat, 29 Oct 2011 06:21:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wikimedia.org/?p=7083#comment-28002</guid>
		<description><![CDATA[[...] and is filed under Data analytics, Research. You can follow any responses to this entry through the RSS 2.0 [...]]]></description>
		<content:encoded><![CDATA[<p>[...] and is filed under Data analytics, Research. You can follow any responses to this entry through the RSS 2.0 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Asaf</title>
		<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/comment-page-1/#comment-27705</link>
		<dc:creator>Asaf</dc:creator>
		<pubDate>Thu, 27 Oct 2011 23:28:29 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wikimedia.org/?p=7083#comment-27705</guid>
		<description><![CDATA[The winning algorithm&#039;s page directs to the source code on dumps.wikimedia.org, but that URL yields a 404. :(]]></description>
		<content:encoded><![CDATA[<p>The winning algorithm&#8217;s page directs to the source code on dumps.wikimedia.org, but that URL yields a 404. :(</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: KTH</title>
		<link>http://blog.wikimedia.org/2011/10/26/announcing-the-wikichallenge-winners/comment-page-1/#comment-27381</link>
		<dc:creator>KTH</dc:creator>
		<pubDate>Wed, 26 Oct 2011 17:53:57 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wikimedia.org/?p=7083#comment-27381</guid>
		<description><![CDATA[Congrats to everyone involved, a fun competition, and I hope of great benefit to Wikipedia.

I want to point out a typo, though: Benjamin and Fridolin Roth did not use random forests, but rather vanilla/standard linear regression.  Unfortunately, there turned out to be a mistake in the data set construction: it was not properly randomized.

Benjamin and Fridolin, through no fault of their own, stumbled upon this error and used it as the primary input to their model.  This error happens to be equivalent to knowing a large fraction of the answer (future edits) before predicting it.  Unfortunately, this means that their model is invalid and can&#039;t be used by Wikipedia to understand participation.  See the following link for more detail:

http://www.kaggle.com/forums/t/980/wikipedia-participation-challenge-an-unfortunate-ending]]></description>
		<content:encoded><![CDATA[<p>Congrats to everyone involved, a fun competition, and I hope of great benefit to Wikipedia.</p>
<p>I want to point out a typo, though: Benjamin and Fridolin Roth did not use random forests, but rather vanilla/standard linear regression.  Unfortunately, there turned out to be a mistake in the data set construction: it was not properly randomized.</p>
<p>Benjamin and Fridolin, through no fault of their own, stumbled upon this error and used it as the primary input to their model.  This error happens to be equivalent to knowing a large fraction of the answer (future edits) before predicting it.  Unfortunately, this means that their model is invalid and can&#8217;t be used by Wikipedia to understand participation.  See the following link for more detail:</p>
<p><a href="http://www.kaggle.com/forums/t/980/wikipedia-participation-challenge-an-unfortunate-ending" rel="nofollow">http://www.kaggle.com/forums/t/980/wikipedia-participation-challenge-an-unfortunate-ending</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
