Andrew Otto

  1. Get live updates to Wikimedia projects with EventStreams

    Our new public service that exposes live streams of Wikimedia projects is already powering several visualizations, like DataWaltz...

  2. Importing JSON into Hadoop via Kafka

    Our three key players are Hadoop, the de facto distributed batch data processing platform; JSON, a ubiquitous data format; and Kafka, which is becoming the system of choice for transporting streams of data. However, much of the data that flows into Kafka is in JSON format, and there isn't good community support for importing JSON data from Kafka into Hadoop. This article summarizes some comm...