Category Archives: Graphing

Plotting Xively timeseries data using Rickshaw

Xively (formerly Cosm, formerly Pachube) now has a JavaScript library for their API. I have been meaning to learn the Rickshaw library for a while, as it comes highly recommended by several people. It turned out to be a doddle to use. I just mixed in the moment.js library to do some date munging and was able to knock up some graphs in an hour or so.
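The date munging mostly comes down to turning each datapoint's timestamp into the epoch seconds that Rickshaw plots along its x axis. Here is a minimal sketch of that conversion, written in Ruby rather than JavaScript. The `at` and `value` field names and the sample readings are assumptions for illustration, not values taken from the feed.

```ruby
require 'time'

# Hypothetical datapoints: an ISO 8601 timestamp plus a string value.
datapoints = [
  { 'at' => '2013-05-01T10:05:00Z', 'value' => '301' },
  { 'at' => '2013-05-01T10:00:00Z', 'value' => '294' }
]

# Rickshaw series data is a list of {x, y} points with x in epoch seconds,
# ordered by time.
def to_series(datapoints)
  datapoints
    .map { |dp| { x: Time.iso8601(dp['at']).to_i, y: dp['value'].to_f } }
    .sort_by { |point| point[:x] }
end

series = to_series(datapoints)
```

Serialised as JSON, `series` has the shape that a Rickshaw series expects for its `data` option.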

Air Quality Egg

AirQualityEgg NO2 data

Current cost electricity usage

Current cost electricity data

These two data sources aren’t pumping out data anymore, but you can see an example here.

The code

The code is also on github.

Plotting Air Quality Egg data using R

I recently got an Air Quality Egg. The Air Quality Egg is an open source hardware project to measure air quality hyper-locally. My egg is located on the balcony of my flat in Brixton, London. The Air Quality Egg device has sensors which read NO2, CO, humidity and temperature, and it posts the data to Cosm. You can view readings and simple graphs on the Cosm feed page.

Air Quality Egg on the balcony of my flat

Plotting sensor data using R

I wanted to see the trends in the data, so I wrote a script in R to curl the data from the Cosm API and plot it using ggplot2.

Here is the script:

Here are some example graphs of NO2 over a single day and over several days. (Note: I don’t have the latest sensor updates, so the readings may be a bit off.)

Plot of NO2 for a single day
Plot of NO2 for a 4 day period

Graphing YouTube Viewing Figures for the SICP Lecture Series with R

Yesterday at Forward, our resident data analysis and stats guru Alex Farquhar gave an introductory lunchtime session on the R programming language. R is an interesting data analysis and statistical language and seems like a programmer-friendly alternative to Excel for that type of work. In this post I describe how I used a combination of Ruby and R to graph the viewing figures for the Structure and Interpretation of Computer Programs Lecture Series on YouTube.

Extracting View Statistics from YouTube

I was quite keen to try out R on a problem before I forgot everything I had learned! Luckily I had this Ruby script that I knocked up recently lying around:

require 'rubygems'
require 'net/http'
require 'rexml/document'
require 'active_support'

url = 'http://gdata.youtube.com/feeds/api/videos?q=MIT+6.001+Structure+and+Interpretation+Lecture&max-results=50'

xml_data = Net::HTTP.get_response(URI.parse(url)).body

doc = REXML::Document.new(xml_data)
titles = ActiveSupport::OrderedHash.new

doc.elements.each("//entry") do |ele|
  title = ele.elements["media:group"].elements["media:title"].text
  view_count = ele.elements["yt:statistics"].attributes["viewCount"]
  titles[title] = view_count
end

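# String.natcmp is not core Ruby; it has to come from a natural-sort extension loaded separately.
keys = titles.keys.sort { |a, b| String.natcmp(a, b) }.select { |title| title =~ /Lecture/ }

# The titles presumably already end in ", 1986", which is what fills the Year column.
# The block form of File.open closes the file once the rows are written.
File.open("sicp.csv", "w") do |csv_file|
  csv_file.puts("Title, Year, Views")

  keys.each do |title|
    csv_file.puts("#{title}, #{titles[title]}")
  end
end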

This script queries the YouTube API and fetches the viewing statistics for the Structure and Interpretation of Computer Programs Lecture Series. The script is pretty simple and consists of the following steps:

  1. Perform an HTTP request against the YouTube API
  2. Extract the viewing statistics from the returned XML
  3. Sort the result (using natural sort) and filter out non-lecture videos
  4. Write the results out to a CSV file
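
Step 3 is the only fiddly one: a plain string sort puts "Lecture 10A" before "Lecture 1A". Here is a minimal, self-contained sketch of a natural sort that does not rely on String.natcmp. The `natural_key` helper and the sample titles are made up for illustration and are not what the script above uses.

```ruby
# Split a string into runs of digits and non-digits so that numeric runs
# compare as integers. The leading 0/1 tag keeps integers and strings from
# ever being compared with each other.
def natural_key(str)
  str.scan(/\d+|\D+/).map do |chunk|
    chunk =~ /\A\d+\z/ ? [0, chunk.to_i] : [1, chunk]
  end
end

titles = ['Lecture 10A', 'Lecture 2B', 'Intro', 'Lecture 1A', 'Lecture 9B']

# Filter out non-lecture videos, then sort naturally.
lectures = titles.select { |t| t =~ /Lecture/ }.sort_by { |t| natural_key(t) }
# => ["Lecture 1A", "Lecture 2B", "Lecture 9B", "Lecture 10A"]
```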

This returns the following data:

Title, Year, Views
Lecture 1A | MIT 6.001 Structure and Interpretation, 1986, 51327
Lecture 1B | MIT 6.001 Structure and Interpretation, 1986, 12240
Lecture 2A | MIT 6.001 Structure and Interpretation, 1986, 8858
Lecture 2B | MIT 6.001 Structure and Interpretation, 1986, 5561
Lecture 3A | MIT 6.001 Structure and Interpretation, 1986, 4393
Lecture 3B | MIT 6.001 Structure and Interpretation, 1986, 2935
Lecture 4A | MIT 6.001 Structure and Interpretation, 1986, 3115
Lecture 4B | MIT 6.001 Structure and Interpretation, 1986, 2558
Lecture 5A | MIT 6.001 Structure and Interpretation, 1986, 2540
Lecture 5B | MIT 6.001 Structure and Interpretation, 1986, 2933
Lecture 6A | MIT 6.001 Structure and Interpretation, 1986, 2692
Lecture 6B | MIT 6.001 Structure and Interpretation, 1986, 6275
Lecture 7A | MIT 6.001 Structure and Interpretation, 1986, 2157
Lecture 7B | MIT 6.001 Structure and Interpretation, 1986, 1363
Lecture 8A | MIT 6.001 Structure and Interpretation, 1986, 2940
Lecture 8B | MIT 6.001 Structure and Interpretation, 1986, 1887
Lecture 9A | MIT 6.001 Structure and Interpretation, 1986, 3795
Lecture 9B | MIT 6.001 Structure and Interpretation, 1986, 3636
Lecture 10A | MIT 6.001 Structure and Interpretation, 1986, 2068
Lecture 10B | MIT 6.001 Structure and Interpretation, 1986, 2672

Plotting the YouTube Statistics with R

Now that we have the data from YouTube, it is time to load it into R and plot it. I used RStudio, which offers a friendly, easy-to-use environment to learn and work with R. Here is the R script to load the CSV data and plot it:

sicpdata = read.csv("sicp.csv")

plot(sicpdata$Views, type="b", main="YouTube views of the SICP lecture series", xlab="Lecture", ylab="Number of views")

As you can see, the code looks quite similar to a traditional programming language such as JavaScript. The interesting bit is the sicpdata variable, which is a Data Frame object. Data Frames act like tables or Excel spreadsheets and contain the rows and columns of data you are working with. Drawing the graph is simply a call to the plot function, passing the data we want to plot. Here we plot sicpdata$Views, which is the Views column from our dataset.
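
For comparison, here is a rough Ruby sketch of the same "table with named columns" idea, using the first three rows of the CSV. The `table` hash is just an illustration of column access, not a real data frame.

```ruby
csv_text = <<~CSV
  Title, Year, Views
  Lecture 1A | MIT 6.001 Structure and Interpretation, 1986, 51327
  Lecture 1B | MIT 6.001 Structure and Interpretation, 1986, 12240
  Lecture 2A | MIT 6.001 Structure and Interpretation, 1986, 8858
CSV

# Split into a header row and data rows, stripping the padding after commas.
header, *rows = csv_text.lines.map { |line| line.strip.split(',').map(&:strip) }

# Build a column-oriented table: one array of values per column name.
table = header.each_with_index.map { |name, i| [name, rows.map { |row| row[i] }] }.to_h

# Roughly what sicpdata$Views gives you in R.
views = table['Views'].map(&:to_i)
# => [51327, 12240, 8858]
```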

Next, I wanted to annotate some of the points on the graph to show the names of some of the lectures:

titles = c()
for (title in sicpdata$Title) { titles = c(titles, strtrim(title, 11)) }
identify(sicpdata$Views, labels=titles)

There isn’t enough room on the graph to show the full titles (Lecture 1A | MIT 6.001 Structure and Interpretation etc), so only the lecture numbers (Lecture 1A) are shown. The titles are copied from the sicpdata$Title column and trimmed to 11 characters into a new vector (new vectors are created with the c() function). The identify function is then used to annotate the graph. At this point things got weird: the RStudio environment goes into interactive mode and you select the points you want to annotate.

Annotating points using RStudio

When you click on the points the corresponding title from the titles vector is drawn on the graph.

The Graph

After all that here is the finished graph:

SICP Lecture Series Views on YouTube

Unsurprisingly, the first video has the most views (50,000+). After that there is a steep decline until the number of views evens out at around 2,000 to 3,000 per lecture. There is a weird spike halfway through for Lecture 6B, with over 6,000 views. Finally, the last video has 2,672 views. This is more than several of the other lectures in the series, which suggests people were skipping to the end!
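
Those observations can be checked directly against the numbers in the CSV. This is a quick Ruby sketch: the short labels are my own abbreviations of the titles, and "more views than both neighbours combined" is just one arbitrary way to define a spike.

```ruby
# View counts from the CSV, in lecture order.
labels = %w[1A 1B 2A 2B 3A 3B 4A 4B 5A 5B 6A 6B 7A 7B 8A 8B 9A 9B 10A 10B]
views  = [51327, 12240, 8858, 5561, 4393, 2935, 3115, 2558, 2540, 2933,
          2692, 6275, 2157, 1363, 2940, 1887, 3795, 3636, 2068, 2672]

by_lecture = labels.zip(views).to_h

# Lectures that drew fewer views than the final one (10B).
fewer_than_last = by_lecture.select { |_, v| v < by_lecture['10B'] }.keys
# => ["4B", "5A", "7A", "7B", "8B", "10A"]

# Spikes: a lecture with more views than its two neighbours combined.
spikes = labels.each_cons(3)
               .select { |a, b, c| by_lecture[b] > by_lecture[a] + by_lecture[c] }
               .map { |_, b, _| b }
# => ["6B"]
```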

Programming language stereotypes venn diagram

Inspired by Blame it on the voices’ Religions venn diagram of the day, I put together a programming language stereotypes venn diagram. Stereotypes are supplied by the Google instant search results for the questions: why is language? and why is language so?

I drew the venn diagram using a little processing.js script, which was pretty straightforward and fun.

  int circleSize = 300;
  PFont language;

  void setup()
  {
    // size() comes first so the canvas exists before anything is drawn
    size(900, 600);
    background(255);
    textAlign(CENTER);
    stroke(204, 102, 0);
    smooth();
    // create the font once rather than on every frame
    language = createFont("Helvetica", 12, true);
    // the diagram is static, so draw() only needs to run once
    noLoop();
  }

  void draw()
  {
    // three overlapping circles: Java (top), Ruby (left), Javascript (right)
    noFill();
    ellipse(400, 200, circleSize, circleSize);
    ellipse(300, 400, circleSize, circleSize);
    ellipse(500, 400, circleSize, circleSize);

    // Lisp sits on its own, away from the others
    ellipse(800, 200, 150, 150);

    textFont(language);
    textSize(24);

    // language names
    fill(0, 0, 0);
    text("Lisp", 800, 100);
    text("Java", 400, 30);
    text("Ruby", 100, 400);
    text("Javascript", 720, 400);

    // stereotypes
    textSize(20);
    text("Great", 800, 180);
    text("Used for AI", 800, 220);

    text("slow", 400, 330);

    text("hard", 400, 170);
    text("always updating", 400, 210);

    text("popular", 330, 300);
    text("red", 280, 400);

    text("used", 470, 300);

    text("bad", 520, 380);
    text("important", 520, 420);
  }
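
The layout works because the three big circles sit close enough to overlap while the Lisp circle stands apart. Here is a small Ruby sketch that checks this from the ellipse() coordinates. The mapping of circles to languages is inferred from where the labels are drawn, and the `Circle` struct and `overlap?` helper are made up for this example.

```ruby
# Circles as drawn by the ellipse() calls: centre x, centre y, diameter.
Circle = Struct.new(:name, :x, :y, :diameter) do
  def radius
    diameter / 2.0
  end
end

circles = [
  Circle.new('Java',       400, 200, 300),
  Circle.new('Ruby',       300, 400, 300),
  Circle.new('Javascript', 500, 400, 300),
  Circle.new('Lisp',       800, 200, 150)
]

# Two circles overlap when their centres are closer than the sum of their radii.
def overlap?(a, b)
  Math.hypot(a.x - b.x, a.y - b.y) < a.radius + b.radius
end

overlapping = circles.combination(2)
                     .select { |a, b| overlap?(a, b) }
                     .map { |a, b| [a.name, b.name] }
# => [["Java", "Ruby"], ["Java", "Javascript"], ["Ruby", "Javascript"]]
```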