Detecting web applications that aren’t converting with Riemann

We have been playing around with Riemann at uSwitch to do some of our monitoring. One of the core metrics we track is the number of customers who are currently converting on our website. If this number drops to zero it usually indicates something is broken. Each time a customer converts an event is sent to our Riemann server.

I added a stream to our riemann config to count recent conversions, create a new summary event and notify us whenever that number drops to zero.

  (streams
    (where (and (service"app") (tagged"conversion"))
      (moving-time-window
        30
        (smap (fn [events]
                (let [conversion-count (count events)]
                   {:service"app-conversions"
                    :time (unix-time)
                    :metric conversion-count
                    :state (if (> conversion-count 0) "ok" "warning")
                    :description"Conversions in the last 30 seconds"
                    :ttl 30}))
                 index
                 (changed-state
                  (fn [event]
                    (warn "Conversion state has changed to: " event)))))))

Sadly this didn’t work as I expected! If there are no conversions than no events are sent to Riemann and the moving-time-window code block is not executed. This is discussed further here.

You can work around this by however by using expired events. Events in Riemann have a time to live (TTL) associated with them. If an updated event is not received within the TTL the event will be expired – indicating in this case that no further conversion have taken place since it was last fired. You can add a stream to catch this event expiration and notify whoever is interested.

Just like above we add a stream to sum recent conversions and create a summary event:

  (streams
   (where (and (service "app") (tagged "conversion"))
          (moving-time-window
           30
           (smap (fn [events]
                   (let [conversion-count (count events)]
                     {:service "app-conversions"
                      :time (unix-time)
                      :metric conversion-count
                      :state "ok"
                      :description "Conversions in the last 30 seconds"
                      :ttl 30}))
                 index
                 (changed :state
                          (fn [event]
                            (info "notify that conversions are happening again" event)))))))

Then we add a stream that waits for the summary event to expire:

  (streams
   (expired
    (where (service "app-conversions")
           (with {:state "warning"
                  :metric 0
                  :ttl 30
                  :description "No conversions in the last 30 seconds"}
                 (fn [event]
                   (index event)
                    (warn "Notify about no conversions" event)))))))

The above config is on github.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s