data vs time . com
experiments in time-series data visualization
Detailed Visibility
  • Quickly spot aberrations or patterns in large data sets by visualizing and interacting with many time-series simultaneously.
  • When you see something interesting, zoom in to investigate further.
  • Perfect for anomaly detection and crisis diagnosis.
New! Some currency exchange rate data is exposed at http://datavstime.com:5004. Use the Proxy adapter to connect to it.
Overview
  • data vs time is a new tool for visualization of time-varying data.
  • The concept is a bit novel. Time series are represented as blocks which can be grouped, re-grouped, moved, aggregated up, expanded out etc. along the axes of the multi-dimensional time-series labels.
  • DvT makes use of WebGL for highly performant rendering. This will allow for more than a thousand time-series to be displayed and interacted with simultaneously at a high refresh rate (after I've optimized a few things a bit more...).
  • The ability to visualize many related time-series is useful because aggregates can sometimes mask aspects of the data that are important. Visualizing many disparate time-series simultaneously can also be useful - even if you don't immediately understand everything being displayed, you can still spot interesting aberrations or patterns quickly and when you see something of interest, you can zoom in to investigate further. DvT was built with anomaly detection / crisis diagnosis in mind.
  • Blocks will be able to display different visualizations with different strengths and weaknesses. Currently, blocks can display filled area charts or a generalization of this - horizon charts - which are a good visualization at smaller scales.
  • Easily connect to a Prometheus, InfluxDB or custom data source.
  • My starting point for the concept was Mike Bostock's Cubism, but DvT adds navigation, multiple columns, grouping and additional interactivity.
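To make the horizon chart idea mentioned above a little more concrete, here is a minimal sketch of the general technique: the value range is sliced into equal bands, and each band is drawn on top of the previous one in a progressively more intense colour. This is not DvT's actual rendering code (that runs in WebGL); the function name `horizon_bands` and its parameters are hypothetical and exist only for illustration.

```python
from typing import List


def horizon_bands(value: float, max_value: float, n_bands: int) -> List[float]:
    """Return the fill fraction (0.0 to 1.0) of each band for one sample.

    Hypothetical helper illustrating the generic horizon chart technique.
    Band 0 is the lightest layer; each later band is overlaid in a more
    intense colour. The sign of `value` is ignored here, on the assumption
    that the caller picks a different hue for negative values.
    """
    if max_value <= 0 or n_bands <= 0:
        raise ValueError("max_value and n_bands must be positive")

    band_size = max_value / n_bands
    # Clamp the magnitude so out-of-range samples simply fill every band.
    magnitude = min(abs(value), max_value)

    fractions = []
    for i in range(n_bands):
        # Portion of the sample that falls inside band i.
        in_band = min(max(magnitude - i * band_size, 0.0), band_size)
        fractions.append(in_band / band_size)
    return fractions


# A sample of 5 on a 0..8 scale with 4 bands fills the first two bands
# completely and the third band halfway.
print(horizon_bands(5, 8, 4))  # [1.0, 1.0, 0.5, 0.0]
```

With a single band this reduces to an ordinary filled area chart, which is why horizon charts can be seen as a generalization of it.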
Current Limitations
  • This is an early release. It's useful, but still buggy and incomplete.
  • The only supported browser is Chrome.
  • For the sake of implementation simplicity, there are a number of places in the code that effectively iterate over the entire selected set of time-series and then throw most of this work away. As a consequence, performance when large sets of time-series are selected is not very good yet. This should be relatively easy to fix; I just haven't done it yet.
Feedback
  • All feedback (good and bad) is greatly appreciated and questions are encouraged. Use the Google group where appropriate, or send me an email directly.
  • This project will live or die based on how useful I think people find it, so if you like where it's headed and want to see the vision completed, spread the word!
Non-Obvious Things
There are no docs yet, but here are a few notes on some things that might not be obvious:
  • I'm still forming an opinion on horizon charts. I find them to be a good visualization at "height 1" (7 CSS pixels) and possibly "height 2" (15 CSS pixels). At this scale, colour intensity changes seem to dominate and this is an easy thing for your brain to interpret. At the same time, higher precision information is there on the screen if you want it - you just need to concentrate for a bit longer on the area of interest. At "height 3" (31 CSS pixels) and above, I find that I notice both colour intensity changes and spatial changes simultaneously and that the cognitive overhead in interpreting what's going on outweighs the benefit of the extra precision in most (though not all) cases.
  • Try dragging the time axis to scroll through time.
  • Data is retrieved from the data store at the (CSS) screen resolution. A benefit of specifying a timeframe as a "pixel size" rather than a "window size" (you have the choice) is that if you re-size the window, the data doesn't need to be re-fetched.
  • Typically, series have many associated labels, and displaying all of them is usually more confusing than helpful. DvT employs a heuristic to decide which labels/values are overlaid on the series by default, and in what order. You can change this.
  • Series that were specified using the Labels panel are retrieved one-by-one from the data store. The order in which data is retrieved is prioritized based on the current view region, the current timeframe, and the data already retrieved for each time-series.
  • You can enter a query manually - possibly one that returns many series. It is important to realize that data for these series is NOT retrieved individually. Be careful when moving a small subset of the series from such a query onto a custom page - you may end up fetching much more data from the database than is actually required.
  • To specify what series appear on a custom page, drag a series or series group you want on the page onto the page name in the Pages panel. You can also create a new page out of a group or series by dragging it onto the Pages panel title bar.
  • The "counter → rate" button automatically detects any monotonically increasing series in the currently selected set and applies the rate function to them. This is very useful (though you'll probably wish it applied the irate function instead - I plan to make this configurable). Grouping by __name__ is also often very useful.
  • Functions and aggregations are not supported by the Proxy adapter - you need to connect to a Prometheus data source. Aggregations apply to groups - if you have none, selecting a global aggregate function will do nothing. The UI related to this needs improving.
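As a rough illustration of what the "counter → rate" button described above does conceptually, the sketch below finds the monotonically increasing series in a selection and replaces them with a per-second rate of change. It is only a sketch under stated assumptions: in DvT the rate function is evaluated by the Prometheus data source, and Prometheus's rate also averages over a range window and handles counter resets, neither of which is modelled here. All names (`Sample`, `is_monotonic_increasing`, `per_second_rate`, `counters_to_rates`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed sample layout for this sketch: (unix_seconds, value).
Sample = Tuple[float, float]


def is_monotonic_increasing(samples: List[Sample]) -> bool:
    """True if values never decrease and increase at least once overall."""
    values = [v for _, v in samples]
    if len(values) < 2:
        return False
    never_decreases = all(b >= a for a, b in zip(values, values[1:]))
    return never_decreases and values[-1] > values[0]


def per_second_rate(samples: List[Sample]) -> List[Sample]:
    """Simplified rate: difference between consecutive samples per second."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
        if t1 > t0  # skip duplicate or out-of-order timestamps
    ]


def counters_to_rates(series: Dict[str, List[Sample]]) -> Dict[str, List[Sample]]:
    """Apply the simplified rate to counter-like series; leave others as-is."""
    return {
        name: per_second_rate(samples) if is_monotonic_increasing(samples) else samples
        for name, samples in series.items()
    }


selected = {
    "requests_total": [(0, 0), (10, 50), (20, 150)],  # looks like a counter
    "temperature": [(0, 20), (10, 19), (20, 21)],     # gauge, left untouched
}
print(counters_to_rates(selected)["requests_total"])  # [(10, 5.0), (20, 10.0)]
```

Note that a strict monotonicity check like this one would not treat a counter that was reset during the selected timeframe as a counter.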
Privacy
  • The time-series data you are visualizing, as well as meta-information about this data (e.g. label names, number of labels etc.), remains private and does not leave your browser.
  • Anonymous information about what features of the software you are using is sent to the datavstime.com server. Note that this is not associated with a session in any way - in fact DvT does not use (and currently has no need for) sessions. Collecting this information allows me to better understand how people are using DvT so as to make better decisions about where to direct future development effort.
License
Screenshots

The data set I've been playing with isn't very interesting, so these examples are not particularly compelling (anyone want to help me out here?)

Currency exchange rates
~ 500 time-series (it is often very useful to group by __name__)
© Matt Howlett 2015