Friday, March 27, 2009

Social Collider - the most seriously cool, and heavyweight twitter visualisation yet

Thanks guys for the post and the work. Stunning and, just possibly, very very useful

Social Collider is a new collaboration with Sascha Pohflepp: a JavaScript visualization to reveal cross-connections between conversations on Twitter. The project launched just 2 days ago; it was commissioned by Google for their Chrome Experiments collection and produced by the friendly peeps at Instrument. Social Collider acts as a metaphorical instrument which can be used to visualize how memes are created and how they propagate. Ideally, it might catch the Zeitgeist at work.

Social Collider in action

Concept

In December Sascha and I were both independently contacted about contributing to the Google Chrome Experiments project. We decided to put our heads together on this one, since we both much preferred to build something which would qualify as a browser experiment as per the brief, but could also become something bigger & more worthwhile over time. In several meetings over coffee we slowly narrowed down Sascha's general visualization idea to the level of trying to show one's own data traces in the context of, or in contrast to, things going on around us which possibly influence our moods and actions without us consciously realizing it.

Initially we wanted this context to be relatively removed both thematically and in terms of scale (personal vs. societal) and were thinking about plugging into energy consumption, other environmental datasets, the weather or news headlines. The problem with the first two is still obtaining data in a sufficiently granular format to be actually meaningful (i.e. not summed values per year or aggregated per country). Massive uptake of grassroots data portals like Usman Haque's Pachube will hopefully change that in the not too distant future. In hindsight, not focusing on the weather also proved to be a good thing, since Use All Five already covered that topic with their fantastic smalltalk experiment, which is also part of the Chrome collection.

Since we have both been avid Twitter users for quite a while now, we knew that people are using it to discuss really anything, incl. the topics mentioned above. The realtime granularity combined with the ad-hoc discussion element is something I've been increasingly treasuring on Twitter because it adds personal context, opinions and feedback to the “data”, be it the weather, music, politics, geekery etc.

Of course other platforms like Facebook have that too, but for our purposes Twitter was the better option since it allows for socially far more widespread conversations: it doesn't have any concept of groups, tribes or networks, and anyone can talk & reply to anyone they wish without jumping through any hoops. It also has the benefit of better thought-out APIs.

Visualization

For technical as well as time reasons, we decided to create a pure clientside JavaScript visualization. This decision provided a great creative challenge for us, but also limited our choice of easy-to-access, compatible webservices even more. To satisfy the instant gratification part of being a browser experiment, we also had to exclude any data coming from APIs requiring 2-step authentication, and we also made a conscious decision to avoid the dreaded Password Anti-pattern.

The last missing key ingredient was a strong metaphor. As any hobby psychologist knows, good metaphors are a key enabler for (successful) visualizations. On the other hand, the majority of network visualizations today are based on the “rocks & sticks” metaphor (thanks Mike & Tom! :), basically treating nodes as particles and connecting them with lines. This visual language has been culturally lifted straight off mathematical graph theory text books and of course it's hard (if not impossible) to totally break free of that established mental image, especially when the data we're dealing with is literally particular (microcontent) and loosely connected. Yet to add a twist to this classic, we decided to approach the visualization more like the creation of a painting. We would use slow reveals to give the user more time to trace all identified connections, place it in a conceptual environment and use a visual language which directly references particles. With the Large Hadron Collider launch from only a few months earlier still glimmering on our mental horizons, this became the perfect (if obvious) metaphor…

Mapping

Within this space, particles are mapped two-dimensionally based on their position in time (vertical axis) and search query ID (horizontal axis). Search results of each query are automatically connected vertically via smooth, curvy B-splines (using a JS port of this) in the same color. If results from different queries are somehow related (see below), a spiral is first drawn around the older particle, which will eventually connect to the other related particles horizontally. The size of the spiral corresponds to the number of cross-connections the related message/tweet has accumulated. Hovering with the mouse pointer over particles displays their related message. Clicking on a node opens the selected tweet on twitter.com…
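
To make the mapping concrete, here is a minimal sketch of how such a layout could be computed. It is not taken from the Social Collider source: the function names, the tweet fields (`id`, `queryId`, `time`), the linear time scale, the choice to put the newest messages at the top and the spiral size constants are all assumptions made for illustration.

```javascript
// Hypothetical layout helper (all names and constants are illustrative assumptions).
// Each tweet is expected to look like { id, queryId, time }, where queryId is the
// zero-based index of the search query that returned it and time is a Unix timestamp.
function layoutParticles(tweets, numQueries, width, height) {
  const times = tweets.map((t) => t.time);
  const tMin = Math.min(...times);
  const tMax = Math.max(...times);
  const span = tMax - tMin || 1; // avoid dividing by zero if all timestamps match
  const colWidth = width / numQueries;
  return tweets.map((t) => ({
    id: t.id,
    // horizontal position: centre of the column belonging to the tweet's query
    x: (t.queryId + 0.5) * colWidth,
    // vertical position: linear in time, newest at the top (an assumption)
    y: ((tMax - t.time) / span) * height,
  }));
}

// Spiral size grows with the number of cross-connections a tweet has accumulated.
// The base radius and step are made-up values.
function spiralRadius(connectionCount, base = 4, step = 2) {
  return base + step * connectionCount;
}
```

For example, two tweets from two different queries on a 400 × 300 drawing area end up in the centres of a left and a right column, with the older one at the bottom edge and the newer one at the top.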

Data mining

Since Twitter messages have a hard limit of 140 characters, the community has come up with various kinds of syntactic sugar to add meaning & metadata. At present there are 5 major potential connection axes in Twitter messages: @usernames, direct @-replies, #hashtags, retweets (RT) and posted URLs. Using exclusively Twitter's Search API (via JSONP), we initially allow users to search for usernames, generic phrases or hashtags. These original search results are then analyzed for each of the 5 data axes and, if matched, queued for secondary search requests. While this “spidering search” is ongoing, the visualization space is divided into columns based on the number of successful search queries. If a query did not return anything, its column is removed to maximize available screen space. Once all queries have been executed, the connections between retrieved messages are slowly revealed.
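
The analysis step can be sketched roughly as follows. This is an illustration only, not the project's actual code: the regular expressions are deliberately simple, and the function names, the shape of the returned object and the query strings built for the secondary searches are assumptions.

```javascript
// Hypothetical extractor for the 5 connection axes (patterns are simplified assumptions).
function extractAxes(text) {
  const urls = text.match(/https?:\/\/\S+/g) || [];
  // Remove URLs first so that "#fragment" parts of links are not mistaken for hashtags.
  const rest = text.replace(/https?:\/\/\S+/g, " ");
  const mentions = (rest.match(/@\w+/g) || []).map((m) => m.slice(1).toLowerCase());
  const hashtags = (rest.match(/#\w+/g) || []).map((h) => h.slice(1).toLowerCase());
  const reply = rest.match(/^@(\w+)/); // a message starting with @user is a direct reply
  const retweet = rest.match(/\bRT\s+@(\w+)/); // the "RT @user" convention
  return {
    mentions,
    replyTo: reply ? reply[1].toLowerCase() : null,
    hashtags,
    retweetOf: retweet ? retweet[1].toLowerCase() : null,
    urls,
  };
}

// Turn the extracted axes into secondary search queries, skipping anything already
// queued. `seen` is a Set of query strings and is updated in place.
function secondaryQueries(axes, seen) {
  const candidates = [
    ...axes.mentions.map((u) => "@" + u),
    ...axes.hashtags.map((h) => "#" + h),
    ...axes.urls,
  ];
  const fresh = [];
  for (const q of candidates) {
    if (!seen.has(q)) {
      seen.add(q);
      fresh.push(q);
    }
  }
  return fresh;
}
```

Keeping the `seen` set across all search rounds is what stops such a spidering search from requesting the same username, hashtag or link over and over again.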

Features & Shortcomings

Any visualization has strong points and shortcomings. Our aim was to fill a current niche and provide the means to create a fairly macroscopic picture of Twitter activity, by attempting to trace how content & memes spread through the network. Unlike the more ubiquitous line and bar charts of other Twitter visualizations, ours was supposed to give a qualitative, not necessarily quantitative, overview. I also believe we have somewhat succeeded with this, as these examples show (click on the images to see bigger versions on flickr):

Social Collider 1 hour after launch

Social Collider 16 hours post launch

SXSW panel by John Tolva

Guardian Open Platform launch by jaggeree

BookCamp & PaperCamp weekend marked by the pink & red clusters near the top

Visually, the spirals have the effect of pen scribbles marking hotspots: messages which have resonated in the community and triggered re-tweets or replies, or which generally just kickstarted a new meme (e.g. identified by a new #hashtag). The maps also show how quickly some of these trends propagate, spawning a multitude of messages in close succession and so causing clusters.

However, we're also aware of various shortcomings of the visualization. These mainly become obvious when one wants to drill further down into the data. Things like filtering or zooming are not possible at the moment, but would certainly add a whole new level of functionality & usefulness… For example, the above-mentioned clusters caused by events (e.g. #sxsw) or “major” news can be identified easily, but currently cannot be examined closely without zooming functionality. Yet without attempting to sound defensive, it's good to remember that so far this was primarily just a browser experiment after all…

In fact, depending on the complexity of the returned data set, it's quite easy to bring your browser to its knees (especially Firefox… Sorry guys, I still love you! :). This most likely has to do with the multitude of setTimeout() callbacks scheduled to slowly draw the connection curves and the sheer number of nodes in the SVG canvas used to create the visualization. Creating all particle nodes (sometimes several thousand) with 3 mouse event listeners attached to each doesn't help performance either! So my cheeky side is quite happy to have created a challenging environment for the browser(s) too, and I honestly was blown away by how well Chrome kept its calm, regardless…

Furthermore there's also room for improvement on the data analysis side: because of the various URL shorteners in use (e.g. see my own is.gd creator), sometimes links pointing to the same URL are not matched. Also, the Twitter search API is not case sensitive, so it can happen that the wrong shortened URLs are associated in secondary queries. Both issues can be overcome, but again it's one of those things we simply didn't have time to implement so far.
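
One possible way around the case-sensitivity issue, sketched here purely as an illustration and not as something Social Collider implements, would be to re-check the results of a secondary URL query on the client and keep only messages which contain the shortened link with exactly the same spelling. The function names and the `{ text }` shape of a search result are assumptions.

```javascript
// Pull URLs out of a message and strip trailing punctuation such as "." or ")".
function extractUrls(text) {
  const raw = text.match(/https?:\/\/\S+/g) || [];
  return raw.map((u) => u.replace(/[.,;:!?)]+$/, ""));
}

// Keep only results that contain `shortUrl` with exactly the same upper/lower case.
// A case-insensitive search for http://is.gd/abc may also return http://is.gd/aBc,
// which most likely points somewhere else entirely.
function keepExactUrlMatches(results, shortUrl) {
  return results.filter((r) => extractUrls(r.text).includes(shortUrl));
}
```

This does not address the first issue (different shorteners pointing to the same target), which would additionally require resolving each short link to its destination.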

Future plans

I'm still thinking about adding support for other fairly ubiquitous services like flickr & del.icio.us, not only because there are strong overlaps with Twitter, but also because it could give interesting insights by contrasting/complementing Twitter messages with photos taken at similar times or links saved on delicious, which might provide further reading for links posted on Twitter…

For example, one of our plans for flickr integration was to use the fantastic Pixstatic library to create colour fields in the background of the visualization, taking the average color of each retrieved image and blending them into each other, similar in style to a heatmap visualization. However, the stable version of Google Chrome currently hasn't got the ImageData access needed to pull off this feature. The good news is that it's being worked on and is already implemented in the Chrome 2.0 beta version…

It would also be great to save & share generated visualizations by storing them in a link with the original search term and timestamp, so they can be recalled anytime… (like: http://socialcollider.net/?q=from:toxi&t=1237561532) There are lots of other little ideas floating around and we're currently scoping out the details of how to take development further in practical terms, i.e. picking a license & hosting and preparing the source code for an open source release… Please stay tuned!
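
As a rough sketch of the save & share idea, a link of the form shown above could be built and read back like this. Only the `q` and `t` parameter names come from the example link; the helper names are hypothetical, and the sketch relies on the standard `URL` and `URLSearchParams` APIs found in current browsers and Node.js rather than anything available in 2009.

```javascript
// Build a shareable link from a search term and a Unix timestamp (in seconds).
// URLSearchParams percent-encodes characters such as ":" and "#" in the query.
function buildShareLink(base, query, timestamp) {
  const params = new URLSearchParams({ q: query, t: String(timestamp) });
  return base + "?" + params.toString();
}

// Read the search term and timestamp back out of a link.
// Returns null if either parameter is missing or the timestamp is not a whole number.
function parseShareLink(link) {
  const params = new URL(link).searchParams;
  const q = params.get("q");
  const t = params.get("t");
  if (!q || t === null || !/^\d+$/.test(t)) return null;
  return { query: q, timestamp: Number(t) };
}
```

Parsing the example link from the text yields the query `from:toxi` and the timestamp `1237561532`, which would be enough to re-run the same search and discard anything newer than the stored time.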
