Tue, Feb 13, 2007
I got to put on my hacker hat for a day (a very rare occurrence for me these days) last Wednesday at the Wikithon. After trolling around for ideas, I decided to work on WikiAnalytics with MatthewOConnor. We ended up dominating the competition and winning the contest for best hack. (So what if there were only two teams eligible for two prizes?) (LRI)
Our driving question was: How can we measure the health of a Wiki? I don't think there is one best way to use a Wiki, but there might be only three or four good ones. If we can identify patterns of Wiki usage, we can better understand how people collaborate with Wikis, which will help us facilitate Wiki communities and improve Wiki software. Our goal for the day was to start teasing out those patterns. (LRK)
We used data from 266 public SocialText workspaces and SocialText's internal corporate workspace. You can read the details of our brainstorming and work on the SocialText STOSS Wiki. Our approach was to simplify our tasks so that we could have something to show at the end of the day. It was decidedly practical, but it also reflected a deeper philosophy about WikiAnalytics. Start simple and evolve. You can learn interesting things from even simple measurements. (LRL)
We chose to focus on two types of analysis: page name and graph (link) analysis. I hacked on the former; Matthew on the latter. (LRN)
Frequent followers of this blog have heard me say it before: LinkAsYouThink is what makes Wikis powerful. The better your page names, the more interlinked your repository will be as you LinkAsYouThink. To see if I could measure "good" page names, I looked at three things for each name: its length in characters, its number of tokens (words), and its non-alphanumeric characters. (LRO)
The hypotheses are straightforward. Shorter names are better. Names with fewer tokens (words) are better. Names without non-alphanumeric characters are better. (This last hypothesis is complicated by internationalization.) (LRS)
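Here is a minimal sketch of those three measurements in Python. It is not the code we wrote at the Wikithon; the function name and the whitespace-based definition of a token are assumptions made for illustration.

```python
def page_name_stats(name: str) -> dict[str, int]:
    """Return three simple measurements for one page name."""
    # Assumption: a token is any run of characters separated by whitespace.
    tokens = name.split()
    # str.isalnum() is Unicode-aware, so accented or non-Latin letters are
    # not counted as non-alphanumeric. Spaces are not counted either.
    non_alnum = sum(1 for ch in name if not ch.isalnum() and not ch.isspace())
    return {"chars": len(name), "tokens": len(tokens), "non_alnum": non_alnum}


if __name__ == "__main__":
    for name in ["Knowledge Base Index", "What's new?"]:
        print(name, page_name_stats(name))
```

Relying on Unicode-aware character classes is one way to soften the internationalization problem, though it does nothing for languages that do not separate words with spaces.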
You can read the results of my analysis. The workspaces on the index page are ordered largest to smallest. The top two workspaces are full of spam and can be safely ignored. The numbers on the index page are buggy; click through to the individual pages to see the correct numbers. (LRT)
Matthew studied the graph characteristics of the Wikis, specifically: (LRU)
Islands of one are orphan pages (not linked to anywhere) and are undesirable. Large islands are better (or at least more interesting) than small ones. (LRX)
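To make "islands" concrete, here is a rough sketch that groups pages into islands with a depth-first search. It is not Matthew's code. The input format (a dict mapping each page to the set of pages it links to) is hypothetical, and the sketch ignores link direction and links to nonexistent pages when grouping, both of which are simplifying assumptions.

```python
def find_islands(links: dict[str, set[str]]) -> list[set[str]]:
    """Group pages into islands (connected components), ignoring direction."""
    # Build a symmetric neighbor map. Assumption: links that point to
    # pages that do not exist in the workspace are skipped.
    neighbors = {page: set() for page in links}
    for page, targets in links.items():
        for target in targets:
            if target in neighbors:
                neighbors[page].add(target)
                neighbors[target].add(page)

    seen = set()
    islands = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first search collects every page connected to `start`.
        island = set()
        stack = [start]
        seen.add(start)
        while stack:
            page = stack.pop()
            island.add(page)
            for other in neighbors[page]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        islands.append(island)
    return islands


if __name__ == "__main__":
    sample = {"Home": {"A", "B"}, "A": set(), "B": {"Home"}, "Orphan": set()}
    for island in find_islands(sample):
        print(len(island), sorted(island))
```

An island of size one in the output is an orphan page in the sense described above.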
You can view Matthew's results on his site. (LRY)
To give you an idea of what the stats mean, let's look at four Wikis: st-rest-docs, stoss, speakers, and ivrwiki. (LS0)
The mean number of characters and number of tokens for page names on each Wiki were: (LS5)
On the surface, the two Wikis in the middle -- stoss and speakers -- seem to have hit the sweet spot for page names: between two and three words per name. Since stoss is meant to be a collaborative workspace for a larger community, this seems to be a healthy number. The speakers Wiki is a repository of potential speakers. Since the majority of its pages are named after people, the numbers (two, sometimes three words in a page name) make sense. (LSA)
The remaining two Wikis diverge enough from this minute data set that we can infer some different patterns of usage. st-rest-docs documents SocialText's REST API, so there are a lot of one-word page names representing method names. Even though the average number of tokens is smaller, the average name length is comparable to that of the two Wikis in the middle. This also makes sense, given that the methods in a REST API are actually URI paths, which can get somewhat long. (LSB)
On the surface, ivrwiki seems to exhibit the classic signs of a newbie dumping ground, with page names that are too long to be useful. However, if you dig deeper, you can see that that's not the case. The standard deviation of the number of tokens is quite large (4.2), indicating a flat distribution curve. In other words, while there are a lot of long names, there are also a lot of short names. If you dig even further, you'll see that the community is using the Wiki as a question repository, and questions naturally have lots of words. Additionally, there seems to be a lot of more traditionally "Wiki-like" behavior on that Wiki. (LSC)
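The point about spread is easy to check with a few lines of Python. This is only a sketch: the function name is made up, the sample page names are hypothetical, and I'm assuming the population standard deviation, which may not be the variant our script used.

```python
from statistics import mean, pstdev


def token_spread(names):
    """Return (mean, standard deviation) of tokens per page name."""
    # Assumption: tokens are whitespace-separated words.
    counts = [len(name.split()) for name in names]
    return mean(counts), pstdev(counts)


if __name__ == "__main__":
    # Hypothetical mix of short titles and long question-style names.
    names = [
        "Home",
        "FAQ",
        "How do I transfer a call to another extension",
        "What happens when a caller presses the zero key",
    ]
    print(token_spread(names))
```

A high mean with a small deviation would suggest that nearly every name is long. A high mean with a large deviation suggests a mix of short and long names, which is what ivrwiki shows.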
This was no accident. The reason I'm showcasing ivrwiki is that Matthew identified it as an "interesting" Wiki from his graph analysis. Look at the numbers. There are three sizes of islands: 19 of one page, one of 16 pages, and one of 353 pages! That's one big island! It indicates a fairly tight set of linkages across the majority of the pages on the Wiki. Dig a bit deeper, and you can see the hub of the cluster: the Knowledge Base Index page. It links to every page in the knowledge base, and every page in the knowledge base links back to this page. (LSD)
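Once you have the island sizes, summarizing them takes very little code. The sketch below uses the ivrwiki numbers quoted above; the function names are my own, and this is not how Matthew's report was generated.

```python
from collections import Counter


def island_size_histogram(sizes):
    """Map each island size to the number of islands of that size."""
    return dict(sorted(Counter(sizes).items()))


def largest_island_share(sizes):
    """Fraction of all pages that sit in the single largest island."""
    return max(sizes) / sum(sizes)


if __name__ == "__main__":
    # ivrwiki: 19 islands of one page, one of 16 pages, one of 353 pages.
    ivrwiki = [1] * 19 + [16, 353]
    print(island_size_histogram(ivrwiki))
    print(round(largest_island_share(ivrwiki), 2))
```

The share of pages in the largest island is one candidate for a single "clustering" number, although I wouldn't read too much into it without more Wikis to compare against.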
The st-rest-docs Wiki exhibits similar behavior -- one big island of 81 pages. This makes sense, given that this Wiki represents documentation, which is structured in a similar way to the ivrwiki knowledge base. (LSE)
The stoss Wiki is the most Wiki-like of the four when you dig into the graph analysis. There are five sizes of islands, the largest containing 10 pages. The distribution is fairly regular -- based on my guess of what "regular" should be, at least. To really get a sense of what "regular" should be, we'll need to identify several Wikis that we consider to be "Wiki-like," and examine those numbers. (LSF)
Finally, look at the numbers for the speakers Wiki. They are the reverse of those for the other Wikis. There is basically no clustering; nearly every page is an island unto itself. At first glance, this is surprising. You would expect it to look somewhat like ivrwiki and st-rest-docs. The reason for the lack of clustering is that this Wiki relies on SocialText's tagging interface for navigation. Tags could be treated as a type of link, but we don't treat them that way in our analysis. (LSG)
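If we did want to treat tags as links, one option would be to join any two pages that share a tag and then count islands again. The following is a sketch under that assumption, using a small union-find structure. The input format (page name mapped to a set of tags) and the sample data are hypothetical, and ordinary page-to-page links are left out entirely.

```python
def count_islands_with_tags(page_tags: dict[str, set[str]]) -> int:
    """Count islands when pages that share a tag are considered linked."""
    parent = {page: page for page in page_tags}

    def find(page):
        # Follow parent pointers to the island's representative page,
        # shortening the path along the way (path halving).
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    first_page_for_tag = {}
    for page, tags in page_tags.items():
        for tag in tags:
            if tag in first_page_for_tag:
                # Merge this page's island with the island of the first
                # page seen carrying the same tag.
                parent[find(page)] = find(first_page_for_tag[tag])
            else:
                first_page_for_tag[tag] = page
    return len({find(page) for page in page_tags})


if __name__ == "__main__":
    speakers = {
        "Alice": {"keynote"},
        "Bob": {"keynote", "panel"},
        "Carol": {"panel"},
        "Dave": set(),
    }
    print(count_islands_with_tags(speakers))
```

With no tags at all, every page stays in its own island, which matches what our link-only analysis reports for a tag-driven workspace.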
As with any simplified analysis, there are always caveats. A lot of them are specific to the Wiki implementation. For example, several people at SocialText use the stoss Wiki as a blog, which creates long page names and thus skews the statistics. Other Wikis may be similar to the speakers Wiki in that they use tags as navigational links. (LSI)
There's an open question as to whether to treat a Wiki as a directed graph or an undirected one. We chose directed, but you can make a good argument that the SocialText Wiki acts as an undirected graph, or at least a bidirectional one, because BackLinks are displayed on the page itself. The same holds true for any other Wiki, depending on the navigational context. If I start at the home page and start navigating around, I can often use the browser back button to go back, or at worst, I can click on "BackLinks" to figure out the context. (LSJ)
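The difference between the two views shows up as soon as you ask which pages a reader can reach from a given starting point. Here is a small sketch, with hypothetical function names and the same assumed page-to-targets dict as input, that compares reachability with and without BackLinks.

```python
from collections import deque


def with_backlinks(links):
    """Return a copy of the link map with every link made bidirectional."""
    both = {page: set(targets) for page, targets in links.items()}
    for page, targets in links.items():
        for target in targets:
            both.setdefault(target, set()).add(page)
    return both


def reachable(links, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    links = {"Home": {"A", "B"}, "A": set(), "B": set()}
    print(sorted(reachable(links, "A")))                  # directed view
    print(sorted(reachable(with_backlinks(links), "A")))  # BackLinks view
```

In the directed view, a reader who lands on page A is stuck there. In the BackLinks view, the same reader can get to Home and from there to B.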
I'm not sure the page name analysis is that interesting by itself. I think it gets very interesting when applied to the specific islands on a Wiki. People may be using a Wiki in a number of different ways, as demonstrated by the ivrwiki. Analysis on each individual cluster will potentially surface the different kinds of behaviors on a Wiki, which is more appropriate than trying to slap on a single archetype if one does not exist. (LSK)
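As a rough illustration of what per-island analysis might look like, the sketch below reports the size of each island alongside the mean number of tokens in its page names. The function name and the sample islands are hypothetical, and this is something we have not built yet.

```python
from statistics import mean


def tokens_by_island(islands):
    """Return (island size, mean tokens per page name), largest island first."""
    rows = []
    for island in islands:
        # Assumption: tokens are whitespace-separated words.
        counts = [len(name.split()) for name in island]
        rows.append((len(island), mean(counts)))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    islands = [
        {"How do I reset my password"},
        {"Home", "Knowledge Base Index"},
    ]
    for size, avg_tokens in tokens_by_island(islands):
        print(size, avg_tokens)
```

If different islands on the same Wiki show very different naming statistics, that would be a hint that the Wiki is being used in more than one way.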
Finally, what level of clustering is healthy? In systems theory, networks that are either too tightly clustered or too lightly clustered are problematic. With enough analysis, we may be able to speculate on the right number for Wikis. (LSL)
Matthew and I will release our code at some point, and we'll hopefully have some time to follow up on it as well. Specifically, I'd like to examine a lot of other Wikis, starting with the ones that BlueOxenAssociates hosts. (LSM)
There were a lot of other hacks at the Wikithon that were cool. My favorites were IngyDotNet's Social Zork (which was not only hilarious, but is actually potentially useful) and ShawnDevlin's Word Cloud, which I hope to use on other Wikis. ChristineHerron wrote a good summary of the day's festivities. (LSN)
/tech/wiki | Posted at 12:42pm