Why Backlinks?    (01)

Reputation Management    (02)

In the scientific world, one way to measure the importance of a paper is to calculate how often it is cited by other papers. Certain publications, like the New England Journal of Medicine, track this information diligently. Determining this information is quite a chore, however. When reading a paper, it's easy to see which papers it references, but impossible to tell from the paper alone which papers reference it. To calculate these numbers, you have to take an enormous collection of papers and construct this information yourself.    (03)

Similarly, Google determines the relevance of a Web page in part by how often it is referenced by other Web pages. However, Google has the same problem as the editors of the New England Journal of Medicine: if you look at a Web page, you can see what other pages it references, but not what pages reference it. In other words, you can see the forward links, but not the backlinks. Google resolves this problem the same way as the scientific community: it scans the Internet and constructs its own picture of the Web.    (04)
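
The inversion described above can be sketched in a few lines of Python. This is only an illustration of turning forward links into backlinks and counting them; it is not Google's actual ranking method, and the page names and the build_backlinks function are made up for the example.

```python
from collections import defaultdict


def build_backlinks(forward_links):
    """Invert {page: [pages it links to]} into {page: {pages linking to it}}."""
    backlinks = defaultdict(set)
    for source, targets in forward_links.items():
        for target in targets:
            backlinks[target].add(source)
    return backlinks


# Hypothetical crawl results: each page and the pages it references.
forward_links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": [],
}

backlinks = build_backlinks(forward_links)

# Counting backlinks gives a crude "citation count" for each referenced page.
citation_counts = {page: len(sources) for page, sources in backlinks.items()}
```

A page that nobody links to (a.html here) simply never shows up in the backlink index, which is why the whole collection has to be scanned before any count can be trusted.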

In a sense, Google has constructed the Internet's largest backlink database. However, it's not the only one. Others have done the same thing as Google on a smaller scale. DayPop and MIT's BlogDex collect backlinks from various blogs to determine the most referenced news items.    (05)

Thread Views    (06)

E-mail archiving programs use backlink databases to construct thread views of message archives. Properly constructed e-mails have unique IDs that are stored in a header called "Message-ID". When you respond to an e-mail, your software usually adds an "In-Reply-To" header to your message, with the ID of the message to which you are responding. This can be thought of as a link to the message to which you are responding.    (07)

When trying to show a tree of messages that respond to a particular e-mail, the backlink problem crops up once again. If you look at an e-mail, you can see the message to which it is responding, but not the messages that are responding to it. In order to construct a thread view of an archive, the software has to build a backlink database of e-mail messages.    (08)
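
As a rough sketch of how an archiver might do this, the following Python uses the standard email module to read the "Message-ID" and "In-Reply-To" headers, builds a reply index (the backlinks), and walks it to print an indented thread. The sample messages and function names are assumptions for illustration; real archivers also consult other headers and cope with missing or malformed IDs.

```python
from collections import defaultdict
from email import message_from_string


def build_reply_index(raw_messages):
    """Map each Message-ID to the Message-IDs of its direct replies."""
    replies = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"]
        parent = msg["In-Reply-To"]
        # Only messages that have an ID and name a parent contribute a backlink.
        if msg_id and parent:
            replies[parent.strip()].append(msg_id.strip())
    return replies


def thread_lines(root_id, replies, depth=0):
    """Return the thread under root_id as indented lines, depth first."""
    lines = ["  " * depth + root_id]
    for child in replies.get(root_id, []):
        lines.extend(thread_lines(child, replies, depth + 1))
    return lines


# Hypothetical archive of four messages.
raw_messages = [
    "Message-ID: <1@example.com>\nSubject: Hello\n\nFirst post.\n",
    "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\n\nA reply.\n",
    "Message-ID: <3@example.com>\nIn-Reply-To: <2@example.com>\n\nA reply to the reply.\n",
    "Message-ID: <4@example.com>\nIn-Reply-To: <1@example.com>\n\nAnother reply.\n",
]

replies = build_reply_index(raw_messages)
thread = thread_lines("<1@example.com>", replies)
print("\n".join(thread))
```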

Annotations    (09)

One of the criticisms of the Web is that it is difficult to annotate Web pages. Various systems have been designed to overcome this deficiency, such as W3C's RDF-based Annotea. Annotea creates links between commentary and Web pages. To view annotations, you query Annotea's database for commentary linked to a particular page. In essence, you're doing a search on a backlink database.    (010)

An ad-hoc annotation system can be built by constructing backlink databases from e-mail discussion lists or blogs. Much of the content in these media is devoted to commentary about other Web pages. If you assume that a link from an e-mail or a blog to a Web page can be considered commentary on that page, you can construct a list of annotations by tracking all of the backlinks.    (011)

For example, suppose you send the following e-mail to a mailing list:    (012)

  From: joe@schmoe.com
  To: dodger-fans@lists.baseball.net
  Subject: Dusty Baker -- best ex-Dodger manager?

  Anybody read this article?

  http://fakenews.com/2002/worldseries.html

  It claims that Mike Scioscia is the best ex-Dodger manager in
  baseball.  Although I love Scioscia, that's bogus!  Clearly, Dusty
  Baker is the best!

  -Joe    (013)

Suppose this e-mail is archived at http://baseball.net/lists/dodger-fans/0103.html, and that the archiving software extracted the URL from the above message and stored the following link in a database:    (014)

  http://baseball.net/lists/dodger-fans/0103.html
    => http://fakenews.com/2002/worldseries.html    (015)

If you are reading the article, and you want to see all annotations on that article, you can query the database for backlinks and see that the above e-mail had some commentary about the article.    (016)
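
The extract-and-query step in this example might look something like the following Python sketch. The URL pattern is deliberately simplistic, and record_links and the in-memory backlinks table are hypothetical names for illustration; this is not a description of how any particular archiving software stores its links.

```python
import re
from collections import defaultdict

# A deliberately simple URL matcher; real-world extraction needs more care.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def record_links(archive_url, body, backlinks):
    """Store archive_url as a backlink of every URL found in the message body."""
    for url in URL_PATTERN.findall(body):
        backlinks[url].add(archive_url)


backlinks = defaultdict(set)

body = """Anybody read this article?

http://fakenews.com/2002/worldseries.html

It claims that Mike Scioscia is the best ex-Dodger manager in
baseball.
"""

record_links("http://baseball.net/lists/dodger-fans/0103.html", body, backlinks)

# Query: which archived messages comment on the article?
annotations = sorted(backlinks["http://fakenews.com/2002/worldseries.html"])
print(annotations)
```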

I have created a filter for the mail archiving program MHonArc, called mhpurple.pl, which extracts link information into a text file. I've also created a demonstration of a tool that uses the resulting backlink data to display annotations on a Web page.    (017)

Categories    (018)

Many of today's standards for categorizing documents, such as Topic Maps, essentially create links between a document and a category. Many communities using Wikis for collaboration take advantage of this link-like behavior to create an ad-hoc categorization system.    (019)

Wikis are one of the few applications today that take advantage of explicit backlink databases. You can view all of the Wiki pages that link to another Wiki page, often by clicking on the title of that page.    (020)

Wikis do not have built-in categorization systems. To overcome this, many communities create a new Wiki page for each category, prefixed by the term "Category". They then mention that page on every Wiki page that falls under that category. To view all pages that belong to a particular category, you simply view all of the backlinks to the corresponding Category Wiki page.    (021)

For example, suppose there are two Wiki pages -- LosAngelesDodgers and SanFranciscoGiants. Both of those pages belong to the category "baseball." To categorize these pages, you would create the Wiki page CategoryBaseball, and put a link to this category on both the LosAngelesDodgers and SanFranciscoGiants pages. When you viewed the backlinks to CategoryBaseball, these two pages would appear.    (022)
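
Here is a minimal Python sketch of that lookup. It assumes that links are written as CamelCase WikiWords, which is common but not universal among Wikis, and the page text is invented for the example.

```python
import re
from collections import defaultdict

# Assumption: a Wiki link is any run of two or more capitalized word parts.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")


def build_wiki_backlinks(pages):
    """Map each linked page name to the set of pages that mention it."""
    backlinks = defaultdict(set)
    for name, text in pages.items():
        for target in WIKI_WORD.findall(text):
            if target != name:  # ignore a page linking to itself
                backlinks[target].add(name)
    return backlinks


# Hypothetical Wiki content.
pages = {
    "LosAngelesDodgers": "The Dodgers play in Los Angeles. CategoryBaseball",
    "SanFranciscoGiants": "The Giants play in San Francisco. CategoryBaseball",
    "CategoryBaseball": "Pages about baseball.",
}

backlinks = build_wiki_backlinks(pages)

# Viewing the category is just viewing the backlinks of the category page.
members = sorted(backlinks["CategoryBaseball"])
print(members)
```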

TPVortex    (023)

Clearly, backlinks are a useful knowledge management construct. Unfortunately, those who want to exploit backlinks must generally build their own backlink databases. The backlink databases that do exist do not interoperate with each other.    (024)

I'm currently designing an open source, distributed backlink database, called TPVortex. The preliminary version is designed to store backlinks generated by a single Web site. Future versions will share this backlink data with other backlink databases distributed throughout the Internet, using a peer-to-peer protocol.    (025)

My intention is to take applications that use backlinks -- from Wikis to e-mail archivers -- and modify them to use TPVortex as their backlink database. TPVortex could also be used to add backlink functionality to tools that do not already have this capability, and would enable entirely new kinds of tools, like the aforementioned annotation engine.    (026)

Having a single API for retrieving any kind of backlink data would offer additional benefits to information repositories. It would provide a standard mechanism for retrieving threads from any kind of forum application, be it an archived mailing list or a Web-based bulletin board system. It would also allow you to apply useful visualization tools, such as TouchGraph and Conversation Map, to any kind of backlink data, be it Wiki backlinks or mailing list threads.    (027)
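
To make the idea of a single API concrete, here is a purely hypothetical Python sketch. BacklinkStore, add_link, backlinks, and tree are invented names and are not TPVortex's actual interface. The point is only that one small interface can serve both mailing list threads and Wiki categories.

```python
from collections import defaultdict


class BacklinkStore:
    """Hypothetical in-memory store; not TPVortex's actual interface."""

    def __init__(self):
        self._backlinks = defaultdict(list)

    def add_link(self, source, target):
        """Record that source links to target."""
        if source not in self._backlinks[target]:
            self._backlinks[target].append(source)

    def backlinks(self, target):
        """Return the sources that link directly to target."""
        return list(self._backlinks.get(target, []))

    def tree(self, root, _seen=None):
        """Return (root, [subtrees]) by following backlinks recursively."""
        seen = _seen if _seen is not None else set()
        seen.add(root)  # guards against cycles in the link data
        children = [
            self.tree(source, seen)
            for source in self.backlinks(root)
            if source not in seen
        ]
        return (root, children)


store = BacklinkStore()

# Mailing list replies and Wiki category links go through the same calls.
store.add_link("mail/0002", "mail/0001")
store.add_link("mail/0003", "mail/0002")
store.add_link("LosAngelesDodgers", "CategoryBaseball")

thread = store.tree("mail/0001")
category_members = store.backlinks("CategoryBaseball")
```

Because the store treats every identifier as an opaque string, a visualization tool written against tree() would not need to know whether it is drawing a mail thread or a set of Wiki pages.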