If you're a writer, you hope your words will be etched in stone for eternity. If you're a blogger, you're happy if someone stumbles on your writings a few days after you posted them. Blogs, partly because they often consist mainly of commentary on things that have just happened, and partly because of the way they are structured (most recent postings first, making it easy to ignore everything you wrote before), are a transient medium. Rarely is a blog post treated as permanent. We write, then we forget.
Take me, for instance. I've been writing a blog since 2002. Every day, more or less, I get up early and read something online or in a newspaper until my blood boils. Then I sit down and write until it's out of my system. Then I usually go back to bed. This is something of a ritual: At the time of writing, I've composed 2,006 posts.
Not surprising, then, that I can barely remember what I wrote yesterday, let alone a year ago. Multiply this by 1.5 million (the number of blog posts published worldwide each day, according to Technorati, a blog tagging and search service) and you get some idea of how much is being written and promptly forgotten, even by its authors. While not every blog is a stream of consciousness, the journal-like format means blogs can look that way over time.
Of course, a blog post isn't lost. You can always find it with a search engine such as Google. And tagging -- where users label their photos, blog posts or favorite music with single- or multiple-word tags -- has made it easier for everyone to find and group stuff together. These tags can be specific to the blog or they can be lumped together with tags from other blogs, using special search engines such as Technorati.
But this works only up to a point. My tags may be different from your tags. And I may get lazy about the tags I add to what I write (as I have in recent months). Frankly, my tags are a mess and not something I like to think about too much. Result: Readers are unlikely to find them useful and therefore don't flit from new articles to old ones as much as I'd like. And the chances of someone stumbling upon my blog because of its tags remain remote. Blog posts, left to themselves, tend to have a short shelf life.
Briton Nigel Cannings thinks he has the solution to this: automatic tagging. He sees value in all those old blog posts of mine (he may be the only one) and reckons all that old content out there is a repository of wisdom that just needs to be sorted out better. Tagging it ourselves, he thinks, just isn't enough because we don't always see what we've written in a broader context. 'Manual tagging is the first step' to sorting and storing blogs and other online content better, he says, 'but it still relies upon people understanding themselves -- whatever they've already written about, and how their content fits in with other people's content.'
His idea is to mine the words on blogs and other unstructured online content and extract from them headings, categories and names that could then form an alternative set of tags, an index of sorts, to help readers find related articles both on the same blog and elsewhere. Software delves into postings to grab the important words and the topics they refer to, creating a sort of table of contents for blogs that makes them easier to browse. 'What we want to do,' Mr. Cannings says, 'is use people's ordinary words and the way they write to create tags and therefore open up information to a wider readership over time.'
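For the curious, here is a rough idea of how such term mining might work. This is a minimal Python sketch of a much cruder technique, simple word-frequency counting; it is not how Jiglu works, which the company hasn't detailed. The stopword list, the `candidate_terms` and `auto_tags` functions and their thresholds are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; real systems use far larger ones.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "it",
             "that", "this", "for", "with", "as", "i", "my", "we", "be", "are", "was"}

def candidate_terms(text):
    """Return lowercase words plus adjacent-word pairs, skipping stopwords."""
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    singles = [w for w in words if w not in STOPWORDS and len(w) > 2]
    # Two-word phrases such as "tipping point", kept only if neither word is a
    # stopword. (This naive version also pairs words across sentence breaks.)
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])
             if a not in STOPWORDS and b not in STOPWORDS]
    return singles + pairs

def auto_tags(posts, top_n=3, min_posts=2):
    """Take each post's most frequent terms as its tags, then keep only tags
    shared by at least `min_posts` posts, mapping each tag to those posts."""
    index = defaultdict(list)
    for post_id, text in posts.items():
        counts = Counter(candidate_terms(text))
        for term, _ in counts.most_common(top_n):
            index[term].append(post_id)
    return {tag: ids for tag, ids in index.items() if len(ids) >= min_posts}
```

Run over a handful of posts, `auto_tags` returns an index such as `{"tagging": ["a", "b"]}`, linking every post that leans on the same term. Counting words is easy; deciding which words are topics and which are people, as the rest of this column shows, is the hard part.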
So he's come up with something called Jiglu (www.jiglu.com), with 'tags that think' as its, ahem, tagline. Add a few lines of code to your Web site, wiki (a collaborative Web site, such as Wikipedia) or blog, and Jiglu will sift through your content and dig out what it thinks are the key topics, people and links on a particular page. These headings appear in a box on the page. Click on any topic and a list of the articles (blog posts or whatever) that address that topic pops up in a separate window. You can also view an overall treemap (a sort of mosaic of labeled squares, the size of each square determined by how often the topic it represents appears in the Web site's content) of all the main topics on that site. In the text itself, dotted blue lines appear under words that are included in the index: Click on one and a window pops up with links to, and the first paragraphs of, the posts that match that category or word.
All of this works smoothly. Installing the code on a blog hosted by one of the main blog services, such as TypePad or Blogger, is easy (other services will follow, Jiglu says). The service takes a few hours to get round to trawling your site, but once it has, the topics appear in a little box that you can place anywhere on the page. The pop-up window works well too.
But the key issue, of course, is how good a job Jiglu makes of automating the process of assigning tags to your content. The answer: It's not what I'd expected. I guess I'd been looking for Jiglu to do the work I'm too lazy to do: to assign, for example, the tags 'tagging,' 'Web 2.0,' 'taxonomy,' 'Jiglu,' 'start-up,' 'categorization,' 'term extraction,' 'search' and so on to an article like this, so I don't have to. But it doesn't. Instead, in tagging my blog, it came up with stuff like company names, product names, and topics like 'bad idea' (it's not clear whether this is what I've been writing about or having), 'tipping point' (a term I use way too much) and, for some reason, 'universe.' In the 'People' category, it did a good job of extracting names, but it missed Aung San Suu Kyi, tagging her as a topic rather than a person (perhaps because she insists on having four words in her name), and it mistook Don Muang, the old Bangkok airport, for a person.
This may sound like a failure, but I don't think it is. The simplicity of setting it up means that it can complement existing tags. Tags are subjective, best served hot by the author; a sort of table of contents a la Jiglu is probably best done by someone or something else, and served cold. True, the Jiglu tagging engine needs to get smarter before I would get really excited about it, but I think there's potential in anything that helps link what we write today to what we (and others) might have written yesterday.
Blogging, and the Web, may be time-critical, but that doesn't mean we should forget the wisdom of what we wrote in the past. Blogs may still be regarded as ephemeral, but I've learned enough to know that some insights are timeless (though not necessarily mine).