Sunday, 20 September 2009

Top Web Trend 1 of 5: Structured Data

Structured data has always played a role in libraries. Think MARC and MARCXML. Therefore, any increase in the importance of, growth of, and reliance on, structured data will have an impact on libraries. If the process of adding structure to data is increasingly automated, or becomes a seamless part of building online content, this will also have a huge impact on libraries.

Using structured data, libraries will be able to build content that is richer as well as more accurate. It also means that information can be harvested and reused in more meaningful ways. For example, XBRL is structured data for financial reporting. Using XBRL, companies can code up their financial reports, and the various regulatory authorities can automatically harvest these reports and process the information they contain without the need for humans to "read" and decipher them. A lot of companies and governments are hoping this will significantly reduce reporting and compliance costs. For libraries it means we could harvest mashed-up information more deeply and accurately.
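To make "harvesting without a human reading it" a little more concrete, here is a minimal sketch in Python using MARCXML, the library format mentioned above. The sample record is made up and heavily trimmed for illustration, and the helper name get_subfield is my own. The namespace and the field tags (100 for the main author, 245 for the title) are standard MARC 21.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace published by the Library of Congress.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A made-up, heavily trimmed record used only for illustration.
SAMPLE_RECORD = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Austen, Jane</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Pride and prejudice</subfield>
  </datafield>
</record>
"""


def get_subfield(record, tag, code):
    """Return the text of the first matching subfield, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text if node is not None else None


record = ET.fromstring(SAMPLE_RECORD)
author = get_subfield(record, "100", "a")  # 100$a: main entry, personal name
title = get_subfield(record, "245", "a")   # 245$a: title proper
print(author, "-", title)
```

Because the author and title sit in labelled fields, a program can pull them out reliably. That is the same principle regulators rely on when they harvest XBRL reports.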

Structured data by its very nature is created by establishing links (structures) between bits of data. These links are based on meaningful associations, and as a result they help turn data into information. One example cited in the ReadWriteWeb top 5 web trends is Calais. The library I work for is already using Calais to categorise content relating to specific people, places, companies, facts, and events.

But what does this mean, how does it work, and why is it important? Well, there is a good description on the Drupal OpenCalais project site, which says: "Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well. The web service is free for commercial and non-commercial use."
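To give a feel for what "finding the entities" produces, here is a toy sketch in Python. It is not how Calais works: Calais uses natural language processing and machine learning, whereas this is a plain dictionary lookup against a hand-written list. The GAZETTEER table, the tag_entities function, and the sample sentence are all hypothetical. The only point is the shape of the result: free text goes in, and typed entities come out.

```python
import re

# Hypothetical hand-written lookup table, purely for illustration.
# A real service such as Calais does not rely on a fixed list like this.
GAZETTEER = {
    "Thomson Reuters": "Company",
    "Flickr": "Company",
    "Wellington": "City",
}


def tag_entities(text):
    """Return a sorted list of (name, type) pairs for known names found in text."""
    found = []
    for name, kind in GAZETTEER.items():
        # Match the name as a whole word or phrase, not inside another word.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind))
    return sorted(found)


sample = "Calais was developed by Thomson Reuters and can suggest images from Flickr."
entities = tag_entities(sample)
for name, kind in entities:
    print(f"{kind}: {name}")
```

Once text has been reduced to typed entities like these, it can be linked to other records about the same company or place, which is where the structure comes from.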

Another good example that helps explain Calais (and structured data) is the Calais WordPress blog plugin, which is called Tagaroo. With Tagaroo, as you write your post, it automatically analyzes the text and suggests both tags and images from Flickr to enhance your blog. Other applications would include linking relevant geospatial information to information on an entity or event.

There is also a promotional video on the Calais web site. Yes, it is a promotional video, but it does provide an easy-to-understand overview. And by the way, Calais has been developed by Thomson Reuters, so there is some serious money being thrown at building structured data on the web.
