BlogBuster

BlogBuster is a tool for extracting corpora from the blogosphere. It is built on the Ellogon natural language engineering platform and embeds Mozilla's Gecko rendering engine.

BlogBuster is a cross-platform tool, which means that it is not bound to a specific operating system. It is robust in that it can handle both invalid HTML and dynamic content. In addition, BlogBuster lets you download a blog web page and store it locally, applying the proper character encoding conversions. With BlogBuster you can also retrieve and store content that is generated dynamically (i.e. as a result of JavaScript execution).
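To illustrate what a character encoding conversion on a stored page can look like, here is a minimal Python sketch. It is not BlogBuster's own code, which this page does not show. The function names `decode_page` and `store_page`, the fallback order (declared charset, then UTF-8, then Latin-1) and the choice of UTF-8 as the storage encoding are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def decode_page(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Decode raw page bytes, trying the declared charset first, then UTF-8.

    Hypothetical helper for illustration; not part of BlogBuster.
    """
    for charset in (declared_charset, "utf-8"):
        if not charset:
            continue
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes that are invalid in that charset.
            continue
    # Last resort: Latin-1 maps every byte to a character, so it never fails.
    return raw.decode("latin-1")


def store_page(raw: bytes, path: str, declared_charset: Optional[str] = None) -> None:
    """Store the page locally, re-encoded as UTF-8 (an assumed target encoding)."""
    Path(path).write_text(decode_page(raw, declared_charset), encoding="utf-8")
```

For example, `store_page(raw_bytes, "page.html", "iso-8859-7")` would save a page served in ISO-8859-7 as a UTF-8 file, where `raw_bytes` stands for the bytes already downloaded from the blog.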

Results have shown that BlogBuster is very accurate when extracting corpora from blogs hosted on Blogger and WordPress, and that it achieves reasonable precision on blogs hosted outside these two popular blogging platforms.