Automatic extraction of structure, content and usage data statistics of web sites

Title	Automatic extraction of structure, content and usage data statistics of web sites
Publication Type	Conference Paper
Year of Publication	2010
Authors	Paparrizos, Ioannis K., Vassiliki A. Koutsonikola, Lefteris Angelis, and Athena Vakali
Editor	Chignell, Mark H., and Elaine G. Toms
Book Title	HT
Publisher	ACM
ISBN Number	978-1-4503-0041-4
Keywords	classification, Crawling, Structure Content and Usage data, Web Mining Algorithm
Abstract	In this paper we present a web mining tool which automatically extracts the structure, content and usage data statistics of web sites. This work is inspired by the fact that web mining consists of three axes: web structure mining, web content mining and web usage mining. Each of these axes uses the structure, content and usage data, respectively. The aim is to use the developed multi-thread web crawler as a tool to automatically extract from web pages the data associated with each of those three axes, in order to subsequently compute several useful descriptive statistics and apply advanced mathematical and statistical methods. A description of our system is provided, as well as some experimental results.
