Title | Automatic extraction of structure, content and usage data statistics of web sites |
Publication Type | Conference Paper |
Year of Publication | 2010 |
Authors | Paparrizos, Ioannis K., Vassiliki A. Koutsonikola, Lefteris Angelis, and Athena Vakali |
Editor | Chignell, Mark H., and Elaine G. Toms |
Book Title | HT |
Publisher | ACM |
ISBN Number | 978-1-4503-0041-4 |
Keywords | classification, Crawling, Structure Content and Usage data, Web Mining Algorithm |
Abstract | In this paper we present a web mining tool which automatically extracts the structure, content and usage data statistics of web sites. This work was inspired by the fact that web mining consists of three axes: web structure mining, web content mining and web usage mining, which use the structure, content and usage data, respectively. The aim is to use the developed multi-threaded web crawler as a tool to automatically extract from web pages the data associated with each of these three axes, in order to subsequently compute several useful descriptive statistics and apply advanced mathematical and statistical methods. A description of our system is provided, as well as some experimental results. |
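The abstract describes a multi-threaded crawler that collects structure and content data from web pages and then computes descriptive statistics. The record does not describe how the authors' tool is implemented. The following is therefore only a minimal, hypothetical Python sketch of the general idea under stated assumptions:

- The names `PageParser`, `crawl`, `describe` and `fetch` are all illustrative.
- `fetch` is a caller-supplied function mapping a URL to its HTML. In practice it might wrap `urllib.request.urlopen`. Passing it in keeps the sketch runnable offline.
- Usage data is left out of this sketch.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from statistics import mean
from typing import Callable, Dict, List, Set, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collects hrefs (structure data) and visible text (content data) of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text: List[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def crawl(start: str, fetch: Callable[[str], str], workers: int = 4,
          max_pages: int = 100) -> Tuple[Dict[str, Set[str]], Dict[str, int]]:
    """Hypothetical breadth-first crawl limited to the start URL's host.

    Returns (out_links, word_counts): the site's link graph and the number
    of visible words on each crawled page.
    """
    host = urlparse(start).netloc
    seen: Set[str] = {start}
    frontier = [start]
    out_links: Dict[str, Set[str]] = {}
    words: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(out_links) < max_pages:
            batch = frontier[: max_pages - len(out_links)]
            frontier = []
            # Pages of the current BFS level are downloaded concurrently;
            # parsing then happens in the calling thread.
            for url, html in zip(batch, pool.map(fetch, batch)):
                parser = PageParser()
                parser.feed(html)
                parser.close()  # flush any text buffered after the last tag
                targets = set()
                for href in parser.links:
                    target = urldefrag(urljoin(url, href)).url
                    if urlparse(target).netloc == host:  # keep internal links only
                        targets.add(target)
                out_links[url] = targets
                words[url] = len(" ".join(parser.text).split())
                for t in sorted(targets):
                    if t not in seen:
                        seen.add(t)
                        frontier.append(t)
    return out_links, words


def describe(out_links: Dict[str, Set[str]], words: Dict[str, int]) -> Dict[str, float]:
    """A few descriptive statistics over the crawled structure and content data."""
    in_degree = Counter(t for ts in out_links.values() for t in ts if t in out_links)
    return {
        "pages": len(out_links),
        "mean_out_degree": mean(len(ts) for ts in out_links.values()),
        "max_in_degree": max((in_degree[u] for u in out_links), default=0),
        "mean_words": mean(words.values()),
    }
```

In the sketch, each BFS level is fetched as one parallel batch. This is a simple way to get multi-threaded downloading without sharing the link graph between threads. A production crawler would also need several things the sketch does not attempt:

- politeness delays
- `robots.txt` handling
- error handling around `fetch`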