Web crawler profiling and containment through navigation pattern mining

By Lourenço, A.; Belo, O.

Proceedings of the IADIS International Conference WWW/Internet 2009, ICWI 2009



Web profiles may support the analysis of Web site popularity as well as the detection of unwanted and illegitimate activities such as fraud. Yet profiling techniques often fail to account for different kinds of usage, processing regular sessions, crawler sessions and proxy sessions in the same way. This paper proposes an integrated approach to Web crawler profiling and containment. A data Webhousing system embracing standard crawler detection techniques supplies the profiles, which are then analysed through navigation pattern mining. The ability to adapt crawler identification to particular Web scenarios, the incremental analysis of navigation patterns, and the capacity to monitor server performance and prevent crawler-related hazards are considered the main strengths of this approach. Experiments over six months of Web server logs from a non-commercial Web site demonstrate the benefits of focused Web profiling and, in particular, of this approach.


