Detection of unsolicited web browsing with clustering and statistical analysis

PhD thesis

Chwalinski, P. 2014. Detection of unsolicited web browsing with clustering and statistical analysis. PhD thesis. Middlesex University, School of Science and Technology.
Type: PhD thesis
Title: Detection of unsolicited web browsing with clustering and statistical analysis
Authors: Chwalinski, P.

Unsolicited web browsing denotes the illegitimate accessing or processing of web content. The harmful activity ranges from extracting e-mail information to downloading an entire website for duplication. In addition, computer criminals prevent legitimate users from gaining access to websites by mounting denial-of-service attacks with high-volume, legitimate-looking traffic. These offences are carried out by preprogrammed machines that evade rate-dependent intrusion detection systems. Therefore, this thesis assumes that the only difference between a legitimate and a malicious web session lies in the intention rather than in physical characteristics or network-layer information. As a result, the main aim of this research has been to provide a method for detecting malicious intention. This has been accomplished by a two-fold process. First, to discover the most recent and popular transitions of lawful users, a clustering method based on entropy minimisation has been introduced. In principle, by following popular transitions among the web objects, legitimate users are placed in low-entropy clusters, whereas undesired hosts, whose transitions are uncommon, are placed in high-entropy clusters. In addition, by comparing the distributions of request sequences generated by legitimate and malicious users across the clusters, it is possible to discover whether or not a website is under attack. Second, a set of statistical measurements has been tested to detect the actual intention of browsing hosts. Intention classification based on Bayes factors and likelihood analysis has provided the best results. The combined approach has been validated against actual web traces (i.e. datasets) and has produced promising results.
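The abstract does not give the thesis's actual algorithms, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions. It assumes a session is a list of page identifiers, measures how "uncommon" a group of sessions is by the Shannon entropy of its page-to-page transitions, and scores a single session by the log-likelihood ratio between a smoothed first-order transition model learned from legitimate sessions and a uniform model. For two fully specified models, the Bayes factor equals this likelihood ratio. All function names, the Laplace smoothing and the uniform alternative are illustrative choices, not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def transitions(session):
    """Return consecutive (source, destination) page pairs of one session."""
    return list(zip(session, session[1:]))


def transition_entropy(sessions):
    """Shannon entropy (bits) of the transition distribution in a group of sessions."""
    counts = Counter(t for s in sessions for t in transitions(s))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def transition_counts(sessions):
    """Count, for each source page, how often each destination page follows it."""
    counts = defaultdict(Counter)
    for s in sessions:
        for src, dst in transitions(s):
            counts[src][dst] += 1
    return counts


def transition_prob(counts, src, dst, n_pages, alpha=1.0):
    """Laplace-smoothed probability of src -> dst (smoothing is an assumption)."""
    row = counts.get(src, Counter())
    return (row[dst] + alpha) / (sum(row.values()) + alpha * n_pages)


def log_likelihood_ratio(session, legit_counts, n_pages):
    """log P(session | legitimate model) - log P(session | uniform model)."""
    pairs = transitions(session)
    legit = sum(math.log(transition_prob(legit_counts, s, d, n_pages)) for s, d in pairs)
    uniform = len(pairs) * math.log(1.0 / n_pages)
    return legit - uniform


# Hypothetical toy data: five pages, popular paths versus uncommon ones.
N_PAGES = 5
legit_sessions = [["home", "products", "cart"]] * 3 + [["home", "about"]]
odd_sessions = [["home", "cart", "products", "home"],
                ["about", "cart", "home", "about"]]

model = transition_counts(legit_sessions)
print(transition_entropy(legit_sessions), transition_entropy(odd_sessions))
print(log_likelihood_ratio(["home", "products", "cart"], model, N_PAGES))
print(log_likelihood_ratio(["home", "cart", "products", "home"], model, N_PAGES))
```

In this toy setting, sessions that follow popular transitions yield a lower group entropy and a positive log ratio, while sessions built from uncommon transitions yield a higher entropy and a negative log ratio. A real deployment would need a principled alternative model and thresholds, which the abstract does not specify.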

Department name: School of Science and Technology
Institution name: Middlesex University
Publication dates
Print: 20 Feb 2015
Publication process dates
Deposited: 20 Feb 2015
Output status: Published
Accepted author manuscript