URI::Sequin - Extract information from the URLs of Search-Engines
use URI::Sequin qw/se_extract key_extract log_extract %log_types/;
$url = &log_extract($line_from_log_file, 'NCSA');
$log_types{'MyLogType'} = '^(.+?) -> .+$';
$url = &log_extract($line_from_log_file, 'MyLogType');
$keyword_string = &key_extract($url);
($search_engine_name, $search_engine_url) = @{&se_extract($url)};
This module provides three tools to aid people trying to analyse Search-Engine URLs. It’s meant mainly for those who want to analyse referrer logs and pick out key information about site visitors, such as which Search-Engine and keywords they used to find the site.
The functions and globals provided (and exported by default) from this module are:
log_extract picks out the referring URL from a line of a logfile. The ‘type’ can be one of the built-in types or a user-created one; for more information, see %log_types below. This subroutine accepts two scalars (the log line and the name of the type) and returns a scalar.
key_extract tries to determine the keywords used in $url. It accepts a scalar and returns a scalar. Should nothing be found, it returns an undefined value.
se_extract tries to determine the name of the Search-Engine used and its URL. It accepts a scalar and returns a reference to an array containing firstly the Search-Engine’s name and secondly the Search-Engine’s URL. Should the URL appear not to be from a Search Query, it returns a reference to an empty array.
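The return conventions described above (an undefined value when no keywords are found, a reference to an empty array when the URL is not a search query) can be handled along these lines. This is only a minimal sketch: the helper describe_referral and the sample values are hypothetical and not part of the module, and the commented-out calls assume the import line shown in the Synopsis.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI::Sequin): turns the two documented
# return values into a one-line summary.
#   $keywords   - scalar as returned by key_extract, or undef if nothing found
#   $engine_ref - array reference as returned by se_extract: [name, url] or []
sub describe_referral {
    my ($keywords, $engine_ref) = @_;

    # An empty array reference means the URL did not look like a search query.
    return 'not a search-engine referral' unless @{$engine_ref};

    my ($name, $engine_url) = @{$engine_ref};
    $keywords = '(no keywords found)' unless defined $keywords;
    return "$name <$engine_url>: $keywords";
}

# With the module installed, the inputs would come from calls such as:
#   use URI::Sequin qw/se_extract key_extract/;
#   my $keywords = key_extract($url);
#   my $engine   = se_extract($url);
#   print describe_referral($keywords, $engine), "\n";

# Made-up values standing in for real results:
print describe_referral(undef, []), "\n";
print describe_referral('perl modules',
    ['ExampleSearch', 'http://search.example.com']), "\n";
```

Assigning each result to its own scalar before passing it on keeps the arguments aligned even if a function returns nothing at all in list context.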
There are five built-in logfile types already in the %log_types hash; ‘NCSA’, used in the Synopsis, is one of them.
It’s easy to add another one. Simply add a key to the hash, with a value that is a regex. Parenthesise the part that is the referring URL, as the script uses $1 to obtain the URL (see the example in the Synopsis section).
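To illustrate how such a pattern behaves, the sketch below applies the regex from the Synopsis to a made-up log line in plain Perl, without calling the module. The hash %my_types, the helper extract_referrer, and the sample line are assumptions for illustration only; the module's own internals may differ.

```perl
use strict;
use warnings;

# Stand-in for %log_types (hypothetical; the real hash lives in URI::Sequin).
# The parenthesised group marks the referring URL, which ends up in $1.
my %my_types = (
    MyLogType => '^(.+?) -> .+$',
);

# Hypothetical helper mimicking the documented behaviour: match the line
# against the pattern registered for $type and return $1, or undef if the
# type is unknown or the line does not match.
sub extract_referrer {
    my ($line, $type) = @_;
    my $pattern = $my_types{$type};
    return undef unless defined $pattern;
    return $line =~ /$pattern/ ? $1 : undef;
}

# Made-up log line in "referrer -> requested page" form.
my $line = 'http://search.example.com/?q=perl+modules -> /index.html';
print extract_referrer($line, 'MyLogType'), "\n";
```

Because the first group is non-greedy, it stops at the first " -> " separator, so only the referring URL is captured.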
Peter Sergeant <pete_sergeant@hotmail.com>
Copyright 2000 Peter Sergeant.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.