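I searched Baidu and Google at length for a way to get detailed spider (crawler) statistics, but never found a good solution. Since my own host runs PHP, AWStats was the natural choice for log analysis.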

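AWStats can be extended through its own configuration to do a great deal, for example per-spider statistics and the paths each spider crawls. Overseas hosts generally support these features well, but for webmasters in China an overseas virtual host carries some risk: if your host's IP address falls inside a blocked range, nobody will be able to reach your site, however good it is.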
Back to the main topic.
First, download and install AWStats.
[root@sample ~]# wget http://prdownloads.sourceforge.net/awstats/awstats-6.8-1.noarch.rpm
--15:34:59-- http://nchc.dl.sourceforge.net/s ... ts-6.5-1.noarch.rpm
=> `awstats-6.5-1.noarch.rpm'
Resolving nchc.dl.sourceforge.net... 211.79.61.10
Connecting to nchc.dl.sourceforge.net|211.79.61.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,095,131 (1.0M) [text/plain]
100%[====================================>] 1,095,131 155.28K/s ETA 00:00
15:35:06 (158.94 KB/s) - `awstats-6.5-1.noarch.rpm' saved [1095131/1095131]
[root@sample ~]# rpm -ivh awstats-6.8-1.noarch.rpm ← install AWStats
Preparing... ########################################### [100%]
1:awstats ########################################### [100%]
----- AWStats 6.5 - Laurent Destailleur -----
AWStats files have been installed in /usr/local/awstats
If first install, follow instructions in documentation
(/usr/local/awstats/docs/index.html) to setup AWStats in 3 steps:
Step 1 : Install and Setup with awstats_configure.pl (or manually)
Step 2 : Build/Update Statistics with awstats.pl
Step 3 : Read Statistics
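Note: at this point the permissions on /etc/httpd/conf/httpd.conf must be 755 or 777, because the configuration script in the next step writes to that file. I made a mistake here myself. (What the script actually needs is write access; 777 is more permissive than necessary, so consider tightening it again afterwards.)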
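Before letting the configuration script modify httpd.conf, it can help to keep a copy and confirm that the current user can write to the file. The following is a minimal sketch and not part of the original procedure; the function name `backup_and_check` and the timestamped `.bak` suffix are illustrative assumptions.

```bash
#!/bin/bash
# Hypothetical helper: copy a config file to a timestamped backup, then
# report whether the current user is able to write to the original.
backup_and_check() {
    local conf="$1"
    # Refuse to continue if the file does not exist.
    [ -f "$conf" ] || { echo "missing: $conf" >&2; return 2; }
    # -p preserves mode, ownership and timestamps on the backup copy.
    cp -p "$conf" "$conf.bak.$(date +%Y%m%d%H%M%S)" || return 3
    if [ -w "$conf" ]; then
        echo "writable"
    else
        echo "not writable"
        return 1
    fi
}
```

For example, `backup_and_check /etc/httpd/conf/httpd.conf` prints `writable` when the configuration script will be able to update the file.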
Next, run the initial AWStats configuration.
[root@sample ~]# /usr/local/awstats/tools/awstats_configure.pl ← run the initial configuration script
----- AWStats awstats_configure 1.0 (build 1.6) (c) Laurent Destailleur -----
This tool will help you to configure AWStats to analyze statistics for
one web server. You can try to use it to let it do all that is possible
in AWStats setup, however following the step by step manual setup
documentation (docs/index.html) is often a better idea. Above all if:
- You are not an administrator user,
- You want to analyze downloaded log files without web server,
- You want to analyze mail or ftp log files instead of web log files,
- You need to analyze load balanced servers log files,
- You want to 'understand' all possible ways to use AWStats...
Read the AWStats documentation (docs/index.html).
-----> Running OS detected: Linux, BSD or Unix
Warning: AWStats standard directory on Linux OS is '/usr/local/awstats'.
If you want to use standard directory, you should first move all content
of AWStats distribution from current directory:
/root
to standard directory:
/usr/local/awstats
And then, run configure.pl from this location.
Do you want to continue setup from this NON standard directory [yN] ?y ← answer y to continue the setup
-----> Check for web server install
Enter full config file path of your Web server.
Example: /etc/httpd/httpd.conf
Example: /usr/local/apache2/conf/httpd.conf
Example: c:\Program files\apache group\apache\conf\httpd.conf
Config file path ('none' to skip web server setup):
>/etc/httpd/conf/httpd.conf ← path to the Apache configuration file
-----> Check and complete web server config file '/etc/httpd/conf/httpd.conf'
Add 'Alias /awstatsclasses "/root/wwwroot/classes/"'
Add 'Alias /awstatscss "/root/wwwroot/css/"'
Add 'Alias /awstatsicons "/root/wwwroot/icon/"'
Add 'ScriptAlias /awstats/ "/root/wwwroot/cgi-bin/"'
Add '<Directory>' directive
AWStats directives added to Apache config file.
-----> Update model config file '/etc/awstats/awstats.model.conf'
File awstats.model.conf updated.
-----> Need to create a new config file ?
Do you want me to build a new AWStats config/profile
file (required if first install) [y/N] ?y ← answer y to create a new site config file
-----> Define config file name to create
What is the name of your web site or profile analysis ?
Example: www.mysite.com
Example: demo
Your web site, virtual server or profile name:
>www.zuobama.com ← profile name for the site being analyzed (using the site's domain name is recommended)
-----> Define config file path
In which directory do you plan to store your config file(s) ?
Default: /etc/awstats
Directory path to store config file(s) (Enter for default):
> ← just press Enter to keep the AWStats config files in /etc/awstats
-----> Create config file '/etc/awstats/awstats.www.zuobama.com.conf'
Config file /etc/awstats/awstats.www.zuobama.com.conf created.
-----> Restart Web server with '/sbin/service httpd restart' ← the HTTP service is restarted
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
-----> Add update process inside a scheduler
Sorry, configure.pl does not support automatic add to cron yet.
You can do it manually by adding the following command to your cron:
/root/wwwroot/cgi-bin/awstats.pl -update -config=www.zuobama.com
Or if you have several config files and prefer having only one command:
/root/tools/awstats_updateall.pl now
Press ENTER to continue... ← press Enter to continue the configuration
A SIMPLE config file has been created: /etc/awstats/awstats.www.zuobama.com.conf
You should have a look inside to check and change manually main parameters.
You can then manually update your statistics for 'www.zuobama.com' with command:
> perl awstats.pl -update -config=www.zuobama.com
You can also read your statistics for 'www.zuobama.com' with URL:
>http://localhost/awstats/awstats.pl?config=www.zuobama.com
Press ENTER to finish... ← press Enter to finish the initial configuration
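Then edit the site config file that was just created.
[root@sample ~]# vi /etc/awstats/awstats.www.zuobama.com.conf ← edit the site config file (the file name matches the domain entered during the initial configuration)
LogFile="/var/log/httpd/mylog.log" ← find this line and change the log file location (point it at the Apache access log)
↓
LogFile="/var/log/httpd/access_log" ← after the change
DirData="/var/lib/awstats" ← find this line and change where AWStats stores its data
↓
DirData="." ← after the change; the data is kept in the same directory as the awstats.pl script
Lang="auto" ← find this line and change auto to cn
↓
Lang="cn" ← after the change; the default language is Chinese
SkipHosts="" ← find this line and add the hosts to ignore between the quotes
↓
SkipHosts="127.0.0.1 REGEX[^192\.168\.]" ← after the change; visits from the local machine and the internal network are not counted
LevelForWormsDetection=0 ← find this line and change 0 to 2
↓
LevelForWormsDetection=2 ← after the change; hits from worms are kept out of the statistics as well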
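The five edits above can also be applied non-interactively. The following is a rough sketch using `sed`, assuming each directive appears exactly once as an active (uncommented) line at the start of a line; the function name `apply_awstats_settings` is hypothetical, and the values are the ones chosen in this article.

```bash
#!/bin/bash
# Hypothetical helper: rewrite the five AWStats directives discussed above.
# sed -i.bak edits the file in place and leaves the original as <file>.bak.
apply_awstats_settings() {
    local conf="$1"
    # Patterns are anchored with ^ so commented-out examples ("# LogFile=...")
    # are left alone. "\\." in the replacement yields a literal "\.".
    sed -i.bak \
        -e 's|^LogFile=.*|LogFile="/var/log/httpd/access_log"|' \
        -e 's|^DirData=.*|DirData="."|' \
        -e 's|^Lang=.*|Lang="cn"|' \
        -e 's|^SkipHosts=.*|SkipHosts="127.0.0.1 REGEX[^192\\.168\\.]"|' \
        -e 's|^LevelForWormsDetection=.*|LevelForWormsDetection=2|' \
        "$conf"
}
```

It would be called as `apply_awstats_settings /etc/awstats/awstats.www.zuobama.com.conf`; comparing the result with the `.bak` copy (for instance with `diff`) shows exactly which lines changed.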
Next, make the necessary changes to the AWStats section of httpd.conf.
[root@sample ~]# vi /etc/httpd/conf/httpd.conf ← edit the Apache configuration file
#
# Directives to allow use of AWStats as a CGI
#
----------------------------------------------------
Alias /awstatsclasses "/root/wwwroot/classes/"
Alias /awstatscss "/root/wwwroot/css/"
Alias /awstatsicons "/root/wwwroot/icon/"
ScriptAlias /awstats/ "/root/wwwroot/cgi-bin/"
----------------------------------------------------
Find the block of directives between the rules above and correct the paths so that it matches the block between the rules below:
----------------------------------------------------
Alias /awstatsclasses "/usr/local/awstats/wwwroot/classes/"
Alias /awstatscss "/usr/local/awstats/wwwroot/css/"
Alias /awstatsicons "/usr/local/awstats/wwwroot/icon/"
ScriptAlias /awstats/ "/usr/local/awstats/wwwroot/cgi-bin/"
----------------------------------------------------
#
# This is to permit URL access to scripts/files in AWStats directory.
#
<Directory "/root/wwwroot"> ← find this line and correct the path
↓
<Directory "/usr/local/awstats/wwwroot"> ← after the change
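Because every corrected path differs only in its prefix, the same change can be made in one pass. This is a small sketch under the assumption that the wrong prefix is always the quoted string `"/root/wwwroot`, as in the listing above; the function name `fix_awstats_paths` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: replace the /root/wwwroot prefix written by
# awstats_configure.pl with the real install location.
# The leading double quote in the pattern keeps unrelated paths untouched.
fix_awstats_paths() {
    local conf="$1"
    sed -i.bak -e 's|"/root/wwwroot|"/usr/local/awstats/wwwroot|g' "$conf"
}
```

Running `fix_awstats_paths /etc/httpd/conf/httpd.conf` leaves the previous version in `httpd.conf.bak`, so the change is easy to undo.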
Then restart the HTTP service so that the new settings take effect.
[root@sample ~]# /etc/rc.d/init.d/httpd restart ← restart the HTTP service to apply the settings
Stopping httpd: [OK]
Starting httpd: [OK]
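A typo in httpd.conf will stop Apache from coming back up, so it can be worth checking the syntax before restarting. The sketch below assumes `apachectl` is on the PATH (this is an assumption about the system, not something the article states) and uses the same init script as above; the function name `restart_if_valid` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: restart Apache only if the configuration parses.
restart_if_valid() {
    # "apachectl configtest" checks the syntax of the configuration files.
    if apachectl configtest; then
        /etc/rc.d/init.d/httpd restart
    else
        echo "httpd.conf has errors; not restarting" >&2
        return 1
    fi
}
```

With this approach a broken edit leaves the running server untouched and the error message from the syntax check points at the offending line.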
[root@sample ~]# vi /root/awstats.sh ← create the update script with the following content:
#!/bin/bash
/usr/local/awstats/wwwroot/cgi-bin/awstats.pl -update -config=www.zuobama.com
(Replace the domain name after -config= with your own.)
[root@sample ~]# chmod 700 /root/awstats.sh ← make the script executable
[root@sample ~]# /root/awstats.sh ← run the script to start the analysis (this takes a while if the log is large)
Update for config "/etc/awstats/awstats.www.zuobama.com.conf"
With data in log file "/var/log/httpd/access_log"...
Phase 1 : First bypass old records, searching new record...
Searching new records from beginning of log file...
Phase 2 : Now process new records (Flush history on disk after 20000 hosts)...
Jumped lines in file: 0
Parsed lines in file: 55
Found 52 dropped records,
Found 0 corrupted records,
Found 0 old records,
Found 3 new qualified records.
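As the configuration script pointed out, the update is not added to cron automatically. One possible way to schedule /root/awstats.sh is sketched below; the function name `awstats_cron_line` and the default time of 03:00 every day are illustrative assumptions, not values from the article.

```bash
#!/bin/bash
# Hypothetical helper: print a crontab line that runs the update script
# once a day. Arguments: script path, hour, minute (all optional).
awstats_cron_line() {
    local script="${1:-/root/awstats.sh}" hour="${2:-3}" minute="${3:-0}"
    # Fields: minute hour day-of-month month day-of-week command
    printf '%s %s * * * %s >/dev/null 2>&1\n' "$minute" "$hour" "$script"
}
```

The printed line can be appended to root's crontab with `( crontab -l 2>/dev/null; awstats_cron_line ) | crontab -`. Note that running that command twice adds the entry twice, so check `crontab -l` first.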
Then, in a browser on a client machine, open http://(server IP address or your domain)/awstats/awstats.pl?config=www.zuobama.com to see the detailed statistics, as shown in the figure below:

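Now for the main point of this article: adding detailed spider statistics, so that AWStats records exactly which paths each spider requests and you can see whether a spider only fetches the home page and never crawls the inner pages.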
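The same question can be answered roughly from the raw log, which is useful for cross-checking the AWStats report later. The sketch below assumes the Apache "combined" log format, where the requested path is the seventh whitespace-separated field; the function name `spider_top_urls` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: list the URLs most often requested by a given
# user agent. Arguments: log file, user-agent substring, number of rows.
spider_top_urls() {
    local log="$1" ua="$2" n="${3:-10}"
    # grep -F treats the user-agent string as plain text, not a regex.
    grep -F -- "$ua" "$log" \
        | awk '{print $7}' \
        | sort | uniq -c | sort -rn \
        | head -n "$n"
}
```

For example, `spider_top_urls /var/log/httpd/access_log Baiduspider` prints up to ten lines, each with a hit count followed by a path. If only `/` appears, the spider has not gone past the home page.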
[root@Centos5 ~]# cd /etc/awstats/ ← go to the directory that holds the site config files
[root@Centos5 awstats]# ls ← list the directory
awstats.www.zuobama.com.conf wwwroot ← the site's config file is in here
[root@Centos5 awstats]# vi awstats.www.zuobama.com.conf
Find the following comment block:
# There is also a global parameter ExtraTrackedRowsLimit that limits the
# number of possible rows an ExtraSection can report. This parameter is
# here to protect too much memory use when you make a bad setup in your
# ExtraSection. It applies to all ExtraSection independently meaning that
# none ExtraSection can report more rows than value defined by ExtraTrackedRowsLimit.
# If you know an ExtraSection will report more rows than its value, you should
# increase this parameter or AWStats will stop with an error.
# Example: 2000
# Default: 500
and add the following below it:
ExtraTrackedRowsLimit=20000
ExtraSectionName1="Baidu crawls - Top 10"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="UA,(.*Baiduspider.*)"
ExtraSectionFirstColumnValues1="URL,(.*)"
ExtraSectionFirstColumnFormat1="<a href='%s' title='Item Crawled' target='_blank'>%s</a>"
ExtraSectionStatTypes1=PHBL
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=10
MinHitExtra1=1
ExtraSectionName2="Google crawls - Top 10"
ExtraSectionCodeFilter2="200 304"
ExtraSectionCondition2="UA,(.*Googlebot.*)"
ExtraSectionFirstColumnValues2="URL,(.*)"
ExtraSectionFirstColumnFormat2="<a href='%s' title='Item Crawled' target='_blank'>%s</a>"
ExtraSectionStatTypes2=PHBL
ExtraSectionAddAverageRow2=0
ExtraSectionAddSumRow2=1
MaxNbOfExtra2=10
MinHitExtra2=1
Then save the file and exit,
run /root/awstats.sh again,
and the report will show the detailed crawl information for each search engine spider.
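The Baidu and Google sections differ only in their index, label and user-agent pattern, so further spiders can be added by generating the same block. The sketch below is an illustration only: the function name `extra_spider_section` is hypothetical, and it sets the sum-row option to 1 on the assumption that it is an on/off flag.

```bash
#!/bin/bash
# Hypothetical helper: print an AWStats ExtraSection block for one spider.
# Arguments: section index, display label, user-agent substring.
extra_spider_section() {
    local idx="$1" label="$2" ua="$3"
    cat <<EOF
ExtraSectionName${idx}="${label} crawls - Top 10"
ExtraSectionCodeFilter${idx}="200 304"
ExtraSectionCondition${idx}="UA,(.*${ua}.*)"
ExtraSectionFirstColumnValues${idx}="URL,(.*)"
ExtraSectionFirstColumnFormat${idx}="<a href='%s' title='Item Crawled' target='_blank'>%s</a>"
ExtraSectionStatTypes${idx}=PHBL
ExtraSectionAddAverageRow${idx}=0
ExtraSectionAddSumRow${idx}=1
MaxNbOfExtra${idx}=10
MinHitExtra${idx}=1
EOF
}
```

For instance, `extra_spider_section 3 "Yahoo" "Slurp"` prints a third section; `Slurp` is assumed here to be the substring that identifies Yahoo's crawler in the user-agent field, so check your own logs before relying on it. The output can be pasted below the two sections above, using an index that is not already taken.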