How can you tell if the Spiders are visiting?
You have to be able to get into the access logs for your server and see
who has visited your site. You should have direct access to these, but you
may be given a more 'user friendly' front-end or control panel instead.
As long as you can get the information on EACH visitor to your site,
including who they are and what they looked at, you should be cookin'.
There are log analysis tools out there, both free and for sale. We won't
review them here, but the industry leader, and the best that PC Mojo has
seen, is WebTrends. Be prepared to spend some money to purchase the product.
WebTrends is a GREAT product, and it's the one PC Mojo uses.
It's not cheap, but it isn't even close to being the most expensive software
package out there. You should go take a look at their stuff, try it out,
and buy it if you want to do your own analysis. OR... you can just have
PC Mojo do it for you!
What are the names of the Spiders?
It should be fairly obvious when you look at your logs which visitors are
spiders and which are real people. Depending on the log analysis software
you're using, you may be able to filter exclusively for the robots if
you know what their names are.
There are other people who stay right on top of all this stuff. PC Mojo
reads their pages now and then, as well as using other resources, so we
might as well send you over there. Just don't stay TOO long. Come on back
to get your Mojo workin'!
Spider Chart
http://searchenginewatch.com/webmasters/spiderchart.html
This is a great site. You should bookmark their home page, subscribe to
their newsletter, and stop in now and then to see what's cookin'
search engine-wise.
The Web Robots Database
http://www.robotstxt.org/wc/active.html
This place has a bunch of techie info on the whole spider deal. You just
have to cruise around a little to dig out the good stuff.
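If you'd rather poke at the raw logs yourself, here is a rough Python sketch of filtering for spiders by name. It assumes your server writes the common Apache 'combined' log format, where the user-agent is the last quoted field on each line. The list of spider names is only a small illustrative sample, and the function name is made up for this example, so swap in whatever names the charts above give you.

```python
import re
from collections import Counter

# Illustrative, incomplete list of spider name fragments (an assumption for
# this sketch). Replace it with names taken from a current spider chart.
SPIDER_NAMES = ["googlebot", "slurp", "scooter"]

# Assumes the "combined" log format: the line ends with "referer" "user-agent".
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def count_spider_hits(lines):
    """Count log lines whose user-agent contains a known spider name."""
    hits = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the format we assumed; skip it
        agent = match.group(1).lower()
        for name in SPIDER_NAMES:
            if name in agent:
                hits[name] += 1
                break
    return hits


# Two made-up log lines, for illustration only.
SAMPLE_LINES = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.20 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" '
    '200 1210 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
]

spider_hits = count_spider_hits(SAMPLE_LINES)
```

To run it against a real log, pass an open file instead of the sample list, for example `count_spider_hits(open("access.log"))`, where the file name is whatever your host actually uses.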
How can you kill the Spiders?
If the idea of spiders crawling through your web site is just too icky for
you, you can squash the better behaved ones. It's simple to do, actually,
IF you have access to the root directory of your server. If you don't,
because your web site is co-hosted on somebody else's domain, you can ask
them to set your web site up so that spiders are squashed right out of it.
If they won't cooperate, and having NO spiders rooting around in your
site is important to you, it's time to shop for another place to host
your site. PC Mojo, for example. <== blatant plug
Basically, if you DO have access to the root directory of your server,
place a file called 'robots.txt' in there with the following lines:
User-agent: *
Disallow: /
The first line tells robots that ALL of them are affected by the next line,
which tells them that ALL directories of your site are off limits. Easy, huh?
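If you want to double-check what those two lines mean to a well behaved robot, one option is Python's standard `urllib.robotparser` module, which reads robots.txt rules the way a polite spider would. This is only a sketch: the helper name `is_allowed`, the spider name 'AnySpider', and the example.com address are placeholders made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two rules shown above.
RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules_text, agent, url):
    """Return True if the given rules let this agent fetch this URL."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, url)


# 'AnySpider' and the URL are placeholders, not real values.
blocked_result = is_allowed(RULES, "AnySpider", "http://www.example.com/index.html")
# blocked_result is False: every path is off limits to every robot.
```

Keep in mind this only tells you what a robot that obeys robots.txt would do. The badly behaved ones ignore the file entirely.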
If you want to delve further into spider control via the robots.txt file,
please check out the following link:
http://www.robotstxt.org/wc/exclusion.html
You should also use the following meta tag in pages you do NOT want indexed:
<META NAME="ROBOTS" CONTENT="NOINDEX">
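If you have a lot of pages, it can be handy to check which ones actually carry that tag. Here is a small Python sketch using the standard `html.parser` module. The class and function names are made up for this example, and it only looks for the NOINDEX value, nothing fancier.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Remembers whether a page contains a robots meta tag with NOINDEX."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            if "noindex" in attr_map.get("content", "").lower():
                self.noindex = True


def has_noindex(html_text):
    """Return True if the HTML text asks robots not to index the page."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return finder.noindex


# A tiny made-up page for illustration.
SAMPLE_PAGE = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX"></head><body></body></html>'
page_is_hidden = has_noindex(SAMPLE_PAGE)
```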