Is there a way to direct crawlers to look at and index only the links in sitemap.html or sitemap.xml?
I have a PHP-driven site with hundreds of duplicate links, plus links I do not want or need to be indexed.
calculator posted this at 09:58—28th November 2007.
They have: 40 posts
Joined: Nov 2007
You could use robots.txt to block spiders from crawling the pages you don't want indexed. What URL format do your pages have? Is there a common 'feature' that could be used in robots.txt?
For example, are they all in the /product/ folder, or do they all have .php?product=1 in the URL?
I think that you can block all the pages with a ? from being spidered by using:
User-agent: *
Disallow: /*?
Hope this helps.
Eagle-Mark posted this at 03:03—29th November 2007.
They have: 17 posts
Joined: Apr 2005
Hmm, that may get rid of most of them, but it's not as precise as I want. There is a mod_rewrite rule that leaves the URLs of the pages that count ending in .html. So maybe...
Eagle-Mark posted this at 03:26—29th November 2007.
They have: 17 posts
Joined: Apr 2005
That didn't work. For that to work, wouldn't the ? have to be the last character? Mine end like this:
.com/auction.php?a=28&b=136
Is there a way to exclude URLs that end in a number? Or to exclude everything except .html?
calculator posted this at 09:51—29th November 2007.
They have: 40 posts
Joined: Nov 2007
Hi Eagle-Mark,
How about:
Disallow: /*.php$
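If you want to sanity-check a pattern before putting it live, here is a rough Python sketch. It assumes the wildcard extensions that Google and Bing document (* matches any run of characters, $ anchors the end of the URL); crawlers that don't support those extensions will behave differently. The helper names and the /item-28.html path are made up for illustration, and this is not how any real crawler is implemented.

```python
import re

def rule_to_regex(rule):
    """Turn a robots.txt path rule with * and $ into a compiled regex.

    Assumes Google-style semantics: * = any characters, trailing $ = end of URL.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape regex metacharacters, then restore * as "match anything".
    pattern = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_blocked(path, disallow_rules):
    """Return True if any Disallow rule matches the path (no Allow handling)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)

# URL format taken from the thread, plus a hypothetical rewritten .html page.
dynamic_url = "/auction.php?a=28&b=136"
rewritten_url = "/item-28.html"  # made-up example path

for rule in ["/*?", "/*.php$"]:
    print(rule, "blocks", dynamic_url, "->", is_blocked(dynamic_url, [rule]))
    print(rule, "blocks", rewritten_url, "->", is_blocked(rewritten_url, [rule]))
```

Under those assumptions, /*? matches the auction.php?a=28&b=136 style of URL because the ? only has to appear somewhere in it, not at the end. /*.php$ does not match it, since that URL ends in 136 rather than .php. Neither rule touches the .html page.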