Ezilon.com - Target Your Audience, be Seen in Your Region

robots.txt and meta for sitemap only?


They have: 17 posts

Joined: Apr 2005

Is there a way to direct crawlers to look at and index only links in the sitemap.html or sitemap.xml ?

I have a php driven site and it has hundreds of repeat links and links I do not want or need to be indexed.

calculator's picture

They have: 40 posts

Joined: Nov 2007

You could use robots.txt to block the spiders from crawling the pages you don't want indexed. What URL format do your pages have? Is there a common 'feature' that can be used in the robots.txt?

For example, are they all in the /product/ folder, or do they all have .php?product=1 in the URL?

I think that you can block all the pages with a ? from being spidered by using:

User-agent: *
Disallow: /*?


Hope this helps.
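To see which URLs a rule like the one above would catch, here is a minimal Python sketch of the wildcard matching that the major crawlers document: * stands for any run of characters, a trailing $ anchors the end of the URL, and anything else (including ?) is literal. The function names and the /listing-28.html path are made up for illustration, and a given crawler may behave differently, so treat this as a rough check rather than a definitive implementation.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is taken literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_matched(pattern: str, url_path: str) -> bool:
    # re.match anchors at the start of the path only, so a pattern
    # without '$' behaves as a prefix match.
    return robots_pattern_to_regex(pattern).match(url_path) is not None


if __name__ == "__main__":
    for path in ("/auction.php?a=28&b=136", "/listing-28.html"):
        print(path, "matched by /*?:", is_matched("/*?", path))
```

Under these assumptions the ? does not need to be the last character of the URL: without a $ the pattern only has to match the beginning of the path, so anything containing a ? is caught.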

They have: 17 posts

Joined: Apr 2005

Hmm, that may get rid of most of them, but it's not as accurate as I want. There is a mod_rewrite rule that leaves the URL with a .html on the pages that count. So maybe...

They have: 17 posts

Joined: Apr 2005

That didn't work. For that to work, wouldn't the ? have to be the last character? My URLs end like this:
.com/auction.php?a=28&b=136

Is there a way to exclude URLs that end in a number? Or to exclude everything except .html?

calculator's picture

They have: 40 posts

Joined: Nov 2007

Hi Eagle-Mark,

How about:
Disallow: /*.php$
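One thing to check before relying on that rule: the $ anchors the end of the URL, so /*.php$ would seem to match /auction.php but not /auction.php?a=28&b=136, which ends in a number. A different approach to "everything except .html" is a blanket Disallow: / combined with Allow: /*.html$, for crawlers that support Allow and pick the most specific (longest) matching rule. The Python sketch below models that precedence so the idea can be tried out; the rule list, function names and the /listing-28.html path are assumptions for illustration, not anything from the site in question.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """Assumed wildcard semantics: '*' is any run, trailing '$' is end of URL."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(rules, path: str) -> bool:
    """Apply the longest matching rule; on a tie, 'allow' wins.

    rules is a list of (directive, pattern) pairs, with directive being
    'allow' or 'disallow'. A path that no rule matches is allowed.
    """
    best_key, best_directive = None, "allow"
    for directive, pattern in rules:
        if matches(pattern, path):
            key = (len(pattern), directive == "allow")
            if best_key is None or key > best_key:
                best_key, best_directive = key, directive
    return best_directive == "allow"


# Hypothetical rule set: block everything, then re-allow URLs ending in .html
HTML_ONLY_RULES = [("disallow", "/"), ("allow", "/*.html$")]

if __name__ == "__main__":
    print(matches("/*.php$", "/auction.php?a=28&b=136"))
    for path in ("/auction.php?a=28&b=136", "/listing-28.html"):
        print(path, "allowed:", is_allowed(HTML_ONLY_RULES, path))
```

Whether a particular spider honours Allow and wildcards varies, so it is worth confirming with that search engine's own robots.txt testing tool.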