Remove or edit the crawler block on Textpattern index pages

If you’re using the default Textpattern CMS theme included from 4.5.0 onwards, it’s likely that some of your index pages will not be included in reputable search engine results. That’s not necessarily a reflection of the quality or quantity of your content; it’s more likely to be the way the template code is constructed. By index pages, I mean URLs where multiple articles are listed in succession. There’s a school of thought in search engine optimisation (SEO) that says you should minimise duplicate content on your website. These reputable search engines – and I’m generalising a little – prefer unique content, and if you have a website with a front page of articles, sections with articles and individual articles, you could have three or more instances of the same article, each at a different URL.

To that end, Textpattern’s default theme has directives for search engines to index certain URLs and ignore others. The code in question is:

```html
<txp:if_search>
    <meta name="robots" content="none">
<txp:else />
    <txp:if_category>
        <meta name="robots" content="noindex, follow, noodp, noydir">
    <txp:else />
        <txp:if_author>
            <meta name="robots" content="noindex, follow, noodp, noydir">
        <txp:else />
            <meta name="robots" content="index, follow, noodp, noydir">
        </txp:if_author>
    </txp:if_category>
</txp:if_search>
```

Let’s figure this out line by line. The opening tag, `<txp:if_search>`, runs its contents if the URL being viewed is a search results page. The contents in this case are `<meta name="robots" content="none">`, a directive for search engines and crawlers (robots, if you like) to neither index the page nor follow its links, essentially ignoring the content and treating it as transient. The `<txp:else />` tag that follows starts the alternative branch, which runs if the URL is not search results. So, essentially: if it’s search results, ignore it; if it’s not, carry on processing the tags after the `<txp:else />` part.
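To see the `<txp:if_search>` and `<txp:else />` pair on its own, here is a minimal sketch with the nested checks stripped out. It is an illustration rather than the default theme’s code, and the `index, follow` value in the else branch is an assumption made for the example.

```html
<txp:hide>Sketch only: a single-level condition, not the default theme's code</txp:hide>
<txp:if_search>
    <txp:hide>Search results: do not index, do not follow links</txp:hide>
    <meta name="robots" content="none">
<txp:else />
    <txp:hide>Everything else: assumed value for illustration</txp:hide>
    <meta name="robots" content="index, follow">
</txp:if_search>
```

The `<txp:hide>` tags act as template comments, so nothing inside them should reach the rendered page.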

This is a great example of tags in tags, or tag nesting if you prefer. There’s no official term for it, so choose whatever works for you. If the URL is not search results, there’s a check to see if the page is a category listing. If it is, the directive instructs the search engine (robot) not to index the content but to follow its links, and opts out of descriptions taken from the Open Directory Project at dmoz.org (`noodp`) and the Yahoo! Directory (`noydir`). The same goes for the next tag in a tag: if the URL is an author’s article listing, it does the same thing, with no indexing, links followed, and an opt-out of Open Directory Project and Yahoo! Directory. The final part of the tag tree covers everything else: if none of the above is true, index the content and follow links, but still opt out of Open Directory Project and Yahoo! Directory.

Any or all of the above can be flipped, edited or completely removed. If you’d prefer to let the search engine, reputable or otherwise, figure out what’s best for its listings, then deleting the above code from your page template will achieve this result.
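As one possible edit, the sketch below flips category listings so they are indexed, while search results and author listings stay out of the index. This is only an example of the kind of change described above, not a recommendation from the default theme. It also leaves out `noodp` and `noydir`, on the assumption that they no longer matter now that both directories have closed; keep them if you prefer.

```html
<txp:hide>Sketch only: category listings flipped to "index"; noodp/noydir omitted by assumption</txp:hide>
<txp:if_search>
    <meta name="robots" content="none">
<txp:else />
    <txp:if_author>
        <meta name="robots" content="noindex, follow">
    <txp:else />
        <txp:hide>Category listings now fall through to this branch</txp:hide>
        <meta name="robots" content="index, follow">
    </txp:if_author>
</txp:if_search>
```

Removing the `<txp:if_category>` check is what makes the flip work: with no dedicated branch, category URLs are handled by the final `<txp:else />` like any other page.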

Author Spotlight

Pete Cooper

Pete Cooper has been using Textpattern since 2005. Textpattern is his preferred CMS weapon of choice. Its logical and flexible approach to content management makes Pete happy, as does its lightweight core and helpful user community. Pete's website - petecooper.org - runs on top of Textpattern and chronicles his day-to-day experiences from his home near the Atlantic in north Cornwall, United Kingdom.
