How to block content scrapers
03-05-2009, 09:55 AM
Post: #1
One possible strategy is to detect the scraper as a bad bot: a robot script that crawls your site for content and disobeys the directives in your robots.txt file.

In the robots.txt file, you can ban robots from certain parts of your site as follows:

Code:
User-agent: *
Disallow: /private.php

This instructs all robots not to crawl the file private.php. Well-behaved robots comply, but nothing enforces the rule.

But a content scraper may well use the robots.txt file to find places to look.

So the private.php file could be used as a honey pot: when a bot requests it, the script captures the bot's IP address and adds it to a list of IPs to block. The script could also output some random text, which may then be published automatically on the scraper's web site.
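
As a rough sketch of the honey pot idea, private.php might look something like the following. The function names (ban_ip, random_text) and the ban list file banned_ips.txt are made up for illustration, and the code assumes PHP 7 or later. Treat it as a starting point, not a finished script.

```php
<?php
// private.php: honey pot sketch. Only bots that ignore robots.txt should land here.
// ban_ip(), random_text() and banned_ips.txt are hypothetical names for illustration.

/** Append an IP address to the ban list file unless it is already listed. */
function ban_ip(string $ip, string $banFile): bool
{
    // Reject anything that is not a valid IPv4 or IPv6 address.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // Read the current list, one address per line (empty if the file is missing).
    $banned = is_file($banFile)
        ? (file($banFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [])
        : [];
    if (in_array($ip, $banned, true)) {
        return true; // already banned, nothing to write
    }
    // LOCK_EX stops two simultaneous requests from interleaving their writes.
    return file_put_contents($banFile, $ip . "\n", FILE_APPEND | LOCK_EX) !== false;
}

/** Build filler text from a small word list to feed to the scraper. */
function random_text(int $wordCount): string
{
    $words = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur', 'adipiscing', 'elit'];
    $out = [];
    for ($i = 0; $i < $wordCount; $i++) {
        $out[] = $words[random_int(0, count($words) - 1)];
    }
    return implode(' ', $out);
}

// Only act on real web requests, so the file can also be included safely.
if (isset($_SERVER['REMOTE_ADDR'])) {
    ban_ip($_SERVER['REMOTE_ADDR'], __DIR__ . '/banned_ips.txt');
    echo '<p>' . random_text(50) . '</p>';
}
```

Two things to watch: if your site sits behind a proxy or CDN, REMOTE_ADDR may be the proxy's address and not the bot's, and the ban list file is best kept somewhere visitors cannot download it.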

Then, in the header code of your template, have a PHP script that checks the visitor's IP address against the ban list and terminates the request if the address is on it.
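
The header check could be sketched along these lines. Again, is_banned and banned_ips.txt are hypothetical names, and the path must point at the same file the honey pot script writes to. It assumes a plain text list with one IP address per line.

```php
<?php
// Header include sketch: stop banned visitors before the page renders.
// is_banned() and banned_ips.txt are hypothetical names for illustration.

/** Return true if the IP appears in the ban list file (one address per line). */
function is_banned(string $ip, string $banFile): bool
{
    if (!is_file($banFile)) {
        return false; // no list yet, so nobody is banned
    }
    $banned = file($banFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    return in_array($ip, $banned, true);
}

// Run the check only when there is a visitor address to test.
if (isset($_SERVER['REMOTE_ADDR'])
    && is_banned($_SERVER['REMOTE_ADDR'], __DIR__ . '/banned_ips.txt')) {
    http_response_code(403); // Forbidden
    exit('Access denied.');
}
```

Reading a flat file on every page view is fine for a short list. If the list grows large, a database table or a web server deny rule would probably scale better.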

This method is just one strategy to help prevent content scraping. In practice, scraping is likely to be difficult to prevent completely.

If you don't want your content to be re-published, at least apply a copyright notice in the footer and avoid publishing full RSS feeds of the content (where there is no copyright notice).

MLM Hosting