How to block content scrapers
03-05-2009, 09:55 AM
Post: #1
One possible strategy is to detect them as bad bots. A bad bot is a robot script that crawls your site for content and disobeys the directives in your robots.txt file.
In the robots.txt file, you can ban robots from certain parts of your site as follows:

Code:
User-agent: *
Disallow: /private.php

This instructs all robots to ignore the file private.php. A content scraper, however, may well use the robots.txt file to find places to look. So the private.php file can be used as a honeypot: any client that requests it has ignored the directive, so the script can capture the bot's IP address and add it to a list of IPs to block. The script could also output some random text, which may then be automatically published on the scraper's web site.

Then, in the header code of your template, have a PHP script that checks the visitor's IP against the ban list and terminates the script if there is a match.

This method is just one strategy to help prevent content scraping. In practice, it is likely to be difficult to prevent it completely. If you don't want your content to be re-published, at least apply a copyright notice in the footer and avoid publishing full RSS feeds of the content (where there is no copyright notice).
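As a rough illustration of the honeypot and ban-list logic described above, here is a minimal sketch. It is written in Python rather than the PHP the post has in mind, and the file name banned_ips.txt and the function names are assumptions made up for this example, not part of any existing script. It stores one IP per line in a plain text file. A real site would also need to think about file locking and about visitors behind shared proxies.

```python
import ipaddress
from pathlib import Path

# Hypothetical location of the ban list; one IP address per line.
BAN_LIST = Path("banned_ips.txt")


def load_banned(path=BAN_LIST):
    """Return the set of banned IP strings (empty if the file does not exist yet)."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def record_honeypot_hit(ip, path=BAN_LIST):
    """What the honeypot page would do: add the requester's IP to the ban list."""
    # ip_address() raises ValueError on malformed input, so junk never reaches the file.
    ip = str(ipaddress.ip_address(ip))
    if ip not in load_banned(path):
        with path.open("a") as f:
            f.write(ip + "\n")


def is_banned(ip, path=BAN_LIST):
    """What the template header would do: True means stop serving this visitor."""
    # Plain string comparison against the stored entries.
    return ip in load_banned(path)
```

In the setup from the post, the honeypot page (private.php) would do the equivalent of record_honeypot_hit() with the visitor's address, and the header of every template would do the equivalent of is_banned() and terminate the script when it returns True.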






