How to Block Bots Using the Robots.txt File

 

The robots.txt file is a simple text file placed on your web server that tells web crawlers whether they may access a given file or directory. It controls how search engine spiders see and interact with your web pages. An improperly configured robots.txt file can accidentally prevent search engines from indexing a website; the same file can also be used deliberately to block search engines from indexing a site, keeping it out of search results.

In some cases, bots hit your website so heavily that they consume a great deal of bandwidth and slow the site down. Heavy bot traffic can also cause problems such as high server load and an unstable server, so it is important to block such bots before this happens. Installing a ModSecurity plugin can also help prevent these types of issues.

 

Correcting a Robots.txt That Blocks All Website Crawlers

The robots.txt file is typically found in the document root of the website, and you can edit it using your favorite text editor. In this article, we explain the robots.txt file and how to find and edit it. The following is a common example of a robots.txt file:

User-agent: *

Disallow: /

The * (asterisk) after User-agent means the rule applies to all search engine crawlers. The Disallow directive restricts search bots and spiders from indexing a page or folder, and the “/” after Disallow means that no pages on the site may be visited by a search engine crawler.

By removing the “/” from the Disallow directive (leaving it empty), you allow search engines to scan your website, so it can be listed on Google and other search engines. The following are the steps to edit the robots.txt file:
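As a quick sanity check, Python's standard urllib.robotparser module can demonstrate the effect of both variants (the URL here is purely illustrative):

```python
from urllib import robotparser

# A robots.txt that blocks every crawler from every page.
blocking = robotparser.RobotFileParser()
blocking.parse(["User-agent: *", "Disallow: /"])

# A robots.txt with an empty Disallow line, which permits everything.
allowing = robotparser.RobotFileParser()
allowing.parse(["User-agent: *", "Disallow:"])

print(blocking.can_fetch("Googlebot", "https://example.com/page.html"))  # False
print(allowing.can_fetch("Googlebot", "https://example.com/page.html"))  # True
```

The same check works against a live site by calling set_url() and read() instead of parse().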

 

1) Log in to your cPanel interface.

2) Navigate to the “File Manager” and go to your website's root directory.

Robots.txt File

 

3) The robots.txt file should be in the same location as your website's index file. Edit the robots.txt file so that it contains the following (an empty Disallow permits all crawling), then save the file:

User-agent: *

Disallow:

 

You can also block a single bad user agent in the .htaccess file by adding the code below.

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]

RewriteRule .* - [F,L]

 

If you wanted to block multiple User-Agent strings at once, you could do it like this:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^.*(Baiduspider|HTTrack|Yandex).*$ [NC]

RewriteRule .* - [F,L]
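The [NC] flag makes the RewriteCond match case-insensitive, so the rule also catches variants such as "baiduspider/2.0". A small Python sketch (the sample user-agent strings are invented for illustration) mirrors how the alternation pattern matches:

```python
import re

# Same alternation Apache evaluates, with re.IGNORECASE standing in for [NC].
pattern = re.compile(r"(Baiduspider|HTTrack|Yandex)", re.IGNORECASE)

agents = [
    "Mozilla/5.0 (compatible; baiduspider/2.0)",
    "Mozilla/5.0 (compatible; YandexBot/3.0)",
    "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0",
]

for agent in agents:
    blocked = bool(pattern.search(agent))
    # Requests matching the pattern would receive a 403 Forbidden from Apache.
    print(f"{'403' if blocked else '200'}  {agent}")
```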

 

You can also block specific bots globally. To do this, log in to your WHM.

Then navigate to Apache Configuration >> Include Editor >> “Pre Main Include” >> select your Apache version (or All Versions), insert the code below, click Update, and then restart Apache.

<Directory "/home">

SetEnvIfNoCase User-Agent "MJ12bot" bad_bots

SetEnvIfNoCase User-Agent "AhrefsBot" bad_bots

SetEnvIfNoCase User-Agent "SemrushBot" bad_bots

SetEnvIfNoCase User-Agent "Baiduspider" bad_bots

<RequireAll>

Require all granted

Require not env bad_bots

</RequireAll>

</Directory>
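The SetEnvIfNoCase lines perform a case-insensitive match against the User-Agent header; any request that matches sets the bad_bots environment variable and is then denied by "Require not env bad_bots". The decision logic can be sketched in Python (bot names taken from the config above, sample user-agents invented for illustration):

```python
# Bot substrings from the Apache config above.
BAD_BOT_SUBSTRINGS = ["MJ12bot", "AhrefsBot", "SemrushBot", "Baiduspider"]

def is_bad_bot(user_agent: str) -> bool:
    """Mimic SetEnvIfNoCase: case-insensitive substring match on User-Agent."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in BAD_BOT_SUBSTRINGS)

print(is_bad_bot("Mozilla/5.0 (compatible; AhrefsBot/7.0)"))   # True  -> denied
print(is_bad_bot("Mozilla/5.0 (Macintosh) Safari/605.1.15"))   # False -> allowed
```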

 

Blocking unwanted bots should noticeably reduce server load and help improve your website's performance and speed.

 

If you need any further help, please reach out to our support department.

 
