Check your robots.txt file
Validate your file syntax to ensure it's correctly formatted.
Here are some common questions about robots.txt files and how to use them.
You don't need to submit a robots.txt file to search engines. Crawlers look for a robots.txt file before crawling a site. If they find one, they read it before scanning the rest of the site.
If you make changes to your robots.txt file and want to notify Google, you can submit it to Google Search Console. Use the Robots.txt Tester to paste the text file and click Submit.
Search engines and other crawling bots look for a robots.txt file in the main directory of your website. After generating the robots.txt file, add it to the root folder of your website so that it is reachable at https://yoursite.com/robots.txt.
The method of adding a robots.txt file depends on the server and CMS you are using. If you can't access the root directory, contact your web hosting provider.
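Because crawlers only look in the root directory, the expected location can be derived from any page URL on the site. A minimal Python sketch, assuming a hypothetical helper name `robots_txt_url` and the placeholder domain used above:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yoursite.com/blog/post-1?ref=home"))
# prints https://yoursite.com/robots.txt
```

If opening that address in a browser does not show your file, it is probably not in the root folder.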
You can add your Sitemap to the robots.txt file to make it easier for bots to crawl your website content. The Sitemap file is usually located at https://your-site.com/sitemap.xml. Add a Sitemap directive with the full URL of your Sitemap, like this:
User-agent: *
Disallow: /folder1/
Allow: /image1/
Sitemap: https://your-site.com/sitemap.xml
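To confirm that the Sitemap line is picked up, the same file can be parsed with Python's standard `urllib.robotparser` module. This is only a sketch: `site_maps()` requires Python 3.8 or later, and the page paths checked below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/
Allow: /image1/
Sitemap: https://your-site.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from text, no network request

sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if none are listed
print(sitemaps)  # ['https://your-site.com/sitemap.xml']

# Hypothetical page paths, checked against the rules for all bots.
print(parser.can_fetch("*", "https://your-site.com/folder1/page.html"))  # False
print(parser.can_fetch("*", "https://your-site.com/image1/photo.png"))   # True
```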
The Allow directive counteracts the Disallow directive. Using Allow and Disallow together, you can tell search engines to access a specific folder, file, or page within a disallowed directory.
Example: search engines are not allowed to access the /album/ directory
Disallow: /album/
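The effect of combining the two directives can be checked with Python's standard `urllib.robotparser`. In this sketch the `/album/public/` subfolder and the file names are hypothetical. Note that Python's parser applies the first matching rule in file order, so the Allow line is placed before the Disallow line here; other crawlers may resolve conflicts differently (for example, by the most specific path).

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Allow: /album/public/
Disallow: /album/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The allowed subfolder stays reachable inside the disallowed directory.
public_ok = parser.can_fetch("*", "https://yoursite.com/album/public/cover.jpg")
# Everything else under /album/ is blocked.
private_ok = parser.can_fetch("*", "https://yoursite.com/album/private.jpg")

print(public_ok, private_ok)  # True False
```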
After filling in the User-agent directive, specify the behavior of certain (or all) bots by adding crawl instructions. Here are some tips:
1. Don't leave the Disallow directive without a value unless you mean to. With an empty value, the bot will crawl all of the site's content.
Disallow: # allow to crawl the entire website
2. Do not list every file you want to block from crawling. Just disallow access to a folder, and all files in it will be blocked from crawling.
Disallow: /folder/
3. Don't block access to the whole website unless necessary:
Disallow: / # block access to the entire website
Make sure essential website pages are not blocked from crawling: the home page, landing pages, product pages, etc.
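One way to catch an accidental block before publishing the file is to test a list of essential paths against it. The sketch below uses Python's standard `urllib.robotparser`; the helper name `blocked_paths` and the example paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str], agent: str = "*") -> list[str]:
    """Return the paths that the given robots.txt text blocks for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Hypothetical essential pages: home page, a landing page, a product page.
ESSENTIAL = ["/", "/landing/", "/products/item-1"]

print(blocked_paths("User-agent: *\nDisallow: /folder/", ESSENTIAL))
# prints [] because none of the essential pages sit under /folder/

print(blocked_paths("User-agent: *\nDisallow: /", ESSENTIAL))
# prints ['/', '/landing/', '/products/item-1'] because the whole site is blocked
```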
Specify the name of the bot to which you're giving crawl instructions using the User-agent directive.
To block or allow all crawlers from accessing some of your content, use an asterisk (*):
User-agent: *
To give crawl instructions only to Google's main crawler, use:
User-agent: Googlebot
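To see how a named bot and the asterisk group interact, the sketch below parses a file with two groups using Python's standard `urllib.robotparser`. The rule set and the bot name `SomeOtherBot` are illustrative assumptions: Googlebot gets an empty Disallow (crawl everything), while all other bots are blocked.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Googlebot matches its own group, so the asterisk group does not apply to it.
googlebot_ok = parser.can_fetch("Googlebot", "/page.html")
# Any bot without its own group falls back to the asterisk group.
other_ok = parser.can_fetch("SomeOtherBot", "/page.html")

print(googlebot_ok, other_ok)  # True False
```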
Keep in mind that each search engine has its own bots, which may differ in name. For example, Yahoo's bot is Slurp. Google has several bots for different purposes:
The robots.txt syntax consists of directives, parameters, and special characters. Follow these rules for proper functionality:
1. Each directive must start on a new line with only one parameter per line.
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
2. Paths in robots.txt are case-sensitive. Match the case of folder and file names exactly.
Correct
Disallow: /folder/
Incorrect if the actual folder name is lowercase
Disallow: /Folder/
3. Do not use quotation marks, spaces at the beginning of lines, or semicolons after lines.
Incorrect
Disallow: /folder1/;
Disallow: /“folder2”/
Correct
Disallow: /folder1/
Disallow: /folder2/
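The third rule is easy to check mechanically. The following Python sketch flags the three problems named there (quotation marks, leading spaces, trailing semicolons); the function name `lint_robots_line` is hypothetical, and this is not a full robots.txt validator.

```python
def lint_robots_line(line: str) -> list[str]:
    """Return a list of formatting problems found in one robots.txt line."""
    problems = []
    if line[:1] in (" ", "\t"):
        problems.append("leading whitespace")
    # Ignore anything after '#', which starts a comment.
    body = line.split("#", 1)[0].rstrip()
    if body.endswith(";"):
        problems.append("trailing semicolon")
    # Check straight and curly double quotation marks.
    if any(quote in body for quote in '"“”'):
        problems.append("quotation mark")
    return problems


for sample in ["Disallow: /folder1/;", "Disallow: /“folder2”/", "Disallow: /folder1/"]:
    print(repr(sample), lint_robots_line(sample))
```

A clean line returns an empty list, so the function can be run over every line of the file before uploading it.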
For more information on the robots.txt file, visit:
I know it can be hard to contribute to open source projects, but I'm here to help. You can contribute to this project by adding new features, fixing bugs, or improving the existing code: create a pull request or open an issue on our GitHub repository.
Contributing to open source projects is a great way to learn new things, improve your skills, and help the community. You can start by reading the project's documentation, checking the issues, and creating a pull request. If you have any questions, feel free to contact me on Twitter / X.
Thanks for your interest in contributing to this project. I'm looking forward to seeing your contributions. Let's make this project even better together.