Add specific rules for different crawlers with the User-agent directive:

```
User-agent: *
User-agent: Googlebot
User-agent: Bingbot
```

Block crawlers from specific directories or file types with Disallow:

```
Disallow: /admin/
Disallow: /private/
Disallow: *.pdf
```

Explicitly permit crawling of specific paths with Allow:

```
Allow: /
Allow: /public/
Allow: /blog/
```

Point crawlers to one or more sitemaps with Sitemap:

```
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
```

Ask crawlers to wait between requests with Crawl-delay (in seconds):

```
Crawl-delay: 1
Crawl-delay: 10
```

Use wildcard patterns: the asterisk (*) matches any sequence of characters and the dollar sign ($) anchors the end of a URL:

```
Disallow: *.pdf$
Disallow: /*?
Disallow: /search*
```
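If you want to sanity-check how these directives combine, Python's standard library ships a parser for the Robots Exclusion Protocol. The sketch below is illustrative only: urllib.robotparser follows the original protocol and treats rules as simple path prefixes, so it does not interpret Google-style * and $ wildcards.

```python
# Minimal sketch: parse a set of robots.txt rules and check URLs against them
# with Python's standard urllib.robotparser. Note that this parser treats
# Disallow/Allow values as plain path prefixes, so Google-style wildcard
# patterns (* and $) are not evaluated the way Googlebot would evaluate them.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Ask whether a generic crawler ("*") may fetch specific URLs.
for url in ("https://example.com/public/page.html",
            "https://example.com/admin/settings"):
    print(url, "->", parser.can_fetch("*", url))
```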
A typical e-commerce store blocks account, cart, and filtered-search URLs:

```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /customer/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Sitemap: https://yourstore.com/sitemap.xml
```
A WordPress blog typically blocks core and plugin directories while keeping uploads crawlable:

```
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Allow: /wp-content/uploads/
Disallow: /*?s=
Disallow: /feed/
Disallow: /comments/
Sitemap: https://yourblog.com/sitemap.xml
```
A corporate or business website keeps internal and staff-only areas, plus document files, out of search:

```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /internal/
Disallow: /staff/
Disallow: /temp/
Disallow: *.pdf$
Disallow: *.doc$
Sitemap: https://company.com/sitemap.xml
```
A documentation site keeps docs, guides, and tutorials open while blocking admin, API, and search URLs:

```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /search?
Allow: /docs/
Allow: /guides/
Allow: /tutorials/
Sitemap: https://docs.example.com/sitemap.xml
```
A landing page or campaign site blocks thank-you, confirmation, download, and tracking-parameter URLs:

```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /thank-you/
Disallow: /confirmation/
Disallow: /download/
Disallow: /*?utm_
Sitemap: https://landing.com/sitemap.xml
```
To block all crawlers from the entire site:

```
User-agent: *
Disallow: /

# Use this for:
# - Development sites
# - Private websites
# - Under construction pages
```
Common crawlers and the user-agent tokens they identify with:

| Search Engine | User-Agent | Purpose | Example Usage |
| --- | --- | --- | --- |
| Google | Googlebot | Web crawling | Most important for SEO |
| Bing | bingbot | Web crawling | Second largest search engine |
| Yahoo | Slurp | Web crawling | Yahoo search results |
| 🦆 DuckDuckGo | DuckDuckBot | Privacy-focused search | Growing alternative search |
| Twitter | Twitterbot | Link preview generation | Social media optimization |
| Facebook | facebookexternalhit | Link preview crawling | Social sharing optimization |
| LinkedIn | LinkedInBot | Professional content | Business networking |
| 📱 Mobile Google | Googlebot-Mobile | Mobile-first indexing | Mobile SEO optimization |
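Because each crawler identifies itself with its own user-agent token, the same URL can be allowed for one bot and blocked for another. Below is a small sketch, using Python's urllib.robotparser and a purely hypothetical robots.txt, of checking one URL against several of the tokens from the table above:

```python
# Sketch: check the same URL against different user-agent tokens.
# The robots.txt content below is hypothetical, purely for illustration.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /beta/

User-agent: Googlebot
Allow: /beta/
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

for agent in ("Googlebot", "bingbot", "DuckDuckBot"):
    allowed = parser.can_fetch(agent, "https://example.com/beta/feature")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```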
Blocking CSS and JavaScript files.

Wrong:

```
Disallow: *.css
Disallow: *.js
```

Why it's bad: Google needs CSS and JavaScript to properly render and understand your pages.

Impact: Poor mobile-friendly test results, incorrect page rendering in search results.
Using a relative sitemap URL.

Wrong:

```
Sitemap: /sitemap.xml
```

Right:

```
Sitemap: https://example.com/sitemap.xml
```

Why it matters: the Sitemap directive expects a fully qualified URL; relative URLs may not be resolved correctly by crawlers.
Blocking internal search result pages.

Good practice:

```
Disallow: /*?s=
Disallow: /search?
```

Why block these: Search result pages create duplicate content and waste crawl budget.

Benefit: Focus crawler attention on your important content pages.
Setting a Crawl-delay for slower servers.

When to use:

```
User-agent: *
Crawl-delay: 1
```

Use cases: Slow servers, shared hosting, high-traffic sites.

Warning: Google ignores Crawl-delay, but some other crawlers still respect it.
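If you want to confirm what delay a robots.txt actually declares for a given crawler, urllib.robotparser exposes it directly (crawl_delay and request_rate were added in Python 3.6). A brief sketch, with hypothetical agent names:

```python
# Sketch: read Crawl-delay values for different user agents.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Crawl-delay: 10

User-agent: Googlebot
Crawl-delay: 1
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.crawl_delay("Googlebot"))      # 1
print(parser.crawl_delay("SomeOtherBot"))   # 10 (falls back to the * group)
print(parser.request_rate("Googlebot"))     # None, since no Request-rate is declared
```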
To test and validate your robots.txt:

- Place the file at the root of your domain, e.g. https://yoursite.com/robots.txt.
- Use a robots.txt testing tool (such as the robots.txt report in Google Search Console) to check for syntax errors and test specific URLs; a programmatic check is sketched after this list.
- Check for common syntax errors such as missing colons, incorrect spacing, or invalid directives.
- Watch for crawl errors in Search Console that might indicate robots.txt issues.
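Beyond the Search Console report, you can script the same checks. The sketch below (the site URL is a placeholder) downloads a live robots.txt and tests a few URLs against it:

```python
# Sketch: fetch a live robots.txt and test specific URLs against it.
# The site URL is a placeholder; requires network access.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # download and parse the file

test_urls = [
    "https://example.com/blog/some-post",
    "https://example.com/admin/login",
]
for url in test_urls:
    status = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{status}: {url}")

# Sitemap URLs declared in robots.txt (available in Python 3.8+); None if absent.
print(parser.site_maps())
```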
What is a robots.txt file?

A robots.txt file is a text file that tells search engine crawlers which pages or sections of your website they should or shouldn't crawl. It's placed in the root directory of your website and follows the Robots Exclusion Protocol.
Where should the robots.txt file be placed?

The robots.txt file must be placed in the root directory of your website. For example, if your website is example.com, the robots.txt file should be accessible at example.com/robots.txt.
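As a quick illustration (using Python's urllib.parse), the expected robots.txt location can be derived from any page URL on the site:

```python
# Sketch: derive the robots.txt location from any page URL on the same site.
from urllib.parse import urlsplit, urlunsplit

page = "https://example.com/blog/some-post?ref=home"
scheme, netloc, *_ = urlsplit(page)
print(urlunsplit((scheme, netloc, "/robots.txt", "", "")))
# -> https://example.com/robots.txt
```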
Is a robots.txt file required?

While not mandatory, a robots.txt file is highly recommended for SEO. It helps search engines understand your site structure and prevents them from crawling unnecessary pages, which can improve your crawl budget efficiency.
What is crawl delay?

Crawl delay is the number of seconds a crawler should wait between requests to your server. This helps prevent overwhelming your server with too many requests at once. Most modern search engines ignore this directive, but some crawlers still respect it.
Is robots.txt a security measure?

No, robots.txt is not a security measure. It's a polite request to search engines, and compliant crawlers will respect it. However, malicious crawlers may ignore it. For true access control, use password protection or server-level restrictions.
Should I include my sitemap in robots.txt?

Yes, including your sitemap URL in robots.txt is a good practice. It helps search engines discover your sitemap and understand your site structure better. You can include multiple sitemap URLs if needed.
How often should I update my robots.txt file?

Update your robots.txt file whenever you make significant changes to your website structure, add new sections you want to block, or launch new features that affect crawling behavior.