Answer
Jul 12, 2019 - 03:15 AM
There are many things you can do to block unwanted spiders, but the more extreme you get, the higher the odds that you'll break legitimate crawlers as well. The primary reason to keep competitors out is to stop them from replicating your site structure, page targeting, and copy. No matter how hard you try, though, they can still visit your site like a regular user and figure out quite a lot.
Robots.txt: you can block specific spiders in your robots.txt file. For example, some site owners block Majestic, Screaming Frog, Ahrefs, Moz, DeepCrawl, Alexa, or other spidering tools, but that usually means you can't use those tools on your own site either, which limits the SEO work you can do. Also, most of these tools let the user agent be changed to look like Google.
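As a rough sketch, a robots.txt like the one below asks those tools' crawlers to stay away. The user-agent tokens shown are the commonly published ones for each tool, but they vary and can change, so check each vendor's documentation before relying on them:

```
# Block some common third-party SEO crawlers (tokens are examples only)
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: rogerbot
Disallow: /

User-agent: Screaming Frog SEO Spider
Disallow: /
```

Keep in mind robots.txt is only a request: well-behaved bots honor it, while anything malicious, or a tool running with a spoofed user agent, will simply ignore it.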
You can also choose which spiders to allow, but keep in mind that Google isn't the only spider in town; Bing, Baidu, and Yandex will want access, too. Even Google uses multiple spiders/user agents, including one for AdSense, one for images, one for news, one for Android apps, and some lesser-known ones like Feedfetcher and Read Aloud.
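If you'd rather flip it around and allow only the spiders you name, a sketch like this blocks everything by default and opens the door for the major search engines. Again, the tokens are illustrative, and note that Google documents that some of its crawlers, such as AdsBot, ignore the wildcard group and have to be named explicitly:

```
# Deny everything by default...
User-agent: *
Disallow: /

# ...then allow the search engines you actually want
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: Yandex
Disallow:

User-agent: Baiduspider
Disallow:
```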
Submit sitemaps directly: search engines want XML sitemap files, and you would normally list them in your robots.txt file, but if you are worried about competitors seeing all your pages, you can instead submit the XML files directly to the search engines so only they know where the files live.
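A bare-bones sitemap file looks like the sketch below. If you go the direct-submission route, skip the usual Sitemap: line in robots.txt, give the file a name that isn't easy to guess (the filename in the comment is made up), and upload it through Google Search Console and Bing Webmaster Tools instead:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- e.g. served at /s-7f3k2-private.xml rather than the guessable /sitemap.xml -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/some-page/</loc>
    <lastmod>2019-07-01</lastmod>
  </url>
</urlset>
```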
IP address/user agent: some services help you identify Google's known IP addresses, but Google will use other ways to see your site, too (Chrome, Android, and lesser-known data centers). These services are often provided by companies that also guard against DDoS attacks and other kinds of attacks or hacks. If they notice a suspicious flurry of activity, they can cut off access to the site, and they can also block visitors from other countries.
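If you would rather verify Googlebot yourself than rely on published IP lists, Google's documented recommendation is a reverse DNS lookup followed by a confirming forward lookup. A minimal Python sketch of that check, using only the standard library (the sample IP at the bottom is just an illustration):

```python
import socket

def is_verified_googlebot(ip_address):
    """Check whether an IP really belongs to Googlebot using the
    reverse-then-forward DNS test that Google documents."""
    try:
        # Reverse DNS: the hostname should end in googlebot.com or google.com
        hostname, _, _ = socket.gethostbyaddr(ip_address)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward DNS: the hostname must resolve back to the same IP
        return socket.gethostbyname(hostname) == ip_address
    except (socket.herror, socket.gaierror):
        # Lookup failed, so treat the visitor as unverified
        return False

# Example: test an address pulled from your access logs
print(is_verified_googlebot("66.249.66.1"))
```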
If you know your competitor's IP address (maybe from an email or something), you could block them, or play tricks on them by serving things up differently or even giving them a custom warning.
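As a rough illustration, here is a minimal Python sketch assuming a Flask app; it refuses a known address outright, though you could just as easily branch to different content. The IP shown is a placeholder from the documentation range, not a real competitor:

```python
from flask import Flask, abort, request

app = Flask(__name__)

# Hypothetical competitor addresses; 203.0.113.0/24 is a documentation range
BLOCKED_IPS = {"203.0.113.7"}

@app.before_request
def screen_competitors():
    # request.remote_addr may be your proxy's address if you sit behind a
    # load balancer; in that case inspect the forwarded header instead
    if request.remote_addr in BLOCKED_IPS:
        abort(403)  # or return a custom warning page instead
```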
Cloaking: this one I don't recommend. Cloaking is where you identify a visitor (in this case Google) by user agent or IP address and serve them something different from what other users see. Search engines do not like this.