What Are Mirror Sites?
Mirror sites are duplicate versions of your main website: a set of pages that are exact copies of pages on your main site. These pages are usually hosted on a separate domain or on a subdomain of your site.
There are a few reasons such a site can exist. You might be testing a new design in a live test environment on a subdomain, and the development team forgot to block search engine access to it. Alternatively, your old site might still be live on a different domain or subdomain without being redirected to your new website.
The Problem With Mirror Sites
Can Outrank the Main Website – Mirror pages sometimes outrank your main site and eat into your traffic. This skews your reporting, which in turn can lead to poor decisions.
Poor User Experience – Mirror sites are created for a purpose other than serving your audience. They are either test pages or exist to serve a small segment of your audience. When people land on a mirror site, either through a search engine or because you link to it, they will not get an optimal experience, and that can leave them with a bad impression of your brand or service.
Loss of Revenue – Since mirror sites are not optimized for your audience or for conversions, a user who lands on one might have no way of buying your product. It has a low perceived. This leads to lost revenue.
Eats Up Crawl Budget – Because mirror sites contain a substantial number of pages that duplicate your main site, they consume a lot of search engine crawl budget without adding any value to the search engine's index or to your site's traffic.
Bandwidth and Maintenance Costs – Needless mirror sites consume server bandwidth and add to your maintenance costs. Ideally, your server should be serving your most important users, so move any mirror sites to a different server where they won't affect load times for your target audience.
How to Find All Mirror Sites of Your Main Site
Method 1: Meet your development team
The best way to get a list of all your mirror sites is to meet your development team and ask them to walk you through every version of your website that is live. You can start with the questions below.
- Was the site running on a different domain before?
- On which subdomains does your testing environment run? What are those pages?
- What have you done with the old version of the website?
- Are you coming up with a new design? Are you testing the design on any server?
Method 2: Ask Google for all subdomains
You can find the subdomains of your site that Google knows about by using targeted search queries.
To get started, list all the legitimate subdomains you know of. Then run a query based on the template below in Google.
site:*.yourdomain.com -www -knownsubdomain1 -knownsubdomain2 ...
For example, for a site like Practo, the subdomains I know of are news.practo.com and blog.practo.com. To find other subdomains that are flying under the radar, I would use a query like the one below in Google.
site:*.practo.com -www -blog -news
Once you have found these subdomains, open each one and check which of them mirror your original site.
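If you track many known subdomains, typing the exclusion list by hand gets tedious. Below is a small Python sketch that builds the query from a list; the function name build_subdomain_query is hypothetical. The optional strict mode is an assumption on my part: it swaps the bare-word exclusions for -site: host exclusions, which should be more precise, since a bare -blog also hides any result that merely contains the word "blog".

```python
def build_subdomain_query(domain, known_subdomains, strict=False):
    """Build a Google query listing indexed subdomains of `domain`,
    excluding www and the subdomains you already know about."""
    excluded = ["www", *known_subdomains]
    if strict:
        # Exclude whole hosts, e.g. -site:blog.practo.com
        terms = [f"-site:{sub}.{domain}" for sub in excluded]
    else:
        # Exclude bare words, as in the article's template
        terms = [f"-{sub}" for sub in excluded]
    return " ".join([f"site:*.{domain}", *terms])


print(build_subdomain_query("practo.com", ["blog", "news"]))
# site:*.practo.com -www -blog -news
```

Paste the printed string into Google as-is; rerun it with each newly discovered subdomain added to the list until no unknown hosts remain.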
Method 3: Run DeepCrawl to find all subdomains
Step 1: Start a new project in DeepCrawl.
Step 2: Select “Crawl Sub Domains” in the settings section.
Once the crawl is done, DeepCrawl reports all the subdomains it found. You then need to go through them and pick out the ones that are mirror sites.
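Whichever method you use, you still have to decide which subdomains are mirrors. One rough way to speed that up is to compare the visible text of the same page on the main site and on a candidate subdomain. The sketch below uses only the Python standard library; the helper names and the 0.9 similarity threshold are hypothetical choices, not an established rule, so review borderline results by hand.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.request import urlopen


class VisibleTextParser(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the page's visible text with whitespace collapsed."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def similarity(html_a, html_b):
    """Similarity ratio (0.0 to 1.0) of two pages' visible text."""
    return SequenceMatcher(None, visible_text(html_a), visible_text(html_b)).ratio()


def fetch(url, timeout=10):
    """Download a page and decode it as UTF-8, replacing undecodable bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_like_mirror(main_url, candidate_url, threshold=0.9):
    """Hypothetical check: near-identical visible text suggests a mirror."""
    return similarity(fetch(main_url), fetch(candidate_url)) >= threshold
```

For example, looks_like_mirror("https://www.yourdomain.com/", "https://staging.yourdomain.com/") compares two homepages (both hosts are placeholders). Checking a few deeper paths as well gives a more reliable answer than the homepage alone.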
How to Deal With Mirror Sites
Identify the purpose of the mirror site.
If your team needs the mirror site to stay up in order to test a design or an upcoming feature, all you have to do is block search engine access to it by adding a robots.txt file to the root folder of the subdomain (mirrorsite.yourdomain.com/robots.txt) with the following rules:
User-agent: *
Disallow: /
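Before deploying the file, you can sanity-check it with Python's standard urllib.robotparser module. The sketch below parses the rules offline (no network access) and confirms that none of a few sample user agents may fetch any sample URL on the mirror host. The helper name blocks_all_crawlers and the host are hypothetical, and Python's parser is not Google's own, so treat a pass as a quick check rather than proof. The correct file puts each directive on its own line with a colon (`User-agent: *` then `Disallow: /`); the second call shows that a one-line version without the colon leaves crawlers unblocked.

```python
from urllib.robotparser import RobotFileParser

# Two directives on separate lines, each followed by a colon.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def blocks_all_crawlers(robots_txt, host, user_agents=("Googlebot", "Bingbot", "*")):
    """Return True if no listed user agent may fetch any of the sample URLs."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    sample_urls = [f"https://{host}/", f"https://{host}/any/page.html"]
    return not any(
        parser.can_fetch(agent, url) for agent in user_agents for url in sample_urls
    )


print(blocks_all_crawlers(ROBOTS_TXT, "mirrorsite.yourdomain.com"))  # True
print(blocks_all_crawlers("User-agent: * Disallow /", "mirrorsite.yourdomain.com"))  # False
```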
Make sure you don’t link to such pages from your main site: if you link to them, Google can still index them even though robots.txt blocks crawlers from accessing them.
If the mirror site is an old version of your site that is indexed by Google and serves no current or future purpose, the best approach is to 301 redirect each of its pages to the corresponding page on the main site.
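In practice the redirect is usually configured in the web server or CDN serving the old host. Purely as an illustration of the mapping, here is a minimal Python sketch that answers every request on the old mirror with a 301 to the same path on the main site, keeping the query string. It assumes URL paths are unchanged between the two sites (if they changed, you would need a lookup table of old to new URLs); MAIN_SITE, redirect_target and MirrorRedirectHandler are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit, urlunsplit

# Placeholder: replace with your real main site.
MAIN_SITE = "https://www.yourdomain.com"


def redirect_target(request_path, main_site=MAIN_SITE):
    """Map a request path on the old mirror to the same path on the main site,
    keeping the query string. Assumes paths did not change between sites."""
    main = urlsplit(main_site)
    old = urlsplit(request_path)
    return urlunsplit((main.scheme, main.netloc, old.path or "/", old.query, ""))


class MirrorRedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET and HEAD on the mirror host with a permanent (301) redirect."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_HEAD = do_GET


def serve(port=8000):
    """Run the redirect server until interrupted."""
    HTTPServer(("", port), MirrorRedirectHandler).serve_forever()
```

With this mapping, a request for /doctors/list?city=pune on the old host is sent to https://www.yourdomain.com/doctors/list?city=pune, so each old page passes visitors (and ranking signals) to its one-to-one counterpart rather than to the homepage.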