Usually, when I get access to a Google Analytics account, I quickly try to find out how accurate (or polluted) the data is. Most of the time there is some spam traffic in Google Analytics. For smaller websites, it sometimes accounts for up to 30% of the sessions!
The side effect of spam traffic? You can’t rely on any of the aggregate numbers, because spam skews your understanding of the visitor profile. Let’s fix this.
What is spam traffic?
Some “dark web” digital marketers try to get visibility (traffic) for their websites, and one way to do that is to get noticed in your Google Analytics reports. They use web crawlers to generate “fake” visits all around the web.
Fake visits might boost your traffic (and ego) but are useless to your business.
Is my Analytics Data affected?
Probably yes. To be sure, log into your Google Analytics account and browse to Acquisition > All Traffic > Referrals.
Focus on all the “strange” referral sources that have:
- A bounce rate that is extremely low or extremely high.
- A very short average time spent on site.
- A very high ratio of new visitors.
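If you export the referral report, the three checks above can be turned into a rough screening script. This is only a minimal sketch: the `ReferralRow` fields, the sample rows, and every threshold are assumptions chosen for illustration, not values from Google Analytics, so tune them to your own traffic.

```python
from dataclasses import dataclass


@dataclass
class ReferralRow:
    source: str
    sessions: int
    bounce_rate: float          # fraction between 0.0 and 1.0
    avg_session_seconds: float
    new_visitor_ratio: float    # fraction between 0.0 and 1.0


# Hypothetical thresholds, picked only to illustrate the idea.
LOW_BOUNCE = 0.05
HIGH_BOUNCE = 0.95
MIN_AVG_SECONDS = 5.0
MAX_NEW_RATIO = 0.95


def suspicion_reasons(row: ReferralRow) -> list[str]:
    """Return the list of spam signals that a referral source triggers."""
    reasons = []
    if row.bounce_rate <= LOW_BOUNCE or row.bounce_rate >= HIGH_BOUNCE:
        reasons.append("extreme bounce rate")
    if row.avg_session_seconds < MIN_AVG_SECONDS:
        reasons.append("very short sessions")
    if row.new_visitor_ratio >= MAX_NEW_RATIO:
        reasons.append("almost only new visitors")
    return reasons


# Made-up sample data: one plausible referrer and one spammy-looking one.
rows = [
    ReferralRow("partner-blog.example", 120, 0.48, 95.0, 0.62),
    ReferralRow("free-traffic.example", 300, 1.00, 0.0, 1.00),
]

for row in rows:
    reasons = suspicion_reasons(row)
    if reasons:
        print(f"{row.source}: {', '.join(reasons)}")
```

A source that triggers several signals at once deserves a manual look before you add it to an exclusion filter.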
For this website, almost 50% of traffic comes from referrals, and 90% of that referral traffic is useless because it’s spammy. In other words, roughly 45% of all sessions are spam. The only valid source is the blurred-out one…
We need to clean this to get a better view of the data.
Excluding spam traffic in Google Analytics reports
In order to clean up your views, we first need to make sure we have a backup. We are going to keep the old “raw” and polluted view and make a new one to compare them. Unfortunately, the new view will not hold any historical data.
Duplicate your Google Analytics view
- Go to the admin tab and add a new view in your property.
- In this new view, go to View Settings and check “Bot Filtering: Exclude all hits from known bots and spiders”. (This will help, but it will not fix the problem entirely.)
Find your valid hostnames
Sometimes the spammers do a poor job and either do not set a hostname or use a strange one.
In the old “raw” view, go to Audience > Technology > Network and switch to Hostname. Make sure to select a wide time range, for example a year.
You should see a wide variety of URLs here. Now copy all the valid ones; usually your domain name and its aliases.
Then we will create a filter in the new view, which will only include traffic with those valid hostnames.
Go to the Admin tab > Filters and create a new filter with the “Custom” filter type. Select “Include”, choose “Hostname” as the filter field, and fill in the filter pattern. In the case of my website, the filter pattern would be “yanngraf|sylk|archive.org”. Note: Sylk.ch was the initial domain of this site.
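Before saving the filter, it can help to test the pattern against the hostnames you collected. The sketch below assumes the filter pattern behaves like a regular expression that may match anywhere in the hostname, and it uses Python's `re` module as a stand-in for the Google Analytics matcher. The sample hostnames, including the `.example` ones, are made up for illustration.

```python
import re

# The include pattern from the article, used as-is.
HOSTNAME_PATTERN = re.compile(r"yanngraf|sylk|archive.org")


def is_valid_hostname(hostname: str) -> bool:
    """Return True if the hostname would pass the include filter."""
    return bool(hostname) and HOSTNAME_PATTERN.search(hostname) is not None


# Hypothetical hostnames as they might appear in the Network report.
samples = ["sylk.ch", "web.archive.org", "yanngraf.example",
           "(not set)", "spam-site.example"]

for hostname in samples:
    verdict = "keep" if is_valid_hostname(hostname) else "drop"
    print(f"{hostname}: {verdict}")
```

Note that an unescaped dot matches any character, so `archive.org` would also match a hostname such as `archiveXorg`. Writing `archive\.org` makes the pattern stricter.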
Exclude additional Crawler spam
Now we need to go one step further and exclude additional spam from web crawlers. In Filters, create a new custom filter, but this time use the “Exclude” option. Choose “Campaign Source” as the filter field and paste the code below in the field.
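The exact expression depends on which spammers hit your property, so the following is only a sketch of how such a pattern could be assembled from your own list of spam sources. The source names are hypothetical placeholders, and the 255-character limit per filter pattern is an assumption you should check against your own account.

```python
import re

# Assumed maximum length of a single filter pattern.
MAX_PATTERN_LENGTH = 255


def build_exclude_patterns(sources, max_length=MAX_PATTERN_LENGTH):
    """Join escaped sources with '|' and split them into several
    patterns whenever one pattern would exceed max_length."""
    patterns = []
    current = ""
    for source in sources:
        escaped = re.escape(source)  # escape the dots in domain names
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) > max_length and current:
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


# Hypothetical spam sources taken from your own referral report.
spam_sources = ["spamone.example", "fakeseo.example"]

for number, pattern in enumerate(build_exclude_patterns(spam_sources), start=1):
    print(f"Filter {number}: {pattern}")
```

Each resulting pattern would go into its own exclude filter on the Campaign Source field.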
And voilà! Now that your data is clean, it’s time to make sure you start tracking your conversions.
Let me know if you have questions or need help setting up the tracking on your website.