Google Analytics started life as a web server log analysis package developed by Urchin Software, which Google acquired in 2005. Google Analytics quickly came to dominate the website analytics space because it was offered for free at a time when site analytics software like Webtrends and Omniture cost hundreds of thousands of dollars. Practically all small sites and blogs adopted Google Analytics, because they could not afford anything else. Today, Google Analytics holds the vast majority of the site analytics market (76% of the top 1,000 sites, 88% of the top 100k sites).
Publishers, ecommerce sites, and marketers rely on Google Analytics data to make decisions that affect their business. But very few understand the vulnerabilities of Google Analytics and how those put them at risk. As with any technology, certain features may be useful, but they can also be exploited by others for fraudulent gain. In the case of Google Analytics (“GA”), the vulnerability is that any third party can write data into it simply by knowing the site's UA identifier, which is publicly visible in the source code of a webpage. Of course, being able to write data into GA is a “feature”; Google even publishes full online documentation showing how to do it.
However, not requiring any password or other means of authentication in order to write data is a security loophole that has remained unpatched for 16 years, despite the fact that it continues to be actively exploited. Google Analytics 4, which was launched in October 2020, finally adds the most basic form of authentication, the use of an API key, before data can be written. The loopholes in older versions of GA remain, since it takes time for the millions of sites to upgrade to GA4. There are many ongoing exploits of GA, but this article will focus on the ones that directly affect marketers and the business decisions they make based on insights from Google Analytics.
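To make the difference concrete, here is a minimal sketch in Python of what a write request looks like under each version, based on the publicly documented Measurement Protocol. It only builds the request data and sends nothing. The tracking ID, measurement ID, client ID, and secret are made-up placeholders, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Publicly documented Measurement Protocol endpoints.
UA_ENDPOINT = "https://www.google-analytics.com/collect"
GA4_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_ua_hit(tracking_id, client_id, page_path):
    """Encode the body of a Universal Analytics pageview hit.

    Every field is either visible in the page source (tid) or chosen
    freely by the sender (cid, dp). There is no secret anywhere.
    """
    return urlencode({
        "v": "1",            # protocol version
        "tid": tracking_id,  # the UA identifier from the page source
        "cid": client_id,    # arbitrary client identifier
        "t": "pageview",     # hit type
        "dp": page_path,     # document path, any string
    })


def build_ga4_request(measurement_id, api_secret, client_id, event_name):
    """Build the URL and JSON body of a GA4 Measurement Protocol request.

    The api_secret is created in the GA4 admin interface and is not
    exposed in the page source, so the sender has to hold it.
    """
    query = urlencode({"measurement_id": measurement_id,
                       "api_secret": api_secret})
    body = json.dumps({"client_id": client_id,
                       "events": [{"name": event_name, "params": {}}]})
    return f"{GA4_ENDPOINT}?{query}", body


# Placeholder values, for illustration only.
ua_body = build_ua_hit("UA-000000-1", "555", "/home")
ga4_url, ga4_body = build_ga4_request("G-XXXXXXX", "s3cret", "555.1", "page_view")
```

The point of the comparison is simply that the older hit carries nothing the sender could not copy from a public webpage, while the GA4 request needs one value that is not published.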
“Phantom traffic” – the appearance of traffic in GA
Site owners are often desperate for traffic, on the assumption that more traffic means “better.” That gave rise to large criminal enterprises profiting from selling traffic. Of course, the traffic was not from humans, but from bots, because no one can compel large masses of humans to visit specific websites on command. But as long as traffic buyers believed the traffic was from humans visiting their site, they kept buying it. A simple Google search for “buy traffic” turns up 1.7 billion search results, including hundreds of thousands of traffic sellers that you can buy traffic from with a credit card, PayPal, or now cryptocurrencies.
In some cases, traffic sellers don’t even send real bot traffic. After all, why spend the effort building botnets and incurring bandwidth costs from bots actually loading webpages, when you can just trick Google Analytics into showing phantom traffic? This is exactly how GA is being exploited now: fraudsters are sending false data into GA to make it appear they are delivering tons of traffic when they are actually delivering no traffic at all. The video demo below illustrates how this simple exploit can show more than 13,000 simultaneous visitors on a site when, in reality, there is not a single visitor.
“Phantom clicks and sources” – the appearance of performance in GA
The fake data being written into Google Analytics can also be very detailed, using Urchin Tracking Module (UTM) parameters, whose name is a throwback to GA's creator. For example, the perpetrator can write any parameter like “utm_source=Facebook” and GA faithfully records that as a “social” visit. If the URL contains “utm_medium=cpc” it is labeled as paid search; if “referrer=google” it is labeled as organic search, and so on. Note that in the video example above, the social traffic is marked as “Instagram Stories, Facebook, and Twitter” even though all of it was fake, and the “active pages” are literally nonsensical strings of letters and numbers, to illustrate that anything can be passed into any field in GA. These are all examples of false data written into GA; not a single visit was real.
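The reason this works is that UTM parameters are just text in a URL that the analytics tool reads at face value. The sketch below, in Python, shows the idea. The classification rules are a deliberately simplified illustration and are not GA's actual channel grouping logic; the function names and the list of social sources are assumptions made for the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list of sources treated as "social" in this toy example.
SOCIAL_SOURCES = {"facebook", "instagram", "twitter"}


def extract_utm(url):
    """Return the utm_* parameters found in a URL, exactly as supplied."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()
            if key.startswith("utm_")}


def classify_channel(utm):
    """Toy channel rules (NOT GA's real ones): trust whatever was sent."""
    if not utm:
        return "direct"
    medium = utm.get("utm_medium", "").lower()
    source = utm.get("utm_source", "").lower()
    if medium in {"cpc", "ppc", "paidsearch"}:
        return "paid search"
    if medium == "social" or source in SOCIAL_SOURCES:
        return "social"
    return "other"


# Anyone can type these parameters; nothing verifies where the visit came from.
tags = extract_utm("https://example.com/?utm_source=Facebook&utm_medium=cpc&x=1")
channel = classify_channel(tags)
```

Nothing in this flow checks that a click on Facebook or a paid search ad ever happened. The label is whatever the sender wrote.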
This technique is also how fake traffic sellers advertise their services; it’s called “referral spam.” Instead of email spam, the most efficient way to get in front of potential customers looking for more traffic for their sites is by inserting data right into their GA. The screenshot below shows some classic examples like “referrer=www.Get-Free-Traffic-Now[.]com.” When the analytics folks see that, they get curious and visit the site. Some of them turn into customers of the fake traffic seller. Look at the thousands of traffic-selling vendors in this handy compilation.
Marketers who use traffic numbers to gauge the performance of their digital marketing campaigns should also be aware of these vulnerabilities of Google Analytics and how they are being exploited. Some of the “performance” you see in GA may be from bots clicking on your ads, and some of it could be phantom traffic. These exploits may remain hidden for years. But when fraudsters mess up, the exploits come to light because the numbers are obviously not real. For example, some marketers have seen click-through rates greater than 100%, meaning more clicks arrived on their site than there were ad impressions. Some have seen click-throughs to their sites even after campaigns were turned off entirely. Marketers may also see a lot of traffic but very few sales. That may be a symptom of the problems mentioned above.
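The two telltale signs described above, more clicks than impressions and clicks arriving while a campaign is off, are easy to check for once impressions and clicks sit side by side. Here is a minimal sketch in Python. The record layout and the sample numbers are hypothetical, and a real check would pull impressions from the ad platform and clicks from site analytics.

```python
from dataclasses import dataclass


@dataclass
class CampaignDay:
    campaign: str
    impressions: int  # ad impressions reported by the ad platform
    clicks: int       # click-throughs recorded on the site
    active: bool      # whether the campaign was running that day


def find_anomalies(rows):
    """Flag days whose numbers cannot be explained by real ad clicks."""
    findings = []
    for row in rows:
        if not row.active and row.clicks > 0:
            findings.append((row.campaign, "clicks while campaign was off"))
        elif row.clicks > row.impressions:
            findings.append((row.campaign, "more clicks than impressions"))
    return findings


# Hypothetical sample data.
rows = [
    CampaignDay("spring_sale", impressions=1000, clicks=30, active=True),
    CampaignDay("display_b", impressions=100, clicks=140, active=True),
    CampaignDay("paused_c", impressions=0, clicks=12, active=False),
]
anomalies = find_anomalies(rows)
```

A clean result from a check like this does not prove the traffic is real; it only catches the cases where the fraudsters slipped up.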
If marketers include their campaign names and IDs in UTM codes, those are “in the clear” and can be copied and replayed to make it appear that visits came from those campaigns. More specifically, the bots used for digital ad fraud are tuned to click ads at a rate of between 1% and 9% to give the appearance of performance. The bots can either actually click the ads and come to the site, or they can insert false data into GA to make it appear that it happened. This is usually enough to trick marketers into allocating more budget to those campaigns because they appear to perform so well. Hopefully this answers the “why?” question that marketers might have: why do fraudsters bother messing with my Google Analytics? So that you allocate more money to the campaigns you run with them.
“Phantom sales” – the appearance of sales in GA
You should be sitting down for this next part. For years, marketers tightened up their digital marketing to reduce waste and risk and increase performance. Some marketers moved away from paying for ad impressions, citing ad fraud risk, and only paid for clicks. But they came to realize that the clicks were faked by bots too. So they shifted away from paying for clicks and moved to paying for performance — leads (cost per lead), installs (cost per install), or sales (affiliate revenue share). But they came to realize leads were easily faked and install fraud and affiliate fraud (i.e. cookie stuffing) also ran rampant. See: How Has Affiliate Fraud Evolved To Rip Performance Marketers Off? and One Of Uber’s Lawsuits Against Ad Fraud Comes Full Circle—They Won.
What performance marketers may not fully grasp yet is that even sales can be faked. No, it doesn’t mean bots actually pay for stuff. This form of fraud is where the perpetrators claim credit for sales that have already happened or would have happened anyway. Many retailers and DTC (direct-to-consumer) brands use a form of digital marketing called remarketing. As opposed to retargeting, which targets ads at users who visited a site before, remarketing campaigns target ads at users who have purchased from a site before. The theory behind it is to get users to buy again, buy more, and buy more frequently. However, there’s a rampant form of fraud hidden in plain sight — remarketing vendors claiming credit for sales that have already happened. How does this happen? They do so by exploiting the loophole in Google Analytics – being able to write false data into GA – described above.
Let’s illustrate this with a concrete example. A consumer who has purchased from macys.com before is likely going to buy from the site again, because they know and like the retailer. In a future visit, they type in macys.com to go to the site. This is called a “direct” visit in Google Analytics. If the user looks at 20 pages and then completes a purchase, this purchase is an “organic” one, meaning the user didn’t see an ad, click on it and make a purchase as a result of it. Remarketing vendors exploit the GA “feature” that allows them to write false data – a fake click that makes it appear that the user came to the site after clicking on one of the vendor’s ads, run on behalf of the retailer (ever wonder why they don’t let you tag the ads themselves?). More specifically, they record which visits result in a purchase and note the session identifier (See: “cid” exfiltration, documented by security researcher Dr. Krzysztof Franaszek). By inserting false clicks into specific sessions that ended in purchases, remarketing vendors can turn the 20-pageview direct visit into a 21-pageview visit that appears to have come from a click on an ad in their remarketing program. The remarketing vendor has thus claimed credit for a sale that had already happened.
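One way to look for this pattern is to check whether a campaign tag first shows up deep inside a session that began without one. The Python sketch below is a simplified heuristic, not a description of how any vendor or GA report works. The `Hit` record, the campaign name, and the threshold of five prior pageviews are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    kind: str                       # "pageview" or "purchase"
    campaign: Optional[str] = None  # campaign tag carried by this hit, if any


def late_campaign_click(hits: List[Hit], min_prior_pageviews: int = 5):
    """Return the campaign name if its first tagged hit appears only after
    many untagged pageviews in the same session; otherwise return None."""
    prior_pageviews = 0
    for hit in hits:
        if hit.campaign is not None:
            # A genuine ad click normally starts the session, so a tag that
            # first appears this late deserves a closer look.
            if prior_pageviews >= min_prior_pageviews:
                return hit.campaign
            return None
        if hit.kind == "pageview":
            prior_pageviews += 1
    return None


# The example from the text: 20 direct pageviews, then a "click", then the sale.
suspicious = ([Hit("pageview")] * 20
              + [Hit("pageview", campaign="remarketing_q4"), Hit("purchase")])
normal = [Hit("pageview", campaign="remarketing_q4"), Hit("pageview"), Hit("purchase")]

flagged = late_campaign_click(suspicious)
not_flagged = late_campaign_click(normal)
```

A flagged session is not proof of fraud on its own, since a shopper can legitimately click an ad mid-visit. A vendor whose credited sales are dominated by sessions like this is worth questioning.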
Note that similar exploits have been documented at the intersection of influencer and affiliate fraud: influencers insert false data into marketers’ Google Analytics to make it appear they are able to drive lots of traffic, which helps them secure paid sponsorship and affiliate marketing deals. Once those are secured, influencers use affiliate links to claim credit for driving what would have been organic sales, so the marketer ends up paying twice for sales that would have happened anyway! Sweet, sweet moolah for the influencer, though. And note the huge uptick in affiliate fraud in 2020, as more people were stuck at home and increased their online shopping dramatically.
What can marketers do if they suspect this is happening to them? Do what Kevin Frisch, Head of Performance Marketing and CRM at Uber, did. He saved Uber millions of dollars when he discovered the pervasive cost-per-install fraud that was ripping off Uber. He paused the ad spending, and the app installs kept happening. Those were organic installs that the mobile exchanges fraudulently claimed credit for, so they could get paid the CPI. In the slide below, the green area is the ad spend. When the spend was paused, notice that the blue line (organic signups) rose to the exact level of the red line (paid signups) before the drop. This shows that the installs claimed to have come from paid channels were actually organic installs (customers installed the Uber app because they wanted to, not because they saw an ad and clicked on it). The mobile exchanges were falsely claiming credit for organic installs by tricking the attribution reporting. This is equivalent to remarketing vendors claiming credit for sales that have already happened by inserting false data into their own clients’ Google Analytics. This is also why remarketing programs appear to perform many times better than any other form of digital marketing. They only appear to, because of fraud hidden in plain sight. Do you have the courage to stop this form of fraud ripping off your company?
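The arithmetic behind a pause test like this is simple. If pausing the spend makes the paid conversions vanish but organic conversions rise by nearly the same amount, those paid conversions were organic all along. Here is a rough sketch in Python. The function name and the daily figures are hypothetical and are not Uber's actual numbers.

```python
def reabsorbed_share(paid_before, organic_before, organic_during_pause):
    """Estimate what share of 'paid' conversions reappeared as organic
    once the ad spend was paused.

    All three inputs are average daily conversion counts. The result is
    clamped to the range 0.0 to 1.0.
    """
    if paid_before <= 0:
        raise ValueError("paid_before must be positive")
    organic_gain = organic_during_pause - organic_before
    return max(0.0, min(1.0, organic_gain / paid_before))


# Hypothetical daily averages: 1,000 paid and 200 organic signups before
# the pause, then 1,150 organic signups while the spend was paused.
share = reabsorbed_share(paid_before=1000, organic_before=200,
                         organic_during_pause=1150)
```

In this made-up case, 95% of the signups previously credited to paid channels kept arriving with the spend turned off. A test like this should run long enough to rule out ordinary day-to-day swings before anyone draws conclusions.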