
Is there a set of well-known tracking parameters besides utm_*?

@Marchetta884

Posted in: #QueryString #UtmParameters

I am collecting a large number of URLs. I am not responsible for the websites in question, and I want to remove tracking parameters that do not affect the content of the website. With the tracking parameters, it's impossible to identify two URLs that should be considered equal.

For example, if I have the following links:

example.com/blog/post1?utm_xyz=1234
http://example.com/blog/post1?utm_xyz=5678
example.net/viewblog?post_id=2&utm_xyz=9999

I want to convert to the equivalent canonical-type URLs:

example.com/blog/post1
http://example.com/blog/post1
example.net/viewblog?post_id=2

The first two are for the same content, but have different tracking parameters. The last example illustrates why I can't just remove all query parameters.

The most common of these are the utm_ ones, but I have also found:


Piwik: pk_campaign and pk_kwd
WebTrends: WT.nav, WT.mc_id
unknown, maybe Apple: campaign_id
Wikimedia: wprov
HootSuite: hootPostID


Is there a well-known list of these query parameters that I can safely remove?

(I am using the canonical URLs where they are supplied in the HTML metadata, but I want to use this approach when none is supplied.)
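
For illustration, the kind of stripping I have in mind could look roughly like the Python sketch below. It is only a sketch under assumptions: the function name `strip_tracking` and the two lists are hypothetical, they contain just the parameters named above, and treating `campaign_id` as tracking is a guess that could remove a meaningful parameter on some sites.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical, incomplete lists built only from the parameters named above.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {
    "pk_campaign", "pk_kwd",   # Piwik
    "wt.nav", "wt.mc_id",      # WebTrends (compared in lower case)
    "campaign_id",             # origin unknown; may be meaningful on some sites
    "wprov",                   # Wikimedia
    "hootpostid",              # HootSuite (compared in lower case)
}

def strip_tracking(url):
    """Return url with known tracking parameters removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not (
            name.lower().startswith(TRACKING_PREFIXES)
            or name.lower() in TRACKING_NAMES
        )
    ]
    # Rebuild the URL; an empty query drops the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))

for url in (
    "example.com/blog/post1?utm_xyz=1234",
    "http://example.com/blog/post1?utm_xyz=5678",
    "example.net/viewblog?post_id=2&utm_xyz=9999",
):
    print(strip_tracking(url))
```

Rebuilding the query with `urlencode` may normalise the percent-encoding of the remaining parameters, so the result is meant for comparing URLs rather than for reproducing the original byte for byte.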


1 Comment


@Annie201

I guess your intention is to clean the scraped URLs.

You can refer to articles on best practices for using UTM parameters. Commonly used values for utm_medium follow the naming conventions used in Google Analytics, such as social, referral, and email.

Ultimately, there is no reliable way to do this from a fixed list of keywords, because the parameter names can be anything.

You will have a better chance of sanitising your results by using a regex to detect and remove any UTM parameters.

For a URL like example.com?utm_source=facebook&utm_medium=social&utm_campaign=book-launch-2014 you need to search for the following parameters and replace them with nothing:


utm_source
utm_medium
utm_campaign
utm_term
utm_content
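
A minimal sketch of that search-and-replace in Python, assuming parameters are separated by `&` and that anything whose name starts with `utm_` is safe to drop. The names `UTM_PAIR` and `remove_utm` are made up for this example, and it handles only the utm_ family, not other vendors' parameters.

```python
import re

# One utm_* parameter (with optional value) that directly follows "?" or "&",
# plus the "&" after it if there is one.
UTM_PAIR = re.compile(r"(?<=[?&])utm_[^=&#]*(?:=[^&#]*)?&?", re.IGNORECASE)

# A "?" or "&" left dangling at the end of the query (before "#" or end of string).
DANGLING = re.compile(r"[?&]+(?=#|$)")

def remove_utm(url):
    """Replace every utm_* parameter with nothing, then tidy leftover separators."""
    cleaned = UTM_PAIR.sub("", url)
    return DANGLING.sub("", cleaned)

print(remove_utm(
    "example.com?utm_source=facebook&utm_medium=social&utm_campaign=book-launch-2014"
))
```

Matching on the `utm_` prefix rather than the five names above also catches non-standard variants, at the cost of removing any legitimate parameter that happens to start with `utm_`.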
