I've been trying to remove all utm_* parameters from a list of URLs.
The closest thing I've found is this: https://gist.github.com/626834.
Any ideas?
Solution
It's a bit long, but it uses the url* modules (urllib and urlparse) and avoids regular expressions.
from urllib import urlencode
from urlparse import urlparse, parse_qs, urlunparse

url = 'http://whatever.com/somepage?utm_one=3&something=4&utm_two=5&utm_blank&something_else'
parsed = urlparse(url)
# keep_blank_values=True keeps valueless keys such as something_else
qd = parse_qs(parsed.query, keep_blank_values=True)
# Drop every parameter whose name starts with utm_
filtered = dict((k, v) for k, v in qd.iteritems() if not k.startswith('utm_'))
newurl = urlunparse([
    parsed.scheme,
    parsed.netloc,
    parsed.path,
    parsed.params,
    urlencode(filtered, doseq=True),  # query string
    parsed.fragment
])
print newurl
# 'http://whatever.com/somepage?something=4&something_else='
# (urlencode writes blank values with a trailing '='; Python 2 dicts are
# unordered, so the parameter order may differ)
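The code above targets Python 2. On Python 3 the same functions live in urllib.parse, so the approach might look roughly like the sketch below, applied to a whole list as the question asks. The helper name strip_utm and the sample URLs are made up for illustration, and the sketch swaps parse_qs for parse_qsl so that the original parameter order and repeated keys are preserved.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


def strip_utm(url):
    """Return url with every query parameter starting with 'utm_' removed.

    strip_utm is a hypothetical helper name, not part of any library.
    """
    parsed = urlparse(url)
    # parse_qsl yields (key, value) pairs in their original order;
    # keep_blank_values=True keeps valueless keys such as 'something_else'.
    pairs = parse_qsl(parsed.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if not k.startswith('utm_')]
    # ParseResult is a namedtuple, so _replace swaps in the new query string.
    return urlunparse(parsed._replace(query=urlencode(kept)))


# Sample URLs invented for this example.
urls = [
    'http://whatever.com/somepage?utm_one=3&something=4&utm_two=5&utm_blank&something_else',
    'http://example.com/?utm_source=x',
]
cleaned = [strip_utm(u) for u in urls]
for u in cleaned:
    print(u)
```

As in the original, a blank value is written back with a trailing '=' (something_else becomes something_else=), and a URL whose parameters were all utm_* ends up with no query string at all.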