This is a website for the Pro Python System Administration book, written by Rytis Sileika and published by Apress. Some of the projects described in the book are available on GitHub.
Latest blog post
Quick way to check for URL redirects
13 Mar 2013
Recently I had to go through a list of URLs and identify which of them redirect to another URL. Since the list was rather long (several hundred entries), I decided to write a script to do it for me.
I used the Requests library to make the initial request for a given URL. If the response code was 3XX, I assumed the URL was being redirected. The script also attempts to cover a couple of edge cases: timeouts and generic connection errors (caused either by a non-existent domain or by a network error).
The script accepts a single parameter, which should be the name of a file containing URL strings, one per line. If the filename is not provided, the script will look for domains.txt in the current directory.
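#!/usr/bin/env python
import sys

import requests


def check_for_redirects(url):
    """Return the redirect target of url, or a bracketed status label."""
    try:
        # Do not follow redirects; only the first response is inspected.
        r = requests.get(url, allow_redirects=False, timeout=0.5)
        if 300 <= r.status_code < 400:
            # Some 3XX responses (e.g. 304) carry no Location header.
            return r.headers.get('location', '[no location header]')
        else:
            return '[no redirect]'
    except requests.exceptions.Timeout:
        return '[timeout]'
    except requests.exceptions.ConnectionError:
        return '[connection error]'


def check_domains(urls):
    for url in urls:
        # Assume plain HTTP when the entry has no scheme.
        url_to_check = url if url.startswith(('http://', 'https://')) else "http://%s" % url
        redirect_url = check_for_redirects(url_to_check)
        print("%s => %s" % (url_to_check, redirect_url))


if __name__ == '__main__':
    # Use the file named on the command line, or fall back to domains.txt.
    fname = 'domains.txt'
    try:
        fname = sys.argv[1]
    except IndexError:
        pass
    with open(fname) as f:
        # Skip blank lines so they are not turned into a bare "http://".
        urls = [line.strip() for line in f if line.strip()]
    check_domains(urls)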
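The decision the script makes for each response can also be pulled out into a pure function, which makes it easy to try without any network access. The following is only a sketch, assuming Python 3: the name describe_response and the '[no location header]' label are made up for illustration and are not part of the original script. It also resolves a relative Location value against the requested URL, since servers are allowed to send one.

```python
from urllib.parse import urljoin


def describe_response(url, status_code, headers):
    """Return the redirect target for a 3XX response, or a bracketed label.

    `headers` is any mapping of header names to values; the lookup below
    is case-insensitive so a plain dict works as well as Requests' headers.
    """
    if not 300 <= status_code < 400:
        return '[no redirect]'
    location = next(
        (value for name, value in headers.items() if name.lower() == 'location'),
        None,
    )
    if location is None:
        # For example 304 Not Modified: a 3XX code that is not a redirect.
        return '[no location header]'
    # Turn a relative Location such as "/us/en/" into an absolute URL.
    return urljoin(url, location)


if __name__ == '__main__':
    print(describe_response('http://example.com/a/b', 301, {'Location': '/c'}))
```

With a real response r from Requests, the call would be describe_response(url, r.status_code, r.headers).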
Below is an example input file. If the protocol is not specified, http is assumed.
google.com
http://gmail.com
www.ibm.com
redhat.com
example.com
i-believe-this-domain-does-not-exist-123abc.com
And the results:
http://google.com => http://www.google.com/
http://gmail.com => http://mail.google.com/mail/
http://www.ibm.com => http://www.ibm.com/us/en/
http://redhat.com => http://www.redhat.com/
http://example.com => http://www.iana.org/domains/example/
http://i-believe-this-domain-does-not-exist-123abc.com => [connection error]
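The results above show only the first hop, because the script passes allow_redirects=False. If the full chain is of interest, one possible approach is to repeat the same check on each target until a non-redirect response comes back. The sketch below assumes Python 3 and keeps the network call out of the picture: follow_chain and its fetch argument are hypothetical names, and fetch stands for any callable that returns a (status_code, location) pair for a URL, for example a thin wrapper around requests.get.

```python
from urllib.parse import urljoin


def follow_chain(url, fetch, max_hops=10):
    """Return the list of URLs visited, starting with `url`.

    `fetch(url)` must return (status_code, location_or_None).
    At most `max_hops` calls to `fetch` are made.
    """
    chain = [url]
    while len(chain) <= max_hops:
        status, location = fetch(chain[-1])
        if not 300 <= status < 400 or not location:
            break
        next_url = urljoin(chain[-1], location)
        seen_before = next_url in chain
        chain.append(next_url)
        if seen_before:
            # Redirect loop: record the repeated URL and stop.
            break
    return chain


if __name__ == '__main__':
    # Canned responses standing in for real HTTP requests.
    fake = {
        'http://a.test': (301, 'http://www.a.test/'),
        'http://www.a.test/': (302, '/home'),
        'http://www.a.test/home': (200, None),
    }
    print(' => '.join(follow_chain('http://a.test', fake.get)))
```

Passing the fetch function in as an argument keeps the loop testable with canned data; the max_hops limit and the loop check guard against sites that redirect endlessly.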