EvilZone
Topic started by: DamonX on April 20, 2013, 04:11:25 AM
-
Hi,
I am working on an HTTP download client in Python and need a little assistance.
I get the URL from a command-line argument (./clientprogram www.google.com/images/test.png) and then need to split that URL into host, path, and filename. I am only downloading images and displaying them on screen, though.
Here is my code so far:
import string
import socket
import sys
import os
from subprocess import call
from urllib.parse import urlparse
# ******************************************
#
# (1) Test input arguments to program - correct number provided?
# Exit if the required URL is not provided.
# (2) Split URL into "host", "path", and "filename" variables.
# e.g. http://www.google.com/images/test.png
# * host=www.google.com
# * path=/images/
# * file=test.png
# host=????
# path=????
# filename=????
# port=????
print("Preparing to download object from http://" + host + path + filename)
print()
How do I split the URL? It is easy if the URL is hardcoded, but I am not sure how to do it when we don't know what URL the user will provide.
Thanks
Damon
-
count the /
before first / is domain
between first and last are paths
after last is file
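The slash-counting rule above can be sketched roughly as follows. This is only a minimal illustration, not necessarily what the poster had in mind: split_url is a hypothetical helper name, and stripping a leading "http://" is an added assumption, since the argument may be typed with or without it.

```python
def split_url(url):
    """Split 'host/dir/.../file' at the first and the last '/'."""
    # Assumption: drop a leading scheme such as "http://" if present.
    if "://" in url:
        url = url.split("://", 1)[1]
    # Everything before the first '/' is the host.
    host, slash, rest = url.partition("/")
    rest = slash + rest  # put the leading '/' back on the path
    # Everything after the last '/' is the file name.
    path, slash, filename = rest.rpartition("/")
    return host, path + slash, filename


print(split_url("www.google.com/images/test.png"))
```

Unlike fixed list indices, this keeps working when there are several directories between the host and the file.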
-
>>> path = "www.google.com/images/test.png"
>>> pathparts = path.split('/')
>>> for part in pathparts:
...     print part
...
www.google.com
images
test.png
>>> host = pathparts[0]
>>> path = pathparts[1]
>>> filename = pathparts[2]
>>> print host
www.google.com
>>> print path
images
>>> print filename
test.png
>>>
-
You could also see this link for some routines: http://docs.python.org/2/library/os.path.html#module-os.path
-
I had exactly the same thing in mind.
url="www.google.nl/images/test.png"
for i in url.split("/"):
    print i
Output:
www.google.nl
images
test.png
-
Exactly :) and the print can be done a bit cleaner this way
>>> print("Preparing to download object from http://%s/%s/%s" %(host, path, filename))
Preparing to download object from http://www.google.com/images/test.png
-
Use urlparse. It takes care of every case you might not think of right now.
Example:
from urlparse import urlparse
result = urlparse('http://evilzone.org/scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/#new')
print "scheme", result.scheme
print "netloc", result.netloc
print "path", result.path
print "params", result.params
print "query", result.query
print "fragment", result.fragment
Output:
deque@decra:~/Dokumente/python$ python url.py
scheme http
netloc evilzone.org
path /scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/
params
query
fragment new
Edit: In Python 3 the module is named urllib.parse.
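For Python 3, a minimal sketch of the same idea applied to the original question might look like this. The helper name split_url, the default port of 80, and the step that prepends "http://" are assumptions added for illustration. urlparse only fills in netloc when the URL carries a scheme, so a bare www.google.com/images/test.png would otherwise end up entirely in path.

```python
from urllib.parse import urlparse


def split_url(url, default_port=80):
    # Assumption: a bare "host/path" argument is meant as plain HTTP.
    if "://" not in url:
        url = "http://" + url
    parts = urlparse(url)
    host = parts.hostname              # netloc without any ":port" suffix
    port = parts.port or default_port  # parts.port is None if none was given
    # Split the path at its last '/' into directory and file name.
    directory, _, filename = parts.path.rpartition("/")
    return host, port, directory + "/", filename


host, port, path, filename = split_url("www.google.com/images/test.png")
print("Preparing to download object from http://" + host + path + filename)
```

The query and fragment are ignored here; they are still available as parts.query and parts.fragment if the request needs them.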
-
Wow ... can't believe how many people replied within such a short period of time. This is even better than Stack Overflow. :) I will try your suggestions and let you know how it goes.
Thanks all
Damon
-
Thanks, I had to make a small modification, but I was able to do it by also using basename() and dirname().
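The exact modification is not shown in the thread, so the following is only a guess at how the basename() and dirname() route could look. posixpath is used here instead of os.path as an assumption, so that the forward-slash handling is the same on every platform.

```python
import posixpath
from urllib.parse import urlparse

url = "http://www.google.com/images/test.png"
parts = urlparse(url)

host = parts.netloc                          # "www.google.com"
directory = posixpath.dirname(parts.path)    # everything before the last '/'
filename = posixpath.basename(parts.path)    # everything after the last '/'

print("host:", host)
print("path:", directory + "/")
print("file:", filename)
```

Note that dirname() drops the trailing slash, so it is added back when printing.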