EvilZone

Programming and Scripting => Scripting Languages => Topic started by: DamonX on April 20, 2013, 04:11:25 AM

Title: Split any URL into "host", "path", and "filename" variables (Python)
Post by: DamonX on April 20, 2013, 04:11:25 AM
Hi,


I am working on creating an HTTP download client in Python and need a little assistance.


I am getting the URL from a command line argument (./clientprogram www.google.com/images/test.png) and then need to split that URL into host, path, and filename.  I am only downloading and displaying images on screen tho.


Here is my lil code:



Code: [Select]
import string
import socket
import sys
import os
from subprocess import call
from urllib.parse import urlparse


# ******************************************
#
#  (1) Test input arguments to program - correct number provided?
#      Exit if the required URL is not provided.
#  (2) Split URL into "host", "path", and "filename" variables.
#      http://www.google.com/images/test.png
#      * host=www.google.com
#      * path=/images/
#      * file=test.png


# host=????
# path=????
# filename=????
# port=????


print("Preparing to download object from http://" + host + path + filename)
print()

How do I split the URL?  It's easy to do if the URL is hardcoded, but I'm not sure how to do it when we don't know what URL the user will provide.


Thanks


Damon
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: relax on April 20, 2013, 06:39:59 AM
count the /
before first / is domain
between first and last are paths
after last is file
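
A rough sketch of that slash-counting idea in Python 3, in case it helps. The function name `split_by_slashes` is made up for illustration, and it assumes the URL has no query string or fragment:

```python
def split_by_slashes(url):
    """Split 'host/dir/.../file' on its first and last slash."""
    # Drop a leading scheme such as "http://" if the user supplied one.
    if "://" in url:
        url = url.split("://", 1)[1]
    # Everything before the first "/" is the host.
    host, _, rest = url.partition("/")
    # Everything after the last "/" is the filename; the middle is the path.
    directory, _, filename = rest.rpartition("/")
    path = "/" + directory + "/" if directory else "/"
    return host, path, filename


print(split_by_slashes("www.google.com/images/srpr/logo3w.png"))
# ('www.google.com', '/images/srpr/', 'logo3w.png')
```

Because it only looks at the first and last slash, it keeps working when there is more than one directory level, or none at all.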

Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: RedBullAddicted on April 20, 2013, 07:37:26 AM
Code: (python) [Select]
>>> path = "www.google.com/images/test.png"
>>> pathparts = path.split('/')
>>> for part in pathparts:
...     print part
...
www.google.com
images
test.png
>>> host = pathparts[0]
>>> path = pathparts[1]
>>> filename = pathparts[2]
>>> print host
www.google.com
>>> print path
images
>>> print filename
test.png
>>>
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: Kulverstukas on April 20, 2013, 08:01:06 AM
You could also see this link for some routines: http://docs.python.org/2/library/os.path.html#module-os.path
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: proxx on April 20, 2013, 08:17:46 AM
I had exactly the same thing in mind.
Code: [Select]
url="www.google.nl/images/test.png"
for i in url.split("/"):
        print i
Output:
Code: [Select]
www.google.nl
images
test.png
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: RedBullAddicted on April 20, 2013, 08:30:17 AM
Exactly :) and the print can be done a bit cleaner this way

Code: (python) [Select]
>>> print("Preparing to download object from http://%s/%s/%s" %(host, path, filename))
Preparing to download object from http://www.google.com/images/test.png
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: Deque on April 20, 2013, 08:44:27 AM
Use urlparse. It takes care of every case you might not think of right now.
Example:

Code: [Select]
from urlparse import urlparse

result = urlparse('http://evilzone.org/scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/#new')
print "scheme", result.scheme
print "netloc", result.netloc
print "path", result.path
print "params", result.params
print "query", result.query
print "fragment", result.fragment

Output:

Quote
deque@decra:~/Dokumente/python$ python url.py
scheme http
netloc evilzone.org
path /scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/
params
query
fragment new

Edit: In Python 3 the module is named urllib.parse
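
One thing to watch with input like `www.google.com/images/test.png`: without a scheme, `urlparse` leaves `netloc` empty and puts the whole string in `path`. A minimal Python 3 sketch that works around this is below. The helper name `parse_user_url` and the default of port 80 are assumptions for a plain-HTTP client, not something from the original post:

```python
from urllib.parse import urlparse


def parse_user_url(url):
    """Return (host, port, path) for a URL typed with or without a scheme."""
    # urlparse only fills in netloc when the URL has a scheme (or starts
    # with "//"), so prepend one if the user left it out.
    if "://" not in url:
        url = "http://" + url
    result = urlparse(url)
    # Assumption: plain HTTP, so fall back to port 80 when none is given.
    port = result.port or 80
    return result.hostname, port, result.path


print(parse_user_url("www.google.com/images/test.png"))
# ('www.google.com', 80, '/images/test.png')
```

`result.hostname` is used instead of `result.netloc` so that an explicit port such as `example.com:8080` does not end up inside the host string.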
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: DamonX on April 20, 2013, 07:18:32 PM
wow ... can't believe how many people replied within such a short period of time.  This is even better than Stack Overflow.  :)  I will try your suggestions and will let u know how it goes.

Thanks all

Damon
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: DamonX on April 21, 2013, 10:55:59 PM
Thanks, I had to do a lil modification but I was able to do it by also using basename() and dirname().
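
For anyone finding this later, here is roughly what that combination can look like in Python 3. This is only a sketch, not my exact code: the function name `split_url` is made up, and it uses `posixpath` (the "/"-only flavour of `os.path`) on the assumption that URL paths should be split on "/" regardless of the operating system:

```python
import posixpath
from urllib.parse import urlparse


def split_url(url):
    """Split a URL into (host, path, filename), e.g. for an HTTP GET."""
    # Add a scheme when it is missing so urlparse can find the host.
    if "://" not in url:
        url = "http://" + url
    parts = urlparse(url)
    # dirname() gives the directory part, basename() the last component.
    directory = posixpath.dirname(parts.path)
    filename = posixpath.basename(parts.path)
    # Keep a trailing slash so host + path + filename rebuilds the URL.
    path = directory if directory.endswith("/") else directory + "/"
    return parts.hostname, path, filename


host, path, filename = split_url("www.google.com/images/test.png")
print("Preparing to download object from http://" + host + path + filename)
# Preparing to download object from http://www.google.com/images/test.png
```

The URL can come from `sys.argv[1]` after checking that `len(sys.argv)` is at least 2.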