EvilZone

Programming and Scripting => Scripting Languages => Topic started by: DamonX on April 20, 2013, 04:11:25 AM

Title: Split any URL into "host", "path", and "filename" variables (Python)
Post by: DamonX on April 20, 2013, 04:11:25 AM: Hi,

I am working on creating an HTTP download client in Python and need a little assistance.

I am getting the URL from a command-line argument (./clientprogram www.google.com/images/test.png) and then need to split that URL into host, path, and filename. I am only downloading and displaying images on screen, though.

Here is my lil code:

Code: [Select]
import string
import socket
import sys
import os
from subprocess import call
from urllib.parse import urlparse

# ******************************************
#
# (1) Test input arguments to program - correct number provided?
#     Exit if the required URL is not provided.
# (2) Split URL into "host", "path", and "filename" variables.
#     http://www.google.com/images/srpr/logo3w.png
#     * host=www.google.com
#     * path=/images/
#     * file=test.png

# host=????
# path=????
# filename=????
# port=????

print("Preparing to download object from http://" + host + path + filename)
print()
How do I split the URL? It's easy if the URL is hardcoded, but I'm not sure how to do it when we don't know what URL the user will provide.

Thanks

Damon
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: relax on April 20, 2013, 06:39:59 AM: count the /
before first / is domain
between first and last are paths
after last is file
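A minimal sketch of the slash-counting idea above, assuming the input has no scheme prefix (no "http://") and no query string. The function name split_by_slashes is made up for illustration, and it returns the path with a leading and trailing slash to match the format asked for in the first post.

```python
def split_by_slashes(url):
    """Split 'host/dir/.../file' on its first and last slash."""
    # Everything before the first slash is the host.
    host, _, rest = url.partition("/")
    # Everything after the last slash is the filename;
    # whatever sits in between is the directory path.
    directory, _, filename = rest.rpartition("/")
    # Normalise the path so it always starts and ends with "/".
    path = "/" + directory + "/" if directory else "/"
    return host, path, filename


print(split_by_slashes("www.google.com/images/test.png"))
print(split_by_slashes("www.google.com/images/srpr/logo3w.png"))
```

Because it splits on the first and last slash rather than on fixed positions, it keeps working when the path has more than one directory level.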
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: RedBullAddicted on April 20, 2013, 07:37:26 AM: Code: (python) [Select]
>>> path = "www.google.com/images/test.png"
>>> pathparts = path.split('/')
>>> for part in pathparts:
...     print part
...
www.google.com
images
test.png
>>> host = pathparts[0]
>>> path = pathparts[1]
>>> filename = pathparts[2]
>>> print host
www.google.com
>>> print path
images
>>> print filename
test.png
>>>
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: Kulverstukas on April 20, 2013, 08:01:06 AM: You could also see this link for some routines: http://docs.python.org/2/library/os.path.html#module-os.path
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: proxx on April 20, 2013, 08:17:46 AM: Quote from: RedBullAddicted on April 20, 2013, 07:37:26 AM
Code: (python) [Select]
>>> path = "www.google.com/images/test.png"
>>> pathparts = path.split('/')
>>> for part in pathparts:
...     print part
...
www.google.com
images
test.png
>>> host = pathparts[0]
>>> path = pathparts[1]
>>> filename = pathparts[2]
>>> print host
www.google.com
>>> print path
images
>>> print filename
test.png
>>>

I had exactly the same thing in mind.
Code: [Select]
url = "www.google.nl/images/test.png"
for i in url.split("/"):
    print i
Output:
Code: [Select]
www.google.nl
images
test.png
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: RedBullAddicted on April 20, 2013, 08:30:17 AM: Exactly :) and the print can be done a bit cleaner this way

Code: (python) [Select]
>>> print("Preparing to download object from http://%s/%s/%s" % (host, path, filename))
Preparing to download object from http://www.google.com/images/test.png
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: Deque on April 20, 2013, 08:44:27 AM: Use urlparse. It takes care of the cases you might not think of right now.
Example:

Code: [Select]
from urlparse import urlparse

result = urlparse('http://evilzone.org/scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/#new')
print "scheme", result.scheme
print "netloc", result.netloc
print "path", result.path
print "params", result.params
print "query", result.query
print "fragment", result.fragment
Output:

Quote
deque@decra:~/Dokumente/python$ python url.py
scheme http
netloc evilzone.org
path /scripting-languages/split-any-url-into-%27host%27-%27path%27-and-%27filename%27-variables-%28python%29/new/
params
query
fragment new

Edit: In Python 3 the module is named urllib.parse (from urllib.parse import urlparse).
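For Python 3, the same call might look like the sketch below. One thing to watch: a URL given without a scheme, such as the www.google.com/images/test.png from the first post, ends up entirely in path with an empty netloc. Prepending "http://" when no "://" is present is an assumption made here for illustration, not something urlparse does for you, and the helper name parse_with_default_scheme is hypothetical.

```python
from urllib.parse import urlparse


def parse_with_default_scheme(url, default_scheme="http"):
    # Assumption: a missing scheme means the user meant HTTP.
    if "://" not in url:
        url = default_scheme + "://" + url
    return urlparse(url)


# Without a scheme, urlparse treats the whole string as a path.
bare = urlparse("www.google.com/images/test.png")
print(repr(bare.netloc), repr(bare.path))
# '' 'www.google.com/images/test.png'

# With the scheme added, host and path come apart as expected.
fixed = parse_with_default_scheme("www.google.com/images/test.png")
print(repr(fixed.netloc), repr(fixed.path))
# 'www.google.com' '/images/test.png'
```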
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: DamonX on April 20, 2013, 07:18:32 PM: Wow ... can't believe how many people replied within such a short period of time. This is even better than Stack Overflow. :) I will try your suggestions and will let you know how it goes.

Thanks all

Damon
Title: Re: Split any URL into "host", "path", and "filename" variables (Python)
Post by: DamonX on April 21, 2013, 10:55:59 PM: Thanks, I had to make a little modification, but I was able to do it by also using basename() and dirname().
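The thread does not show the final code, so the following is only a guess at how urlparse could be combined with basename() and dirname() in Python 3. The names split_url and main, the default port of 80, the "http://" fallback for scheme-less input, and the use of posixpath instead of os.path are all assumptions made for this sketch.

```python
import posixpath
import sys
from urllib.parse import urlparse


def split_url(url):
    """Return (host, port, path, filename) for an HTTP URL."""
    # Assumption: input without a scheme is treated as http://
    if "://" not in url:
        url = "http://" + url
    parts = urlparse(url)
    host = parts.hostname
    port = parts.port or 80  # assumption: default to the HTTP port
    # posixpath always splits on "/", whatever the local OS is.
    filename = posixpath.basename(parts.path)
    path = posixpath.dirname(parts.path)
    if not path.endswith("/"):
        path += "/"
    return host, port, path, filename


def main(argv):
    # (1) Exit early if the required URL argument is missing.
    if len(argv) != 2:
        sys.exit("usage: " + argv[0] + " URL")
    # (2) Split the URL into its parts.
    host, port, path, filename = split_url(argv[1])
    print("Preparing to download object from http://" + host + path + filename)


# Example call; in the real program this would be main(sys.argv).
main(["clientprogram", "www.google.com/images/test.png"])
```

posixpath is used here because URL paths always use forward slashes, while os.path follows the conventions of whatever system the script runs on.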