
JM35
Need to Extract Data from a Website - Noobie here
« on: January 29, 2015, 10:49:55 PM »
Hello,

What I'm looking for isn't really hacking, but I figure the knowledge on here could help me find a solution.

I run an online store selling car parts. One of our suppliers has a massive online catalog, and we would like to get all of their product data for one specific make of vehicle.

The catalog works like this: you pick the make of vehicle, then the year, then the model. Next you select a category of parts, which takes you to a list of subcategories where you pick the specific part you are searching for. That takes you to a list of the matching parts they carry, with SKU, brand, price, picture, every vehicle the part fits, etc.
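To make the drill-down concrete: scraping the whole make means visiting every make/year/model/category/subcategory combination and fetching the parts list at the end of each path. Below is a rough Python sketch of that walk. It assumes the tree has already been discovered (on the real site each level's options would come from a request, not a local dict), and the `CATALOG` data, SKUs and `walk` helper are all made up for illustration.

```python
from typing import Iterator, List, Tuple

# Hypothetical slice of the supplier's catalog, as it might look once each
# dropdown level has been discovered:
# make -> year -> model -> category -> subcategory -> SKUs on the parts list.
# All names and SKUs are made up.
CATALOG = {
    "Honda": {
        "2012": {
            "Civic": {
                "Brakes": {"Brake Pads": ["BP-100", "BP-101"], "Rotors": ["RT-200"]},
                "Filters": {"Oil Filter": ["OF-300"]},
            },
        },
        "2013": {
            "Accord": {
                "Brakes": {"Brake Pads": ["BP-110"]},
            },
        },
    },
}


def walk(node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], List[str]]]:
    """Yield (path, skus) for every leaf, where path is
    (make, year, model, category, subcategory)."""
    if isinstance(node, list):
        # Reached a parts list: this is one page the scraper has to fetch.
        yield path, node
        return
    for key, child in node.items():
        yield from walk(child, path + (key,))


for path, skus in walk(CATALOG):
    print(" > ".join(path), "->", ", ".join(skus))

# The same part can show up under many vehicles, so collect SKUs in a set.
unique_skus = {sku for _, skus in walk(CATALOG) for sku in skus}
```

The number of pages grows with every level you multiply through, so it's worth counting the leaf paths first before hammering the supplier's server with requests.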

That is the information we want to pull from their catalog: essentially a list of every part we are allowed to list on our website. We just need a way to extract it so we don't have to upload the products to our site manually, one by one.
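Most store platforms can bulk-import products from a CSV file, so a natural end point for the scraped data is one CSV row per part. A minimal sketch, assuming your platform accepts CSV; the column names (`sku`, `brand`, `price`, `image_url`, `fits`), the `Part` class and the `|` separator for fitment are all hypothetical and would need to match whatever your store's importer expects.

```python
import csv
import io
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Part:
    # Fields mirror the columns on the supplier's parts list; names are assumptions.
    sku: str
    brand: str
    price: str  # kept as the scraped text, e.g. "24.99"
    image_url: str
    fits: List[str] = field(default_factory=list)  # vehicles this part fits


def merge_by_sku(parts: List[Part]) -> List[Part]:
    """Collapse duplicate SKUs (a part appears once per vehicle it fits)."""
    merged = {}
    for part in parts:
        if part.sku not in merged:
            merged[part.sku] = Part(part.sku, part.brand, part.price,
                                    part.image_url, list(part.fits))
            continue
        known = merged[part.sku].fits
        for vehicle in part.fits:
            if vehicle not in known:
                known.append(vehicle)
    return list(merged.values())


def to_csv(parts: List[Part]) -> str:
    """Render parts as CSV text, joining the fitment list into one '|'-separated cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["sku", "brand", "price", "image_url", "fits"])
    writer.writeheader()
    for part in parts:
        row = asdict(part)
        row["fits"] = "|".join(part.fits)
        writer.writerow(row)
    return buf.getvalue()


# Example rows as a scraper might collect them (made-up data).
scraped = [
    Part("BP-100", "Acme", "24.99", "https://supplier.example.com/img/bp-100.jpg", ["2012 Honda Civic"]),
    Part("BP-100", "Acme", "24.99", "https://supplier.example.com/img/bp-100.jpg", ["2013 Honda Civic"]),
    Part("RT-200", "Acme", "59.00", "https://supplier.example.com/img/rt-200.jpg", ["2012 Honda Civic"]),
]
csv_text = to_csv(merge_by_sku(scraped))
print(csv_text)
```

Merging by SKU before writing keeps the "what vehicles it fits" information instead of creating one duplicate product per vehicle.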

Any ideas on how to do this?

Thanks

Kulverstukas (Administrator)
Re: Need to Extract Data from a Website - Noobie here
« Reply #1 on: January 30, 2015, 09:09:28 AM »
First you need to figure out how the requests are sent: GET or POST. Either way, you also need to collect the same parameters their dropdowns send, and then it's basically just a matter of crafting the request... oh, and parsing the HTML. I recommend a dedicated library for that; with Python, use BeautifulSoup. Only use regex for very simple things.
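A rough sketch of both steps in Python. To stay dependency-free it uses the standard library's `html.parser` instead of BeautifulSoup (BeautifulSoup is still the nicer choice for real pages). Everything site-specific is an assumption: `BASE_URL`, the `make`/`year`/`model` parameter names and the table markup are placeholders you would replace with what your browser's developer tools (Network tab) show while clicking through the dropdowns.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; copy the real one from the browser's Network tab.
BASE_URL = "https://supplier.example.com/catalog/parts"


def build_request(params: dict, method: str = "GET") -> Request:
    """Craft (but do not send) the request the catalog page would send."""
    query = urlencode(params)
    if method == "GET":
        # GET: the dropdown values go in the query string.
        return Request(f"{BASE_URL}?{query}", method="GET")
    # POST: the same values go form-encoded in the request body.
    return Request(
        BASE_URL,
        data=query.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


class PartsTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows with only <th> cells end up empty; skip them
                self.rows.append(self._row)
            self._row = None


# Made-up markup standing in for the supplier's parts list page.
SAMPLE_HTML = """
<table class="parts">
  <tr><th>SKU</th><th>Brand</th><th>Price</th></tr>
  <tr><td>BP-100</td><td>Acme</td><td>$24.99</td></tr>
  <tr><td>RT-200</td><td>Acme</td><td>$59.00</td></tr>
</table>
"""

req = build_request({"make": "Honda", "year": 2012, "model": "Civic"})
print(req.get_method(), req.full_url)

parser = PartsTableParser()
parser.feed(SAMPLE_HTML)
print(parser.rows)  # [['BP-100', 'Acme', '$24.99'], ['RT-200', 'Acme', '$59.00']]
```

To actually send a request you would pass it to `urllib.request.urlopen(req)` and feed the decoded response body to the parser. If the site sets cookies or checks headers, copy those from the browser as well.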

I imagine you want it to work like this: you select a car model on your site, and it fetches the data from the other site using the parameters you selected?

 


