Author Topic: [Source] pastebinScraper.py (Read 3673 times)

daxda · « **on:** November 05, 2013, 08:53:00 AM »

Just finished another scraper; this one grabs the most recently created pastes from pastebin.com. The idea comes from joepie91, who has been talking about scraping the site, which motivated me to code a script of my own as well.

The script picks a random proxy from a defined list. If a connection to the target fails, that proxy is discarded, and when you exit the script the updated proxy list is saved to file. It connects to pastebin, extracts the links to the latest pastes, fetches their contents and saves them to files.
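For anyone who wants the gist of the flow without opening the source, here is a minimal sketch of the proxy handling and link filtering described above. It is not the attached script: it assumes Python 3, and `ProxyPool`, `fetch_with_pool`, `extract_paste_ids` and the `fetch` callable are hypothetical names made up for illustration. The regex also assumes paste links look like `href="/` followed by eight alphanumeric characters, which may not match the site's current markup.

```python
import random
import re
from pathlib import Path


class ProxyPool:
    """Holds a proxy list, hands out random entries, drops the dead ones."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_file(cls, path):
        # Assumed file layout: one proxy per line, blank lines ignored.
        lines = Path(path).read_text().splitlines()
        return cls(line.strip() for line in lines if line.strip())

    def pick(self):
        if not self.proxies:
            raise RuntimeError("no working proxies left")
        return random.choice(self.proxies)

    def discard(self, proxy):
        if proxy in self.proxies:
            self.proxies.remove(proxy)

    def save(self, path):
        # Write the surviving proxies back, one per line.
        Path(path).write_text("".join(p + "\n" for p in self.proxies))


def fetch_with_pool(pool, url, fetch):
    """Try random proxies until one works, discarding each one that fails.

    `fetch(url, proxy)` is a caller-supplied (hypothetical) function that
    returns the page body or raises OSError on a connection failure.
    """
    while True:
        proxy = pool.pick()  # raises RuntimeError once the pool is empty
        try:
            return fetch(url, proxy)
        except OSError:
            pool.discard(proxy)


def extract_paste_ids(html):
    """Return unique paste IDs in page order (assumed 8-character IDs)."""
    ids = re.findall(r'href="/([A-Za-z0-9]{8})"', html)
    return list(dict.fromkeys(ids))
```

Passing the download function in as `fetch` keeps the retry logic separate from the HTTP library, so it can be tried out with a fake fetcher before pointing it at a real site. Calling `pool.save("Data/Proxies.txt")` on exit would then persist the pruned list.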

Attached to this post is the script with its dependent files (*.py, Data/Results, Data/Proxies.txt, Data/User-Agents.txt).

As always, this code is free to use and modify, take from it what you want and do with it what you like. Improvements, critique, feedback and so forth are welcome.

Usage: python pastebinScraper.py [ -s <optional sleep time in seconds here>]
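As a rough illustration of how an optional `-s` switch like this can be parsed, here is a small `argparse` sketch. It is an assumption about the shape of the option, not the script's actual parsing code: `parse_args` is a hypothetical helper, and the default of 0 seconds is a guess, since the post does not say what happens when `-s` is omitted.

```python
import argparse


def parse_args(argv=None):
    """Parse the optional -s sleep time (in seconds) from the command line."""
    parser = argparse.ArgumentParser(prog="pastebinScraper.py")
    parser.add_argument(
        "-s",
        dest="sleep",
        type=float,
        default=0.0,  # assumed default; the real script may differ
        metavar="SECONDS",
        help="time to sleep between requests, in seconds",
    )
    return parser.parse_args(argv)
```

With this, `parse_args(["-s", "5"]).sleep` gives the delay as a float that can be handed to `time.sleep()` between fetches.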

[gist]Daxda/7315302[/gist]

imation · « **Reply #1 on:** November 05, 2013, 09:54:01 AM »

nice, looks good

d4rkcat · « **Reply #2 on:** November 05, 2013, 02:24:59 PM »

I've been waiting for something like this, ace!
I'm gonna have to look over this code.

Many Thanks daxda!
