Just finished another scraper. This one scrapes the most recently created pastes from pastebin.com. The idea came from joepie91, who has been talking about scraping the site, which motivated me to code a script of my own as well.
The script chooses a random proxy from a defined list. If a connection to the target fails, that proxy gets discarded, and when you exit the script the updated proxy list is saved back to file. It connects to pastebin, extracts the links to the latest pastes, fetches the content of each one and saves it to a file.
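For anyone who wants the proxy handling on its own, here is a rough sketch of the idea described above, not the code from the attached script. The names `ProxyPool` and `fetch_with_rotation` are made up for illustration, and `fetch` is a callable you supply yourself; I'm assuming it raises `OSError` (which covers `urllib.error.URLError`) when the connection fails.

```python
import random
from pathlib import Path


class ProxyPool:
    """Hypothetical helper: holds proxies, drops dead ones, saves survivors."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_file(cls, path):
        # One proxy per line, blank lines ignored (assumed file layout).
        lines = Path(path).read_text().splitlines()
        return cls(line.strip() for line in lines if line.strip())

    def pick(self):
        if not self.proxies:
            raise RuntimeError("no working proxies left")
        return random.choice(self.proxies)

    def discard(self, proxy):
        if proxy in self.proxies:
            self.proxies.remove(proxy)

    def save(self, path):
        # Write the remaining proxies back, one per line.
        text = "\n".join(self.proxies)
        Path(path).write_text(text + "\n" if text else "")


def fetch_with_rotation(pool, url, fetch):
    """Pick random proxies until fetch(url, proxy) succeeds.

    Proxies that fail with OSError are removed from the pool.
    """
    while True:
        proxy = pool.pick()
        try:
            return fetch(url, proxy)
        except OSError:
            pool.discard(proxy)
```

You would load the pool once with `ProxyPool.from_file("Data/Proxies.txt")`, use `fetch_with_rotation` for each request, and call `save` on the same path when the script exits.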
Attached to this post is the script with its dependent files (*.py, Data/Results, Data/Proxies.txt, Data/User-Agents.txt).
As always, this code is free to use and modify, take from it what you want and do with it what you like. Improvements, critique, feedback and so forth are welcome.
Usage: python pastebinScraper.py [-s <sleep time in seconds>]
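If you want to handle the optional `-s` switch in your own version, something like the following would do it. This is only a sketch using `argparse`; the function name `parse_args`, the default of 0 seconds and the choice of `float` for the value are my assumptions, not necessarily what the attached script does.

```python
import argparse


def parse_args(argv=None):
    """Parse the optional -s sleep time (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="pastebinScraper.py")
    parser.add_argument(
        "-s",
        dest="sleep",
        type=float,
        default=0.0,  # assumed default: no pause between requests
        metavar="SECONDS",
        help="time to sleep between requests, in seconds",
    )
    return parser.parse_args(argv)
```

Calling `parse_args(["-s", "5"])` gives you an object whose `sleep` attribute can be passed straight to `time.sleep`.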
[gist]Daxda/7315302[/gist]