
Show Posts



Messages - bonewagon

1
Scripting Languages / [Python] Manga downloader
« on: March 19, 2014, 09:24:09 PM »
This script downloads a manga (one at a time) from mangareader.net for offline reading.

Code: (python)
#!/usr/bin/python
import urllib2
from re import compile, findall
from sys import argv
from os import mkdir

chapter_exp = compile('(?<=href=")/(?:\d+-)+\d+/[\w-]+/chapter-\d+\.html') # extract the URLs for each chapter
page_exp = compile('(?<=value=")/(?:\d+-)+\d+/[\w-]+/chapter-\d+\.html') # the URLs for each page in the chapter
img_exp = compile('(?<=src=")https?://i\d*\.mangareader\.net/.*\d+\.jpg') # the image for the page

chapter_number = page_number = 1

def get_matches(url, exp):
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0')]
    matches = findall(exp, opener.open(url).read())
    # drop duplicate matches while keeping their original order
    for match in matches:
        while matches.count(match) != 1:
            matches.remove(match)
    return matches

def fetch_image(url):
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0')]
    return opener.open(url).read()

def usage():
    print "Usage: mangadownloader.py [options]"
    print "Options: \n\n"

    print "-starturl=http://www.mangareader.net/311/bloody-monday.html \t download the manga for which the table of contents is at the given mangareader URL (Bloody Monday in this case)\n"
    print "-outputpath=/path/to/output/directory \t Write the manga into this directory\n"
    print "-subdir=BloodyMonday \t In the output path listed above, create a subdirectory titled 'BloodyMonday' to put the manga into (optional)\n"
    print "-mergechapters \t Merge the pages of all chapters into one chapter (optional)\n"
    print "-help or -h \t Display this message"

    exit()

if "-help" in argv or "-h" in argv:
    usage()

arguments = {}

for arg in argv:

    if arg.startswith("-starturl="):
        arguments["starturl"] = arg[len("-starturl="):]

    elif arg.startswith("-outputpath="):
        arguments["outputpath"] = arg[len("-outputpath="):]

    elif arg.startswith("-subdir="):
        arguments["subdir"] = arg[len("-subdir="):]


for required_argument in ["starturl", "outputpath"]:
    if not arguments.has_key(required_argument):
        usage()

if "-mergechapters" in argv:
    arguments["mergechapters"] = True
else:
    arguments["mergechapters"] = False

if arguments.has_key("subdir"):
    arguments["outputpath"] += "/" + arguments["subdir"]
    try:
        mkdir(arguments["outputpath"])
    except: # directory already exists
        pass

chapters = get_matches(arguments["starturl"], chapter_exp)
for chapter in range(len(chapters)):
    chapters[chapter] = "http://www.mangareader.net" + chapters[chapter] # convert to absolute URLs

for chapter in chapters:

    if not arguments["mergechapters"]:
        path = arguments["outputpath"] + "/" + str(chapter_number)
        try:
            mkdir(path)
        except:
            pass
        chapter_number += 1
    else:
        path = arguments["outputpath"]

    pages = get_matches(chapter, page_exp)
    for page in pages:
        image = fetch_image(get_matches("http://www.mangareader.net" + page, img_exp)[0])
        open(path + "/" + str(page_number) + ".jpg", "wb").write(image)
        page_number += 1
    if not arguments["mergechapters"]:
        page_number = 1

raw_input("Done. Press [enter] to quit: ")

Usage:

-starturl=http://www.mangareader.net/326/code-geass-lelouch-of-the-rebellion.html (This downloads the manga at the specified URL - Code Geass in this case. Obviously it must be a mangareader.net URL.)

-outputpath=C:/Users/Me/path/to/output/directory (Specify the directory the manga chapters and pages are to be written into.)

-subdir=CodeGeass (Creates a subdirectory with the specified name - "CodeGeass" in this case - inside the directory specified in the option above, and the manga will be written into it. This is optional.)

-mergechapters (Rather than organizing the manga into chapters, all the pages will be written into the same directory. This is obviously optional, but I don't recommend it; I don't see why anyone would want to do this for anything but the shortest of mangas.)

The chapters will be numbered from 1 up to however many chapters the manga has, and the pages in each chapter will be numbered from 1 up to however many pages that chapter has.
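
Putting the options above together, a full run from the command line would look roughly like this (using the same example URL, output path, and subdirectory as above):

Code:
python mangadownloader.py -starturl=http://www.mangareader.net/326/code-geass-lelouch-of-the-rebellion.html -outputpath=C:/Users/Me/path/to/output/directory -subdir=CodeGeass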

2
I've written a small program that harvests data from Google. I named it "Pygore" (Python Google Regular Expressions).

It is like a web crawler, but a bit unusual, so I'll try to explain what it does:

  • To start with, it makes a user-defined Google query. The user can enter ordinary search terms or use Google operators ("Google hacking") to improve the results of the search.
  • A certain number of URLs are extracted from the results of the query; that number is defined by the user. If the query produced fewer URLs than the user asked for, Pygore simply extracts all of them.
  • Pygore then visits each of these URLs (as a web client) and downloads the HTML source of each page.
  • Finally, Pygore searches the HTML for a user-defined regular expression. The matches are extracted and dumped; optionally, the URLs the matches were found at can be dumped right beside the matches themselves. The results can go to the terminal, a line-by-line text file, or an HTML file. (A rough sketch of this pipeline follows below.)
(I've written it using Tkinter for the GUI, and it uses the xgoogle library to implement the Google searching.
Pygore is split into several modules rather than being a single .py script, so I uploaded it as an attachment instead of posting the source code directly. The attachment contains the source code, although the xgoogle modules were not written by me.)
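
To make the steps above concrete, here is a minimal sketch of the same pipeline in plain Python 2. This is not the actual Pygore source (no Tkinter GUI, no output-format options), and it assumes xgoogle exposes a GoogleSearch class whose get_results() returns result objects with a .url attribute - treat the exact calls as an approximation:

Code: (python)
import re
import urllib2
from xgoogle.search import GoogleSearch  # assumed xgoogle interface

def pygore_sketch(query, max_urls, pattern, dump_urls=False):
    # 1. run the user-defined Google query (Google operators work here too)
    results = GoogleSearch(query).get_results()
    # 2. keep at most max_urls result URLs (all of them if fewer were returned)
    urls = [r.url for r in results][:max_urls]
    exp = re.compile(pattern)
    for url in urls:
        # 3. visit each URL as a web client and grab the HTML source
        try:
            html = urllib2.urlopen(url).read()
        except urllib2.URLError:
            continue
        # 4. extract every match of the user-defined regex and dump it
        for match in exp.findall(html):
            if dump_urls:
                print match, url  # dump the URL right beside the match
            else:
                print match

# example: pygore_sketch('inurl:contact', 10, r'[\w.+-]+@[\w.-]+\.\w+')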
I think something like this can be useful, although I myself am probably never going to use it. Let me know what you think.


