
Topic: Python Port of Word List Generator


Offline Matriplex

  • Knight
  • **
  • Posts: 323
  • Cookies: 66
  • Java
Python Port of Word List Generator
« on: October 04, 2013, 11:18:57 PM »
I decided to learn Python because I was bored last weekend and have actually been really amazed at the power of it! To test my (elementary) skills with Python I decided to try and port the Java Word List Generator found here: http://evilzone.org/tutorials/%28tut%29-create-a-wordlist-generator-%28i-e-for-bruteforcing%29

I think I did alright. The output file for a 4-character run was 8.4 MB; however, when I pushed it to 5 characters it froze my computer for about 2 minutes and produced a 343 MB file...
I don't mean to brag, but I do have a pretty decent computer so I am probably doing something wrong with this :P
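As a rough sanity check on those file sizes, here is a back-of-the-envelope sketch. It assumes a 36-symbol alphabet, single-byte characters and a one-byte "\n" per line; the helper name expected_size is made up for illustration.

```python
def expected_size(alphabet_len, word_len):
    """Bytes needed for every word of word_len plus one newline per word."""
    word_count = alphabet_len ** word_len
    return word_count * (word_len + 1)

for n in (4, 5):
    size = expected_size(36, n)
    print("%d chars: %d words, %.1f MB" % (n, 36 ** n, size / 1e6))
```

By this estimate a 4-character run is about 8.4 MB and a 5-character run is in the region of 360 MB, so a file of several hundred megabytes is roughly what the algorithm should produce rather than a sign of a bug.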

Anyways, here is the code:
Code:
import math
import sys
import os

def generate(am):
    pos = '/home/rotc/Documents/list.txt'
    pathdir = '/home/rotc/Documents'
    print os.listdir(pathdir)
    print "Opening the file at " + pos
    f = open(pos, "w")
    print "File opened."
    wordLength = am
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
    radix = len(alphabet)
   
    MAX_WORDS = float(math.pow(float(len(alphabet)), float(wordLength)))
    print "Writing..."
    for i in range(0, int(MAX_WORDS)):
        indices = convertToRadix(radix, i, wordLength)
        word = [1] * int(wordLength)
        for k in range(0, int(wordLength)):
            word[k] = alphabet[indices[k]]
           
        #print ''.join(word)
        write(word, f)
    print "Finished writing. \nFlushing..."
    f.flush()
    print "Finished Flushing..."
    print "Closing File..."
    f.close()
    print "File Closed."

def write(word, f):
    f.write(''.join(word) + "\n")

def convertToRadix(radix, number, wordLength):
    result = [1] * int(wordLength)
    for i in reversed(xrange(int(wordLength))):
        if number > 0:
            rest = number % radix
            number //= radix  # explicit integer division
            result[i] = rest
        else:
            result[i] = 0
    return result

amount = int(raw_input("How many letters long: "))  # raw_input returns a string
generate(amount)

If you see any obvious performance killers let me know please, but for now thanks for reading.
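One candidate worth checking, offered as a hedged sketch: in Python 2, `range(0, int(MAX_WORDS))` builds the whole list of integers in memory before the loop starts, whereas `xrange` yields them one at a time. The snippet below is written for Python 3, where `range` is already lazy, so `list(range(n))` stands in for the Python 2 behaviour. The reported sizes cover the container only, not the integer objects inside it.

```python
import sys

n = 10 ** 6
lazy = range(n)          # Python 3 range: small fixed-size object, like Python 2's xrange
eager = list(range(n))   # what Python 2's range(n) builds up front

print("lazy range :", sys.getsizeof(lazy), "bytes")
print("eager list :", sys.getsizeof(eager), "bytes")
```

A 5-character run loops over 36**5 = 60,466,176 values, so on a 64-bit build the eager form needs hundreds of megabytes for the list alone, which may be what froze the machine.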
« Last Edit: October 04, 2013, 11:20:30 PM by Matriplex »
\x64\x6F\x75\x65\x76\x65\x6E\x00

Offline Kulverstukas

  • Administrator
  • Zeus
  • *
  • Posts: 6627
  • Cookies: 542
  • Fascist dictator
Re: Python Port of Word List Generator
« Reply #1 on: October 05, 2013, 06:47:13 AM »
The code looks clean to me. However, Python itself is not really good at such computations, so I would blame Python for being slow.

Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
Re: Python Port of Word List Generator
« Reply #2 on: October 05, 2013, 01:36:38 PM »
The problem with Python: It has immutable strings (like Java). Note that I used a char array to produce the words in the Java code. You are using strings and do a lot of string operations, like this one:

word = [1] * int(wordLength) --> this isn't necessary btw.

Immutability means you can't change a string object. So every time you do a string operation, a new object is created, which is costly.

I didn't test your code, so I can't say this is the reason for sure. But you might test my assumption by going a different direction with your code.
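One way to test that assumption is to time repeated concatenation against a single ''.join over a list. This is a minimal Python 3 sketch; the function names and input size are illustrative assumptions, not taken from the thread. Results vary by interpreter and version, and CPython can sometimes extend a string in place on `+=`, so treat the numbers as indicative only.

```python
import timeit

def build_concat(chars):
    # Each += may allocate a new string object, since strings are immutable.
    word = ""
    for c in chars:
        word += c
    return word

def build_join(chars):
    # join builds the final string in one pass over the list.
    return "".join(chars)

chars = list("abcde") * 200
assert build_concat(chars) == build_join(chars)

for fn in (build_concat, build_join):
    seconds = timeit.timeit(lambda: fn(chars), number=2000)
    print("%-12s %.4f s" % (fn.__name__, seconds))
```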

Look here for more information about string operations and their performance in Python: http://www.skymind.com/~ocrow/python_string/
https://wiki.python.org/moin/PythonSpeed/PerformanceTips

I also did a port of my code to python, but never tested it for performance. You might try this one as well:

Code:
def convert_to_radix(number, wordlength, radix):
    indices = []
    for i in xrange(wordlength):
        if number > 0:
            rest = number % radix
            number //= radix  # explicit integer division
            indices.append(rest)
        else:
            indices.append(0)
    return indices

def word_gen(alphabet, wordlength):
    MAXWORDS = len(alphabet) ** wordlength
    RADIX = len(alphabet)
    for k in xrange(MAXWORDS):
        indices = convert_to_radix(k, wordlength, RADIX)
        word = [alphabet[indices[i]] for i in xrange(wordlength)]
        yield word

Example usage:

Code:
alphabet = ['a', 'b', 'c']
for word in word_gen(alphabet, 4):
  print ''.join(word)

Also: did you verify that your code produces correct results?
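A check along these lines could be sketched as follows. It is a simplified Python 3 port of the generator above (the `if`/`else` is dropped because `0 % radix` is already 0), compared against `itertools.product` as a reference. Because the digits are produced least-significant first, the order of the words differs from `product`, so the comparison is on sets.

```python
from itertools import product

def convert_to_radix(number, wordlength, radix):
    # Python 3 port of the function above; // keeps the division integral.
    indices = []
    for _ in range(wordlength):
        indices.append(number % radix)
        number //= radix
    return indices

def word_gen(alphabet, wordlength):
    radix = len(alphabet)
    for k in range(radix ** wordlength):
        yield "".join(alphabet[i] for i in convert_to_radix(k, wordlength, radix))

alphabet = ["a", "b", "c"]
words = list(word_gen(alphabet, 4))
reference = {"".join(p) for p in product(alphabet, repeat=4)}

assert len(words) == len(alphabet) ** 4   # right number of words
assert len(set(words)) == len(words)      # no duplicates
assert set(words) == reference            # same set as itertools.product
```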
« Last Edit: October 05, 2013, 01:45:34 PM by Deque »

Offline Matriplex

Re: Python Port of Word List Generator
« Reply #3 on: October 05, 2013, 05:58:55 PM »
Oh that makes sense Deque! Well nuts, I thought that it wouldn't create a new object. I'll check out those links and see if I can get performance any better.
And yes, the code produces perfect results :). After coding this I researched the performance issue a little more and concluded that, as you said, Java or C++ would be better for this because they can handle memory better since they are compiled languages.
Thanks!

Offline ande

  • Owner
  • Titan
  • *
  • Posts: 2664
  • Cookies: 256
Re: Python Port of Word List Generator
« Reply #4 on: October 05, 2013, 07:17:56 PM »
The big performance killer here is not memory management. The slowest part is writing to disk. You should consider buffering in memory and writing less often, for example by buffering 100k words before each write to the file. That way you save a lot of I/O traffic.
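A minimal Python 3 sketch of that batching idea follows. The function name write_buffered and the default of 100,000 words per batch are illustrative assumptions. Note that Python file objects are already buffered by default (see the `buffering` argument of `open`), so the gain should be measured rather than assumed.

```python
import io
from itertools import islice, product

def write_buffered(words, f, chunk_size=100000):
    """Write words to f in batches of chunk_size lines; return the line count."""
    words = iter(words)
    total = 0
    while True:
        chunk = list(islice(words, chunk_size))
        if not chunk:
            return total
        f.write("\n".join(chunk) + "\n")   # one write call per batch
        total += len(chunk)

# Small demonstration against an in-memory file.
out = io.StringIO()
count = write_buffered(("".join(p) for p in product("abc", repeat=3)), out, chunk_size=10)
print(count, "words written")
```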
« Last Edit: October 05, 2013, 07:18:19 PM by ande »
if($statement) { unless(!$statement) { // Very sure } }
https://evilzone.org/?hack=true

Offline madara

  • Serf
  • *
  • Posts: 23
  • Cookies: 3
Re: Python Port of Word List Generator
« Reply #5 on: October 08, 2013, 11:35:33 PM »
Python is Python because it isn't PHP or Java... Please...


Code:
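# Python 3

def _extend(acc, pool):
    # The call binds pool now; a generator expression written directly in the
    # loop would look pool up lazily and use the last pool for every stage.
    return (x + [y] for x in acc for y in pool)  # generator expr. saves memory

def product(*args, **kwargs):
    # list() because map() returns an iterator in Python 3
    pools = list(map(tuple, args)) * kwargs.get('repeat', 1)
    acc = [[]]
    for pool in pools:
        acc = _extend(acc, pool)
    for prod in acc:
        yield tuple(prod)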


for x in product(['a', 'b', 'c'], repeat=5):
    print(''.join(x))





Or simply use the plug-and-play *product* iterator from *itertools*; see the official documentation.
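For reference, a minimal Python 3 sketch of the `itertools.product` route. The function name write_wordlist is made up for illustration, and an in-memory `io.StringIO` stands in for the real output file.

```python
import io
from itertools import product

def write_wordlist(alphabet, length, f):
    """Write every word of the given length over alphabet to file object f."""
    f.writelines("".join(p) + "\n" for p in product(alphabet, repeat=length))

out = io.StringIO()   # stands in for open("list.txt", "w")
write_wordlist("abc", 2, out)
print(out.getvalue())
```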
« Last Edit: October 10, 2013, 05:33:23 PM by madara »

Offline madara

Re: Python Port of Word List Generator
« Reply #6 on: October 09, 2013, 12:19:19 PM »
Quote
The code looks clean to me. However, Python itself is not really good at such computations, so I would blame Python for being slow.


Mr. Admin Kulverstukas,
OK, you deleted my last post because you did not find my comment on your post pleasing. (Are these your rules of behavior for *creating a forum worthy of being visited*? Members, I leave the response to you.)
Congratulations, but this isn't the way!
Debate between people, that's the way.

In the end it's OK, no problem, but please also delete your post, because you do not know what you're talking about and you GIVE INCORRECT INFORMATION;
otherwise you have to justify what you said... we are waiting.
P.S. Sorry to all for my bad English... uhhhh ;)


Regards


madara

Offline ande

Re: Python Port of Word List Generator
« Reply #7 on: October 09, 2013, 05:42:53 PM »

Actually I deleted your post. It only said "PLEASE......", which is completely useless. On top of that it was a double post. Get your own information and structure right before you bash on someone else's.

Offline Phage

  • VIP
  • Overlord
  • *
  • Posts: 1280
  • Cookies: 120
Re: Python Port of Word List Generator
« Reply #8 on: October 09, 2013, 08:20:07 PM »
Second, Kulverstukas is actually right. Python is known (it's a fact) for being slower than other languages, you can't deny that!
"Ruby devs do, in fact, get all the girls. No girl wants a python, but EVERY girl wants rubies" - connection

"It always takes longer than you expect, even when you take into account Hofstadter’s Law."

Offline Kulverstukas

Re: Python Port of Word List Generator
« Reply #9 on: October 09, 2013, 08:40:47 PM »
Why is it that I am always the one to blame when posts get deleted? :P if they are deleted, there was a reason for it. Deal with it or gtfo, honestly. Staff does not remove posts "just because...".

@madara: my facts are straight, while your post is curved...

Offline Phage

Re: Python Port of Word List Generator
« Reply #10 on: October 09, 2013, 09:07:17 PM »
It's probably because you are the only admin who posts regularly on the boards.

Offline madara

Re: Python Port of Word List Generator
« Reply #11 on: October 09, 2013, 09:56:03 PM »

Quote
Actually I deleted your post. It only said "PLEASE......", which is completely useless.

My post was an implicit *comment* (too complicated to understand?).
"Get your own information before...", etc.
OK, I agree --> sorry to Kulverstukas for that... but Kulverstukas,
when you say "my posts are right and yours are curved", you confirm that you do not know the subject. Please learn before commenting.

Quote
Second, Kulverstukas is actually right. Python is known (it's a fact) for being slower than other languages, you can't deny that!

Phage, obviously Python is slower than a compiled language such as C, CLisp and company, but C/C++ are also slower than Assembly, or CLR... so would you prefer to write your software in pure machine language? Perhaps for a rootkit... in that case, inline assembly in C only to access specific CPU registers... but only in small chunks.

In the end, in this case, when you use *itertools*' functions for simple or complex iteration, speed will not be your problem... run your tests, guy ;D
Thanks to all!
« Last Edit: October 09, 2013, 09:59:28 PM by madara »

Offline Phage

Re: Python Port of Word List Generator
« Reply #12 on: October 09, 2013, 10:03:02 PM »
You clearly have a problem with respect for other people and authorities. My tests? Tests on what? I know Python pretty well, I know its ups and downs. But your choice of words makes me second-guess your knowledge. You act like it's a war between you and the rest of the community. It's not. Chill out bro! I should lead you to the ganja shop!

Offline madara

Re: Python Port of Word List Generator
« Reply #13 on: October 09, 2013, 10:24:05 PM »
Quote
I should lead you to the ganja shop!

That could be an idea ;D

OK, I have my ways, which may seem uncomfortable, but I always say what I think, and that is not a lack of respect...
Remember that I do not judge people, only what they say... it's very different...

We are here to learn, not for glory, and touchiness does not allow evolution, does it? ;)

Offline Deque

Re: Python Port of Word List Generator
« Reply #14 on: October 10, 2013, 01:36:37 PM »
Madara: I see a lot of good stuff in your posts, but I feel you fail to explain it, which makes the posts seem disrespectful, useless, or both.

For example, you bash Kulver's post but give no arguments for why it is wrong in your eyes.
You say "Please, Python is not Java", but don't explain anything. Be so kind as to explain your thoughts to others, and people will see it as a chance either to agree with you and get better by learning from your posts, or to disagree and give you the opportunity to learn from their arguments.

I think you miss the point by advising itertools. The original question was about why the program is slow, not how to implement the same functionality in another way. Yes, it is shorter if you use this, but no, the thread opener won't learn why there is a problem in his code if you just throw your own code at him without any explanation (other than "Python is not [insert your language here]").

Furthermore:
Python is compiled, at least in what is probably its most used implementation, CPython. Even if you type python myprogram.py, the source is first compiled to bytecode, and this bytecode is interpreted afterwards (in the case of CPython).
For Java it is the same, but there is more optimization going on and it uses JIT (compilation to native code at runtime). That gives a real boost in performance and can in some cases outperform a purely compiled language implementation, because it can also adapt the compilation at runtime based on what has happened so far. As CPython doesn't use JIT so far, it is indeed slow.
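The bytecode step can be observed from within Python itself. A small Python 3 sketch, assuming CPython: the built-in `compile` turns source text into a code object, and the standard `dis` module prints its instructions. The expression used here is only an example.

```python
import dis

source = "number % radix"
code = compile(source, "<example>", "eval")   # source text -> code object

print(type(code.co_code))   # the raw bytecode is stored as bytes
dis.dis(code)               # human-readable listing of the instructions

# The interpreter then executes the bytecode:
print(eval(code, {"number": 7, "radix": 3}))
```

The exact instructions listed by `dis` differ between CPython versions.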

But in most cases it is not so slow that it matters, and in the case of Matriplex's code there is another underlying problem (I believe ande put his finger on the right spot).

Quote
when you say "my posts are right and yours are curved", you confirm that you do not know the subject. Please learn before commenting.

Kulver didn't say he was right, he said his comment is straight (to the point), while yours isn't and I have to agree with him.

Quote
you confirm that you do not know the subject. Please learn before commenting.

So far you haven't shown that you know the subject either. Do you see why this is disrespectful?

Quote
Phage, obviously Python is slower than a compiled language such as C, CLisp and company, but C/C++ are also slower than Assembly, or CLR... so would you prefer to write your software in pure machine language? Perhaps for a rootkit... in that case, inline assembly in C only to access specific CPU registers... but only in small chunks.

In the end, in this case, when you use *itertools*' functions for simple or complex iteration, speed will not be your problem... run your tests, guy

So, your argument is: Yes, it is slower, but no, it is not slower.
Maybe you meant it right, but the way you write it *facepalm*
itertools is written in C. Testing it is no proof that Python is fast.
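To see the difference in practice, one could time a pure-Python radix generator against `itertools.product`. This is a hedged Python 3 sketch; the function names are made up, and the timings depend on the machine and interpreter. In CPython, the loops inside `product` run in C, so a faster result there says little about the speed of interpreted Python code.

```python
import timeit
from itertools import product

ALPHABET = "abcdefghijklmnopqrstuvwxyz1234567890"

def pure_python_words(alphabet, length):
    # Radix conversion done in interpreted Python, as in the original port.
    radix = len(alphabet)
    for k in range(radix ** length):
        chars = []
        for _ in range(length):
            chars.append(alphabet[k % radix])
            k //= radix
        yield "".join(chars)

def itertools_words(alphabet, length):
    # The nested loops run inside itertools' C implementation (CPython).
    return ("".join(p) for p in product(alphabet, repeat=length))

# Both approaches must agree on the set of words before timing means anything.
assert set(pure_python_words(ALPHABET, 2)) == set(itertools_words(ALPHABET, 2))

for fn in (pure_python_words, itertools_words):
    seconds = timeit.timeit(lambda: sum(1 for _ in fn(ALPHABET, 3)), number=3)
    print("%-20s %.3f s" % (fn.__name__, seconds))
```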
« Last Edit: October 10, 2013, 01:37:23 PM by Deque »

 


