/*
* @(#)DisassemblingJavaApplications.java 1.00 29/12/2009
*
* This tutorial is the confidential and proprietary information of nobody.
* You shall not disclose such Confidential Information and shall use it only in
* accordance with the terms of the license agreement you entered into
* with nobody.
*/
package java.disassemble;
import java.io.*;
/**
* How to disassemble Java Applications.
* (Apologies for the colors)
*
* @version 1.00 29 Dec 2009
* @author xor
*/
1. Introduction
    1.1 What is Java?
    1.2 Why disassemble Java Applications?
2. Requirements
    2.1 Java Development Kit (JDK)
    2.2 Java API Web Reference
    2.3 Java VM Instruction Set Reference (OP codes)
3. Let's get started!
    3.1 HelloWorld.java
        3.1.1 Source Code
        3.1.2 Compiling
        3.1.3 Executing && Output
    3.2 Disassembling .class files
        3.2.1 javap - the easy part
        3.2.2 javap - the laborious part
        3.2.3 The bytecode - let's break it down
            3.2.3.1 The bytecode - create the .java file
            3.2.3.2 The bytecode - create the class declaration
            3.2.3.3 The bytecode - create the method declarations
            3.2.3.4 The bytecode - recreate the methods code
                3.2.3.4.1 The bytecode - recreate :: HelloWorld()
                3.2.3.4.2 The bytecode - recreate :: main(String[])
            3.2.3.5 The source code - reconstructed
    3.3 Things to look out for!
        3.3.1 File Extensions
        3.3.2 Naming Conventions
    3.4 Summary
    3.5 Notes, addenda, comments
1. Introduction
1.1 What is Java?
Rather than reinvent the wheel, please refer to the following thread by Deathspirit for a good overview of Java and the installation of the JDK / JRE.
1.2 Why disassemble Java Applications?
So why should we disassemble Java Applications? If you're anything like a few people I know, you're curious just to see how some things work. You may see a closed source Java applet on some website and you just have to know how it works, or you just want to copy / modify it. This curiosity will lead you here, and this tutorial will help you get your feet on the ground and your head deep into disassembling your first Java Application.
Reverse engineering is defined as the "process of discovering the technological principles of a device, object or system through analysis of its structure, function and operation". In my own mind reverse engineering is not only a skill, it is an art; it's something which takes time and produces beautiful results.
The art of reverse engineering will not only help you to understand how something works, it will also enable you to envision ways in which it can be exploited, or changed to better suit your needs.
2. Requirements
2.1 Java Development Kit (JDK)
If you have followed the tutorial by Deathspirit, you should have this part done by now. If not, please refer to the thread linked in the Introduction for how to do this.
The JDK (Java Development Kit) is not essential if you are going to disassemble the code manually using a mnemonics table, but it will help you greatly if you wish to do it a little quicker. You will also benefit from the other tools provided with the development kit, such as javac, which lets you compile your own Java applications for testing.
2.2 Java API Web Reference
The API (Application Programming Interface) reference, while not entirely necessary, documents the classes of the Java class library and the methods that go with them.
Using the web reference that matches your version is highly recommended; it can be selected from the Java SE (Standard Edition) API Reference page on the Sun website.
2.3 Java VM Instruction Set Reference (OP codes)
The web reference to the Java Virtual Machine Instruction Set provides an A-Z (or at least an A B C D F G I J L M N P R S T W) reference to all of the opcode instructions that are understood by the Java Virtual Machine.
This reference is definitely essential if we're going to understand the output from javap later on.
For those of you (like connection) who are hardcore and wish to do this without the JDK, the following list of Opcode Mnemonics by Opcode will be extremely helpful.
3. Let's get started!
Today I am going to provide a very simple example of how reverse engineering can be applied to Java applications and how we can take them to pieces to understand how they work.
We are going to start with the stereotypical "Hello World!" example which plagues the programming profession, and then continue on to explain how to both compile and disassemble the code. Before we get to that, you first need to understand the tools that you're going to be using. I am going to provide you with the generic syntax I use for each command and explain why, but please feel free to tack on -help to any of the binaries to further understand the command line parameters available to you.
Go ahead! Fire up the ole terminal, let's do this!
3.1 HelloWorld.java
3.1.1 Source Code
Go ahead and save the following code into a file called HelloWorld.java. As you can see it's pretty much the most simplistic and useless code you're ever going to get, which is actually useful to us in this example! YAY!
public class HelloWorld
{
    public static void main(String[] args)
    {
        System.out.println("Hello World!");
    }
}
3.1.2 Compiling
Now that you have the code saved into the file HelloWorld.java, use your terminal or favorite IDE (Integrated Development Environment) to compile your code. If you are using a terminal, please refer to the syntax below on how to compile.
(How to use a terminal / command line application is not in the scope of this tutorial, please come back when you've learned how to do that.)
$ javac -d . *.java
$ javac -d . HelloWorld.java
Either of the commands above will achieve the same result. If you only wish to compile the HelloWorld.java file then the second example is perfect. Be careful when using the first example however, as the wildcard * expands to every single .java file in the directory you're currently in.
The reason why I have the additional -d . is so that each .class file is generated into the relevant directory. If our application is part of a package and the -d . option is not specified, the .class file gets generated into the current folder. If we use the -d . option, on the other hand, the class files get created in the directories corresponding to the package name.
Example 1:
File: HelloWorld.java
Package: hello;
Compile: javac HelloWorld.java
Created:
    [FILE] -> HelloWorld.class
Example 2:
File: HelloWorld.java
Package: hello;
Compile: javac -d . HelloWorld.java
Created:
    [FOLD] -> hello/
    [FILE] -> hello/HelloWorld.class
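The rule behind Example 2 can be written down as a tiny sketch: every dot in the package-qualified class name becomes one directory level below the folder passed to -d. The class and method names here (ClassFileLocation, underOutputDir) are made up for this illustration and are not part of the JDK.

```java
// Hypothetical helper for illustration: predicts where "javac -d <outputDir>" puts a class file.
final class ClassFileLocation {
    static String underOutputDir(String outputDir, String fullyQualifiedName) {
        // Each '.' in the package-qualified name becomes one directory level.
        return outputDir + "/" + fullyQualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        System.out.println(underOutputDir(".", "hello.HelloWorld")); // ./hello/HelloWorld.class
        System.out.println(underOutputDir(".", "HelloWorld"));       // ./HelloWorld.class
    }
}
```

A class without a package declaration has no dots in its name, which is why our HelloWorld.class lands directly in the current folder either way.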
3.1.3 Executing && Output
Now that we have our class file compiled, it's time to see this bad boy in action! Go ahead and execute it.
$ java HelloWorld
Hello World!
HOLY SHIT THAT WAS AWESOME!!! But WTF?!!! This is someone else's closed-source proprietary application and I want to know how it works!!! ... Well so do I, so let's disassemble that sucker!
3.2 Disassembling .class files
This is it!! This is the bit we've all been waiting for. We've either read (or scrolled) our way through all that other bullshit and we're finally going to learn how to disassemble a Java .class file!! woop! Here comes the easy part:
3.2.1 javap - the easy part
The easy part is running the command... Luckily for us, Sun loves us so much that they actually provided their own tool to enable us to disassemble the .class files!! Aren't they nice.
So at the moment, we've saved our code into HelloWorld.java, we've compiled it and made one very lovely HelloWorld.class file. What we have to remember now is that, just like when we are executing our Java Applications, we must omit the .class extension like below.
$ javap -c -private HelloWorld
The -help switch will show us that -c means to disassemble the .class file, and that -private will show us all classes and members inside that .class file. This becomes very useful when decompiling applications that have private members (you don't want to miss out any code when re-creating this, do you?).
Have you pressed enter yet???
3.2.2 javap - the laborious part
:O.... did you see that output!!! WHAT DOES IT ALL MEAN?!! (wees pants in excitement)
What you actually just saw is a breakdown of your application's methods and the bytecode associated with them! Hopefully you have something like the following on your screen.
Compiled from "HelloWorld.java"
public class HelloWorld extends java.lang.Object{
public HelloWorld();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return
public static void main(java.lang.String[]);
  Code:
   0:   getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   ldc             #3; //String Hello World!
   5:   invokevirtual   #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   8:   return
}
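If you're curious what javap is actually chewing on: every .class file starts with the magic number 0xCAFEBABE, followed by a 2-byte minor and a 2-byte major version, and only then the constant pool that those #1 to #4 references point into. Here is a minimal sketch of checking that header by hand. The class name ClassFileHeader and the method majorVersion are hypothetical, and the byte array is a hand-written stand-in for the first eight bytes of a Java SE 6 class file, not real compiler output.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; javap does far more than this.
final class ClassFileHeader {
    /** Returns the major version stored in the first eight bytes of a class file. */
    static int majorVersion(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer in = ByteBuffer.wrap(classBytes);
        int magic = in.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        in.getShort();                 // minor version, skipped here
        return in.getShort() & 0xFFFF; // major version as an unsigned 16-bit value
    }

    public static void main(String[] args) {
        byte[] java6Header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic number
            0, 0,                                               // minor version
            0, 50                                               // major version 50 = Java SE 6
        };
        System.out.println(majorVersion(java6Header)); // prints 50
    }
}
```

To try it on a real file you could feed it the bytes of HelloWorld.class instead of the hand-written array.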
3.2.3 The bytecode - let's break it down
What we're going to do now is recreate our source file line by line from the bytecode that we just got handed. It's always useful at this point to have the bytecode reference open in a side window so we can quickly look up what each opcode does.
3.2.3.1 The bytecode - create the .java file
The first line we see is Compiled from "HelloWorld.java". This makes it a relatively easy task to know what to call the source file, so go ahead, in a separate directory create a file called HelloWorld.java.
3.2.3.2 The bytecode - create the class declaration
public class HelloWorld extends java.lang.Object{
As we can see here, in comparison to our original source file, the only difference we will actually notice is extends java.lang.Object.
The reason why we're seeing this is that by default all Java classes are subclasses of (extend) Object - being an object oriented language, this makes sense. If you wish to know more about Object Orientation, there are plenty of articles online to get you started.
So from that line we have deduced our class declaration should be as follows.
public class HelloWorld
{
}
Now onto the methods!
3.2.3.3 The bytecode - create the method declarations
public HelloWorld();
public static void main(java.lang.String[]);
Now hold on one cotton picking minute! Why the HELL are there two methods?!! We only had one method in our original source file... my brain is going to ESPLODE!!!!!!
Calm down, calm down!! It's actually rather simple.
Each class has what is called a Constructor, and it needs one before any objects can be created from it. In the cases where you don't provide one, the Java compiler is actually nice enough to go and put one in there for you! awwww, nice Java compiler. ^_^
This is effectively the same as going back to our original source code and adding in the following blank constructor.
public HelloWorld() {}
It doesn't actually do anything, but if you compile your new source, it will turn into exactly the same .class file we already have!
*phew* - wipes head. Nothing serious!
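If you don't want to take javap's word for it, both compiler freebies seen so far (the implicit extends java.lang.Object and the default constructor) can also be observed through reflection. This is just a sketch: the nested class Plain is a made-up stand-in for HelloWorld, and ImplicitMembers and its methods are hypothetical names.

```java
// Sketch for illustration: inspects what the compiler added to a class that declares nothing.
final class ImplicitMembers {
    // Stand-in for HelloWorld: no "extends" clause and no constructor in the source.
    static class Plain {
    }

    static String superclassName(Class<?> type) {
        return type.getSuperclass().getName();
    }

    static int constructorCount(Class<?> type) {
        return type.getDeclaredConstructors().length;
    }

    static int firstConstructorParameterCount(Class<?> type) {
        return type.getDeclaredConstructors()[0].getParameterTypes().length;
    }

    public static void main(String[] args) {
        System.out.println(superclassName(Plain.class));                 // java.lang.Object
        System.out.println(constructorCount(Plain.class));               // 1
        System.out.println(firstConstructorParameterCount(Plain.class)); // 0
    }
}
```

One constructor, zero parameters, even though the source of Plain never mentions one.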
So from these declarations we now have the following source code, pretty much copy / pasted verbatim.
public class HelloWorld
{
    public HelloWorld();
    public static void main(java.lang.String[]);
}
If you tried to compile this right now, you would actually get the following error:
HelloWorld.java:4: <identifier> expected
public static void main(java.lang.String[]);
^
1 error
Again, this is nothing to worry about. This error is basically saying that you have defined a type (java.lang.String[]) but you haven't given it a name (identifier)!! The identifiers are like our little pets, for each one that we make, we're going to be nice enough to give them a name. As you can see from our original source code we called this specific String[] array args. ^_^ !! What we're also going to do is add in a blank method body, like when we did our blank constructor earlier, by adding {} onto the end.
public class HelloWorld
{
    public HelloWorld() {};
    public static void main(java.lang.String[] args) {};
}
Go ahead! Give it a try. This baby will actually compile. wooooo, look at that.
We've basically recreated half of our source file already and we haven't even looked at any opcodes yet!
Note: The semi-colons at the end of the method declarations are not required. We have just kept them there to clearly define the end of our statements; << see
3.2.3.4 The bytecode - recreate the methods code
Now comes the laborious part. We've defined the structure of our application, we have the shell, now we need to pad it out with some meat.
Here is an excerpt taken from "http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode/" to help you better understand the byte code.
To understand the details of the bytecode, we need to discuss how a Java Virtual Machine (JVM) works regarding the execution of the bytecode. A JVM is a stack-based machine. Each thread has a JVM stack which stores frames. A frame is created each time a method is invoked, and consists of an operand stack, an array of local variables, and a reference to the runtime constant pool of the class of the current method.
The array of local variables, also called the local variable table, contains the parameters of the method and is also used to hold the values of the local variables. The parameters are stored first, beginning at index 0. If the frame is for a constructor or an instance method, the this reference is stored at location 0. Then location 1 contains the first formal parameter, location 2 the second, and so on. For a static method, the first formal method parameter is stored in location 0, the second in location 1, and so on.
The size of the array of local variables is determined at compile time and is dependent on the number and size of local variables and formal method parameters. The operand stack is a LIFO stack used to push and pop values. Its size is also determined at compile time. Certain opcode instructions push values onto the operand stack; others take operands from the stack, manipulate them, and push the result. The operand stack is also used to receive return values from methods.
The bytecode for this method consists of three opcode instructions. The first opcode, aload_0, pushes the value from index 0 of the local variable table onto the operand stack. Earlier, it was mentioned that the local variable table is used to pass parameters to methods. The this reference is always stored at location 0 of the local variable table for constructors and instance methods. The this reference must be pushed because the method is accessing the instance data, name, of the class.
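Reading that interesting text (I skipped most of it) will help you when we recreate the methods. The last paragraph talks about a method from the article's own example class, but aload_0 does exactly the same job in our constructor.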
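To make the "stack-based machine" bit less abstract, here is a toy replay of the four instructions javap listed for main(String[]) in 3.2.2. It is only a sketch under loud assumptions: a Deque plays the operand stack, a StringBuilder stands in for the PrintStream in System.out, and OperandStackSketch / runMain are names invented for this example. A real JVM frame holds a lot more than this.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model for illustration only; not how a real JVM is implemented.
final class OperandStackSketch {
    /**
     * Replays the four instructions of main(String[]) on a Deque.
     * The StringBuilder is a stand-in for the PrintStream in System.out.
     * Returns how many values are left on the stack at "return".
     */
    static int runMain(StringBuilder fakeOut) {
        Deque<Object> operandStack = new ArrayDeque<>();

        operandStack.push(fakeOut);        // 0: getstatic #2 -> push System.out
        operandStack.push("Hello World!"); // 3: ldc #3       -> push the String constant

        // 5: invokevirtual #4 -> pop the argument, then the object the method is called on
        String argument = (String) operandStack.pop();
        StringBuilder receiver = (StringBuilder) operandStack.pop();
        receiver.append(argument).append('\n'); // roughly what println would do

        return operandStack.size();        // 8: return       -> nothing left behind
    }

    public static void main(String[] args) {
        StringBuilder fakeOut = new StringBuilder();
        int left = runMain(fakeOut);
        System.out.print(fakeOut);  // Hello World!
        System.out.println(left);   // 0
    }
}
```

Note the order: the object is pushed first and the argument last, so the argument comes off the stack first when the method is invoked.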
3.2.3.4.1 The bytecode - recreate :: HelloWorld()
public HelloWorld();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return
o.O WTF IS THAT! The method was blank, why does it have those... THINGS in it?
aload - Load reference from local variable
invokespecial - Invoke instance method; special handling for superclass, private, and instance initialization method invocations
return - Pretty self explanatory.
public HelloWorld();
  Code:
   0:   aload_0                 // Push the object reference (this) at index
                                // 0 of the local variable table.
   1:   invokespecial   #1;...  // Call the superclass constructor (our parent is Object).
                                // Basically, our blank constructor is calling the constructor for its
                                // parent class Object. (Remember we all inherit from that parent?)
   4:   return                  // And we leave.
public HelloWorld()
{
    super();
    // basically we do nothing xD if you wanted to recreate it exactly, you could add the super(); line in
    // super(); calls the parent class's constructor from the constructor we're in.
    // So the JVM goes up to the Object() Constructor and creates us a nice little Object.
}
So at the moment, what we've basically got is a class which can be made into an Object to be used and abused, but which doesn't do anything else yet.
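The reason aload_0 means "this" in the constructor comes straight from the slot numbering in the excerpt: instance methods and constructors keep this in slot 0, static methods start their parameters at slot 0. A rough sketch of that numbering follows. LocalSlots and slotOfParameter are hypothetical names, and the parameter types are given as JVM descriptor strings, with the one extra rule that long (J) and double (D) each take up two slots.

```java
// Sketch for illustration: computes the local variable slot of a method parameter.
final class LocalSlots {
    /**
     * paramDescriptors holds JVM type descriptors such as "I", "J" or "[Ljava/lang/String;".
     * paramIndex is the 0-based position of the parameter in the method signature.
     */
    static int slotOfParameter(boolean isStatic, String[] paramDescriptors, int paramIndex) {
        int slot = isStatic ? 0 : 1; // slot 0 holds "this" for instance methods and constructors
        for (int i = 0; i < paramIndex; i++) {
            String d = paramDescriptors[i];
            slot += (d.equals("J") || d.equals("D")) ? 2 : 1; // long and double are two slots wide
        }
        return slot;
    }

    public static void main(String[] args) {
        // static void main(String[] args): args lives in slot 0
        System.out.println(slotOfParameter(true, new String[] {"[Ljava/lang/String;"}, 0)); // 0
        // instance method taking (long, int): the int lives in slot 3
        System.out.println(slotOfParameter(false, new String[] {"J", "I"}, 1)); // 3
    }
}
```

So in main(String[]) an aload_0 would push args rather than this, because main is static.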
3.2.3.4.2 The bytecode - recreate :: main(String[])
public static void main(java.lang.String[]);
  Code:
   0:   getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   ldc             #3; //String Hello World!
   5:   invokevirtual   #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   8:   return
getstatic - Get static field from class
ldc - Push item from runtime constant pool
invokevirtual - Invoke instance method; dispatch based on class
public static void main(java.lang.String[]);
  Code:
   // We're getting the static variable "out" from the System class,
   // we can see from the stuff on the right that it's a PrintStream.
   // Remember, at the moment we're just getting a reference to the "out" variable,
   // so that we can use it later.
   0:   getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   // At compile time, all our strings get turned into constants and saved in the runtime
   // constant pool. This way, if we use that string more than once, it will conserve memory.
   // So we're loading the string "Hello World!"
   3:   ldc             #3; //String Hello World!
   // Now that we've loaded a reference to System.out and we have loaded in our
   // String, we want to "invoke" a virtual method. Method #4 as we can see on the right
   // is called println() and is expecting a String variable to be passed to it...
   // luckily we already loaded one of those in!!
   5:   invokevirtual   #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   // and return...
   8:   return
public static void main(java.lang.String[] args)
{
    System.out.println("Hello World!");
}
Luckily for us System.out handles the creation of the PrintStream so that we don't have to worry about how to grab the OutputStream required to dump that text to the console / terminal / debug window that we've got open - making our code A LOT shorter!
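That cryptic (Ljava/lang/String;)V in the javap comment is a method descriptor: parameter types inside the brackets, return type after them, with L...; wrapping a class name, [ meaning "array of", and single letters for the primitives (V is void). Here is a small sketch of turning one back into something readable. DescriptorSketch, describe and readType are invented names, and the sketch assumes a well-formed descriptor as input.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch for illustration: turns "(Ljava/lang/String;)V" into "void (java.lang.String)".
final class DescriptorSketch {
    static String describe(String descriptor) {
        int close = descriptor.indexOf(')');
        List<String> params = new ArrayList<>();
        int[] pos = {1}; // start right after '('
        while (pos[0] < close) {
            params.add(readType(descriptor, pos));
        }
        pos[0] = close + 1; // the return type follows ')'
        String returnType = readType(descriptor, pos);
        return returnType + " (" + String.join(", ", params) + ")";
    }

    // Reads one type starting at pos[0] and advances pos[0] past it.
    private static String readType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'V': return "void";
            case 'Z': return "boolean";
            case 'B': return "byte";
            case 'C': return "char";
            case 'S': return "short";
            case 'I': return "int";
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            case '[': return readType(d, pos) + "[]";
            case 'L': {
                int end = d.indexOf(';', pos[0]);
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("unexpected descriptor character: " + c);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("(Ljava/lang/String;)V"));  // void (java.lang.String)
        System.out.println(describe("([Ljava/lang/String;)V")); // void (java.lang.String[])
        System.out.println(describe("()V"));                    // void ()
    }
}
```

The three examples are println(String), our own main(String[]) and the Object constructor, in that order.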
... So what happens now? We've rewritten all the methods, we've got the source code. Well then, let's test it!!
3.2.3.5 The source code - reconstructed
File: HelloWorld.java
public class HelloWorld
{
    public HelloWorld()
    {
        super();
    }
    public static void main(String[] args)
    {
        System.out.println("Hello World! "); // modified for our nefarious needs
    }
}
$ javac -d . HelloWorld.java
$ java HelloWorld
Hello World!
And that's that... almost identical to our original source code. We have recreated that amazing HelloWorld application and have now modified it for our nefarious needs! muahahahah
3.3 Things to look out for!
3.3.1 File Extensions
As you may have read in the post by Deathspirit, each source file used in a Java application has the extension ".java". Please note that this extension is case-sensitive. If you attempt to use any other extension you will receive the following error message.
error: Class names, 'Hello.Java', are only accepted if annotation processing is explicitly requested
1 error
A successful compilation will convert each of your source files into the respective ".class" file. When attempting to run these class files, please note that you have to omit the ".class" extension, else you will receive something like the following:
$ java hello.class
Exception in thread "main" java.lang.NoClassDefFoundError: hello/class
Caused by: java.lang.ClassNotFoundException: hello.class
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
Could not find the main class: hello.class. Program will exit.
The reason for this is that the "java" command line tool interprets . as a package separator, and packages map to directories. So it's actually looking for a file called class.class in a directory called hello.
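You can poke at the same name lookup from inside a program with Class.forName, which takes a dotted class name just like the launcher does. A minimal sketch, assuming (as in the stack trace above) that there is no package called hello on your classpath; LauncherNameSketch and canLoad are names made up for this example.

```java
// Sketch for illustration: asks the class loader whether a dotted name resolves to a class.
final class LauncherNameSketch {
    /** True if a class with this name can be loaded from the current classpath. */
    static boolean canLoad(String binaryName) {
        try {
            Class.forName(binaryName);
            return true;
        } catch (ClassNotFoundException e) {
            // Same exception that appears as the "Caused by" in the launcher's stack trace.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canLoad("java.lang.String")); // true
        System.out.println(canLoad("hello.class"));      // false: read as class "class" in package "hello"
    }
}
```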
3.3.2 Naming Conventions
When naming the files for Java sources, we should take into account the following conventions:
1. Java is an object oriented language. This considered, the names we create for our objects should accurately reflect the object they describe (without being ridiculously long).
2. Names should adhere to proper noun capitalization, i.e. rather than helloworld.java it would be HelloWorld.java.
3. Object names (class declarations inside the source file) should match the name of the file.
FileName: HelloWorld.java
Declaration: public class HelloWorld {
Sticking to these and other Java conventions will help you avoid messages such as the following:
Hello.java:1: class hello is public, should be declared in a file named hello.java
public class hello
^
1 error
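Both file name rules from this section (the case-sensitive .java extension and the public class name matching the file name) boil down to one string comparison. A tiny sketch of that check, with SourceFileNameCheck and matches being hypothetical names and not anything javac exposes:

```java
// Sketch for illustration: the file name a public top-level class is expected to live in.
final class SourceFileNameCheck {
    /** True if fileName is the expected source file name for the given public class. */
    static boolean matches(String fileName, String publicClassName) {
        // Both the base name and the ".java" extension are compared case-sensitively.
        return fileName.equals(publicClassName + ".java");
    }

    public static void main(String[] args) {
        System.out.println(matches("HelloWorld.java", "HelloWorld")); // true
        System.out.println(matches("Hello.java", "hello"));           // false, as in the error above
        System.out.println(matches("Hello.Java", "Hello"));           // false, wrong extension case
    }
}
```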
3.4 Summary
Ok, so it wasn't that exciting! We only reconstructed a basic hello world application, but hopefully, armed with the reference material, and now with the ability to write basic Java applications, you will be able to delve into and expand on the art of "Disassembling Java Applications".
3.5 Notes, addenda, comments
I'm a post edit whore, so I will probably pick up on most of the typos. If you feel there is anything that should be added here as a follow up, please drop me a comment and I'll make sure I edit it.
Enjoy!
-- xor