EvilZone
Programming and Scripting => Scripting Languages => Topic started by: str0be on June 21, 2013, 09:54:10 AM
-
Ok, so Base64 encoding/decoding in POSIX sh has been rolling around in my head for a bit and I've decided that it's impossible! I started working with bash and I think I've made some progress:
#!/bin/bash
# array of standard base64 characters
CHARS=($(echo -n "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c \
d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 + /"))
# standard character used for padding
# XXX needs implementation
PAD="="
# return input as a string of space-separated decimal numbers
function to_dec {
    echo -n "$1" | hexdump -ve '/1 "%d "'
}
function encode {
    typeset -a -r bytes=($(to_dec "$1")) # Read-only array of input as numbers.
    typeset -i byte # For translating to a CHARS index.
    typeset -i bytei # Index to current byte of input.
    typeset -i chari # Index to char to translate to.
    typeset -i left=0 # Bits from previous translation.
    typeset -i a # tmp
    typeset -i b # tmp
    typeset encoded="" # Output buffer.
    for ((bytei=0; bytei < ${#bytes[*]}; bytei++)); do
        if [ $left -eq 0 ]; then
            byte=${bytes[$bytei]}
            chari=$(( $byte >> 2 ))
            encoded=${encoded}${CHARS[$chari]}
            left=2
        elif [ $left -eq 6 ]; then
            byte=${bytes[$(( $bytei - 1 ))]}
            chari=$(( $byte & 63 ))
            encoded=${encoded}${CHARS[$chari]}
            byte=${bytes[$bytei]}
            chari=$(( $byte >> 2 ))
            encoded=${encoded}${CHARS[$chari]}
            left=2
        else
            a=$(( (${bytes[$(( $bytei - 1 ))]}) & (2 ** $left - 1) ))
            b=$(( (${bytes[$bytei]}) >> ($left + 2) ))
            chari=$(( ($a << (6 - $left)) | $b ))
            encoded=${encoded}${CHARS[$chari]}
            left=$(( $left + 2 ))
        fi
    done
    if [ $left -ne 0 ];then
        byte=${bytes[$(( ${#bytes[*]} - 1 ))]}
        chari=$(( ($byte & (2 ** $left - 1)) << (6 - $left) ))
        encoded=${encoded}${CHARS[$chari]}
    fi
    echo -n "$encoded"
}
function decode {
    echo "Not Implemented"
}
if [ $# -eq 0 ]; then
    F=$(cat -)
    echo $(encode "$F")
else
    echo "Wrong args..."
fi
The output doesn't match what GNU coreutils base64 produces, but I can't figure out why...
I realize this doesn't have much practical value but I'm wondering if someone can come up with a POSIX sh way of doing things? Or show me where I'm going wrong with bash...
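For what it's worth, a POSIX sh version looks feasible if od is allowed to do the byte-to-number step. What follows is only a minimal sketch under that assumption, not a drop-in replacement for the bash script above. The names b64_alphabet, b64_char and b64_encode are made up for illustration, input is read from stdin, and cut is spawned once per output character, so it will be slow on anything large.

```sh
#!/bin/sh
# Sketch: Base64-encode stdin with POSIX sh arithmetic plus od(1) and cut(1).

b64_alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

# Print the alphabet character at 0-based index $1.
b64_char() {
    printf '%s\n' "$b64_alphabet" | cut -c "$(( $1 + 1 ))"
}

b64_encode() {
    # od prints every input byte as an unsigned decimal number; the unquoted
    # expansion splits them into positional parameters (no arrays in POSIX sh).
    set -- $(od -An -v -tu1)
    out=''
    while [ "$#" -gt 0 ]; do
        # Pack up to three bytes into one 24-bit number, zero-filling the tail.
        b1=$1; b2=${2:-0}; b3=${3:-0}
        n=$(( ($b1 << 16) | ($b2 << 8) | $b3 ))
        c1=$(b64_char $(( ($n >> 18) & 63 )))
        c2=$(b64_char $(( ($n >> 12) & 63 )))
        c3=$(b64_char $(( ($n >> 6) & 63 )))
        c4=$(b64_char $(( $n & 63 )))
        # A short final group is padded with '=' to four characters.
        case $# in
            1) out="$out$c1$c2=="; shift 1 ;;
            2) out="$out$c1$c2$c3="; shift 2 ;;
            *) out="$out$c1$c2$c3$c4"; shift 3 ;;
        esac
    done
    printf '%s\n' "$out"
}
```

Usage would be along the lines of `printf 'hello' | b64_encode`, which should print aGVsbG8=. Because the bytes come straight from od rather than through a shell variable, a trailing newline in the input is encoded like any other byte.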
-
You don't take the newline into account. Your script encodes "hello\n" exactly the same way as "hello".
The other thing that's wrong is the padding. Have another look into that.
deque@desolate:~/scripts$ base64
hello
aGVsbG8K
deque@desolate:~/scripts$ base64
helloaGVsbG8=
deque@desolate:~/scripts$ ./base.sh
hello
aGVsbG8
deque@desolate:~/scripts$ ./base.sh
helloaGVsbG8
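The padding rule itself is small: Base64 output is always a multiple of four characters long, and '=' fills whatever is missing. A minimal bash sketch of that step, assuming it is handed the unpadded string your encoder currently produces (pad_base64 is a hypothetical helper name, not something in the script):

```bash
# Hypothetical helper: append '=' until the length is a multiple of 4.
pad_base64() {
    local s=$1
    while (( ${#s} % 4 != 0 )); do
        s+="="
    done
    printf '%s\n' "$s"
}
```

With the outputs shown above, `pad_base64 aGVsbG8` should give aGVsbG8=, while aGVsbG8K is already eight characters long and comes back unchanged.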
-
Thanks for a pointer in the right direction. I hadn't even considered that 'encode' wasn't getting a pristine copy of the input. It turns out that command substitution, $(...), silently strips trailing newlines. After I discovered this I made a simple demonstration:
pitcher
#!/bin/bash
printf "This is output that needs to be
preserved exactly
"
catcher
#!/bin/bash
CATCH1=$(./pitcher)
echo CATCH1
echo -n $CATCH1
echo
echo
echo "---------------------------"
echo
CATCH2=$(./pitcher)
echo CATCH2
echo -n "$CATCH2"
echo
echo
echo "---------------------------"
echo
TMP=$(./pitcher; echo XXX) # Add our own 'end of input' sentinel.
CATCH3="${TMP%XXX}" # Del sentinel == exact copy of subprocess output.
echo CATCH3
echo -n "$CATCH3"
Output from executing catcher:
CATCH1
This is output that needs to be preserved exactly
---------------------------
CATCH2
This is output that needs to be
preserved exactly
---------------------------
CATCH3
This is output that needs to be
preserved exactly
Here's the updated script:
#!/bin/bash
# array of standard base64 characters
CHARS=($(echo -n "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c \
d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 + /"))
IFS=' '
# standard character used for padding
# XXX needs implementation
PAD="="
# return input as a string of space-separated decimal numbers
function to_dec {
    echo -n "$1" | hexdump -ve '/1 "%d "'
}
function encode {
    # Quote "$1" so spaces and glob characters in the input survive.
    TMP=$(to_dec "$1"; echo XXX)
    typeset -a -r bytes=(${TMP%XXX}) # Read-only array of input as numbers.
    typeset -i byte # For translating to a CHARS index.
    typeset -i bytei # Index to current byte of input.
    typeset -i chari # Index to char to translate to.
    typeset -i left=0 # Bits from previous translation.
    typeset -i a # tmp
    typeset -i b # tmp
    typeset encoded="" # Output buffer.
    for ((bytei=0; bytei < ${#bytes[*]}; bytei++)); do
        if [ $left -eq 0 ]; then
            byte=${bytes[$bytei]}
            chari=$(( $byte >> 2 ))
            encoded=${encoded}${CHARS[$chari]}
            left=2
        elif [ $left -eq 6 ]; then
            byte=${bytes[$(( $bytei - 1 ))]}
            chari=$(( $byte & 63 ))
            encoded=${encoded}${CHARS[$chari]}
            byte=${bytes[$bytei]}
            chari=$(( $byte >> 2 ))
            encoded=${encoded}${CHARS[$chari]}
            left=2
        else
            a=$(( (${bytes[$(( $bytei - 1 ))]}) & (2 ** $left - 1) ))
            b=$(( (${bytes[$bytei]}) >> ($left + 2) ))
            chari=$(( ($a << (6 - $left)) | $b ))
            encoded=${encoded}${CHARS[$chari]}
            left=$(( $left + 2 ))
        fi
    done
    if [ $left -ne 0 ];then
        byte=${bytes[$(( ${#bytes[*]} - 1 ))]}
        chari=$(( ($byte & (2 ** $left - 1)) << (6 - $left) ))
        encoded=${encoded}${CHARS[$chari]}
    fi
    echo -n "$encoded"
}
function decode {
    echo "Not Implemented"
}
if [ $# -eq 0 ]; then
    TMP=$(cat -; echo -n XXX) # Sentinel keeps trailing newlines intact.
    F=${TMP%XXX}
    echo $(encode "$F")
else
    echo "Wrong args..."
fi
$ echo hello | ./base64.sh | /usr/bin/base64 -d
hello
btw, I'm just ignoring proper padding until I get the core algorithms working. Thanks again. ;D
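Since decode is still a stub, here is one possible shape for it, offered as a rough bash sketch rather than the intended implementation. It assumes the standard alphabet, takes the encoded text as its first argument, stops at the first '=' and skips any character outside the alphabet (newlines, for instance). b64_decode is a hypothetical name. Each byte is printed as soon as it is complete, because a bash variable cannot hold NUL bytes.

```bash
# Hypothetical decoder: b64_decode ENCODED_TEXT
b64_decode() {
    local alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    local input=$1 ch prefix
    local i idx acc=0 bits=0
    for (( i = 0; i < ${#input}; i++ )); do
        ch=${input:i:1}
        [[ $ch == "=" ]] && break            # padding marks the end of the data
        # Index of ch = length of the alphabet text in front of it.
        prefix=${alphabet%%"$ch"*}
        idx=${#prefix}
        (( idx < 64 )) || continue           # not in the alphabet: ignore it
        # Shift in 6 new bits; only the low 16 bits are ever needed.
        acc=$(( ((acc << 6) | idx) & 0xFFFF ))
        bits=$(( bits + 6 ))
        if (( bits >= 8 )); then
            bits=$(( bits - 8 ))
            # Emit the completed byte through an octal escape.
            printf "\\$(printf '%03o' $(( (acc >> bits) & 255 )))"
        fi
    done
}
```

Something like `b64_decode aGVsbG8=` should print hello with no trailing newline.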