The base85x encoding

How to store binary data within XML files

Kullo will use a slightly modified form of the base64/base85 encoding to store binary files inside an XML file: base85x.

The general idea is to encode 4 bytes of binary data into 5 printable characters. On the one hand, this increases the file size by 25 %; on the other hand, the binary data can be displayed, copied and pasted, and stored in text files and source code.

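Given four binary bytes b_1, …, b_4 in the range [0, 255], we use the following base transformation to get the values c_1, …, c_5 in the range [0, 84]:

$$b_1 \cdot 256^3 + b_2 \cdot 256^2 + b_3 \cdot 256 + b_4 = c_1 \cdot 85^4 + c_2 \cdot 85^3 + c_3 \cdot 85^2 + c_4 \cdot 85 + c_5$$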
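
The base transformation described above can be sketched in a few lines of Python. This is only an illustration, not Kullo's implementation: the function name `block_to_digits` is made up for this example, and the byte order (b_1 is the most significant byte) is taken from the worked example later in this post. Five digits are enough because 85^5 is larger than 256^4.

```python
def block_to_digits(block: bytes) -> list:
    """Convert one 4-byte block into five base-85 digits c_1..c_5.

    Hypothetical helper for illustration; b_1 is treated as the most
    significant byte, matching the worked example in the post.
    """
    if len(block) != 4:
        raise ValueError("expected exactly 4 bytes")
    # b_1*256^3 + b_2*256^2 + b_3*256 + b_4
    value = int.from_bytes(block, "big")
    digits = []
    for _ in range(5):
        # Peel off the least significant base-85 digit each round.
        value, remainder = divmod(value, 85)
        digits.append(remainder)
    return digits[::-1]  # most significant digit (c_1) first


print(block_to_digits(b"n!^^"))  # [35, 33, 54, 28, 76]
```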

We then map these values to characters using a different alphabet than Adobe’s base85, since that one includes the XML-critical characters < and >. The alphabet we’re using is the following:

Value   Character
0–9     0 1 2 3 4 5 6 7 8 9
10–35   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
36–61   a b c d e f g h i j k l m n o p q r s t u v w x y z

62 ?   63 !   64 .   65 :   66 -   67 _   68 ,   69 ;
70 (   71 )   72 [   73 ]   74 =   75 |   76 {   77 }
78 *   79 +   80 @   81 #   82 %   83 ~   84 ^

There are 94 printable characters in ASCII (not counting the space “ ”). The nine spare characters we are not using are: "$&'/<>\`

All of these 94 ASCII characters are encoded as single bytes in UTF-8, so each base85x character takes exactly one byte when stored in a UTF-8 encoded XML file.
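
As a quick sanity check, the alphabet and the list of spare characters can be reproduced with a short Python sketch. The constant name `ALPHABET` is chosen for this example only, and the alphabet is the one listed in this post (earlier versions of the encoding used different ones).

```python
import string

# Alphabet as listed in this post: digits, upper case, lower case, symbols.
ALPHABET = (
    string.digits
    + string.ascii_uppercase
    + string.ascii_lowercase
    + "?!.:-_,;()[]=|{}*+@#%~^"
)

# The 94 printable ASCII characters, excluding the space (code 32).
printable = {chr(code) for code in range(33, 127)}
spare = "".join(sorted(printable - set(ALPHABET)))

print(len(ALPHABET))                  # 85
print(spare)                          # "$&'/<>\`
print(len(ALPHABET.encode("utf-8")))  # 85, i.e. one byte per character
```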

In order to get consistent results, we need a standardized padding if the input length is not divisible by 4: we append as many ^ characters as needed to fill the last 4-byte block. To record how many ^ characters were added, we trim the resulting 5-character output block by exactly that number.

If we had, for example, 42 bytes of binary data ending in “n!”, the padding rule expands the last block to “n!^^” before encoding.

base85xEncode(“n!^^”): 110·256³ + 33·256² + 94·256 + 94 = (35, 33, 54, 28, 76)_85, which maps to “ZXsS{”

In this example, we cut off the last 2 characters and get “ZXs”.
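
Putting the padding rule and the example above together, a minimal encoder might look like the following Python sketch. It is an illustration under assumptions, not the reference implementation: the name `base85x_encode` is hypothetical, the alphabet is the one from this post, and only encoding is covered.

```python
# Alphabet as listed in this post (index = digit value).
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "?!.:-_,;()[]=|{}*+@#%~^"
)


def base85x_encode(data: bytes) -> str:
    """Encode bytes as base85x text (hypothetical sketch of the rules above)."""
    pad = -len(data) % 4           # number of '^' bytes needed to fill the last block
    padded = data + b"^" * pad
    out = []
    for i in range(0, len(padded), 4):
        # Treat the block as one big-endian 32-bit number.
        value = int.from_bytes(padded[i:i + 4], "big")
        chars = []
        for _ in range(5):
            value, digit = divmod(value, 85)
            chars.append(ALPHABET[digit])
        out.append("".join(reversed(chars)))  # most significant digit first
    encoded = "".join(out)
    # Trim as many output characters as padding bytes were added.
    return encoded[:len(encoded) - pad]


print(base85x_encode(b"n!^^"))  # ZXsS{
print(base85x_encode(b"n!"))    # ZXs
```

For the 42-byte input mentioned above, this yields 11 blocks of 5 characters minus 2 trimmed characters, i.e. 53 characters.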

Note: Since the first version of this encoding there have been incompatible updates in the alphabet. See all related posts.
