2024-09-04

base91

base91 is a binary-to-text encoding similar to base64.

its overhead is 14-23% compared with 33% for base64, which makes its output roughly 8-14% smaller. it uses all printable ascii characters except apostrophe ('), dash (-), and backslash (\), and needs no padding.
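
the overhead figures follow from how many input bits each group of output characters carries. a rough sketch, assuming the 13- and 14-bit packing used by the implementation further down (two characters, that is 16 bits of output, per 13 or 14 bits of input). the helper name overhead_percent is made up for this note:

```coffeescript
# assumption: overhead is measured as extra output bits relative to input bits
overhead_percent = (input_bits, output_bits) ->
  100 * (output_bits - input_bits) / input_bits

# base64: 4 characters (32 bits) per 3 bytes (24 bits)
console.log "base64:", overhead_percent(24, 32).toFixed(1)        # 33.3
# base91, worst case: 2 characters (16 bits) per 13 bits
console.log "base91 worst:", overhead_percent(13, 16).toFixed(1)  # 23.1
# base91, best case: 2 characters (16 bits) per 14 bits
console.log "base91 best:", overhead_percent(14, 16).toFixed(1)   # 14.3
```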

from the readme of the canonical implementation by joachim henke:

basE91 is an advanced method for encoding binary data as ASCII characters. It
is similar to UUencode or base64, but is more efficient. The overhead produced
by basE91 depends on the input data. It amounts at most to 23% (versus 33% for
base64) and can range down to 14%, which typically occurs on 0-byte blocks.
This makes basE91 very useful for transferring larger files over binary
insecure connections like e-mail or terminal lines.

The current algorithm has been written with portability and simplicity in mind
and is therefore not necessarily optimized for speed.

* Alphabet

As the name suggests, basE91 needs 91 characters to represent the encoded
binary data in ASCII. From the 94 printable ASCII characters (0x21-0x7E), the
following three ones have been omitted to build the basE91 alphabet:

- (dash, 0x2D)
' (apostrophe, 0x27)
\ (backslash, 0x5C)

there is also a relevant paper: a proposal of substitute for base85/64 – base91.

the character table used for encoding does not follow ascii order. it begins with the uppercase letters, followed by the lowercase letters and the digits, then the remaining special characters in ascii order, skipping dash, apostrophe, backslash, and double quote. finally, the double quote character is appended.
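
the ordering described above can be reproduced from ascii ranges. this is only a sketch to illustrate the layout. the names range, omitted, specials, alphabet, and expected are made up for this note, and expected is a copy of the table string used by the implementation:

```coffeescript
# all characters from `first` to `last` in ascii order, as one string
range = (first, last) ->
  (String.fromCharCode(code) for code in [first.charCodeAt(0)..last.charCodeAt(0)]).join ""

is_alphanumeric = (c) -> /[A-Za-z0-9]/.test c
omitted = ["-", "'", "\\"]  # the three characters left out of the alphabet

# printable ascii (0x21-0x7E) that is neither letter nor digit,
# minus the omitted characters and the double quote
specials = (c for c in range("!", "~") when not is_alphanumeric(c) and c not in omitted and c isnt '"').join ""

# letters, digits, specials, and the double quote last
alphabet = range("A", "Z") + range("a", "z") + range("0", "9") + specials + '"'

expected = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~"'
console.log alphabet.length       # 91
console.log alphabet is expected  # true
```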

coffeescript implementation

chars_encode = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~"'
chars_decode = Array(256).fill null
chars_decode[chars_encode.charCodeAt(i)] = i for i in [0...chars_encode.length]

base91_decode = (data) ->
  # string -> Uint8Array
  # a pair of characters yields at most 14 bits, so 7 bits per character bounds the output size
  max_length = Math.ceil (7 * data.length) / 8
  result = new Uint8Array max_length
  result_length = 0
  bit_accumulator = 0
  bit_count = 0
  value = null
  for i in [0...data.length]
    bits = chars_decode[data.charCodeAt i]
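    continue unless bits?  # skip characters that are not in the alphabet
    if value?
      # second character of a pair: combine both into one 13- or 14-bit value
      value += bits * 91
      bit_accumulator |= value << bit_count
      bit_count += if (value & 8191) > 88 then 13 else 14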
      loop
        result[result_length] = bit_accumulator
        result_length += 1
        bit_accumulator >>= 8
        bit_count -= 8
        break unless bit_count > 7
      value = null
    else value = bits
  if value?
    result[result_length] = bit_accumulator | value << bit_count
    result_length += 1
  result.subarray 0, result_length

base91_encode = (data) ->
  # Uint8Array -> string
  result = ""
  bit_accumulator = 0
  bit_count = 0
  for a in data
    bit_accumulator |= a << bit_count
    bit_count += 8
    if bit_count > 13
      # take 13 bits if their value exceeds 88, otherwise 14 bits
      value = bit_accumulator & 8191
      if value > 88
        bit_accumulator >>= 13
        bit_count -= 13
      else
        value = bit_accumulator & 16383
        bit_accumulator >>= 14
        bit_count -= 14
      result += chars_encode[value % 91] + chars_encode[Math.floor value / 91]
  if bit_count
    # flush the remaining bits, with a second character only if needed
    result += chars_encode[bit_accumulator % 91]
    if bit_count > 7 or bit_accumulator > 90
      result += chars_encode[Math.floor bit_accumulator / 91]
  result
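
the comparison against 88 in the encoder and decoder can be motivated as follows. two characters can represent 91 * 91 = 8281 values, slightly more than the 8192 values of 13 bits. a 14-bit value whose low 13 bits are at most 88 is at most 8192 + 88 = 8280, so it still fits into two characters, and 14 bits are consumed in that case. the following sketch checks this reasoning exhaustively. the variable names are made up for this note and it is not part of the implementation:

```coffeescript
pair_capacity = 91 * 91      # 8281 values per two characters
max_13_bit = (1 << 13) - 1   # 8191

# every 13-bit value fits into a pair of characters
console.log max_13_bit < pair_capacity  # true

fits = (value) -> value < pair_capacity
all_consistent = true
for value in [0...(1 << 14)]
  # the rule from the code: low 13 bits of 88 or less means 14 bits are taken
  takes_14_bits = (value & max_13_bit) <= 88
  # largest 14-bit value with the same low 13 bits (bit 13 set)
  worst_case = value | (1 << 13)
  all_consistent = false if takes_14_bits isnt fits(worst_case)
console.log all_consistent  # true
```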

example usage

uint8array_to_string = (a) -> Buffer.from(a).toString "utf-8"
string_to_uint8array = (a) -> new Uint8Array Buffer.from a, "utf-8"
console.log base91_encode string_to_uint8array "test"  # prints "fPNKd"
console.log uint8array_to_string base91_decode base91_encode string_to_uint8array "test"  # prints "test"