FFFF
2010-11-04
Introduction
FFFF is a data encoding designed for efficiency, flexibility, and ease of processing. Like s-expressions and XML, FFFF allows arbitrarily deep nesting and can be used to express any kind of data structure or program. Unlike s-expressions and XML, FFFF is a binary format, optimized for being quickly parsed and/or transmitted.
Status
FFFF is currently in draft status. Experimental implementations are being written. The experience gathered during this process will be used to update and finalize the definition of the format. Until then, this document is subject to change without notice.
Overview
The primary purpose of FFFF is encoding data. Ideally, it should be possible to encode any data that one might wish to store and/or transfer in such a way that the result preserves the structure of the original data, takes up few bytes, and can be read by another program with little work.
The goals set out for FFFF are very ambitious, and it is possible that experience or specific applications will lead to new insights and the development of better encoding schemes. FFFF anticipates this by providing a method for switching to a specific language, where a language has complete freedom in defining its own encoding scheme. Of course, a plethora of different languages is detrimental to interoperability. Therefore, care has been taken to make FFFF as described in this document good enough to cover the vast majority of cases. The use of this language is indicated by setting the language to FFFF version 0.2.
In FFFF, data is encoded as a stream of individual data items, each called a datum. Each datum consists of a tag identifying what kind of datum it is. The tag is followed by a value of zero or more bytes, whose interpretation depends on the kind of datum.
Tags in FFFF are natural numbers, encoded using FFFF's numeral encoding. In order to parse an FFFF datum, a program reading the datum must know the tag's definition. These definitions are provided by the language specification. The tags defined in this version of FFFF can be found in the section on tags.
In addition to encoding data directly, values may also be encoded by reference. FFFF allows defining a tag to refer to a specific value. After such a definition is made, the tag can be used as a reference to the value, which preserves object identity and conserves space. More information can be found in the description of definitions.
Encoding
Two commonly used encoding schemes used in FFFF are numeral encoding and blob encoding. These are described below.
Numeral Encoding
In FFFF, numerals are used to encode arbitrarily large numeric values. The numeral encoding consists of one or more bytes and can express any integer value of zero or greater.
The numeral encoding consists of a number of code bytes, where each byte contains a continuation bit (the most significant bit) and 7 data bits (the least significant bits). If the continuation bit is set to 1, this indicates that the code byte is followed by another code byte. The last code byte has its continuation bit set to 0. The data bits in the code bytes correspond to the bits in the encoded value, where the first code byte holds the 7 least significant bits of the encoded value, and each successive code byte holds the 7 next least significant bits of the encoded value.
Examples
Any value from 0 to 127 (both inclusive) can be encoded in a single byte:
Decimal | 0 |
---|---|
Binary | 0 |
Encoding | 00000000 |
Decimal | 42 |
---|---|
Binary | 101010 |
Encoding | 00101010 |
Decimal | 127 |
---|---|
Binary | 1111111 |
Encoding | 01111111 |
The value 128 needs two bytes in FFFF numeral encoding:
Decimal | 128 |
---|---|
Binary | 1 0000000 |
Encoding | 1000000 00000001 |
The value 1387055 needs three bytes:
Decimal | 1387055 |
---|---|
Binary | 1010100 1010100 0101111 |
Encoding | 10101111 11010100 01010100 |
Blob Encoding
A blob can hold arbitrary binary data. In FFFF, blob encoding consists of a numeral indicating the number of bytes in the blob, followed by the bytes comprising the blob data.
Example
Unencoded bytes (hex) | e0 a4 ae e0 a5 87 e0 a4 a4 e0 a5 8d e0 a4 a4 e0 a4 be |
---|---|
Blob encoding (hex) | 12 e0 a4 ae e0 a5 87 e0 a4 a4 e0 a5 8d e0 a4 a4 e0 a4 be |
Tags
This section gives an overview of the tags defined in FFFF version 0.2.
Integers
Integer values are encoded entirely in the tag. To mark the tag as encoding an integer, the least significant bit of the tag is set to 1 (all tags for non-integer values have this bit set to 0). The rest of the tag bits are used to encode the integer's value in two's complement encoding. The following provide examples of integers encoded using this scheme:
Decimal value | 0 |
---|---|
Two's complement | 0 |
FFFF encoding | 00000001 |
Decimal value | 1 |
---|---|
Two's complement | 01 |
FFFF encoding | 00000011 |
Decimal value | -1 |
---|---|
Two's complement | 1 |
FFFF encoding | 01111111 |
Decimal value | 31 |
---|---|
Two's complement | 011111 |
FFFF encoding | 00111111 |
Decimal value | -32 |
---|---|
Two's complement | 100000 |
FFFF encoding | 01000001 |
Decimal value | 32 |
---|---|
Two's complement | 0 100000 |
FFFF encoding | 11000001 00000000 |
Decimal value | 100 000 |
---|---|
Two's complement | 01100 0011010 100000 |
FFFF encoding | 11000001 10011010 00001100 |
Available since: FFFF version 0.1.
Booleans
There are two possible boolean values: true and false. Each of these has its own tag in FFFF:
Value | Tag |
---|---|
false | 0 |
true | 2 |
Available since: FFFF version 0.1.
Blobs
A blob is a container for arbitrary binary data; a blob contains zero or more bytes. The tag value for blobs is 4. The tag is followed by the value, in blob encoding. For example:
Bytes to be encoded (hex) | e9 9b bb e5 ad 90 e8 a8 88 e7 ae 97 e6 a9 9f |
---|---|
Encoding (hex) | 04 0f e9 9b bb e5 ad 90 e8 a8 88 e7 ae 97 e6 a9 9f |
Available since: FFFF version 0.1.
Strings
In FFFF, a tag value of 6 indicates a string. This is followed by a blob-encoded value consisting of a numeral indicating the number of characters in the string, followed by the contents of the string, encoded as UTF-8. Note that, in UTF-8, the number of characters is not necessarily equal to the number of bytes.
For example, the string Hello, world!
is encoded as follows:
String | Hello, world! |
---|---|
UTF-8 (hex) | 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 |
Number of characters | 13 (0d hex) |
Encoded string | 06 0e 0d 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 |
The string Здравствуй, мир!
(Hello, world!
in Russian) is encoded as follows:
String | Здравствуй, мир! |
---|---|
UTF-8 (hex) | d0 97 d0 b4 d1 80 d0 b0 d0 b2 d1 81 d1 82 d0 b2 d1 83 d0 b9 2c 20 d0 bc d0 b8 d1 80 21 |
Number of characters | 16 (10 hex) |
Encoded string | 06 1e 10 d0 97 d0 b4 d1 80 d0 b0 d0 b2 d1 81 d1 82 d0 b2 d1 83 d0 b9 2c 20 d0 bc d0 b8 d1 80 21 |
Available since: FFFF version 0.1.
Symbols
Symbols are named identifiers. There are two encodings for symbols in FFFF: the short encoding and the long encoding. The short encoding consists of only the symbol name, whereas the long encoding also includes a namespace. In both cases, the name is stored in UTF-8 with a character count prepended, similar to string encoding.
The short encoding for a symbol uses tag value 8, but is otherwise identical to the encoding for strings. That is, the tag is followed by a numeral indicating the number of bytes until the end of the encoded symbol (excluding the tag and this numeral), followed by a numeral indicating the number of characters in the symbol name, followed by the symbol name in UTF-8. For example:
Symbol name | foo |
---|---|
UTF-8 (hex) | 66 6f 6f |
Short encoding | 08 04 03 66 6f 6f |
The long encoding for a symbol uses tag value 10 (0a hex). The tag is followed by a numeral indicating the number of bytes until the end of the encoded symbol. After this numeral comes the namespace. The namespace can be any FFFF value, following the tag-value encoding. After the namespace, the symbol name is stored, as an UTF-8 string preceded by a numeral denoting the number of characters in the string.
The example below shows how a symbol quuz
in the
namespace foo
is encoded. The symbol foo is encoded using
short encoding, as shown in the earlier example.
Symbol name | quuz |
---|---|
UTF-8 (hex) | 71 75 75 7a |
Namespace | foo (symbol) |
Namespace, encoded | 08 04 03 66 6f 6f |
Long encoding | 0a 0b 08 04 03 66 6f 6f 03 71 75 75 7a |
Both types of symbol encoding have been available since FFFF version 0.1.
Arrays
An array is a collection of zero or more values. There are two array encodings in FFFF: one with variable-size elements, and one with fixed-size elements.
An array with variable-size elements is encoded as the tag 12 (0c hex), followed by the size of the following data in bytes (as a numeral), followed by the number of elements in the array (as a numeral), followed by the elements themselves (as FFFF tag-value pairs). The elements follow directly after one another. Examples:
- An empty array: 0c 01 00
- An array containing the integers 1, 2, and 3: 0c 04 03 03 05 07
- An array containing the strings
foo
andbar
: 0c 0d 02 06 04 03 66 6f 6f 06 04 03 62 61 72
In arrays with fixed-size elements, elements are stored in fixed-size blocks, each the same number of bytes in size. If an element doesn't actually need all bytes allocated to it, the bytes are unused and are expected to be set to zero.
The encoding for an array with fixed-size elements is as follows: the tag 14 (0e hex), followed by the size of the following data in bytes (as a numeral), followed by the number of bytes per element (as a numeral), followed by the elements themselves. The number of elements in the array can be calculated by dividing the remaining bytes after the element size indicator by the number of bytes per element. Examples:
- An array with two elements of six bytes each, containing the
strings
foo
andbar
: 0e 0d 06 06 04 03 66 6f 6f 06 04 03 62 61 72 - An array with three elements of 7 bytes each, containing the
integer 1, the string
quuz
, and the integer 2: 0e 16 07 03 00 00 00 00 00 00 06 05 04 71 75 75 7a 05 00 00 00 00 00 00
Both types of array encoding have been available since FFFF version 0.1.
Blocks
Blocks are used to group data. In FFFF, a block also provides scoping: tag definitions and language changes in a block persist until the end of the block, after which the definitions that were present at the beginning of the block are restored. This allows blocks to be processed independently and in parallel.
A block is encoded as a tag, followed by a numeral indicating the number of bytes until the end of the block, followed by zero or more FFFF values in the regular tag-value encoding. The tag value for blocks is 16 (10 hex).
The following example is the encoding for a block containing the
symbol foo (in short encoding), the string bar
, and the integer
42: 10 0e 08 04 03 66 6f 6f 06 04 03 62 61 72
d5 00
Available since: FFFF version 0.1.
Definitions
To allow the same datum to be referenced from
multiple locations, and to allow for more compact encoding, FFFF
allows defining a tag to refer to a specific datum. Such definitions
are introduced by the tag 18 (12 hex), followed by the numeral which
will act as a reference, followed by the FFFF datum it will refer to.
For example, the following code defines the numeral 32 (20 hex) to
refer to the string foo
:
Given this definition, the tag 20 can subsequently
be used to refer to the string foo
in any place where an
FFFF datum is expected. For example, with the above definition in
effect, the following encodes an array
containing 3 references to the string foo
:
Any tag that is not an integer may be defined this way, including tags that have previously been defined by earlier definitions or language definitions. Redefining an integer is an error, and the consequences of doing so are not defined by this document.
A definition takes effect immediately after it occurs. If it occurs inside a block, it is in effect until the end of the block. Otherwise, it is in effect until the end of the stream.
Available since: FFFF version 0.1.
Language
FFFF provides support for different languages, where each language can define its own encoding, which may or may not be compatible with the encoding described in this document. To support such languages, FFFF provides a tag to indicate the language of the content following it.
A language directive consists of the tag 16256 (3f80 hex), followed by a blob that identifies the language, followed by two numerals that indicate the major version and the minor version of the language, in order.
The major version and minor version allow new versions of languages to be released, while indicating to programs reading FFFF whether the language being used in an FFFF stream is compatible with the version of the language the program knows. Language version numbers must be assigned in such a way that a program that can read language L with major version X and minor version Y can also read language L with major version X and minor version Z, where Z is less than Y. Thus, if not all streams that are valid under an existing language version are valid under a new language version, the new language version must have a different major version number.
An FFFF stream that uses the language described in this document can indicate this by specifying a language identified by a blob containing four bytes of value 46 hex (the Unicode code point for the capital letter F), major version 0, and minor version 1. Encoded as an FFFF datum, this becomes: 80 7f 04 46 46 46 46 00 01
If the language is switched inside a block, the language switch is in effect until the end of the block.
Available since: FFFF version 0.1.
Imports
Tag 16258 (3f82 hex), identifier (blob), major version (numeral), minor version (numeral), offset (numeral). Imports a named set of definitions, at the given offset. Definitions are defined by export directives. Offset is added to the numbers mentioned in the export directives.
The imported definitions become available immediately after the import directive. If the import occurs inside a block, its effect lasts until the end of the block.
Available since: FFFF version 0.2.
Exports
Tag 16260 (3f84 hex), reference (numeral), datum. Same thing as a definition, but is made avaialable to the import statement.
Example of export and import: TODO
Available since: FFFF version 0.2.