Status: Draft Standard (v2.0)
This document defines the technical specification for TOON (Token-Oriented Object Notation), a data serialization format optimized for Large Language Model (LLM) context windows. See the official project page for updates.

The key goals of this specification are:

Maximizing Token Density: Representing data in the fewest possible BPE (Byte Pair Encoding) tokens.
Human Readability: Maintaining a structure that is intuitive for humans and easy for LLMs to reason about.
Lossless JSON Conversion: Ensuring 1:1 mapping with standard JSON types.

1. Document Structure

A TOON document MUST be encoded in UTF-8. A TOON document represents exactly one root node, which may be an Object or an Array.

2. Whitespace and Indentation

TOON uses Significant Whitespace to denote hierarchy.

Indent Unit: The standard indentation is 2 spaces (U+0020). Tabs (U+0009) are forbidden to ensure consistent tokenization across different models.
Newlines: Lines must end with `\n` (LF) or `\r\n` (CRLF).
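A minimal sketch of how a parser might apply these rules when measuring a line's depth. The helper name `indent_level` and the choice to reject, rather than round down, a leading-space count that is not a multiple of two are assumptions, not requirements stated by this specification.

```python
INDENT_UNIT = 2  # spaces per nesting level (Section 2)


def indent_level(line: str) -> int:
    """Return the nesting level of one line.

    Hypothetical helper: it assumes a leading-space count that is not a
    multiple of INDENT_UNIT is an error rather than something to round down.
    """
    # Drop the LF or CRLF terminator so both newline styles behave the same.
    body = line.rstrip("\r\n")
    content = body.lstrip(" ")
    spaces = len(body) - len(content)
    if content.startswith("\t"):
        raise ValueError("tabs (U+0009) are not allowed in indentation")
    if spaces % INDENT_UNIT != 0:
        raise ValueError(f"{spaces} leading spaces is not a multiple of {INDENT_UNIT}")
    return spaces // INDENT_UNIT
```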

3. Objects

An Object is a collection of key-value pairs.

Syntax: key: value

user:
  name: Alice
  role: admin

The colon `:` is mandatory. If a value follows on the same line, the colon must be followed by at least one space.
If the value is a nested object or array, the line ends immediately after the colon, and the nested content begins on the next line, indented one level deeper.
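The colon rules can be sketched as a small helper that splits a dedented line into its key and inline value. `split_pair` is a hypothetical name, and returning `None` when nothing follows the colon is an assumed way for a parser to signal that nested content comes next. Keys are assumed not to contain `:`, so splitting at the first colon leaves values such as URLs intact.

```python
from typing import Optional, Tuple


def split_pair(content: str) -> Tuple[str, Optional[str]]:
    """Split a dedented 'key: value' line into (key, value).

    value is None when the line ends at the colon, meaning a nested
    object or array follows on the next, more-indented lines.
    """
    # Split at the first colon only, so "url: http://x" keeps its value whole.
    key, sep, rest = content.partition(":")
    if not sep:
        raise ValueError(f"missing ':' after key in {content!r}")
    if rest.strip() == "":
        return key, None
    if not rest.startswith(" "):
        raise ValueError("an inline value must be separated from ':' by a space")
    return key, rest.strip()
```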

4. Arrays

Arrays can be represented in two ways: List Style and Table Style.

4.1 List Style (Heterogeneous)

Used when array items have different structures or are primitives.

tags:
  - featured
  - new
  - on-sale

The hyphen `-` denotes a list item.
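Because a bare string may not start with `-` (Section 5.1), a parser can recognise list items purely from the `- ` prefix. The sketch below, using hypothetical helper names, collects the raw item text of a primitive list such as `tags`; converting each item to a typed value is assumed to happen in a separate step.

```python
from typing import List


def parse_list_item(content: str) -> str:
    """Return the raw value of a dedented '- value' list item."""
    if not content.startswith("- "):
        raise ValueError(f"expected a '- ' list marker in {content!r}")
    return content[2:].strip()


def parse_primitive_list(lines: List[str]) -> List[str]:
    """Collect the items of a list whose entries sit at one indentation level."""
    # Blank lines are skipped; every other line must be a list item.
    return [parse_list_item(line.strip()) for line in lines if line.strip()]
```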

4.2 Table Style (Homogeneous)

Used when array items are objects sharing the same keys. This is the primary optimization feature of TOON.

Header syntax: key[Count]{field1, field2}:

users[3]{id, name, score}:
  1, Alice, 99
  2, Bob, 85
  3, Charlie, 42

Rules:

Count: `[N]` states the number of rows, which tells the LLM how many rows to expect.
Fields: `{a, b}` lists the field names; every row supplies one value per field, in the same order.
Separator: Values are separated by comma `,`.

Note: Using pipes `|` for visual alignment is allowed by some parsers but discouraged in the strict spec as it consumes extra tokens.
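To illustrate how an encoder might emit Table Style, the sketch below turns a list of Python dicts with primitive values into the header-plus-rows layout. `encode_table` and `format_cell` are hypothetical helpers. The sketch assumes quoted strings use JSON escaping and, for brevity, applies a simplified quoting rule: a cell is quoted when it is empty, contains a comma, has surrounding whitespace, or would otherwise be read as a number, boolean, or null.

```python
import json
from typing import Any, Dict, List


def format_cell(value: Any) -> str:
    """Render one primitive table cell using a simplified quoting rule."""
    if not isinstance(value, str):
        # json.dumps spells numbers, true, false and null the JSON way.
        return json.dumps(value)
    needs_quotes = value == "" or "," in value or value != value.strip()
    if not needs_quotes:
        try:
            json.loads(value)  # "123" or "true" would change type if left bare
            needs_quotes = True
        except ValueError:
            pass
    return json.dumps(value, ensure_ascii=False) if needs_quotes else value


def encode_table(key: str, rows: List[Dict[str, Any]], indent: str = "  ") -> str:
    """Encode objects that share the same keys as a Table Style array."""
    if not rows:
        raise ValueError("this sketch does not define an encoding for empty tables")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("Table Style requires every row to have the same keys")
    header = f"{key}[{len(rows)}]{{{', '.join(fields)}}}:"
    lines = [indent + ", ".join(format_cell(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)
```

Field order is taken from the first row, so encoding the three `users` records reproduces the `users[3]{id, name, score}:` example exactly.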

5. Values and Types

5.1 Strings

Strings can be Unquoted or Quoted.

Unquoted Strings (Bare Words): Any sequence of characters that does not start with a special character (`-`, `[`, `{`, `"`, `#`) and does not contain newlines or delimiters.

status: active
color: light blue

Quoted Strings: Double quotes `"` are required if the string contains special characters, starts or ends with whitespace, or resembles a boolean, number, or null.

greeting: "Hello, World!"
empty: ""
number_string: "123"
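The quoting rule can be written as a predicate an encoder runs on every string value. This is a sketch under stated assumptions: `needs_quotes` is a hypothetical name, the special leading characters come from the bare-word definition, and treating `,` and `#` anywhere in the string as delimiters (so they cannot be confused with table separators or comments) is one interpretation of "contains special characters". A string is taken to "resemble" a number, boolean, or null if Python's `json.loads` accepts it.

```python
import json

SPECIAL_START = ("-", "[", "{", '"', "#")  # leading characters listed in Section 5.1
SPECIAL_ANYWHERE = (",", "#", "\n", "\r")  # assumption: delimiters, comment marker, newlines


def needs_quotes(text: str) -> bool:
    """Return True if a string value must be written in double quotes."""
    if text == "" or text != text.strip():
        return True  # empty, or starts/ends with whitespace
    if text.startswith(SPECIAL_START) or any(c in text for c in SPECIAL_ANYWHERE):
        return True
    try:
        json.loads(text)  # succeeds for 123, 6.022e23, true, false, null, ...
    except ValueError:
        return False  # not a JSON literal, so a bare word is unambiguous
    return True
```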

5.2 Numbers

Follows standard JSON number format (integer, float, exponent).

count: 42
temp: 36.6
avogadro: 6.022e23

5.3 Booleans

Literals `true` and `false` (lowercase).

5.4 Null

Literal `null`.
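Sections 5.2 to 5.4 together determine how a single scalar token is typed. The following sketch (hypothetical `parse_scalar`) checks the lowercase literals first, then the JSON number grammar, then quoted strings, and otherwise keeps the token as a bare word. Decoding quoted strings with `json.loads` assumes TOON uses JSON escape sequences, which these sections do not state.

```python
import json
import re
from typing import Union

# JSON number grammar: optional minus, no leading zeros, optional fraction and exponent.
NUMBER_RE = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

Scalar = Union[str, int, float, bool, None]


def parse_scalar(token: str) -> Scalar:
    """Convert one scalar token to a Python value."""
    token = token.strip()
    # Literals are case-sensitive: only the lowercase spellings are keywords.
    if token == "true":
        return True
    if token == "false":
        return False
    if token == "null":
        return None
    if NUMBER_RE.fullmatch(token):
        # Keep integers as int so 42 round-trips as 42 rather than 42.0.
        is_float = any(c in token for c in ".eE")
        return float(token) if is_float else int(token)
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return json.loads(token)  # assumption: JSON-style escapes
    return token  # bare word (unquoted string)
```

Because the literals are lowercase only, tokens such as `True` or `NULL` fall through to bare strings.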

6. Comments

Comments start with `#` and extend to the end of the line.

# This is a comment
config:
  timeout: 5000 # milliseconds
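A parser can remove comments before interpreting a line. The sketch below assumes that a `#` inside a double-quoted string is data rather than the start of a comment, and that a backslash escapes a quote inside such a string; neither point is stated in this section. `strip_comment` is a hypothetical name, and it also drops trailing whitespace from the returned line.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing '#' comment, ignoring '#' inside double-quoted strings."""
    in_quotes = False
    escaped = False
    for i, ch in enumerate(line):
        if escaped:
            escaped = False  # this character was escaped by a preceding backslash
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            return line[:i].rstrip()  # comment starts here; keep what precedes it
    return line.rstrip()
```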

7. ABNF Grammar (Excerpt)

The following is a simplified ABNF definition.

TOON        = object / list
NL          = %x0A / %x0D.0A
Indent      = 2SP

object      = 1*pair
pair        = key ":" [SP value] NL

list        = 1*list-item / table-array
list-item   = "- " value NL
table-array = key "[" integer "]" "{" field-list "}" ":" NL *row
row         = Indent value *("," SP value) NL

value       = string / number / boolean / null / object / list
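The `table-array` and `row` rules translate fairly directly into code. The sketch below (hypothetical `parse_table` and `split_row`) matches the header with a regular expression, splits rows on commas outside double quotes, and uses the declared count as a consistency check. It assumes `field-list` is a comma-separated list of names; cells are returned as raw text, so typing them is a separate step, and escaped quotes inside cells are not handled.

```python
import re
from typing import Dict, List, Tuple

# key "[" integer "]" "{" field-list "}" ":"  (field-list assumed comma-separated)
HEADER_RE = re.compile(r"(?P<key>[^\[\]{}:]+)\[(?P<count>[0-9]+)\]\{(?P<fields>[^{}]*)\}:")


def split_row(row: str) -> List[str]:
    """Split a row on commas that are not inside double quotes."""
    cells: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in row:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            cells.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    cells.append("".join(current).strip())
    return cells


def parse_table(lines: List[str]) -> Tuple[str, List[Dict[str, str]]]:
    """Parse a table-array header line plus its rows into (key, list of dicts)."""
    header = HEADER_RE.fullmatch(lines[0].strip())
    if header is None:
        raise ValueError(f"not a table-array header: {lines[0]!r}")
    fields = [name.strip() for name in header["fields"].split(",")]
    rows = []
    for line in lines[1:]:
        cells = split_row(line.strip())
        if len(cells) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(cells)}: {line!r}")
        rows.append(dict(zip(fields, cells)))
    if len(rows) != int(header["count"]):
        raise ValueError(f"header declares {header['count']} rows, found {len(rows)}")
    return header["key"], rows
```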

8. Parsing Implementation Guide

When implementing a TOON parser, the primary challenge is Context Tracking. Since structure is defined by indentation, the parser must maintain a stack of current indentation levels.

Algorithm Sketch:

Read line.
Calculate indentation level (count leading spaces / 2).
If level > current_level: Push new container (Object/Array).
If level < current_level: Pop containers until match.
Parse content (Key-Value or List Item).
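The algorithm sketch can be turned into a compact illustrative parser. The version below is not a reference implementation and makes several simplifying assumptions: the root is an object, values are kept as raw strings, comments and Table Style headers are not handled, list items are treated as scalars, and a `key:` line with no indented children becomes an empty object. A nested container's type (object or list) is decided by the first line indented under its key.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_toon(text: str) -> Dict[str, Any]:
    """Build nested dicts and lists from indentation (simplified sketch)."""
    root: Dict[str, Any] = {}
    # Stack of (indent level of the container's entries, container).
    stack: List[Tuple[int, Any]] = [(0, root)]
    # A "key:" line whose nested container has not been created yet.
    pending: Optional[Tuple[Dict[str, Any], str, int]] = None

    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no structure
        content = raw.lstrip(" ")
        spaces = len(raw) - len(content)
        if spaces % 2:
            raise ValueError(f"indentation is not a multiple of 2 spaces: {raw!r}")
        level = spaces // 2

        if pending is not None:
            parent, key, key_level = pending
            pending = None
            if level == key_level + 1:
                # Deeper line: push a new container, typed by its first entry.
                child: Any = [] if content.startswith("- ") else {}
                parent[key] = child
                stack.append((level, child))
            else:
                parent[key] = {}  # assumption: "key:" with no children is an empty object

        # Shallower line: pop containers until the indentation matches.
        while stack[-1][0] > level:
            stack.pop()
        if stack[-1][0] != level:
            raise ValueError(f"unexpected indentation: {raw!r}")

        container = stack[-1][1]
        if isinstance(container, list):
            if not content.startswith("- "):
                raise ValueError(f"expected a '- ' list item: {raw!r}")
            container.append(content[2:].strip())
        else:
            key, sep, rest = content.partition(":")
            if not sep:
                raise ValueError(f"expected 'key: value': {raw!r}")
            if rest.strip():
                container[key.strip()] = rest.strip()
            else:
                pending = (container, key.strip(), level)

    if pending is not None:
        parent, key, _ = pending
        parent[key] = {}  # same empty-object assumption for a trailing "key:"
    return root
```

Deferring container creation until the next line is what lets a single `key:` introduce either an object or a list without lookahead in the grammar.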

9. Test Suite

To ensure interoperability, all implementations should pass the TOON Core Test Suite (available on GitHub). It covers:

Deep nesting limits (default: 100).
Unicode handling (emojis, CJK characters).
Corner cases (empty keys, empty strings, trailing commas).
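As an example of the kind of check the nesting-limit tests exercise, the sketch below scans a document and rejects lines nested deeper than a configurable limit. The name `check_depth` and the choice to measure depth as indentation level, with root-level keys at depth 0, are assumptions; the test suite may count depth differently.

```python
MAX_DEPTH = 100  # default nesting limit named in the test-suite description


def check_depth(text: str, max_depth: int = MAX_DEPTH) -> int:
    """Return the deepest indentation level, raising if it exceeds max_depth."""
    deepest = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines do not count toward depth
        depth = (len(line) - len(line.lstrip(" "))) // 2
        if depth > max_depth:
            raise ValueError(f"line {number}: nesting depth {depth} exceeds limit {max_depth}")
        deepest = max(deepest, depth)
    return deepest
```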


TOON Format Specification: Complete Guide to Grammar and Syntax