TOON Format

TOON (Token-Oriented Object Notation) v3 is a compact text serialization format designed to reduce token usage when sending structured data to LLMs.

Why TOON?

When you send query results to an LLM as JSON, every row repeats the same field names:

[{"id":1,"name":"Widget","category":"Tools","price":29.99},
 {"id":2,"name":"Gadget","category":"Tools","price":19.99}]

TOON moves field names into a header, so they appear only once:

[2,]{id,name,category,price}:
  1,Widget,Tools,29.99
  2,Gadget,Tools,19.99

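The savings grow with row count, because the one-time cost of the header is spread over more rows, and then level off at a ceiling set by how much of each JSON row is field names rather than values.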
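
The effect of writing field names once can be sketched with character counts, which are only a crude proxy for tokens. The helpers `to_toon` and `char_savings` below are hypothetical names for this illustration, not part of Seamless-RAG's encoder, and the sketch assumes every value is safe to emit unquoted.

```python
import json

def to_toon(rows):
    """Hypothetical helper: header once, then one comma-separated line per row."""
    fields = list(rows[0])
    header = "[%d,]{%s}:" % (len(rows), ",".join(fields))
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])

def char_savings(n):
    """Fraction of characters saved versus compact JSON for n synthetic rows."""
    # Fixed-width ids (1001, 1002, ...) keep every row the same length.
    rows = [{"id": 1000 + i, "name": "Widget", "category": "Tools", "price": 29.99}
            for i in range(1, n + 1)]
    json_len = len(json.dumps(rows, separators=(",", ":")))
    toon_len = len(to_toon(rows))
    return 1 - toon_len / json_len

for n in (1, 10, 100, 1000):
    print(n, f"{char_savings(n):.1%}")
```

With rows of identical shape, the printed percentage rises quickly for the first few rows and then changes very little, since the per-row cost dominates both encodings.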

Real-World Token Savings

Measured on real public datasets using tiktoken's cl100k_base encoding (the GPT-4 and GPT-3.5 Turbo tokenizer; GPT-4o uses o200k_base):

Dataset              Rows  JSON Tokens  TOON Tokens  Savings
MovieLens (7 cols)     10          632          495    21.7%
MovieLens (7 cols)    100        6,306        4,789    24.1%
MovieLens (7 cols)    500       26,674       18,927    29.0%
Restaurant (9 cols)    10          723          473    34.6%
Restaurant (9 cols)   100        7,071        4,326    38.8%
Restaurant (9 cols)   500       35,663       21,787    38.9%

Savings are highest for data with many columns, short values, and numeric fields. They are lowest for long text content and small row counts.

Format Specification

A TOON tabular document has this structure:

[ROW_COUNT,]{field1,field2,...}:
  value1,value2,...
  value1,value2,...
  • [N,] — Row count in square brackets, followed by a comma
  • {field1,field2} — Field names in curly braces
  • : — Header terminator
  • Each data row is indented with two spaces, values separated by commas
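
As a minimal sketch of the structure above, the header line can be picked apart with a regular expression. The names `parse_header` and `check_row_count` are hypothetical, and the sketch assumes the default comma delimiter and field names that contain no commas or braces.

```python
import re

# "[N,]" then "{field1,field2,...}" then the ":" terminator.
HEADER_RE = re.compile(r"\[(\d+),\]\{([^}]*)\}:")

def parse_header(line):
    """Return (row_count, field_names) for a comma-delimited tabular header."""
    match = HEADER_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a tabular header: {line!r}")
    count = int(match.group(1))
    fields = match.group(2).split(",") if match.group(2) else []
    return count, fields

def check_row_count(document):
    """Check that the declared row count matches the number of indented rows."""
    header, *rows = document.splitlines()
    count, _fields = parse_header(header)
    data_rows = [row for row in rows if row.startswith("  ")]
    return count == len(data_rows)

doc = "[2,]{id,name,category,price}:\n  1,Widget,Tools,29.99\n  2,Gadget,Tools,19.99"
print(parse_header(doc.splitlines()[0]))
print(check_row_count(doc))
```

The declared row count gives a reader of the document a cheap way to detect truncated output.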

Value Rules

Type               JSON Example    TOON
String             "Alice"         Alice (unquoted if safe)
String with comma  "Smith, John"   "Smith, John" (quoted)
Number             29.99           29.99 (canonical, no scientific notation)
Boolean            true            true
Null               null            null
Empty string       ""              "" (quoted)
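
The non-string rows of this table can be sketched as follows. `format_scalar` is a hypothetical name, and the number handling is one possible reading of "canonical, no scientific notation" (plain decimal digits, no trailing zeros); NaN and infinity are rejected here rather than mapped to anything, since the table does not cover them.

```python
import math
from decimal import Decimal

def format_scalar(value):
    """Render None, bool, int, or float as plain text without an exponent."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        if isinstance(value, float) and not math.isfinite(value):
            raise ValueError("NaN and infinity are not handled in this sketch")
        # repr() gives the shortest round-tripping digits; Decimal expands any exponent.
        text = format(Decimal(repr(value)), "f")
        if "." in text:
            text = text.rstrip("0").rstrip(".")
        return "0" if text == "-0" else text
    raise TypeError(f"unsupported type: {type(value).__name__}")

for sample in (29.99, 1e-7, 3.0, True, None):
    print(format_scalar(sample))
```

Going through `repr` and `Decimal` avoids the exponent form that `str(1e-7)` would otherwise produce.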

Quoting Rules (Section 7.2)

A string value must be quoted when it:

  • Contains the delimiter (comma by default)
  • Matches a keyword (true, false, null)
  • Looks like a number (123, 3.14)
  • Has leading/trailing whitespace
  • Is empty
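
The five conditions listed above translate fairly directly into a predicate. This is a sketch under stated assumptions: `needs_quotes` is a hypothetical name, the number pattern is a guess at what "looks like a number" means, and the full specification may list further cases (for example, strings containing characters from the escape table) that are not modeled here.

```python
import re

KEYWORDS = {"true", "false", "null"}
# Assumed shape of a numeric-looking string: optional sign, digits,
# optional fraction, optional exponent.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

def needs_quotes(value, delimiter=","):
    """Return True if a string must be quoted under the five listed rules."""
    if value == "":
        return True                      # empty string
    if delimiter in value:
        return True                      # contains the delimiter
    if value in KEYWORDS:
        return True                      # would be read back as a keyword
    if NUMBER_RE.fullmatch(value):
        return True                      # would be read back as a number
    if value != value.strip():
        return True                      # leading or trailing whitespace
    return False

for sample in ("Alice", "Smith, John", "true", "3.14", " padded", ""):
    print(repr(sample), needs_quotes(sample))
```

Quoting keywords and numeric-looking strings is what keeps the string "123" distinguishable from the number 123 after decoding.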

Escape Sequences (Section 7.1)

Sequence  Meaning
\\        Backslash
\"        Double quote
\n        Newline
\r        Carriage return
\t        Tab
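
A minimal sketch of applying these five escapes when quoting a string, and reversing them when reading it back. The names `quote` and `unquote` are hypothetical, and rejecting any other backslash sequence is an assumption of this sketch.

```python
# Character -> escaped form, for the five sequences in the table.
ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
# Letter after the backslash -> original character.
UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t"}

def quote(value):
    """Wrap a string in double quotes, escaping character by character."""
    return '"' + "".join(ESCAPES.get(ch, ch) for ch in value) + '"'

def unquote(text):
    """Reverse quote(); raises ValueError on an unknown or dangling escape."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("not a quoted string")
    body, out, i = text[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body) or body[i + 1] not in UNESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(UNESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

print(quote('Smith, "Johnny" John'))
```

Escaping one character at a time avoids the double-escaping bug that chained `str.replace` calls can introduce when the backslash is not handled first.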

Conformance

Seamless-RAG's TOON encoder passes 166/166 official TOON v3 specification test fixtures, covering: nested escaping, empty rows, unicode content, mixed types, key folding, delimiter options, and number canonicalization.

Side-by-Side Example

JSON (207 tokens)

[{"movie_id":318,"title":"Shawshank Redemption, The (1994)","genres":"Crime, Drama","year":1994,"avg_rating":4.43,"num_ratings":317},{"movie_id":858,"title":"Godfather, The (1972)","genres":"Crime, Drama","year":1972,"avg_rating":4.29,"num_ratings":192},{"movie_id":2959,"title":"Fight Club (1999)","genres":"Action, Crime, Drama, Thriller","year":1999,"avg_rating":4.27,"num_ratings":218}]

TOON (157 tokens — 24.2% saved)

[3,]{movie_id,title,genres,year,avg_rating,num_ratings}:
  318,"Shawshank Redemption, The (1994)","Crime, Drama",1994,4.43,317
  858,"Godfather, The (1972)","Crime, Drama",1972,4.29,192
  2959,Fight Club (1999),"Action, Crime, Drama, Thriller",1999,4.27,218

Usage

from seamless_rag.toon.encoder import encode_tabular

rows = [
    {"id": 1, "content": "Climate change affects biodiversity", "score": 0.92},
    {"id": 2, "content": "Recent studies show temperature rise", "score": 0.87},
]
print(encode_tabular(rows))

Output:

[2,]{id,content,score}:
  1,Climate change affects biodiversity,0.92
  2,Recent studies show temperature rise,0.87