Maxml

Maxml (pronounced maximal) looks and feels like very simple YAML, but it is not YAML. Maxml is meant to be a very small and simple subset of YAML with some significant structural changes.

The Maxml binary package is ~ 20KB with debug symbols.

Why Maxml?

I like JSON.
I would have been satisfied with a simple, really small JSON parser. The closest I found was JSON-Simple, but that relies on a JFlex-generated parser class.
Verdict: Debugging JFlex-generated code is not for humans.

I like YAML.
I would have been satisfied with a simple, really small YAML parser. The closest I found was SnakeYaml, but that is ~250KB.
Verdict: At half the size of the full toolkit, it is too large.

Because Moxie needs to have a bundled parser for the build.moxie project descriptor and I could not find a parser that suited my needs, I wrote a custom YAML-ish parser.

Processing Rules

  1. Maxml files must be UTF-8 encoded.
  2. Leading whitespace is completely irrelevant and ignored (except for text blocks) so indent all you want.
  3. All lines are expected to begin with a key: or "my:key:name": sequence unless
    1. the line equals """ (start/end text block)
    2. the line equals "" (start/end offset text block)
    3. the line is blank (included in a text block, otherwise ignored)
    4. the line starts with # (comment, ignored)
    5. the line equals --- or ... (reserved, ignored)
    6. the line equals } (end map)
    7. the line begins with a - dash (list item)
  4. Values that equal { define a map and all subsequent lines will fill the map until a } line is processed
  5. Values that equals """ define a text block and all subsequent lines will be appended to that text block until a """ line is processed
  6. Values that equals "" define an offset text block and all subsequent lines will be appended to that text block until a "" line is processed
  7. All remaining values are subject to fall-through interpretation

Value Fall-Through Interpretation

Values are deserialized based on the following algorithm. If a step is successful the interpreted value is returned, otherwise the next step is checked until it falls-through the bottom of the check and a string is returned.

  1. Empty value is an empty string
  2. Explicit Null value ~
  3. Single-quoted strings 'my string'
  4. Double-quoted strings "my string"
  5. Inline-list [a, b, c, d]
  6. Inline-map {a:1, b:2, c:3, d:4}
  7. Boolean interpretation true, false, yes, no, on, off
  8. Numeric interpretation octal (32-bit signed integer, 64-bit signed long), whole number (32-bit signed integer, 64-bit signed long), hexadecimal (32-bit signed integer, 64-bit signed long), decimal (float, double)
    if the value fits in the smaller container, the smaller container will be used.
    e.g. 1 is an integer but 4,000,000,000 is a long
  9. Date interpretation: canonical, iso8601, simple date
  10. String

Key Definitions

Maxml is a key-value serialization format.

  1. Keys may only be strings
  2. Keys may have colons, if they are quoted
  3. Keys are case-insensitive

Value Definitions

Strings

# fall-through string
myString1: this is a\nstring
# explicit single-quoted string, leading/trailing quotes are stripped
myString2: 'this is another\nstring'
# explicit double-quoted string, leading/trailing quotes are stripped
myString2: "this is another\nstring"
# explicit multi-line string, leading/trailing quotes are stripped
myString3a: """
this is another string
which contains
several lines
"""
# explicit multi-line string, leading/trailing quotes are stripped
myString3b: '''
this is another string
which contains
several lines
'''
# explicit multi-line string, leading/trailing quotes are stripped
myString4: """
this is another string
which contains
several lines"""

note: ''
      this is
      an offset text block
      ''

Numerics

myInt: 12345
myLong: 4000000000
myFloat1: 2.3
myFloat2: 6.0E+23
myDouble: 6.0E+56
# octals lead with 0
myOctal: 010
# hexadecimals lead with 0x
myHex: 0x12345

Booleans

# true values
myBool1: true
myBool2: yes
myBool3: on
# false values
myBool4: false
myBool5: no
myBool6: off

Dates

Dates support second resolution. When Date objects are deserialized, the msecs value is manually set to 0.

# Simple Date, local timezone
myDate1: 2012-02-01
# Canonical, GMT
myDate2: 2012-02-01T15:30:45Z
# ISO8601, timezone-specified
myDate3: 2012-02-01T15:30:45-0400

Miscellaneous

# null
myNull: ~

Collections

Parsed collections preserve definition order.

Lists/Sequences

java.util.ArrayList

Lists may be defined using the following notation where each item in the list is defined on a separate line with a leading dash character.

# multi-line list
myList2:
- src
- tests
- extra

Simple lists may also be defined inline as key-value pairs.
Inline lists may not have nested collections.

# inline list
myList1: [src, tests, extra]
# a list of inline lists
# add myList1 as the 3rd inline list
myList3:
- [apple, banana, strawberry]
- [celery, carrot, lettuce]
- &myList1
# create myList4 by adding all elements from the first
# inline list in myList3.  myList4 = [apple, banana, strawberry]
myList4:
+ &myList3[0]

Maps

java.util.LinkedHashMap

Maps may be defined using the familiar curly brace notation.
Indentation is used for readability but is otherwise ignored by the parser.

myMap1: {
    field1: test
    field2: {
        subfield1: 10
        subfield2: [a, b, c]
    }
    field3:
     - src
     - tests
     - extra
}

Simple maps may also be defined inline as key-value pairs.
Inline maps may not have nested collections and they may not have strings with commas.

myMap2: { field1: test, field2: 100, field3: true }