TDD - Textual Data Definition

Basics

TDD is a very simple expression language created for defining hierarchical data (hashes, sequences) with plain text. It is mostly used for configuring FMPP. For example, configuration files use TDD syntax.

TDD syntax is identical to the syntax of literals in FTL (FreeMarker Template Language). For example, this data structure:

(root)
 |
 +- user = "Big Joe"
 |
 +- tall = true
 |
 +- animals
     |
     +- (1st)
     |   |
     |   +- name = "white mouse"
     |   |
     |   +- price = 30
     |
     +- (2nd)
     |   |
     |   +- name = "black mouse"
     |   |
     |   +- price = 25
     |
     +- (3rd)
         |
         +- name = "green mouse"
         |
         +- price = 150

could be described with this TDD expression:

{
    "user": "Big Joe",
    "tall": true,
    "animals": [
        {"name": "white mouse", "price": 30},
        {"name": "black mouse", "price": 25},
        {"name": "green mouse", "price": 150}
    ]
}

which is also a legal FTL expression. Please read the "Specify values directly" section of the FreeMarker Manual if you are not familiar with these literals. TDD doesn't support any other FTL constructs (such as variables, built-ins, interpolations, etc.), only the "Specify values directly" part of FTL.

TDD syntax allows terser expressions than FTL because of these additional syntactical rules:

  • Strings need not be quoted if they don't look like a legal boolean or number value, and they don't contain:
    • White-space: space, tab, line-break, etc.
    • Quotation mark or apostrophe: ", '
    • Separator-like chars: comma (,), semicolon (;). Colon (:) is allowed without quoting the string if the string is not a key in a hash.
    • Bracket-like chars: (, ), [, ], {, }, <, >
    • Equals sign (=)
    • Plus sign (+)
  • A line-break can be used instead of a comma (,). That is, in practice, you can omit commas that would be at the end of a line.
  • If in a hash the value is missing from a key:value pair, then the value defaults to boolean true.

Utilizing these rules, the example TDD expression can be written as:

{
    user: "Big Joe"
    tall
    animals: [
        {name: "white mouse", price: 30}
        {name: "black mouse", price: 25}
        {name: "green mouse", price: 150}
    ]
}

Back to the strings... Examples of legal unquoted strings:

  • **/foo-bar/baz.txt
  • C:\windows\system32
  • 25%
  • #FF80FF
  • ?!
  • 7.txt
  • 1-2
  • True

Examples of strings that must be quoted:

  • "contains space"
  • "a'b"
  • "(c)"
  • "<head>"
  • "7"
  • "true"
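The quoting rules above can be summarized as a small check. The following Python sketch is only an illustration and is not how FMPP implements it: the function name can_be_unquoted is hypothetical, the number pattern is a simplified assumption about what "looks like a number", and treating the empty string as needing quotes is also an assumption.

```python
import re

# Characters the text says may not appear in an unquoted string.
FORBIDDEN = set("\"',;()[]{}<>=+")

# Assumption: a simplified idea of what "looks like a legal number".
NUMBER_LIKE = re.compile(r"-?\d+(\.\d+)?")


def can_be_unquoted(s: str, is_hash_key: bool = False) -> bool:
    """Return True if s satisfies the unquoted-string rules listed above."""
    if s == "":
        return False  # assumption: an empty string has to be written as ""
    if s in ("true", "false"):
        return False  # would be read as a boolean
    if NUMBER_LIKE.fullmatch(s):
        return False  # would be read as a number
    if any(ch.isspace() or ch in FORBIDDEN for ch in s):
        return False
    if is_hash_key and ":" in s:
        return False  # a colon is only allowed when the string is not a hash key
    return True
```

With this sketch, can_be_unquoted("7.txt") and can_be_unquoted("True") are true, while can_be_unquoted("7") and can_be_unquoted("true") are false.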

TDD supports an escape sequence that FTL doesn't: backslash at the end of the line. It is used to break strings into multiple lines in the TDD, without actually introducing line-breaks and indentation in the value of the string. For example here:

{
    text: 'This is \
           a single \
           line.'
}

the value of text will be the same as with:

{
    text: 'This is a single line.'
}

The exact rule is that if a backslash is followed by a line-break (extra horizontal white-space is allowed between the backslash and the line-break), then the backslash and all characters after it will be removed until the first non-white-space character is reached, or a 2nd line-break is reached.
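As an illustration of this rule, here is a minimal Python sketch that applies it to the raw text of a string literal with a regular expression. The function name join_continued_lines is hypothetical, and the sketch ignores all other escape sequences (for example an escaped backslash directly before a line-break), so it should not be read as FMPP's actual implementation.

```python
import re

# A backslash, optional horizontal white-space, one line-break, and then
# horizontal white-space up to the next visible character or line-break.
LINE_CONTINUATION = re.compile(r"\\[ \t]*(?:\r\n|\r|\n)[ \t]*")


def join_continued_lines(raw: str) -> str:
    """Remove backslash line continuations from the raw text of a string."""
    return LINE_CONTINUATION.sub("", raw)


raw = "This is \\\n           a single \\\n           line."
print(join_continued_lines(raw))  # This is a single line.
```

Note that a 2nd line-break stops the removal, so an empty line after the backslash line is kept in the value.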

Interpretation modes

A text can be interpreted as TDD either in:

  • Single expression mode: this is the basic case.
  • Hash mode: The text is assumed to describe the name:value pair list of a hash.
  • Sequence mode: The text is assumed to describe the value list of a sequence.

Configuration files are an example of hash mode. There you enter the settings as if you were already between { and }:

sourceRoot: src
outputRoot: out

and this will evaluate to a hash. In single expression mode, you could describe the same value with this:

{
    sourceRoot: src
    outputRoot: out
}

An example of sequence mode is when you specify the removeExtensions setting with a command-line argument of the command-line tool or as an Ant task parameter. For example, suppose you enter this on the command line:

fmpp -S src -O out --removeExtensions "foo, bar, baaz"

Don't be confused by the quotation marks here. They are required by the command-line parser of the OS shell and have nothing to do with TDD. What FMPP gets doesn't contain the quotation marks, only the text between them. So the TDD expression we are talking about is:

foo, bar, baaz

The same value could be described in single expression mode as:

[foo, bar, baaz]
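The relation between the three modes can be pictured as simple wrapping. The Python sketch below is only a mental model built from the two examples above, with a hypothetical function name and hypothetical mode names; it does not describe how FMPP parses the text.

```python
def as_single_expression(text: str, mode: str) -> str:
    """Wrap hash mode or sequence mode text so it reads as a single expression."""
    if mode == "hash":
        return "{\n" + text + "\n}"   # as if the text stood between { and }
    if mode == "sequence":
        return "[" + text + "]"       # as if the text stood between [ and ]
    if mode == "single":
        return text                   # already a complete expression
    raise ValueError("unknown mode: " + mode)


print(as_single_expression("foo, bar, baaz", "sequence"))  # [foo, bar, baaz]
```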

When should you use single expression mode, hash mode, or sequence mode? Hash mode or sequence mode is used when you specify the value of something as discrete text (that is, not as part of a larger, enclosing TDD expression), and it is known that the value must be a hash or a sequence respectively. For example, in the --removeExtensions example above, the TDD expression is given in an independent text fragment, and it is known that the removeExtensions setting is a sequence. Compare this with the case where you specify the same setting value in a configuration file:

removeExtensions: [foo, bar, baaz]

Here, the value is not discrete text, because it is a fragment of a larger TDD expression (which is, by the way, a hash mode expression). Thus, the value must be specified in single expression mode, even though we know that it should be a sequence. The reason is that if you were allowed to write:

removeExtensions: foo, bar, baaz

then there would be an ambiguity, as it could be also interpreted as:

removeExtensions: foo
bar: true
baaz: true

(This is because if the value is missing from a key:value pair, then the value defaults to boolean true, and a comma can be replaced with a line-break.) So be sure you don't forget the brackets in configuration files.

Hash addition

TDD allows you to put a hash value directly into another hash value, without specifying a key for it. For example:

{a: A, b: B,  {c: C, d: D}}

In this case, when the TDD interpreter reaches {c: C, d: D}, it will add all of its key:value pairs to the enclosing hash. So the final result hash will contain these key:value pairs: a: A, b: B, c: C, d: D.

When the hash is added to its parent, it may overwrite keys that already exist in the parent. For example:

{a: A1, b: B1, c:C1, {b: B2, c: C2}, {c: C3}}

will result in a hash that contains these key:value pairs: a:A1, b:B2, c:C3.
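The merging order can be sketched in Python as follows. This is an illustrative model only: build_hash is a hypothetical name, keyless hashes are represented as dict entries, and letting a repeated plain key overwrite an earlier one is an assumption, since the text only describes overwriting by added hashes.

```python
def build_hash(entries):
    """Build a hash from entries given in source order.

    Each entry is either a (key, value) pair or a dict that stands for a
    hash written without a key (a hash addition).
    """
    result = {}
    for entry in entries:
        if isinstance(entry, dict):
            result.update(entry)  # hash addition: merge, later values win
        else:
            key, value = entry
            result[key] = value   # assumption: a repeated key also overwrites
    return result


# Models {a: A1, b: B1, c:C1, {b: B2, c: C2}, {c: C3}}
merged = build_hash([
    ("a", "A1"), ("b", "B1"), ("c", "C1"),
    {"b": "B2", "c": "C2"},
    {"c": "C3"},
])
print(merged)  # {'a': 'A1', 'b': 'B2', 'c': 'C3'}
```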

You may wonder what this is all good for. Hash additions are useful with data loaders that return hashes. Read on...

TDD functions

TDD has a construct called TDD function that is identical to FTL method calls. The meaning of TDD functions depends on which setting you use them in. For example, when you use TDD with the data setting, they are used to invoke data loaders, and their return value is the loaded data. A part of the configuration used in the Quick Tour:

data: {
    tdd(data/style.tdd)
    birds: csv(data/birds.csv)
}

The first function call (tdd(...)) returns a hash that was built by interpreting the data/style.tdd file. There is no key given for it (like someKey: tdd(...)), so its key:value pairs will be added directly to the data hash (hash addition). The second call (csv(...)) returns a sequence, which will be stored with the key birds in the data hash.

Another example of the usage of TDD functions is the modes setting. A fragment from a possible configuration file:

modes: [copy(**/*.html, **/*.htm), ignore(tmp/)]

In this use case, you are not interested in what kind of values the function calls return. You just use the function calls to describe groups. What the function calls return to solve this task is the internal business of the FMPP core.

There is no restriction regarding the type of the value a TDD function returns. You might have thought that TDD knows only these types: string, number, boolean, hash, sequence. These are the types for which you can specify values directly, but a sequence or a hash can store any type of Java object as a value, not only these. So any type of object can get into hashes or sequences as the return value of a TDD function.

Comments

TDD knows 2 types of comments:

  • FTL comment: These are delimited with <#-- and -->, and can span multiple lines. They can be inserted anywhere optional white-space could be inserted. FTL comments can't be nested into another FTL comment.
  • Shell-script/"properties" style comment: These are lines starting with #, optionally preceded by white-space. The comment extends to the end of the line. This comment can be inserted anywhere an optional line-break could be inserted.

Comments will not work inside quoted strings, nor nested inside comments. That is, they will count as normal text in these places.

Example:

# This is a test.
# Now "a" will be 1
a: 1 <#-- now "a" is 1 -->
b <#-- this was the key --> : <#-- now comes the value --> 2
  # Comments can be indented.
<#--
  FTL-style comment
  can span over multiple lines.
-->
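To make the two comment rules concrete, here is a simplified Python sketch that removes comments from TDD text. It is an illustration under stated assumptions, not FMPP's parser: strip_tdd_comments is a hypothetical name, only \n and \r\n line-breaks are recognized, an unclosed FTL comment is assumed to run to the end of the text, and backslash is treated as an escape in every quoted string.

```python
def strip_tdd_comments(text: str) -> str:
    """Return text with both comment styles removed (simplified sketch)."""
    out = []
    i, n = 0, len(text)
    quote = None  # quote character of the string literal we are inside, if any
    while i < n:
        ch = text[i]
        if quote is not None:
            # Inside a quoted string comments count as normal text.
            if ch == "\\" and i + 1 < n:
                out.append(text[i:i + 2])  # keep escape sequences intact
                i += 2
                continue
            if ch == quote:
                quote = None
            out.append(ch)
            i += 1
        elif ch in "\"'":
            quote = ch
            out.append(ch)
            i += 1
        elif text.startswith("<#--", i):
            end = text.find("-->", i + 4)
            i = n if end == -1 else end + 3  # assumption: unclosed runs to the end
        elif ch == "#" and text[text.rfind("\n", 0, i) + 1:i].strip() == "":
            # "#" is the first visible character of the line: skip to the line-break.
            end = text.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


example = """# This is a test.
# Now "a" will be 1
a: 1 <#-- now "a" is 1 -->
b <#-- this was the key --> : <#-- now comes the value --> 2
  # Comments can be indented.
<#--
  FTL-style comment
  can span over multiple lines.
-->
"""
print(strip_tdd_comments(example).split())  # ['a:', '1', 'b', ':', '2']
```

A # that is not the first visible character of its line, as in the unquoted string #FF80FF after a key, is left alone by this sketch.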

Character encoding issues

If you load TDD from a file, then FMPP has to use an encoding (charset) to interpret the bytes as text. The default encoding depends on how you load the TDD file:

  • If it is a configuration file: ISO-8859-1 is used.
  • If it is loaded with the tdd data loader as tdd(fileName): sourceEncoding is used.
  • If it is loaded with the tdd data loader as tdd(fileName, encoding): the encoding specified by the 2nd parameter is used.

A TDD file can specify its own encoding with a special comment, in which case the default encoding (see above) is ignored. For example, this TDD file will always be interpreted with UTF-8 encoding, no matter how you load it:

# encoding: UTF-8
some: tdd
comes: here

The encoding comment must be the very first line of the TDD file. An FTL-style comment can't be used for this purpose. Extra white-space, or no white-space, is OK between the # and encoding, and extra white-space can also be present around the colon. The word charset can be used instead of encoding. The words are not case sensitive.

The encoding comment works only if the file can be correctly interpreted as US-ASCII until the end of the encoding name. So it will not work for UTF-16, UCS-2, UCS-4, or EBCDIC-based charsets. Note that a UTF-8 BOM at the beginning of the file is automatically ignored.
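The rules for the encoding comment can be sketched in Python as below. This is a hedged illustration rather than FMPP's code: declared_encoding is a hypothetical name, the sketch requires the whole first line (not just the part up to the encoding name) to be US-ASCII, and it assumes that no white-space may precede the # and that nothing but white-space may follow the encoding name.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# "#", then "encoding" or "charset" (any letter case), a colon, and the name.
ENCODING_COMMENT = re.compile(
    r"#[ \t]*(?:encoding|charset)[ \t]*:[ \t]*(\S+)[ \t]*", re.IGNORECASE
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named on the first line of data, or None."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]        # a UTF-8 BOM is ignored
    first_line = re.split(rb"\r\n|\r|\n", data, maxsplit=1)[0]
    try:
        line = first_line.decode("ascii")  # simplification: whole line must be ASCII
    except UnicodeDecodeError:
        return None
    match = ENCODING_COMMENT.fullmatch(line)
    return match.group(1) if match else None
```

A caller would fall back to the default encoding described above whenever this function returns None.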