2019-01-12

indent-syntax

about indent-based tree structures

line indentation can be used to create a tree-like structure where the length of empty space at the beginning of lines determines nesting depth

interpretation

an indent-tree can be parsed and interpreted in more than one way. following are three possible interpretations for this text:

line-1
  line-2
line-3

denoted tree

one entry for each line with an integer for the indent-depth

((0 line-1) (1 line-2) (0 line-3))
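
as a minimal sketch, the denoted tree can be computed by counting leading spaces. the function name `denoted_tree` and the parameter `indent_width` are assumptions made for this illustration, and only space characters are treated as indent

```python
def denoted_tree(text, indent_width=2):
    """Return one (depth, content) pair per non-empty line."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue  # empty lines carry no depth information
        content = line.lstrip(" ")
        spaces = len(line) - len(content)
        # assumption: indent is a multiple of indent_width spaces
        result.append((spaces // indent_width, content))
    return result


print(denoted_tree("line-1\n  line-2\nline-3"))
# [(0, 'line-1'), (1, 'line-2'), (0, 'line-3')]
```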

tree

indent-depth equals nesting-depth

(line-1 (line-2) line-3)
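
one possible way to build this interpretation from (depth, content) pairs is sketched below. `nested_tree` is a hypothetical name, and python lists stand in for the bracketed lists

```python
def nested_tree(pairs):
    """Convert (depth, content) pairs to nested lists where depth equals list nesting."""
    root = []
    stack = [root]  # stack[d] is the list that receives entries of depth d
    for depth, content in pairs:
        while len(stack) - 1 > depth:
            stack.pop()  # leave lists that are deeper than this line
        while len(stack) - 1 < depth:
            child = []  # open one nested list per missing depth step
            stack[-1].append(child)
            stack.append(child)
        stack[-1].append(content)
    return root


print(nested_tree([(0, "line-1"), (1, "line-2"), (0, "line-3")]))
# ['line-1', ['line-2'], 'line-3']
```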

prefix-tree

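sub-list prefixes are the roots of sub-trees: a line that is followed by deeper indented lines becomes the first element of a list that contains its children

((line-1 line-2) line-3)

a larger example with three depth levels

(depth-0 (depth-0 depth-1 (depth-1 depth-2 depth-2) depth-1))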
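
the prefix-tree interpretation can be sketched in a similar way. `prefix_tree` is a hypothetical name, the input is assumed to be a list of (depth, content) pairs, and depth is assumed to increase by at most one step at a time

```python
def prefix_tree(pairs):
    """Convert (depth, content) pairs to nested lists whose first element is the parent."""
    root = []
    stack = [root]  # stack[d] is the list that receives entries of depth d
    for index, (depth, content) in enumerate(pairs):
        del stack[depth + 1:]  # leave sub-trees that are deeper than this line
        has_children = index + 1 < len(pairs) and pairs[index + 1][0] > depth
        if has_children:
            sub = [content]  # the line becomes the prefix of a new sub-list
            stack[-1].append(sub)
            stack.append(sub)
        else:
            stack[-1].append(content)
    return root


print(prefix_tree([(0, "line-1"), (1, "line-2"), (0, "line-3")]))
# [['line-1', 'line-2'], 'line-3']
```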

multiple indent-steps at once

if nesting depth increases by multiple steps at once like in the following example

line-1
    line-2
      line-2-1
line-3

then line-2 could be interpreted as having no prefix

(line-1 ((line-2 line-2-1)) line-3)

operator application interpretation

space for operators and comma and newline for arguments

application

operator arg1, arg2,
  argn, ...

composition

operator operator arg1, arg2

with optional round brackets, or as written in other languages, this corresponds to

operator(operator(arg1, arg2))

space for operators and space and dot for arguments

application

operator arg1 arg2
  . argn ...

composition

operator : operator arg1 arg2
  . a

advantages of indent-based syntax

only space is needed to create a nesting structure and only the beginning of lines needs to be marked. the potential for variation in formatting is lower than for alternative tree notations like s-expressions or xml. the same structure notated by different authors, who otherwise tend to invent and use personal formatting styles for brackets, whitespace and nesting, will look very similar, especially without empty lines

downsides

indent alone cannot mark multiple sub-lists on the same line, like in this s-expression:

(+ (* 1 2) (/ 4 2))

formatting

long lines

line wrapping can be done with continued or one-step increased indent on following lines. naive line wrapping continues at the beginning of the next line, without indent, and can be more difficult to read

indent step

two spaces per indentation step is a widespread convention. use of the tab character is also common, which introduces all the complications associated with tab character usage:

  • a second invisible space character, and therefore a possible incorrect mix of spaces and tabs
  • the designation of an extra character for text compression
  • the necessity for viewer and editor programs to render it
  • the required configuration of all potential viewer and editor programs to show the tab character with an appropriate and preferred width

a tab is usually rendered as an advance to the next tab stop, with tab stops every 8 columns from the beginning of the line, which isn't how people usually want to indent. indent isn't hard to recognise, and viewers could display it in the user's preferred width regardless of whether space or tab characters are used
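
as an illustration of the mixing problem, here is a small sketch that reports lines whose leading whitespace contains both spaces and tabs. the name `mixed_indent_lines` is hypothetical and not part of any existing tool

```python
def mixed_indent_lines(text):
    """Return the 1-based numbers of lines whose indent mixes spaces and tabs."""
    result = []
    for number, line in enumerate(text.splitlines(), start=1):
        # the indent is everything before the first non-space, non-tab character
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            result.append(number)
    return result


print(mixed_indent_lines("a\n \tb\n\tc\n  d"))
# [2]
```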

some languages that use indent for code structure

for note taking

"indent tree packet notation" (itpn) is an indent-based, machine and human readable text format for titled, separated parts of text or notes. the words of a prefix line can be tags or make up a headline. nested structures can be created in content, but don't need to be parsed. if the words are tags, then note lists can be processed to extract, merge or analyse notes by tag. an itpn management utility is part of sph-script

word word
  content-line
  content-line
  content-line
  content-line
word
  content-line
  content-line

syntax

  • packet: [prefix content] ...
  • prefix: word [" " word] ...
  • content: ["\n" indent any-character ...] ...
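
a minimal sketch of a reader for this syntax follows. it is not the sph-script utility. the names `parse_itpn` and `with_tag` are assumptions for this example, and empty lines are skipped because the syntax above doesn't cover them

```python
def parse_itpn(text, indent="  "):
    """Return a list of (words, content_lines) packets."""
    packets = []
    for line in text.splitlines():
        if line.startswith(indent):
            if packets:
                # remove one indent step, deeper nesting stays unparsed
                packets[-1][1].append(line[len(indent):])
        elif line.strip():
            # an unindented line is a prefix and starts a new packet
            packets.append((line.split(), []))
    return packets


def with_tag(packets, tag):
    """Select the packets whose prefix contains the given word."""
    return [packet for packet in packets if tag in packet[0]]


notes = parse_itpn("word1 word2\n  a\n    b\nword3\n  c")
print(with_tag(notes, "word3"))
# [(['word3'], ['c'])]
```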

for document markup

"indent tree markup language" (itml) is a generic, indentation-based syntax for structured documents. it includes forms that can be evaluated by custom procedures to create output like lists, tables and more

expression properties

scope

  • inline: start and end somewhere on a line
  • indent: include all immediately following further indented lines
  • line: from their start to the end of the line

content interpretation

  • scm: start with # and arguments have to be valid scheme syntax
  • text: start with ## and arguments are plaintext

evaluation phase

  • ascend: itml expressions in arguments have been evaluated
  • descend: itml expressions in arguments have not been evaluated

inline expressions

inline-scm

#(identifier scheme-expression ...)

indent-scm

#identifier scheme-expression ...
  scheme-expression ...
  ...

inline-text

##(identifier plaintext/itml-expression ...)

indent-text

##identifier plaintext/itml-expression ...
  plaintext/itml-expression ...
  ...

line-scm, line-text

#identifier: scheme-expressions ...
##identifier: plaintext/itml-expressions ...

indent-descend

###identifier plaintext ...
  plaintext
  ...

the text is passed as a parsed tree without any nested expressions evaluated. this can be used for example to create block escaping

headings

a line that is followed by lines with increased indent becomes a heading

this is a heading
  this is content
  and more example text
  a sub-heading
    more content
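
the heading rule can be sketched as a comparison of each line's depth with the depth of the next line. this ignores itml expressions entirely, and the name `classify_lines` and the two-space `indent_width` are assumptions for this example

```python
def classify_lines(text, indent_width=2):
    """Return (kind, depth, content) per non-empty line, kind is "heading" or "text"."""
    lines = [line for line in text.splitlines() if line.strip()]
    depths = [(len(line) - len(line.lstrip(" "))) // indent_width for line in lines]
    result = []
    for index, line in enumerate(lines):
        # a heading is a line that is directly followed by a deeper line
        is_heading = index + 1 < len(lines) and depths[index + 1] > depths[index]
        kind = "heading" if is_heading else "text"
        result.append((kind, depths[index], line.strip()))
    return result


document = "this is a heading\n  this is content\n  a sub-heading\n    more content"
for entry in classify_lines(document):
    print(entry)
# ('heading', 0, 'this is a heading')
# ('text', 1, 'this is content')
# ('heading', 1, 'a sub-heading')
# ('text', 2, 'more content')
```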

line breaks

each empty line, that is two consecutive newlines, creates one line break in the output

example text

more text after empty line

escaping

inline expression prefixes, colons and backslashes can be escaped with a backslash

\:
\#
\##
\###
\\
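
removing the escape characters from already parsed text could look like the following sketch. it only covers the unescaping step, a real parser would also have to skip escaped prefixes while searching for expressions. the name `unescape` is an assumption

```python
import re

# a backslash followed by an expression prefix (#, ## or ###), a colon or a backslash
ESCAPE = re.compile(r"\\(#{1,3}|:|\\)")


def unescape(text):
    """Replace each escape sequence with the characters it protects."""
    return ESCAPE.sub(r"\1", text)


print(unescape(r"a \: b \## c \\ d"))
# a : b ## c \ d
```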

block escapes

###escape
  content
    content
  content