2023-08-30

indent-syntax

about indent-based notation.

line indentation can be used to create a tree-like structure in which the amount of whitespace at the beginning of a line determines its nesting depth.

interpretation

an indent-tree can be parsed and interpreted in more than one way. the following are three possible interpretations of this text:

line-1
  line-2
line-3

denoted tree

one entry for each line with an integer for the indent-depth

((0 line-1) (1 line-2) (0 line-3))
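a minimal python sketch of the denoted tree. it assumes indentation with spaces only and an indent step of two spaces, the convention mentioned under "indent step"; the function name denoted_tree is hypothetical.

```python
def denoted_tree(text: str, step: int = 2) -> list[tuple[int, str]]:
    """Return one (indent-depth, content) pair per non-empty line."""
    entries = []
    for line in text.splitlines():
        content = line.lstrip(" ")
        if not content.strip():
            continue  # empty or whitespace-only lines produce no entry
        spaces = len(line) - len(content)
        if spaces % step:
            raise ValueError(f"indent of {spaces} spaces is not a multiple of {step}")
        entries.append((spaces // step, content.rstrip()))
    return entries


print(denoted_tree("line-1\n  line-2\nline-3"))
# [(0, 'line-1'), (1, 'line-2'), (0, 'line-3')]
```

the other interpretations can all be built from this flat list of pairs, so it is a convenient first parsing stage.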

tree

indent-depth equals nesting-depth

(line-1 (line-2) line-3)
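a python sketch of this interpretation. it takes (indent-depth, content) pairs like those of the denoted tree as input and keeps a stack of the lists currently open at each depth. nested python lists stand in for the parenthesised notation; nesting_tree and to_sexp are hypothetical names.

```python
def nesting_tree(pairs: list[tuple[int, str]]) -> list:
    """Place each line's content at a nesting depth equal to its indent-depth."""
    root: list = []
    stack = [root]  # stack[d] is the list currently open at nesting depth d
    for depth, content in pairs:
        del stack[depth + 1:]  # close lists that are deeper than this line
        while len(stack) <= depth:  # open new lists down to this line's depth
            sub: list = []
            stack[-1].append(sub)
            stack.append(sub)
        stack[depth].append(content)
    return root


def to_sexp(item) -> str:
    """Render nested python lists in s-expression notation."""
    if isinstance(item, list):
        return "(" + " ".join(to_sexp(x) for x in item) + ")"
    return item


print(to_sexp(nesting_tree([(0, "line-1"), (1, "line-2"), (0, "line-3")])))
# (line-1 (line-2) line-3)
```

consecutive lines at the same depth share one list, and a line at a shallower depth closes all deeper lists.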

prefix tree

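prefixes as the roots of sub-trees: a line followed by more deeply indented lines becomes the first element of a list that also contains those lines

((line-1 line-2) line-3)

a larger example, with each line named after its indent-depth:

depth-0
depth-0
  depth-1
  depth-1
    depth-2
    depth-2
  depth-1

(depth-0 (depth-0 depth-1 (depth-1 depth-2 depth-2) depth-1))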
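a python sketch of the prefix tree, built recursively from (indent-depth, content) pairs such as those of the denoted tree. it assumes the depth grows by at most one step from one line to the next and raises an error otherwise; prefix_tree and to_sexp are hypothetical names, and nested python lists stand in for the parenthesised notation.

```python
def prefix_tree(pairs: list[tuple[int, str]], i: int = 0, depth: int = 0) -> tuple[list, int]:
    """Collect consecutive lines at `depth` starting at index i; return (items, next index)."""
    items: list = []
    while i < len(pairs) and pairs[i][0] == depth:
        content = pairs[i][1]
        i += 1
        if i < len(pairs) and pairs[i][0] > depth:
            if pairs[i][0] != depth + 1:
                raise ValueError("indent increased by more than one step")
            children, i = prefix_tree(pairs, i, depth + 1)
            items.append([content] + children)  # the line is the prefix of its sub-tree
        else:
            items.append(content)  # a line without deeper lines stays a plain element
    return items, i


def to_sexp(item) -> str:
    """Render nested python lists in s-expression notation."""
    if isinstance(item, list):
        return "(" + " ".join(to_sexp(x) for x in item) + ")"
    return item


items, _ = prefix_tree([(0, "line-1"), (1, "line-2"), (0, "line-3")])
print(to_sexp(items))  # ((line-1 line-2) line-3)
```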

multiple indent-steps at once

if the nesting depth increases by more than one step at once, as in the following example,

line-1
    line-2
      line-2-1
line-3

in the prefix tree, line-2 can be placed in a sub-tree that has no prefix and stands for the skipped indent-step:

((line-1 ((line-2 line-2-1))) line-3)
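a python sketch of a prefix tree builder that accepts jumps of more than one indent-step: when a line is indented deeper than the next expected depth, the missing depth becomes a list without a prefix. it takes (indent-depth, content) pairs and assumes a step of two spaces, so line-2 (four spaces) has depth 2 and line-2-1 (six spaces) has depth 3; prefix_tree and to_sexp are hypothetical names.

```python
def prefix_tree(pairs: list[tuple[int, str]], i: int = 0, depth: int = 0) -> tuple[list, int]:
    """Collect lines at `depth` or deeper starting at index i; return (items, next index)."""
    items: list = []
    while i < len(pairs) and pairs[i][0] >= depth:
        line_depth, content = pairs[i]
        if line_depth > depth:
            # an indent-step was skipped: this sub-tree has no prefix line
            sub, i = prefix_tree(pairs, i, depth + 1)
            items.append(sub)
            continue
        i += 1
        if i < len(pairs) and pairs[i][0] > depth:
            children, i = prefix_tree(pairs, i, depth + 1)
            items.append([content] + children)  # the line is the prefix of its sub-tree
        else:
            items.append(content)
    return items, i


def to_sexp(item) -> str:
    """Render nested python lists in s-expression notation."""
    if isinstance(item, list):
        return "(" + " ".join(to_sexp(x) for x in item) + ")"
    return item


items, _ = prefix_tree([(0, "line-1"), (2, "line-2"), (3, "line-2-1"), (0, "line-3")])
print(to_sexp(items))
```

each skipped step adds one prefix-less list, so a jump of n steps wraps the deeper lines in n - 1 extra lists.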

operator application interpretation

operators followed by space in application position

application

operator arg1, arg2,
  arg3, arg4

composition

operator operator arg1, arg2

with round brackets

operator(operator(arg1, arg2))

in this notation commas separate arguments and so escape application: "arg1, arg2" are two arguments, while "arg1 arg2" would mean arg1(arg2).
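a small python sketch that rewrites this comma notation into the round-bracket form for a single expression. it assumes that indented continuation lines simply extend the argument list and that every argument after the first comma is a single word; nested applications such as "f a, g b, c" are not handled. comma_application is a hypothetical name.

```python
def comma_application(source: str) -> str:
    """Rewrite 'op op arg, arg' notation as op(op(arg, arg))."""
    # join indented continuation lines onto the first line
    text = " ".join(line.strip() for line in source.splitlines() if line.strip())
    segments = [segment.strip() for segment in text.split(",") if segment.strip()]
    for segment in segments[1:]:
        if " " in segment:
            raise ValueError(f"application in a later argument is not supported: {segment!r}")
    words = segments[0].split()
    # every word before the last is an operator in application position
    operators, arguments = words[:-1], [words[-1]] + segments[1:]
    if not operators:
        raise ValueError("no operator in application position")
    expression = f"{operators[-1]}({', '.join(arguments)})"  # innermost call takes the comma list
    for operator in reversed(operators[:-1]):
        expression = f"{operator}({expression})"
    return expression


print(comma_application("operator arg1, arg2,\n  arg3, arg4"))  # operator(arg1, arg2, arg3, arg4)
print(comma_application("operator operator arg1, arg2"))  # operator(operator(arg1, arg2))
```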

space for operators and line-specific syntax

application

operator arg1 arg2
  . arg3 arg4

composition

operator
  operator arg1 arg2
  . arg3 arg4

in these examples a dot at the beginning of a line marks the continuation of the argument list.
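a python sketch of the dot rule, assuming wisp-like semantics: every line opens a list containing its words and its more deeply indented lines, except a line starting with a dot, whose elements are appended to the enclosing list instead. under this assumption the composition example reads as (operator (operator arg1 arg2) arg3 arg4), because the dot line is a child of the outer operator line. a step of two spaces is assumed; parse_dot, build and to_sexp are hypothetical names.

```python
def parse_dot(source: str, step: int = 2) -> list:
    """Parse indented lines; a line starting with '.' continues its parent's list."""
    lines = []
    for raw in source.splitlines():
        words = raw.split()
        if words:
            spaces = len(raw) - len(raw.lstrip(" "))
            lines.append((spaces // step, words))
    items, end = build(lines, 0, 0)
    if end != len(lines):
        raise ValueError("unexpected indent")
    return items


def build(lines: list, i: int, depth: int) -> tuple[list, int]:
    """Collect lines at `depth` from index i; return (items, next index)."""
    items: list = []
    while i < len(lines) and lines[i][0] == depth:
        words = lines[i][1]
        children, i = build(lines, i + 1, depth + 1)
        if words[0] == ".":
            items.extend(words[1:] + children)  # continuation: no list of its own
        else:
            items.append(words + children)  # any other line opens a list
    return items, i


def to_sexp(item) -> str:
    """Render nested python lists in s-expression notation."""
    if isinstance(item, list):
        return "(" + " ".join(to_sexp(x) for x in item) + ")"
    return item


for expression in parse_dot("operator arg1 arg2\n  . arg3 arg4"):
    print(to_sexp(expression))  # (operator arg1 arg2 arg3 arg4)
```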

nesting on one line

indent alone cannot mark multiple sub-lists on a single line, as in this s-expression:

(+ (* 1 2) (/ 4 2))

round brackets can be used in this case. wisp uses a colon to nest until the end of the line:

+ : * 1 2 : / 4 2

is equivalent to

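(+ (* 1 2 (/ 4 2)))

the second colon opens its list inside the first one, so this is not the same as the s-expression above. sibling sub-lists on one line still need round brackets, or they can be written on separate lines.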
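a python sketch of only the colon rule for a single line, assuming tokens are separated by spaces and a colon is a token of its own. it shows why the second colon ends up inside the first: each colon opens a list inside the innermost list that is still open, and no list is closed before the end of the line. colon_line and to_sexp are hypothetical names.

```python
def colon_line(line: str) -> list:
    """Parse one line where each ':' opens a list that closes at the end of the line."""
    root: list = []
    current = root
    for token in line.split():
        if token == ":":
            sub: list = []
            current.append(sub)  # the new list becomes an element of the innermost open list
            current = sub
        else:
            current.append(token)
    return root


def to_sexp(item) -> str:
    """Render nested python lists in s-expression notation."""
    if isinstance(item, list):
        return "(" + " ".join(to_sexp(x) for x in item) + ")"
    return item


print(to_sexp(colon_line("+ : * 1 2 : / 4 2")))  # (+ (* 1 2 (/ 4 2)))
```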

advantages of indent-based syntax

only whitespace is needed to create a nesting structure, and only the beginning of each line needs to be marked. there is less room for variation in formatting than with alternative tree notations like s-expressions or xml. the same structure notated by different authors, who otherwise tend to invent and use personal formatting styles for brackets, whitespace and nesting, will look more similar, especially when no empty lines are used.

formatting

long lines

long lines can be wrapped by continuing the following lines at the same indent or at a further increased indent. naive line wrapping restarts the wrapped text at the beginning of the line, ignoring the indent, which can make it more difficult to read.
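a python sketch of wrapping with increased indent, using the standard library's textwrap.fill. the extra continuation indent of two spaces, the default width and the function name wrap_indented are assumptions.

```python
import textwrap


def wrap_indented(line: str, width: int = 72, extra: int = 2) -> str:
    """Wrap one line; continuation lines keep its indent plus `extra` spaces."""
    indent = line[: len(line) - len(line.lstrip(" "))]
    return textwrap.fill(
        line.strip(),
        width=width,
        initial_indent=indent,  # the first line keeps its original indent
        subsequent_indent=indent + " " * extra,  # wrapped lines are indented further
    )


print(wrap_indented("  alpha beta gamma delta", width=14))
```

with width 14 the words "gamma" and "delta" move to continuation lines that start two columns deeper than the first line; extra=0 keeps the continuation at the original indent instead.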

indent step

two spaces per indentation step is a widespread convention. the tab character is also commonly used, which brings all the complications of tab usage: it introduces a second invisible whitespace character and with it the possibility of an incorrect mix of spaces and tabs, it designates an extra character for what amounts to text compression, viewer and editor programs must be able to render it, and every potential viewer and editor has to be configured to show it with an appropriate, preferred width. a tab is usually rendered by advancing to the next tab stop, with tab stops every 8 columns from the beginning of the line, which is not how people usually want to indent. indent is not hard to recognise, so viewers could display it in the user's preferred width regardless of whether spaces or tabs are used.
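a python sketch of the tab behaviour described above and of re-rendering indent at a preferred width. it relies on str.expandtabs, which advances each tab to the next multiple of the tab size; the function names and the idea of passing the source step width in columns are assumptions.

```python
def leading_whitespace(line: str) -> str:
    return line[: len(line) - len(line.lstrip(" \t"))]


def indent_columns(line: str, tab_size: int = 8) -> int:
    """Rendered indent width: each tab advances to the next multiple of tab_size."""
    return len(leading_whitespace(line).expandtabs(tab_size))


def mixes_tabs_and_spaces(line: str) -> bool:
    leading = leading_whitespace(line)
    return " " in leading and "\t" in leading


def render_indent(line: str, step: int, preferred_width: int, tab_size: int = 8) -> str:
    """Re-render the indent with preferred_width spaces per indent-step.

    step is the width of one indent-step in the source, measured in columns."""
    depth = indent_columns(line, tab_size) // step
    return " " * (depth * preferred_width) + line.lstrip(" \t")


print(indent_columns("\tx"), indent_columns("  \tx"), indent_columns("\t  x"))  # 8 8 10
```

two spaces followed by a tab render the same as a single tab, which is why a mix of spaces and tabs can look correct in one viewer and break in another.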

some languages that use indent for code structure

  • coffeescript (also interesting is how html can be generated with teacup)

    • spaces separate operators in application position from arguments: "operator argument", "operator operator argument", "a b c d" -> "a(b(c(d)))"
    • commas separate arguments: "operator argument, argument". colons are used differently: "operator key: value" passes an implicit object, like operator({key: value})
    • what follows a newline and an increased indent continues the previous, less indented line
    • in "() -> body ...", the function body is a sequence of expressions, like "begin" in scheme
  • wisp
  • python

other

design ideas for markup languages that use indent.