2018-07-02

scheme syntax

over time scheme has been extended with various inconsistent read-syntax forms which became part of the official standard or a quasi-standard in implementations

these read-syntaxes differ from other syntax in that they are not based on round-bracket delimited expressions. they break the regularity of the fundamental syntax and they require extra complexity in the parser, which is usually a non-scheme or at least scheme-independent part of the implementation. apart from that, read-syntax is mainly of concern for serialisation, where it can avoid additional deserialisation processing

efforts to introduce new read-syntax typically seem guided by an interest in reducing typing effort for specific constructs, while at the same time increasing the number of semantically different patterns and permutations. other motives seem to be personal habit, or the notion that scheme would be better if it were more like other popular languages - as if there were not enough alternatives that already share similar non-scheme syntax - or that a language needs to be made comfortable for users of other languages, or that the current in-language possibilities are insufficient

with syntax, details that may seem like nitpicks matter. single characters are important, and a single character can break a program. specific combinations of characters carry the semantics and are repeated with everything they encode: they are frequently read, written and kept in mind. imagine for example the difference it would make to have to prefix every variable with a dollar sign

the following specifies a few reductions and renamings in an attempt to simplify things and is still completely compatible and usable with current interpreters

specification

yes, it is that short

removed

#F #T
square-bracket-sexp [ ]
scsh-block-comment #! !#
srfi30-block-comment #| |#
upper-case-symbol
' , #' #, {backtick}
multiple-return-values

renamed

first car

tail cdr

pair cons

pairs cons*

aliased

q quote

qq quasiquote

l lambda

code-comments

#;() the nestable expression comments (datum comments) which are already part of the official standard

; line comments

additional interpreter supported syntax

#! hash-bang

modified form

(let (name value) body ...) for single bindings

other

allowing utf8 in identifiers could be problematic because using different alphabets leads to inaccessible knowledge and might support further international separation of people by language. furthermore, apart from cultural conservatism, it does not seem to be a technical progression

symbols and identifiers should always be lowercase. the set of upper case characters is not necessary

hash-comma readers are a positive and supported extension. they are a small addition to the language that allows custom read-syntaxes (which means not using bound identifiers) generically, without adding more and more syntactically relevant special prefix characters and structures that have to be tediously learnt and deciphered to understand the code

rationale

syntax changes

removed

alternative delimiters for s-expressions like square brackets

the main problems with these are added syntactic noise, superfluousness and repeated bracket type alteration

they are a complication for reading and learning

they have the exact same meaning as (), but still require additional processing from the human reader. a newcomer does not know about them and will wonder what these square brackets mean. they might not even find out on their own without documentation, even though the brackets mean nothing new. other languages use square brackets for literal array definition, which adds unnecessary confusion

they make reading, understanding and editing more difficult where opening or closing delimiters appear in succession at the beginning or end of expressions. a mix of square and round brackets removes the practicality of adding or moving round brackets that appear in succession to quickly change the nesting of expressions (re-purposing brackets to be paired with other brackets for different expressions). having this freedom is an elegant property that follows from the simplicity of the original syntax. without square brackets, only the count of opening and closing brackets matters for ensuring valid nesting. square brackets add a completely new condition that has to be accounted for: the order and type of brackets. all of this is a relatively high cost

even just the additional set of characters and keys that have to be used in alternation, and the attention this requires, are sufficient reasons against them

they may be intended to make it easier to find the closing delimiter, but they fail at that and actually make it worse:
[(lambda (a b) (+ a b))]
((lambda (a b) (+ a b)))

my guess is that the real reason for adding them is rooted in a mistake, because it goes so much against the general simplicity of scheme's design. or it might be about increasing the number of possible permutations to make the view of code more entertaining on an impractical level

quote syntax

at least discouraged

it sacrifices the simplicity of dealing with regular bracket list-syntax, where elements matter instead of characters in front of the opening delimiter, and it can easily be replaced by that regular syntax

its inconspicuous appearance clutters the code with particularly small, cryptic, non-alphanumeric, hard to discern special symbols outside of lists, which makes the code harder to read. usually identifiers at the beginning of read-syntax-lists tend to describe the list contents, like for example (syntax (a b)). #'(a b) hardly looks as clear and helpful

"display" uses (quote) syntax after parsing, which can be confusing. it shows the ambiguity and complication, and it should

short bindings like "q" are as easy to read as any other s-expression, are not much more difficult to write using structural editing, and may be even simpler to manage because of that - at least no extra complexity has to be built into the structural-editing algorithms
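that the quote character is only a reader shorthand for a regular list can be checked directly. the following is a minimal sketch that should work with any standard scheme reader; the names quoted-short and quoted-long are made up for illustration:

```scheme
;; both definitions hold the same datum: a two-element list
;; whose first element is the symbol quote
(define quoted-short ''a)
(define quoted-long (quote (quote a)))

;; true, the reader has already expanded the shorthand
(define same-datum (equal? quoted-short quoted-long))

;; the symbol quote
(define first-element (car quoted-short))
```

how such a value is printed by "display" or "write" depends on the implementation, so no output is shown here.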

the backtick in particular is an odd and indistinct invention. sidenote

alternative block comment syntax

the standard-specified syntax with hash, semicolon and round brackets is sufficient and elegant because it retains the fundamental syntax and transforms it only minimally: it starts with the # prefix that is known for special read-syntax constructs, followed by a semicolon that is already in use for line comments

it is quickly added in front of any, possibly nested, bracketed expression. paredit-mode may not be able to handle it, but smartparens-mode is
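a small sketch of how the hash-semicolon prefix comments out a whole nested expression, assuming an implementation that supports datum comments (r6rs, r7rs, guile). the names add-one and numbers are only for illustration:

```scheme
(define (add-one a)
  ;; the whole begin expression, including nested lists, is skipped by the reader
  #;(begin
      (display "debug: ")
      (display a))
  (+ a 1))

;; in front of a list element it removes just that element
(define numbers (list 1 #;2 3))
```

removing the two prefix characters reactivates the expression, without having to find and delete a separate closing marker.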

guile uses #! !# for block comments because it starts like a hash-bang commonly found at the beginning of shell executed scripts, but this still requires the closing part on a separate line for hash-bangs in scripts

uppercase false and true

unnecessary

additions

hash-bang: seems necessary for creating shell executable scripts. this format is the standard for shell executable scripts which are important because they allow scheme programs to be used as simple commands on most systems

the syntax is one line starting with #!

semantic changes

multiple return values

it leads the programmer to think of a low-level optimisation in the form of tedious to-work-with, ambiguity creating syntax

it does not enable a very useful new way of expression because everything could be specified using lists and the basic "pattern matching" that lambda application provides. these are concepts in which a lot of compiler optimisation effort has likely been invested already because they are ubiquitous. example, passing "multiple return values" to a procedure and binding to identifiers: (apply (lambda (a b . c) #t) (list 1 2 3 4 5))

it doubles the possible syntax and semantics for result value destructuring. you have to learn call-with-values, values, let-values and the new "too few arguments" problems you will be dealing with, and you will repeatedly rewrite one way of passing multiple values into the other, as of course you will still be working with procedures that are well applied with lists
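the two ways of destructuring results can be put side by side. this is only a sketch; the procedure names quotient+remainder-values and quotient+remainder-list are hypothetical and stand for any procedure that produces two results:

```scheme
;; variant using multiple return values
(define (quotient+remainder-values a b)
  (values (quotient a b) (remainder a b)))

;; variant returning a list
(define (quotient+remainder-list a b)
  (list (quotient a b) (remainder a b)))

;; multiple values need call-with-values and a producer thunk
(define sum-from-values
  (call-with-values
    (lambda () (quotient+remainder-values 7 2))
    (lambda (quot rem) (+ quot rem))))

;; the list only needs the standard application via apply
(define sum-from-list
  (apply (lambda (quot rem) (+ quot rem))
    (quotient+remainder-list 7 2)))
```

both definitions end up with the same number, but the first requires the additional vocabulary.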

continuation-passing-style could be a better alternative for all cases where multiple-return-values are deemed useful. and it still keeps the many-to-one relation between arguments and result, the simplicity of which is not to be underestimated

anecdote: in an automated testing library i wrote, input and expected output arguments are specified alternatingly. input arguments can be lists to specify multiple input arguments. that means input arguments that are lists always have to be wrapped in a list to designate them as a first argument without ambiguity. an analogous complication would have arisen had i implemented the same interpretation for output arguments
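a minimal sketch of the continuation-passing alternative mentioned above. the name quotient+remainder-cps and its parameter continue are made up for this example:

```scheme
;; instead of returning two values, the procedure passes them
;; as arguments to a continuation procedure supplied by the caller
(define (quotient+remainder-cps a b continue)
  (continue (quotient a b) (remainder a b)))

;; the caller binds the results with an ordinary lambda
;; and the whole call still has exactly one result
(define cps-sum
  (quotient+remainder-cps 7 2
    (lambda (quot rem) (+ quot rem))))
```

no new binding forms are involved, only procedure application.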

when i tested it, the execution time with multiple return values was 6 times longer. now what was the reason for using them again? this was tested with guile 2, i will test if it is still true with guile 2.2. that it can be slower says something about its implementation complexity

the theoretical performance benefit is relevant when value destructuring happens often, which i have seen in mathematics related algorithms. it is questionable though, if the use of multiple-return-values in existing procedures like partition, span or list-diff+intersection can lead to an appropriate performance benefit

in a case for mrv, what seems missing syntactically is a feature where the values are spliced into the arguments of the standard application form, like so: (proc a b (mv-producer) e). something like "apply-values" could also be useful: (apply-values (l (a b . c) (+ a b)) mv-producer)
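something like "apply-values" can be sketched on top of the standard call-with-values. this is an assumption about how it could look, not an existing standard procedure, and it assumes that the producer is passed as a procedure without arguments. the splicing form cannot be expressed this simply and is not covered:

```scheme
;; hypothetical helper: call consumer with all values returned by producer
(define (apply-values consumer producer)
  (call-with-values producer consumer))

;; example producer, a procedure without arguments
(define (mv-producer) (values 1 2 3 4 5))

;; a and b are bound to the first two values, c to the list of the rest
(define result
  (apply-values (lambda (a b . c) (+ a b)) mv-producer))
```

the plain lambda is used here instead of "l" so that the sketch works without further definitions.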

renamings

a few names have been changed for increased clarity. it should be the goal of a language to have a consistent naming scheme with regular, plain english names instead of abbreviations, so that it is easy to get what they mean and they do not become isolating language. new terms should be used only if absolutely necessary, and names should be allowed to improve

first

"car" and "cdr" are absolutely opaque words. even knowing about the etymology, coming from "contents of the address part of register number" and "contents of the decrement part of register number", does not really help to infer the meaning

the word increases vocabulary while not really adding a distinct meaning, adding ambiguity

"car" is about referencing the first pair element in a list, or the first list element. that is why first is a relevant name

the next best word might be "head", but this could also be taken to mean multiple elements, while "first" is really just about the first element of a pair or list. as a sidenote, the linux command-line tool "head" selects one or multiple elements

"left" and "right" may be even better because they somewhat emphasize two-valuedness, and would avoid the figurative aspect of the following renaming "tail"

the word "first" can be considered short. one could use abbreviations for it, but i would say do not bother, because losing the clarity obtained by using common english words is not worth it

there is "last" in srfi-1 for lists, so "first" is the opposite end

names are usually vague, but we should strive for keeping the vagueness low without having to invent new words. sometimes concepts are so different, a new word is appropriate. but not in this case. appeals to tradition, or arguments like that it supposedly sounds better over the phone, do not cut it

tail

"tail" is already a common name for its result

the name "cdr" has the same problems as "car"

an alternative could be "rest" for "rest of list", but rest has a broader meaning and might lead to confusion more easily because of the existence of rest-arguments, and when using the word rest for "rest of something" instead of "rest of list-elements"

as mentioned above for "first", "right" could be another viable name

pair

even though there are historical or technical explanations for "cons cells" and the like, the name, which comes from "construct", is too general and does not say what we specifically do when using cons

we are creating pairs, 2-tuples with a left/first element and a right/last/second element, which as a verb is called "to pair" and is coincidentally also the word for the result

sometimes the word "cons" is used to mean "prepend"

pairs

like "pair" but chains the pairs with their second element

(pairs 1 2 3) is equivalent to (pair 1 (pair 2 3))
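the renamings can be tried in current interpreters with plain definitions. a minimal sketch: "pairs" is written out by hand here because cons* is not available everywhere without importing srfi-1, and the hand-written version is only assumed to match cons* for the cases shown:

```scheme
(define first car)
(define tail cdr)
(define pair cons)

;; chain the arguments into pairs, the last argument becomes the final tail
(define (pairs a . rest)
  (if (null? rest)
    a
    (pair a (apply pairs rest))))

;; a proper list is obtained when the last argument is a list
(define chained (pairs 1 2 (list 3 4)))
```

with these definitions (first chained) is 1 and (tail chained) is the list (2 3 4).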

l q qq

these are the shortest renamings. because they are only few, and fundamental syntax forms, used very often (read: in almost every code file), and the result is visible literally in the arguments, it should be acceptable to have short, opaque binding names for them. it is in any case better than special chars

lambda is used very often, frequently even as an inline ad-hoc argument

examples

'test (q test)
'(a b c) (q (a b c))
(map (lambda (e) (+ 1 e)) mylist)
(map (l (e) (+ 1 e)) mylist)
(list-q a b c)

the "list-q" syntax for quoted lists is included as an example for avoiding additional nesting
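the aliases from the examples can be defined with syntax-rules. this is a sketch of one possible definition, assuming an implementation such as guile where syntax-rules is available at the top level, and not necessarily how any existing library defines them. mylist and the other variable names are only for illustration:

```scheme
(define-syntax q
  (syntax-rules () ((_ datum) (quote datum))))

(define-syntax qq
  (syntax-rules () ((_ datum) (quasiquote datum))))

(define-syntax l
  (syntax-rules () ((_ formals body ...) (lambda formals body ...))))

;; quoted list without the additional nesting
(define-syntax list-q
  (syntax-rules () ((_ datum ...) (quote (datum ...)))))

(define mylist (list 1 2 3))
(define incremented (map (l (e) (+ 1 e)) mylist))
(define quoted (list-q a b c))
;; unquote still has to be written with its long name
(define filled (qq (a (unquote (+ 1 2)))))
```

with these definitions, quoted holds the same list as (q (a b c)).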

it is not easy to add an abbreviation for "unquote", for example "uq", without redefining quasiquote, because the quasiquote macro definition includes the longer keyword

l is certainly better than using the greek lambda character as a special symbol, which some people do. it is a character that is not included on most keyboards, is used in very few world languages, is not english and requires an encoding like utf8. you write code with the lambda character, somebody opens a code-file of yours, somebody be disappoint

each

this is optional

the original "for-each" might actually be the better name

modified let

for single bindings, instead of

(let ((testname testvalue)) body ...)

the following can be used

(let (testname testvalue) body ...)

i looked, but i have not found any kind of conflict yet. it works well and makes the code look simpler
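the modified form can be sketched as a syntax-rules macro. to avoid redefining the standard binding in this example, the macro is given the hypothetical name let-short; it is an illustration of the idea and not necessarily the implementation used elsewhere:

```scheme
(define-syntax let-short
  (syntax-rules ()
    ;; standard form with a list of bindings, tried first
    ((_ ((name value) ...) body ...)
      (let ((name value) ...) body ...))
    ;; short form with a single binding
    ((_ (name value) body ...)
      (let ((name value)) body ...))
    ;; named let is passed through unchanged
    ((_ tag ((name value) ...) body ...)
      (let tag ((name value) ...) body ...))))

(define short-result (let-short (a 1) (+ a 1)))
(define standard-result (let-short ((a 1) (b 2)) (+ a b)))
(define loop-result
  (let-short loop ((i 0)) (if (< i 3) (loop (+ i 1)) i)))
```

the patterns can be told apart because a binding name is always an identifier: in the short form the first element of the binding is not a list, so the first pattern does not match it.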


tags: syntax sph-scheme