---
layout: post
title: Next project, The Pooh programming language
---

{{ page.title }}

8 November 2011

I want a new project to tinker with, so I will create a scripting language. The project will be presented as a series of articles; this first article defines the language in some detail.

First, the name: it will be the ‘Pooh programming language’. I think that if there is a Python in honor of Monty, there should be a Pooh in honor of Pooh, so to speak; this is a long-missing detail that I will strive to correct ;-)

The ‘Pooh programming language’ will be an educational programming language; its purpose is to introduce kids to programming.

General observations on the subject of (educational) programming languages

For me, the practice of programming mainly consists of building bridges (translations) between different concepts. Very often information has to be extracted from one source (be it some textual or binary format, or a GUI toolkit that receives input from a real person); also very often some existing logical formalism is used to access structured storage (SQL comes to mind).

What I like about scripting languages is that they make it easy to build such bridges: the concepts map directly onto features of the scripting language, onto combinations of easy-to-grasp structures like hashes and dynamic arrays, which can in turn be combined into very elaborate structures.

An example of such a bridge is an XML parsing tool like the Perl module XML::Simple (the idea has been adapted to other scripting languages, for example Ruby's XML::Simple or Groovy's XmlSlurper). XML is a structured way of expressing information; data in this format is magically translated into nested hashes that map tag names to the XML tree contained by that tag, while XML attributes are mapped to name/value pairs within the hash that corresponds to the tag.

This way of looking at XML is a simplification of sorts, but it is much easier to work with the data this way than with the standard DOM API.
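To make the XML::Simple idea concrete, here is a rough Python sketch built on the standard `xml.etree.ElementTree` module. The function name `simplify` and its mapping rules (attributes become keys, repeated child tags become lists, text next to attributes goes under a `content` key) are assumptions modelled loosely on XML::Simple's defaults; it is not a port, and options such as XML::Simple's KeyAttr folding are left out.

```python
import xml.etree.ElementTree as ET


def simplify(element):
    """Turn an XML element into nested dicts/lists, loosely in the spirit of XML::Simple."""
    node = dict(element.attrib)  # attributes become name/value pairs
    for child in element:
        value = simplify(child)
        if child.tag in node:
            # A repeated tag: collect all occurrences into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (element.text or "").strip()
    if text:
        if not node:
            return text  # a leaf holding only text collapses to a plain string
        node["content"] = text  # text next to attributes/children goes under "content"
    return node


doc = ET.fromstring(
    '<library>'
    '<book year="1926"><title>Winnie-the-Pooh</title></book>'
    '<book year="1928"><title>The House at Pooh Corner</title></book>'
    '</library>'
)
data = simplify(doc)
print(data["book"][1]["title"])  # The House at Pooh Corner
```

The nested result is indexed directly with keys and positions, instead of walking nodes with DOM calls such as `getElementsByTagName` and `childNodes`.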

Historical justification – why is a new language needed

In the eighties computers were much simpler; the programming languages available on a home computer were various dialects of Basic, Logo, and assembly for serious developers. These languages were easier to master, partly by intent of their design and partly because of the limitations of the platform: for example, a small home computer could not have run the whole Smalltalk environment or acted as a Lisp machine.

Limited computers could run only simple languages, so, paradoxically, the barrier to entry into programming was lower with simple machines.

The limitations of the machine also meant that one could do meaningful projects (by the standards of the day) with fewer means. When I was in high school I was given a home computer as a present; it did not have any games on it and they were hard for me to get, so I learned to write my own computer games, which went quite well.

What happened in later years? With the advent of Windows, schools switched to teaching basic computer literacy skills like using Word and Excel. Not every household had a PC, so learning to use a word processor was regarded as significant preparation for adult life. Programming might also have lost its importance, as some societies thought they were entering a post-industrial age in which financial services were more important than productive activity, but that is a digression of sorts.

Programming languages also got more complicated. Serious languages like C++ are notoriously hard to master; Java is still too complicated and verbose for teaching basic concepts. Scripting languages like Perl, Python and Ruby have lots of features, far too many; even Visual Basic is far more complex than the simple Basic dialects.

Features lacking in old Basic dialects that were added in more powerful scripting languages include variable scope, hashes, dynamic arrays, references, objects, regular expressions and closures. The designers of Basic probably thought it essential to leave all of these out in order to have a language that is easy to teach and to learn; I think it is still a challenge to create a language that combines these goodies and still remains simple.

There are still some relatively simple languages left: Squeak, Javascript, Logo. A problem with them is that they are only practically usable from within their own environment. Logo is good for moving ‘turtles’ and playing with words; try to use it for something else…

So there seems to be a place for some simple scripting language that
makes it easy to introduce programming
is versatile enough to do real / interesting projects on real systems.

For scripting languages it is very important to be versatile: to be usable in a multitude of different situations and from varying environments. As observed here:

… Smalltalk’s weakness is “at the boundaries:” when you want to try to do some typical unix system maintenance, or interfacing with underlying C libraries, or something similar. As long as you’re staying within the Smalltalk environment, it completely rocks. But it’s definitely painful if you try to reach outside. And it’s especially painful if you want your code to work with different Smalltalks. What Perl got right was making it completely painless to integrate with its environment – In some sense LISP wants to be on a LISP Machine and Smalltalk wants to be in its virtual machine, whereas Perl wants to go out and play with the other kids. The former languages are introverted and Perl is extroverted.

General-purpose scripting languages have to cram in a lot of features; often design decisions are made with the aim of simplifying the runtime interpreter or execution environment, and these tradeoffs tend to be counterintuitive and hard to explain. Examples of such tradeoffs made by some languages: variables in a function are global by default unless declared as local; variables can be used before they have been assigned a value.
With an educational programming language one should strive to avoid such tradeoffs.

What should be the user experience

The Pooh language should come with a basic interactive environment, like a REPL, where one can edit the program and try out its parts as they are written. The environment must be friendly to bottom-up software construction: one should be able to start with low-level functionality, try it out, and then work upward, using the simple parts to construct more complicated things.

It must be possible to integrate the environment into ‘software laboratories’, each suited for a particular purpose and each integrating with a different environment. One such laboratory could be a tool that implements a two-dimensional plane with moving sprite objects; the Pooh language would then be used to script the movement of the sprites and to react to collisions or to additional input from the user. A different laboratory could be aimed at constructing three-dimensional scenes out of simple geometric figures; yet another ‘software laboratory’ could deal with grammars and with parsing text by various means.

The idea is that the programming language becomes embedded into multiple possible spaces, so that a program manipulates the concepts of the space it lives in. I think an educational programming language also requires that its development environment can be plugged into different applications.

Language design principles

The design principles for the Pooh programming language: favour code readability over brevity of notation; leave out complex features that can be left out, such as those that exist for ‘programming in the large’. Do not make tradeoffs that sacrifice ease of use for efficiency, but still try to be efficient; take the best features from other languages while avoiding things that suck.

Short summary of features to steal from other languages

And so it goes: most programming languages (especially scripting languages) are created by adapting or stealing features from earlier programming languages.

These are a few of my favorite things:

Bash, Scheme: script debugging with set -x modes; for a small program it is easier to debug by viewing a trace of the program's execution than by working with a debugger.
Perl: strict mode; the compiler checks that a variable or function is actually defined before it is used. I think some level of such checking is beneficial: misspelled identifiers should not cause runtime errors, because these errors are easier to catch during compilation.
Function calls with named parameters (en.wikipedia.org/wiki/Named_parameter): various languages offer named parameters as an additional way of calling a function; I think this should be the only way to call a function. Code that calls functions with named parameters is much easier to read; alas, the notation for function calls is less compact.
Python: the for loop construct / generator functions used in for loops (yield statement)
Javascript: the object system built by creating hash collections (prototypes).
Lua: I like the basic syntax of the language, no major annoyances there; what I don’t like in Lua is that variables in a function default to global scope.
Ruby: qualified identifiers (I don’t like the choice of prefix characters, but there should be some way of saying that an identifier refers to a class member or to a global variable); by default a variable in a function should be of local scope. What is done most frequently should have the shortest form; other options should be more explicit.
Perl/Javascript/Python/Ruby/Lua: Syntax to declare a collection object and populate it with data
Here documents: they should be similar to how the Perl module Text::Template defines template text. It should be possible to treat a string as a here document (templates).
C++11: the new raw string literal (R"(aaaa)"), which manages to work without any escape characters at all!
Perl: the way that perlfunc (“Perl Functions by Category”) documents standard library functionality, much like Roget’s Thesaurus.
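The wish to treat a string as a here document/template can be approximated in Python with a triple-quoted string and `string.Template` from the standard library. This is only a stand-in, not how Pooh will do it: Text::Template evaluates Perl code inside braces, while `string.Template` merely substitutes `$name` placeholders. The letter text is made up for the example.

```python
from string import Template
from textwrap import dedent

# The triple-quoted string acts like a here document; dedent strips the indentation.
letter = Template(dedent("""\
    Dear $name,
    you have $count new messages.
"""))

# substitute() fills in the placeholders; non-string values are converted with str().
text = letter.substitute(name="Pooh", count=3)
print(text)
```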

Things that I particularly dislike

Being forced to teach object orientation right from the start, just because everything is an object. I think that teaching object orientation right from the start is confusing (and it is not just me here).
Ambiguous rules: for example call-by-sharing (which is used in most scripting languages) is ambiguous; so are operators that do different things depending on the type of their operands (for example assignment of values versus object references), and overloaded functions. All of these can be powerful but confusing tools.
Overloading + to mean both string concatenation and addition was a really bad thing to do in Javascript.
C-style syntax: the forced use of a semicolon to finish a statement. In particular I don’t like the for loop syntax in C (too much functionality in one line: assignment of the index, checking of the loop condition and modification of the loop index). One could do better with foreach plus generator functions. Other things to cut are the auto-increment (++) and shortcut assignment (+=, -= etc.) operators and the switch statement; all of them can be left out.
Python: indentation has special meaning; I just can’t get used to this.
Arrays: for a beginner it is easier to have arrays indexed from one onward (rather than from zero), with no feature to override this way of indexing. It is easier to reason about the first element of a sequence, the second, and so on, much as one speaks of the first case of a proof by induction rather than the zeroth. Basic and Lua index arrays from one onward (alas, one can override this); Python, Perl, Javascript and C start at 0, while Pascal and Algol let the programmer declare the index bounds.
Javascript/Perl/Lua: the var/my/local keywords; assigning to an undeclared variable in a function creates a new global unless you have declared it as local (var in Javascript, my in Perl, local in Lua). I think it is very counterintuitive to have global scope by default.
Perl, Basic: a variable prefix (Perl) or suffix (Basic) that indicates the type of the variable; the type of a variable should become evident from the way it is used.
Perl: References and explicit dereferencing of references
Modules and namespaces: they are necessary for building libraries, but could one possibly do without them, at least for an introduction to programming?
Exceptions in scripting languages: I think this tool is too heavy; also, most scripts (and most programs) just bail out on error and do not need complex error recovery sequences.
Operator overloading in scripting languages: it keeps you guessing about the meaning of an operator and does not increase readability; Javascript does fine without this feature.
I don’t like it when there are many, many ways to write a string literal; but here documents are great.
I don’t like escape sequences in string literals
Identifier names – in many languages identifier names consist of letters, digits and underscores. I think that this is too limiting.
The level of detail in error messages is most often awful.
All of them: hash tables with complex keys, i.e. keys that are not strings. Ideally a hash should be able to use any object as a key. Python solves this problem by introducing tuples (read-only sequences can be hash keys), which introduces yet another entity. Perl does not bother with this problem: if you use an array as a key, it is evaluated in scalar context, so the length of the array becomes the key; not very intuitive either. I think one could do better than that.
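The Python half of the last point is easy to check: tuples are accepted as dictionary keys because they are immutable and hashable, lists are rejected, so a list has to be converted into a tuple first. The helper `lookup_with_list` and the place names are made up for the illustration.

```python
# Tuples are immutable and hashable, so they can act as composite keys.
distances = {("Hundred Acre Wood", "Pooh Corner"): 3}
route = ["Hundred Acre Wood", "Pooh Corner"]


def lookup_with_list(table, key_list):
    """Try to use a list directly as a key and report what Python does."""
    try:
        return table[key_list]
    except TypeError as err:
        return f"rejected: {err}"


print(lookup_with_list(distances, route))  # rejected: unhashable type: 'list'
print(distances[tuple(route)])             # 3, after converting the list to a tuple
```

This is exactly the "yet another entity" cost: the beginner has to learn why the tuple works where the list does not.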

Things that I don’t know how to eat

Regular expressions: it seems clear that a scripting language must have some form of regular expressions. All was well and clear until Perl 6 rules and the Perl 5 Parse::RecDescent module came along; now this feature seems to be turning into something like SNOBOL patterns.
Optional typing: only worth bothering with if there is a JIT or compilation to native code.
Perl ties and similar hooks: most scripting languages have some way to override certain aspects of built-in behaviour.
Perl has ties;
Lua has metatables;
Python and Ruby have per-class overrides; in Python you define special double-underscore methods such as `__getitem__`;
Currently Javascript does not have any of these features, and seems to do fine without them, for what it does.
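For comparison with Perl ties, a Python class can intercept element access by overriding `__getitem__` and `__setitem__`. The sketch below uses a made-up class name, `LoggingDict`, and labels the log entries FETCH and STORE after the method names of Perl's tie interface. It also shows why such hooks keep you guessing: `dict.get` does not go through the overridden `__getitem__`, so that access goes unlogged.

```python
class LoggingDict(dict):
    """A dict subclass that logs element access, roughly like a tied Perl hash."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.log = []

    def __getitem__(self, key):
        self.log.append(f"FETCH {key!r}")
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.log.append(f"STORE {key!r}")
        super().__setitem__(key, value)


pots = LoggingDict()
pots["honey"] = 4      # goes through __setitem__
full = pots["honey"]   # goes through __getitem__
pots.get("honey")      # dict.get bypasses __getitem__: nothing is logged
print(pots.log)        # ["STORE 'honey'", "FETCH 'honey'"]
```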

Plain problems

Shells, Perl: separate comparison operators for numeric and string values. Perl inherits its separate string comparison operators from the UNIX shells. Lua and Javascript do not have them: when a string is compared with a number, Javascript converts the string to a number, while Lua raises an error when asked to order a string against a number. Unfortunately this makes it much more difficult to infer the type of variables; also, numeric and string comparison do not give the same results!
The same problem applies to arithmetic operators: what should happen when adding a string to a number?
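A small Python illustration of why numeric and string comparison are not interchangeable: the same digits sort differently as numbers and as strings. Python itself takes a third route and refuses to order, or add, a string and a number at all. The helper `try_mixed` exists only for this illustration.

```python
numbers = [10, 9, 100]
texts = ["10", "9", "100"]

print(sorted(numbers))  # [9, 10, 100]       numeric order
print(sorted(texts))    # ['10', '100', '9']  character-by-character order


def try_mixed(operation):
    """Run an operation on mixed operands and report a TypeError instead of crashing."""
    try:
        return operation()
    except TypeError:
        return "TypeError"


print(try_mixed(lambda: "10" < 9))  # TypeError
print(try_mixed(lambda: "1" + 2))   # TypeError
```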

General Terms / General requirements.

This section spells out the general requirements for the Pooh programming language in more detail.

(GR1) The language must serve an educational purpose, i.e. be suited to introducing kids to programming. It is a procedural language, which means that it has assignments.

(GR2) The syntax must not force the user to perform repetitive tasks over and over again. For example, it must not require the user to put a semicolon delimiter between statements (that’s evil), nor require the user to declare variables or the types of variables.

I think that C and languages with a C-like syntax are too terse and cryptic for beginners. On the other hand, Pooh should be a bit less verbose than Pascal.

(GR3) It must be possible to read the code, meaning that it is essential to be able to understand what a line of code does: no ‘behind the scenes’ actions like nested construction/destruction or operator overloading, no macros (ever), and no looping construct that assigns an initial value to the index, checks the exit condition on the index and passes to the next iteration all in one line (like for in C).

(GR4) There is some support for object orientation: objects via prototypes (like Javascript and Self) are supported, and one can also build objects by means of closures.
Please see oo feature
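Both styles from GR4 can be sketched in Python without classes. The helpers `make_object`, `lookup`, `send` and `make_counter`, and the `__proto__` slot name borrowed from Javascript, are hypothetical illustrations, not a proposed Pooh API: an object is a dict whose missing slots are looked up along a prototype chain, and a closure-based object keeps private state in a captured variable.

```python
def make_object(proto=None, **slots):
    """Create an object as a plain dict; '__proto__' links to its prototype."""
    obj = dict(slots)
    obj["__proto__"] = proto
    return obj


def lookup(obj, name):
    """Find a slot by walking up the prototype chain, as Javascript does."""
    while obj is not None:
        if name in obj:
            return obj[name]
        obj = obj["__proto__"]
    raise KeyError(name)


def send(obj, name, *args):
    """Call a method slot, passing the receiving object explicitly."""
    return lookup(obj, name)(obj, *args)


animal = make_object(speak=lambda self: lookup(self, "name") + " makes a sound")
bear = make_object(animal, name="Pooh")                                  # inherits speak
tigger = make_object(animal, name="Tigger", speak=lambda self: "Bounce!")  # overrides it


def make_counter():
    """An object built from a closure: 'count' is private state."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return {"increment": increment, "current": lambda: count}


counter = make_counter()
counter["increment"]()
counter["increment"]()
print(send(bear, "speak"), send(tigger, "speak"), counter["current"]())
```

No class declaration appears anywhere: sharing behaviour is just a link between hashes, which fits the plan to cut classes later on.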

(GR5) Be fun to use. It is very difficult to quantify this or to give a definition for this concept, but we will try:
1. Encourage data-driven programming: it must be possible to declare complex lists and hashes in code, including nested hashes and lists.
2. Encourage bottom-up software construction and exploratory programming (that means we need a REPL).
3. The quickest way of debugging should be a built-in tracing facility. Here we learn from other languages: the Korn and Bash shells have set -x, and Chez Scheme also has a very strong built-in trace facility.
4. If there is a syntax error, it should be clear what it means.
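What a built-in tracing facility could print can be sketched in Python with the standard `sys.settrace` hook. The helper names `make_tracer` and `traced` and the log format are assumptions for illustration; a Pooh interpreter would produce its trace natively rather than through a hook like this.

```python
import sys


def make_tracer(log):
    """Build a trace function that appends one entry per call, line and return."""
    def tracer(frame, event, arg):
        name = frame.f_code.co_name
        if event == "call":
            log.append(f"call {name}")
        elif event == "line":
            log.append(f"  {name} line {frame.f_lineno}")
        elif event == "return":
            log.append(f"return {name} -> {arg!r}")
        return tracer  # keep receiving events for this frame
    return tracer


def traced(func, *args):
    """Run func(*args) with tracing switched on, in the spirit of `set -x`."""
    log = []
    sys.settrace(make_tracer(log))
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, log


def sum_of_squares(n):
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total


result, log = traced(sum_of_squares, 3)
print("\n".join(log))
```

For a small program, reading this log top to bottom is often faster than setting breakpoints in a debugger.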

(GR6) It must have a wide field of application and be able to serve as a scripting language (though not necessarily the most efficient one for that purpose). One should get the impression of learning something tangible.

(GR7) Error messages must be very detailed and helpful.

Defining the Pooh language in some more detail

(R1) The language must not have strong typing, as this would produce too many compiler errors; the language should

(R2) Functions are ‘first class’, meaning that we have anonymous functions, functions can be returned as return values, etc.

(R3) A value may have one of the following types
scalar, with numeric value type
scalar, with string value type
scalar, with lambda value (reference to function)
dynamic array
hash

(R4) Variable binding is created either by assignment (by assigning a value to a variable) or by parameter declaration;

(R3) The following situations should result in a compilation error: use of a variable or parameter that has not been defined (variables are declared by assigning a value to a name); a call of a function that has not been declared; passing a parameter that is not used by a function. These checks will be similar to strict mode in Perl, but they are mandatory and cannot be turned off.

(See more in type checking feature.)

(R4) One does not have to define the type of a variable; it becomes obvious from how the variable is used. If a variable is used as an array (a[ 0 ] = 0), then a is an array, and the same variable name cannot later be used as if it were a scalar. Problem: it is not always possible to tell the type of a binding. If you insert something into an array or hash, then later access to this entry is untyped and must be checked at run time.

General principle: if you can infer the type of something during parsing/compilation, then do so; if you can’t, defer the type check to run time (‘somewhat’ dynamically typed).
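The compile-time side of R3 and R4 can be sketched as a single pass over already-parsed statements. Everything here is hypothetical: the mini representation (`Use`, `Assign`), the kind names and the error wording are made up, and only straight-line code is handled (no branches or loops). The pass records the first kind a variable is used as, reports later disagreements, and flags reads of names that were never assigned.

```python
from dataclasses import dataclass, field


@dataclass
class Use:
    """A read of variable `name`, used as 'scalar', 'array' or 'hash'."""
    name: str
    kind: str


@dataclass
class Assign:
    """`target` receives a value of kind `kind`; the right-hand side reads `reads`."""
    target: str
    kind: str
    reads: list = field(default_factory=list)


def unify(kinds, name, kind, lineno, errors):
    """Record the first kind seen for a name; report any later disagreement."""
    known = kinds[name]
    if known is None:
        kinds[name] = kind
    elif known != kind:
        errors.append(f"line {lineno}: '{name}' used as {kind}, earlier as {known}")


def check(statements, params=()):
    """Infer variable kinds from usage and collect errors, before running anything."""
    kinds = {p: None for p in params}  # parameters are bound but not yet typed
    errors = []
    for lineno, stmt in enumerate(statements, start=1):
        for use in stmt.reads:
            if use.name in kinds:
                unify(kinds, use.name, use.kind, lineno, errors)
            else:
                errors.append(f"line {lineno}: '{use.name}' used before assignment")
        if stmt.target in kinds:
            unify(kinds, stmt.target, stmt.kind, lineno, errors)
        else:
            kinds[stmt.target] = stmt.kind  # assignment creates the binding
    return kinds, errors


program = [
    Assign("a", "array"),                         # a[1] = 0
    Assign("n", "scalar", [Use("a", "array")]),   # n = a[1] + 1
    Assign("s", "scalar", [Use("a", "scalar")]),  # s = a + 1   (a is an array)
    Assign("t", "scalar", [Use("m", "scalar")]),  # t = m * 2   (m never assigned)
]
kinds, errors = check(program)
for message in errors:
    print(message)
```

Entries fetched out of arrays or hashes would get an "unknown" kind in a fuller version, which is where the deferred run-time check takes over.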

(R5) Semantics of values;

Please see values feature for a detailed discussion of the topic.

(R6) Functions have prototypes; a function has exactly one prototype (no overloading). The prototype defines the names of the parameters; the type of each parameter is inferred from how the parameter is used.

Please see function feature

(R7) Parameter passing / Evaluation strategy.

Please see function feature
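The details of R7 are left to a separate article. As background for why call-by-sharing is listed among the ambiguous rules earlier, here is how it behaves in Python: a function can mutate the caller's list, yet assigning to the parameter only rebinds a local name. The function names are made up for the example.

```python
def add_honey(pots):
    pots.append("honey")   # mutates the shared list: the caller sees the change


def replace_pots(pots):
    pots = ["empty"]       # rebinds the local name only: the caller sees nothing
    return pots


cupboard = ["jam"]
add_honey(cupboard)
replaced = replace_pots(cupboard)
print(cupboard)   # ['jam', 'honey']
print(replaced)   # ['empty']
```

Both functions "assign to the argument" in a beginner's eyes, yet only one of them affects the caller; this is the kind of rule a Pooh evaluation strategy would need to make explicit.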

(R11) Check the function prototype on each call: if a function is called directly, check the number of parameters and whether their types match (for example, that a scalar is not passed instead of an array). If this is difficult to establish (for example, the function reference is stored in a hash), defer the check to run time. The type of a parameter is determined by how it is used (a very simplistic form of type inference).
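The deferred, run-time half of R11 can be sketched in Python with keyword-only parameters (the closest Python match to calling only with named parameters) and `inspect.signature(...).bind(...)`, which raises `TypeError` when the arguments do not fit the prototype. The names `draw_box`, `checked_call` and `handlers` are hypothetical, and this sketch checks only parameter names and count, not the scalar/array types R11 also asks for.

```python
import inspect


def draw_box(*, width, height, fill):
    """Keyword-only parameters: every call must name its arguments."""
    return f"{width}x{height} box of {fill}"


def checked_call(func, **named):
    """Check a call against func's prototype at run time, before calling it."""
    try:
        inspect.signature(func).bind(**named)
    except TypeError as err:
        raise TypeError(f"bad call to {func.__name__}: {err}") from None
    return func(**named)


handlers = {"box": draw_box}  # a function reference stored in a hash

print(checked_call(handlers["box"], width=2, height=3, fill="honey"))  # 2x3 box of honey
```

A missing parameter and an extra, unused one (the R3 case) are both rejected before the function body runs.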

(R6) Operators should strictly have one meaning; I don’t like operators whose meaning depends on the types of their operands.

There are numeric comparison operators: <> , == , > , < , >= , <= ;
Numeric operators: + , - , * , / , %
For these operators the operands must be of numeric type; an operand may not be a hash, dynamic array or function.

There are string comparison operators: ne , eq , gt , lt , le , ge – here both operands must be string values.

String concatenation operator: ..
For the concatenation operator the operands may be of any type; a numeric value is converted to a string implicitly. The values of an array are printed with spaces between them; a hash table is formatted in a simple manner.
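A minimal Python sketch of "one operator, one meaning", assuming the rules above: `num_add` stands in for Pooh's `+` and rejects anything that is not a number, while `concat` stands in for `..` and converts every operand to text first. The function names and the `key=value` layout for hashes are assumptions, since the exact "simple manner" of formatting hashes is not specified yet.

```python
from numbers import Number


def num_add(a, b):
    """A '+' with exactly one meaning: both operands must be numbers."""
    for operand in (a, b):
        if isinstance(operand, bool) or not isinstance(operand, Number):
            raise TypeError(f"'+' expects numbers, got {type(operand).__name__}")
    return a + b


def to_text(value):
    """Convert any value to text: arrays space-separated, hashes as key=value pairs."""
    if isinstance(value, list):
        return " ".join(to_text(v) for v in value)
    if isinstance(value, dict):
        return " ".join(f"{to_text(k)}={to_text(v)}" for k, v in value.items())
    return str(value)


def concat(a, b):
    """A '..' with exactly one meaning: always string concatenation."""
    return to_text(a) + to_text(b)


print(num_add(1, 2.5))              # 3.5
print(concat("pots: ", [1, 2, 3]))  # pots: 1 2 3
print(concat({"honey": 4}, "!"))    # honey=4!
```

Because each operator has a single meaning, seeing `+` or `..` in a line of code also tells the type checker what kind of operands to expect.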

(R14) Reflection will be provided as a library, so there is no need for special syntactic constructs here.

(R12) Co-routines: there will be generator functions with yield; *a must-have for nice iteration*.
More on threads
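How a generator with yield replaces the C-style for loop can be shown in Python. The generator `count_from` is a made-up example (it assumes a positive step) that counts from 1 inclusive, in line with the preference for arrays indexed from one.

```python
def count_from(start, stop, step=1):
    """Generator: yields start, start+step, ... while the value is <= stop (step > 0)."""
    value = start
    while value <= stop:
        yield value        # execution pauses here until the loop asks for more
        value += step


for n in count_from(1, 10, step=3):
    print(n)               # 1, 4, 7, 10 on separate lines

gen = count_from(1, 3)
print(next(gen), next(gen))  # 1 2: each next() resumes the generator
```

The loop header now states only what is iterated over; initialising, testing and advancing the index live in the generator, one step per line.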

(R13) Things to cut: exceptions, namespaces, classes (cutting classes also cuts generic types, so doing without classes really makes things much more feasible); no macros and no metaprogramming at all; no blocks as function closures (this feature might make code more concise, but it is very confusing: is it a statement? Is it a function?); no syntax that does many things at once, like the for statement in C.

(R15) One also needs array slices in order to return multiple values from functions.
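R15 is terse; assuming it refers to Perl-style list assignment, where a function returns a list and the caller unpacks it into several variables, the nearest Python analogue is tuple unpacking, and a slice can be unpacked the same way. The function `min_max` is a made-up example; note that Python slices count from 0, unlike what is planned for Pooh.

```python
def min_max(values):
    """Return two results at once; Python packs them into a tuple."""
    return min(values), max(values)


low, high = min_max([3, 9, 1, 7])  # unpack the returned pair into two variables
print(low, high)                    # 1 9

# A slice picks a contiguous part of a sequence and can be unpacked the same way.
data = [10, 20, 30, 40]
second, third = data[1:3]           # Python counts from 0, so this is 20 and 30
print(second, third)                # 20 30
```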

(R16) String literals;
Please see string literals

(R17) Keys to hash tables
Please see “complex hash keys” (../07/feature-complex-hash-keys.html).
