layout: post
title: Next project, The Pooh programming language
-
8 November 2011
I want a new tinkering project, so I am going to create a scripting language. The project will be presented as a series of articles; this first article defines the language in some detail.
First, the name: it will be the ‘Pooh programming language’. I think that if there is a Python in honor of Monty, then there should be a Pooh in honor of Pooh, so to speak. This is a long-missing detail, so I will strive to correct it ;-)
The ‘Pooh programming language’ will be an educational programming language; its purpose is to introduce kids to programming.
For me, the practice of programming mainly consists of building bridges (translations) between different concepts. Very often information has to be extracted from one source (be it some textual or binary language, or a GUI toolkit that receives input from a real person); just as often some existing logical formalism is used to access structured storage (SQL comes to mind).
What I like about scripting languages is that they make it easy to create bridges between different concepts: bridges that map the concepts directly onto features of the scripting language, onto combinations of easy-to-grasp structures like hashes and dynamic arrays, which can in turn be combined into very elaborate structures.
An example of such a bridge between concepts is an XML parsing tool: the Perl module XML::Simple (the idea has been adapted into other scripting languages, like Ruby XML::Simple or Groovy XmlSlurper). XML is a structured way of expressing information; data in this format is magically translated into nested hashes that map tag names to the XML tree contained by that tag, and XML attributes are mapped to name/value pairs within the hash that corresponds to a tag.
This way of looking at XML is a simplification of sorts, but it is much easier to work with the data this way than with the standard DOM API.
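To make the mapping concrete, here is a minimal sketch in Python (not Pooh, whose syntax is not defined yet) of the same idea. It is not the XML::Simple implementation: the function name `element_to_dict` and the `content` key for text are choices made for this illustration, and the sketch assumes that no attribute shares its name with a child tag.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Map one XML element to a dict.

    Attributes become name/value pairs; every child tag becomes a key
    that holds a list of nested dicts (one per child with that tag).
    """
    node = dict(element.attrib)
    for child in element:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (element.text or "").strip()
    if text:
        # "content" is a key name picked for this sketch.
        node["content"] = text
    return node


document = """
<library name="home">
  <book id="1"><title>Winnie-the-Pooh</title></book>
  <book id="2"><title>The House at Pooh Corner</title></book>
</library>
"""

tree = element_to_dict(ET.fromstring(document.strip()))
print(tree["name"])
print(tree["book"][1]["title"][0]["content"])
```

Once the data is in this shape, getting at the second book's title is a plain chain of hash and array lookups, with no DOM traversal calls involved.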
In the eighties computers were much simpler. The programming languages available on a home computer would be various dialects of Basic, Logo, and assembly for serious developers. These languages were easier to master, partly by intent of their design and partly because of the limitations of the computer platform; for example, such a small computer could not have run the whole Smalltalk environment or acted as a Lisp machine.
Limited computers could run only simple languages, so paradoxically the barrier to entry (into programming) was lower with simple machines.
The limitations of the machine also meant that one could do meaningful projects (by the standard of the day) with fewer means. When I was in high school, I was given a home computer as a present. It did not have any games on it, and it was difficult for me to get them, so I learned how to write my own computer games, and that went quite well.
What happened in later years? With the advent of Windows, schools switched to teaching basic computer literacy skills like using Word and Excel.
Not every household had a PC, so teaching the skill of using a word processor was regarded as significant preparation for adult life. Programming might also have lost its importance, as some societies thought that they were entering a post-industrial age where financial services were more important than productive activity, but that is a digression of sorts.
Programming languages also got more complicated. Serious languages like C++ are notoriously hard to master; Java is still too complicated and verbose for teaching basic concepts. Scripting languages like Perl, Python and Ruby have lots of features, far too many; even Visual Basic is hard to compare with the simple Basic dialects.
Things lacking in old Basic dialects that were added in more powerful scripting languages are variable scope, hashes, dynamic arrays, references, objects, regular expressions and closures. The designers of Basic probably thought that it was essential to cut all these features in order to have a language that is easily taught and simple to acquire. I think it is still a challenge to create a language that combines these goodies but still remains simple.
There are still some relatively simple languages left: Squeak, JavaScript, Logo. A problem with them is that the only practical use of these languages is from within their own environment. Logo is good for moving ‘turtles’ and playing with words; try to use it for something else…
So there seems to be a place for some simple scripting language that:
makes it easy to introduce programming;
is versatile enough to do real / interesting projects on real systems.
For scripting languages it is very important to be versatile: to be usable in a multitude of different situations and from varying environments. As observed here:
… Smalltalk’s weakness is “at the boundaries:” when you want to try to do some typical unix system maintenance, or interfacing with underlying C libraries, or something similar. As long as you’re staying within the Smalltalk environment, it completely rocks. But it’s definitely painful if you try to reach outside. And it’s especially painful if you want your code to work with different Smalltalks. What Perl got right was making it completely painless to integrate with its environment – In some sense LISP wants to be on a LISP Machine and Smalltalk wants to be in its virtual machine, whereas Perl wants to go out and play with the other kids. The former languages are introverted and Perl is extroverted.
General purpose scripting languages have to cram in a lot of features. Often design decisions are made with the aim of simplifying the runtime interpreter / execution environment; these tradeoffs tend to be counterintuitive and hard to explain. Examples of such tradeoffs made by some languages: variables in functions are global by default unless declared as local; variables can be used before having been assigned a value.
With an educational programming language one should strive to avoid such tradeoffs.
The Pooh language should come with some basic interactive environment, like a REPL, where one can edit the program and try out its parts as they are written. The environment must be friendly to bottom-up software construction; one should be able to start with low-level functionality, try it out, and then work upward and use the simple parts to construct more complicated things.
It must be possible to integrate the environment into ‘software laboratories’ suited for a particular purpose. Each ‘software laboratory’ should integrate into a different environment. One such laboratory can be a tool that implements a two-dimensional plane with moving sprite objects; the Pooh language would then be used to script the movement of the sprites and to react to collisions or to additional input from the user. A different laboratory can be aimed at constructing three-dimensional scenes out of simple geometric figures; yet another ‘software laboratory’ can deal with grammars and parsing text by various means.
The idea is that the programming language becomes embedded into multiple possible spaces, so the program manipulates concepts from the concept space it is embedded in. I think that an educational programming language also requires that the development environment for the language can be plugged into different applications.
The design principles for the Pooh programming language: favour code readability over brevity of notation; leave out complex features that can be left out, such as those that exist for ‘programming in the large’; do not make tradeoffs that sacrifice ease of use for efficiency, but still try to be efficient; take the best features from other languages while avoiding things that suck.
And so it goes that most programming languages (especially scripting languages) are created by adapting / stealing features from previous programming languages.
This section spells out the general requirements for the Pooh programming language in more detail.
(GR1) The language must serve an educational purpose and be suited to introducing kids to programming; it is a procedural language, which means that it has assignments.
(GR2) The syntax may not force the user to perform the same repetitive task over and over again. For example, it may not require the user to put a semicolon delimiter between statements (that’s an evil), and it should not require the user to declare variables or the types of variables.
I think that C and languages with a syntax similar to C are too terse and cryptic for beginners. On the other hand it should be a bit less verbose than Pascal.
(GR3) It must be possible to read the code, meaning that it is essential to be able to understand what a line of code does: no ‘behind the scenes’ actions like nested construction/destruction, no operator overloading, no macros (never), and no advanced looping construct that assigns an initial value to an index, checks the exit condition on the index and passes to the next iteration, all in one line (like for in C).
(GR4) There is some support for object orientation: object orientation via prototypes (like JavaScript / Self) is supported, and one can have objects by means of closures.
Please see oo feature
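As an illustration of "objects by means of closures", here is a small sketch in Python; it is not Pooh syntax, and `make_counter` and its method names are invented for this example. The state lives in a local variable of the enclosing function, and the "object" is just a hash of functions that share that variable.

```python
def make_counter(start):
    """Return an 'object' as a hash of functions closing over shared state."""
    count = start  # private state, reachable only through the closures

    def increment():
        nonlocal count
        count += 1
        return count

    def value():
        return count

    return {"increment": increment, "value": value}


honey_pots = make_counter(3)
honey_pots["increment"]()
honey_pots["increment"]()
print(honey_pots["value"]())

# Each call to make_counter creates independent state.
other = make_counter(0)
print(other["value"]())
```

No class declaration is needed: the constructor is an ordinary function, and two counters never see each other's state.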
(GR5) Be fun to use. It is very difficult to quantify this or to give a definition for this concept, but we will try:
1. Encourage data driven programming / it must be possible to declare complex lists / hashes in code / Nesting of hashes and lists;
2. Encourage bottom up software construction / exploratory programming; (That means we need a REPL)
3. The quickest way of debugging should be by means of a built-in tracing facility. Here we learn from other languages: in the Korn/Bash shells we have set -x; Chez Scheme also has a very strong built-in trace facility.
4. If there is a syntax error, it should be clear what it means.
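The tracing idea from point 3 can be sketched in Python as follows. This is only an approximation of what a built-in facility would do: Pooh would presumably trace every call without any annotation, whereas here a hypothetical `traced` decorator has to be applied by hand, and `trace_log` is a name chosen for this example.

```python
import functools

trace_log = []  # collected trace lines, in the order they happen


def traced(func):
    """Record every call of func and its return value in trace_log."""
    @functools.wraps(func)
    def wrapper(*args):
        trace_log.append(f"call {func.__name__}{args}")
        result = func(*args)
        trace_log.append(f"return {func.__name__} -> {result!r}")
        return result
    return wrapper


@traced
def add(a, b):
    return a + b


@traced
def double_sum(a, b):
    return add(a, b) * 2


double_sum(2, 3)
print("\n".join(trace_log))
```

The log shows the nested call of `add` inside `double_sum` together with both return values, which is often enough to find a bug without a debugger.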
(GR6) Must have a wide field of application; must be able to serve as a scripting language (it need not be the most efficient one for that purpose). One should get the impression of learning something tangible.
(GR7) Error messages must be very detailed and helpful.
(R1) Language must not have strong typing, this would introduce too many compiler errors; the language should
(R2) Functions are ‘first class’, meaning that we have anonymous functions, functions can be returned as return values, etc.
(R3) A value may have one of the following types:
scalar, with numeric value type
scalar, with string value type
scalar, with lambda value (reference to function)
dynamic array
hash
(R4) Variable binding is created either by assignment (by assigning a value to a variable) or by parameter declaration;
(R3) The following situations should result in a compilation error: use of a variable or parameter that has not been defined (variables are declared by assigning a value to a name); a call of a function that has not been declared; passing a parameter that is not used by a function. These checks will be similar to strict mode in Perl; they are mandatory and cannot be turned off.
(see more in type checking feature)
(R4) One does not have to define the type of a variable; it becomes obvious from how the variable is later used. If a variable is used as an array (a[ 0 ] = 0), then a is an array; the same variable name cannot later be used as if it were a scalar. Problem: it is not always possible to tell the type of a binding. If you insert something into an array / hash, then later access to this entry is untyped and must be checked at run time.
General principle: If you can infer the type of something during parsing/compilation then do so, if you can’t then defer type check to run time (‘somewhat’ dynamically typed)
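A very rough sketch of the "type from usage" rule, written in Python. It assumes that a parser has already reduced the program to a list of (variable name, kind of use) pairs; that representation, the function name `check_usages` and the wording of the message are all invented for this example and say nothing about how the Pooh compiler will actually work.

```python
def check_usages(usages):
    """Infer each variable's kind from its first use and report conflicts.

    usages: list of (variable_name, kind) pairs in source order,
            where kind is "scalar", "array" or "hash".
    Returns a list of error messages (empty if everything is consistent).
    """
    inferred = {}  # variable name -> kind inferred from its first use
    errors = []
    for position, (name, kind) in enumerate(usages, start=1):
        first_kind = inferred.setdefault(name, kind)
        if first_kind != kind:
            errors.append(
                f"use {position}: '{name}' was inferred as {first_kind}, "
                f"cannot be used as {kind}"
            )
    return errors


# a[ 0 ] = 0 makes a an array; using a as a scalar later is an error.
usages = [("a", "array"), ("total", "scalar"), ("a", "scalar")]
errors = check_usages(usages)
print(errors)
```

Values read back out of an array or hash would not appear in such a list with a known kind, which is exactly the case where the check has to be deferred to run time.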
(R5) Semantics of values;
Please see values feature for a detailed discussion of the topic.
(R6) Functions have prototypes; one function can have one prototype (no overloading). The prototype defines the names of the parameters; the type of each parameter is derived/inferred from how the parameter is used.
Please see function feature
(R7) Parameter passing / Evaluation strategy.
Please see function feature
(R11) Check function prototype on call: If function is called directly then check number of parameters and if their types match (for example check if scalar is not passed instead of an array); If type is difficult to establish (function reference stored in hash) then defer check to run time; the type of a parameter is determined by how it is used (very simplistic form of type inference)
(R6) Operators should strictly have one meaning; I don’t like operators where the meaning depends on the operand types.
(R14) Reflection will be done as a library, so there is no need for special syntactic constructs here.
(R12) Co-routines / will have generator functions with yield; *must have for nice iteration*;
more on threads
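For readers who have not met generator functions, this is what the feature looks like in Python, which serves here only as a stand-in for the future Pooh syntax; the `countdown` function is an invented example.

```python
def countdown(start):
    """Generator: hands out start, start-1, ..., 1, one value per request."""
    current = start
    while current > 0:
        yield current  # suspend here; resume when the next value is asked for
        current -= 1


# The for loop drives the generator; no index variable is needed.
for number in countdown(3):
    print(number)

remaining = list(countdown(3))
```

The function keeps its local state between values, so the iteration logic stays inside the function and the calling loop stays trivial.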
(R13) Things to cut: exceptions, namespaces, classes (this also cuts generic types, so the innovation of doing without classes really makes things much more feasible); no macros / no meta programming at all; no blocks as function closures (this feature might make for more concise code, but it is very confusing: is it a statement? is it a function?); no syntax that does many things at once, like for statements in C.
(R15) Also one needs array slices in order to return multiple values from functions.
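A sketch of what this could look like, again in Python rather than Pooh; the function `min_max_mean` is invented for the example. The function returns one array, and the caller takes the values it needs out of it with a slice.

```python
def min_max_mean(values):
    """Return three results at once, packed into a single array."""
    return [min(values), max(values), sum(values) / len(values)]


result = min_max_mean([4, 8, 6])

# A slice picks out several results in one step.
lowest, highest = result[0:2]
mean = result[2]
print(lowest, highest, mean)
```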
(R16) String literals;
Please see string literals
(R17) Keys to hash tables
Please see “complex hash keys” (../07/feature-complex-hash-keys.html)