The Tersest Operator

Just about every programming language has operators, right?  An operator is a piece of syntax (or a function that looks like a piece of syntax) that operates on one or more terms.  Those terms can themselves be composed out of terms and operators, of course.  The most familiar operators are the arithmetical ones (+, -, *, etc.), but programming languages require other operators for a whole host of other things like function definitions, message passing, declarations, etc.

So what is the tersest operator?  It is the operator that is the easiest to type.  The easiest operator to type is the one you don’t have to type: the operator that isn’t there.  When you stick two terms next to one another with no operator in between, that is called concatenation.  Some languages are called Concatenative Languages because they believe their use of concatenation is the most fundamental use.

In a well-designed language, the things you type most often tend to be the shortest ones.  So the invisible concatenation operator tends to be the thing you want to do the most often, or at least something that happens commonly.  Hence:

  • In Haskell and family, concatenation is used to apply a function.
  • In C and family, it’s used to declare a name with a type (though it’s parsed weirdly).
  • In Smalltalk and family, putting a word after something sends a message to it (calls a method on it).
  • In algebra, putting two letters next to one another means multiplication.
  • In regular expressions, putting two atoms together does sequential matching.
  • In Perl, putting two terms in a row is a syntax error.

Ironic, I know.  Actually, Perl isn’t that bad, because it has noun markers.  Leaving the ‘&’ sigil off of a function name makes it a prefix operator rather than a term, and subsequent terms are arguments to the function.  Thus function calls are just as terse as in Haskell (but with the opposite precedence and associativity).

So this is one of the questions I am considering regarding my language (which I shall soon have to name).  I am mostly fluctuating between C’s usage and Haskell’s usage.  If I adopt concatenation for function calls, type annotations will require a ‘:’ like in ML.  If I adopt concatenation for type annotations, function calls will require parens.  Let’s contrive some sample code for both:

number_finder s:Str = {
    .match = find_number s
    .value : Int = match.parse
}

number_finder (Str s) = {
    .match = find_number(s)
    Int .value = match.parse
}

(Requisite background info: this is a prototype-based object system.  Members of an object are made public by putting a ‘.’ before them when you declare them.)

Both uses of concatenation have their advantages and disadvantages, but we must also consider the nature of the language when deciding.  In Haskell pretty much every operation is a function call.  But in this language, many calculations are performed with method calls instead, and an infix ‘.’ is almost as easy to type as a space.  In addition, there will be less currying than in Haskell.  So using concatenation for function calls won’t gain as much as it does in Haskell.  On the flip side, using it for type annotations won’t gain as much as it does in C, because type annotations aren’t as necessary as they are in C (notice we left it off of .match).

The third alternative, I suppose, is using concatenation for method calls, like in Smalltalk and Self.  However, this is a point at which I believe other linguistic concerns also start affecting the picture.  Having a ‘.’ between method calls creates a feeling of high-precedence cohesion that a space would break.  In addition, it would nullify the sweet syntax of using a prefix ‘.’ in a member declaration to make that member public.  And again, ‘.’ is really easy to type.

What are the non-terseness factors affecting the other choices?  One of the principles of clarity is to make the more important parts appear more prominent.  When you’re declaring a member of an object, the name is usually the most important part, which suggests that value : Int = (...) is more clear than Int value = (...), because it puts the name first and all the names in the object will line up.  But object members aren’t the only names you’re declaring.  When you declare a function parameter, usually the type is more important than the name (at least, from an outsider’s perspective).  This would suggest that find (Str area, Int num) is more clear than find (area:Str, num:Int).  And you’ll probably leave types off of member declarations more often than function parameters.

A language could conceivably use concatenation for both things, provided all the types are predeclared (so the parser knows whether the left side is a type or a function), but I hate predeclarations.
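To make the predeclaration problem concrete, here is a minimal C++ sketch of the decision such a parser would face when it sees two terms side by side.  The names classify, Juxtaposition, and known_types are hypothetical, invented for this illustration; they are not part of any real design.

```cpp
#include <cassert>
#include <set>
#include <string>

// What a juxtaposition `left right` would mean to the hypothetical parser.
enum class Juxtaposition { Declaration, Call };

// Hypothetical helper: known_types stands in for whatever the parser has
// already seen declared as a type by the time it reaches `left right`.
Juxtaposition classify(const std::string& left,
                       const std::set<std::string>& known_types) {
    assert(!left.empty());
    // A predeclared type on the left means "declare a name of that type";
    // anything else is treated as a function applied to the right-hand term.
    return known_types.count(left) ? Juxtaposition::Declaration
                                   : Juxtaposition::Call;
}
```

With Int and Str in known_types, `Int value` classifies as a declaration and `find_number s` as a call.  If a type were declared further down the file, the parser would misread its first use as a call, which is exactly why this scheme forces predeclarations.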

So what do you think?  I’m leaning toward the C usage, because I think it looks a little cleaner; perhaps I’m just more used to it because I learned C++ before Haskell.  Which reminds me that you shouldn’t underestimate historical conventions either.  What should I use the tersest operator for?  Or should I push the waterbed in a different direction like Perl does?

Go-Style Interfaces in C++

So, you’re a C++ programmer, and you’ve heard of Go.  You really like the idea of Go’s automatic interface satisfaction, and you are green with Go-envy.

What you don’t know is you can write those in C++ too.  Yeah, really.  C++’s type system is nutso.  Let’s see how you use them:

#include <stdio.h>

 // Person is an interface type with these methods
 // This function takes one as an argument
void talkabout (Person t) {
    printf("%s says:\n", t.name());
    t.talk();
}

 // No "implements" declaration, no virtual methods
struct Foo {
    int x;
    const char* name () { return "Foo"; }
    void talk () {
        printf("I'm a Foo with a %d!\n", x);
    }
};

 // Watch the magic
int main () {
    Foo x = {24};
    talkabout(&x);

    return 0;
}

So, what output does this produce?

Foo says:
I'm a Foo with a 24!

Nice, huh?  What happens when the type doesn’t satisfy the interface?

struct Nameless {
    void talk () {
        printf("I have no name.");
    }
};

int main () {
    Nameless x;
    talkabout(&x);

    return 0;
}

if3.c++: In instantiation of ‘const Person::implemented_by<Nameless> Person::implemented_by<Nameless>::vtable’:
if3.c++:29:35:   instantiated from ‘Person::Person(T*) [with T = Nameless]’
if3.c++:81:14:   instantiated from here
if3.c++:34:33: error: ‘name’ is not a member of ‘Nameless’

Well, C++’s error messages are always a little weird when you’re doing metaprogramming.  If you’ve done a lot of it, you’ll know that this is a surprisingly sane message, given all the magic that’s happening.  It can’t instantiate Person as implemented by Nameless, because name is not a member of Nameless.

So, what does the interface definition look like?  Yeah, um…

struct Unknown { };

class Person {
     // Vtable
    template <class T>
    struct implemented_by {
        const char* (T::* name ) ();
        void (T::* talk ) ();
        static const implemented_by vtable;
    };
     // Interface struct is two pointers long
    const implemented_by<Unknown>*const vt;
    Unknown*const p;

    public:

     // Methods
    inline const char* name () { return (p->*(vt->name))(); }
    inline void talk () { return (p->*(vt->talk))(); }

     // Conversion
    template <class T>
    Person (T* x) :
        vt(reinterpret_cast<const implemented_by<Unknown>*>
            (&implemented_by<T>::vtable)),
        p(reinterpret_cast<Unknown*>(x))
    { }
};
 // Define vtables for all compatible types
template <class T>
const Person::implemented_by<T> Person::implemented_by<T>::vtable = {
    &T::name,
    &T::talk
};

You knew it couldn’t be that perfect, right?  C++ allows you to do all sorts of crazy things with its type system as long as you’re okay with writing the ugliest code you’ve written in your life.  This can’t easily be reduced to macros either.

So it isn’t pretty, but maybe it could be useful in some specific scenarios that don’t happen to be part of any project you’ll ever work on.
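The definition above calls methods through reinterpret_cast member pointers, which, as far as the C++ standard goes, is not guaranteed to work.  A possible variation is sketched here, assuming C++11 lambdas are available.  It swaps in a different technique: the vtable holds plain function pointers to small per-type thunks, and the object is stored as a void*.  This is an illustration only, not the reference implementation described in this post.

```cpp
#include <cassert>
#include <cstdio>

// Sketch of a two-pointer interface that avoids reinterpret_cast.
class Person {
    // One function pointer per interface method; each takes the erased object.
    struct VTable {
        const char* (*name)(void*);
        void (*talk)(void*);
    };

    // Builds (once per T) a table of captureless lambdas that cast the erased
    // pointer back to T* and call the method by name.  If T lacks name() or
    // talk(), instantiating this fails to compile.
    template <class T>
    static const VTable* vtable_for() {
        static const VTable vt = {
            [](void* p) -> const char* { return static_cast<T*>(p)->name(); },
            [](void* p) { static_cast<T*>(p)->talk(); }
        };
        return &vt;
    }

    const VTable* vt;
    void* p;

public:
    // Conversion from a pointer to any type with the right methods
    template <class T>
    Person(T* x) : vt(vtable_for<T>()), p(x) { assert(x != nullptr); }

    const char* name() { return vt->name(p); }
    void talk() { vt->talk(p); }
};

// Same example type as before: no "implements", no virtual methods
struct Foo {
    int x;
    const char* name() { return "Foo"; }
    void talk() { std::printf("I'm a Foo with a %d!\n", x); }
};

void talkabout(Person t) {
    std::printf("%s says:\n", t.name());
    t.talk();
}
```

Calling talkabout(&x) on a Foo works the same way from the caller's side.  The trade-off is one extra indirect call through the thunk, which the compiler may or may not inline away.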

How far can this idiom be extended?  Actually, you can make it work with overloaded method names as well; even Go doesn’t support that.  My original reference implementation tested overloaded names, but I decided it was too much information to include in this post.  In addition, although I have not tried it, I believe that with even more hideous code you can get it to detect static attributes of a class and consider those when satisfying the interface, because they share enough syntax with methods.  Unfortunately I cannot think of any way to get the interface to use non-static attributes.  As it happens, you can only cheat so far before you’re caught.
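One mechanism that makes overloaded names workable is that C++ picks the overload of &T::talk from the type of the pointer being initialized.  The sketch below shows only that rule; Chatty, talk_slots, and make_slots are hypothetical names made up for the illustration, and this is not the reference implementation mentioned above.

```cpp
#include <cassert>
#include <string>

// Hypothetical type with two overloads of talk()
struct Chatty {
    std::string talk() { return "hi"; }
    std::string talk(int times) {
        assert(times >= 0);
        std::string out;
        for (int i = 0; i < times; ++i) out += "hi ";
        return out;
    }
};

// Two vtable-style slots with the same method name but different signatures.
template <class T>
struct talk_slots {
    std::string (T::* plain)();
    std::string (T::* counted)(int);
};

// The same expression &T::talk appears twice; the declared type of each
// slot decides which overload it refers to.
template <class T>
talk_slots<T> make_slots() {
    talk_slots<T> s = { &T::talk, &T::talk };
    return s;
}
```

Given `Chatty c` and `talk_slots<Chatty> s = make_slots<Chatty>()`, the call `(c.*s.plain)()` reaches the zero-argument overload and `(c.*s.counted)(2)` reaches the one taking an int.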

Disclaimer: Please don’t actually use this in any real production code you ever write! When you need interfaces, just inherit from pure virtual classes like everyone else.
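For comparison, the conventional approach looks roughly like this.  It is a minimal sketch reusing the Person and Foo names from the example above; the constructor and the override keywords (C++11) are additions made for the illustration.

```cpp
#include <cassert>
#include <cstdio>

// The interface: an abstract base class with pure virtual methods
struct Person {
    virtual const char* name() = 0;
    virtual void talk() = 0;
    virtual ~Person() { }
};

// Implementations must opt in by inheriting from the interface
struct Foo : Person {
    int x;
    explicit Foo(int x_) : x(x_) { }
    const char* name() override { return "Foo"; }
    void talk() override { std::printf("I'm a Foo with a %d!\n", x); }
};

// Takes the interface by reference so the object is not sliced
void talkabout(Person& t) {
    const char* n = t.name();
    assert(n != nullptr);
    std::printf("%s says:\n", n);
    t.talk();
}
```

The cost compared with the Go-style trick is that Foo has to name Person up front, and every Foo carries a vtable pointer whether or not it is ever used through the interface.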

So You Want to Design a Programming Language

How do you go about doing that?

And now your language is already doomed to obscurity.  What?  Already?  That’s not very fair!  Let me explain why.

Inspired by someone I know who did design a rather successful programming language, I thought it would be the coolest thing to design an even better language.  Because I’m hubristic like that, I guess.  I’m pretty sure he won’t mind.

But I have realized that a programming language is a tool more than it is a work of art.  Art can be created for its own sake.  Tools must be created for a purpose.  If the single thing you want to do is to create a programming language, that will determine the nature of the language you create.  It will probably be a beautiful language.  And it will be useless for any real programming.  So nobody will use it except those few people who are interested in art languages.  Humans are good at working toward a purpose; your end result will reflect your purpose.

Oh, but if you want to create an actually useful language, you don’t have to give up on that.  To do that, you have to find a purpose for it, and pretend that was why you wanted to make the language all along.  Humans are good at pretending.

Humans are also good at becoming what they pretend to be.  In many of my other projects, I’ve had to wrestle with the problems presented to me by the underlying language.  This is natural for a programmer, of course; no tool is perfect in every way.  I’m sure some if not most programmers have dreamed of creating their own language at some time or other.  Most of them are content enough to stick with what they’ve got.  A few get frustrated enough to attempt to create their own language.  A few of those succeed.  I hope to be one of them.

So, what will be the purpose of my programming language?  To be multi-purpose, of course!

Right, right.

In order to make a good tool you need to think of a more concrete goal than that.  I can imagine some wise person saying “In order to design for everyone, you must design for yourself.”  Well, it may not be strictly true, but being my own customer does ensure that my product is useful to at least someone.  As large as this world is, there are probably people with the same needs as me, who could use my language.

So clearly, to be as relevant as possible, I need to be as needy as possible.

So here is the need I have come up with.  It happens to match well with my other programming hobby.  I want to create a video game with my programming language.

But there are already plenty of game-oriented languages out there, right?  Not exactly.  Those other languages like GML or Squirrel or Lua or ActionScript are scripting languages, made for the high-level specification of events that happen in a low-level engine that was written in C++ or something.  I want my language to provide for every part of the system.  Including, hopefully, the design process.  So, to sum up my long-term plans, the language needs to:

  • Wrangle complicated data around at compile-time, like polygon shapes, item specifications, room specifications, tilemaps, etc.
  • Allow for easily writing up stateful actor logic.
  • Connect to C and C++ libraries for OpenGL and a physics engine (if I don’t write the latter myself).
  • Produce an executable that is fast and efficient, and during the main loop performs no dynamic memory allocations, to attain realtime performance (I have this going in C++ and it is pretty fun to work around this constraint).
  • Let me create an IDE that serves as a level editor and image editor, and lets me manipulate various kinds of data as both code and WYSIWYG, using both a CLI and a GUI which is itself modifiable.  Features like copying and pasting data and undoing will be implemented on the language level.

Now that you see it, you can admit that this list is much more daunting than the stated purpose of “game programming” would lead you to believe.  In fact, these requirements are already beyond any other programming language I know of.  Oh hey, just for good measure, let’s add in another insane requirement, though it’s kind of unrelated:

  • The core of the language should be flexible enough to compile to weird things like JavaScript.  Not all the core features will be available there, of course (such as file IO and unboxed types), but those that are can be taken advantage of.

How on earth is the language going to do all those things, some of which are completely incompatible?  The keyword is as above: flexibility.  Flexibility in letting the core provide slightly different features in different environments, and in letting the programmer sandbox themselves into a more restricted system with a modified core.  Believe it or not, I have fairly concrete ideas on how all of these things can work together and form an elegant language in the end.  You’ll see more specifics here in the months and years to follow.
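To make one of the requirements above concrete, here is roughly what "no dynamic memory allocations during the main loop" tends to look like in C++ today.  This is a minimal sketch under assumptions: Actor and ActorPool are hypothetical names, the fields are placeholders, and it is not code from the actual game.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical actor record; the fields are placeholders.
struct Actor {
    float x, y;
    bool alive;
};

// Fixed-capacity pool: all storage lives inside the object, so spawning
// and despawning during the main loop never touches the heap.
template <std::size_t N>
class ActorPool {
    std::array<Actor, N> slots{};   // value-initialized: every slot starts dead
public:
    // Returns the first free slot, or nullptr when the pool is full.
    Actor* spawn(float x, float y) {
        for (Actor& a : slots) {
            if (!a.alive) {
                a = Actor{x, y, true};
                return &a;
            }
        }
        return nullptr;
    }

    // Marks the slot free again so a later spawn can reuse it.
    void despawn(Actor* a) {
        assert(a != nullptr);
        a->alive = false;
    }

    std::size_t live_count() const {
        std::size_t n = 0;
        for (const Actor& a : slots) if (a.alive) ++n;
        return n;
    }
};
```

The capacity N has to be chosen up front, and a full pool reports failure instead of growing.  That is the constraint that makes this style a puzzle to work around.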

You are skeptical of my ambition, I can tell.  I am approaching this as a constraint-solving puzzle, and it is definitely the largest puzzle I have ever taken on in my life.  I have been thinking about it for two whole years already.  Only now am I confident enough in my designs to show some of them off in public.  This is a form of art that I am choosing to dedicate much of my time to.

My Plans for this Blog

I need some place to write down all my thoughts and experiments on the subject.  So, here is a programming blog.

The topics I will be discussing include:

  • The new programming language I am inventing
  • Some of the programming tricks I have discovered in various languages
  • Philosophy about language design and programming in general
  • Some of my adventures in game programming

There are many things floating around in my brain that I have wanted to write down.  I can’t write them all at once though, so I guess I’ll get to them as I think of them.  I am inclined to approach blogging as a constraint-solving puzzle, and organize all my ideas in the best order possible, like I’m writing a book.  However, I need to not spend several hours writing each post and several days rearranging them.  I simply won’t end up writing anything that way.  So, this process will strain against my perfectionist tendencies.  And hopefully fix them.  A little bit.

So, thanks for reading, よろしくお願いします, and look forward to some strange and interesting programming ideas.

Yes, I write double-spaced sentences.  Yes, I have heard all of the arguments, multiple times.  This way is easier to read.  It is not as pretty, arguably, but when writing text, just as when writing code, clear and efficient communication is the most important factor.  Maybe some day I will write a longer post about it.