Now that my 2025 blog host is able to do better code blocks, let me use this as a prompt to quickly write about two features of older programming languages that we don’t see anymore, despite there seemingly being a Cambrian explosion of new compilers (apparently LLVM is making that too easy).
1. Whitespace in identifiers
I’m German. We know compound nouns. There’s schadenfreude, lit. “harm joy”, spiteful joy about someone else’s pain. There’s weltschmerz, lit. “world pain”, the psychological pain of the real world not being up to par with your mental picture of it. We’re a delightful bunch, aren’t we?
Other people seem to have problems just smashing words together, and prefer having a visual indicator where one word ends and the other starts. In regular prose writing, that’s usually a space (“ ”) or hyphen (“-”). For technical convenience, both are rare to non-existent in contemporary languages. We’ll talk about that a bit later.
So since the early days when computers became powerful enough that you could actually name the important bits of your code with long enough phrases to matter, people were looking for replacements for those common symbols. Probably the most popular one was the underscore (“_”). I’m old enough to remember its original purpose: using it in places where you had already hammered other letters onto your paper with a typewriter, to underline the word, the only emphasis we had in these dark days.
Code written in that style can look like this:
app = gtk_application_new ("org.gtk.example", G_APPLICATION_DEFAULT_FLAGS);
g_signal_connect (app, "activate", G_CALLBACK (activate), NULL);
status = g_application_run (G_APPLICATION (app), argc, argv);
g_object_unref (app);
As you can see, this isn’t just about separating nouns, but also prefixes that tell you what library you’re using (in this case gtk_ for the GUI toolkit “GTK”, and g_ for the generic helper library “glib”), and verbs like “run” or “new” that represent actions taken.
Now this whole article is about technical limitations, and one of the odder ones was that a lot of systems didn’t have underscores! We’re accustomed to the luxury of easily being able to include all varieties of letters in our text, including e.g. Japanese (“スパム”) or Georgian (“ფუმფულა”). But in the 50s and 60s, you were happy when you had distinct upper- and lowercase letters. The famous ASCII standard helped a lot here, giving common ground for storing and exchanging texts in the Western world. But it had one flaw that affects us here.
At position 95, the ASCII we all know, the one used to write the code above, has an underscore. But earlier versions had a left-pointing arrow (“←”) there.
Now this is great if you’re concerned about writing assignments, and early languages did use it for this.
pippo = 42 // 😤✋
if pippo == 23
pippo ← 42 // ☺️👉
if pippo = 23
But you can’t use it to separate words anymore. And thus for systems where this didn’t work or if you were aiming for “cross-platform” portability, you used the newly invented “camel case”, named for its sudden appearance of “humps” in a word.
if (widget) {
    auto *form = QDesignerFormWindowInterface::findFormWindow(widget);
    if (form)
        form->emitSelectionChanged();
}
Well, we got used to it, apparently, as it’s used now despite the limitation not being a big issue for 50+ years.
Even hyphens aren’t a totally lost cause. The main problem with them is that we don’t differentiate between a hyphen and a minus character, so what does the following mean?
foo-bar - tizio-pippo
Are we subtracting tizio-pippo from foo-bar, or are we subtracting the variables foo, bar, tizio and pippo?
If we force the user to have those mandatory spaces like in the example above, it’s possible to differentiate. But languages where any whitespace (spaces, tabs etc.) is significant are rare. Is hyphenation more important than being free to write mathematical notation the way you want to?
There are languages that allow this, which is why we have a name for that style: kebab-case, because it apparently looks like the words are on a shish kebab skewer. Lisp does it because it doesn’t have regular mathematical notation with the subtraction signs between variables, and Raku has a very complex syntax anyway, so they found ways to distinguish.
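To make the "mandatory spaces" idea concrete, here is a minimal Python sketch of a tokenizer that treats a hyphen inside a word as part of the identifier and a free-standing hyphen as the minus operator. The token rules are my own assumption for illustration; they are not how Lisp or Raku actually do it.

```python
import re

# Assumed rules: an identifier may contain inner hyphens (letter after each
# hyphen), while a hyphen that is not glued into a word is a minus operator.
TOKEN = re.compile(
    r"[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z][A-Za-z0-9]*)*"  # kebab-case identifier
    r"|\d+"                                            # integer literal
    r"|-"                                              # minus operator
    r"|\S"                                             # any other single symbol
)


def tokenize(source):
    """Split source text into tokens; whitespace only separates tokens."""
    return TOKEN.findall(source)


print(tokenize("foo-bar - tizio-pippo"))
print(tokenize("foo - bar - tizio - pippo"))
```

With these rules, `foo-bar` is always one name, so anyone who wants a subtraction has to put spaces around the minus sign.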
As you can see, nature, erm, the programming community finds a way. So it’s not surprising that spaces are a possibility, too. The surprising part is that this was done decades ago and fell out of fashion, never to appear again in the mainstream.
One of the first programming languages that looked recognizable to modern programmers regarding its general structure and syntax was ALGOL-60 (it’s the “ALGOrithmic Language” and was standardized in 1960). And it allowed whitespace in identifiers!
procedure end daily tasks;
integer i;
begin
for i := 1 step 1 until 24 do
end hourly tasks(i);
end;
If I’m not mistaken, the two procedures here would be stored as enddailytasks and endhourlytasks, but you could use spaces as you liked when calling them.
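A minimal Python sketch of that idea, assuming (as guessed above) that an implementation simply drops all whitespace before storing or looking up a name. The function names and the symbol table are hypothetical.

```python
def canonical_name(identifier):
    """Drop all whitespace so every spelling maps to the same stored name."""
    return "".join(identifier.split())


# Hypothetical symbol table keyed by the canonical (space-free) name.
symbols = {canonical_name("end daily tasks"): "procedure"}


def lookup(identifier):
    """Find a symbol regardless of how the caller spaced its name."""
    return symbols.get(canonical_name(identifier))


print(canonical_name("end daily tasks"))
print(lookup("endd ailyt asks"))
```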
I’m sure some of you see one problem immediately: This code uses “end” both to start the names of those two procedures and to signify the end of a block, so how does the compiler know that the end after until 24 do doesn’t immediately close the loop’s body here?
The problem is that I “lied” a bit with the above code. You couldn’t have written it exactly like that. Back in those days, a lot of ALGOL-60 was used for teaching and thus ideally presented in printed texts, where it’s easy to distinguish between parts of an identifier and a keyword, as the latter would’ve been bold:
**procedure** end daily tasks;
**integer** i;
**begin**
**for** i := 1 **step** 1 **until** 24 **do**
end hourly tasks(i);
**end**;
But how would you write this on an actual computer (or on punch cards passed to an actual computer)? You were lucky if you had lower case or an underscore. There certainly weren’t separate bold fonts available.
This is the ugly part. ALGOL-60 and some of its successors like ALGOL-68 used something called “stropping” to differentiate keywords. The word derives from “apostrophe”, and that’s exactly what was used. So if you were unlucky with the capabilities of your computer, the code might’ve appeared like this initially:
'PROCEDURE' END DAILY TASKS;
'INTEGER' I;
'BEGIN'
'FOR' I := 1 'STEP' 1 'UNTIL' 24 'DO'
END HOURLY TASKS(I);
'END';
Lots of additional typing needed. There were alternative methods, like prefixing keywords with a dot (e.g. .FOR I := 1 .STEP 1 .UNTIL 24 .DO), but none of them were exactly beautiful.
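Here is a rough Python sketch of how a tokenizer could tell apostrophe-stropped keywords apart from identifiers that contain spaces. The token categories and the regular expression are assumptions for illustration, not taken from any real ALGOL compiler.

```python
import re

# Assumed token classes: 'KEYWORD' in apostrophes, identifiers that may
# contain inner spaces, integers, and a handful of symbols.
TOKEN = re.compile(
    r"'(?P<kw>[A-Z]+)'"                          # stropped keyword, e.g. 'FOR'
    r"|(?P<id>[A-Z][A-Z0-9]*(?: +[A-Z0-9]+)*)"   # identifier, spaces allowed
    r"|(?P<num>\d+)"                             # integer literal
    r"|(?P<op>:=|[;()])"                         # assignment and punctuation
)


def tokenize(source):
    """Return (kind, text) pairs; spaces inside identifiers are dropped."""
    tokens = []
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "id":
            text = text.replace(" ", "")
        tokens.append((kind, text))
    return tokens


line = "'FOR' I := 1 'STEP' 1 'UNTIL' 24 'DO' END HOURLY TASKS(I);"
for token in tokenize(line):
    print(token)
```

Because only the quoted form counts as a keyword, the unquoted END in END HOURLY TASKS is just the start of a name and cannot close the block.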
Another issue that arose a bit later was searching for identifiers. If I can write the procedure as end daily tasks or enddailytasks (or even endd ailyt asks), what do I enter after my grep command or in my editor?
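One workaround would be to search with a pattern that tolerates whitespace between any two characters of the name. A minimal Python sketch, with a helper name I made up:

```python
import re


def identifier_pattern(name):
    """Build a regex matching `name` with optional blanks between characters."""
    letters = "".join(name.split())
    return re.compile(r"[ \t]*".join(re.escape(ch) for ch in letters))


pattern = identifier_pattern("end daily tasks")
for text in ("procedure end daily tasks;",
             "enddailytasks",
             "endd ailyt asks",
             "end hourly tasks(i);"):
    print(text, "->", bool(pattern.search(text)))
```

This is only a sketch: it would also hit the tail of a longer name such as weekend daily tasks, so a real tool would need to know where identifiers start and stop.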
Hiccups like these, combined with the fact that programmers got accustomed to camel case or underscores, probably led to whitespace in identifiers becoming a dead end in programming language design. Given that a lot of search these days is done with “language servers”, maybe it’s time for a renaissance?
2. Inner-procedural refinements
It’s somewhat likely that you never even heard of this particular feature of some programming languages. Its core idea is that there might be a need for having “callable” blocks of code without the overhead of another function/procedure.
Let’s look at a simple function that uses a stack (semi-pseudocode, not actually tested):
function use_stack() {
  // initialize data
  var stack = create_new_stack();
  var incoming_queue = open_current_inbox();
  var dumpster = open_new_dumpster();

  // put values into stack
  var amount = 0;
  do {
    var value = incoming_queue.get_some_value();
    stack.push(value);
    amount = amount + 1;
  } while (incoming_queue.can_get_values());

  // get values from stack
  for (var i = 0; i < amount; i = i + 1) {
    var value = stack.pop();
    dumpster.dump(value);
  }
}
The function doesn’t make much sense, but at least it isn’t too big. We’re using comments to annotate the logical sections of our code, but this has the disadvantages that it’s more a convention than part of the language, and of course the parts are dispersed in a longer function, “sandwiched” between code and easy to miss, even with syntax highlighting. We’d prefer to have them more as a “table of contents”.
The usual solution here is more functions. Refining code with procedures has been part of structured programming for decades now, and there are some people who have really strict and harsh requirements for function / method length (I’ve heard 5 lines from some consultants/writers!).
But this has some issues. First, if our language doesn’t have nested functions, we’re polluting the global namespace with get_values_from_stack(). Not every function can be made generic enough to justify that easily.
Let’s say this is Javascript, and we’re allowed to nest as much as we want. How do we treat variables here? Do we access the globals from the outer function or are we passing everything? Sometimes we might not have a choice, but quite often this is yet another level of overhead we’re encumbered with.
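The two choices look roughly like this. It is a sketch in Python rather than JavaScript, with plain lists standing in for the stack, inbox and dumpster, and all function names are made up for illustration.

```python
def use_stack_with_closure(incoming):
    """Nested helpers reach into the enclosing function's variables."""
    stack = []
    dumpster = []

    def put_values_into_stack():
        for value in incoming:      # reads `incoming` from the outer scope
            stack.append(value)     # mutates the outer `stack`

    def get_values_from_stack():
        while stack:
            dumpster.append(stack.pop())

    put_values_into_stack()
    get_values_from_stack()
    return dumpster


def put_values(stack, incoming):
    """Top-level helper: everything it touches is passed in explicitly."""
    for value in incoming:
        stack.append(value)


def get_values(stack, dumpster):
    while stack:
        dumpster.append(stack.pop())


def use_stack_with_passing(incoming):
    stack, dumpster = [], []
    put_values(stack, incoming)
    get_values(stack, dumpster)
    return dumpster


print(use_stack_with_closure([1, 2, 3]))
print(use_stack_with_passing([1, 2, 3]))
```

The closure variant reads like a table of contents but hides which state each helper touches. The passing variant is explicit, at the cost of more plumbing and two extra top-level names.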
Some languages dared to disagree that procedures were all we needed, and added another level. It works more like textual substitution before execution than the actual jumping around between memory locations that functions tend to do. The language I learned this from is ELAN, used a lot for teaching, but also for the L3 microkernel operating system, where I found a piece of code in the documentation that inspired the above (translated and expanded):
init data;
put values into stack;
get values from stack.
put values into stack:
INT VAR amount :: 0, value;
REP
get (value);
push (value);
amount INCR 1
UNTIL end criterium
END REP.
get values from stack:
INT VAR i;
FOR i FROM 1 UPTO amount REP
pop (value);
put (value)
END REP.
end criterium:
amount > inbox_initial_size;
First of all, let’s note that there’s whitespace in the identifiers here (ELAN allowed this for both procedures and these “refinements”)!
You can access variables declared in other refinements, as basically they’re just a preprocessing step, not code blocks. There’s also no option to pass arguments any other way, so conceptually they’re both easy and cheap.
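To illustrate the "just a preprocessing step" idea, here is a toy Python sketch that expands refinement names into their bodies by plain text substitution and then runs the result. It is my own simplification, not how ELAN or C-Refine work internally. It ignores indentation of the calling line and does not guard against refinements that refer to themselves.

```python
def expand(lines, refinements):
    """Replace every line that names a refinement with that refinement's body."""
    result = []
    for line in lines:
        name = line.strip().rstrip(";")
        if name in refinements:
            result.extend(expand(refinements[name], refinements))
        else:
            result.append(line)
    return result


# Hypothetical refinements, written as Python source lines.
refinements = {
    "put values into stack": [
        "amount = 0",
        "for value in inbox:",
        "    stack.append(value)",
        "    amount += 1",
    ],
    "get values from stack": [
        "for _ in range(amount):",
        "    dumpster.append(stack.pop())",
    ],
}

main = ["put values into stack;", "get values from stack"]
program = "\n".join(expand(main, refinements))
print(program)

# Run the expanded text; `amount` is set in one refinement and read in the other.
env = {"inbox": [1, 2, 3], "stack": [], "dumpster": []}
exec(program, env)
print(env["dumpster"])
```

Since the expansion is purely textual, `amount` needs no passing around: after substitution, both refinements are simply part of the same piece of code.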
The main problem here is that this might lead to quite some spaghetti code, where you’re hunting for variable declarations and don’t have a good picture of how the “actual” code looks in the end. It’s our old enemy, global mutable state.
Would someone accustomed to this programming style be “immune” to this issue? Are the pros worth this big con?
Given that no contemporary language supports this, I guess none of us will find out. Is it unpopular because it was largely unknown (ELAN didn’t have a large impact), or because it didn’t make much sense?
I’m only aware of a few ways for anyone to experiment with this:
- Get an old ELAN interpreter (maybe from the L3 kernel?)
- Lutz Prechelt wrote a paper (PDF) about this concept that introduced a refinement preprocessor for C. Given that this doesn’t need a lot of knowledge about its target language, it might be usable for something like Javascript, too?
- The Algol-68 “Genie” interpreter seems to support something similar.
- Donald Knuth’s literate programming system has some overlap here, where you can extract parts of the code and rearrange them as you want in your basic code, and then the “tangle” part of the WEB system creates the proper structure for e.g. C.
I might experiment a bit with the C-Refine preprocessor in the future. It does “strop” the refinements with a backtick, so there might be some issue with modern Javascript. Maybe I’ll have a C/C++ project in the near future or maybe it’s time to go beyond ASCII…