Domain Specific Languages

2021-08-25

If you know what domain-specific languages are, then hopefully you're aware of how much easier they can make your life. Tools like awk are so brilliant at what they do that, for a sizeable selection of tasks, an awk script could be 20 times shorter to write than an equivalent Python script while being easier to understand and more performant. If you don't know what they are, then surely I've piqued your interest a little.

What is a Domain Specific Language (DSL)?

According to Wikipedia:

A domain-specific language is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language, which is broadly applicable across domains.

Let's consider a general-purpose language like Python. Python can be used to make networking software like web servers, graphical software like games, computational tools like an interpreter for another language, and pretty much anything else you could think of. Python isn't designed to do any one particular thing; it's designed to do anything, so we say it is a general-purpose language.

Awk is insanely good at iterating through the lines of its input and making small tweaks to each line based on column splitting and regex matching, but you'd be mad to try to make a game, web server or interpreter with it (even though it is Turing complete). Because of this, we say it is specific to a certain domain (range of problems), so it is a domain-specific language.
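To make the comparison concrete, here is a rough sketch of what a one-line awk program such as `/404/ { print $1, $3 }` looks like when spelled out in Python. The function name `select_columns` and the sample log text are made up for illustration, and the sketch only imitates awk's default whitespace splitting, not awk as a whole.

```python
import re

def select_columns(text, pattern, columns):
    """Rough equivalent of awk '/pattern/ { print $1, $3 }' (hypothetical helper)."""
    matcher = re.compile(pattern)
    out = []
    for line in text.splitlines():
        if matcher.search(line):
            fields = line.split()  # like awk's default whitespace field splitting
            # awk prints an empty string for a field that does not exist
            picked = [fields[c - 1] if c <= len(fields) else "" for c in columns]
            out.append(" ".join(picked))
    return out

# Made-up sample input: user, status code, path
LOG = """\
alice 200 /index.html
bob 404 /missing
carol 404 /gone
"""

for row in select_columns(LOG, r"404", [1, 3]):
    print(row)
```

The awk version states the pattern and the action and nothing else; the loop, the splitting and the printing are all implied by the language, which is where the brevity comes from.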

Some other good examples are markup languages (e.g. HTML), the Unix shell and the arguments to the find program.

Can we use loads of DSLs to make the shell really really good?

My favourite DSL is awk, which is usually embedded in a shell script to help process text line by line. There are other DSLs like this (sed, find, jq, the shell itself), but I think we could do a lot better.

Suppose we had loads of DSLs for all sorts of things, all designed to be easily connected together. We would need a more advanced shell to connect them, since the existing shell focuses on passing strings through a fairly linear pipeline of programs that take strings as arguments. The new shell should be good at passing any sort of data between programs, but it should stay focused on string arguments, since those arguments are likely to be programs written in other DSLs. Data could then be read from the user or from files, passed through a network of programs in all sorts of different languages, each dealing with the data in a different way, and finally written to files or output to the screen. The new shell could itself be considered a DSL focused on string literals, calling procedures (other programs) and passing data around, so it could potentially be simpler than the existing shell, as it wouldn't need a lot of what the shell has.
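As a very rough sketch of the idea, the Python below treats each "program" as something that takes its own source as a string argument and then passes arbitrary data (not just strings) to the next stage. Everything here is hypothetical: `split_lines`, `field`, `total` and `pipeline` are invented stand-ins, and the sketch only covers a linear chain rather than a full network of programs.

```python
from functools import reduce

# Hypothetical stand-ins for DSL programs. Each takes its "program" as a
# string argument and returns a function from data to data.
def split_lines(_program):
    return lambda text: text.splitlines()          # str -> list of str

def field(program):
    index = int(program)                           # the program is just a column number
    return lambda rows: [row.split()[index - 1] for row in rows]

def total(_program):
    return lambda values: sum(int(v) for v in values)  # list -> int

def pipeline(*stages):
    """Connect stages so each one's output (any Python value) feeds the next."""
    return lambda data: reduce(lambda value, stage: stage(value), stages, data)

run = pipeline(split_lines(""), field("2"), total(""))
print(run("a 1\nb 2\nc 3"))
```

Note that the arguments stay strings while the data between stages changes type (a string, then a list, then a number), which is the split between string arguments and richer data described above.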

Bonus Ideas

A DSL for extracting scalars (and maybe unnested maps and arrays) from deeply nested hierarchical data, similar to jq but accepting data in a standardised binary format so it can be used for more than just JSON. It would also be way simpler than jq: it wouldn't do any processing of the data, just extract it so that a different, better-suited DSL can do the processing.
A DSL for flow control, since the new shell won't be able to do this. I'm not 100% sure about this one though; maybe flow control should still be in the shell.
A DSL for mathematical calculations, similar to bc.
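For the first idea, a minimal sketch of what "just extracting" could mean is shown below. The dot-separated path syntax, the `extract` function and the sample data are all assumptions made for illustration, and the data is already parsed into Python dicts and lists here, so the standardised binary format is not modelled at all.

```python
def extract(data, path):
    """Follow a dot-separated path such as 'servers.0.port' into nested data.

    The path syntax is hypothetical: list positions are written as integers,
    map keys as plain names.
    """
    current = data
    for key in path.split("."):
        if isinstance(current, list):
            current = current[int(key)]   # numeric step into an array
        else:
            current = current[key]        # named step into a map
    return current

# Made-up sample data
config = {"servers": [{"host": "a.example", "port": 8080}]}

print(extract(config, "servers.0.port"))
print(extract(config, "servers.0.host"))
```

There are no filters, no arithmetic and no output formatting; anything beyond reaching into the structure would be left to another DSL.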