Regex Code Translation

21 Aug, 2018 - Articles

Making a good compiler can get rather complicated. BNF grammar, tokenisation, abstract syntax tree generation…

If you want a solid and strict compiler, of course you will need to fully consider all of these things, but what if all you need is a tool to help translate bits of code - especially between similar languages like ActionScript and C# for example. Do you really need to waste so much effort making a program to build up a syntax tree just to break it down again when you could have just carried over sections verbatim because of the similarity of the languages?

I learnt the hard way that making a compiler to help translate some bits of code is way too much effort for the results. In the time you make a proper compiler, you will probably have been able to translate all the code using a simpler system such as RgxC AND have enough time spare to translate all the code by hand as well just because.

Also: if you had an idea like “why don’t I make a program that looks at the generated bytecode and reverse-engineers it into the other language”. Please, no. I’ve been there… just… no. (Slightly traumatised)

Let’s look at code translation how a human might look at it:

  • Break down the code into different sections.
  • Swap around and slightly change some sections here and there depending on some basic rules.
  • Break down the sections themselves into more sections and do the same and so on.

Enter: RgxC

Here’s how to translate with RgxC:

  • Select some text into sections
  • give names to some of the sections you’re interested in
  • you are then free to easily…:
    • further split a section into child-sections
    • replace any selection with any text
    • include the text from other selections in your replacements using substitutions like ${name}
    • (remove sections, re-order sections, etc)

RgxC keeps track of all the frustrating bits behind the scenes when it comes to having sections in text for you - such as updating offsets, indexes, etc so that every character points to the correct character in the source text and so that changes to any of the children selections are successfully propagated back up to the source text.

It also allows you to split the text into sections any way you like, but has great support for Regex. After all, Regex is one of the most intuitive ways of selecting text :)

Now, rather than using traditional methods to build compilers, you can do so using the intuitive use of Regex - which can be as lenient or as strict as you want it to be, as well as be able to target whichever sections you want it to.

Note:

RgxC is currently implemented in C#. Can be found here (it’s a bit messy and WIP though)

Here’s what some code for translation looks like - it speaks for itself, really:

private void TranslateParameters(Selection sel)
{
    foreach (RSelection parameter in sel.Matches(PARAMETER)) // split the selection 'sel' into further selections - 'parameter's
    {
        //parts translation
        parameter.Replace(new Dictionary<string, ReplaceDelegate>
                          {
                              {"type",(Selection input)=>
                              {
                                  return TranslateType(input.Value); //replace the sub-selection 'type' of 'parameter'
                              }},
                          });
        //order translation
        parameter.Replace("${type} ${identifier}"); //using substitution to order around selections
    }
}

Demonstrations

Speed:

EBNF Grammar -> CSharp

ActionScript -> CSharp