For the last year and a half or so, I've been working on a transpiler that takes C++ through Racket and back to C++, to bring the metaprogramming benefits of a real macro system to C++ and CUDA. The motivation has a lot to do with GPU programming and performance portability, but it lets you do some pretty cool things as a standalone project too, so I thought I'd share it here. There are still a number of rough edges, mostly related to language coverage, but it already handles a substantial subset of the language.
Consider the following program, in C++ (with an extended syntax to recognize macro forms beginning with @):
Code:
int main(int argc, char ** argv) {
    @Loop1d(test_loop)[@ I][= 0 + 0][= 0 + argc] {
        int j1;
        @Loop1d(test_loop)[@ J][= 0][= argc] {
            const int j = static_cast<int>(@J);
            j1 = static_cast<int>(@I);
            if(strlen(*(argv + j1)) > 0)
                (*(argv + j1))++;
            puts(*(argv + j1));
            if(strlen(*(argv + j)) > 0)
                (*(argv + j))++;
            puts(*(argv + j));
        }
    }
    const char nl = '\n';
    puts(&nl);
    return 0;
}
One of the key problems with C++'s syntax, from the perspective of macro authoring, is that it's very hard to parse, so the built-in preprocessor limits you to blind text manipulation. This means you can't easily compose blocks of code the way you would function calls, because you have to worry about their scopes getting mixed up. So the first thing we do is use (my fork of) clang to turn the program into a new form, where we can manipulate the syntax tree without having to worry about parsing:
Code:
#lang Cxx
(translation-unit
  (skeletons:
    (Loop1d "Documents/Brown/Proteins/racket-tests/SkelImpls/Loop1d.rkt")
    "Documents/Brown/Proteins/racket-tests/test-params.json")
  (defun () (int (!)) main ((() (int (!)) argc) (() (char * * (!)) argv))
    (block
      (@ Loop1d (test_loop) ([@ I][= (+ 0 0)][= (+ 0 argc)])
        (block
          (def (() (int (!)) j1))
          (@ Loop1d (test_loop) ([@ J][= 0][= argc])
            (block
              (def (() (const int (!)) j = (static_cast (int(!)) (@ J () ()))))
              (= j1 (static_cast (int(!)) (@ I () ())))
              (if (> (call strlen (* ((+ argv j1)))) 0)
                  (>++ ((* ((+ argv j1))))))
              (call puts (* ((+ argv j1))))
              (if (> (call strlen (* ((+ argv j)))) 0)
                  (>++ ((* ((+ argv j))))))
              (call puts (* ((+ argv j))))))))
      (def (() (const char (!)) nl = #\u0a))
      (call puts (& nl))
      (return 0))))
Surprise! The code looks awfully Lisp-y now. That's because it is! Every Racket program begins with a #lang directive naming the module that supplies the initial identifier bindings used for macro expansion. The Racket compiler then recursively expands macros until only core forms remain! Our Cxx module essentially binds all of the core syntax of C++ as simple macros that aid in walking the syntax tree, along with an @ form that binds the special macros we'll be using to transform the C++ code. So eventually our program will turn into something like this:
Code:
(module anonymous-module Cxx
  (#%module-begin
    (module configure-runtime '#%kernel
      (#%module-begin (#%require racket/runtime-config) (#%app configure '#f)))
    (#%app
      call-with-values
      (lambda ()
        (let-values ()
          (#%app
            map
            display
            '("/* The expanded C++ code ends up here as a string */\n"))
          (#%app void)))
      print-values)))
And running it will re-emit the displayed string containing our final C++ syntax.
Before we jump ahead, let's look at what goes on with our Loop1d macro:
Code:
(define Loop1d
  (skeleton-factory
    ; This allows the requiring module to pass through important bits of
    ; configuration, should they be necessary
    (lambda (params-table)
      (lambda (kind name args child)
        (let*-values ([(itr-id itr-macro) (values #'j (macroize (extract-id-arg (car args))))]
                      [(itr-init itr-final) (as-values extract-expr-arg (cdr args))])
          (expand-with-macros
            (list itr-macro)
            (with-syntax ([itr-id itr-id])
              #'(skeleton-factory
                  (thunk*
                    (with-syntax ([itr-id (syntax-local-introduce #'itr-id)]) #'itr-id)) #:no-table #t))
            (with-syntax
                ([itr-id itr-id][itr-init itr-init][itr-final itr-final][child child])
              #'(for ((def (() (int (!)) itr-id = (itr-init))) (< itr-id (itr-final)) (++< itr-id)) child))))))))
Look at the very last line: it produces a for loop with a number of values, extracted from the macro's arguments, substituted in. But it always uses "j" as the loop variable (and introduces a new macro, available while the child code is being expanded, that lets that code access the loop variable, since the programmer doesn't necessarily know what's going on internally), which seems like a recipe for name collisions.
Thankfully, unlike C++'s usual preprocessor macros, the Racket macro system isn't operating on dumb text: each element of a syntax object also carries metadata keeping track of where it came from, so all of those "j"s have a unique identity even though they share a name. Our Cxx module checks this metadata on each identifier it finds as it writes the expanded program back out as regular old C++, and generates a unique name for every identifier introduced by a macro rather than by the programmer:
Code:
(if (syntax-original? (syntax-local-introduce id))
    id
    (let loop ([btv (generate-temporary id)])
      (if (set-member? uniq-table (syntax->datum btv))
          (loop (generate-temporary btv))
          (begin
            (dict-set! bind-table id btv)
            (set-add! uniq-table (syntax->datum btv))
            btv))))
So when we run the Racket program, we get this well-behaved set of nested loops, as if by magic:
Code:
int main(int argc, char **argv) {
    for (int j12 = ((0 + 0)); j12 < (0 + argc); ++j12) {
        int j1;
        for (int j3 = (0); j3 < argc; ++j3) {
            const int j = (static_cast<int>(j3));
            j1 = static_cast<int>(j12);
            if (strlen(*(argv + j1)) > 0)
                (*(argv + j1))++;
            puts(*(argv + j1));
            if (strlen(*(argv + j)) > 0)
                (*(argv + j))++;
            puts(*(argv + j));
        }
    }
    const char nl = '\x0a';
    puts(&nl);
    return 0;
}
To see how this can be put to use for even cooler applications, take a look at the examples in the document I linked above, where I walk through some example CUDA code from NVIDIA and rewrite it with Racket macros.