Some compilers use real structural pattern matching to recognize complex patterns. For instance, the optimizer module in TXR Lisp uses pattern matching over virtual machine instruction sequences.
A nice, compact example of this is at the very bottom, in the function named late-peephole:

    (defun late-peephole (code)
      (rewrite-case insns code
        (((mov (t @t1) (d @d1))
          (jmp @lab2)
          @(symbolp @lab1)
          (mov (t @t1) (t 0))
          @lab2
          (ifq (t @t1) (t 0) @lab3)
          . @rest)
         ^((mov (t ,t1) (d ,d1))
           (jmp ,lab3)
           ,lab1
           (mov (t ,t1) (t 0))
           ,lab2
           ,*rest))
        (@else else)))

This looks for an instruction sequence like:

    (mov (t 3) (d 9))
    (jmp :foo)
    :bar
    (mov (t 3) (t 0))        ;; (t 0) is an immutable register that holds nil
    :foo
    (ifq (t 3) (t 0) :xyzzy) ;; if operands eq, keep going else jump :xyzzy
    ...

and rewrites it to:

    (mov (t 3) (d 9))
    (jmp :xyzzy)
    :bar
    (mov (t 3) (t 0))        ;; (t 0) is an immutable register that holds nil
    :foo
    ...

The idea is that when we arrive at :foo via the jmp, we know (t 3) holds the value of (d 9), since we just moved the latter to the former. The rewrite relies on that value not being nil, so the comparison against (t 0) is known to fail and we can just jump to :xyzzy right away. On the other path, falling through from :bar, (t 3) has just been set to nil, so the ifq always keeps going. Either way the outcome is known, and we can nuke the ifq instruction. (Ideally the :foo label would be removed as well, but if it is referenced anywhere other than the (jmp :foo) instruction that was rewritten, that would be bad, so it stays.)
The rewrite-case macro moves down the list of instructions one by one, testing all the patterns, and doing the rewrites that are possible; it is defined in the same file.
The pattern matching has a lot of power: it can backreference among instructions so that certain registers have to be the same, and it can test arbitrary predicates, for example that a matched register must be "dead" (no next use) and whatnot. Some pattern matches in that file test something in one basic block of code, but then also follow a label and test something in a target basic block.
If we have a complex operation that we would like to recognize that has many variants, it can be done in multiple passes. We can recognize smaller numbers of variations in the subexpressions of the whole operation, and rewrite them to a normalized/canonicalized variant. Then later, our larger pattern just matches the normalized variants of the subexpressions.