Since my original post, I've taken the time to read LEGO's executable file format spec and the hardware SDK.
While enhanced program logic seems attractive, what we really need is highly optimized programs. People have observed that the .RXE files created by NXT-G grow extremely large (several KB) even with only a few simple-seeming graphical blocks in the program. I think we need to be able to write programs using language constructs which compile directly to VM opcodes: an OP_MOV block, an OP_STRCAT block, and so on. I think we should be able to explicitly define clumps, acquire and release our own mutexes, etc. This may be the only way to get an impressively complex program to fit in 32 KB.
I need to make clear, though, that I'm not saying NI did anything wrong in designing NXT-G the way they did. NXT-G needs to be reliable and predictable. It needs to not freak out when some kid tries to experiment with multithreaded programming without realizing the size of the can of worms she's opened. I'm NOT saying these exposed-opcode blocks should make it out into mainstream NXT-G.
OK, so suppose I buy a copy of LabVIEW and the Mindstorms toolkit. Suppose I release a few HUNDRED user-created blocks -- each one representing not only a VM opcode, but a VM opcode in a certain context. (So you use one block for OP_CMP where the operands are TC_ULONG, and a different block for OP_CMP where the operands are TC_ARRAY and the output is boolean, and a different block for OP_CMP where the operands are TC_ARRAY and the output is a TC_ARRAY.)
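To show why the block count balloons, here is a rough Python sketch of the "one block per opcode per type context" idea. The opcode and type names come from the examples above; the `BlockVariant` class, the `CONTEXTS` table, and the boolean output assumed for the TC_ULONG case are my own illustration, not anything taken from the spec or from NXT-G.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockVariant:
    """One drag-and-drop block: a VM opcode pinned to one type context."""
    opcode: str
    operand_type: str
    output_type: str

    @property
    def block_name(self) -> str:
        # Hypothetical naming scheme, only for display.
        return f"{self.opcode} [{self.operand_type} -> {self.output_type}]"


# Hypothetical context table. Only the three OP_CMP cases from the post are
# listed; the boolean output for TC_ULONG is an assumption.
CONTEXTS = {
    "OP_CMP": [
        ("TC_ULONG", "boolean"),
        ("TC_ARRAY", "boolean"),
        ("TC_ARRAY", "TC_ARRAY"),
    ],
}


def expand_blocks(contexts):
    """Produce one block per (opcode, operand type, output type) combination."""
    return [
        BlockVariant(opcode, operand_type, output_type)
        for opcode, pairs in contexts.items()
        for operand_type, output_type in pairs
    ]


blocks = expand_blocks(CONTEXTS)
for block in blocks:
    print(block.block_name)
```

One opcode already yields three blocks here, so a full opcode set with several contexts each would plausibly reach the hundreds.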
It would be painful to use -- basically, it'd be like coding in assembler but with drag-and-drop icons -- but this would give us complete control over the contents of the program. Or would it?
I know NXT-G's compiler needs to understand the meaning of any custom blocks we create. That meaning is probably expressed in source code form: we describe the custom block's inputs and dependencies, describe what work it accomplishes, and describe the block's outputs. The NXT-G compiler needs to be able to intelligently mix these custom VM opcode blocks with the instruction blocks shipped with NXT-G. The compiler probably won't be too keen about letting a user create their own clump, run their own instructions, and then call OP_FINCLUMP 0xFFFF, 0xFFFF (pages 32 and 40 of the .RXE spec). If I understand correctly, that would terminate the whole program, even if there were other "threads" which should have finished executing. So the compiler shouldn't let us do that.
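The kind of check I have in mind could be sketched like this in Python. This is only an illustration: representing instructions as `(mnemonic, arg, arg, ...)` tuples is my own assumption, the function name is hypothetical, and the claim that OP_FINCLUMP 0xFFFF, 0xFFFF ends the whole program is just my reading of the spec as stated above.

```python
# Value used for both OP_FINCLUMP arguments in the case described above.
NO_CLUMP = 0xFFFF


def find_terminating_finclumps(instructions):
    """Return the indexes of OP_FINCLUMP 0xFFFF, 0xFFFF instructions.

    `instructions` is a list of (mnemonic, arg, ...) tuples. This is a
    made-up in-memory form, not the real .RXE encoding.
    """
    return [
        index
        for index, (mnemonic, *args) in enumerate(instructions)
        if mnemonic == "OP_FINCLUMP" and tuple(args) == (NO_CLUMP, NO_CLUMP)
    ]


# A hypothetical user-written clump; the argument values are placeholders.
user_clump = [
    ("OP_MOV", 0x0001, 0x0002),
    ("OP_FINCLUMP", 0x0000, 0x0001),
    ("OP_FINCLUMP", 0xFFFF, 0xFFFF),
]

offenders = find_terminating_finclumps(user_clump)
if offenders:
    print("rejected: terminating OP_FINCLUMP at instruction(s)", offenders)
```

A real compiler would need more context than this (for instance, whether other clumps are still scheduled), but a simple scan like this is the sort of guard I would expect it to apply to user-supplied blocks.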
So how do we balance our need for simple, optimized RXE files with our desire for convenience and simplicity? Maybe this "exposing VM opcodes" approach is the wrong one. Maybe we need to roll up a LARGE number of alternative implementations for each of the supplied user blocks, so when simpler functionality is called for, the compiler substitutes a much simpler (yet still correct) procedure.
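A minimal Python sketch of that substitution idea, under heavy assumptions: every name here (`Implementation`, `pick_smallest`, the feature labels) is hypothetical, and the byte sizes are invented for the example rather than measured from any real .RXE file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """One alternative implementation of a user block."""
    label: str
    features: frozenset   # what this variant can do
    size_bytes: int       # compiled size; made-up numbers below


def pick_smallest(candidates, required):
    """Choose the smallest implementation that covers every required feature."""
    usable = [c for c in candidates if required <= c.features]
    if not usable:
        raise ValueError(f"no implementation supports {sorted(required)}")
    return min(usable, key=lambda c: c.size_bytes)


# Hypothetical variants of a single motor block.
motor_block = [
    Implementation("full", frozenset({"power", "duration", "steering", "brake"}), 900),
    Implementation("no-steering", frozenset({"power", "duration", "brake"}), 400),
    Implementation("power-only", frozenset({"power"}), 60),
]

choice = pick_smallest(motor_block, frozenset({"power", "duration"}))
print(choice.label, choice.size_bytes)
```

The point is that the user still drags in the same block, and the compiler quietly picks the cheapest variant that is still correct for the way the block is wired up.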
A few years ago I took Programming Languages from the author of HATS: High Assurance Transformation System.
http://faculty.ist.unomaha.edu/winter/hats-uno/publications/index.html (People complained that class was very difficult, but I got an A somehow.) Our class project was to write an interpreter for a 'toy' C-like language, so I have a basic understanding of how to describe a language with a BNF grammar, write functions to resolve the meaning of each language element (using Standard ML of New Jersey), and then 'evaluate' a program given as input. What I learned there could be extended to transforming MINDSTORMS NXT VM bytecode into a human-readable language, and should help me understand language design. I might understand enough to BARELY comprehend the answers to the questions I posed above.
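For anyone who hasn't seen the grammar-then-evaluate approach, here is a tiny sketch of it. It is written in Python rather than Standard ML, and the grammar is a made-up arithmetic toy, far smaller than the C-like language from the class project; it has nothing to do with the NXT VM itself.

```python
import re

# Toy grammar (BNF-style):
#   expr   ::= term   { ("+" | "-") term }
#   term   ::= factor { "*" factor }
#   factor ::= NUMBER | "(" expr ")"

TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Parse and evaluate in one pass, with one function per grammar rule."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return result


print(evaluate("2 + 3 * (4 - 1)"))  # prints 11
```

Each grammar rule gets its own function that "resolves the meaning" of that element, which is the same shape a bytecode-to-text translator would take, just with opcodes in place of arithmetic.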