If you want an imprecise (conservative) garbage collector, the GC and the compiler are almost disjoint. The GC only needs to know where the heap and stack are, and be able to capture register values at GC time. In other words, it's a runtime-library issue, not a compiler issue. (See the Boehm GC for an example that works with a stock C compiler.)
If you want a precise garbage collector, you need type information to know which heap and stack slots and registers are pointers. This is much harder because your compiler needs to emit metadata for each struct and then keep stack frame info or somesuch at the GC safe points so the runtime knows what the thread's roots are.
Most of the precise GCs I know of are in JIT'd or interpreted languages where you have this type info anyway... AOT-compiled Lisp is the one counterexample I can think of, but usually Lisps solve this in a different way by using tagged pointers (so you know if any heap slot is a pointer by e.g. its LSB).
The issue is not having the type info available. AOT compilers certainly have that (or at least, enough of it to get the job done).
The issue is that most compilers with precise GC---AOT or otherwise---are built around that assumption from the very beginning. Therefore, they choose representations, algorithms, and optimizations that are amenable to GC, and in particular don't destroy the type info necessary to make GC work.
It is possible in theory to do the same with LLVM, but difficult in practice. Most existing LLVM-based GC'd languages compromise on performance by requiring LLVM to "root" pointers, effectively forcing them to stay on the stack (instead of registers) so that the GC can see them when it needs to. LLVM's optimizations (particularly at the machine-specific level) were previously allowed to do arbitrary things with register-based pointers, which is obviously bad for a GC if no copies of those pointers exist on the stack. It just takes a long time to take a code base that large and untangle the code from such basic assumptions.
This appears to be changing, as the release nodes for LLVM 3.6 indicated. But it will take time. In the mean time languages on LLVM are either going with conservative GCs, or with precise GCs with the performance degradation noted above (e.g. OCaml), or with no GC (e.g. Rust for the time being).
Most of the precise GCs I know of are in JIT'd or interpreted languages
GC is orthogonal to JITing. OCaml and Haskell are two examples of languages with non-JIT AOT compilers (ocamlopt and ghc, respectively) that do precise GC. I'm sure there are plenty of others.
Yep, you're right, it is. And I totally forgot about those compiled functional languages. Thanks! I mentioned JITs and interpreters because those are the obvious cases where type information still exists at runtime, but the compiler can emit that metadata too (as I mentioned).
GCing has nothing to do with a language being functional.
All you need is access to the roots and the ability always to determine which data is another pointer. The main 'enemy' of this is the ability to cast integers to pointers (as you may in C/C++).
Interpreters don't necessarily do GC: if the language that the interpreter is written in has GC, then the interpreted language is automatically GCed. If the interpreter is written in a non-GC language you have to manage the interpreted program's memory manually too (which you may do using a GC for the interpreted language).
As an aside: this is comparable to evaluation order: if the interpreter is written in a call-by-value language, then the interpreted language is CBV, if the interpreter is written in a lazy language, the interpreted language has lazy evaluation. (Assuming you are not doing something fancy like translation to continuation-passing-style.)
> GCing has nothing to do with a language being functional
Yep, I didn't say it did! You mentioned Haskell and OCaml as counterexamples. I referred to them as "the compiled functional languages". This isn't a claim that only compiled functional languages do GC.
> Interpreters don't do GC
Sure they do! One example: Ruby (the original interpreter in C). It has a full mark/sweep collector that traverses user data structures. Custom object types implement their own 'mark' routines, so it's fully precise. (I've implemented a C extension that uses this API.)
Another example: often, Java runtimes interpret Java bytecode on the first pass, before JIT'ing, to give fast startup on cold code. The JRE would be written in C or C++ (non-GC'd), and Java is GC'd.
In general an interpreter can add any additional or different semantics it needs to on top of the host language's semantics. I don't think your call-by-value claim is correct either, because function calls in the interpreted program do not map one-to-one to function calls in the interpreter. (The interpreter would generally manage a data structure representing the interpreted program's stack.)
My goodness, this is going off the rails. Sorry to OP for derailing. Just trying to fix things that are Wrong On The Internet...
One example: Ruby (the original interpreter in C).
OK, I should have said, and have now modified my original statement to "Interpreters don't necessarily do GC".
In general an interpreter can add any additional or different semantics
Yes, and that falls under "something fancy"! The naive way of interpreting object-language application by meta-language application transfers the latter's calling discipline to the former.
Of course, but no one claimed that interpreters necessarily do GC either, so this statement is not very useful. You keep arguing against claims that aren't there. The only claim ever made in response to OP was that precise GC requires runtime information, and that this can sometimes be easier when it exists anyway (as in JITs and real-world interpreters), but can also be implemented when the compiler produces it AOT. I hope that's clear enough :-)
What about all the ML-family functional languages? JIT compilation is a fairly recent innovation and I don't think its accurate to say that garbage collection is restricted to JIT compilers.
I never said precise GC (or even GC in general -- see Boehm GC which I cited) is restricted to JIT! Only that this is perhaps the most common case, because type info still exists at runtime. As I mentioned, the compiler can emit the necessary metadata for AOT cases. And I forgot to mention ML and Haskell and friends, sorry.
Even e.g. SBCL which is an AOT lisp runs into the issue if you share the C stack (needed for FFI) with the lisp control stack. On targets with few registers (i.e. not SPARC or Power) it uses a conservative GC that pins any memory pointed to on the control stack, and a precise GC for all heap objects.
If you want a precise garbage collector, you need type information to know which heap and stack slots and registers are pointers. This is much harder because your compiler needs to emit metadata for each struct and then keep stack frame info or somesuch at the GC safe points so the runtime knows what the thread's roots are.
Most of the precise GCs I know of are in JIT'd or interpreted languages where you have this type info anyway... AOT-compiled Lisp is the one counterexample I can think of, but usually Lisps solve this in a different way by using tagged pointers (so you know if any heap slot is a pointer by e.g. its LSB).