If you want an imprecise (conservative) garbage collector, the GC and the compil...

eslaught · on Aug 3, 2015

The issue is not having the type info available. AOT compilers certainly have that (or at least, enough of it to get the job done).

The issue is that most compilers with precise GC---AOT or otherwise---are built around that assumption from the very beginning. Therefore, they choose representations, algorithms, and optimizations that are amenable to GC, and in particular don't destroy the type info necessary to make GC work.

It is possible in theory to do the same with LLVM, but difficult in practice. Most existing LLVM-based GC'd languages compromise on performance by requiring LLVM to "root" pointers, effectively forcing them to stay on the stack (instead of registers) so that the GC can see them when it needs to. LLVM's optimizations (particularly at the machine-specific level) were previously allowed to do arbitrary things with register-based pointers, which is obviously bad for a GC if no copies of those pointers exist on the stack. It just takes a long time to take a code base that large and untangle the code from such basic assumptions.

This appears to be changing, as the release nodes for LLVM 3.6 indicated. But it will take time. In the mean time languages on LLVM are either going with conservative GCs, or with precise GCs with the performance degradation noted above (e.g. OCaml), or with no GC (e.g. Rust for the time being).

Source: My talk on precise GC for Rust (from 2012): https://github.com/elliottslaughter/rust-gc-talk

mafribe · on Aug 3, 2015

   Most of the precise GCs I know of are in JIT'd or interpreted languages

GC is orthogonal to JITing. OCaml and Haskell are two examples of languages with non-JIT AOT compilers (ocamlopt and ghc, respectively) that do precise GC. I'm sure there are plenty of others.

cfallin · on Aug 3, 2015

Yep, you're right, it is. And I totally forgot about those compiled functional languages. Thanks! I mentioned JITs and interpreters because those are the obvious cases where type information still exists at runtime, but the compiler can emit that metadata too (as I mentioned).

mafribe · on Aug 3, 2015

GCing has nothing to do with a language being functional. All you need is access to the roots and the ability always to determine which data is another pointer. The main 'enemy' of this is the ability to cast integers to pointers (as you may in C/C++).

Interpreters don't necessarily do GC: if the language that the interpreter is written in has GC, then the interpreted language is automatically GCed. If the interpreter is written in a non-GC language you have to manage the interpreted program's memory manually too (which you may do using a GC for the interpreted language). As an aside: this is comparable to evaluation order: if the interpreter is written in a call-by-value language, then the interpreted language is CBV, if the interpreter is written in a lazy language, the interpreted language has lazy evaluation. (Assuming you are not doing something fancy like translation to continuation-passing-style.)

cfallin · on Aug 3, 2015

> GCing has nothing to do with a language being functional

Yep, I didn't say it did! You mentioned Haskell and OCaml as counterexamples. I referred to them as "the compiled functional languages". This isn't a claim that only compiled functional languages do GC.

> Interpreters don't do GC

Sure they do! One example: Ruby (the original interpreter in C). It has a full mark/sweep collector that traverses user data structures. Custom object types implement their own 'mark' routines, so it's fully precise. (I've implemented a C extension that uses this API.)

Another example: often, Java runtimes interpret Java bytecode on the first pass, before JIT'ing, to give fast startup on cold code. The JRE would be written in C or C++ (non-GC'd), and Java is GC'd.

In general an interpreter can add any additional or different semantics it needs to on top of the host language's semantics. I don't think your call-by-value claim is correct either, because function calls in the interpreted program do not map one-to-one to function calls in the interpreter. (The interpreter would generally manage a data structure representing the interpreted program's stack.)

My goodness, this is going off the rails. Sorry to OP for derailing. Just trying to fix things that are Wrong On The Internet...

mafribe · on Aug 3, 2015

   One example: Ruby (the original interpreter in C).

OK, I should have said, and have now modified my original statement to "Interpreters don't necessarily do GC".

    In general an interpreter can add any additional or different semantics

Yes, and that falls under "something fancy"! The naive way of interpreting object-language application by meta-language application transfers the latter's calling discipline to the former.

cfallin · on Aug 3, 2015

> Interpreters don't necessarily do GC

Of course, but no one claimed that interpreters necessarily do GC either, so this statement is not very useful. You keep arguing against claims that aren't there. The only claim ever made in response to OP was that precise GC requires runtime information, and that this can sometimes be easier when it exists anyway (as in JITs and real-world interpreters), but can also be implemented when the compiler produces it AOT. I hope that's clear enough :-)

ufo · on Aug 3, 2015

What about all the ML-family functional languages? JIT compilation is a fairly recent innovation and I don't think its accurate to say that garbage collection is restricted to JIT compilers.

cfallin · on Aug 3, 2015

I never said precise GC (or even GC in general -- see Boehm GC which I cited) is restricted to JIT! Only that this is perhaps the most common case, because type info still exists at runtime. As I mentioned, the compiler can emit the necessary metadata for AOT cases. And I forgot to mention ML and Haskell and friends, sorry.

slrz · on Aug 3, 2015

Go also has a precise garbage collector, by the way.

kragen · on Aug 3, 2015

AOT-compiled Lisp generally retains the type info you need at runtime in one way or another.

A precise garbage collector for a normal-ish implementation of C, assembly, or Forth is potentially very difficult, though.

aidenn0 · on Aug 4, 2015

Even e.g. SBCL which is an AOT lisp runs into the issue if you share the C stack (needed for FFI) with the lisp control stack. On targets with few registers (i.e. not SPARC or Power) it uses a conservative GC that pins any memory pointed to on the control stack, and a precise GC for all heap objects.