Lean Build Bootstrapping
Since version 4, Lean is a partially bootstrapped program: most parts of the frontend and compiler are written in Lean itself and thus need to be built before building Lean itself - which is needed to again build those parts. This cycle is broken by using pre-built C files checked into the repository (which ultimately go back to a point where the Lean compiler was not written in Lean) in place of these Lean inputs and then compiling everything in multiple stages up to a fixed point. The build directory is organized in these stages:
stage0/
# Bootstrap binary built from stage0/src/.
# We do not use any other files from this directory in further stages.
bin/lean
stage1/
include/
config.h # config variables used to build `lean` such as used allocator
runtime/lean.h # runtime header, used by extracted C code, uses `config.h`
share/lean/
lean.mk # used by `leanmake`
lib/
lean/**/*.olean # the Lean library (incl. the compiler) compiled by the previous stage's `lean`
temp/**/*.{c,o} # the library extracted to C and compiled by `leanc`
libInit.a libLean.a # static libraries of the Lean library
libleancpp.a # a static library of the C++ sources of Lean
libleanshared.so # a dynamic library including the static libraries above
bin/
lean # the Lean compiler & server, a small executable that calls directly into libleanshared.so
leanc # a wrapper around a C compiler supplying search paths etc
leanmake # a wrapper around `make` supplying the Makefile above
stage2/...
stage3/...
Stage 0 can be viewed as a blackbox since it does not depend on any local
changes and is equivalent to downloading a bootstrapping binary as done in other
compilers. The build for any other stage starts by building the runtime and
standard library from src/
, using the lean
binary from the previous stage in
the latter case, which are then assembled into a new bin/lean
binary.
Each stage can be built by calling make stageN
in the root build folder.
Running just make
will default to stage 1, which is usually sufficient for
testing changes on the test suite or other files outside of the stdlib. However,
it might happen that the stage 1 compiler is not able to load its own stdlib,
e.g. when changing the .olean format: the stage 1 stdlib will use the format
generated by the stage 0 compiler, but the stage 1 compiler will expect the new
format. In this case, we should continue with building and testing stage 2
instead, which will both build and expect the new format. Note that this is only
possible because when building a stage's stdlib, we use the previous compiler
but never load the previous stdlib (since everything is prelude
). We can also
use stage 2 to test changes in the compiler or other "meta" parts, i.e. changes
that affect the produced (.olean or .c) code, on the stdlib and compiler itself.
We are not aware of any "meta-meta" parts that influence more than two stages of
compilation, so stage 3 should always be identical to stage 2 and only exists as
a sanity check.
In summary, doing a standard build via make
internally involves these steps:
- compile the
stage0/src
archived sources intostage0/bin/lean
- use it to compile the current library (including your changes) into
stage1/lib
- link that and the current C++ code from
src/
intostage1/bin/lean
You now have a Lean binary and library that include your changes, though their own compilation was not influenced by them, that you can use to test your changes on test programs whose compilation will be influenced by the changes.
Finally, when we want to use new language features in the library, we need to
update the stage 0 compiler, which can be done via make -C stageN update-stage0
.
make update-stage0
without -C
defaults to stage1.
Further Bootstrapping Complications
As written above, changes in meta code in the current stage usually will only affect later stages. This is an issue in two specific cases.
-
For non-builtin meta code such as
notation
s ormacro
s inNotation.lean
, we expect changes to affect the current file and all later files of the same stage immediately, just like outside the stdlib. To ensure this, we need to build the stage using-Dinterpreter.prefer_native=false
- otherwise, when executing a macro, the interpreter would notice that there is already a native symbol available for this function and run it instead of the new IR, but the symbol is from the previous stage!To make matters more complicated, while
false
is a reasonable default incurring only minor overhead (ParserDescr
s and simple macros are cheap to interpret), there are situations where we need to set the option totrue
: when the interpreter is executed from the native code of the previous stage, the type of the value it computes must be identical to/ABI-compatible with the type in the previous stage. For example, if we add a new parameter toMacro
or reorder constructors inParserDescr
, building the stage with the interpreter will likely fail. Thus we need to setinterpreter.prefer_native
totrue
in such cases to "freeze" meta code at their versions in the previous stage; no new meta code should be introduced in this stage. Any further stages (e.g. after anupdate-stage0
) will then need to be compiled with the flag set tofalse
again since they will expect the new signature.For an example, see https://github.com/leanprover/lean4/commit/da4c46370d85add64ef7ca5e7cc4638b62823fbb.
-
For the special case of quotations, it is desirable to have changes in built-in parsers affect them immediately: when the changes in the parser become active in the next stage, macros implemented via quotations should generate syntax trees compatible with the new parser, and quotation patterns in macro and elaborators should be able to match syntax created by the new parser and macros. Since quotations capture the syntax tree structure during execution of the current stage and turn it into code for the next stage, we need to run the current stage's built-in parsers in quotation via the interpreter for this to work. Caveats:
- Since interpreting full parsers is not nearly as cheap and we rarely change
built-in syntax, this needs to be opted in using
-Dinternal.parseQuotWithCurrentStage=true
. - The parser needs to be reachable via an
import
statement, otherwise the version of the previous stage will silently be used. - Only the parser code (
Parser.fn
) is affected; all metadata such as leading tokens is taken from the previous stage.
For an example, see https://github.com/leanprover/lean4/commit/f9dcbbddc48ccab22c7674ba20c5f409823b4cc1#diff-371387aed38bb02bf7761084fd9460e4168ae16d1ffe5de041b47d3ad2d22422 (from before the flag defaulted to
false
). - Since interpreting full parsers is not nearly as cheap and we rarely change
built-in syntax, this needs to be opted in using
To modify either of these flags both for building and editing the stdlib, adjust
the code in stage0/src/stdlib_flags.h
. The flags will automatically be reset
on the next update-stage0
when the file is overwritten with the original
version in src/
.