Structure of a bitcode file

An LLVM bitcode file is organized as a sequence of blocks and data records, as shown in Figure 2.1. A file header is at the beginning of the bitcode file. The first two bytes of the file are 'BC', and the next two bytes contain an application-specific magic number.

Figure 2.1
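The following C sketch checks the header described above. It tests whether a buffer begins with the magic number used by LLVM IR bitcode: 'B' (0x42) and 'C' (0x43), followed by the bytes 0xC0 and 0xDE. The function names are hypothetical. The sketch assumes a raw bitcode file, so a file that starts with a bitcode wrapper header is reported as not matching.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if buf begins with 'B', 'C', 0xC0, 0xDE (LLVM IR bitcode), else 0. */
static int is_llvm_ir_bitcode(const unsigned char *buf, size_t len) {
    static const unsigned char magic[4] = { 'B', 'C', 0xC0, 0xDE };
    if (len < sizeof magic)
        return 0;
    for (size_t i = 0; i < sizeof magic; i++)
        if (buf[i] != magic[i])
            return 0;
    return 1;
}

/* Reads the first four bytes of a file and checks them.
   Returns 1 or 0 as above, or -1 if the file cannot be opened. */
static int file_is_llvm_ir_bitcode(const char *path) {
    unsigned char head[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(head, 1, sizeof head, f);
    fclose(f);
    return is_llvm_ir_bitcode(head, n);
}
```

A file shorter than four bytes is reported as not matching, because fread returns fewer bytes than the magic number needs.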

A bitcode parser processes the bitcode file sequentially, as a stream of bits. In this document, such a stream of bits is called a bitstream, or simply a stream.
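Reading the file as a stream of bits can be sketched with a small bit reader. The sketch assumes the convention described in the LLVM bitcode format documentation: bits are consumed starting from the least significant bit of each byte. Under that convention, the magic bytes 0xC0 and 0xDE read as the four 4-bit fields 0x0, 0xC, 0xE and 0xD. BitReader, bit_reader_init and read_fixed are hypothetical names, not LLVM's API.

```c
#include <stddef.h>
#include <stdint.h>

/* A minimal reader over an in-memory bitstream. Bits are consumed starting
   from the least significant bit of each byte. */
typedef struct {
    const unsigned char *buf;
    size_t nbits; /* total number of bits in buf */
    size_t pos;   /* index of the next bit to read */
} BitReader;

static void bit_reader_init(BitReader *r, const unsigned char *buf, size_t len) {
    r->buf = buf;
    r->nbits = len * 8;
    r->pos = 0;
}

/* Reads an unsigned field of `width` bits (at most 64), low bits first.
   Returns 0 on success, or -1 if fewer than `width` bits remain. */
static int read_fixed(BitReader *r, unsigned width, uint64_t *out) {
    if (width > 64 || r->nbits - r->pos < width)
        return -1;
    uint64_t value = 0;
    for (unsigned i = 0; i < width; i++, r->pos++) {
        unsigned bit = (r->buf[r->pos / 8] >> (r->pos % 8)) & 1u;
        value |= (uint64_t)bit << i;
    }
    *out = value;
    return 0;
}
```

For example, reading the header bytes as two 8-bit fields and then four 4-bit fields returns 'B', 'C', 0x0, 0xC, 0xE and 0xD.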

A block in a bitstream can contain nested blocks, called sub-blocks, as well as data records. Both blocks and records start with an abbreviation ID, also referred to as a record ID in this document. An abbreviation ID is either a builtin ID with a special meaning or an ID defined in the stream itself. There are four builtin abbreviation IDs, as shown in Table 2.1.

ID  Type             Description
0   END_BLOCK        Marks the end of the current block
1   ENTER_SUBBLOCK   Marks the start of a new block
2   DEFINE_ABBREV    Defines a new abbreviation
3   UNABBREV_RECORD  Marks a record that is not abbreviated

Table 2.1

A stream-defined abbreviation ID must be defined before it is used. Such IDs are defined with DEFINE_ABBREV records, which are explained later in the section on DEFINE_ABBREV.
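The C sketch below makes Table 2.1 and the define-before-use rule concrete. It classifies an abbreviation ID as one of the builtin IDs or as a stream-defined ID. It uses the constant names of LLVM's FixedAbbrevIDs enumeration, which calls ID 0 END_BLOCK and gives the first stream-defined ID the value 4 (FIRST_APPLICATION_ABBREV). The function describe_abbrev_id and its num_defined counter are hypothetical. A real parser tracks the abbreviations available to each block, including any registered through the BLOCKINFO block.

```c
#include <stddef.h>
#include <stdint.h>

/* Builtin abbreviation IDs from Table 2.1, named as in LLVM's FixedAbbrevIDs. */
enum {
    END_BLOCK = 0,
    ENTER_SUBBLOCK = 1,
    DEFINE_ABBREV = 2,
    UNABBREV_RECORD = 3,
    FIRST_APPLICATION_ABBREV = 4 /* first ID available to DEFINE_ABBREV */
};

/* Classifies an abbreviation ID read from the stream. `num_defined` is a
   hypothetical count of abbreviations defined so far for the current block.
   Returns NULL for an ID that has not been defined yet (use before definition). */
static const char *describe_abbrev_id(uint32_t id, uint32_t num_defined) {
    switch (id) {
    case END_BLOCK:       return "END_BLOCK";
    case ENTER_SUBBLOCK:  return "ENTER_SUBBLOCK";
    case DEFINE_ABBREV:   return "DEFINE_ABBREV";
    case UNABBREV_RECORD: return "UNABBREV_RECORD";
    default:
        /* Stream-defined IDs start at 4; each new definition takes the next ID. */
        return id - (uint32_t)FIRST_APPLICATION_ABBREV < num_defined ? "stream-defined"
                                                                     : NULL;
    }
}
```

For example, describe_abbrev_id(4, 0) returns NULL because no abbreviation has been defined yet. After one DEFINE_ABBREV, describe_abbrev_id(4, 1) returns "stream-defined".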

Listing 2.1 shows a simple C program, sample.c, which is used to explain the organisation of the bitcode file.

int main(){
    int a = 2;
    a = a + 5;
    return a;
}

Listing 2.1

The following command compiles sample.c to LLVM IR in bitcode form, stored in sample.bc.

clang -emit-llvm -c sample.c -o sample.bc

A bitstream is not human readable, but it can be inspected with the llvm-bcanalyzer tool. The following command dumps the contents of the sample.bc file generated earlier.

llvm-bcanalyzer -dump sample.bc

Listing 2.2 shows the high-level block structure of sample.bc, abridged from the llvm-bcanalyzer output.

<MODULE_BLOCK NumWords=222 BlockCodeSize=>
  ...
  <TYPE_BLOCK_ID ...>
  ...
  </TYPE_BLOCK_ID>
  <TRIPLE .../>
  <DATALAYOUT .../>
  ...
  <FUNCTION_BLOCK NumWords=17 BlockCodeSize=4>
    <DECLAREBLOCKS op0=1/>
    <CONSTANTS_BLOCK NumWords=2 BlockCodeSize=4>
      ...
    </CONSTANTS_BLOCK>
    <INST_ALLOCA .../>
    ...
    <INST_RET abbrevid=9 op0=1/>
    ...
  </FUNCTION_BLOCK>
</MODULE_BLOCK>

Listing 2.2

Listing 2.2 clearly shows the sequence of blocks in the bitcode file. MODULE_BLOCK wraps all the other blocks. It contains a TRIPLE record holding the target triple string and a DATALAYOUT record holding the data layout specification. FUNCTION_BLOCK is an example of a nested block. It has a CONSTANTS_BLOCK sub-block that lists the constants, and it has records that represent the instructions of the code in Listing 2.1.

Figure 2.2 shows the logical structure of a bitcode file as a combination of blocks and records. Listing 2.2 shows the same content in an XML-like format. Physically, however, a bitcode file is stored simply as a sequence of variable-size records, as shown in Figure 2.1.
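Records take different numbers of bits partly because many fields are stored as variable bit rate (VBR) integers. The LLVM bitcode documentation describes this encoding as follows. A VBR-n field is split into n-bit chunks, low-order chunk first. Each chunk carries n-1 value bits, and its high bit is set when another chunk follows. The self-contained decoder below is a minimal sketch of that scheme. BitReader, read_fixed and read_vbr are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const unsigned char *buf;
    size_t nbits; /* total number of bits in buf */
    size_t pos;   /* index of the next bit to read */
} BitReader;

/* Reads a fixed-width field, least significant bit first; -1 if too short. */
static int read_fixed(BitReader *r, unsigned width, uint64_t *out) {
    if (width > 64 || r->nbits - r->pos < width)
        return -1;
    uint64_t value = 0;
    for (unsigned i = 0; i < width; i++, r->pos++)
        value |= (uint64_t)((r->buf[r->pos / 8] >> (r->pos % 8)) & 1u) << i;
    *out = value;
    return 0;
}

/* Reads a VBR-`width` value: each chunk holds width-1 value bits, and its
   top bit says whether another chunk follows. */
static int read_vbr(BitReader *r, unsigned width, uint64_t *out) {
    if (width < 2 || width > 32)
        return -1;
    const uint64_t cont = (uint64_t)1 << (width - 1); /* continuation bit */
    uint64_t chunk, value = 0;
    unsigned shift = 0;
    do {
        if (shift >= 64 || read_fixed(r, width, &chunk) != 0)
            return -1;
        value |= (chunk & (cont - 1)) << shift;
        shift += width - 1;
    } while (chunk & cont);
    *out = value;
    return 0;
}
```

For instance, 27 encoded as vbr4 is the chunk 1011 (value bits 011, continuation set) followed by 0011. Small values therefore take a single chunk, and larger values grow one chunk at a time.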

Figure 2.2

Figure 2.2 represents the sample.bc file. The bitcode parser interprets the records in the file, and this interpretation gives the file the logical structure of nested blocks and records shown in the figure.
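This interpretation step can be sketched as a loop that reads one abbreviation ID at a time and keeps a stack of open blocks. The sketch follows the layouts given in the LLVM bitcode format documentation:

- Abbreviation IDs are 2 bits wide at the top level.
- ENTER_SUBBLOCK is followed by a vbr8 block ID, a vbr4 abbreviation width for the new block, alignment to 32 bits, and a 32-bit block length in words.
- END_BLOCK is followed by alignment to 32 bits.
- UNABBREV_RECORD is followed by a vbr6 record code, a vbr6 operand count, and vbr6 operands.

walk_bitstream and WalkStats are hypothetical names. To stay short, the sketch rejects DEFINE_ABBREV and stream-defined IDs instead of decoding them. As a result it cannot walk a real file such as sample.bc, which uses abbreviations (for example abbrevid=9 in Listing 2.2).

```c
#include <stddef.h>
#include <stdint.h>

enum { END_BLOCK = 0, ENTER_SUBBLOCK = 1, DEFINE_ABBREV = 2, UNABBREV_RECORD = 3 };
#define MAX_DEPTH 16 /* deepest block nesting this sketch accepts */
#define MAX_OPS 8    /* operands kept from the most recent record */

typedef struct { const unsigned char *buf; size_t nbits, pos; } Bits; /* pos = next bit */

typedef struct {
    int blocks, records, max_depth;
    uint64_t last_code, last_numops, last_ops[MAX_OPS];
} WalkStats;

/* Fixed-width field, least significant bit first; -1 if the stream is too short. */
static int read_fixed(Bits *b, unsigned width, uint64_t *out) {
    if (width > 64 || b->nbits - b->pos < width)
        return -1;
    uint64_t v = 0;
    for (unsigned i = 0; i < width; i++, b->pos++)
        v |= (uint64_t)((b->buf[b->pos / 8] >> (b->pos % 8)) & 1u) << i;
    *out = v;
    return 0;
}

/* VBR field (width 4..8 here): width-1 value bits per chunk, top bit = more follows. */
static int read_vbr(Bits *b, unsigned width, uint64_t *out) {
    const uint64_t cont = (uint64_t)1 << (width - 1);
    uint64_t chunk, v = 0;
    unsigned shift = 0;
    do {
        if (shift >= 64 || read_fixed(b, width, &chunk) != 0)
            return -1;
        v |= (chunk & (cont - 1)) << shift;
        shift += width - 1;
    } while (chunk & cont);
    *out = v;
    return 0;
}

/* Advances to the next multiple of 32 bits. */
static int align32(Bits *b) {
    size_t p = (b->pos + 31) & ~(size_t)31;
    if (p > b->nbits)
        return -1;
    b->pos = p;
    return 0;
}

/* Walks a stream that begins with the 4-byte magic number (not re-checked).
   Returns 0 on success, -1 on malformed input or unsupported abbreviations. */
static int walk_bitstream(const unsigned char *buf, size_t len, WalkStats *st) {
    Bits b = { buf, len * 8, 32 };         /* start right after the magic */
    unsigned width[MAX_DEPTH + 1] = { 2 }; /* abbrev ID width of each open block */
    size_t end[MAX_DEPTH + 1] = { 0 };     /* bit where each open block must end */
    int depth = 0;
    uint64_t id;
    *st = (WalkStats){ 0 };
    if (len < 4) return -1;
    while (depth > 0 || b.pos < b.nbits) {
        if (read_fixed(&b, width[depth], &id) != 0)
            return -1;
        if (id == ENTER_SUBBLOCK) {
            uint64_t block_id, new_width, nwords; /* block_id is not used further */
            if (depth == MAX_DEPTH || read_vbr(&b, 8, &block_id) ||
                read_vbr(&b, 4, &new_width) || align32(&b) ||
                read_fixed(&b, 32, &nwords) || new_width == 0 || new_width > 32)
                return -1;
            depth++;
            width[depth] = (unsigned)new_width;
            end[depth] = b.pos + (size_t)nwords * 32;
            st->blocks++;
            if (depth > st->max_depth) st->max_depth = depth;
        } else if (id == END_BLOCK) {
            /* The block must end exactly where its length field said it would. */
            if (depth == 0 || align32(&b) || b.pos != end[depth])
                return -1;
            depth--;
        } else if (id == UNABBREV_RECORD) {
            uint64_t code, numops, op;
            if (read_vbr(&b, 6, &code) || read_vbr(&b, 6, &numops))
                return -1;
            for (uint64_t i = 0; i < numops; i++) {
                if (read_vbr(&b, 6, &op))
                    return -1;
                if (i < MAX_OPS) st->last_ops[i] = op;
            }
            st->records++;
            st->last_code = code; st->last_numops = numops;
        } else {
            return -1; /* DEFINE_ABBREV or a stream-defined ID: not decoded here */
        }
    }
    return 0;
}
```

Consider a hand-assembled 20-byte stream made of the magic number followed by one block holding a single unabbreviated record. Walking it should yield blocks == 1, records == 1 and max_depth == 1. Checking each END_BLOCK against the recorded block length is one simple way to detect a stream whose nesting does not match its length fields.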
