MLIR-AIE A-Z | Tools - Albresky's Blog

Albresky, filed under AIE, MLIR-AIR

2025-02-18 2026-02-07

Building and configuring Xilinx's open-source MLIR-AIR/AIE tools

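The officially maintained documentation is somewhat disorganized: key details are missing and some of the information is outdated. This post walks through the build and configuration process for Xilinx/MLIR-AIR.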

1. Prerequisites

Make sure the following tools are installed on the system:

cmake 3.20.6
clang/llvm 10+
lld
python 3.8.x
ninja 1.10.0
Xilinx Vitis 2024.2 
libelf
ROCm 5.6
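
A quick way to see which of these tools are missing is to probe PATH for their executables. The sketch below is only a convenience check under stated assumptions: the executable names (cmake, clang, ld.lld, python3, ninja) are guesses at how the packages are usually installed, and it does not verify versions or cover Vitis, libelf, or ROCm.

```shell
# Print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Executable names are assumptions; adjust them to your distribution
# (for example, clang may be installed as clang-14).
missing=$(missing_tools cmake clang ld.lld python3 ninja)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing prerequisites:" $missing
else
  echo "All listed prerequisites were found on PATH."
fi
```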

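Vitis ships xchesscc, the compiler for individual AIE kernels, while the aienginev2 library is the driver library used to configure the whole AIE array and the AIR runtime. Together, xchesscc and aienginev2 make up the full toolset needed to build AIE runtime binaries.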

The settings64.sh script bundled with the Vitis IDE can cause tool conflicts with a local AIR build; only the tools under Vitis/bin and aietools are needed here. Add them to PATH as follows:

export PATH=$PATH:<Vitis_install_path>/Vitis/2024.2/aietools/bin
export PATH=$PATH:<Vitis_install_path>/Vitis/2024.2/bin
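
If these exports live in a shell profile that gets sourced repeatedly, PATH accumulates duplicate entries. A minimal sketch of an idempotent variant follows; `path_append` and `VITIS_ROOT` are hypothetical names introduced for illustration, and the default location is only the one this post uses later for XILINX_VITIS.

```shell
# Append a directory to PATH only if it is not already present.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                      # already on PATH, nothing to do
    *) PATH="${PATH:+$PATH:}$1" ;;
  esac
}

# VITIS_ROOT is a hypothetical variable standing in for
# <Vitis_install_path>/Vitis/2024.2 from the text above.
VITIS_ROOT="${VITIS_ROOT:-/usr/xilinx/Vitis/2024.2}"
path_append "$VITIS_ROOT/aietools/bin"
path_append "$VITIS_ROOT/bin"
export PATH

# xchesscc ships with the Vitis tools; report whether the shell can see it.
command -v xchesscc >/dev/null 2>&1 || echo "xchesscc not found; check VITIS_ROOT" >&2
```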

Building MLIR-AIR depends on the following repositories:

1.1 Build `aienginev2` from source

Clone the stephenneuendorffer/aie-rt repository, switch to the phoenix_v2023.2 branch, and build aiengine.

git clone https://github.com/stephenneuendorffer/aie-rt
cd aie-rt
git checkout phoenix_v2023.2
cd driver/src
make -f Makefile.Linux CFLAGS="-D__AIEAMDAIR__"

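Copy the built libxaiengine shared library and the headers into a custom AIE runtime directory (for example /usr/xilinx/xaiengine, the path the later steps assume):

# Create the target directories first; cp does not create them
sudo mkdir -p /usr/xilinx/xaiengine/lib
sudo cp -r ../include /usr/xilinx/xaiengine/
sudo cp libxaiengine.so* /usr/xilinx/xaiengine/lib/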

Then set the environment variable:

export LD_LIBRARY_PATH=/usr/xilinx/xaiengine/lib:${LD_LIBRARY_PATH}
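
Before moving on, it can help to confirm that the runtime directory really contains what the later build steps look for. The following is a rough sketch, not part of the official flow: `has_xaiengine` and `XAIE_DIR` are hypothetical names, and the check only looks for an include directory and a libxaiengine.so* file.

```shell
# Return 0 if DIR looks like a usable aiengine install:
# an include/ directory plus at least one lib/libxaiengine.so* file.
has_xaiengine() {
  dir=$1
  [ -d "$dir/include" ] || return 1
  for lib in "$dir"/lib/libxaiengine.so*; do
    [ -e "$lib" ] && return 0
  done
  return 1
}

# The default path matches the LD_LIBRARY_PATH entry used above.
XAIE_DIR="${XAIE_DIR:-/usr/xilinx/xaiengine}"
if has_xaiengine "$XAIE_DIR"; then
  echo "aiengine runtime found in $XAIE_DIR"
else
  echo "aiengine runtime not found in $XAIE_DIR" >&2
fi
```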

2. Building the MLIR-AIR tools from source

2.1. Clone the MLIR-AIR source

git clone https://github.com/Xilinx/mlir-air.git mlir-air
cd mlir-air

2.2. Install the Python dependencies

source utils/setup_python_packages.sh

2.3. Build LLVM

./utils/clone-llvm.sh
./utils/build-llvm-local.sh llvm

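By default, clone-llvm.sh clones the llvm repository into ./llvm. Whether a build script carries the -local suffix (build-llvm-local.sh here, build-mlir-aie-local.sh in the next step) indicates whether it is intended for local builds or for CI builds.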

2.4. Build MLIR-AIE

MLIR-AIE depends on the LLVM and aiengine builds from the previous steps, and on the Xilinx/cmakeModules repository.

./utils/clone-mlir-aie.sh
./utils/build-mlir-aie-local.sh llvm mlir-aie/cmake/modulesXilinx /usr/xilinx/xaiengine mlir-aie

By default, clone-mlir-aie.sh clones the mlir-aie repository into ./mlir-aie.

2.5. Patch the MLIR Python bindings

In the utils directory of the cloned mlir-aie repository, create a script named patch-py-mlir-extras.sh with the following content:

#!/bin/bash
set -e

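# Require the argument before resolving it: under `set -e`, a failing
# `realpath ""` would abort the script before the usage message is printed
if [ -z "${1:-}" ]; then
    echo "Error: Installation directory not specified"
    echo "Usage: $0 <install_directory>"
    exit 1
fi

# Convert installation directory to absolute path
INSTALL_DIR=$(realpath "$1")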

# Check if python directory exists
if [ ! -d "${INSTALL_DIR}/python" ]; then
    echo "Error: ${INSTALL_DIR}/python directory does not exist"
    exit 1
fi

# Check if aie package exists
if [ ! -d "${INSTALL_DIR}/python/aie" ]; then
    echo "Error: ${INSTALL_DIR}/python/aie directory does not exist"
    exit 1
fi

# Set MLIR Python package prefix for AIE
export HOST_MLIR_PYTHON_PACKAGE_PREFIX=aie

# Create temp directory for installation
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT

# Create a virtual environment for building
python3 -m venv "${TEMP_DIR}/venv"
source "${TEMP_DIR}/venv/bin/activate"

# Clone mlir-python-extras
git clone https://github.com/makslevental/mlir-python-extras.git "${TEMP_DIR}/src"
cd "${TEMP_DIR}/src"

# Install build dependencies in the virtual environment
pip install -r requirements.txt

# Build the package
python setup.py build

# Only copy the extras directory
cp -r build/lib/aie/extras "${INSTALL_DIR}/python/aie/"

# Deactivate virtual environment
deactivate

# Verify installation
echo "Verifying installation..."
PYTHONPATH="${INSTALL_DIR}/python" python3 -c "
import aie.extras.runtime
print('MLIR extras installation successful')
"

echo "Result: Patching successfully"

Then run the patch script from the mlir-aie directory:

chmod +x utils/patch-py-mlir-extras.sh
./utils/patch-py-mlir-extras.sh <absolute_path_to_install_directory>

If the script ends by printing "Result: Patching successfully", the patch was applied.

2.6. Install the AIR-specific ROCm runtime

Clone the Xilinx/ROCm-air-platforms repository:

./utils/clone-rocm-air-platforms.sh

Build the ROCm runtime for PCIe devices (the VCK5000 card); the build and install directories are ./build-pcie and ./install-pcie respectively:

./utils/build-mlir-air-pcie.sh llvm/ mlir-aie/cmake/modulesXilinx/ mlir-aie/ /usr/xilinx/xaiengine/ ${ROCM_ROOT}/lib/cmake/hsa-runtime64/ ${ROCM_ROOT}/lib/cmake/hsakmt/

(Or) build the runtime locally against a sysroot; the build and install directories are ./build and ./install respectively:

./utils/build-mlir-air.sh $SYSROOT_DIR llvm/ mlir-aie/cmake/modulesXilinx/ mlir-aie/ ${ROCM_ROOT}/lib/cmake/hsa-runtime64/ ${ROCM_ROOT}/lib/cmake/hsakmt/
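
Both build commands expand ROCM_ROOT (and, for the sysroot build, SYSROOT_DIR) on the command line, so an unset variable silently turns into a wrong path. A small guard such as the sketch below can catch that early; `require_vars` is a hypothetical helper, not something shipped with MLIR-AIR.

```shell
# Return non-zero and name the variable if any listed variable is unset or empty.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      return 1
    fi
  done
}

# Guard for the sysroot build; for the PCIe build only ROCM_ROOT is needed.
if require_vars ROCM_ROOT SYSROOT_DIR; then
  echo "environment looks complete; ready to run ./utils/build-mlir-air.sh"
fi
```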

2.7. Configure the MLIR-AIR environment variables

First copy the MLIR-AIR install directories to a system tools directory. Three kinds of directories from the build output are needed:

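mlir-air/mlir-aie/install: (Ryzen AI devices) contains the headers, libraries, binary tools, Python bindings, etc. for AIE and AIE2;
mlir-air/install-pcie: contains the corresponding AIR files for PCIe devices (VCK5000);
(or) mlir-air/install: contains the corresponding AIR files built against the sysroot.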

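Copy these directories to /usr/tools/mlir-air/. The environment script below refers to the copies as install-aie, install-air, and install-pcie, so rename the directories when copying or adjust the paths to match.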
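
Because two of the source directories are both called install, they need distinct names at the destination. The copy step could be sketched as follows. The target names install-aie and install-air are an assumption taken from the environment script later in this section, `stage_install` and `DEST` are hypothetical names, and the commands are meant to be run from the mlir-air checkout.

```shell
# Copy one build output directory to DEST/NAME without overwriting anything.
stage_install() {
  src=$1; dest=$2; name=$3
  if [ ! -d "$src" ]; then
    echo "skip: $src not found" >&2
    return 0
  fi
  if [ -e "$dest/$name" ]; then
    echo "skip: $dest/$name already exists" >&2
    return 0
  fi
  mkdir -p "$dest"
  cp -r "$src" "$dest/$name"
}

# Writing to the default destination usually needs root; override DEST to
# try the copy somewhere else first.
DEST="${DEST:-/usr/tools/mlir-air}"
# Assumed mapping of build outputs to the names used by the environment script.
stage_install mlir-aie/install "$DEST" install-aie
stage_install install-pcie     "$DEST" install-air
```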

The complete environment configuration is:

function air_on() {
  export XILINX_VITIS=/usr/xilinx/Vitis/2024.2
  export XILINX_XAIE=/usr/xilinx/xaiengine
  export XILINX_XAIE_INCLUDE_DIR=$XILINX_XAIE/include
  export XILINX_XAIE_LIB_DIR=$XILINX_XAIE/lib

  export ROCM_ROOT=/usr/tools/rocm
  export MLIR_AIR_DIR=/usr/tools/mlir-air
  export SYSROOT_DIR=/usr/xilinx/petalinux/2021.2/sysroots/cortexa72-cortexa53-xilinx-linux
  
  export PATH=$XILINX_VITIS/bin:$PATH
  export PATH=$XILINX_VITIS/aietools/bin:$PATH
  export PATH=$MLIR_AIR_DIR/install-air/bin:$PATH
  export PATH=$MLIR_AIR_DIR/install-aie/bin:$PATH
  
  export PYTHONPATH=$MLIR_AIR_DIR/install-air/python:$PYTHONPATH
  export PYTHONPATH=$MLIR_AIR_DIR/install-aie/python:$PYTHONPATH

  export LD_LIBRARY_PATH=$XILINX_XAIE_LIB_DIR:$LD_LIBRARY_PATH
  export LD_LIBRARY_PATH=$MLIR_AIR_DIR/install-aie/lib:$LD_LIBRARY_PATH
  export LD_LIBRARY_PATH=$MLIR_AIR_DIR/install-air/lib:$LD_LIBRARY_PATH

  export CPLUS_INCLUDE_PATH="$MLIR_AIR_DIR/install-pcie/runtime_lib/x86_64/airhost/include:$CPLUS_INCLUDE_PATH"
  export CPLUS_INCLUDE_PATH="/opt/rocm/hsa/include:$CPLUS_INCLUDE_PATH"
  export CPLUS_INCLUDE_PATH="$XILINX_XAIE_INCLUDE_DIR:$CPLUS_INCLUDE_PATH"

  source /opt/xilinx/xrt/setup.sh

  # Use venv under sandbox
  AIR_PY_ENV=/home/shikai/myhome/repos/mlir-air-new/sandbox
  if [ ! -d "$AIR_PY_ENV" ]; then
      echo "\$AIR_PY_ENV does not exist: $AIR_PY_ENV" >&2
      return 1
  fi

  source "$AIR_PY_ENV/bin/activate"
}

3. Verifying the MLIR-AIR tools

Test the freshly built MLIR-AIR tools:

cd ./test/airhost/40_air_8x4_2d_square
aircc.py -xchesscc -xbridge -row-offset=4 -col-offset=16 ./air.mlir -o ./air.a

During compilation you should see progress bars like the following:

MLIR compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- 0:00:02 0/1 1 Workerwarning: xxx
MLIR compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- 0:00:02 0/1 1 Worker1 warning generated
MLIR compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- 0:00:02 0/1 1 WorkerWarning in "xxx"
MLIR compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- 0:00:02 0/1 1 WorkerWarning in "xxx"
MLIR compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- 0:00:03 0/1 1 WorkerWarning in "xxx"
AIE Compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:01:22 33/33 4 Workers

A successful build produces a directory structure similar to the following:

(sandbox) 40_air_8x4_2d_square$ tree -L 2 .
.
├── air.a
├── air.mlir
├── air_project
│   ├── aie.air.mlir
│   ├── aiecc.graph_0.mlir
│   ├── aie_ctrl.air.mlir
│   ├── aie.graph_0.mlir
│   ├── air.mlir.a
│   ├── air.mlir.graph_0.cpp
│   ├── air.mlir.graph_0.inc
│   ├── air.mlir.graph_0.o
│   ├── air.mlir.ll
│   ├── air.mlir.o
│   ├── air.mlir.opt.bc
│   ├── air.mlir.opt.ll
│   ├── airrt.air.mlir
│   ├── graph_0
│   ├── llvm.air.mlir
│   ├── placed.air.mlir
│   └── refback.air.mlir
├── air_test.cpp
├── graph_0_core_16_4.elf
├── graph_0_core_16_4.elf.calltree
├── graph_0_core_16_4.elf.cmic2
├── graph_0_core_xxxx
├── ...
├── README.md
└── run.lit
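
To see at a glance which cores produced an ELF, the two trailing numbers in each file name can be split out. The sketch below reads them as column then row, which is only an assumption based on the -col-offset=16 and -row-offset=4 flags matching graph_0_core_16_4.elf; `core_coords` is a hypothetical helper.

```shell
# Split a per-core ELF name such as graph_0_core_16_4.elf into its two
# trailing numbers, printed as "<col> <row>" (assumed ordering).
core_coords() {
  base=${1##*/}          # drop any leading directory
  base=${base%.elf}      # drop the extension
  row=${base##*_}        # last number
  rest=${base%_*}
  col=${rest##*_}        # second-to-last number
  echo "$col $row"
}

# List the cores that produced an ELF in the current directory.
for elf in graph_0_core_*.elf; do
  [ -e "$elf" ] || continue
  echo "$elf -> $(core_coords "$elf")"
done
```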

4. `aircc` / `aiecc` / `air-runner` help

aiecc -help

aiecc help

usage: aiecc [-h] [--version] [--sysroot sysroot] [--tmpdir tmpdir]
             [--verbose] [--vectorize] [--xbridge] [--no-xbridge] [--aiesim]
             [--no-aiesim] [--xchesscc] [--no-xchesscc]
             [--peano PEANO_INSTALL_DIR] [--compile] [--no-compile]
             [--host-target HOST_TARGET] [--compile-host] [--no-compile-host]
             [--link] [--no-link] [--alloc-scheme ALLOC_SCHEME]
             [--generate-ctrl-pkt-overlay] [--dynamic-objFifos]
             [--aie-generate-airbin] [-j NTHREADS] [--profile] [--unified]
             [--no-unified] [-n] [--progress] [--aie-generate-npu]
             [--aie-only-generate-npu] [--npu-insts-name INSTS_NAME]
             [--aie-generate-cdo] [--aie-generate-txn]
             [--aie-generate-ctrlpkt] [--aie-generate-xclbin]
             [--xclbin-input XCLBIN_INPUT] [--link_against_hsa]
             [--xclbin-name XCLBIN_NAME] [--xclbin-kernel-name KERNEL_NAME]
             [--xclbin-instance-name INSTANCE_NAME]
             [--xclbin-kernel-id KERNEL_ID]
             [file] ...

positional arguments:
  file                  MLIR file to compile
  host_args             arguments for host compiler

options:
  -h, --help            show this help message and exit
  --version             Output commit at which the compiler was built and
                        exit.
  --sysroot sysroot     sysroot for cross-compilation
  --tmpdir tmpdir       directory used for temporary file storage
  --verbose, -v         Trace commands as they are executed
  --vectorize           Enable MLIR vectorization
  --xbridge             Link using xbridge
  --no-xbridge          Link using peano
  --aiesim              Generate aiesim Work folder
  --no-aiesim           Do not generate aiesim Work folder
  --xchesscc            Compile using xchesscc
  --no-xchesscc         Compile using peano
  --peano PEANO_INSTALL_DIR
                        Root directory where peano compiler is installed
  --compile             Enable compiling of AIE code
  --no-compile          Disable compiling of AIE code
  --host-target HOST_TARGET
                        Target architecture of the host program
  --compile-host        Enable compiling of the host program
  --no-compile-host     Disable compiling of the host program
  --link                Enable linking of AIE code
  --no-link             Disable linking of AIE code
  --alloc-scheme ALLOC_SCHEME
                        Allocation scheme for AIE buffers: basic-sequential,
                        bank-aware (default).
  --generate-ctrl-pkt-overlay
                        Generate column-wise overlay of control packet
                        routings
  --dynamic-objFifos    Use dynamic object fifos for the for loops
  --aie-generate-airbin
                        Generate airbin configuration (default is off)
  -j NTHREADS           Compile with max n-threads in the machine (default is
                        4). An argument of zero corresponds to the maximum
                        number of threads on the machine.
  --profile             Profile commands to find the most expensive
                        executions.
  --unified             Compile all cores together in a single process
  --no-unified          Compile cores independently in separate processes
  -n                    Disable actually executing any commands.
  --progress            Show progress visualization
  --aie-generate-npu    Generate npu instruction stream
  --aie-only-generate-npu
                        Generate npu instruction stream only
  --npu-insts-name INSTS_NAME
                        Output instructions filename for NPU target
  --aie-generate-cdo    Generate libxaie v2 for CDO
  --aie-generate-txn    Generate txn binary for configuration
  --aie-generate-ctrlpkt
                        Generate control packets for configuration
  --aie-generate-xclbin
                        Generate xclbin
  --xclbin-input XCLBIN_INPUT
                        Generate kernel into existing xclbin file
  --link_against_hsa    Link runtime against ROCm runtime HSA interface
  --xclbin-name XCLBIN_NAME
                        Output xclbin filename for CDO/XCLBIN target
  --xclbin-kernel-name KERNEL_NAME
                        Kernel name in xclbin file
  --xclbin-instance-name INSTANCE_NAME
                        Instance name in xclbin metadata
  --xclbin-kernel-id KERNEL_ID
                        Kernel id in xclbin file

aircc -help

aircc help

usage: aircc [-h] [-o OUTPUT_FILE] [-i INSTS_FILE] [--tmpdir tmpdir] [-v]
             [-row-offset ROW_OFFSET] [-col-offset COL_OFFSET]
             [-num-rows NUM_ROWS] [-num-cols NUM_COLS]
             [-trace-size TRACE_SIZE] [-trace-offset TRACE_OFFSET] [-cc CC]
             [--sysroot sysroot] [--host-target host_target] [--shared]
             [-xbridge] [-xchesscc] [--device target_device]
             [--experimental-passes] [--omit-while-true-loop]
             [--omit-ping-pong-transform]
             air_mlir_file

positional arguments:
  air_mlir_file         AIR Dialect mlir file

options:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE        Output filename
  -i INSTS_FILE         Output insts file name. Only used for compilation on
                        an NPU.
  --tmpdir tmpdir       directory used for temporary file storage
  -v                    Trace commands as they are executed
  -row-offset ROW_OFFSET
                        Default row offset for generated segments
  -col-offset COL_OFFSET
                        Default column offset for generated segments
  -num-rows NUM_ROWS    Default number of rows for generated segments
  -num-cols NUM_COLS    Default number of rows for generated segments
  -trace-size TRACE_SIZE
                        Create packet routed traces for cores and memtiles
  -trace-offset TRACE_OFFSET
                        Trace buffer offset appended to output
  -cc CC                Compiler to use
  --sysroot sysroot     sysroot for cross-compilation
  --host-target host_target
                        Target architecture of the host program
  --shared              Generate a shared library (.so) instead of the default
                        of a static library (.a)
  -xbridge              pass --xbridge to aiecc, otherwise pass --no-xbridge
  -xchesscc             pass --xchesscc to aiecc, otherwise pass --no-xchesscc
  --device target_device
                        Target AIE device
  --experimental-passes
                        Whether to run experimental passes or not. This will
                        only change the behavior for this program for npu
                        devices
  --omit-while-true-loop
                        By default, aircc may output a while(true) loop around
                        per-core logic. If this option is specified, a
                        while(true) loop will not be added.
  --omit-ping-pong-transform
                        Whether to run passes which generate ping-pong
                        buffering patterns or not. This will only change the
                        behavior for this program for npu devices

air-runner --help

air-runner help

OVERVIEW: AIR MLIR Modeling Tool
USAGE: air-runner [options] <input file>

OPTIONS:

Color Options:

  --color                                                    - Use colors in output (default=autodetect)

General options:

  --abort-on-max-devirt-iterations-reached                   - Abort when the max iterations for devirtualization CGSCC repeat pass is reached
  --atomic-counter-update-promoted                           - Do counter update using atomic fetch add  for promoted counters only
  --atomic-first-counter                                     - Use atomic fetch add for first counter in a function (usually the entry counter)
  --bounds-checking-single-trap                              - Use one trap block per function
  --cfg-hide-cold-paths=<number>                             - Hide blocks with relative frequency below the given value
  --cfg-hide-deoptimize-paths                                - 
  --cfg-hide-unreachable-paths                               - 
  --check-functions-filter=<regex>                           - Only emit checks for arguments of functions whose names match the given regular expression
  --conditional-counter-update                               - Do conditional counter updates in single byte counters mode)
  --cost-kind=<value>                                        - Target cost kind
    =throughput                                              -   Reciprocal throughput
    =latency                                                 -   Instruction latency
    =code-size                                               -   Code size
    =size-latency                                            -   Code size and latency
  --debug-info-correlate                                     - Use debug info to correlate profiles. (Deprecated, use -profile-correlate=debug-info)
  --debugify-func-limit=<ulong>                              - Set max number of processed functions per pass.
  --debugify-level=<value>                                   - Kind of debug info to add
    =locations                                               -   Locations only
    =location+variables                                      -   Locations and Variables
  --debugify-quiet                                           - Suppress verbose debugify output
  --disable-auto-upgrade-debug-info                          - Disable autoupgrade of debug info
  --disable-i2p-p2i-opt                                      - Disables inttoptr/ptrtoint roundtrip optimization
  --do-counter-promotion                                     - Do counter register promotion
  --dot-cfg-mssa=<file name for generated dot file>          - file name for generated dot file
  --enable-gvn-hoist                                         - Enable the GVN hoisting pass (default = off)
  --enable-gvn-memdep                                        - 
  --enable-gvn-memoryssa                                     - 
  --enable-gvn-sink                                          - Enable the GVN sinking pass (default = off)
  --enable-jump-table-to-switch                              - Enable JumpTableToSwitch pass (default = off)
  --enable-load-in-loop-pre                                  - 
  --enable-load-pre                                          - 
  --enable-loop-simplifycfg-term-folding                     - 
  --enable-name-compression                                  - Enable name/filename string compression
  --enable-split-backedge-in-load-pre                        - 
  --enable-split-loopiv-heuristic                            - Enable loop iv regalloc heuristic
  --enable-vtable-profile-use                                - If ThinLTO and WPD is enabled and this option is true, vtable profiles will be used by ICP pass for more efficient indirect call sequence. If false, type profiles won't be used.
  --enable-vtable-value-profiling                            - If true, the virtual table address will be instrumented to know the types of a C++ pointer. The information is used in indirect call promotion to do selective vtable-based comparison.
  --expand-variadics-override=<value>                        - Override the behaviour of expand-variadics
    =unspecified                                             -   Use the implementation defaults
    =disable                                                 -   Disable the pass entirely
    =optimize                                                -   Optimise without changing ABI
    =lowering                                                -   Change variadic calling convention
  --experimental-debug-variable-locations                    - Use experimental new value-tracking variable locations
  --experimental-debuginfo-iterators                         - Enable communicating debuginfo positions through iterators, eliminating intrinsics. Has no effect if --preserve-input-debuginfo-format=true.
  -f <function>                                              - top-level function name
  --force-tail-folding-style=<value>                         - Force the tail folding style
    =none                                                    -   Disable tail folding
    =data                                                    -   Create lane mask for data only, using active.lane.mask intrinsic
    =data-without-lane-mask                                  -   Create lane mask with compare/stepvector
    =data-and-control                                        -   Create lane mask using active.lane.mask intrinsic, and use it for both data and control flow
    =data-and-control-without-rt-check                       -   Similar to data-and-control, but remove the runtime check
    =data-with-evl                                           -   Use predicated EVL instructions for tail folding. If EVL is unsupported, fallback to data-without-lane-mask.
  --fs-profile-debug-bw-threshold=<uint>                     - Only show debug message if the source branch weight is greater  than this value.
  --fs-profile-debug-prob-diff-threshold=<uint>              - Only show debug message if the branch probability is greater than this value (in percentage).
  -g <string>                                                - lowest level architectural hierarchy to simulate (pick from herd and core)
  --generate-merged-base-profiles                            - When generating nested context-sensitive profiles, always generate extra base profile for function with all its context profiles merged into it.
  --hash-based-counter-split                                 - Rename counter variable of a comdat function based on cfg hash
  --hot-cold-split                                           - Enable hot-cold splitting pass
  --hwasan-percentile-cutoff-hot=<int>                       - Hot percentile cutoff.
  --hwasan-random-rate=<number>                              - Probability value in the range [0.0, 1.0] to keep instrumentation of a function. Note: instrumentation can be skipped randomly OR because of the hot percentile cutoff, if both are supplied.
  --import-all-index                                         - Import all external functions in index.
  --instcombine-code-sinking                                 - Enable code sinking
  --instcombine-guard-widening-window=<uint>                 - How wide an instruction window to bypass looking for another guard
  --instcombine-max-num-phis=<uint>                          - Maximum number phis to handle in intptr/ptrint folding
  --instcombine-max-sink-users=<uint>                        - Maximum number of undroppable users for instruction sinking
  --instcombine-maxarray-size=<uint>                         - Maximum array size considered when doing a combine
  --instcombine-negator-enabled                              - Should we attempt to sink negations?
  --instcombine-negator-max-depth=<uint>                     - What is the maximal lookup depth when trying to check for viability of negation sinking.
  --instrprof-atomic-counter-update-all                      - Make all profile counter updates atomic (for testing only)
  --internalize-public-api-file=<filename>                   - A file containing list of symbol names to preserve
  --internalize-public-api-list=<list>                       - A list of symbol names to preserve
  --iterative-counter-promotion                              - Allow counter promotion across the whole loop nest.
  --lint-abort-on-error                                      - In the Lint pass, abort on errors.
  --lower-allow-check-percentile-cutoff-hot=<int>            - Hot percentile cutoff.
  --lower-allow-check-random-rate=<number>                   - Probability value in the range [0.0, 1.0] of unconditional pseudo-random checks.
  -m <filename>                                              - json model filename
  --matrix-default-layout=<value>                            - Sets the default matrix layout
    =column-major                                            -   Use column-major layout
    =row-major                                               -   Use row-major layout
  --matrix-print-after-transpose-opt                         - 
  --max-counter-promotions=<int>                             - Max number of allowed counter promotions
  --max-counter-promotions-per-loop=<uint>                   - Max number counter promotions per loop to avoid increasing register pressure too much
  --mir-strip-debugify-only                                  - Should mir-strip-debug only strip debug info from debugified modules by default
  --misexpect-tolerance=<uint>                               - Prevents emitting diagnostics when profile counts are within N% of the threshold..
  --no-discriminators                                        - Disable generation of discriminator information.
  -o <filename>                                              - Output filename
  --object-size-offset-visitor-max-visit-instructions=<uint> - Maximum number of instructions for ObjectSizeOffsetVisitor to look at
  --pgo-block-coverage                                       - Use this option to enable basic block coverage instrumentation
  --pgo-temporal-instrumentation                             - Use this option to enable temporal instrumentation
  --pgo-view-block-coverage-graph                            - Create a dot file of CFGs with block coverage inference information
  --print-pipeline-passes                                    - Print a '-passes' compatible string describing the pipeline (best-effort only).
  --profile-correlate=<value>                                - Use debug info or binary file to correlate profiles.
    =<empty>                                                 -   No profile correlation
    =debug-info                                              -   Use debug info to correlate
    =binary                                                  -   Use binary to correlate
  --runtime-counter-relocation                               - Enable relocating counters at runtime.
  --safepoint-ir-verifier-print-only                         - 
  --sample-profile-check-record-coverage=<N>                 - Emit a warning if less than N% of records in the input profile are matched to the IR.
  --sample-profile-check-sample-coverage=<N>                 - Emit a warning if less than N% of samples in the input profile are matched to the IR.
  --sample-profile-max-propagate-iterations=<uint>           - Maximum number of iterations to go through when propagating sample block/edge weights through the CFG.
  --sampled-instr-burst-duration=<uint>                      - Set the profile instrumentation burst duration, which can range from 1 to the value of 'sampled-instr-period' (0 is invalid). This number of samples will be recorded for each 'sampled-instr-period' count update. Setting to 1 enables simple sampling, in which case it is recommended to set 'sampled-instr-period' to a prime number.
  --sampled-instr-period=<uint>                              - Set the profile instrumentation sample period. A sample period of 0 is invalid. For each sample period, a fixed number of consecutive samples will be recorded. The number is controlled by 'sampled-instr-burst-duration' flag. The default sample period of 65536 is optimized for generating efficient code that leverages unsigned short integer wrapping in overflow, but this is disabled under simple sampling (burst duration = 1).
  --sampled-instrumentation                                  - Do PGO instrumentation sampling
  --skip-ret-exit-block                                      - Suppress counter promotion if exit blocks contain ret.
  --speculative-counter-promotion-max-exiting=<uint>         - The max number of exiting blocks of a loop to allow  speculative counter promotion
  --speculative-counter-promotion-to-loop                    - When the option is false, if the target block is in a loop, the promotion will be disallowed unless the promoted counter  update can be further/iteratively promoted into an acyclic  region.
  --summary-file=<string>                                    - The summary file to use for function importing.
  --type-based-intrinsic-cost                                - Calculate intrinsics cost based only on argument types
  -v                                                         - verbose
  --verify-region-info                                       - Verify region info (time consuming)
  --vp-counters-per-site=<number>                            - The average number of profile counters allocated per value profiling site.
  --vp-static-alloc                                          - Do static counter allocation for value profiler
  --wholeprogramdevirt-cutoff=<uint>                         - Max number of devirtualizations for devirt module pass
  --write-experimental-debuginfo                             - Write debug info in the new non-intrinsic format. Has no effect if --preserve-input-debuginfo-format=true.

Generic Options:

  --help                                                     - Display available options (--help-hidden for more)
  --help-list                                                - Display list of available options (--help-list-hidden for more)
  --version                                                  - Display the version of this program

`air-runner` hidden options

air-runner hidden options



OVERVIEW: AIR MLIR Modeling Tool
USAGE: air-runner [options] <input file>

OPTIONS:

Color Options:

  --color                                                                    - Use colors in output (default=autodetect)

General options:

  --aa-trace                                                                 - 
  --abort-on-max-devirt-iterations-reached                                   - Abort when the max iterations for devirtualization CGSCC repeat pass is reached
  --adce-remove-control-flow                                                 - 
  --adce-remove-loops                                                        - 
  --addr-sink-combine-base-gv                                                - Allow combining of BaseGV field in Address sinking.
  --addr-sink-combine-base-offs                                              - Allow combining of BaseOffs field in Address sinking.
  --addr-sink-combine-base-reg                                               - Allow combining of BaseReg field in Address sinking.
  --addr-sink-combine-scaled-reg                                             - Allow combining of ScaledReg field in Address sinking.
  --addr-sink-new-phis                                                       - Allow creation of Phis in Address sinking.
  --addr-sink-new-select                                                     - Allow creation of selects in Address sinking.
  --addr-sink-using-gep                                                      - Address sinking in CGP using GEPs.
  --agg-antidep-debugdiv=<int>                                               - Debug control for aggressive anti-dep breaker
  --agg-antidep-debugmod=<int>                                               - Debug control for aggressive anti-dep breaker
  --aggregate-extracted-args                                                 - Aggregate arguments to code-extracted functions
  --aggressive-ext-opt                                                       - Aggressive extension optimization
  --aggressive-instcombine-max-scan-instrs=<uint>                            - Max number of instructions to scan for aggressive instcombine.
  --aggressive-machine-cse                                                   - Override the profitability heuristics for Machine CSE
  --alias-set-saturation-threshold=<uint>                                    - The maximum total number of memory locations alias sets may contain before degradation
  --align-all-blocks=<uint>                                                  - Force the alignment of all blocks in the function in log2 format (e.g 4 means align on 16B boundaries).
  --align-all-functions=<uint>                                               - Force the alignment of all functions in log2 format (e.g. 4 means align on 16B boundaries).
  --align-all-nofallthru-blocks=<uint>                                       - Force the alignment of all blocks that have no fall-through predecessors (i.e. don't add nops that are executed). In log2 format (e.g 4 means align on 16B boundaries).
  --allow-incomplete-ir                                                      - Allow incomplete IR on a best effort basis (references to unknown metadata will be dropped)
  --allow-unroll-and-jam                                                     - Allows loops to be unroll-and-jammed.
  --annotate-inline-phase                                                    - If true, annotate inline advisor remarks with LTO and pass information.
  --annotate-sample-profile-inline-phase                                     - Annotate LTO phase (prelink / postlink), or main (no LTO) for sample-profile inline pass name.
  --append-content-hash-outlined-name                                        - This appends the content hash to the globally outlined function name. It's beneficial for enhancing the precision of the stable hash and for ordering the outlined functions.
  --apply-ext-tsp-for-size                                                   - Use ext-tsp for size-aware block placement.
  --arc-opt-max-ptr-states=<uint>                                            - Maximum number of ptr states the optimizer keeps track of
  --asan-always-slow-path                                                    - use instrumentation with slow path for all accesses
  --asan-constructor-kind=<value>                                            - Sets the ASan constructor kind
    =none                                                                    -   No constructors
    =global                                                                  -   Use global constructors
  --asan-debug=<int>                                                         - debug
  --asan-debug-func=<string>                                                 - Debug func
  --asan-debug-max=<int>                                                     - Debug max inst
  --asan-debug-min=<int>                                                     - Debug min inst
  --asan-debug-stack=<int>                                                   - debug stack
  --asan-destructor-kind=<value>                                             - Sets the ASan destructor kind. The default is to use the value provided to the pass constructor
    =none                                                                    -   No destructors
    =global                                                                  -   Use global destructors
  --asan-detect-invalid-pointer-cmp                                          - Instrument <, <=, >, >= with pointer operands
  --asan-detect-invalid-pointer-pair                                         - Instrument <, <=, >, >=, - with pointer operands
  --asan-detect-invalid-pointer-sub                                          - Instrument - operations with pointer operands
  --asan-force-dynamic-shadow                                                - Load shadow address into a local variable for each function
  --asan-force-experiment=<uint>                                             - Force optimization experiment (for testing)
  --asan-globals                                                             - Handle global objects
  --asan-globals-live-support                                                - Use linker features to support dead code stripping of globals
  --asan-guard-against-version-mismatch                                      - Guard against compiler/runtime version mismatch.
  --asan-initialization-order                                                - Handle C++ initializer order
  --asan-instrument-atomics                                                  - instrument atomic instructions (rmw, cmpxchg)
  --asan-instrument-byval                                                    - instrument byval call arguments
  --asan-instrument-dynamic-allocas                                          - instrument dynamic allocas
  --asan-instrument-reads                                                    - instrument read instructions
  --asan-instrument-writes                                                   - instrument write instructions
  --asan-instrumentation-with-call-threshold=<int>                           - If the function being instrumented contains more than this number of memory accesses, use callbacks instead of inline checks (-1 means never use callbacks).
  --asan-kernel                                                              - Enable KernelAddressSanitizer instrumentation
  --asan-kernel-mem-intrinsic-prefix                                         - Use prefix for memory intrinsics in KASAN mode
  --asan-mapping-offset=<ulong>                                              - offset of asan shadow mapping [EXPERIMENTAL]
  --asan-mapping-scale=<int>                                                 - scale of asan shadow mapping
  --asan-max-inline-poisoning-size=<uint>                                    - Inline shadow poisoning for blocks up to the given size in bytes.
  --asan-max-ins-per-bb=<int>                                                - maximal number of instructions to instrument in any given BB
  --asan-memory-access-callback-prefix=<string>                              - Prefix for memory access callbacks
  --asan-opt                                                                 - Optimize instrumentation
  --asan-opt-globals                                                         - Don't instrument scalar globals
  --asan-opt-same-temp                                                       - Instrument the same temp just once
  --asan-opt-stack                                                           - Don't instrument scalar stack variables
  --asan-optimize-callbacks                                                  - Optimize callbacks
  --asan-realign-stack=<uint>                                                - Realign stack to the value of this flag (power of two)
  --asan-recover                                                             - Enable recovery mode (continue-after-error).
  --asan-redzone-byval-args                                                  - Create redzones for byval arguments (extra copy required)
  --asan-skip-promotable-allocas                                             - Do not instrument promotable allocas
  --asan-stack                                                               - Handle stack memory
  --asan-stack-dynamic-alloca                                                - Use dynamic alloca to represent stack variables
  --asan-use-after-return=<value>                                            - Sets the mode of detection for stack-use-after-return.
    =never                                                                   -   Never detect stack use after return.
    =runtime                                                                 -   Detect stack use after return if binary flag 'ASAN_OPTIONS=detect_stack_use_after_return' is set.
    =always                                                                  -   Always detect stack use after return.
  --asan-use-after-scope                                                     - Check stack-use-after-scope
  --asan-use-odr-indicator                                                   - Use odr indicators to improve ODR reporting
  --asan-use-private-alias                                                   - Use private aliases for global variables
  --asan-use-stack-safety                                                    - Use Stack Safety analysis results
  --asan-with-comdat                                                         - Place ASan constructors in comdat sections
  --asan-with-ifunc                                                          - Access dynamic shadow through an ifunc global on platforms that support this
  --asan-with-ifunc-suppress-remat                                           - Suppress rematerialization of dynamic shadow address by passing it through inline asm in prologue.
  --asm-macro-max-nesting-depth=<uint>                                       - The maximum nesting depth allowed for assembly macros.
  --assume-preserve-all                                                      - enable preservation of all attributes. even those that are unlikely to be useful
  --atomic-counter-update-promoted                                           - Do counter update using atomic fetch add  for promoted counters only
  --atomic-first-counter                                                     - Use atomic fetch add for first counter in a function (usually the entry counter)
  --attributor-allow-deep-wrappers                                           - Allow the Attributor to use IP information derived from non-exact functions via cloning
  --attributor-allow-shallow-wrappers                                        - Allow the Attributor to create shallow wrappers for non-exact definitions.
  --attributor-annotate-decl-cs                                              - Annotate call sites of function declarations.
  --attributor-assume-closed-world                                           - Should a closed world be assumed, or not. Default if not set.
  --attributor-depgraph-dot-filename-prefix=<string>                         - The prefix used for the CallGraph dot file names.
  --attributor-dump-dep-graph                                                - Dump the dependency graph to dot files.
  --attributor-enable=<value>                                                - Enable the attributor inter-procedural deduction pass
    =all                                                                     -   enable all attributor runs
    =module                                                                  -   enable module-wide attributor runs
    =cgscc                                                                   -   enable call graph SCC attributor runs
    =none                                                                    -   disable attributor runs
  --attributor-enable-call-site-specific-deduction                           - Allow the Attributor to do call site specific analysis
  --attributor-function-seed-allow-list=<string>                             - Comma separated list of function names that are allowed to be seeded.
  --attributor-manifest-internal                                             - Manifest Attributor internal string attributes.
  --attributor-max-initialization-chain-length=<uint>                        - Maximal number of chained initializations (to avoid stack overflows)
  --attributor-max-iterations=<uint>                                         - Maximal number of fixpoint iterations.
  --attributor-max-potential-values=<uint>                                   - Maximum number of potential values to be tracked for each position.
  --attributor-max-potential-values-iterations=<int>                         - Maximum number of iterations we keep dismantling potential values.
  --attributor-max-specializations-per-call-base=<uint>                      - Maximal number of callees specialized for a call base
  --attributor-print-call-graph                                              - Print Attributor's internal call graph
  --attributor-print-dep                                                     - Print attribute dependencies
  --attributor-seed-allow-list=<string>                                      - Comma separated list of attribute names that are allowed to be seeded.
  --attributor-simplify-all-loads                                            - Try to simplify all loads.
  --attributor-view-dep-graph                                                - View the dependency graph.
  --avail-extern-to-local                                                    - Convert available_externally into locals, renaming them to avoid link-time clashes.
  --available-load-scan-limit=<uint>                                         - Use this to specify the default maximum number of instructions to scan backward from a given instruction, when searching for available loaded value
  --avoid-speculation                                                        - MachineLICM should avoid speculation
  --basic-aa-recphi                                                          - 
  --basic-aa-separate-storage                                                - 
  --bbsections-cold-text-prefix=<string>                                     - The text prefix to use for cold basic block clusters
  --bbsections-detect-source-drift                                           - This checks if there is a fdo instr. profile hash mismatch for this function
  --bbsections-guided-section-prefix                                         - Use the basic-block-sections profile to determine the text section prefix for hot functions. Functions with basic-block-sections profile will be placed in `.text.hot` regardless of their FDO profile info. Other functions won't be impacted, i.e., their prefixes will be decided by FDO/sampleFDO profiles.
  --big-basic-block-instruction-threshold=<uint>                             - The minimum number of instructions a basic block should contain before being considered big.
  --bitcode-flush-threshold=<uint>                                           - The threshold (unit M) for flushing LLVM bitcode.
  --bitcode-mdindex-threshold=<uint>                                         - Number of metadatas above which we emit an index to enable lazy-loading
  --block-freq-ratio-threshold=<uint>                                        - Do not hoist instructions if targetblock is N times hotter than the source.
  --block-placement-exit-block-bias=<uint>                                   - Block frequency percentage a loop exit block needs over the original exit to be considered the new exit.
  --bonus-inst-threshold=<uint>                                              - Control the number of bonus instructions (default = 1)
  --bounds-checking-single-trap                                              - Use one trap block per function
  --branch-fold-placement                                                    - Perform branch folding during placement. Reduces code size.
  --break-anti-dependencies=<string>                                         - Break post-RA scheduling anti-dependencies: "critical", "all", or "none"
  --cache-line-size=<uint>                                                   - Use this to override the target cache line size when specified by the user.
  --call-with-many-arguments-threshold=<uint>                                - The minimum number of arguments a function call must have before it is considered having many arguments.
  --callgraph-dot-filename-prefix=<string>                                   - The prefix used for the CallGraph dot file names.
  --callgraph-heat-colors                                                    - Show heat colors in call-graph
  --callgraph-multigraph                                                     - Show call-multigraph (do not remove parallel edges)
  --callgraph-show-weights                                                   - Show edges labeled with weights
  --callsite-splitting-duplication-threshold=<uint>                          - Only allow instructions before a call, if their cost is below DuplicationThreshold
  --canon-nth-function=<N>                                                   - Function number to canonicalize.
  --capture-tracking-max-uses-to-explore=<uint>                              - Maximal number of uses to explore.
  --cfg-dot-filename-prefix=<string>                                         - The prefix used for the CFG dot file names.
  --cfg-func-name=<string>                                                   - The name of a function (or its substring) whose CFG is viewed/printed.
  --cfg-heat-colors                                                          - Show heat colors in CFG
  --cfg-hide-cold-paths=<number>                                             - Hide blocks with relative frequency below the given value
  --cfg-hide-deoptimize-paths                                                - 
  --cfg-hide-unreachable-paths                                               - 
  --cfg-raw-weights                                                          - Use raw weights for labels. Use percentages as default.
  --cfg-weights                                                              - Show edges labeled with weights
  --cgp-freq-ratio-to-skip-merge=<ulong>                                     - Skip merging empty blocks if (frequency of empty block) / (frequency of destination block) is greater than this ratio
  --cgp-icmp-eq2icmp-st                                                      - Enable ICMP_EQ to ICMP_S(L|G)T conversion.
  --cgp-max-address-users-to-scan=<uint>                                     - Max number of address users to look at
  --cgp-optimize-phi-types                                                   - Enable converting phi types in CodeGenPrepare
  --cgp-split-large-offset-gep                                               - Enable splitting large offset of GEP.
  --cgp-type-promotion-merge                                                 - Enable merging of redundant sexts when one is dominating the other.
  --cgp-verify-bfi-updates                                                   - Enable BFI update verification for CodeGenPrepare.
  --cgpp-huge-func=<uint>                                                    - Least BB number of huge function.
  --cgscc-inline-replay=<filename>                                           - Optimization remarks file containing inline remarks to be replayed by cgscc inlining.
  --cgscc-inline-replay-fallback=<value>                                     - How cgscc inline replay treats sites that don't come from the replay. Original: defers to original advisor, AlwaysInline: inline all sites not in replay, NeverInline: inline no sites not in replay
    =Original                                                                -   All decisions not in replay send to original advisor (default)
    =AlwaysInline                                                            -   All decisions not in replay are inlined
    =NeverInline                                                             -   All decisions not in replay are not inlined
  --cgscc-inline-replay-format=<value>                                       - How cgscc inline replay file is formatted
    =Line                                                                    -   <Line Number>
    =LineColumn                                                              -   <Line Number>:<Column Number>
    =LineDiscriminator                                                       -   <Line Number>.<Discriminator>
    =LineColumnDiscriminator                                                 -   <Line Number>:<Column Number>.<Discriminator> (default)
  --cgscc-inline-replay-scope=<value>                                        - Whether inline replay should be applied to the entire Module or just the Functions (default) that are present as callers in remarks during cgscc inlining.
    =Function                                                                -   Replay on functions that have remarks associated with them (default)
    =Module                                                                  -   Replay on the entire module
  --check-bfi-unknown-block-queries                                          - Check if block frequency is queried for an unknown block for debugging missed BFI updates
  --check-functions-filter=<regex>                                           - Only emit checks for arguments of functions whose names match the given regular expression
  --chr-bias-threshold=<number>                                              - CHR considers a branch bias greater than this ratio as biased
  --chr-dup-threshold=<uint>                                                 - Max number of duplications by CHR for a region
  --chr-function-list=<string>                                               - Specify file to retrieve the list of functions to apply CHR to
  --chr-merge-threshold=<uint>                                               - CHR merges a group of N branches/selects where N >= this value
  --chr-module-list=<string>                                                 - Specify file to retrieve the list of modules to apply CHR to
  --codegen-data-generate                                                    - Emit CodeGen Data into custom sections
  --codegen-data-thinlto-two-rounds                                          - Enable two-round ThinLTO code generation. The first round emits codegen data, while the second round uses the emitted codegen data for further optimizations.
  --codegen-data-use-path=<string>                                           - File path to where .cgdata file is read
  --cold-branch-ratio=<number>                                               - Minimum BranchProbability to consider a region cold.
  --cold-callsite-rel-freq=<int>                                             - Maximum block frequency, expressed as a percentage of caller's entry frequency, for a callsite to be cold in the absence of profile information.
  --cold-new-hint-value=<uint>                                               - Value to pass to hot/cold operator new for cold allocation
  --cold-operand-max-cost-multiplier=<uint>                                  - Maximum cost multiplier of TCC_expensive for the dependence slice of a cold operand to be considered inexpensive.
  --cold-operand-threshold=<uint>                                            - Maximum frequency of path for an operand to be considered cold.
  --coldcc-rel-freq=<int>                                                    - Maximum block frequency, expressed as a percentage of caller's entry frequency, for a call site to be considered cold for enabling coldcc
  --compute-dead                                                             - Compute dead symbols
  --conditional-counter-update                                               - Do conditional counter updates in single byte counters mode)
  --consthoist-gep                                                           - Try hoisting constant gep expressions
  --consthoist-min-num-to-rebase=<uint>                                      - Do not rebase if number of dependent constants of a Base is less than this number.
  --consthoist-with-block-frequency                                          - Enable the use of the block frequency analysis to reduce the chance to execute const materialization more frequently than without hoisting.
  --constraint-elimination-dump-reproducers                                  - Dump IR to reproduce successful transformations.
  --constraint-elimination-max-rows=<uint>                                   - Maximum number of rows to keep in constraint system
  --coro-elide-info-output-file=<filename>                                   - File to record the coroutines got elided
  --cost-kind=<value>                                                        - Target cost kind
    =throughput                                                              -   Reciprocal throughput
    =latency                                                                 -   Instruction latency
    =code-size                                                               -   Code size
    =size-latency                                                            -   Code size and latency
  --costmodel-reduxcost                                                      - Recognize reduction patterns.
  --crash-diagnostics-dir=<directory>                                        - Directory for crash diagnostic files.
  --csuses-threshold=<int>                                                   - Threshold for the size of CSUses
  --ctx-prof-promote-alwaysinline                                            - If using a contextual profile in this module, and an indirect call target is marked as alwaysinline, perform indirect call promotion for that target. If multiple targets for an indirect call site fit this description, they are all promoted.
  --ctx-profile-printer-level=<value>                                        - Verbosity level of the contextual profile printer pass.
    =everything                                                              -   print everything - most verbose
    =yaml                                                                    -   just the yaml representation of the profile
  --cvp-max-functions-per-value=<uint>                                       - The maximum number of functions to track per lattice value
  --da-delinearize                                                           - Try to delinearize array references.
  --da-disable-delinearization-checks                                        - Disable checks that try to statically verify validity of delinearized subscripts. Enabling this option may result in incorrect dependence vectors for languages that allow the subscript of one dimension to underflow or overflow into another dimension.
  --da-miv-max-level-threshold=<uint>                                        - Maximum depth allowed for the recursive algorithm used to explore MIV direction vectors.
  --dag-maps-huge-region=<uint>                                              - The limit to use while constructing the DAG prior to scheduling, at which point a trade-off is made to avoid excessive compile time.
  --dag-maps-reduction-size=<uint>                                           - A huge scheduling region will have maps reduced by this many nodes at a time. Defaults to HugeRegion / 2.
  --dataflow-edge-limit=<uint>                                               - Maximum number of dataflow edges to traverse when evaluating the benefit of commuting operands
  --ddg-pi-blocks                                                            - Create pi-block nodes.
  --ddg-simplify                                                             - Simplify DDG by merging nodes that have less interesting edges.
  --debug                                                                    - Enable debug output
  --debug-ata-coalesce-frags                                                 - 
  --debug-ata-max-blocks=<uint>                                              - Maximum num basic blocks before debug info dropped
  --debug-buffer-size=<uint>                                                 - Buffer the last N characters of debug output until program termination. [default 0 -- immediate print-out]
  -debug-counter                                                             - Comma separated list of debug counter skip and count
    =num-abstract-attributes                                                 -   How many AAs should be initialized
    =attributor-manifest                                                     -   Determine what attributes are manifested in the IR
    =slp-vectorized                                                          -   Controls which SLP graphs should be vectorized.
    =msan-insert-check                                                       -   Controls which checks to insert
    =msan-instrument-instruction                                             -   Controls which instruction to instrument
    =machine-cp-fwd                                                          -   Controls which register COPYs are forwarded
    =conds-eliminated                                                        -   Controls which conditions are eliminated
    =dce-transform                                                           -   Controls which instructions are eliminated
    =dse-memoryssa                                                           -   Controls which MemoryDefs are eliminated.
    =div-rem-pairs-transform                                                 -   Controls transformations in div-rem-pairs pass
    =early-cse                                                               -   Controls which instructions are removed
    =newgvn-vn                                                               -   Controls which instructions are value numbered
    =newgvn-phi                                                              -   Controls which instructions we create phi of ops for
    =partially-inline-libcalls-transform                                     -   Controls transformations in partially-inline-libcalls
    =slsr-counter                                                            -   Controls whether rewriteCandidateWithBasis is executed.
    =instcombine-visit                                                       -   Controls which instructions are visited
    =instcombine-negator                                                     -   Controls Negator transformations in InstCombine pass
    =assume-builder-counter                                                  -   Controls which assumes gets created
    =predicateinfo-rename                                                    -   Controls which variables are renamed with predicateinfo
    =assume-queries-counter                                                  -   Controls which assumes gets created
  --debug-counter-break-on-last                                              - Insert a break point on the last enabled count of a chunks list
  --debug-info-correlate                                                     - Use debug info to correlate profiles. (Deprecated, use -profile-correlate=debug-info)
  --debug-only=<debug string>                                                - Enable a specific type of debug output (comma separated list of types)
  --debug-pass=<value>                                                       - Print legacy PassManager debugging information
    =Disabled                                                                -   disable debug output
    =Arguments                                                               -   print pass arguments to pass to 'opt'
    =Structure                                                               -   print pass structure before run()
    =Executions                                                              -   print pass name before it is executed
    =Details                                                                 -   print pass details when it is executed
  --debugify-and-strip-all-safe                                              - Debugify MIR before and Strip debug after each pass except those known to be unsafe when debug info is present
  --debugify-check-and-strip-all-safe                                        - Debugify MIR before, by checking and stripping the debug info after, each pass except those known to be unsafe when debug info is present
  --debugify-func-limit=<ulong>                                              - Set max number of processed functions per pass.
  --debugify-level=<value>                                                   - Kind of debug info to add
    =locations                                                               -   Locations only
    =location+variables                                                      -   Locations and Variables
  --debugify-quiet                                                           - Suppress verbose debugify output
  --default-gcov-version=<string>                                            - 
  --default-trip-count=<uint>                                                - Use this to specify the default trip count of a loop
  --demote-catchswitch-only                                                  - Demote catchswitch BBs only (for wasm EH)
  --dfa-cost-threshold=<uint>                                                - Maximum cost accepted for the transformation
  --dfa-early-exit-heuristic                                                 - Exit early if an unpredictable value come from the same loop
  --dfa-instr-limit=<uint>                                                   - If present, stops packetizing after N instructions
  --dfa-jump-view-cfg-before                                                 - View the CFG before DFA Jump Threading
  --dfa-max-num-paths=<uint>                                                 - Max number of paths enumerated around a switch
  --dfa-max-num-visited-paths=<uint>                                         - Max number of blocks visited while enumerating paths around a switch
  --dfa-max-path-length=<uint>                                               - Max number of blocks searched to find a threading path
  --dfsan-abilist=<string>                                                   - File listing native ABI functions and how the pass treats them
  --dfsan-combine-offset-labels-on-gep                                       - Combine the label of the offset with the label of the pointer when doing pointer arithmetic.
  --dfsan-combine-pointer-labels-on-load                                     - Combine the label of the pointer with the label of the data when loading from memory.
  --dfsan-combine-pointer-labels-on-store                                    - Combine the label of the pointer with the label of the data when storing in memory.
  --dfsan-combine-taint-lookup-table=<string>                                - When dfsan-combine-offset-labels-on-gep and/or dfsan-combine-pointer-labels-on-load are false, this flag can be used to re-enable combining offset and/or pointer taint when loading specific constant global variables (i.e. lookup tables).
  --dfsan-conditional-callbacks                                              - Insert calls to callback functions on conditionals.
  --dfsan-debug-nonzero-labels                                               - Insert calls to __dfsan_nonzero_label on observing a parameter, load or return with a nonzero label
  --dfsan-event-callbacks                                                    - Insert calls to __dfsan_*_callback functions on data events.
  --dfsan-ignore-personality-routine                                         - If a personality routine is marked uninstrumented from the ABI list, do not create a wrapper for it.
  --dfsan-instrument-with-call-threshold=<int>                               - If the function being instrumented requires more than this number of origin stores, use callbacks instead of inline checks (-1 means never use callbacks).
  --dfsan-preserve-alignment                                                 - respect alignment requirements provided by input IR
  --dfsan-reaches-function-callbacks                                         - Insert calls to callback functions on data reaching a function.
  --dfsan-track-origins=<int>                                                - Track origins of labels
  --dfsan-track-select-control-flow                                          - Propagate labels from condition values of select instructions to results.
  --disable-adv-copy-opt                                                     - Disable advanced copy optimization
  --disable-advanced-peeling                                                 - Disable advance peeling. Issues for convergent targets (D134803).
  --disable-atexit-based-global-dtor-lowering                                - For MachO, disable atexit()-based global destructor lowering
  --disable-auto-upgrade-debug-info                                          - Disable autoupgrade of debug info
  --disable-basic-aa                                                         - 
  --disable-binop-extract-shuffle                                            - Disable binop extract to shuffle transforms
  --disable-bitcode-version-upgrade                                          - Disable automatic bitcode upgrade for version mismatch
  --disable-block-placement                                                  - Disable probability-driven block placement
  --disable-branch-fold                                                      - Disable branch folding
  --disable-cfi-fixup                                                        - Disable the CFI fixup pass
  --disable-cgdata-for-merging                                               - Disable codegen data for function merging. Local merging is still enabled within a module.
  --disable-cgp                                                              - Disable Codegen Prepare
  --disable-cgp-branch-opts                                                  - Disable branch optimizations in CodeGenPrepare
  --disable-cgp-delete-phis                                                  - Disable elimination of dead PHI nodes.
  --disable-cgp-ext-ld-promotion                                             - Disable ext(promotable(ld)) -> promoted(ext(ld)) optimization in CodeGenPrepare
  --disable-cgp-gc-opts                                                      - Disable GC optimizations in CodeGenPrepare
  --disable-cgp-select2branch                                                - Disable select to branch conversion.
  --disable-cgp-store-extract                                                - Disable store(extract) optimizations in CodeGenPrepare
  --disable-check-noreturn-call                                              - 
  --disable-chr                                                              - Disable CHR for all functions
  --disable-cleanups                                                         - Do not remove implausible terminators or other similar cleanups
  --disable-complex-addr-modes                                               - Disables combining addressing modes with different parts in optimizeMemoryInst.
  --disable-constant-hoisting                                                - Disable ConstantHoisting
  --disable-copyprop                                                         - Disable Copy Propagation pass
  --disable-demotion                                                         - Clone multicolor basic blocks but do not demote cross scopes
  --disable-early-ifcvt                                                      - Disable Early If-conversion
  --disable-early-taildup                                                    - Disable pre-register allocation tail duplication
  --disable-expand-reductions                                                - Disable the expand reduction intrinsics pass from running
  --disable-gep-const-evaluation                                             - Disables evaluation of GetElementPtr with constant operands
  --disable-global-outlining                                                 - Disable global outlining only by ignoring the codegen data generation or use
  --disable-hoisting-to-hotter-blocks=<value>                                - Disable hoisting instructions to hotter blocks
    =none                                                                    -   disable the feature
    =pgo                                                                     -   enable the feature when using profile data
    =all                                                                     -   enable the feature with/wo profile data
  --disable-i2p-p2i-opt                                                      - Disables inttoptr/ptrtoint roundtrip optimization
  --disable-icp                                                              - Disable indirect call promotion
  --disable-ifcvt-diamond                                                    - 
  --disable-ifcvt-forked-diamond                                             - 
  --disable-ifcvt-simple                                                     - 
  --disable-ifcvt-simple-false                                               - 
  --disable-ifcvt-triangle                                                   - 
  --disable-ifcvt-triangle-false                                             - 
  --disable-ifcvt-triangle-rev                                               - 
  --disable-interleaved-load-combine                                         - Disable combining of interleaved loads
  --disable-last-run-tracking                                                - Disable last run tracking
  --disable-layout-fsprofile-loader                                          - Disable MIRProfileLoader before BlockPlacement
  --disable-lftr                                                             - Disable Linear Function Test Replace optimization
  --disable-licm-promotion                                                   - Disable memory promotion in LICM pass
  --disable-loop-idiom-vectorize-all                                         - Disable Loop Idiom Vectorize Pass.
  --disable-loop-idiom-vectorize-bytecmp                                     - Proceed with Loop Idiom Vectorize Pass, but do not convert byte-compare loop(s).
  --disable-loop-idiom-vectorize-find-first-byte                             - Do not convert find-first-byte loop(s).
  --disable-loop-level-heuristics                                            - Disable loop-level heuristics.
  --disable-lsr                                                              - Disable Loop Strength Reduction Pass
  --disable-machine-cse                                                      - Disable Machine Common Subexpression Elimination
  --disable-machine-dce                                                      - Disable Machine Dead Code Elimination
  --disable-machine-licm                                                     - Disable Machine LICM
  --disable-machine-sink                                                     - Disable Machine Sinking
  --disable-memop-opt                                                        - Disable optimize
  --disable-mergeicmps                                                       - Disable MergeICmps Pass
  --disable-mr-partial-inlining                                              - Disable multi-region partial inlining
  --disable-nofree-inference                                                 - Stop inferring nofree attribute during function-attrs pass
  --disable-non-allocatable-phys-copy-opt                                    - Disable non-allocatable physical register copy optimization
  --disable-nounwind-inference                                               - Stop inferring nounwind attribute during function-attrs pass
  --disable-ondemand-mds-loading                                             - Force disable the lazy-loading on-demand of metadata when loading bitcode for importing.
  --disable-partial-inlining                                                 - Disable partial inlining
  --disable-partial-libcall-inlining                                         - Disable Partial Libcall Inlining
  --disable-peephole                                                         - Disable the peephole optimizer
  --disable-phi-elim-edge-splitting                                          - Disable critical edge splitting during PHI elimination
  --disable-post-ra                                                          - Disable Post Regalloc Scheduler
  --disable-postra-machine-licm                                              - Disable Machine LICM
  --disable-postra-machine-sink                                              - Disable PostRA Machine Sinking
  --disable-preheader-prot                                                   - Disable protection against removing loop preheaders
  --disable-preinline                                                        - Disable pre-instrumentation inliner
  --disable-ra-fsprofile-loader                                              - Disable MIRProfileLoader before RegAlloc
  --disable-replace-with-vec-lib                                             - Disable replace with vector math call pass
  --disable-sample-loader-inlining                                           - If true, artificially skip inline transformation in sample-loader pass, and merge (or scale) profiles (as configured by --sample-profile-merge-inlinee).
  --disable-sched-hazard                                                     - Disable hazard detection during preRA scheduling
  --disable-select-optimize                                                  - Disable the select-optimization pass from running
  --disable-separate-const-offset-from-gep                                   - Do not separate the constant offset from a GEP instruction
  --disable-ssc                                                              - Disable Stack Slot Coloring
  --disable-strictnode-mutation                                              - Don't mutate strict-float node to a legalize node
  --disable-symbolication                                                    - Disable symbolizing crash backtraces.
  --disable-tail-duplicate                                                   - Disable tail duplication
  --disable-thinlto-funcattrs                                                - Don't propagate function-attrs in thinLTO
  --disable-type-promotion                                                   - Disable type promotion pass
  --disable-vector-combine                                                   - Disable all vector combine transforms
  --disable-vp                                                               - Disable Value Profiling
  --disable-whole-program-visibility                                         - Disable whole program visibility (overrides enabling options)
  --do-comdat-renaming                                                       - Append function hash to the name of COMDAT function to avoid function hash mismatch due to the preinliner
  --do-counter-promotion                                                     - Do counter register promotion
  --dom-conditions-max-uses=<uint>                                           - 
  --dom-tree-reachability-max-bbs-to-explore=<uint>                          - Max number of BBs to explore for reachability analysis
  --dot-cfg-mssa=<file name for generated dot file>                          - file name for generated dot file
  --dot-ddg-filename-prefix=<string>                                         - The prefix used for the DDG dot file names.
  --dot-ddg-only                                                             - simple ddg dot graph
  --dot-mcfg-only                                                            - Print only the CFG without blocks body
  --dropped-variable-stats-mir                                               - Dump dropped debug variables stats for MIR passes
  --dse-memoryssa-defs-per-block-limit=<uint>                                - The number of MemoryDefs we consider as candidates to eliminated other stores per basic block (default = 5000)
  --dse-memoryssa-otherbb-cost=<uint>                                        - The cost of a step in a different basic block than the killing MemoryDef(default = 5)
  --dse-memoryssa-partial-store-limit=<uint>                                 - The maximum number candidates that only partially overwrite the killing MemoryDef to consider (default = 5)
  --dse-memoryssa-path-check-limit=<uint>                                    - The maximum number of blocks to check when trying to prove that all paths to an exit go through a killing block (default = 50)
  --dse-memoryssa-samebb-cost=<uint>                                         - The cost of a step in the same basic block as the killing MemoryDef(default = 1)
  --dse-memoryssa-scanlimit=<uint>                                           - The number of memory instructions to scan for dead store elimination (default = 150)
  --dse-memoryssa-walklimit=<uint>                                           - The maximum number of steps while walking upwards to find MemoryDefs that may be killed (default = 90)
  --dse-optimize-memoryssa                                                   - Allow DSE to optimize memory accesses.
  --dwarf-extended-loc=<value>                                               - Disable emission of the extended flags in .loc directives.
    =Default                                                                 -   Default for platform
    =Enable                                                                  -   Enabled
    =Disable                                                                 -   Disabled
  --eagerly-invalidate-analyses                                              - Eagerly invalidate more analyses in default pipelines
  --early-ifcvt-limit=<uint>                                                 - Maximum number of instructions per speculated block.
  --early-live-intervals                                                     - Run live interval analysis earlier in the pipeline
  --earlycse-debug-hash                                                      - Perform extra assertion checking to verify that SimpleValue's hash function is well-behaved w.r.t. its isEqual predicate
  --earlycse-mssa-optimization-cap=<uint>                                    - Enable imprecision in EarlyCSE in pathological cases, in exchange for faster compile. Caps the MemorySSA clobbering calls.
  --emulate-old-livedebugvalues                                              - Act like old LiveDebugValues did
  --enable-aa-sched-mi                                                       - Enable use of AA during MI DAG construction
  --enable-andcmp-sinking                                                    - Enable sinking and/cmp into branches.
  --enable-block-placement-stats                                             - Collect probability-driven block placement stats
  --enable-chr                                                               - Enable control height reduction optimization (CHR)
  --enable-cold-section                                                      - Enable placement of extracted cold functions into a separate section after hot-cold splitting.
  --enable-coldcc-stress-test                                                - Enable stress test of coldcc by adding calling conv to all internal functions.
  --enable-complex-deinterleaving                                            - Enable generation of complex instructions
  --enable-cond-stores-vec                                                   - Enable if predication of stores during vectorization.
  --enable-constraint-elimination                                            - Enable pass to eliminate conditions based on linear constraints
  --enable-deferred-spilling                                                 - Instead of spilling a variable right away, defer the actual code insertion to the end of the allocation. That way the allocator might still find a suitable coloring for this variable because of other evicted variables.
  --enable-detailed-function-properties                                      - Whether or not to compute detailed function properties.
  --enable-dfa-jump-thread                                                   - Enable DFA jump threading
  --enable-double-float-shrink                                               - Enable unsafe double to float shrinking for math lib calls
  --enable-dse-initializes-attr-improvement                                  - Enable the initializes attr improvement in DSE
  --enable-dse-partial-overwrite-tracking                                    - Enable partial-overwrite tracking in DSE
  --enable-dse-partial-store-merging                                         - Enable partial store merging in DSE
  --enable-early-exit-vectorization                                          - Enable vectorization of early exit loops with uncountable exits.
  --enable-epilogue-vectorization                                            - Enable vectorization of epilogue loops.
  --enable-ext-tsp-block-placement                                           - Enable machine block placement based on the ext-tsp model, optimizing I-cache utilization.
  --enable-fs-discriminator                                                  - Enable adding flow sensitive discriminators
  --enable-global-analyses                                                   - Enable inter-procedural analyses
  --enable-global-merge                                                      - Enable the global merge pass
  --enable-global-merge-func                                                 - Enable global merge functions that are based on hash function
  --enable-gvn-hoist                                                         - Enable the GVN hoisting pass (default = off)
  --enable-gvn-memdep                                                        - 
  --enable-gvn-memoryssa                                                     - 
  --enable-gvn-sink                                                          - Enable the GVN sinking pass (default = off)
  --enable-heap-to-stack-conversion                                          - 
  --enable-histogram-loop-vectorization                                      - Enables autovectorization of some loops containing histograms
  --enable-if-conversion                                                     - Enable if-conversion during vectorization.
  --enable-implicit-null-checks                                              - Fold null checks into faulting memory operations
  --enable-import-metadata                                                   - Enable import metadata like 'thinlto_src_module' and 'thinlto_src_file'
  --enable-ind-var-reg-heur                                                  - Count the induction variable only once when interleaving
  --enable-interleaved-mem-accesses                                          - Enable vectorization on interleaved memory accesses in a loop
  --enable-ipra                                                              - Enable interprocedural register allocation to reduce load/store at procedure calls.
  --enable-jump-table-to-switch                                              - Enable JumpTableToSwitch pass (default = off)
  --enable-knowledge-retention                                               - enable preservation of attributes throughout code transformation
  --enable-linkonceodr-ir-outlining                                          - Enable the IR outliner on linkonceodr functions
  --enable-linkonceodr-outlining                                             - Enable the machine outliner on linkonceodr functions
  --enable-load-in-loop-pre                                                  - 
  --enable-load-pre                                                          - 
  --enable-loadstore-runtime-interleave                                      - Enable runtime interleaving until load/store ports are saturated
  --enable-local-reassign                                                    - Local reassignment can yield better allocation decisions, but may be compile time intensive
  --enable-loop-distribute                                                   - Enable the new, experimental LoopDistribution Pass
  --enable-loop-flatten                                                      - Enable the LoopFlatten Pass
  --enable-loop-header-duplication                                           - Enable loop header duplication at any optimization level
  --enable-loop-simplifycfg-term-folding                                     - 
  --enable-loop-versioning-licm                                              - Enable the experimental Loop Versioning LICM pass
  --enable-loopinterchange                                                   - Enable the LoopInterchange Pass
  --enable-lsr-phielim                                                       - Enable LSR phi elimination
  --enable-machine-outliner                                                  - Enable the machine outliner
  --enable-machine-outliner=<value>                                          - Enable the machine outliner
    =always                                                                  -   Run on all functions guaranteed to be beneficial
    =never                                                                   -   Disable all outlining
  --enable-masked-interleaved-mem-accesses                                   - Enable vectorization on masked interleaved memory accesses in a loop
  --enable-matrix                                                            - Enable lowering of the matrix intrinsics
  --enable-mem-access-versioning                                             - Enable symbolic stride memory access versioning
  --enable-memcpyopt-without-libcalls                                        - Enable memcpyopt even when libcalls are disabled
  --enable-memprof-context-disambiguation                                    - Enable MemProf context disambiguation
  --enable-memprof-indirect-call-support                                     - Enable MemProf support for summarizing and cloning indirect calls
  --enable-merge-functions                                                   - Enable function merging as part of the optimization pipeline
  --enable-misched                                                           - Enable the machine instruction scheduling pass.
  --enable-ml-inliner=<value>                                                - Enable ML policy for inliner. Currently trained for -Oz only
    =default                                                                 -   Heuristics-based inliner version
    =development                                                             -   Use development mode (runtime-loadable model)
    =release                                                                 -   Use release mode (AOT-compiled model)
  --enable-module-inliner                                                    - Enable module inliner
  --enable-name-compression                                                  - Enable name/filename string compression
  --enable-newgvn                                                            - Run the NewGVN pass
  --enable-noalias-to-md-conversion                                          - Convert noalias attributes to metadata during inlining.
  --enable-nonnull-arg-prop                                                  - Try to propagate nonnull argument attributes from callsites to caller functions.
  --enable-nontrivial-unswitch                                               - Forcibly enables non-trivial loop unswitching rather than following the configuration passed into the pass.
  --enable-npm-pgo-inline-deferral                                           - Enable inline deferral during PGO
  --enable-objc-arc-opts                                                     - enable/disable all ARC Optimizations
  --enable-order-file-instrumentation                                        - Enable order file instrumentation (default = off)
  --enable-partial-inlining                                                  - Run Partial inlining pass
  --enable-patchpoint-liveness                                               - Enable PatchPoint Liveness Analysis Pass
  --enable-phi-of-ops                                                        - 
  --enable-pipeliner                                                         - Enable Software Pipelining
  --enable-pipeliner-opt-size                                                - Enable SWP at Os.
  --enable-post-misched                                                      - Enable the post-ra machine instruction scheduling pass.
  --enable-post-pgo-loop-rotation                                            - Run the loop rotation transformation after PGO instrumentation
  --enable-pre                                                               - 
  --enable-sampled-instrumentation                                           - Enable profile instrumentation sampling (default = off)
  --enable-scc-inline-advisor-printing                                       - 
  --enable-scoped-noalias                                                    - 
  --enable-selectiondag-sp                                                   - 
  --enable-shrink-wrap                                                       - enable the shrink-wrapping pass
  --enable-shrink-wrap-region-split                                          - enable splitting of the restore block if possible
  --enable-spill-copy-elim                                                   - 
  --enable-split-backedge-in-load-pre                                        - 
  --enable-split-loopiv-heuristic                                            - Enable loop iv regalloc heuristic
  --enable-split-machine-functions                                           - Split out cold blocks from machine functions based on profile information.
  --enable-store-refinement                                                  - 
  --enable-subreg-liveness                                                   - Enable subregister liveness tracking.
  --enable-tail-merge                                                        - 
  --enable-tbaa                                                              - 
  --enable-unroll-and-jam                                                    - Enable Unroll And Jam Pass
  --enable-unsafe-globalsmodref-alias-results                                - 
  --enable-unswitch-cost-multiplier                                          - Enable unswitch cost multiplier that prohibits exponential explosion in nontrivial unswitch.
  --enable-vfe                                                               - Enable virtual function elimination
  --enable-vplan-native-path                                                 - Enable VPlan-native vectorization path with support for outer loop vectorization.
  --enable-vtable-profile-use                                                - If ThinLTO and WPD is enabled and this option is true, vtable profiles will be used by ICP pass for more efficient indirect call sequence. If false, type profiles won't be used.
  --enable-vtable-value-profiling                                            - If true, the virtual table address will be instrumented to know the types of a C++ pointer. The information is used in indirect call promotion to do selective vtable-based comparison.
  --epilogue-vectorization-force-VF=<uint>                                   - When epilogue vectorization is enabled, and a value greater than 1 is specified, forces the given VF for all applicable epilogue loops.
  --epilogue-vectorization-minimum-VF=<uint>                                 - Only loops with vectorization factor equal to or larger than the specified value are considered for epilogue vectorization.
  --exhaustive-register-search                                               - Exhaustive Search for registers bypassing the depth and interference cutoffs of last chance recoloring
  --expand-constant-exprs                                                    - Expand constant expressions to instructions for testing purposes
  --expand-div-rem-bits=<uint>                                               - div and rem instructions on integers with more than <N> bits are expanded.
  --expand-fp-convert-bits=<uint>                                            - fp convert instructions on integers with more than <N> bits are expanded.
  --expand-variadics-override=<value>                                        - Override the behaviour of expand-variadics
    =unspecified                                                             -   Use the implementation defaults
    =disable                                                                 -   Disable the pass entirely
    =optimize                                                                -   Optimise without changing ABI
    =lowering                                                                -   Change variadic calling convention
  --expandvp-override-evl-transform=<string>                                 - Options: <empty>|Legal|Discard|Convert. If non-empty, ignore TargetTransformInfo and always use this transformation for the %evl parameter (Used in testing).
  --expandvp-override-mask-transform=<string>                                - Options: <empty>|Legal|Discard|Convert. If non-empty, Ignore TargetTransformInfo and always use this transformation for the %mask parameter (Used in testing).
  --experimental-debug-variable-locations                                    - Use experimental new value-tracking variable locations
  --experimental-debuginfo-iterators                                         - Enable communicating debuginfo positions through iterators, eliminating intrinsics. Has no effect if --preserve-input-debuginfo-format=true.
  --ext-tsp-apply-without-profile                                            - Whether to apply ext-tsp placement for instances w/o profile
  --ext-tsp-block-placement-max-blocks=<uint>                                - Maximum number of basic blocks in a function to run ext-TSP block placement.
  --extra-vectorizer-passes                                                  - Run cleanup optimization passes after vectorization
  --extract-blocks-erase-funcs                                               - Erase the existing functions
  --extract-blocks-file=<filename>                                           - A file containing list of basic blocks to extract
  -f <function>                                                              - top-level function name
  --fast-cluster-threshold=<uint>                                            - The threshold for fast cluster
  --fast-isel                                                                - Enable the "fast" instruction selector
  --filter-passes=<pass names>                                               - Only consider IR changes for passes whose names match the specified value. No-op without -print-changed
  --filter-print-funcs=<function names>                                      - Only print IR for functions whose name match this for all print-[before|after][-all] options
  --fixup-allow-gcptr-in-csr                                                 - Allow passing GC Pointer arguments in callee saved registers
  --fixup-max-csr-statepoints=<uint>                                         - Max number of statepoints allowed to pass GC Ptrs in registers
  --fixup-scs-enable-copy-propagation                                        - Enable simple copy propagation during register reloading
  --fixup-scs-extend-slot-size                                               - Allow spill in spill slot of greater size than register size
  --flat-loop-tripcount-threshold=<uint>                                     - If the runtime tripcount for the loop is lower than the threshold, the loop is considered as flat and will be less aggressively unrolled.
  --flattened-profile-used                                                   - Indicate the sample profile being used is flattened, i.e., no inline hierarchy exists in the profile
  --float2int-max-integer-bw=<uint>                                          - Max integer bitwidth to consider in float2int(default=64)
  --force-attribute=<string>                                                 - Add an attribute to a function. This can be a pair of 'function-name:attribute-name', to apply an attribute to a specific function. For example -force-attribute=foo:noinline. Specifying only an attribute will apply the attribute to every function in the module. This option can be specified multiple times.
  --force-chr                                                                - Apply CHR for all functions
  --force-fast-cluster                                                       - Switch to fast cluster algorithm with the lost of some fusion opportunities
  --force-fuse-matrix                                                        - Force matrix instruction fusion even if not profitable.
  --force-hardware-loop-guard                                                - Force generation of loop guard intrinsic
  --force-hardware-loop-phi                                                  - Force hardware loop counter to be updated through a phi
  --force-hardware-loops                                                     - Force hardware loops intrinsics to be inserted
  --force-import-all                                                         - Import functions with noinline attribute
  --force-instr-ref-livedebugvalues                                          - Use instruction-ref based LiveDebugValues with normal DBG_VALUE inputs
  --force-loop-cold-block                                                    - Force outlining cold blocks from loops.
  --force-nested-hardware-loop                                               - Force allowance of nested hardware loops
  --force-ordered-reductions                                                 - Enable the vectorisation of loops with in-order (strict) FP reductions
  --force-pgso                                                               - Force the (profiled-guided) size optimizations. 
  --force-precise-rotation-cost                                              - Force the use of precise cost loop rotation strategy.
  --force-remove-attribute=<string>                                          - Remove an attribute from a function. This can be a pair of 'function-name:attribute-name' to remove an attribute from a specific function. For example -force-remove-attribute=foo:noinline. Specifying only an attribute will remove the attribute from all functions in the module. This option can be specified multiple times.
  --force-specialization                                                     - Force function specialization for every call site with a constant argument
  --force-split-store                                                        - Force store splitting no matter what the target query says.
  --force-summary-edges-cold=<value>                                         - Force all edges in the function summary to cold
    =none                                                                    -   None.
    =all-non-critical                                                        -   All non-critical edges.
    =all                                                                     -   All edges.
  --force-tail-folding-style=<value>                                         - Force the tail folding style
    =none                                                                    -   Disable tail folding
    =data                                                                    -   Create lane mask for data only, using active.lane.mask intrinsic
    =data-without-lane-mask                                                  -   Create lane mask with compare/stepvector
    =data-and-control                                                        -   Create lane mask using active.lane.mask intrinsic, and use it for both data and control flow
    =data-and-control-without-rt-check                                       -   Similar to data-and-control, but remove the runtime check
    =data-with-evl                                                           -   Use predicated EVL instructions for tail folding. If EVL is unsupported, fallback to data-without-lane-mask.
  --force-target-instruction-cost=<uint>                                     - A flag that overrides the target's expected cost for an instruction to a single constant value. Mostly useful for getting consistent testing.
  --force-target-max-scalar-interleave=<uint>                                - A flag that overrides the target's max interleave factor for scalar loops.
  --force-target-max-vector-interleave=<uint>                                - A flag that overrides the target's max interleave factor for vectorized loops.
  --force-target-num-scalar-regs=<uint>                                      - A flag that overrides the target's number of scalar registers.
  --force-target-num-vector-regs=<uint>                                      - A flag that overrides the target's number of vector registers.
  --force-target-supports-scalable-vectors                                   - Pretend that scalable vectors are supported, even if the target does not support them. This flag should only be used for testing.
  --force-vector-interleave=<uint>                                           - Sets the vectorization interleave count. Zero is autoselect.
  --force-vector-width=<uint>                                                - Sets the SIMD width. Zero is autoselect.
  --force-widen-divrem-via-safe-divisor                                      - Override cost based safe divisor widening for div/rem instructions
  --forceattrs-csv-path=<string>                                             - Path to CSV file containing lines of function names and attributes to add to them in the form of `f1,attr1` or `f2,attr2=str`.
  --forget-scev-loop-unroll                                                  - Forget everything in SCEV when doing LoopUnroll, instead of just the current top-most loop. This is sometimes preferred to reduce compile time.
  --forward-switch-cond                                                      - Forward switch condition to phi ops (default = false)
  --freeze-loop-unswitch-cond                                                - If enabled, the freeze instruction will be added to condition of loop unswitch to prevent miscompilation.
  --fs-profile-debug-bw-threshold=<uint>                                     - Only show debug message if the source branch weight is greater  than this value.
  --fs-profile-debug-prob-diff-threshold=<uint>                              - Only show debug message if the branch probability is greater than this value (in percentage).
  --fs-profile-file=<filename>                                               - Flow Sensitive profile file name.
  --fs-remapping-file=<filename>                                             - Flow Sensitive profile remapping file name.
  --fs-viewbfi-after                                                         - View BFI after MIR loader
  --fs-viewbfi-before                                                        - View BFI before MIR loader
  --func-profile-similarity-threshold=<uint>                                 - Consider a profile matches a function if the similarity of their callee sequences is above the specified percentile.
  --funcspec-for-literal-constant                                            - Enable specialization of functions that take a literal constant as an argument
  --funcspec-max-block-predecessors=<uint>                                   - The maximum number of predecessors a basic block can have to be considered during the estimation of dead code
  --funcspec-max-clones=<uint>                                               - The maximum number of clones allowed for a single function specialization
  --funcspec-max-codesize-growth=<uint>                                      - Maximum codesize growth allowed per function
  --funcspec-max-discovery-iterations=<uint>                                 - The maximum number of iterations allowed when searching for transitive phis
  --funcspec-max-incoming-phi-values=<uint>                                  - The maximum number of incoming values a PHI node can have to be considered during the specialization bonus estimation
  --funcspec-max-iters=<uint>                                                - The maximum number of iterations function specialization is run
  --funcspec-min-codesize-savings=<uint>                                     - Reject specializations whose codesize savings are less than this much percent of the original function size
  --funcspec-min-function-size=<uint>                                        - Don't specialize functions that have less than this number of instructions
  --funcspec-min-inlining-bonus=<uint>                                       - Reject specializations whose inlining bonus is less than this much percent of the original function size
  --funcspec-min-latency-savings=<uint>                                      - Reject specializations whose latency savings are less than this much percent of the original function size
  --funcspec-on-address                                                      - Enable function specialization on the address of global values
  --fuse-matrix                                                              - Enable/disable fusing matrix instructions.
  --fuse-matrix-tile-size=<uint>                                             - Tile size for matrix instruction fusion using square-shaped tiles.
  --fuse-matrix-use-loops                                                    - Generate loop nest for tiling.
  -g <string>                                                                - lowest level architectural hierarchy to simulate (pick from herd and core)
  --gc-empty-basic-blocks                                                    - Enable garbage-collecting empty basic blocks
  --gcov-atomic-counter                                                      - Make counter updates atomic
  --generate-merged-base-profiles                                            - When generating nested context-sensitive profiles, always generate extra base profile for function with all its context profiles merged into it.
  --global-isel                                                              - Enable the "global" instruction selector
  --global-isel-abort=<value>                                                - Enable abort calls when "global" instruction selection fails to lower/select an instruction
    =0                                                                       -   Disable the abort
    =1                                                                       -   Enable the abort
    =2                                                                       -   Disable the abort but emit a diagnostic on failure
  --global-merge-all-const                                                   - Merge all const globals without looking at uses
  --global-merge-group-by-use                                                - Improve global merge pass to look at uses
  --global-merge-ignore-single-use                                           - Improve global merge pass to ignore globals only used alone
  --global-merge-max-offset=<uint>                                           - Set maximum offset for global merge pass
  --global-merge-min-data-size=<uint>                                        - The minimum size in bytes of each global that should considered in merging.
  --global-merge-on-const                                                    - Enable global merge pass on constants
  --global-merge-on-external                                                 - Enable global merge pass on external linkage
  --global-merging-call-overhead=<number>                                    - The overhead cost associated with each function call when merging functions.
  --global-merging-extra-threshold=<number>                                  - An additional cost threshold that must be exceeded for merging to be considered beneficial.
  --global-merging-inst-overhead=<number>                                    - The overhead cost associated with each instruction when lowering to machine instruction.
  --global-merging-max-params=<uint>                                         - The maximum number of parameters allowed when merging functions.
  --global-merging-min-instrs=<uint>                                         - The minimum instruction count required when merging functions.
  --global-merging-min-merges=<uint>                                         - Minimum number of similar functions with the same hash required for merging.
  --global-merging-param-overhead=<number>                                   - The overhead cost associated with each parameter when merging functions.
  --global-merging-skip-no-params                                            - Skip merging functions with no parameters.
  --greedy-regclass-priority-trumps-globalness                               - Change the greedy register allocator's live range priority calculation to make the AllocationPriority of the register class more important then whether the range is global
  --greedy-reverse-local-assignment                                          - Reverse allocation order of local live ranges, such that shorter local live ranges will tend to be allocated first
  --grow-region-complexity-budget=<ulong>                                    - growRegion() does not scale with the number of BB edges, so limit its budget and bail out once we reach the limit.
  --guard-widening-widen-branch-guards                                       - Whether or not we should widen guards  expressed as branches by widenable conditions
  --guards-predicate-pass-branch-weight=<uint>                               - The probability of a guard failing is assumed to be the reciprocal of this value (default = 1 << 20)
  --gvn-add-phi-translation                                                  - Enable phi-translation of add instructions
  --gvn-hoist-max-bbs=<int>                                                  - Max number of basic blocks on the path between hoisting locations (default = 4, unlimited = -1)
  --gvn-hoist-max-chain-length=<int>                                         - Maximum length of dependent chains to hoist (default = 10, unlimited = -1)
  --gvn-hoist-max-depth=<int>                                                - Hoist instructions from the beginning of the BB up to the maximum specified depth (default = 100, unlimited = -1)
  --gvn-max-block-speculations=<uint>                                        - Max number of blocks we're willing to speculate on (and recurse into) when deducing if a value is fully available or not in GVN (default = 600)
  --gvn-max-hoisted=<int>                                                    - Max number of instructions to hoist (default unlimited = -1)
  --gvn-max-num-deps=<uint>                                                  - Max number of dependences to attempt Load PRE (default = 100)
  --gvn-max-num-insns=<uint>                                                 - Max number of instructions to scan in each basic block in GVN (default = 100)
  --gvn-max-num-visited-insts=<uint>                                         - Max number of visited instructions when trying to find dominating value of select dependency (default = 100)
  --hardware-loop-counter-bitwidth=<uint>                                    - Set the loop counter bitwidth
  --hardware-loop-decrement=<uint>                                           - Set the loop decrement value
  --hash-based-counter-split                                                 - Rename counter variable of a comdat function based on cfg hash
  --hints-allow-reordering                                                   - Allow enabling loop hints to reorder FP operations during vectorization.
  --hoist-cheap-insts                                                        - MachineLICM should hoist even cheap instructions
  --hoist-common-insts                                                       - hoist common instructions (default = false)
  --hoist-const-loads                                                        - Hoist invariant loads
  --hoist-const-stores                                                       - Hoist invariant stores
  --hoist-loads-stores-with-cond-faulting                                    - Hoist loads/stores if the target supports conditional faulting (default = false)
  --hoist-loads-stores-with-cond-faulting-threshold=<uint>                   - Control the maximal conditional load/store that we are willing to speculatively execute to eliminate conditional branch (default = 6)
  --hoist-runtime-checks                                                     - Hoist inner loop runtime memory checks to outer loop if possible
  --hot-callsite-rel-freq=<ulong>                                            - Minimum block frequency, expressed as a multiple of caller's entry frequency, for a callsite to be hot in the absence of profile information.
  --hot-callsite-threshold=<int>                                             - Threshold for hot callsites 
  --hot-cold-split                                                           - Enable hot-cold splitting pass
  --hot-cold-static-analysis                                                 - 
  --hot-func-cutoff-for-staleness-error=<uint>                               - A function is considered hot for staleness error check if its total sample count is above the specified percentile
  --hot-new-hint-value=<uint>                                                - Value to pass to hot/cold operator new for hot allocation
  --hotcoldsplit-cold-probability-denom=<int>                                - Divisor of cold branch probability.BranchProbability = 1/ColdBranchProbDenom
  --hotcoldsplit-cold-section-name=<string>                                  - Name for the section containing cold functions extracted by hot-cold splitting.
  --hotcoldsplit-max-params=<int>                                            - Maximum number of parameters for a split function
  --hotcoldsplit-threshold=<int>                                             - Base penalty for splitting cold code (as a multiple of TCC_Basic)
  --huge-size-for-split=<uint>                                               - A threshold of live range size which may cause high compile time cost in global splitting.
  --hwasan-experimental-use-page-aliases                                     - Use page aliasing in HWASan
  --hwasan-generate-tags-with-calls                                          - generate new tags with runtime library calls
  --hwasan-globals                                                           - Instrument globals
  --hwasan-inline-all-checks                                                 - inline all checks
  --hwasan-inline-fast-path-checks                                           - inline all checks
  --hwasan-instrument-atomics                                                - instrument atomic instructions (rmw, cmpxchg)
  --hwasan-instrument-byval                                                  - instrument byval arguments
  --hwasan-instrument-landing-pads                                           - instrument landing pads
  --hwasan-instrument-mem-intrinsics                                         - instrument memory intrinsics
  --hwasan-instrument-personality-functions                                  - instrument personality functions
  --hwasan-instrument-reads                                                  - instrument read instructions
  --hwasan-instrument-stack                                                  - instrument stack (allocas)
  --hwasan-instrument-with-calls                                             - instrument reads and writes with callbacks
  --hwasan-instrument-writes                                                 - instrument write instructions
  --hwasan-kernel                                                            - Enable KernelHWAddressSanitizer instrumentation
  --hwasan-kernel-mem-intrinsic-prefix                                       - Use prefix for memory intrinsics in KASAN mode
  --hwasan-mapping-offset=<ulong>                                            - HWASan shadow mapping offset [EXPERIMENTAL]
  --hwasan-mapping-offset-dynamic=<value>                                    - HWASan shadow mapping dynamic offset location
    =global                                                                  -   Use global
    =ifunc                                                                   -   Use ifunc global
    =tls                                                                     -   Use TLS
  --hwasan-match-all-tag=<int>                                               - don't report bad accesses via pointers with this tag
  --hwasan-memory-access-callback-prefix=<string>                            - Prefix for memory access callbacks
  --hwasan-percentile-cutoff-hot=<int>                                       - Hot percentile cutoff.
  --hwasan-random-rate=<number>                                              - Probability value in the range [0.0, 1.0] to keep instrumentation of a function. Note: instrumentation can be skipped randomly OR because of the hot percentile cutoff, if both are supplied.
  --hwasan-record-stack-history=<value>                                      - Record stack frames with tagged allocations in a thread-local ring buffer
    =none                                                                    -   Do not record stack ring history
    =instr                                                                   -   Insert instructions into the prologue for storing into the stack ring buffer directly
    =libcall                                                                 -   Add a call to __hwasan_add_frame_record for storing into the stack ring buffer
  --hwasan-recover                                                           - Enable recovery mode (continue-after-error).
  --hwasan-use-after-scope                                                   - detect use after scope within function
  --hwasan-use-short-granules                                                - use short granules in allocas and outlined checks
  --hwasan-use-stack-safety                                                  - Use Stack Safety analysis results
  --hwasan-with-frame-record                                                 - Use ring buffer for stack allocations
  --icp-call-only                                                            - Run indirect-call promotion for call instructions only
  --icp-csskip=<uint>                                                        - Skip Callsite up to this number for this compilation
  --icp-cutoff=<uint>                                                        - Max number of promotions for this compilation
  --icp-dumpafter                                                            - Dump IR after transformation happens
  --icp-ignored-base-types=<string>                                          - A list of mangled vtable type info names. Classes specified by the type info names and their derived ones will not be vtable-ICP'ed. Useful when the profiled types and actual types in the optimized binary could be different due to profiling limitations. Type info names are those string literals used in LLVM type metadata
  --icp-invoke-only                                                          - Run indirect-call promotion for invoke instruction only
  --icp-lto                                                                  - Run indirect-call promotion in LTO mode
  --icp-max-annotations=<uint>                                               - Max number of annotations for a single indirect call callsite
  --icp-max-num-vtable-last-candidate=<int>                                  - The maximum number of vtable for the last candidate.
  --icp-max-num-vtables=<uint>                                               - Max number of vtables annotated for a vtable load instruction.
  --icp-max-prom=<uint>                                                      - Max number of promotions for a single indirect call callsite
  --icp-remaining-percent-threshold=<uint>                                   - The percentage threshold against remaining unpromoted indirect call count for the promotion
  --icp-samplepgo                                                            - Run indirect-call promotion in SamplePGO mode
  --icp-total-percent-threshold=<uint>                                       - The percentage threshold against total count for the promotion
  --icp-vtable-percentage-threshold=<number>                                 - The percentage threshold of vtable-count / function-count for cost-benefit analysis.
  --ifcvt-branch-fold                                                        - 
  --ifcvt-fn-start=<int>                                                     - 
  --ifcvt-fn-stop=<int>                                                      - 
  --ifcvt-limit=<int>                                                        - 
  --ignore-redundant-instrumentation                                         - Ignore redundant instrumentation
  --ignore-tti-inline-compatible                                             - Ignore TTI attributes compatibility check between callee/caller during inline cost calculation
  --imp-null-check-page-size=<int>                                           - The page size of the target in bytes
  --imp-null-max-insts-to-consider=<uint>                                    - The max number of instructions to consider hoisting loads over (the algorithm is quadratic over this number)
  --import-all-index                                                         - Import all external functions in index.
  --import-cold-multiplier=<N>                                               - Multiply the `import-instr-limit` threshold for cold callsites
  --import-constants-with-refs                                               - Import constant global variables with references
  --import-critical-multiplier=<x>                                           - Multiply the `import-instr-limit` threshold for critical callsites
  --import-cutoff=<N>                                                        - Only import first N functions if N>=0 (default -1)
  --import-declaration                                                       - If true, import function declaration as fallback if the function definition is not imported.
  --import-full-type-definitions                                             - Import full type definitions for ThinLTO.
  --import-hot-evolution-factor=<x>                                          - As we import functions called from hot callsite, multiply the `import-instr-limit` threshold by this factor before processing newly imported functions
  --import-hot-multiplier=<x>                                                - Multiply the `import-instr-limit` threshold for hot callsites
  --import-instr-evolution-factor=<x>                                        - As we import functions, multiply the `import-instr-limit` threshold by this factor before processing newly imported functions
  --import-instr-limit=<N>                                                   - Only import functions with less than N instructions
  --improved-fs-discriminator                                                - New FS discriminators encoding (incompatible with the original encoding)
  --indvars-post-increment-ranges                                            - Use post increment control-dependent ranges in IndVarSimplify
  --indvars-predicate-loops                                                  - Predicate conditions in read only loops
  --indvars-widen-indvars                                                    - Allow widening of indvars to eliminate s/zext
  --info-output-file=<filename>                                              - File to append -stats and -timer output to
  --inline-call-penalty=<int>                                                - Call penalty that is applied per callsite when inlining
  --inline-caller-superset-nobuiltin                                         - Allow inlining when caller has a superset of callee's nobuiltin attributes.
  --inline-cold-callsite-threshold=<int>                                     - Threshold for inlining cold callsites
  --inline-cost-full                                                         - Compute the full inline cost of a call site even when the cost exceeds the threshold.
  --inline-deferral                                                          - Enable deferred inlining
  --inline-deferral-scale=<int>                                              - Scale to limit the cost of inline deferral
  --inline-enable-cost-benefit-analysis                                      - Enable the cost-benefit analysis for the inliner
  --inline-instr-cost=<int>                                                  - Cost of a single instruction when inlining
  --inline-max-stacksize=<ulong>                                             - Do not inline functions with a stack size that exceeds the specified limit
  --inline-memaccess-cost=<int>                                              - Cost of load/store instruction when inlining
  --inline-priority-mode=<value>                                             - Choose the priority mode to use in module inline
    =size                                                                    -   Use callee size priority.
    =cost                                                                    -   Use inline cost priority.
    =cost-benefit                                                            -   Use cost-benefit ratio.
    =ml                                                                      -   Use ML.
  --inline-remark-attribute                                                  - Enable adding inline-remark attribute to callsites processed by inliner but decided to be not inlined
  --inline-savings-multiplier=<int>                                          - Multiplier to multiply cycle savings by during inlining
  --inline-savings-profitable-multiplier=<int>                               - A multiplier on top of cycle savings to decide whether the savings won't justify the cost
  --inline-size-allowance=<int>                                              - The maximum size of a callee that get's inlined without sufficient cycle savings
  --inline-threshold=<int>                                                   - Control the amount of inlining to perform (default = 225)
  --inlinecold-threshold=<int>                                               - Threshold for inlining functions with cold attribute
  --inlinedefault-threshold=<int>                                            - Default amount of inlining to perform
  --inlinehint-threshold=<int>                                               - Threshold for inlining functions with inline hint
  --inliner-function-import-stats=<value>                                    - Enable inliner stats for imported functions
    =basic                                                                   -   basic statistics
    =verbose                                                                 -   printing of statistics for each inlined function
  --inliner-interactive-channel-base=<string>                                - Base file path for the interactive mode. The incoming filename should have the name <inliner-interactive-channel-base>.in, while the outgoing name should be <inliner-interactive-channel-base>.out
  --inliner-interactive-include-default                                      - In interactive mode, also send the default policy decision: inlining_default.
  --instcombine-code-sinking                                                 - Enable code sinking
  --instcombine-guard-widening-window=<uint>                                 - How wide an instruction window to bypass looking for another guard
  --instcombine-lower-dbg-declare=<uint>                                     - 
  --instcombine-max-copied-from-constant-users=<uint>                        - Maximum users to visit in copy from constant transform
  --instcombine-max-num-phis=<uint>                                          - Maximum number phis to handle in intptr/ptrint folding
  --instcombine-max-sink-users=<uint>                                        - Maximum number of undroppable users for instruction sinking
  --instcombine-maxarray-size=<uint>                                         - Maximum array size considered when doing a combine
  --instcombine-negator-enabled                                              - Should we attempt to sink negations?
  --instcombine-negator-max-depth=<uint>                                     - What is the maximal lookup depth when trying to check for viability of negation sinking.
  --instcombine-simplify-vector-elts-depth=<uint>                            - Depth limit when simplifying vector instructions and their operands
  --instcombine-verify-known-bits                                            - Verify that computeKnownBits() and SimplifyDemandedBits() are consistent
  --instrprof-atomic-counter-update-all                                      - Make all profile counter updates atomic (for testing only)
  --instrument-cold-function-only-path=<string>                              - File path for cold function only instrumentation(requires use with --pgo-instrument-cold-function-only)
  --interactive-model-runner-echo-reply                                      - The InteractiveModelRunner will echo back to stderr the data received from the host (for debugging purposes).
  --interleave-loops                                                         - Enable loop interleaving in Loop vectorization passes
  --internalize-public-api-file=<filename>                                   - A file containing list of symbol names to preserve
  --internalize-public-api-list=<list>                                       - A list of symbol names to preserve
  --intra-scc-cost-multiplier=<int>                                          - Cost multiplier to multiply onto inlined call sites where the new call was previously an intra-SCC call (not relevant when the original call was already intra-SCC). This can accumulate over multiple inlinings (e.g. if a call site already had a cost multiplier and one of its inlined calls was also subject to this, the inlined call would have the original multiplier multiplied by intra-scc-cost-multiplier). This is to prevent tons of inlining through a child SCC which can cause terrible compile times
  --ipt-expensive-asserts                                                    - Perform expensive assert validation on every query to Instruction Precedence Tracking
  --ir-outliner                                                              - Enable ir outliner pass
  --irce-allow-narrow-latch                                                  - If set to true, IRCE may eliminate wide range checks in loops with narrow latch condition.
  --irce-allow-unsigned-latch                                                - 
  --irce-loop-size-cutoff=<uint>                                             - 
  --irce-max-type-size-for-overflow-check=<uint>                             - Maximum size of range check type for which can be produced runtime overflow check of its limit's computation
  --irce-min-eliminated-checks=<uint>                                        - 
  --irce-print-changed-loops                                                 - 
  --irce-print-range-checks                                                  - 
  --irce-print-scaled-boundary-range-checks                                  - 
  --irce-skip-profitability-checks                                           - 
  --iterative-bfi-max-iterations-per-block=<uint>                            - Iterative inference: maximum number of update iterations per block
  --iterative-bfi-precision=<number>                                         - Iterative inference: delta convergence precision; smaller values typically lead to better results at the cost of worsen runtime
  --iterative-counter-promotion                                              - Allow counter promotion across the whole loop nest.
  --join-globalcopies                                                        - Coalesce copies that span blocks (default=subtarget)
  --join-liveintervals                                                       - Coalesce copies (default=true)
  --join-splitedges                                                          - Coalesce copies on split edges (default=subtarget)
  --jump-inst-cost=<uint>                                                    - Cost of jump instructions.
  --jump-is-expensive                                                        - Do not create extra branches to split comparison logic.
  --jump-table-density=<uint>                                                - Minimum density for building a jump table in a normal function
  --jump-table-to-switch-function-size-threshold=<uint>                      - Only split jump tables containing functions whose sizes are less or equal than this threshold.
  --jump-table-to-switch-size-threshold=<uint>                               - Only split jump tables with size less or equal than JumpTableSizeThreshold.
  --jump-threading-across-loop-headers                                       - Allow JumpThreading to thread across loop headers, for testing
  --jump-threading-implication-search-threshold=<uint>                       - The number of predecessors to search for a stronger condition to use to thread over a weaker condition
  --jump-threading-phi-threshold=<uint>                                      - Max PHIs in BB to duplicate for jump threading
  --jump-threading-threshold=<uint>                                          - Max block size to duplicate for jump threading
  --keep-inline-advisor-for-printing                                         - 
  --keep-loops                                                               - Preserve canonical loop structure (default = true)
  --laa-speculate-unit-stride                                                - Speculate that non-constant strides are unit in LAA
  --large-interval-freq-threshold=<uint>                                     - For a large interval, if it is coalesced with other live intervals many times more than the threshold, stop its coalescing to control the compile time. 
  --large-interval-size-threshold=<uint>                                     - If the valnos size of an interval is larger than the threshold, it is regarded as a large interval. 
  --late-remat-update-threshold=<uint>                                       - During rematerialization for a copy, if the def instruction has many other copy uses to be rematerialized, delay the multiple separate live interval update work and do them all at once after all those rematerialization are done. It will save a lot of repeated work. 
  --lcr-max-depth=<uint>                                                     - Last chance recoloring max depth
  --lcr-max-interf=<uint>                                                    - Last chance recoloring maximum number of considered interference at a time
  --licm-control-flow-hoisting                                               - Enable control flow (and PHI) hoisting in LICM
  --licm-force-thread-model-single                                           - Force thread model single in LICM pass
  --licm-max-num-fp-reassociations=<uint>                                    - Set upper limit for the number of transformations performed during a single round of hoisting the reassociated expressions.
  --licm-max-num-int-reassociations=<uint>                                   - Set upper limit for the number of transformations performed during a single round of hoisting the reassociated expressions.
  --licm-max-num-uses-traversed=<uint>                                       - Max num uses visited for identifying load invariance in loop using invariant start (default = 8)
  --licm-mssa-max-acc-promotion=<uint>                                       - [LICM & MemorySSA] When MSSA in LICM is disabled, this has no effect. When MSSA in LICM is enabled, then this is the maximum number of accesses allowed to be present in a loop in order to enable memory promotion.
  --licm-mssa-optimization-cap=<uint>                                        - Enable imprecision in LICM in pathological cases, in exchange for faster compile. Caps the MemorySSA clobbering calls.
  --licm-versioning-invariant-threshold=<number>                             - LoopVersioningLICM's minimum allowed percentage of possible invariant instructions per loop
  --licm-versioning-max-depth-threshold=<uint>                               - LoopVersioningLICM's threshold for maximum allowed loop nest/depth
  --likely-branch-weight=<uint>                                              - Weight of the branch likely to be taken (default = 2000)
  --lint-abort-on-error                                                      - In the Lint pass, abort on errors.
  --live-debug-variables                                                     - Enable the live debug variables pass
  --livedebugvalues-input-bb-limit=<uint>                                    - Maximum input basic blocks before DBG_VALUE limit applies
  --livedebugvalues-input-dbg-value-limit=<uint>                             - Maximum input DBG_VALUE insts supported by debug range extension
  --livedebugvalues-max-stack-slots=<uint>                                   - livedebugvalues-stack-ws-limit
  --load-bitcode-into-experimental-debuginfo-iterators                       - Load bitcode directly into the new debug info format (regardless of input format)
  --load-func-profile-for-cg-matching                                        - Load top-level profiles that the sample reader initially skipped for the call-graph matching (only meaningful for extended binary format)
  --locally-hot-callsite-threshold=<int>                                     - Threshold for locally hot callsites 
  --loop-deletion-enable-symbolic-execution                                  - Break backedge through symbolic execution of 1st iteration attempting to prove that the backedge is never taken
  --loop-distribute-non-if-convertible                                       - Whether to distribute into a loop that may not be if-convertible by the loop vectorizer
  --loop-distribute-scev-check-threshold=<uint>                              - The maximum number of SCEV checks allowed for Loop Distribution
  --loop-distribute-scev-check-threshold-with-pragma=<uint>                  - The maximum number of SCEV checks allowed for Loop Distribution for loop marked with #pragma clang loop distribute(enable)
  --loop-distribute-verify                                                   - Turn on DominatorTree and LoopInfo verification after Loop Distribution
  --loop-flatten-assume-no-overflow                                          - Assume that the product of the two iteration trip counts will never overflow
  --loop-flatten-cost-threshold=<uint>                                       - Limit on the cost of instructions that can be repeated due to loop flattening
  --loop-flatten-version-loops                                               - Version loops if flattened loop could overflow
  --loop-flatten-widen-iv                                                    - Widen the loop induction variables, if possible, so overflow checks won't reject flattening
  --loop-fusion-dependence-analysis=<value>                                  - Which dependence analysis should loop fusion use?
    =scev                                                                    -   Use the scalar evolution interface
    =da                                                                      -   Use the dependence analysis interface
    =all                                                                     -   Use all available analyses
  --loop-fusion-peel-max-count=<uint>                                        - Max number of iterations to be peeled from a loop, such that fusion can take place
  --loop-fusion-verbose-debug                                                - Enable verbose debugging for Loop Fusion
  --loop-idiom-vectorize-bytecmp-vf=<uint>                                   - The vectorization factor for byte-compare patterns.
  --loop-idiom-vectorize-style=<value>                                       - The vectorization style for loop idiom transform.
    =masked                                                                  -   Use masked vector intrinsics
    =predicated                                                              -   Use VP intrinsics
  --loop-idiom-vectorize-verify                                              - Verify loops generated Loop Idiom Vectorize Pass.
  --loop-interchange-max-loop-nest-depth=<uint>                              - Maximum depth of loop nest considered for the transform
  --loop-interchange-max-meminstr-count=<uint>                               - Maximum number of load-store instructions that should be handled in the dependency matrix. Higher value may lead to more interchanges at the cost of compile-time
  --loop-interchange-min-loop-nest-depth=<uint>                              - Minimum depth of loop nest considered for the transform
  --loop-interchange-threshold=<int>                                         - Interchange if you gain more than this number
  --loop-load-elimination-scev-check-threshold=<uint>                        - The maximum number of SCEV checks allowed for Loop Load Elimination
  --loop-predication-enable-count-down-loop                                  - 
  --loop-predication-enable-iv-truncation                                    - 
  --loop-predication-insert-assumes-of-predicated-guards-conditions          - Whether or not we should insert assumes of conditions of predicated guards
  --loop-predication-latch-probability-scale=<number>                        - scale factor for the latch probability. Value should be greater than 1. Lower values are ignored
  --loop-predication-predicate-widenable-branches-to-deopt                   - Whether or not we should predicate guards expressed as widenable branches to deoptimize blocks
  --loop-predication-skip-profitability-checks                               - 
  --loop-prefetch-writes                                                     - Prefetch write addresses
  --loop-rotate-multi                                                        - Allow loop rotation multiple times in order to reach a better latch exit
  --loop-to-cold-block-ratio=<uint>                                          - Outline loop blocks from loop chain if (frequency of loop) / (frequency of block) is greater than this ratio
  --loop-vectorize-with-block-frequency                                      - Enable the use of the block frequency analysis to access PGO heuristics minimizing code growth in cold regions and being more aggressive in hot regions.
  --loop-version-annotate-no-alias                                           - Add no-alias annotation for instructions that are disambiguated by memchecks
  --lower-allow-check-percentile-cutoff-hot=<int>                            - Hot percentile cutoff.
  --lower-allow-check-random-rate=<number>                                   - Probability value in the range [0.0, 1.0] of unconditional pseudo-random checks.
  --lower-interleaved-accesses                                               - Enable lowering interleaved accesses to intrinsics
  --lowertypetests-avoid-reuse                                               - Try to avoid reuse of byte array addresses using aliases
  --lowertypetests-drop-type-tests=<value>                                   - Simply drop type test sequences
    =none                                                                    -   Do not drop any type tests
    =assume                                                                  -   Drop type test assume sequences
    =all                                                                     -   Drop all type test sequences
  --lowertypetests-read-summary=<string>                                     - Read summary from given YAML file before running pass
  --lowertypetests-summary-action=<value>                                    - What to do with the summary when running this pass
    =none                                                                    -   Do nothing
    =import                                                                  -   Import typeid resolutions from summary and globals
    =export                                                                  -   Export typeid resolutions to summary and globals
  --lowertypetests-write-summary=<string>                                    - Write summary to given YAML file after running pass
  --lsr-complexity-limit=<uint>                                              - LSR search space complexity limit
  --lsr-drop-scaled-reg-for-vscale                                           - Avoid using scaled registers with vscale-relative addressing
  --lsr-drop-solution                                                        - Attempt to drop solution if it is less profitable
  --lsr-enable-vscale-immediates                                             - Enable analysis of vscale-relative immediates in LSR
  --lsr-exp-narrow                                                           - Narrow LSR complex solution using expectation of registers number
  --lsr-filter-same-scaled-reg                                               - Narrow LSR search space by filtering non-optimal formulae with the same ScaledReg and Scale
  --lsr-insns-cost                                                           - Add instruction count to a LSR cost model
  --lsr-preferred-addressing-mode=<value>                                    - A flag that overrides the target's preferred addressing mode.
    =none                                                                    -   Don't prefer any addressing mode
    =preindexed                                                              -   Prefer pre-indexed addressing mode
    =postindexed                                                             -   Prefer post-indexed addressing mode
  --lsr-setupcost-depth-limit=<uint>                                         - The limit on recursion depth for LSRs setup cost
  --lv-strided-pointer-ivs                                                   - Enable recognition of non-constant strided pointer induction variables.
  -m <filename>                                                              - json model filename
  --machine-combiner-dump-subst-intrs                                        - Dump all substituted intrs
  --machine-combiner-inc-threshold=<uint>                                    - Incremental depth computation will be used for basic blocks with more instructions.
  --machine-combiner-verify-pattern-order                                    - Verify that the generated patterns are ordered by increasing latency
  --machine-outliner-reruns=<uint>                                           - Number of times to rerun the outliner after the initial outline
  --machine-sink-bfi                                                         - Use block frequency info to find successors to sink
  --machine-sink-cycle-limit=<uint>                                          - The maximum number of instructions considered for cycle sinking.
  --machine-sink-load-blocks-threshold=<uint>                                - Do not try to find alias store for a load if the block number in the straight line is higher than this threshold.
  --machine-sink-load-instrs-threshold=<uint>                                - Do not try to find alias store for a load if there is a in-path block whose instruction number is higher than this threshold.
  --machine-sink-split                                                       - Split critical edges during machine sinking
  --machine-sink-split-probability-threshold=<uint>                          - Percentage threshold for splitting single-instruction critical edge. If the branch threshold is higher than this threshold, we allow speculative execution of up to 1 instruction to avoid branching to splitted critical edge
  --mandatory-inlining-first                                                 - Perform mandatory inlinings module-wide, before performing inlining
  --matrix-allow-contract                                                    - Allow the use of FMAs if available and profitable. This may result in different results, due to less rounding error.
  --matrix-default-layout=<value>                                            - Sets the default matrix layout
    =column-major                                                            -   Use column-major layout
    =row-major                                                               -   Use row-major layout
  --matrix-print-after-transpose-opt                                         - 
  --max-booleans-in-control-flow-hub=<uint>                                  - Set the maximum number of outgoing blocks for using a boolean value to record the exiting block in the ControlFlowHub.
  --max-bytes-for-alignment=<uint>                                           - Forces the maximum bytes allowed to be emitted when padding for alignment
  --max-counter-promotions=<int>                                             - Max number of allowed counter promotions
  --max-counter-promotions-per-loop=<uint>                                   - Max number counter promotions per loop to avoid increasing register pressure too much
  --max-deopt-or-unreachable-succ-check-depth=<uint>                         - Set the maximum path length when checking whether a basic block is followed by a block that either has a terminating deoptimizing call or is terminated with an unreachable
  --max-dependences=<uint>                                                   - Maximum number of dependences collected by loop-access analysis (default = 100)
  --max-forked-scev-depth=<uint>                                             - Maximum recursion depth when finding forked SCEVs (default = 5)
  --max-heap-to-stack-size=<int>                                             - 
  --max-inst-checked-for-throw-during-inlining=<uint>                        - the maximum number of instructions analyzed for may throw during attribute inference in inlined body
  --max-interleave-group-factor=<uint>                                       - Maximum factor for an interleaved access group (default = 8)
  --max-jump-table-size=<uint>                                               - Set maximum size of jump tables.
  --max-loads-per-memcmp=<uint>                                              - Set maximum number of loads used in expanded memcmp
  --max-loads-per-memcmp-opt-size=<uint>                                     - Set maximum number of loads used in expanded memcmp for -Os/Oz
  --max-nested-scalar-reduction-interleave=<uint>                            - The maximum interleave count to use when interleaving a scalar reduction in a nested loop.
  --max-num-inline-blocks=<uint>                                             - Max number of blocks to be partially inlined
  --max-partial-inlining=<int>                                               - Max number of partial inlining. The default is unlimited
  --max-phi-entries-increase-after-removing-empty-block=<uint>               - Stop removing an empty block if removing it will introduce more than this number of phi entries in its successor
  --max-prefetch-iters-ahead=<uint>                                          - Max number of iterations to prefetch ahead
  --max-speculation-depth=<uint>                                             - Limit maximum recursion depth when calculating costs of speculatively executed instructions
  --max-switch-cases-per-result=<uint>                                       - Limit cases to analyze when converting a switch to select
  --max-uses-for-sinking=<uint>                                              - Do not sink instructions that have too many uses.
  --mcfg-dot-filename-prefix=<string>                                        - The prefix used for the Machine CFG dot file names.
  --mcfg-func-name=<string>                                                  - The name of a function (or its substring) whose CFG is viewed/printed.
  --mcp-use-is-copy-instr                                                    - 
  --medium-basic-block-instruction-threshold=<uint>                          - The minimum number of instructions a basic block should contain before being considered medium-sized.
  --mem-intrinsic-expand-size=<long>                                         - Set minimum mem intrinsic size to expand in IR
  --mem-loc-frag-fill                                                        - 
  --memchr-inline-threshold=<uint>                                           - The maximum length of a constant string to inline a memchr call.
  --memcmp-num-loads-per-block=<uint>                                        - The number of loads per basic block for inline expansion of memcmp that is only being compared against zero.
  --memdep-block-number-limit=<uint>                                         - The number of blocks to scan during memory dependency analysis (default = 200)
  --memdep-block-scan-limit=<uint>                                           - The number of instructions to scan in a block in memory dependency analysis (default = 100)
  --memop-max-annotations=<uint>                                             - Max number of precise value annotations for a single memopintrinsic
  --memop-value-prof-max-opt-size=<uint>                                     - Optimize the memop size <= this value
  --memory-check-merge-threshold=<uint>                                      - Maximum number of comparisons done when trying to merge runtime memory checks. (default = 100)
  --memprof-allow-recursive-callsites                                        - Allow cloning of callsites involved in recursive cycles
  --memprof-allow-recursive-contexts                                         - Allow cloning of contexts through recursive cycles
  --memprof-ave-lifetime-cold-threshold=<uint>                               - The average lifetime (s) for an allocation to be considered cold
  --memprof-cloning-cold-threshold=<uint>                                    - Min percent of cold bytes to hint alloc cold during cloning
  --memprof-debug=<int>                                                      - debug
  --memprof-debug-func=<string>                                              - Debug func
  --memprof-debug-max=<int>                                                  - Debug max inst
  --memprof-debug-min=<int>                                                  - Debug min inst
  --memprof-dot-file-path-prefix=<filename>                                  - Specify the path prefix of the MemProf dot files.
  --memprof-dump-ccg                                                         - Dump CallingContextGraph to stdout after each stage.
  --memprof-export-to-dot                                                    - Export graph to dot files.
  --memprof-guard-against-version-mismatch                                   - Guard against compiler/runtime version mismatch.
  --memprof-histogram                                                        - Collect access count histograms
  --memprof-import-summary=<string>                                          - Import summary to use for testing the ThinLTO backend via opt
  --memprof-instrument-atomics                                               - instrument atomic instructions (rmw, cmpxchg)
  --memprof-instrument-reads                                                 - instrument read instructions
  --memprof-instrument-stack                                                 - Instrument scalar stack variables
  --memprof-instrument-writes                                                - instrument write instructions
  --memprof-keep-all-not-cold-contexts                                       - Keep all non-cold contexts (increases cloning overheads)
  --memprof-lifetime-access-density-cold-threshold=<number>                  - The threshold the lifetime access density (accesses per byte per lifetime sec) must be under to consider an allocation cold
  --memprof-mapping-granularity=<int>                                        - granularity of memprof shadow mapping
  --memprof-mapping-scale=<int>                                              - scale of memprof shadow mapping
  --memprof-match-hot-cold-new                                               - Match allocation profiles onto existing hot/cold operator new calls
  --memprof-matching-cold-threshold=<uint>                                   - Min percent of cold bytes matched to hint allocation cold
  --memprof-memory-access-callback-prefix=<string>                           - Prefix for memory access callbacks
  --memprof-min-ave-lifetime-access-density-hot-threshold=<uint>             - The minimum TotalLifetimeAccessDensity / AllocCount for an allocation to be considered hot
  --memprof-print-match-info                                                 - Print matching stats for each allocation context in this module's profiles
  --memprof-report-hinted-sizes                                              - Report total allocation sizes of hinted allocations
  --memprof-require-definition-for-promotion                                 - Require target function definition when promoting indirect calls
  --memprof-runtime-default-options=<string>                                 - The default memprof options
  --memprof-salvage-stale-profile                                            - Salvage stale MemProf profile
  --memprof-tail-call-search-depth=<uint>                                    - Max depth to recursively search for missing frames through tail calls.
  --memprof-use-callbacks                                                    - Use callbacks instead of inline instrumentation sequences.
  --memprof-use-hot-hints                                                    - Enable use of hot hints (only supported for unambigously hot allocations)
  --memprof-verify-ccg                                                       - Perform verification checks on CallingContextGraph.
  --memprof-verify-nodes                                                     - Perform frequent verification checks on nodes.
  --memssa-check-limit=<uint>                                                - The maximum number of stores/phis MemorySSAwill consider trying to walk past (default = 100)
  --mergefunc-preserve-debug-info                                            - Preserve debug info in thunk when mergefunc transformations are made.
  --mergefunc-use-aliases                                                    - Allow mergefunc to create aliases
  --mergefunc-verify=<uint>                                                  - How many functions in a module could be used for MergeFunctions to pass a basic correctness check. '0' disables this check. Works only with '-debug' key.
  --mfs-count-threshold=<uint>                                               - Minimum number of times a block must be executed to be retained.
  --mfs-psi-cutoff=<uint>                                                    - Percentile profile summary cutoff used to determine cold blocks. Unused if set to zero.
  --mfs-split-ehcode                                                         - Splits all EH code and it's descendants by default.
  --min-block-execution=<uint>                                               - Minimum block executions to consider its BranchProbabilityInfo valid
  --min-call-count-for-cg-matching=<uint>                                    - The minimum number of call anchors required for a function to run stale profile call graph matching.
  --min-func-count-for-cg-matching=<uint>                                    - The minimum number of basic blocks required for a function to run stale profile call graph matching.
  --min-functions-for-staleness-error=<uint>                                 - Skip the check if the number of hot functions is smaller than the specified number.
  --min-jump-table-entries=<uint>                                            - Set minimum number of entries to use a jump table.
  --min-page-size=<uint>                                                     - Use this to override the target's minimum page size.
  --min-prefetch-stride=<uint>                                               - Min stride to add prefetches
  --min-region-size-ratio=<number>                                           - Minimum ratio comparing relative sizes of each outline candidate and original function
  --mir-debug-loc                                                            - Print MIR debug-locations
  --mir-strip-debugify-only                                                  - Should mir-strip-debug only strip debug info from debugified modules by default
  --mir-vreg-namer-use-stable-hash                                           - Use Stable Hashing for MIR VReg Renaming
  --misched=<value>                                                          - Machine instruction scheduler to use
    =default                                                                 -   Use the target's default scheduler choice.
    =converge                                                                -   Standard converging scheduler.
    =ilpmax                                                                  -   Schedule bottom-up for max ILP
    =ilpmin                                                                  -   Schedule bottom-up for min ILP
    =shuffle                                                                 -   Shuffle machine instructions alternating directions
  --misched-cluster                                                          - Enable memop clustering.
  --misched-cutoff=<uint>                                                    - Stop scheduling after N instructions
  --misched-cyclicpath                                                       - Enable cyclic critical path analysis.
  --misched-dcpl                                                             - Print critical path length to stdout
  --misched-detail-resource-booking                                          - Show details of invoking getNextResoufceCycle.
  --misched-dump-reserved-cycles                                             - Dump resource usage at schedule boundary.
  --misched-dump-schedule-trace                                              - Dump resource usage at schedule boundary.
  --misched-dump-schedule-trace-col-header-width=<uint>                      - Set width of the columns with the resources and schedule units
  --misched-dump-schedule-trace-col-width=<uint>                             - Set width of the columns showing resource booking.
  --misched-fusion                                                           - Enable scheduling for macro fusion.
  --misched-limit=<uint>                                                     - Limit ready list to N instructions
  --misched-only-block=<uint>                                                - Only schedule this MBB#
  --misched-only-func=<string>                                               - Only schedule this function
  --misched-postra                                                           - Run MachineScheduler post regalloc (independent of preRA sched)
  --misched-postra-direction=<value>                                         - Post reg-alloc list scheduling direction
    =topdown                                                                 -   Force top-down post reg-alloc list scheduling
    =bottomup                                                                -   Force bottom-up post reg-alloc list scheduling
    =bidirectional                                                           -   Force bidirectional post reg-alloc list scheduling
  --misched-prera-direction=<value>                                          - Pre reg-alloc list scheduling direction
    =topdown                                                                 -   Force top-down pre reg-alloc list scheduling
    =bottomup                                                                -   Force bottom-up pre reg-alloc list scheduling
    =bidirectional                                                           -   Force bidirectional pre reg-alloc list scheduling
  --misched-print-dags                                                       - Print schedule DAGs
  --misched-regpressure                                                      - Enable register pressure scheduling.
  --misched-resource-cutoff=<uint>                                           - Number of intervals to track
  --misched-sort-resources-in-trace                                          - Sort the resources printed in the dump trace
  --misexpect-tolerance=<uint>                                               - Prevents emitting diagnostics when profile counts are within N% of the threshold..
  --misfetch-cost=<uint>                                                     - Cost that models the probabilistic risk of an instruction misfetch due to a jump comparing to falling through, whose cost is zero.
  --mispredict-default-rate=<uint>                                           - Default mispredict rate (initialized to 25%).
  --ml-advisor-keep-fpi-cache                                                - For test - keep the ML Inline advisor's FunctionPropertiesInfo cache
  --ml-advisor-size-increase-threshold=<number>                              - Maximum factor by which expected native size may increase before blocking any further inlining.
  --ml-inliner-model-selector=<string>                                       - 
  --ml-inliner-skip-policy=<value>                                           - 
    =never                                                                   -   never
    =if-caller-not-cold                                                      -   if the caller is not cold
  --mlregalloc-max-eviction-count=<uint>                                     - The maximum number of times a live range can be evicted before preventing it from being evicted
  --module-inliner-top-priority-threshold=<int>                              - The cost threshold for call sites that get inlined without the cost-benefit analysis
  --module-summary-dot-file=<filename>                                       - File to emit dot graph of new summary into
  --move-auto-init-threshold=<uint>                                          - Maximum instructions to analyze per moved initialization
  --msan-and-mask=<ulong>                                                    - Define custom MSan AndMask
  --msan-check-access-address                                                - report accesses through a pointer which has poisoned shadow
  --msan-check-constant-shadow                                               - Insert checks for constant shadow values
  --msan-disable-checks                                                      - Apply no_sanitize to the whole file
  --msan-disambiguate-warning-threshold=<int>                                - Define threshold for number of checks per debug location to force origin update.
  --msan-dump-strict-instructions                                            - print out instructions with default strict semantics
  --msan-dump-strict-intrinsics                                              - Prints 'unknown' intrinsics that were handled heuristically. Use -msan-dump-strict-instructions to print intrinsics that could not be handled exactly nor heuristically.
  --msan-eager-checks                                                        - check arguments and return values at function call boundaries
  --msan-handle-asm-conservative                                             - conservative handling of inline assembly
  --msan-handle-icmp                                                         - propagate shadow through ICmpEQ and ICmpNE
  --msan-handle-icmp-exact                                                   - exact handling of relational integer ICmp
  --msan-handle-lifetime-intrinsics                                          - when possible, poison scoped variables at the beginning of the scope (slower, but more precise)
  --msan-instrumentation-with-call-threshold=<int>                           - If the function being instrumented requires more than this number of checks and origin stores, use callbacks instead of inline checks (-1 means never use callbacks).
  --msan-keep-going                                                          - keep going after reporting a UMR
  --msan-kernel                                                              - Enable KernelMemorySanitizer instrumentation
  --msan-origin-base=<ulong>                                                 - Define custom MSan OriginBase
  --msan-poison-stack                                                        - poison uninitialized stack variables
  --msan-poison-stack-pattern=<int>                                          - poison uninitialized stack variables with the given pattern
  --msan-poison-stack-with-call                                              - poison uninitialized stack variables with a call
  --msan-poison-undef                                                        - poison undef temps
  --msan-print-stack-names                                                   - Print name of local stack variable
  --msan-shadow-base=<ulong>                                                 - Define custom MSan ShadowBase
  --msan-track-origins=<int>                                                 - Track origins (allocation sites) of poisoned memory
  --msan-with-comdat                                                         - Place MSan constructors in comdat sections
  --msan-xor-mask=<ulong>                                                    - Define custom MSan XorMask
  --no-discriminators                                                        - Disable generation of discriminator information.
  --no-kernel-info-end-lto                                                   - remove the kernel-info pass at the end of the full LTO pipeline
  --no-pgo-warn-mismatch                                                     - Use this option to turn off/on warnings about profile cfg mismatch.
  --no-pgo-warn-mismatch-comdat-weak                                         - The option is used to turn on/off warnings about hash mismatch for comdat or weak functions.
  --no-phi-elim-live-out-early-exit                                          - Do not use an early exit if isLiveOutPastPHIs returns true.
  --no-stack-coloring                                                        - Disable stack coloring
  --no-stack-slot-sharing                                                    - Suppress slot sharing during stack coloring
  --no-warn-sample-unused                                                    - Use this option to turn off/on warnings about function with samples but without debug information to use those samples.
  --non-global-value-max-name-size=<int>                                     - Maximum size for the name of non-global values.
  --norm-fold-all                                                            - Folds all regular instructions (including pre-outputs)
  --norm-preserve-order                                                      - Preserves original instruction order
  --norm-rename-all                                                          - Renames all instructions (including user-named)
  --norm-reorder-operands                                                    - Sorts and reorders operands in commutative instructions
  --notcold-new-hint-value=<uint>                                            - Value to pass to hot/cold operator new for notcold (warm) allocation
  --nsan-check-loads                                                         - Check floating-point load
  --nsan-check-ret                                                           - Check floating-point return values
  --nsan-check-stores                                                        - Check floating-point stores
  --nsan-instrument-fcmp                                                     - Instrument floating-point comparisons
  --nsan-propagate-non-ft-const-stores-as-ft                                 - Propagate non floating-point const stores as floating point values.For debugging purposes only
  --nsan-shadow-type-mapping=<string>                                        - One shadow type id for each of `float`, `double`, `long double`. `d`,`l`,`q`,`e` mean double, x86_fp80, fp128 (quad) and ppc_fp128 (extended double) respectively. The default is to shadow `float` as `double`, and `double` and `x86_fp80` as `fp128`
  --nsan-truncate-fcmp-eq                                                    - This flag controls the behaviour of fcmp equality comparisons.For equality comparisons such as `x == 0.0f`, we can perform the shadow check in the shadow (`x_shadow == 0.0) == (x == 0.0f)`) or app  domain (`(trunc(x_shadow) == 0.0f) == (x == 0.0f)`). This helps catch the case when `x_shadow` is accurate enough (and therefore close enough to zero) so that `trunc(x_shadow)` is zero even though both `x` and `x_shadow` are not
  -o <filename>                                                              - Output filename
  --object-size-offset-visitor-max-visit-instructions=<uint>                 - Maximum number of instructions for ObjectSizeOffsetVisitor to look at
  --only-simple-regions                                                      - Show only simple regions in the graphviz viewer
  --openmp-deduce-icv-values                                                 - 
  --openmp-hide-memory-transfer-latency                                      - [WIP] Tries to hide the latency of host to device memory transfers
  --openmp-ir-builder-optimistic-attributes                                  - Use optimistic attributes describing 'as-if' properties of runtime calls.
  --openmp-ir-builder-unroll-threshold-factor=<number>                       - Factor for the unroll threshold to account for code simplifications still taking place
  --openmp-opt-disable                                                       - Disable OpenMP specific optimizations.
  --openmp-opt-disable-barrier-elimination                                   - Disable OpenMP optimizations that eliminate barriers.
  --openmp-opt-disable-deglobalization                                       - Disable OpenMP optimizations involving deglobalization.
  --openmp-opt-disable-folding                                               - Disable OpenMP optimizations involving folding.
  --openmp-opt-disable-internalization                                       - Disable function internalization.
  --openmp-opt-disable-spmdization                                           - Disable OpenMP optimizations involving SPMD-ization.
  --openmp-opt-disable-state-machine-rewrite                                 - Disable OpenMP optimizations that replace the state machine.
  --openmp-opt-enable-merging                                                - Enable the OpenMP region merging optimization.
  --openmp-opt-inline-device                                                 - Inline all applicable functions on the device.
  --openmp-opt-max-iterations=<uint>                                         - Maximal number of attributor iterations.
  --openmp-opt-print-module-after                                            - Print the current module after OpenMP optimizations.
  --openmp-opt-print-module-before                                           - Print the current module before OpenMP optimizations.
  --openmp-opt-shared-limit=<uint>                                           - Maximum amount of shared memory to use.
  --openmp-opt-verbose-remarks                                               - Enables more verbose remarks.
  --openmp-print-gpu-kernels                                                 - 
  --openmp-print-icv-values                                                  - 
  --opt-bisect-limit=<int>                                                   - Maximum optimization to perform
  --opt-bisect-verbose                                                       - Show verbose output when opt-bisect-limit is set
  --optimize-existing-hot-cold-new                                           - Enable optimization of existing hot/cold operator new library calls
  --optimize-hot-cold-new                                                    - Enable hot/cold operator new library calls
  --optimize-non-fmv-callers                                                 - Statically resolve calls to versioned functions from non-versioned callers.
  --optimize-regalloc                                                        - Enable optimized register allocation compilation path.
  --optsize-jump-table-density=<uint>                                        - Minimum density for building a jump table in an optsize function
  --orderfile-write-mapping=<string>                                         - Dump functions and their MD5 hash to deobfuscate profile data
  --outline-region-freq-percent=<int>                                        - Relative frequency of outline region to the entry block
  --outliner-benefit-threshold=<uint>                                        - The minimum size in bytes before an outlining candidate is accepted
  --outliner-leaf-descendants                                                - Consider all leaf descendants of internal nodes of the suffix tree as candidates for outlining (if false, only leaf children are considered)
  --overwrite-existing-weights                                               - Ignore existing branch weights on IR and always overwrite.
  --partial-inlining-extra-penalty=<uint>                                    - A debug option to add additional penalty to the computed one.
  --partial-profile                                                          - Specify the current profile is used as a partial profile.
  --partial-sample-profile-working-set-size-scale-factor=<number>            - The scale factor used to scale the working set size of the partial sample profile along with the partial profile ratio. This includes the factor of the profile counter per block and the factor to scale the working set size to use the same shared thresholds as PGO.
  --pass-remarks=<pattern>                                                   - Enable optimization remarks from passes whose name match the given regular expression
  --pass-remarks-analysis=<pattern>                                          - Enable optimization analysis remarks from passes whose name match the given regular expression
  --pass-remarks-missed=<pattern>                                            - Enable missed optimization remarks from passes whose name match the given regular expression
  --persist-profile-staleness                                                - Compute stale profile statistical metrics and write it into the native object file(.llvm_stats section).
  --pgo-block-coverage                                                       - Use this option to enable basic block coverage instrumentation
  --pgo-cold-instrument-entry-threshold=<ulong>                              - For cold function instrumentation, skip instrumenting functions whose entry count is above the given value.
  --pgo-critical-edge-threshold=<uint>                                       - Do not instrument functions with the number of critical edges  greater than this threshold.
  --pgo-emit-branch-prob                                                     - When this option is on, the annotated branch probability will be emitted as optimization remarks: -{Rpass|pass-remarks}=pgo-instrumentation
  --pgo-fix-entry-count                                                      - Fix function entry count in profile use.
  --pgo-function-entry-coverage                                              - Use this option to enable function entry coverage instrumentation.
  --pgo-function-size-threshold=<uint>                                       - Do not instrument functions smaller than this threshold.
  --pgo-instr-memop                                                          - Use this option to turn on/off memory intrinsic size profiling.
  --pgo-instr-select                                                         - Use this option to turn on/off SELECT instruction instrumentation. 
  --pgo-instrument-cold-function-only                                        - Enable cold function only instrumentation.
  --pgo-instrument-entry                                                     - Force to instrument function entry basicblock.
  --pgo-instrument-loop-entries                                              - Force to instrument loop entries.
  --pgo-memop-count-threshold=<uint>                                         - The minimum count to optimize memory intrinsic calls
  --pgo-memop-max-version=<uint>                                             - The max version for the optimized memory  intrinsic calls
  --pgo-memop-optimize-memcmp-bcmp                                           - Size-specialize memcmp and bcmp calls
  --pgo-memop-percent-threshold=<uint>                                       - The percentage threshold for the memory intrinsic calls optimization
  --pgo-memop-scale-count                                                    - Scale the memop size counts using the basic  block count value
  --pgo-temporal-instrumentation                                             - Use this option to enable temporal instrumentation
  --pgo-test-profile-file=<filename>                                         - Specify the path of profile data file. This is mainly for test purpose.
  --pgo-test-profile-remapping-file=<filename>                               - Specify the path of profile remapping file. This is mainly for test purpose.
  --pgo-trace-func-hash=<function name>                                      - Trace the hash of the function with this name.
  --pgo-treat-unknown-as-cold                                                - For cold function instrumentation, treat count unknown(e.g. unprofiled) functions as cold.
  --pgo-verify-bfi                                                           - Print out mismatched BFI counts after setting profile metadata The print is enabled under -Rpass-analysis=pgo, or internal option -pass-remarks-analysis=pgo.
  --pgo-verify-bfi-cutoff=<uint>                                             - Set the threshold for pgo-verify-bfi: skip the counts whose profile count value is below.
  --pgo-verify-bfi-ratio=<uint>                                              - Set the threshold for pgo-verify-bfi:  only print out mismatched BFI if the difference percentage is greater than this value (in percentage).
  --pgo-verify-hot-bfi                                                       - Print out the non-match BFI count if a hot raw profile count becomes non-hot, or a cold raw profile count becomes hot. The print is enabled under -Rpass-analysis=pgo, or internal option -pass-remarks-analysis=pgo.
  --pgo-view-block-coverage-graph                                            - Create a dot file of CFGs with block coverage inference information
  --pgo-view-counts=<value>                                                  - A boolean option to show CFG dag or text with block profile counts and branch probabilities right after PGO profile annotation step. The profile counts are computed using branch probabilities from the runtime profile data and block frequency propagation algorithm. To view the raw counts from the profile, use option -pgo-view-raw-counts instead. To limit graph display to only one function, use filtering option -view-bfi-func-name.
    =none                                                                    -   do not show.
    =graph                                                                   -   show a graph.
    =text                                                                    -   show in text.
  --pgo-view-raw-counts=<value>                                              - A boolean option to show CFG dag or text with raw profile counts from profile data. See also option -pgo-view-counts. To limit graph display to only one function, use filtering option -view-bfi-func-name.
    =none                                                                    -   do not show.
    =graph                                                                   -   show a graph.
    =text                                                                    -   show in text.
  --pgo-warn-misexpect                                                       - Use this option to turn on/off warnings about incorrect usage of llvm.expect intrinsics.
  --pgo-warn-missing-function                                                - Use this option to turn on/off warnings about missing profile data for functions.
  --pgso                                                                     - Enable the profile guided size optimizations. 
  --pgso-cold-code-only                                                      - Apply the profile guided size optimizations only to cold code.
  --pgso-cold-code-only-for-instr-pgo                                        - Apply the profile guided size optimizations only to cold code under instrumentation PGO.
  --pgso-cold-code-only-for-partial-sample-pgo                               - Apply the profile guided size optimizations only to cold code under partial-profile sample PGO.
  --pgso-cold-code-only-for-sample-pgo                                       - Apply the profile guided size optimizations only to cold code under sample PGO.
  --pgso-cutoff-instr-prof=<int>                                             - The profile guided size optimization profile summary cutoff for instrumentation profile.
  --pgso-cutoff-sample-prof=<int>                                            - The profile guided size optimization profile summary cutoff for sample profile.
  --pgso-lwss-only                                                           - Apply the profile guided size optimizations only if the working set size is large (except for cold code.)
  --phi-elim-split-all-critical-edges                                        - Split all critical edges during PHI elimination
  --phi-node-folding-threshold=<uint>                                        - Control the amount of phi node folding to perform (default = 2)
  --phicse-debug-hash                                                        - Perform extra assertion checking to verify that PHINodes's hash function is well-behaved w.r.t. its isEqual predicate
  --phicse-num-phi-smallsize=<uint>                                          - When the basic block contains not more than this number of PHI nodes, perform a (faster!) exhaustive search instead of set-driven one.
  --pi-force-live-exit-outline                                               - Force outline regions with live exits
  --pi-mark-coldcc                                                           - Mark outline function calls with ColdCC
  --pipeliner-annotate-for-testing                                           - Instead of emitting the pipelined code, annotate instructions with the generated schedule for feeding into the -modulo-schedule-test pass
  --pipeliner-dbg-res                                                        - 
  --pipeliner-experimental-cg                                                - Use the experimental peeling code generator for software pipelining
  --pipeliner-force-ii=<int>                                                 - Force pipeliner to use specified II.
  --pipeliner-force-issue-width=<int>                                        - Force pipeliner to use specified issue width.
  --pipeliner-ii-search-range=<int>                                          - Range to search for II
  --pipeliner-max=<int>                                                      - 
  --pipeliner-max-mii=<int>                                                  - Size limit for the MII.
  --pipeliner-max-stages=<int>                                               - Maximum stages allowed in the generated schedule.
  --pipeliner-mve-cg                                                         - Use the MVE code generator for software pipelining
  --pipeliner-prune-deps                                                     - Prune dependences between unrelated Phi nodes.
  --pipeliner-prune-loop-carried                                             - Prune loop carried order dependences.
  --pipeliner-register-pressure                                              - Limit register pressure of scheduled loop
  --pipeliner-register-pressure-margin=<int>                                 - Margin representing the unused percentage of the register pressure limit
  --pipeliner-show-mask                                                      - 
  --pipeliner-swap-branch-targets-mve                                        - Swap target blocks of a conditional branch for MVE expander
  --post-RA-scheduler                                                        - Enable scheduling after register allocation
  --postra-sched-debugdiv=<int>                                              - Debug control MBBs that are scheduled
  --postra-sched-debugmod=<int>                                              - Debug control MBBs that are scheduled
  --pragma-unroll-and-jam-threshold=<uint>                                   - Unrolled size limit for loops with an unroll_and_jam(full) or unroll_count pragma.
  --pragma-unroll-full-max-iterations=<uint>                                 - Maximum allowed iterations to unroll under pragma unroll full.
  --pragma-unroll-threshold=<uint>                                           - Unrolled size limit for loops with an unroll(full) or unroll_count pragma.
  --pragma-vectorize-scev-check-threshold=<uint>                             - The maximum number of SCEV checks allowed with a vectorize(enable) pragma
  --precent-mismatch-for-staleness-error=<uint>                              - Reject the profile if the mismatch percent is higher than the given number.
  --precise-rotation-cost                                                    - Model the cost of loop rotation more precisely by using profile data.
  --precompute-phys-liveness                                                 - Eagerly compute live intervals for all physreg units.
  --predictable-branch-threshold=<uint>                                      - Use this to override the target's predictable branch threshold (%).
  --prefer-inloop-reductions                                                 - Prefer in-loop vector reductions, overriding the targets preference.
  --prefer-predicate-over-epilogue=<value>                                   - Tail-folding and predication preferences over creating a scalar epilogue loop.
    =scalar-epilogue                                                         -   Don't tail-predicate loops, create scalar epilogue
    =predicate-else-scalar-epilogue                                          -   prefer tail-folding, create scalar epilogue if tail folding fails.
    =predicate-dont-vectorize                                                -   prefers tail-folding, don't attempt vectorization if tail-folding fails.
  --prefer-predicated-reduction-select                                       - Prefer predicating a reduction operation over an after loop select.
  --prefetch-distance=<uint>                                                 - Number of instructions to prefetch ahead
  --preinline-threshold=<int>                                                - Control the amount of inlining in pre-instrumentation inliner (default = 75)
  --preserve-alignment-assumptions-during-inlining                           - Convert align attributes to assumptions during inlining.
  --preserve-input-debuginfo-format                                          - When set to true, IR files will be processed and printed in their current debug info format, regardless of default behaviour or other flags passed. Has no effect if input IR does not contain debug records or intrinsics. Ignored in llvm-link, llvm-lto, and llvm-lto2.
  --print-after=<string>                                                     - Print IR after specified passes
  --print-after-all                                                          - Print IR after each pass
  --print-after-isel                                                         - Print machine instrs after ISel
  --print-all-reaching-defs                                                  - Used for test purposes
  --print-before=<string>                                                    - Print IR before specified passes
  --print-before-all                                                         - Print IR before each pass
  --print-bfi                                                                - Print the block frequency info.
  --print-bfi-func-name=<string>                                             - The option to specify the name of the function whose block frequency info is printed.
  --print-bpi                                                                - Print the branch probability info.
  --print-bpi-func-name=<string>                                             - The option to specify the name of the function whose branch probability info is printed.
  --print-changed                                                            - Print changed IRs
  --print-changed=<value>                                                    - Print changed IRs
    =quiet                                                                   -   Run in quiet mode
    =diff                                                                    -   Display patch-like changes
    =diff-quiet                                                              -   Display patch-like changes in quiet mode
    =cdiff                                                                   -   Display patch-like changes with color
    =cdiff-quiet                                                             -   Display patch-like changes in quiet mode with color
    =dot-cfg                                                                 -   Create a website with graphical changes
    =dot-cfg-quiet                                                           -   Create a website with graphical changes in quiet mode
  --print-changed-diff-path=<string>                                         - system diff used by change reporters
  --print-debug-ata                                                          - 
  --print-debug-counter                                                      - Print out debug counter info after all counters accumulated
  --print-import-failures                                                    - Print information for functions rejected for importing
  --print-imports                                                            - Print imported functions
  --print-instruction-comments                                               - Prints comments for instruction based on inline cost analysis
  --print-isel-input                                                         - Print LLVM IR input to isel pass
  --print-loop-func-scope                                                    - When printing IR for print-[before|after]{-all} for a loop pass, always print function IR
  --print-machine-bfi                                                        - Print the machine block frequency info.
  --print-module-scope                                                       - When printing IR for print-[before|after]{-all} always print a module IR
  --print-pipeline-passes                                                    - Print a '-passes' compatible string describing the pipeline (best-effort only).
  --print-region-style=<value>                                               - style of printing regions
    =none                                                                    -   print no details
    =bb                                                                      -   print regions in detail with block_iterator
    =rn                                                                      -   print regions in detail with element_iterator
  --print-regmask-num-regs=<int>                                             - Number of registers to limit to when printing regmask operands in IR dumps. unlimited = -1
  --print-regusage                                                           - print register usage details collected for analysis.
  --print-slotindexes                                                        - When printing machine IR, annotate instructions and blocks with SlotIndexes when available
  --print-summary-global-ids                                                 - Print the global id for each value when reading the module summary
  --profile-accurate-for-symsinlist                                          - For symbols in profile symbol list, regard their profiles to be accurate. It may be overridden by profile-sample-accurate. 
  --profile-context-root=<string>                                            - A function name, assumed to be global, which will be treated as the root of an interesting graph, which will be profiled independently from other similar graphs.
  --profile-correlate=<value>                                                - Use debug info or binary file to correlate profiles.
    =<empty>                                                                 -   No profile correlation
    =debug-info                                                              -   Use debug info to correlate
    =binary                                                                  -   Use binary to correlate
  --profile-guided-section-prefix                                            - Use profile info to add section prefix for hot/cold functions
  --profile-isfs                                                             - Profile uses flow sensitive discriminators
  --profile-likely-prob=<uint>                                               - branch probability threshold in percentage to be considered very likely when profile is available
  --profile-sample-accurate                                                  - If the sample profile is accurate, we will mark all un-sampled callsite and function as having 0 samples. Otherwise, treat un-sampled callsites and functions conservatively as unknown. 
  --profile-sample-block-accurate                                            - If the sample profile is accurate, we will mark all un-sampled branches and calls as having 0 samples. Otherwise, treat them conservatively as unknown. 
  --profile-summary-contextless                                              - Merge context profiles before calculating thresholds.
  --profile-summary-cutoff-cold=<int>                                        - A count is cold if it is below the minimum count to reach this percentile of total counts.
  --profile-summary-cutoff-hot=<int>                                         - A count is hot if it exceeds the minimum count to reach this percentile of total counts.
  --profile-summary-huge-working-set-size-threshold=<uint>                   - The code working set size is considered huge if the number of blocks required to reach the -profile-summary-cutoff-hot percentile exceeds this count.
  --profile-summary-large-working-set-size-threshold=<uint>                  - The code working set size is considered large if the number of blocks required to reach the -profile-summary-cutoff-hot percentile exceeds this count.
  --profile-symbol-list-cutoff=<ulong>                                       - Cutoff value about how many symbols in profile symbol list will be used. This is very useful for performance debugging
  --profile-unknown-in-special-section                                       - In profiling mode like sampleFDO, if a function doesn't have profile, we cannot tell the function is cold for sure because it may be a function newly added without ever being sampled. With the flag enabled, compiler can put such profile unknown functions into a special section, so runtime system can choose to handle it in a different way than .text section, to save RAM for example. 
  --propagate-attrs                                                          - Propagate attributes in index
  --protect-from-escaped-allocas                                             - Do not optimize lifetime zones that are broken
  --rafast-ignore-missing-defs                                               - 
  --reassociate-geps-verify-no-dead-code                                     - Verify this pass produces no dead code
  --reassociate-use-cse-local                                                - Only reorder expressions within a basic block when exposing CSE opportunities
  --recurrence-chain-limit=<uint>                                            - Maximum length of recurrence chain when evaluating the benefit of commuting operands
  --recursive-inline-max-stacksize=<ulong>                                   - Do not inline recursive functions with a stack size that exceeds the specified limit
  --regalloc=<value>                                                         - Register allocator to use
    =fast                                                                    -   fast register allocator
    =default                                                                 -   pick register allocator based on -O option
    =basic                                                                   -   basic register allocator
    =greedy                                                                  -   greedy register allocator
  --regalloc-csr-first-time-cost=<uint>                                      - Cost for first time use of callee-saved register.
  --regalloc-enable-advisor=<value>                                          - Enable regalloc advisor mode
    =default                                                                 -   Default
    =release                                                                 -   precompiled
    =development                                                             -   for training
  --regalloc-enable-priority-advisor=<value>                                 - Enable regalloc advisor mode
    =default                                                                 -   Default
    =release                                                                 -   precompiled
    =development                                                             -   for training
    =dummy                                                                   -   prioritize low virtual register numbers for test and debug
  --regalloc-evict-interactive-channel-base=<string>                         - Base file path for the interactive mode. The incoming filename should have the name <regalloc-evict-interactive-channel-base>.in, while the outgoing name should be <regalloc-evict-interactive-channel-base>.out
  --regalloc-eviction-max-interference-cutoff=<uint>                         - Number of interferences after which we declare an interference unevictable and bail out. This is a compilation cost-saving consideration. To disable, pass a very large number.
  --regalloc-priority-interactive-channel-base=<string>                      - Base file path for the interactive mode. The incoming filename should have the name <regalloc-priority-interactive-channel-base>.in, while the outgoing name should be <regalloc-priority-interactive-channel-base>.out
  --remarks-section                                                          - Emit a section containing remark diagnostics metadata. By default, this is enabled for the following formats: yaml-strtab, bitstream.
  --rename-exclude-alias-prefixes=<string>                                   - Prefixes for aliases that don't need to be renamed, separated by a comma
  --rename-exclude-function-prefixes=<string>                                - Prefixes for functions that don't need to be renamed, separated by a comma
  --rename-exclude-global-prefixes=<string>                                  - Prefixes for global values that don't need to be renamed, separated by a comma
  --rename-exclude-struct-prefixes=<string>                                  - Prefixes for structs that don't need to be renamed, separated by a comma
  --rename-only-inst                                                         - only rename the instructions in the function
  --renumber-blocks-before-view                                              - If true, basic blocks are re-numbered before MBP layout is printed into a dot graph. Only used when a function is being printed.
  --replexitval=<value>                                                      - Choose the strategy to replace exit value in IndVarSimplify
    =never                                                                   -   never replace exit value
    =cheap                                                                   -   only replace exit value when the cost is cheap
    =unusedindvarinloop                                                      -   only replace exit value when it is an unused induction variable in the loop and has cheap replacement cost
    =noharduse                                                               -   only replace exit values when loop def likely dead
    =always                                                                  -   always replace exit value whenever possible
  --report-profile-staleness                                                 - Compute and report stale profile statistical metrics.
  --restrict-statepoint-remat                                                - Restrict remat for statepoint operands
  --rewrite-map-file=<filename>                                              - Symbol Rewrite Map
  --rewrite-phi-limit=<uint>                                                 - Limit the length of PHI chains to lookup
  --rng-seed=<seed>                                                          - Seed for the random number generator
  --rotation-max-header-size=<uint>                                          - The default maximum header size for automatic loop rotation
  --rotation-prepare-for-lto                                                 - Run loop-rotation in the prepare-for-lto stage. This option should be used for testing only.
  --rs4gc-allow-statepoint-with-no-deopt-info                                - 
  --rs4gc-clobber-non-live                                                   - 
  --rs4gc-remat-derived-at-uses                                              - 
  --runtime-check-per-loop-load-elim=<uint>                                  - Max number of memchecks allowed per eliminated load on average
  --runtime-counter-relocation                                               - Enable relocating counters at runtime.
  --runtime-memory-check-threshold=<uint>                                    - When performing memory disambiguation checks at runtime do not generate more than this number of comparisons (default = 8).
  --safe-stack-coloring                                                      - enable safe stack coloring
  --safe-stack-layout                                                        - enable safe stack layout
  --safepoint-ir-verifier-print-only                                         - 
  --safestack-use-pointer-address                                            - 
  --salvage-stale-profile                                                    - Salvage stale profile by fuzzy matching and use the remapped location for sample profile query.
  --salvage-stale-profile-max-callsites=<uint>                               - The maximum number of callsites in a function, above which stale profile matching will be skipped.
  --salvage-unused-profile                                                   - Salvage unused profile by matching with new functions on call graph.
  --sample-profile-check-record-coverage=<N>                                 - Emit a warning if less than N% of records in the input profile are matched to the IR.
  --sample-profile-check-sample-coverage=<N>                                 - Emit a warning if less than N% of samples in the input profile are matched to the IR.
  --sample-profile-cold-inline-threshold=<int>                               - Threshold for inlining cold callsites
  --sample-profile-even-flow-distribution                                    - Try to evenly distribute flow when there are multiple equally likely options.
  --sample-profile-file=<filename>                                           - Profile file loaded by -sample-profile
  --sample-profile-hot-inline-threshold=<int>                                - Hot callsite threshold for priority-based sample profile loader inlining.
  --sample-profile-icp-max-prom=<uint>                                       - Max number of promotions for a single indirect call callsite in sample profile loader
  --sample-profile-icp-relative-hotness=<uint>                               - Relative hotness percentage threshold for indirect call promotion in priority-based sample profile loader inlining.
  --sample-profile-icp-relative-hotness-skip=<uint>                          - Skip relative hotness check for ICP up to given number of targets.
  --sample-profile-inline-growth-limit=<int>                                 - The size growth ratio limit for priority-based sample profile loader inlining.
  --sample-profile-inline-limit-max=<int>                                    - The upper bound of size growth limit for priority-based sample profile loader inlining.
  --sample-profile-inline-limit-min=<int>                                    - The lower bound of size growth limit for priority-based sample profile loader inlining.
  --sample-profile-inline-replay=<filename>                                  - Optimization remarks file containing inline remarks to be replayed by inlining from sample profile loader.
  --sample-profile-inline-replay-fallback=<value>                            - How sample profile inline replay treats sites that don't come from the replay. Original: defers to original advisor, AlwaysInline: inline all sites not in replay, NeverInline: inline no sites not in replay
    =Original                                                                -   All decisions not in replay send to original advisor (default)
    =AlwaysInline                                                            -   All decisions not in replay are inlined
    =NeverInline                                                             -   All decisions not in replay are not inlined
  --sample-profile-inline-replay-format=<value>                              - How sample profile inline replay file is formatted
    =Line                                                                    -   <Line Number>
    =LineColumn                                                              -   <Line Number>:<Column Number>
    =LineDiscriminator                                                       -   <Line Number>.<Discriminator>
    =LineColumnDiscriminator                                                 -   <Line Number>:<Column Number>.<Discriminator> (default)
  --sample-profile-inline-replay-scope=<value>                               - Whether inline replay should be applied to the entire Module or just the Functions (default) that are present as callers in remarks during sample profile inlining.
    =Function                                                                -   Replay on functions that have remarks associated with them (default)
    =Module                                                                  -   Replay on the entire module
  --sample-profile-inline-size                                               - Inline cold call sites in profile loader if it's beneficial for code size.
  --sample-profile-join-islands                                              - Join isolated components having positive flow.
  --sample-profile-max-propagate-iterations=<uint>                           - Maximum number of iterations to go through when propagating sample block/edge weights through the CFG.
  --sample-profile-merge-inlinee                                             - Merge past inlinee's profile to outline version if sample profile loader decided not to inline a call site. It will only be enabled when top-down order of profile loading is enabled. 
  --sample-profile-prioritized-inline                                        - Use call site prioritized inlining for sample profile loader. Currently only CSSPGO is supported.
  --sample-profile-profi-cost-block-dec=<uint>                               - The cost of decreasing a block's count by one.
  --sample-profile-profi-cost-block-entry-dec=<uint>                         - The cost of decreasing the entry block's count by one.
  --sample-profile-profi-cost-block-entry-inc=<uint>                         - The cost of increasing the entry block's count by one.
  --sample-profile-profi-cost-block-inc=<uint>                               - The cost of increasing a block's count by one.
  --sample-profile-profi-cost-block-unknown-inc=<uint>                       - The cost of increasing an unknown block's count by one.
  --sample-profile-profi-cost-block-zero-inc=<uint>                          - The cost of increasing a count of zero-weight block by one.
  --sample-profile-rebalance-unknown                                         - Evenly re-distribute flow among unknown subgraphs.
  --sample-profile-recursive-inline                                          - Allow sample loader inliner to inline recursive calls.
  --sample-profile-remapping-file=<filename>                                 - Profile remapping file loaded by -sample-profile
  --sample-profile-remove-probe                                              - Remove pseudo-probe after sample profile annotation.
  --sample-profile-top-down-load                                             - Do profile annotation and inlining for functions in top-down order of call graph during sample profile loading. It only works for new pass manager. 
  --sample-profile-use-preinliner                                            - Use the preinliner decisions stored in profile context.
  --sample-profile-use-profi                                                 - Use profi to infer block and edge counts.
  --sampled-instr-burst-duration=<uint>                                      - Set the profile instrumentation burst duration, which can range from 1 to the value of 'sampled-instr-period' (0 is invalid). This number of samples will be recorded for each 'sampled-instr-period' count update. Setting to 1 enables simple sampling, in which case it is recommended to set 'sampled-instr-period' to a prime number.
  --sampled-instr-period=<uint>                                              - Set the profile instrumentation sample period. A sample period of 0 is invalid. For each sample period, a fixed number of consecutive samples will be recorded. The number is controlled by 'sampled-instr-burst-duration' flag. The default sample period of 65536 is optimized for generating efficient code that leverages unsigned short integer wrapping in overflow, but this is disabled under simple sampling (burst duration = 1).
  --sampled-instrumentation                                                  - Do PGO instrumentation sampling
  --sanitizer-coverage-control-flow                                          - collect control flow for each function
  --sanitizer-coverage-gated-trace-callbacks                                 - Gate the invocation of the tracing callbacks on a global variable. Currently only supported for trace-pc-guard and trace-cmp.
  --sanitizer-coverage-inline-8bit-counters                                  - increments 8-bit counter for every edge
  --sanitizer-coverage-inline-bool-flag                                      - sets a boolean flag for every edge
  --sanitizer-coverage-level=<int>                                           - Sanitizer Coverage. 0: none, 1: entry block, 2: all blocks, 3: all blocks and critical edges
  --sanitizer-coverage-pc-table                                              - create a static PC table
  --sanitizer-coverage-prune-blocks                                          - Reduce the number of instrumented blocks
  --sanitizer-coverage-stack-depth                                           - max stack depth tracing
  --sanitizer-coverage-trace-compares                                        - Tracing of CMP and similar instructions
  --sanitizer-coverage-trace-divs                                            - Tracing of DIV instructions
  --sanitizer-coverage-trace-geps                                            - Tracing of GEP instructions
  --sanitizer-coverage-trace-loads                                           - Tracing of load instructions
  --sanitizer-coverage-trace-pc                                              - Experimental pc tracing
  --sanitizer-coverage-trace-pc-guard                                        - pc tracing with a guard
  --sanitizer-coverage-trace-stores                                          - Tracing of store instructions
  --sanitizer-metadata-atomics                                               - Emit PCs for atomic operations.
  --sanitizer-metadata-covered                                               - Emit PCs for covered functions.
  --sanitizer-metadata-nosanitize-attr                                       - Mark some metadata features uncovered in functions with associated no_sanitize attributes.
  --sanitizer-metadata-uar                                                   - Emit PCs for start of functions that are subject for use-after-return checking
  --sanitizer-metadata-weak-callbacks                                        - Declare callbacks extern weak, and only call if non-null.
  --sbvec-allow-non-pow2                                                     - Allow non-power-of-2 vectorization.
  --sbvec-always-verify                                                      - Helps find bugs by verifying the IR whenever we emit new instructions (*very* expensive).
  --sbvec-collect-seeds=<string>                                             - Collect these seeds. Use empty for none or a comma-separated list of 'loads' and 'stores'.
  --sbvec-cost-threshold=<int>                                               - Vectorization cost threshold.
  --sbvec-passes=<string>                                                    - Comma-separated list of vectorizer passes. If not set we run the predefined pipeline.
  --sbvec-print-pass-pipeline                                                - Prints the pass pipeline and returns.
  --sbvec-seed-bundle-size-limit=<uint>                                      - Limit the size of the seed bundle to cap compilation time.
  --sbvec-seed-groups-limit=<uint>                                           - Limit the number of collected seeds groups in a BB to cap compilation time.
  --sbvec-vec-reg-bits=<uint>                                                - Override the vector register size in bits, which is otherwise found by querying TTI.
  --scalable-vectorization=<value>                                           - Control whether the compiler can use scalable vectors to vectorize a loop
    =off                                                                     -   Scalable vectorization is disabled.
    =preferred                                                               -   Scalable vectorization is available and favored when the cost is inconclusive.
    =on                                                                      -   Scalable vectorization is available and favored when the cost is inconclusive.
  --scalar-evolution-classify-expressions                                    - When printing analysis, include information on every instruction
  --scalar-evolution-finite-loop                                             - Handle <= and >= in finite loops
  --scalar-evolution-huge-expr-threshold=<uint>                              - Size of the expression which is considered huge
  --scalar-evolution-max-add-rec-size=<uint>                                 - Max coefficients in AddRec during evolving
  --scalar-evolution-max-arith-depth=<uint>                                  - Maximum depth of recursive arithmetics
  --scalar-evolution-max-cast-depth=<uint>                                   - Maximum depth of recursive SExt/ZExt/Trunc
  --scalar-evolution-max-constant-evolving-depth=<uint>                      - Maximum depth of recursive constant evolving
  --scalar-evolution-max-loop-guard-collection-depth=<uint>                  - Maximum depth for recursive loop guard collection
  --scalar-evolution-max-scc-analysis-depth=<uint>                           - Maximum amount of nodes to process while searching SCEVUnknown Phi strongly connected components
  --scalar-evolution-max-scev-compare-depth=<uint>                           - Maximum depth of recursive SCEV complexity comparisons
  --scalar-evolution-max-scev-operations-implication-depth=<uint>            - Maximum depth of recursive SCEV operations implication analysis
  --scalar-evolution-max-value-compare-depth=<uint>                          - Maximum depth of recursive value complexity comparisons
  --scalar-evolution-use-context-for-no-wrap-flag-strenghening               - Infer nuw/nsw flags using context where suitable
  --scalar-evolution-use-expensive-range-sharpening                          - Use more powerful methods of sharpening expression ranges. May be costly in terms of compile time
  --scale-partial-sample-profile-working-set-size                            - If true, scale the working set size of the partial sample profile by the partial profile ratio to reflect the size of the program being compiled.
  --scev-addops-inline-threshold=<uint>                                      - Threshold for inlining addition operands into a SCEV
  --scev-cheap-expansion-budget=<uint>                                       - When performing SCEV expansion only if it is cheap to do, this controls the budget that is considered cheap (default = 4)
  --scev-mulops-inline-threshold=<uint>                                      - Threshold for inlining multiplication operands into a SCEV
  --scev-range-iter-threshold=<uint>                                         - Threshold for switching to iteratively computing SCEV ranges
  --scev-verify-ir                                                           - Verify IR correctness when making sensitive SCEV queries (slow)
  --sched-model-force-enable-intervals                                       - Force the use of resource intervals in the schedule model
  --sched-print-cycles                                                       - Report top/bottom cycles when dumping SUnit instances
  --scheditins                                                               - Use InstrItineraryData for latency lookup
  --schedmodel                                                               - Use TargetSchedModel for latency lookup
  --select-opti-loop-cycle-gain-threshold=<uint>                             - Minimum gain per loop (in cycles) threshold.
  --select-opti-loop-gradient-gain-threshold=<uint>                          - Gradient gain threshold (%).
  --select-opti-loop-relative-gain-threshold=<uint>                          - Minimum relative gain per loop threshold (1/X). Defaults to 12.5%
  --show-fs-branchprob                                                       - Print setting flow sensitive branch probabilities
  --simple-loop-unswitch-drop-non-trivial-implicit-null-checks               - If enabled, drop make.implicit metadata in unswitched implicit null checks to save time analyzing if we can keep it.
  --simple-loop-unswitch-guards                                              - If enabled, simple loop unswitching will also consider llvm.experimental.guard intrinsics as unswitch candidates.
  --simple-loop-unswitch-inject-invariant-condition-hotness-threshold=<uint> - Only try to inject loop invariant conditions and unswitch on them to eliminate branches that are not-taken 1/<this option> times or less.
  --simple-loop-unswitch-inject-invariant-conditions                         - Whether we should inject new invariants and unswitch them to eliminate some existing (non-invariant) conditions.
  --simple-loop-unswitch-memoryssa-threshold=<uint>                          - Max number of memory uses to explore during partial unswitching analysis
  --simplify-mir                                                             - Leave out unnecessary information when printing MIR
  --simplifycfg-branch-fold-common-dest-vector-multiplier=<uint>             - Multiplier to apply to threshold when determining whether or not to fold branch to common destination when vector operations are present
  --simplifycfg-branch-fold-threshold=<uint>                                 - Maximum cost of combining conditions when folding branches
  --simplifycfg-hoist-common                                                 - Hoist common instructions up to the parent block
  --simplifycfg-hoist-common-skip-limit=<uint>                               - Allow reordering across at most this many instructions when hoisting
  --simplifycfg-hoist-cond-stores                                            - Hoist conditional stores if an unconditional store precedes
  --simplifycfg-hoist-loads-stores-with-cond-faulting                        - Hoist loads/stores if the target supports conditional faulting
  --simplifycfg-max-small-block-size=<int>                                   - Max size of a block which is still considered small enough to thread through
  --simplifycfg-merge-compatible-invokes                                     - Allow SimplifyCFG to merge invokes together when appropriate
  --simplifycfg-merge-cond-stores                                            - Hoist conditional stores even if an unconditional store does not precede - hoist multiple conditional stores into a single predicated store
  --simplifycfg-merge-cond-stores-aggressively                               - When merging conditional stores, do so even if the resultant basic blocks are unlikely to be if-converted as a result
  --simplifycfg-require-and-preserve-domtree                                 - Temporary development switch used to gradually uplift SimplifyCFG into preserving DomTree.
  --simplifycfg-sink-common                                                  - Sink common instructions down to the end block
  --sink-common-insts                                                        - Sink common instructions (default = false)
  --sink-freq-percent-threshold=<uint>                                       - Do not sink instructions that require cloning unless they execute less than this percent of the time.
  --sink-insts-to-avoid-spills                                               - Sink instructions into cycles to avoid register spills
  --skip-ret-exit-block                                                      - Suppress counter promotion if exit blocks contain ret.
  --slp-max-look-ahead-depth=<int>                                           - The maximum look-ahead depth for operand reordering scores
  --slp-max-reg-size=<int>                                                   - Attempt to vectorize for this register size in bits
  --slp-max-root-look-ahead-depth=<int>                                      - The maximum look-ahead depth for searching best rooting option
  --slp-max-stride=<uint>                                                    - The maximum stride, considered to be profitable.
  --slp-max-vf=<uint>                                                        - Maximum SLP vectorization factor (0=unlimited)
  --slp-min-reg-size=<int>                                                   - Attempt to vectorize for this register size in bits
  --slp-min-strided-loads=<uint>                                             - The minimum number of loads, which should be considered strided, if the stride is > 1 or is runtime value
  --slp-min-tree-size=<uint>                                                 - Only vectorize small trees if they are fully vectorizable
  --slp-recursion-max-depth=<uint>                                           - Limit the recursion depth when building a vectorizable tree
  --slp-revec                                                                - Enable vectorization for wider vector utilization
  --slp-schedule-budget=<int>                                                - Limit the size of the SLP scheduling region per block
  --slp-skip-early-profitability-check                                       - When true, SLP vectorizer bypasses profitability checks based on heuristics and makes vectorization decision via cost modeling.
  --slp-threshold=<int>                                                      - Only vectorize if you gain more than this number 
  --slp-vectorize-hor                                                        - Attempt to vectorize horizontal reductions
  --slp-vectorize-hor-store                                                  - Attempt to vectorize horizontal reductions feeding into a store
  --slp-vectorize-non-power-of-2                                             - Try to vectorize with non-power-of-2 number of elements.
  --small-loop-cost=<uint>                                                   - The cost of a loop that is considered 'small' by the interleaver.
  --sort-profiled-scc-member                                                 - Sort profiled recursion by edge weights.
  --sort-timers                                                              - In the report, sort the timers in each group in wall clock time order
  --spec-exec-max-not-hoisted=<uint>                                         - Speculative execution is not applied to basic blocks where the number of instructions that would not be speculatively executed exceeds this limit.
  --spec-exec-max-speculation-cost=<uint>                                    - Speculative execution is not applied to basic blocks where the cost of the instructions to speculatively execute exceeds this limit.
  --spec-exec-only-if-divergent-target                                       - Speculative execution is applied only to targets with divergent branches, even if the pass was configured to apply only to all targets.
  --speculate-one-expensive-inst                                             - Allow exactly one expensive instruction to be speculatively executed
  --speculate-unpredictables                                                 - Speculate unpredictable branches (default = false)
  --speculative-counter-promotion-max-exiting=<uint>                         - The max number of exiting blocks of a loop to allow  speculative counter promotion
  --speculative-counter-promotion-to-loop                                    - When the option is false, if the target block is in a loop, the promotion will be disallowed unless the promoted counter  update can be further/iteratively promoted into an acyclic  region.
  --split-spill-mode=<value>                                                 - Spill mode for splitting live ranges
    =default                                                                 -   Default
    =size                                                                    -   Optimize for size
    =speed                                                                   -   Optimize for speed
  --split-static-data                                                        - Split static data sections into hot and cold sections using profile information
  --split-threshold-for-reg-with-hint=<uint>                                 - The threshold for splitting a virtual register with a hint, in percentage
  --spp-all-backedges                                                        - 
  --spp-counted-loop-trip-width=<int>                                        - 
  --spp-no-backedge                                                          - 
  --spp-no-call                                                              - 
  --spp-no-entry                                                             - 
  --spp-print-base-pointers                                                  - 
  --spp-print-liveset                                                        - 
  --spp-print-liveset-size                                                   - 
  --spp-rematerialization-threshold=<uint>                                   - 
  --spp-split-backedge                                                       - 
  --sroa-skip-mem2reg                                                        - 
  --ssc-dce-limit=<int>                                                      - 
  --stack-safety-max-iterations=<int>                                        - 
  --stack-safety-print                                                       - 
  --stack-safety-run                                                         - 
  --stackcoloring-lifetime-start-on-first-use                                - Treat stack lifetimes as starting on first use, not on START marker.
  --stackmap-version=<int>                                                   - Specify the stackmap encoding version (default = 3)
  --start-after=<pass-name>                                                  - Resume compilation after a specific pass
  --start-before=<pass-name>                                                 - Resume compilation before a specific pass
  --static-func-full-module-prefix                                           - Use full module build paths in the profile counter names for static functions.
  --static-func-strip-dirname-prefix=<uint>                                  - Strip specified level of directory name from source path in the profile counter name for static functions.
  --static-likely-prob=<uint>                                                - branch probability threshold in percentage to be considered very likely
  --stats                                                                    - Enable statistics output from program (available with Asserts)
  --stats-json                                                               - Display statistics as json data
  --stop-after=<pass-name>                                                   - Stop compilation after a specific pass
  --stop-before=<pass-name>                                                  - Stop compilation before a specific pass
  --store-to-load-forwarding-conflict-detection                              - Enable conflict detection in loop-access analysis
  --stress-cgp-ext-ld-promotion                                              - Stress test ext(promotable(ld)) -> promoted(ext(ld)) optimization in CodeGenPrepare
  --stress-cgp-store-extract                                                 - Stress test store(extract) optimizations in CodeGenPrepare
  --stress-early-ifcvt                                                       - Turn all knobs to 11
  --stress-ivchain                                                           - Stress test LSR IV chains
  --stress-regalloc=<N>                                                      - Limit all regclasses to N registers
  --stress-sched                                                             - Stress test instruction scheduling
  --strip-global-constants                                                   - Removes debug compile units which reference to non-existing global constants
  --strncmp-inline-threshold=<uint>                                          - The maximum length of a constant string for a builtin string cmp call eligible for inlining. The default value is 3.
  --structurizecfg-relaxed-uniform-regions                                   - Allow relaxed uniform region checks
  --structurizecfg-skip-uniform-regions                                      - Force whether the StructurizeCFG pass skips uniform regions
  --summary-file=<string>                                                    - The summary file to use for function importing.
  --supports-hot-cold-new                                                    - Linking with hot/cold operator new interfaces
  --switch-range-to-icmp                                                     - Convert switches into an integer range comparison (default = false)
  --switch-to-lookup                                                         - Convert switches to lookup tables (default = false)
  --tail-dup-indirect-size=<uint>                                            - Maximum instructions to consider tail duplicating blocks that end with indirect branches.
  --tail-dup-limit=<uint>                                                    - 
  --tail-dup-placement                                                       - Perform tail duplication during placement. Creates more fallthrough opportunities in outline branches.
  --tail-dup-placement-aggressive-threshold=<uint>                           - Instruction cutoff for aggressive tail duplication during layout. Used at -O3. Tail merging during layout is forced to have a threshold that won't conflict.
  --tail-dup-placement-penalty=<uint>                                        - Cost penalty for blocks that can avoid breaking CFG by copying. Copying can increase fallthrough, but it also increases icache pressure. This parameter controls the penalty to account for that. Percent as integer.
  --tail-dup-placement-threshold=<uint>                                      - Instruction cutoff for tail duplication during layout. Tail merging during layout is forced to have a threshold that won't conflict.
  --tail-dup-pred-size=<uint>                                                - Maximum predecessors (maximum successors at the same time) to consider tail duplicating blocks.
  --tail-dup-profile-percent-threshold=<uint>                                - If profile count information is used in tail duplication cost model, the gained fall through number from tail duplication should be at least this percent of hot count.
  --tail-dup-size=<uint>                                                     - Maximum instructions to consider tail duplicating
  --tail-dup-succ-size=<uint>                                                - Maximum successors (maximum predecessors at the same time) to consider tail duplicating blocks.
  --tail-dup-verify                                                          - Verify sanity of PHI instructions during taildup
  --tail-merge-size=<uint>                                                   - Min number of instructions to consider tail merging
  --tail-merge-threshold=<uint>                                              - Max number of predecessors to consider tail merging
  --temporal-reuse-threshold=<uint>                                          - Use this to specify the max. distance between array elements accessed in a loop so that the elements are classified to have temporal reuse
  --terminal-rule                                                            - Apply the terminal rule
  --thinlto-workload-def=<string>                                            - Pass a workload definition. This is a file containing a JSON dictionary. The keys are root functions, the values are lists of functions to import in the module defining the root. It is assumed -funique-internal-linkage-names was used, to ensure local linkage functions have unique names. For example: 
                                                                               {
                                                                                 "rootFunction_1": ["function_to_import_1", "function_to_import_2"], 
                                                                                 "rootFunction_2": ["function_to_import_3", "function_to_import_4"] 
                                                                               }
  --time-passes                                                              - Time each pass, printing elapsed time for each on exit
  --time-passes-per-run                                                      - Time each pass run, printing elapsed time for each run on exit
  --track-memory                                                             - Enable -time-passes memory tracking (this may be slow)
  --treat-scalable-fixed-error-as-warning                                    - Treat issues where a fixed-width property is requested from a scalable type as a warning, instead of an error
  --triangle-chain-count=<uint>                                              - Number of triangle-shaped-CFG's that need to be in a row for the triangle tail duplication heuristic to kick in. 0 to disable.
  --tsan-compound-read-before-write                                          - Emit special compound instrumentation for reads-before-writes
  --tsan-distinguish-volatile                                                - Emit special instrumentation for accesses to volatiles
  --tsan-handle-cxx-exceptions                                               - Handle C++ exceptions (insert cleanup blocks for unwinding)
  --tsan-instrument-atomics                                                  - Instrument atomics
  --tsan-instrument-func-entry-exit                                          - Instrument function entry and exit
  --tsan-instrument-memintrinsics                                            - Instrument memintrinsics (memset/memcpy/memmove)
  --tsan-instrument-memory-accesses                                          - Instrument memory accesses
  --tsan-instrument-read-before-write                                        - Do not eliminate read instrumentation for read-before-writes
  --two-entry-phi-node-folding-threshold=<uint>                              - Control the maximal total instruction cost that we are willing to speculatively execute to fold a 2-entry PHI node into a select (default = 4)
  --twoaddr-reschedule                                                       - Coalesce copies by rescheduling (default=true)
  --type-based-intrinsic-cost                                                - Calculate intrinsics cost based only on argument types
  --tysan-writes-always-set-type                                             - Writes always set the type
  --unlikely-branch-weight=<uint>                                            - Weight of the branch unlikely to be taken (default = 1)
  --unroll-allow-loop-nests-peeling                                          - Allows loop nests to be peeled.
  --unroll-allow-partial                                                     - Allows loops to be partially unrolled until -unroll-threshold loop size is reached.
  --unroll-allow-peeling                                                     - Allows loops to be peeled when the dynamic trip count is known to be low.
  --unroll-allow-remainder                                                   - Allow generation of a loop remainder (extra iterations) when unrolling a loop.
  --unroll-and-jam-count=<uint>                                              - Use this unroll count for all loops including those with unroll_and_jam_count pragma values, for testing purposes
  --unroll-and-jam-threshold=<uint>                                          - Threshold to use for inner loop when doing unroll and jam.
  --unroll-count=<uint>                                                      - Use this unroll count for all loops including those with unroll_count pragma values, for testing purposes
  --unroll-force-peel-count=<uint>                                           - Force a peel count regardless of profiling information.
  --unroll-full-max-count=<uint>                                             - Set the max unroll count for full unrolling, for testing purposes
  --unroll-max-count=<uint>                                                  - Set the max unroll count for partial and runtime unrolling, fortesting purposes
  --unroll-max-iteration-count-to-analyze=<uint>                             - Don't allow loop unrolling to simulate more than this number of iterations when checking full unroll profitability
  --unroll-max-percent-threshold-boost=<uint>                                - The maximum 'boost' (represented as a percentage >= 100) applied to the threshold when aggressively unrolling a loop due to the dynamic cost savings. If completely unrolling a loop will reduce the total runtime from X to Y, we boost the loop unroll threshold to DefaultThreshold*std::min(MaxPercentThresholdBoost, X/Y). This limit avoids excessive code bloat.
  --unroll-max-upperbound=<uint>                                             - The max of trip count upper bound that is considered in unrolling
  --unroll-optsize-threshold=<uint>                                          - The cost threshold for loop unrolling when optimizing for size
  --unroll-partial-threshold=<uint>                                          - The cost threshold for partial loop unrolling
  --unroll-peel-count=<uint>                                                 - Set the unroll peeling count, for testing purposes
  --unroll-peel-max-count=<uint>                                             - Max average trip count which will cause loop peeling.
  --unroll-remainder                                                         - Allow the loop remainder to be unrolled.
  --unroll-revisit-child-loops                                               - Enqueue and re-visit child loops in the loop PM after unrolling. This shouldn't typically be needed as child loops (or their clones) were already visited.
  --unroll-runtime                                                           - Unroll loops with run-time trip counts
  --unroll-runtime-epilog                                                    - Allow runtime unrolled loops to be unrolled with epilog instead of prolog.
  --unroll-runtime-multi-exit                                                - Allow runtime unrolling for loops with multiple exits, when epilog is generated
  --unroll-runtime-other-exit-predictable                                    - Assume the non latch exit block to be predictable
  --unroll-threshold=<uint>                                                  - The cost threshold for loop unrolling
  --unroll-threshold-aggressive=<uint>                                       - Threshold (max size of unrolled loop) to use in aggressive (O3) optimizations
  --unroll-threshold-default=<uint>                                          - Default threshold (max size of unrolled loop), used in all but O3 optimizations
  --unroll-verify-domtree                                                    - Verify domtree after unrolling
  --unroll-verify-loopinfo                                                   - Verify loopinfo after unrolling
  --unswitch-num-initial-unscaled-candidates=<int>                           - Number of unswitch candidates that are ignored when calculating cost multiplier.
  --unswitch-siblings-toplevel-div=<int>                                     - Toplevel siblings divisor for cost multiplier.
  --unswitch-threshold=<int>                                                 - The cost threshold for unswitching a loop.
  --update-pseudo-probe                                                      - Update pseudo probe distribution factor
  --use-constant-fp-for-fixed-length-splat                                   - Use ConstantFP's native fixed-length vector splat support.
  --use-constant-fp-for-scalable-splat                                       - Use ConstantFP's native scalable vector splat support.
  --use-constant-int-for-fixed-length-splat                                  - Use ConstantInt's native fixed-length vector splat support.
  --use-constant-int-for-scalable-splat                                      - Use ConstantInt's native scalable vector splat support.
  --use-ctx-profile=<string>                                                 - Use the specified contextual profile file
  --use-dereferenceable-at-point-semantics                                   - Deref attributes and metadata infer facts at definition only
  --use-iterative-bfi-inference                                              - Apply an iterative post-processing to infer correct BFI counts
  --use-leb128-directives                                                    - Disable the usage of LEB128 directives, and generate .byte instead.
  --use-lir-code-size-heurs                                                  - Use loop idiom recognition code size heuristics when compiling with -Os/-Oz
  --use-noalias-intrinsic-during-inlining                                    - Use the llvm.experimental.noalias.scope.decl intrinsic during inlining.
  --use-profiled-call-graph                                                  - Process functions in a top-down order defined by the profiled call graph when -sample-profile-top-down-load is on.
  --use-segment-set-for-physregs                                             - Use segment set for the computation of the live ranges of physregs.
  --use-source-filename-for-promoted-locals                                  - Uses the source file name instead of the Module hash. This requires that the source filename has a unique name / path to avoid name collisions.
  --use-tbaa-in-sched-mi                                                     - Enable use of TBAA during MI DAG construction
  -v                                                                         - verbose
  --vector-combine-max-scan-instrs=<uint>                                    - Max number of instructions to scan for vector combining.
  --vector-library=<value>                                                   - Vector functions library
    =none                                                                    -   No vector functions library
    =Accelerate                                                              -   Accelerate framework
    =Darwin_libsystem_m                                                      -   Darwin libsystem_m
    =LIBMVEC-X86                                                             -   GLIBC Vector Math library
    =MASSV                                                                   -   IBM MASS vector library
    =SVML                                                                    -   Intel SVML library
    =sleefgnuabi                                                             -   SIMD Library for Evaluating Elementary Functions
    =ArmPL                                                                   -   Arm Performance Libraries
    =AMDLIBM                                                                 -   AMD vector math library
  --vectorize-loops                                                          - Run the Loop vectorization passes
  --vectorize-memory-check-threshold=<uint>                                  - The maximum allowed number of runtime memory checks
  --vectorize-num-stores-pred=<uint>                                         - Max number of stores to be predicated behind an if.
  --vectorize-scev-check-threshold=<uint>                                    - The maximum number of SCEV checks allowed.
  --vectorize-slp                                                            - Run the SLP vectorization passes
  --vectorizer-maximize-bandwidth                                            - Maximize bandwidth when selecting vectorization factor which will be determined by the smallest type in loop.
  --vectorizer-maximize-bandwidth-for-vector-calls                           - Try wider VFs if they enable the use of vector variants
  --vectorizer-min-trip-count=<uint>                                         - Loops with a constant trip count that is smaller than this value are vectorized only if no scalar iteration overheads are incurred.
  --verify-assumption-cache                                                  - Enable verification of assumption cache
  --verify-cfiinstrs                                                         - Verify Call Frame Information instructions
  --verify-coalescing                                                        - Verify machine instrs before and after register coalescing
  --verify-dom-info                                                          - Verify dominator info (time consuming)
  --verify-loop-info                                                         - Verify loop info (time consuming)
  --verify-loop-lcssa                                                        - Verify loop lcssa form (time consuming)
  --verify-machine-dom-info                                                  - Verify machine dominator info (time consuming)
  --verify-machineinstrs                                                     - Verify generated machine code
  --verify-matrix-shapes                                                     - Enable/disable matrix shape verification.
  --verify-memoryssa                                                         - Enable verification of MemorySSA.
  --verify-misched                                                           - Verify machine instrs before and after machine scheduling
  --verify-noalias-scope-decl-dom                                            - Ensure that llvm.experimental.noalias.scope.decl for identical scopes are not dominating
  --verify-predicateinfo                                                     - Verify PredicateInfo in legacy printer pass.
  --verify-pseudo-probe                                                      - Do pseudo probe verification
  --verify-pseudo-probe-funcs=<string>                                       - The option to specify the name of the functions to verify.
  --verify-regalloc                                                          - Verify during register allocation
  --verify-region-info                                                       - Verify region info (time consuming)
  --verify-scev                                                              - Verify ScalarEvolution's backedge taken counts (slow)
  --verify-scev-strict                                                       - Enable stricter verification with -verify-scev is passed
  --view-bfi-func-name=<string>                                              - The option to specify the name of the function whose CFG will be displayed.
  --view-block-freq-propagation-dags=<value>                                 - Pop up a window to show a dag displaying how block frequencies propagation through the CFG.
    =none                                                                    -   do not display graphs.
    =fraction                                                                -   display a graph using the fractional block frequency representation.
    =integer                                                                 -   display a graph using the raw integer fractional block frequency representation.
    =count                                                                   -   display a graph using the real profile count if available.
  --view-block-layout-with-bfi=<value>                                       - Pop up a window to show a dag displaying MBP layout and associated block frequencies of the CFG.
    =none                                                                    -   do not display graphs.
    =fraction                                                                -   display a graph using the fractional block frequency representation.
    =integer                                                                 -   display a graph using the raw integer fractional block frequency representation.
    =count                                                                   -   display a graph using the real profile count if available.
  --view-edge-bundles                                                        - Pop up a window to show edge bundle graphs
  --view-hot-freq-percent=<uint>                                             - An integer in percent used to specify the hot blocks/edges to be displayed in red: a block or edge whose frequency is no less than the max frequency of the function multiplied by this percent.
  --view-machine-block-freq-propagation-dags=<value>                         - Pop up a window to show a dag displaying how machine block frequencies propagate through the CFG.
    =none                                                                    -   do not display graphs.
    =fraction                                                                -   display a graph using the fractional block frequency representation.
    =integer                                                                 -   display a graph using the raw integer fractional block frequency representation.
    =count                                                                   -   display a graph using the real profile count if available.
  --view-misched-cutoff=<uint>                                               - Hide nodes with more predecessor/successor than cutoff
  --view-misched-dags                                                        - Pop up a window to show MISched dags after they are processed
  --view-slp-tree                                                            - Display the SLP trees with Graphviz
  --vp-counters-per-site=<number>                                            - The average number of profile counters allocated per value profiling site.
  --vp-static-alloc                                                          - Do static counter allocation for value profiler
  --vplan-build-stress-test                                                  - Build VPlan for every supported loop nest in the function and bail out right after the build (stress test the VPlan H-CFG construction in the VPlan-native vectorization path).
  --vplan-print-in-dot-format                                                - Use dot format instead of plain text when dumping VPlans
  --vplan-verify-each                                                        - Verfiy VPlans after VPlan transforms.
  --whole-program-visibility                                                 - Enable whole program visibility
  --wholeprogramdevirt-branch-funnel-threshold=<uint>                        - Maximum number of call targets per call site to enable branch funnels
  --wholeprogramdevirt-check=<value>                                         - Type of checking for incorrect devirtualizations
    =none                                                                    -   No checking
    =trap                                                                    -   Trap when incorrect
    =fallback                                                                -   Fallback to indirect when incorrect
  --wholeprogramdevirt-cutoff=<uint>                                         - Max number of devirtualizations for devirt module pass
  --wholeprogramdevirt-keep-unreachable-function                             - Regard unreachable functions as possible devirtualize targets.
  --wholeprogramdevirt-print-index-based                                     - Print index-based devirtualization messages
  --wholeprogramdevirt-read-summary=<string>                                 - Read summary from given bitcode or YAML file before running pass
  --wholeprogramdevirt-skip=<string>                                         - Prevent function(s) from being devirtualized
  --wholeprogramdevirt-summary-action=<value>                                - What to do with the summary when running this pass
    =none                                                                    -   Do nothing
    =import                                                                  -   Import typeid resolutions from summary and globals
    =export                                                                  -   Export typeid resolutions to summary and globals
  --wholeprogramdevirt-write-summary=<string>                                - Write summary to given bitcode or YAML file after running pass. Output file format is deduced from extension: *.bc means writing bitcode, otherwise YAML
  --window-diff-limit=<uint>                                                 - The lower limit of the difference between best II and base II in the window algorithm. If the difference is smaller than this lower limit, window scheduling will not be performed.
  --window-ii-coeff=<uint>                                                   - The coefficient used when initializing II in the window algorithm.
  --window-ii-limit=<uint>                                                   - The upper limit of II in the window algorithm.
  --window-region-limit=<uint>                                               - The lower limit of the scheduling region in the window algorithm.
  --window-sched=<value>                                                     - Set how to use window scheduling algorithm.
    =off                                                                     -   Turn off window algorithm.
    =on                                                                      -   Use window algorithm after SMS algorithm fails.
    =force                                                                   -   Use window algorithm instead of SMS algorithm.
  --window-search-num=<uint>                                                 - The number of searches per loop in the window algorithm. 0 means no search number limit.
  --window-search-ratio=<uint>                                               - The ratio of searches per loop in the window algorithm. 100 means search all positions in the loop, while 0 means not performing any search.
  --write-experimental-debuginfo                                             - Write debug info in the new non-intrinsic format. Has no effect if --preserve-input-debuginfo-format=true.
  --write-experimental-debuginfo-iterators-to-bitcode                        - 
  --write-relbf-to-summary                                                   - Write relative block frequency to function summary 

Generic Options:

  -h                                                                         - Alias for --help
  --help                                                                     - Display available options (--help-hidden for more)
  --help-hidden                                                              - Display all available options
  --help-list                                                                - Display list of available options (--help-list-hidden for more)
  --help-list-hidden                                                         - Display list of all available options
  --print-all-options                                                        - Print all option values after command line parsing
  --print-options                                                            - Print non-default options after command line parsing
  --version                                                                  - Display the version of this program

5. Summary of build issues

Issue 1: `stdfloat` standard header not found

(sandbox) test/xrt/09_gemm_extern_vec_4x4$ g++ test.cpp -o test.exe -std=c++23 -Wall %xrt_flags -lrt -lstdc++ -lboost_program_options -lboost_filesystem
test.cpp:21:10: fatal error: stdfloat: No such file or directory
   21 | #include <stdfloat>
      |          ^~~~~~~~~~
compilation terminated.

Solution:

`<stdfloat>` is a C++23 standard header, and libstdc++ only ships it from GCC 13 onward. The local GCC here is version 12, so the fix is to upgrade GCC to 13 or later.
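Before rebuilding the test, it can help to confirm which GCC is actually on the PATH. The following is a minimal sketch, not part of the MLIR-AIR tooling; `has_stdfloat` is a hypothetical helper name, and the threshold of 13 comes from the explanation above.

```shell
# Hypothetical helper: succeeds when the given GCC version string
# (e.g. "12" or "13.2.0") has a major version of 13 or newer.
has_stdfloat() {
  major="${1%%.*}"
  [ "$major" -ge 13 ]
}

# g++ -dumpversion prints the compiler version; fall back to 0 if g++ is missing.
gxx_version="$(g++ -dumpversion 2>/dev/null || echo 0)"

if has_stdfloat "$gxx_version"; then
  echo "g++ $gxx_version should provide <stdfloat>"
else
  echo "g++ $gxx_version is too old for <stdfloat>; GCC 13 or newer is needed"
fi
```

If several GCC versions are installed side by side, the same check applies to whichever binary the test command actually invokes.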

Issue 2: `aie.extras` module not found

(sandbox) test/xrt/09_gemm_extern_vec_4x4$ python -c "import aie.extras"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'aie.extras'; 'aie' is not a package

Solution:

Add the paths of the aie and air modules to PYTHONPATH:

export PYTHONPATH=$MLIR_AIR_DIR/install-aie/python/aie:$PYTHONPATH
export PYTHONPATH=$MLIR_AIR_DIR/install-air/python/air:$PYTHONPATH
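The message "'aie' is not a package" means Python resolved `aie` to something that is not a package directory, so it is worth checking where the name resolves from after changing PYTHONPATH. The sketch below is only a diagnostic aid: `module_origin` is a hypothetical helper, and `python3` is assumed to be the interpreter of the active environment.

```shell
# Hypothetical helper: print the file a top-level module name resolves to,
# or "not found" when Python cannot locate it on the current search path.
module_origin() {
  python3 -c '
import importlib.util, sys
spec = importlib.util.find_spec(sys.argv[1])
print("not found" if spec is None else (spec.origin or "namespace package"))
' "$1"
}

# A path ending in aie/__init__.py indicates a real package;
# a path ending in aie.py indicates a plain module shadowing it.
module_origin aie
```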

Issue 3: `-opaque-pointers` (opaque pointers) linker error

AIE Compilation: ━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   6% -:--:-- 0:00:02  1/17 4 Workers
ld.lld: error: /tmp/genwrapper_for_ps-c7d07a.o: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM19.0.0git' Reader: 'LLVM 14.0.0')
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
Error encountered while running: clang++ -O2 -fuse-ld=lld -shared -o /media/4T/home/shikai/repos/mlir-air-new/test/xrt/09_gemm_extern_vec_4x4/tmp/sim/ps/ps.so /usr/tools/mlir-air/install-aie/aie_runtime_lib/AIE2/aiesim/genwrapper_for_ps.cpp -D__AIEARCH__=20 -fPIC -flto -fpermissive -DAIE_OPTION_SCALAR_FLOAT_ON_VECTOR -Wno-deprecated-declarations -Wno-enum-constexpr-conversion -Wno-format-security -DSC_INCLUDE_DYNAMIC_PROCESSES -D__AIESIM__ -D__PS_INIT_AIE__ -Og -Dmain(...)=ps_main(...) -I/media/4T/home/shikai/repos/mlir-air-new/test/xrt/09_gemm_extern_vec_4x4/tmp -I/usr/xilinx/Vitis/2024.2/aietools/include -I/usr/xilinx/Vitis/2024.2/aietools/include/drivers/aiengine -I/usr/xilinx/Vitis/2024.2/aietools/data/osci_systemc/include -I/usr/xilinx/Vitis/2024.2/aietools/include/xtlm/include -I/usr/xilinx/Vitis/2024.2/aietools/include/common_cpp/common_cpp_v1_0/include -I/usr/tools/mlir-air/install-aie/runtime_lib/x86_64/test_lib/include /usr/tools/mlir-air/install-aie/runtime_lib/x86_64/test_lib/lib/libmemory_allocator_sim_aie.a -L/usr/xilinx/Vitis/2024.2/aietools/lib/lnx64.o -L/usr/xilinx/Vitis/2024.2/aietools/data/osci_systemc/lib/lnx64 -Wl,--as-needed -lxioutils -lxaiengine -ladf_api -lsystemc -lxtlm -flto

Solution:

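The aiecc.py toolchain relies on a recent LLVM (19.0.0), while the system's ld.lld linker comes from an older release (LLVM 14.0.0). Because the failing command passes -flto, the object file it links is LLVM bitcode that the linker has to parse. Opaque pointers were introduced in newer LLVM releases, and the older linker cannot read bitcode that uses them, as the "Producer: 'LLVM19.0.0git' Reader: 'LLVM 14.0.0'" part of the error shows.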
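The mismatch can be confirmed by comparing the linker's major version against the producer version reported in the error. This is a rough sketch under stated assumptions: `lld_major` is a hypothetical helper, the value 19 is copied from the log, and the `--version` output is assumed to contain the text "LLD <version>".

```shell
# Hypothetical helper: extract the major version from one line of
# `ld.lld --version` output, e.g. "Ubuntu LLD 14.0.0 (compatible with GNU linkers)".
lld_major() {
  printf '%s\n' "$1" | sed -n 's/.*LLD \([0-9][0-9]*\)\..*/\1/p'
}

# Major version of the LLVM that produced the bitcode; 19 is taken from the error log.
producer_major=19

found="$(lld_major "$(ld.lld --version 2>/dev/null)")"
if [ -n "$found" ] && [ "$found" -ge "$producer_major" ]; then
  echo "ld.lld $found is at least as new as the LLVM $producer_major producer"
else
  echo "ld.lld (${found:-not found}) is missing or older than LLVM $producer_major"
fi
```

LLVM does not guarantee that an older reader can parse bitcode written by a newer release, which is why the check requires the linker to be at least as new as the producer.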

Replace the system's ld.lld with a newer linker, i.e. upgrade the system LLVM:

# Check the current (old) ld.lld version
ld.lld --version

# Add the LLVM apt repository
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo add-apt-repository "deb http://apt.llvm.org/$(lsb_release -cs)/ llvm-toolchain-$(lsb_release -cs)-19 main"
sudo apt update

# Install LLVM 19
sudo apt install llvm-19 lld-19 clang-19

# Point /usr/bin/ld.lld at the LLVM 19 linker (modifying /usr/bin requires root)
cd /usr/bin
sudo rm -f ld.lld
sudo ln -s /usr/bin/ld.lld-19 ld.lld

# Check the new ld.lld version
ld.lld --version
