app-mlperf-inference

Automatically generated README for this automation recipe: app-mlperf-inference

Category: Modular MLPerf inference benchmark pipeline

License: Apache 2.0

Developers: Arjun Suresh, Thomas Zhu, Grigori Fursin

  • Notes from the authors, contributors and users: README-extra


This CM script provides a unified interface to prepare and run a modular version of the MLPerf inference benchmark across diverse ML models, data sets, frameworks, libraries, run-time systems and platforms using the cross-platform automation meta-framework (MLCommons CM).

It is assembled from reusable and interoperable CM scripts for DevOps and MLOps being developed by the open MLCommons taskforce on automation and reproducibility.

It is a higher-level wrapper for several other CM scripts that modularize the MLPerf inference benchmark:

  • Reference Python implementation
  • Universal C++ implementation
  • TFLite C++ implementation
  • NVIDIA optimized implementation

See this SCC'23 tutorial for using this script to run a reference (unoptimized) Python implementation of the MLPerf object detection benchmark with the RetinaNet model, the Open Images dataset, ONNX Runtime and a CPU target.

See this CM script to automate and validate your MLPerf inference submission.

Get in touch with the open taskforce on automation and reproducibility at MLCommons if you need help with your submission or if you would like to participate in further modularization of MLPerf and collaborative design space exploration and optimization of ML Systems.

  • CM meta description for this script: _cm.yaml
  • Output cached? False

Reuse this script in your project

Install MLCommons CM automation meta-framework

Pull CM repository with this automation recipe (CM script)

cm pull repo mlcommons@cm4mlops

cmr "app vision language mlcommons mlperf inference generic" --help

Run this script

Run this script via CLI
cm run script --tags=app,vision,language,mlcommons,mlperf,inference,generic[,variations] [--input_flags]
Run this script via CLI (alternative)
cmr "app vision language mlcommons mlperf inference generic [variations]" [--input_flags]
Run this script from Python
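import cmind

# Run the script through the CM Python API; each key mirrors a CLI argument.
r = cmind.access({'action': 'run',
                  'automation': 'script',
                  'tags': 'app,vision,language,mlcommons,mlperf,inference,generic',
                  'out': 'con',
                  # ... (other input keys for this script) ...
                 })

# A non-zero 'return' signals an error described in r['error'].
if r['return'] > 0:
    print(r['error'])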
Run this script via Docker (beta)
cm docker script "app vision language mlcommons mlperf inference generic [variations]" [--input_flags]
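
The CLI and Python forms above differ mainly in how the tags and variations are joined. As a minimal sketch, the input dictionary for `cmind.access` could be assembled as follows. The helper name `build_cm_input` is hypothetical and not part of CM. The sketch assumes that variations are appended to the comma-separated tag list (as in the `--tags=...[,variations]` form) and that input flags are passed as top-level keys.

```python
# Hypothetical helper for illustration; not part of the CM API.
BASE_TAGS = "app,vision,language,mlcommons,mlperf,inference,generic"


def build_cm_input(variations=(), **input_flags):
    """Assemble an input dictionary in the shape used by the Python example above."""
    for v in variations:
        if not v.startswith("_"):
            raise ValueError(f"variation names start with '_': {v!r}")
    # Assumption: variations are appended to the comma-separated tag list.
    tags = ",".join([BASE_TAGS, *variations])
    cm_input = {"action": "run", "automation": "script", "tags": tags, "out": "con"}
    # Assumption: input flags such as scenario or mode are top-level keys.
    cm_input.update(input_flags)
    return cm_input


if __name__ == "__main__":
    print(build_cm_input(["_retinanet", "_onnxruntime", "_cpu"], scenario="Offline"))
```

With CM installed, the resulting dictionary would be passed to `cmind.access(...)` and the returned `r['return']` checked as shown earlier.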

Variations

  • Group "implementation"

    • _cpp
      • Aliases: _mil,_mlcommons-cpp
      • ENV variables:
        • CM_MLPERF_CPP: yes
        • CM_MLPERF_IMPLEMENTATION: mlcommons_cpp
        • CM_IMAGENET_ACCURACY_DTYPE: float32
        • CM_OPENIMAGES_ACCURACY_DTYPE: float32
    • _intel-original
      • Aliases: _intel
      • ENV variables:
        • CM_MLPERF_IMPLEMENTATION: intel
    • _kilt
      • Aliases: _qualcomm
      • ENV variables:
        • CM_MLPERF_IMPLEMENTATION: qualcomm
    • _nvidia-original
      • Aliases: _nvidia
      • ENV variables:
        • CM_MLPERF_IMPLEMENTATION: nvidia
        • CM_SQUAD_ACCURACY_DTYPE: float16
        • CM_IMAGENET_ACCURACY_DTYPE: int32
        • CM_CNNDM_ACCURACY_DTYPE: int32
        • CM_LIBRISPEECH_ACCURACY_DTYPE: int8
    • _reference (default)
      • Aliases: _mlcommons-python,_python
      • ENV variables:
        • CM_MLPERF_PYTHON: yes
        • CM_MLPERF_IMPLEMENTATION: mlcommons_python
        • CM_SQUAD_ACCURACY_DTYPE: float32
        • CM_IMAGENET_ACCURACY_DTYPE: float32
        • CM_OPENIMAGES_ACCURACY_DTYPE: float32
        • CM_LIBRISPEECH_ACCURACY_DTYPE: float32
        • CM_CNNDM_ACCURACY_DTYPE: int32
    • _tflite-cpp
      • Aliases: _ctuning-cpp-tflite
      • ENV variables:
        • CM_MLPERF_TFLITE_CPP: yes
        • CM_MLPERF_CPP: yes
        • CM_MLPERF_IMPLEMENTATION: ctuning_cpp_tflite
        • CM_IMAGENET_ACCURACY_DTYPE: float32
  • Group "backend"

    • _deepsparse
      • ENV variables:
        • CM_MLPERF_BACKEND: deepsparse
    • _glow
      • ENV variables:
        • CM_MLPERF_BACKEND: glow
    • _ncnn
      • ENV variables:
        • CM_MLPERF_BACKEND: ncnn
    • _onnxruntime
      • ENV variables:
        • CM_MLPERF_BACKEND: onnxruntime
    • _pytorch
      • ENV variables:
        • CM_MLPERF_BACKEND: pytorch
    • _ray
      • ENV variables:
        • CM_MLPERF_BACKEND: ray
    • _tensorrt
      • ENV variables:
        • CM_MLPERF_BACKEND: tensorrt
    • _tf
      • ENV variables:
        • CM_MLPERF_BACKEND: tf
    • _tflite
      • ENV variables:
        • CM_MLPERF_BACKEND: tflite
    • _tvm-onnx
      • ENV variables:
        • CM_MLPERF_BACKEND: tvm-onnx
    • _tvm-pytorch
      • ENV variables:
        • CM_MLPERF_BACKEND: tvm-pytorch
    • _tvm-tflite
      • ENV variables:
        • CM_MLPERF_BACKEND: tvm-tflite
  • Group "device"

    • _cpu (default)
      • ENV variables:
        • CM_MLPERF_DEVICE: cpu
    • _cuda
      • ENV variables:
        • CM_MLPERF_DEVICE: gpu
    • _qaic
      • ENV variables:
        • CM_MLPERF_DEVICE: qaic
    • _rocm
      • ENV variables:
        • CM_MLPERF_DEVICE: rocm
    • _tpu
      • ENV variables:
        • CM_MLPERF_DEVICE: tpu
  • Group "model"

    • _3d-unet-99
      • ENV variables:
        • CM_MODEL: 3d-unet-99
    • _3d-unet-99.9
      • ENV variables:
        • CM_MODEL: 3d-unet-99.9
    • _bert-99
      • ENV variables:
        • CM_MODEL: bert-99
    • _bert-99.9
      • ENV variables:
        • CM_MODEL: bert-99.9
    • _dlrm-v2-99
      • ENV variables:
        • CM_MODEL: dlrm-v2-99
    • _dlrm-v2-99.9
      • ENV variables:
        • CM_MODEL: dlrm-v2-99.9
    • _efficientnet
      • ENV variables:
        • CM_MODEL: efficientnet
    • _gptj-99
      • ENV variables:
        • CM_MODEL: gptj-99
    • _gptj-99.9
      • ENV variables:
        • CM_MODEL: gptj-99.9
    • _llama2-70b-99
      • ENV variables:
        • CM_MODEL: llama2-70b-99
    • _llama2-70b-99.9
      • ENV variables:
        • CM_MODEL: llama2-70b-99.9
    • _mobilenet
      • ENV variables:
        • CM_MODEL: mobilenet
    • _resnet50 (default)
      • ENV variables:
        • CM_MODEL: resnet50
    • _retinanet
      • ENV variables:
        • CM_MODEL: retinanet
    • _rnnt
      • ENV variables:
        • CM_MODEL: rnnt
    • _sdxl
      • ENV variables:
        • CM_MODEL: stable-diffusion-xl
  • Group "precision"

    • _bfloat16
      • ENV variables:
        • CM_MLPERF_QUANTIZATION: False
        • CM_MLPERF_MODEL_PRECISION: float32
    • _float16
      • ENV variables:
        • CM_MLPERF_QUANTIZATION: False
        • CM_MLPERF_MODEL_PRECISION: float32
    • _float32 (default)
      • Aliases: _fp32
      • ENV variables:
        • CM_MLPERF_QUANTIZATION: False
        • CM_MLPERF_MODEL_PRECISION: float32
    • _int4
      • ENV variables:
        • CM_MLPERF_QUANTIZATION: True
        • CM_MLPERF_MODEL_PRECISION: int4
    • _int8
      • Aliases: _quantized
      • ENV variables:
        • CM_MLPERF_QUANTIZATION: True
        • CM_MLPERF_MODEL_PRECISION: int8
    • _uint8
      • ENV variables:
        • CM_MLPERF_QUANTIZATION: True
        • CM_MLPERF_MODEL_PRECISION: uint8
  • Group "execution-mode"

    • _fast
      • ENV variables:
        • CM_FAST_FACTOR: 5
        • CM_OUTPUT_FOLDER_NAME: fast_results
        • CM_MLPERF_RUN_STYLE: fast
    • _test (default)
      • ENV variables:
        • CM_OUTPUT_FOLDER_NAME: test_results
        • CM_MLPERF_RUN_STYLE: test
    • _valid
      • ENV variables:
        • CM_OUTPUT_FOLDER_NAME: valid_results
        • CM_MLPERF_RUN_STYLE: valid
  • Group "reproducibility"

    • _r2.1_default
      • ENV variables:
        • CM_SKIP_SYS_UTILS: yes
        • CM_TEST_QUERY_COUNT: 100
    • _r3.0_default
      • ENV variables:
        • CM_SKIP_SYS_UTILS: yes
    • _r3.1_default
    • _r4.0_default
      • ENV variables:
        • CM_ENV_NVMITTEN_DOCKER_WHEEL_PATH: /opt/nvmitten-0.1.3-cp38-cp38-linux_x86_64.whl
    • _r4.1_default
      • ENV variables:
        • CM_ENV_NVMITTEN_DOCKER_WHEEL_PATH: /opt/nvmitten-0.1.3b0-cp38-cp38-linux_x86_64.whl
  • No group (any combination of variations can be selected)

    • _power
      • ENV variables:
        • CM_MLPERF_POWER: yes
        • CM_SYSTEM_POWER: yes
  • Group "batch_size"

    • _batch_size.#
      • ENV variables:
        • CM_MLPERF_LOADGEN_MAX_BATCHSIZE: #
  • Group "loadgen-scenario"

    • _multistream
      • ENV variables:
        • CM_MLPERF_LOADGEN_SCENARIO: MultiStream
    • _offline (default)
      • ENV variables:
        • CM_MLPERF_LOADGEN_SCENARIO: Offline
    • _server
      • ENV variables:
        • CM_MLPERF_LOADGEN_SCENARIO: Server
    • _singlestream
      • ENV variables:
        • CM_MLPERF_LOADGEN_SCENARIO: SingleStream
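
The `#` in `_batch_size.#` is a placeholder for a user-supplied value that is copied into the matching environment variable. The sketch below illustrates that substitution. It is not CM's implementation, and `expand_wildcard` is a hypothetical name.

```python
# Illustrative only: mimics how a "#" placeholder variation could be expanded.
WILDCARD_VARIATIONS = {
    "_batch_size.#": {"CM_MLPERF_LOADGEN_MAX_BATCHSIZE": "#"},
}


def expand_wildcard(variation):
    """Return the env dict for e.g. '_batch_size.8', or None if nothing matches."""
    for pattern, env in WILDCARD_VARIATIONS.items():
        prefix = pattern[:-1]  # drop the trailing "#"
        if variation.startswith(prefix) and len(variation) > len(prefix):
            value = variation[len(prefix):]
            # Substitute the user-supplied value wherever "#" appears.
            return {key: val.replace("#", value) for key, val in env.items()}
    return None


if __name__ == "__main__":
    print(expand_wildcard("_batch_size.8"))
```
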
Default variations

_cpu,_float32,_offline,_reference,_resnet50,_test
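
The default set above contains one entry from each group that declares a default. The sketch below shows how a user selection might override those defaults. It assumes at most one variation is active per group, which the listing implies but does not state outright. `resolve_variations` is a hypothetical name, and `VARIATION_GROUP` covers only a subset of the variations.

```python
# Defaults per group, copied from the variation lists above.
GROUP_DEFAULTS = {
    "implementation": "_reference",
    "device": "_cpu",
    "model": "_resnet50",
    "precision": "_float32",
    "execution-mode": "_test",
    "loadgen-scenario": "_offline",
}

# Small subset of the group membership, enough for the illustration.
VARIATION_GROUP = {
    "_reference": "implementation", "_cpp": "implementation",
    "_cpu": "device", "_cuda": "device",
    "_resnet50": "model", "_retinanet": "model", "_bert-99": "model",
    "_float32": "precision", "_int8": "precision",
    "_test": "execution-mode", "_valid": "execution-mode",
    "_offline": "loadgen-scenario", "_server": "loadgen-scenario",
}


def resolve_variations(selected):
    """Merge user-selected variations with the per-group defaults."""
    chosen = dict(GROUP_DEFAULTS)
    seen_groups = set()
    extras = []
    for v in selected:
        group = VARIATION_GROUP.get(v)
        if group is None:
            extras.append(v)  # ungrouped (e.g. _power) or outside this subset
            continue
        if group in seen_groups:
            # Assumption: two variations from one group conflict.
            raise ValueError(f"two variations selected from group {group!r}")
        seen_groups.add(group)
        chosen[group] = v
    return sorted(list(chosen.values()) + extras)


if __name__ == "__main__":
    print(",".join(resolve_variations(["_retinanet", "_cuda"])))
```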

Input Flags

  • --scenario: MLPerf inference scenario {Offline,Server,SingleStream,MultiStream} (Offline)
  • --mode: MLPerf inference mode {performance,accuracy} (accuracy)
  • --test_query_count: Specifies the number of samples to be processed during a test run
  • --target_qps: Target QPS
  • --target_latency: Target Latency
  • --max_batchsize: Maximum batchsize to be used
  • --num_threads: Number of CPU threads to launch the application with
  • --hw_name: Valid value - any system description which has a config file (under same name) defined here
  • --output_dir: Location where the outputs are produced
  • --rerun: Redo the run even if previous run files exist (True)
  • --regenerate_files: Regenerates measurement files including accuracy.txt files even if a previous run exists. This option is redundant if --rerun is used
  • --adr.python.name: Python virtual environment name (optional) (mlperf)
  • --adr.python.version_min: Minimal Python version (3.8)
  • --adr.python.version: Force Python version (must have all system deps)
  • --adr.compiler.tags: Compiler for loadgen (gcc)
  • --adr.inference-src-loadgen.env.CM_GIT_URL: Git URL for MLPerf inference sources to build LoadGen (to enable non-reference implementations)
  • --adr.inference-src.env.CM_GIT_URL: Git URL for MLPerf inference sources to run benchmarks (to enable non-reference implementations)
  • --quiet: Quiet run (select default values for all questions) (False)
  • --readme: Generate README with the reproducibility report
  • --debug: Debug MLPerf script

Script flags mapped to environment

  • --clean=value → CM_MLPERF_CLEAN_SUBMISSION_DIR=value
  • --count=value → CM_MLPERF_LOADGEN_QUERY_COUNT=value
  • --debug=value → CM_DEBUG_SCRIPT_BENCHMARK_PROGRAM=value
  • --docker=value → CM_RUN_DOCKER_CONTAINER=value
  • --gpu_name=value → CM_NVIDIA_GPU_NAME=value
  • --hw_name=value → CM_HW_NAME=value
  • --imagenet_path=value → IMAGENET_PATH=value
  • --max_amps=value → CM_MLPERF_POWER_MAX_AMPS=value
  • --max_batchsize=value → CM_MLPERF_LOADGEN_MAX_BATCHSIZE=value
  • --max_volts=value → CM_MLPERF_POWER_MAX_VOLTS=value
  • --mode=value → CM_MLPERF_LOADGEN_MODE=value
  • --multistream_target_latency=value → CM_MLPERF_LOADGEN_MULTISTREAM_TARGET_LATENCY=value
  • --ntp_server=value → CM_MLPERF_POWER_NTP_SERVER=value
  • --num_threads=value → CM_NUM_THREADS=value
  • --offline_target_qps=value → CM_MLPERF_LOADGEN_OFFLINE_TARGET_QPS=value
  • --output_dir=value → OUTPUT_BASE_DIR=value
  • --power=value → CM_MLPERF_POWER=value
  • --power_server=value → CM_MLPERF_POWER_SERVER_ADDRESS=value
  • --readme=value → CM_MLPERF_README=value
  • --regenerate_files=value → CM_REGENERATE_MEASURE_FILES=value
  • --rerun=value → CM_RERUN=value
  • --scenario=value → CM_MLPERF_LOADGEN_SCENARIO=value
  • --server_target_qps=value → CM_MLPERF_LOADGEN_SERVER_TARGET_QPS=value
  • --singlestream_target_latency=value → CM_MLPERF_LOADGEN_SINGLESTREAM_TARGET_LATENCY=value
  • --target_latency=value → CM_MLPERF_LOADGEN_TARGET_LATENCY=value
  • --target_qps=value → CM_MLPERF_LOADGEN_TARGET_QPS=value
  • --test_query_count=value → CM_TEST_QUERY_COUNT=value
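
Each flag in this list is shorthand for setting one environment variable. The sketch below illustrates that translation for a few of the entries. `flags_to_env` is a hypothetical function, not CM's own parser, and it silently ignores flags outside its small table.

```python
# A few entries from the mapping above; the full table has more flags.
FLAG_TO_ENV = {
    "mode": "CM_MLPERF_LOADGEN_MODE",
    "scenario": "CM_MLPERF_LOADGEN_SCENARIO",
    "target_qps": "CM_MLPERF_LOADGEN_TARGET_QPS",
    "test_query_count": "CM_TEST_QUERY_COUNT",
    "output_dir": "OUTPUT_BASE_DIR",
    "imagenet_path": "IMAGENET_PATH",
}


def flags_to_env(argv):
    """Translate '--flag=value' arguments into an environment dictionary."""
    env = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # skip positional arguments and valueless flags
        flag, value = arg[2:].split("=", 1)
        if flag in FLAG_TO_ENV:
            env[FLAG_TO_ENV[flag]] = value
    return env


if __name__ == "__main__":
    print(flags_to_env(["--mode=performance", "--scenario=Server", "--target_qps=100"]))
```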

Default environment

These keys can be updated via --env.KEY=VALUE, an env dictionary in @input.json, or script flags.

  • CM_MLPERF_LOADGEN_MODE: accuracy
  • CM_MLPERF_LOADGEN_SCENARIO: Offline
  • CM_OUTPUT_FOLDER_NAME: test_results
  • CM_MLPERF_RUN_STYLE: test
  • CM_TEST_QUERY_COUNT: 10
  • CM_MLPERF_QUANTIZATION: False
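
As a rough illustration of the `--env.KEY=VALUE` form, the sketch below layers command-line overrides on top of the defaults listed above. `apply_env_overrides` is a hypothetical name, all values are kept as strings for simplicity, and CM's actual precedence between flags, `--env.*` and `@input.json` is not modelled.

```python
# Default environment, copied from the list above (values kept as strings).
DEFAULT_ENV = {
    "CM_MLPERF_LOADGEN_MODE": "accuracy",
    "CM_MLPERF_LOADGEN_SCENARIO": "Offline",
    "CM_OUTPUT_FOLDER_NAME": "test_results",
    "CM_MLPERF_RUN_STYLE": "test",
    "CM_TEST_QUERY_COUNT": "10",
    "CM_MLPERF_QUANTIZATION": "False",
}


def apply_env_overrides(argv, defaults=DEFAULT_ENV):
    """Return a copy of the defaults with any '--env.KEY=VALUE' overrides applied."""
    env = dict(defaults)  # copy so the defaults stay untouched
    prefix = "--env."
    for arg in argv:
        if arg.startswith(prefix) and "=" in arg:
            key, value = arg[len(prefix):].split("=", 1)
            env[key] = value
    return env


if __name__ == "__main__":
    print(apply_env_overrides(["--env.CM_TEST_QUERY_COUNT=500"]))
```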

Native script being run

No run file exists for Windows


Script output

cmr "app vision language mlcommons mlperf inference generic [variations]" [--input_flags] -j