# app-mlperf-inference-nvidia
Automatically generated README for this automation recipe: app-mlperf-inference-nvidia
Category: Reproduce MLPerf benchmarks
License: Apache 2.0
This script is a CM wrapper around the official Nvidia submission code used for MLPerf inference submissions.
## Download the needed files
- Please ask privately in this Discord channel if you would like access to an Amazon S3 bucket containing all the needed files for convenience. Otherwise, you can download them from the links below.
For x86 machines, please download the latest install tar files from the sites below:

1. cuDNN (for CUDA 11)
2. TensorRT
3. ImageNet validation set (unfortunately not available via a public URL), following the instructions given here
## Using Docker (Recommended on x86 systems)
Assuming all the downloaded files are in the user's home directory, follow these steps:

1. Download CUDA 11.8:
```bash
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
```
2. Give Docker permission to the current user (log out and log back in for the change to take effect):
```bash
sudo usermod -aG docker $USER
```
3. Check that the Nvidia Container Toolkit is installed:
```bash
nvidia-ctk --version
```
4. Check that the Nvidia driver is working properly on the host:
```bash
nvidia-smi
```
5. If CUDA is not yet installed on the host, CM can install a prebuilt CUDA 11.8 package (the `_driver` variation also requests the bundled driver):
```bash
cmr "install cuda prebuilt _driver" --version=11.8.0
```
6. Build the Nvidia Docker container for the MLPerf inference server, passing the paths of the downloaded files:
```bash
cm docker script --tags=build,nvidia,inference,server \
    --cuda_run_file_path=$HOME/cuda_11.8.0_520.61.05_linux.run \
    --tensorrt_tar_file_path=$HOME/TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz \
    --cudnn_tar_file_path=$HOME/cudnn-linux-x86_64-8.9.2.26_cuda11-archive.tar.xz \
    --imagenet_path=$HOME/imagenet-2012-val \
    --scratch_path=$HOME/mlperf_scratch \
    --docker_cm_repo=mlcommons@cm4mlops \
    --results_dir=$HOME/results_dir \
    --submission_dir=$HOME/submission_dir \
    --adr.compiler.tags=gcc
```
During the build you will be prompted for a system ID, for example:
```
============================================
=> A system ID is a string containing only letters, numbers, and underscores
=> that is used as the human-readable name of the system. It is also used as
=> the system name when creating the measurements/ and results/ entries.
=> This string should also start with a letter to be a valid Python enum member name.
=> Specify the system ID to use for the current system: phoenix
=> Reloaded system list. MATCHED_SYSTEM: KnownSystem.phoenix
=> This script will generate Benchmark Configuration stubs for the detected system.
Continue? [y/n]: y
```
## Without Docker
1. Install CUDA. If CUDA is not detected, CM should download and install it automatically when you run the workflow. **Nvidia drivers are expected to be installed on the system.**
2. Install cuDNN:
```bash
cmr "get cudnn" --tar_file=<PATH_TO_CUDNN_TAR_FILE>
```
3. Install TensorRT:
```bash
cmr "get tensorrt _dev" --tar_file=<PATH_TO_TENSORRT_TAR_FILE>
```
As in the Docker flow, you will be prompted for a system ID:
```
============================================
=> A system ID is a string containing only letters, numbers, and underscores
=> that is used as the human-readable name of the system. It is also used as
=> the system name when creating the measurements/ and results/ entries.
=> This string should also start with a letter to be a valid Python enum member name.
=> Specify the system ID to use for the current system: phoenix
=> Reloaded system list. MATCHED_SYSTEM: KnownSystem.phoenix
=> This script will generate Benchmark Configuration stubs for the detected system.
Continue? [y/n]: y
```
## Acknowledgments

- A common CM interface and automation for MLPerf inference benchmarks was developed by Arjun Suresh and Grigori Fursin, sponsored by the cTuning foundation and cKnowledge.org.
- Nvidia's MLPerf inference implementation was developed by Zhihan Jiang, Ethan Cheng, Yiheng Zhang and Jinho Suh.

- CM meta description for this script: `_cm.yaml`
- Output cached? `False`
## Reuse this script in your project

1. Install the MLCommons CM automation meta-framework.
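A minimal sketch of this step, assuming Python 3 with pip is available:

```bash
# Install the MLCommons CM automation framework from PyPI
pip install cmind
```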
2. Pull the CM repository with this automation recipe (CM script):
```bash
cm pull repo mlcommons@cm4mlops
```
3. Print CM help from the command line:
```bash
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia" --help
```
## Run this script

### Run this script via CLI

```bash
cm run script --tags=reproduce,mlcommons,mlperf,inference,harness,nvidia-harness,nvidia[,variations] [--input_flags]
```

### Run this script via CLI (alternative)

```bash
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia [variations]" [--input_flags]
```
### Run this script from Python

```python
import cmind

r = cmind.access({'action':'run',
                  'automation':'script',
                  'tags':'reproduce,mlcommons,mlperf,inference,harness,nvidia-harness,nvidia',
                  'out':'con',
                  ...
                  (other input keys for this script)
                  ...
                 })

if r['return']>0:
    print (r['error'])
```
### Run this script via Docker (beta)

```bash
cm docker script "reproduce mlcommons mlperf inference harness nvidia-harness nvidia[variations]" [--input_flags]
```
## Variations

### No group (any combination of variations can be selected)

* `_run-harness`
* `_v3.1`
  - ENV variables:
    - `CM_MLPERF_INFERENCE_VERSION`: `v3.1`
    - `CM_MLPERF_GPTJ_MODEL_FP8_PATH_SUFFIX`: `GPTJ-07142023.pth`
Group "backend"
Click here to expand this section.
_tensorrt
(default)- ENV variables:
- CM_MLPERF_BACKEND:
tensorrt
- CM_MLPERF_BACKEND_NAME:
TensorRT
- CM_MLPERF_BACKEND:
- ENV variables:
-
Group "batch-size"
Click here to expand this section.
_batch_size.#
- ENV variables:
- CM_MODEL_BATCH_SIZE:
#
- CM_MLPERF_NVIDIA_HARNESS_GPU_BATCH_SIZE:
#
- CM_MODEL_BATCH_SIZE:
- ENV variables:
-
Group "build-engine-options"
Click here to expand this section.
_build_engine_options.#
- ENV variables:
- CM_MLPERF_NVIDIA_HARNESS_EXTRA_BUILD_ENGINE_OPTIONS:
#
- CM_MLPERF_NVIDIA_HARNESS_EXTRA_BUILD_ENGINE_OPTIONS:
- ENV variables:
-
Group "device"
Click here to expand this section.
_cpu
- ENV variables:
- CM_MLPERF_DEVICE:
cpu
- CM_MLPERF_DEVICE:
- ENV variables:
_cuda
(default)- ENV variables:
- CM_MLPERF_DEVICE:
gpu
- CM_MLPERF_DEVICE_LIB_NAMESPEC:
cudart
- CM_MLPERF_DEVICE:
- ENV variables:
-
Group "device-memory"
Click here to expand this section.
_gpu_memory.16
- ENV variables:
- CM_NVIDIA_GPU_MEMORY:
16
- CM_NVIDIA_GPU_MEMORY:
- ENV variables:
_gpu_memory.24
- ENV variables:
- CM_NVIDIA_GPU_MEMORY:
24
- CM_NVIDIA_GPU_MEMORY:
- ENV variables:
_gpu_memory.32
- ENV variables:
- CM_NVIDIA_GPU_MEMORY:
32
- CM_NVIDIA_GPU_MEMORY:
- ENV variables:
_gpu_memory.40
- ENV variables:
- CM_NVIDIA_GPU_MEMORY:
40
- CM_NVIDIA_GPU_MEMORY:
- ENV variables:
_gpu_memory.48
- ENV variables:
- CM_NVIDIA_GPU_MEMORY:
48
- CM_NVIDIA_GPU_MEMORY:
- ENV variables:
_gpu_memory.8
- ENV variables:
- CM_NVIDIA_GPU_MEMORY:
8
- CM_NVIDIA_GPU_MEMORY:
- ENV variables:
_gpu_memory.80
- ENV variables:
- CM_NVIDIA_GPU_MEMORY:
80
- CM_NVIDIA_GPU_MEMORY:
- ENV variables:
-
Group "dla-batch-size"
Click here to expand this section.
_dla_batch_size.#
- ENV variables:
- CM_MLPERF_NVIDIA_HARNESS_DLA_BATCH_SIZE:
#
- CM_MLPERF_SUT_NAME_RUN_CONFIG_SUFFIX2:
dla_batch_size.#
- CM_MLPERF_NVIDIA_HARNESS_DLA_BATCH_SIZE:
- ENV variables:
-
Group "gpu-connection"
Click here to expand this section.
_pcie
_sxm
-
Group "gpu-name"
Click here to expand this section.
_a100
- ENV variables:
- CM_NVIDIA_CUSTOM_GPU:
yes
- CM_NVIDIA_CUSTOM_GPU:
- ENV variables:
_a6000
- ENV variables:
- CM_NVIDIA_CUSTOM_GPU:
yes
- CM_NVIDIA_CUSTOM_GPU:
- ENV variables:
_custom
- ENV variables:
- CM_NVIDIA_CUSTOM_GPU:
yes
- CM_MODEL_BATCH_SIZE: ``
- CM_MLPERF_NVIDIA_HARNESS_GPU_BATCH_SIZE:
<<<CM_MODEL_BATCH_SIZE>>>
- CM_NVIDIA_CUSTOM_GPU:
- ENV variables:
_l4
- ENV variables:
- CM_NVIDIA_CUSTOM_GPU:
yes
- CM_NVIDIA_CUSTOM_GPU:
- ENV variables:
_orin
- ENV variables:
- CM_NVIDIA_CUSTOM_GPU:
yes
- CM_MODEL_BATCH_SIZE: ``
- CM_MLPERF_NVIDIA_HARNESS_GPU_BATCH_SIZE:
<<<CM_MODEL_BATCH_SIZE>>>
- CM_NVIDIA_CUSTOM_GPU:
- ENV variables:
_rtx_4090
- ENV variables:
- CM_NVIDIA_CUSTOM_GPU:
yes
- CM_NVIDIA_CUSTOM_GPU:
- ENV variables:
_rtx_6000_ada
- ENV variables:
- CM_NVIDIA_CUSTOM_GPU:
yes
- CM_NVIDIA_CUSTOM_GPU:
- ENV variables:
_t4
- ENV variables:
- CM_NVIDIA_CUSTOM_GPU:
yes
- CM_NVIDIA_CUSTOM_GPU:
- ENV variables:
-
Group "loadgen-scenario"
Click here to expand this section.
_multistream
- ENV variables:
- CM_MLPERF_LOADGEN_SCENARIO:
MultiStream
- CM_MLPERF_LOADGEN_SCENARIO:
- ENV variables:
_offline
- ENV variables:
- CM_MLPERF_LOADGEN_SCENARIO:
Offline
- CM_MLPERF_LOADGEN_SCENARIO:
- ENV variables:
_server
- ENV variables:
- CM_MLPERF_LOADGEN_SCENARIO:
Server
- CM_MLPERF_LOADGEN_SCENARIO:
- ENV variables:
_singlestream
- ENV variables:
- CM_MLPERF_LOADGEN_SCENARIO:
SingleStream
- CUDA_VISIBLE_DEVICES_NOT_USED:
0
- CM_MLPERF_LOADGEN_SCENARIO:
- ENV variables:
-
Group "model"
Click here to expand this section.
_3d-unet-99
- ENV variables:
- CM_MODEL:
3d-unet-99
- CM_ML_MODEL_STARTING_WEIGHTS_FILENAME:
https://zenodo.org/record/5597155/files/3dunet_kits19_128x128x128.onnx
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
int8
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
int8
- CM_MODEL:
- ENV variables:
_3d-unet-99.9
- ENV variables:
- CM_MODEL:
3d-unet-99.9
- CM_ML_MODEL_STARTING_WEIGHTS_FILENAME:
https://zenodo.org/record/5597155/files/3dunet_kits19_128x128x128.onnx
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
int8
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
int8
- CM_MODEL:
- ENV variables:
_bert-99
- ENV variables:
- CM_MODEL:
bert-99
- CM_NOT_ML_MODEL_STARTING_WEIGHTS_FILENAME:
https://zenodo.org/record/3750364/files/bert_large_v1_1_fake_quant.onnx
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
int32
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
int8
- CM_MODEL:
- ENV variables:
_bert-99.9
- ENV variables:
- CM_MODEL:
bert-99.9
- CM_NOT_ML_MODEL_STARTING_WEIGHTS_FILENAME:
https://zenodo.org/record/3733910/files/model.onnx
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
int32
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
fp16
- CM_MODEL:
- ENV variables:
_dlrm-v2-99
- ENV variables:
- CM_MODEL:
dlrm-v2-99
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
fp32
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
fp16
- CM_MODEL:
- ENV variables:
_dlrm-v2-99.9
- ENV variables:
- CM_MODEL:
dlrm-v2-99.9
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
fp32
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
fp16
- CM_MODEL:
- ENV variables:
_gptj-99
- ENV variables:
- CM_MODEL:
gptj-99
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
int32
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
fp16
- CM_MODEL:
- ENV variables:
_gptj-99.9
- ENV variables:
- CM_MODEL:
gptj-99.9
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
int32
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
fp16
- CM_MODEL:
- ENV variables:
_resnet50
(default)- ENV variables:
- CM_MODEL:
resnet50
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
int8
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
int8
- CM_MODEL:
- ENV variables:
_retinanet
- ENV variables:
- CM_MODEL:
retinanet
- CM_ML_MODEL_STARTING_WEIGHTS_FILENAME:
https://zenodo.org/record/6617981/files/resnext50_32x4d_fpn.pth
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
int8
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
int8
- CM_MODEL:
- ENV variables:
_rnnt
- ENV variables:
- CM_MODEL:
rnnt
- CM_ML_MODEL_STARTING_WEIGHTS_FILENAME:
https://zenodo.org/record/3662521/files/DistributedDataParallel_1576581068.9962234-epoch-100.pt
- CM_ML_MODEL_WEIGHT_TRANSFORMATIONS:
quantization, affine fusion
- CM_ML_MODEL_INPUTS_DATA_TYPE:
fp16
- CM_ML_MODEL_WEIGHTS_DATA_TYPE:
fp16
- CM_MODEL:
- ENV variables:
-
Group "num-gpus"
Click here to expand this section.
_num-gpus.#
- ENV variables:
- CM_NVIDIA_NUM_GPUS:
#
- CM_NVIDIA_NUM_GPUS:
- ENV variables:
_num-gpus.1
(default)- ENV variables:
- CM_NVIDIA_NUM_GPUS:
1
- CM_NVIDIA_NUM_GPUS:
- ENV variables:
-
Group "power-mode"
Click here to expand this section.
_maxn
- ENV variables:
- CM_MLPERF_NVIDIA_HARNESS_MAXN:
True
- CM_MLPERF_NVIDIA_HARNESS_MAXN:
- ENV variables:
_maxq
- ENV variables:
- CM_MLPERF_NVIDIA_HARNESS_MAXQ:
True
- CM_MLPERF_NVIDIA_HARNESS_MAXQ:
- ENV variables:
-
Group "run-mode"
Click here to expand this section.
_build
- ENV variables:
- MLPERF_NVIDIA_RUN_COMMAND:
build
- CM_MLPERF_NVIDIA_HARNESS_RUN_MODE:
build
- MLPERF_NVIDIA_RUN_COMMAND:
- ENV variables:
_build_engine
- Aliases:
_build-engine
- ENV variables:
- MLPERF_NVIDIA_RUN_COMMAND:
generate_engines
- CM_MLPERF_NVIDIA_HARNESS_RUN_MODE:
generate_engines
- MLPERF_NVIDIA_RUN_COMMAND:
- Aliases:
_calibrate
- ENV variables:
- MLPERF_NVIDIA_RUN_COMMAND:
calibrate
- CM_MLPERF_NVIDIA_HARNESS_RUN_MODE:
calibrate
- MLPERF_NVIDIA_RUN_COMMAND:
- ENV variables:
_download_model
- ENV variables:
- MLPERF_NVIDIA_RUN_COMMAND:
download_model
- CM_MLPERF_NVIDIA_HARNESS_RUN_MODE:
download_model
- MLPERF_NVIDIA_RUN_COMMAND:
- ENV variables:
_prebuild
- ENV variables:
- MLPERF_NVIDIA_RUN_COMMAND:
prebuild
- CM_MLPERF_NVIDIA_HARNESS_RUN_MODE:
prebuild
- MLPERF_NVIDIA_RUN_COMMAND:
- ENV variables:
_preprocess_data
- ENV variables:
- MLPERF_NVIDIA_RUN_COMMAND:
preprocess_data
- CM_MLPERF_NVIDIA_HARNESS_RUN_MODE:
preprocess_data
- MLPERF_NVIDIA_RUN_COMMAND:
- ENV variables:
_run_harness
(default)- ENV variables:
- CM_MLPERF_NVIDIA_HARNESS_RUN_MODE:
run_harness
- MLPERF_NVIDIA_RUN_COMMAND:
run_harness
- CM_CALL_MLPERF_RUNNER:
yes
- CM_MLPERF_NVIDIA_HARNESS_RUN_MODE:
- ENV variables:
-
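The run modes above correspond to stages of the Nvidia workflow. A hedged sketch of invoking them one at a time, with variation names taken from this group and the model and scenario groups above:

```bash
# Illustrative staged flow; each command selects one run-mode variation.
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia _prebuild"
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia _download_model _resnet50"
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia _preprocess_data _resnet50"
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia _build_engine _resnet50 _offline"
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia _run_harness _resnet50 _offline"
```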
Group "triton"
Click here to expand this section.
_use_triton
- ENV variables:
- CM_MLPERF_NVIDIA_HARNESS_USE_TRITON:
yes
- CM_MLPERF_SUT_NAME_RUN_CONFIG_SUFFIX3:
using_triton
- CM_MLPERF_NVIDIA_HARNESS_USE_TRITON:
- ENV variables:
-
Group "version"
Click here to expand this section.
_v4.0
(default)- ENV variables:
- CM_MLPERF_INFERENCE_VERSION:
v4.0
- CM_MLPERF_GPTJ_MODEL_FP8_PATH_SUFFIX:
GPTJ-FP8-quantized
- CM_MLPERF_INFERENCE_VERSION:
- ENV variables:
Default variations
_cuda,_num-gpus.1,_resnet50,_run_harness,_tensorrt,_v4.0
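Non-default variations are appended to the tag string. A hedged example combining the model, loadgen-scenario, and batch-size groups above:

```bash
# Run bert-99 in the Server scenario with a GPU batch size of 64
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia _bert-99 _server _batch_size.64"
```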
## Script flags mapped to environment

* `--audio_buffer_num_lines=value` → `CM_MLPERF_NVIDIA_HARNESS_AUDIO_BUFFER_NUM_LINES=value`
* `--count=value` → `CM_MLPERF_LOADGEN_QUERY_COUNT=value`
* `--deque_timeout_usec=value` → `CM_MLPERF_NVIDIA_HARNESS_DEQUE_TIMEOUT_USEC=value`
* `--devices=value` → `CM_MLPERF_NVIDIA_HARNESS_DEVICES=value`
* `--dla_batch_size=value` → `CM_MLPERF_NVIDIA_HARNESS_DLA_BATCH_SIZE=value`
* `--dla_copy_streams=value` → `CM_MLPERF_NVIDIA_HARNESS_DLA_COPY_STREAMS=value`
* `--dla_inference_streams=value` → `CM_MLPERF_NVIDIA_HARNESS_DLA_INFERENCE_STREAMS=value`
* `--embedding_weights_on_gpu_part=value` → `CM_MLPERF_NVIDIA_HARNESS_EMBEDDING_WEIGHTS_ON_GPU_PART=value`
* `--enable_sort=value` → `CM_MLPERF_NVIDIA_HARNESS_ENABLE_SORT=value`
* `--end_on_device=value` → `CM_MLPERF_NVIDIA_HARNESS_END_ON_DEVICE=value`
* `--extra_run_options=value` → `CM_MLPERF_NVIDIA_HARNESS_EXTRA_RUN_OPTIONS=value`
* `--gpu_batch_size=value` → `CM_MLPERF_NVIDIA_HARNESS_GPU_BATCH_SIZE=value`
* `--gpu_copy_streams=value` → `CM_MLPERF_NVIDIA_HARNESS_GPU_COPY_STREAMS=value`
* `--gpu_inference_streams=value` → `CM_MLPERF_NVIDIA_HARNESS_GPU_INFERENCE_STREAMS=value`
* `--graphs_max_seqlen=value` → `CM_MLPERF_NVIDIA_HARNESS_GRAPHS_MAX_SEQLEN=value`
* `--input_format=value` → `CM_MLPERF_NVIDIA_HARNESS_INPUT_FORMAT=value`
* `--log_dir=value` → `CM_MLPERF_NVIDIA_HARNESS_LOG_DIR=value`
* `--make_cmd=value` → `MLPERF_NVIDIA_RUN_COMMAND=value`
* `--max_batchsize=value` → `CM_MLPERF_LOADGEN_MAX_BATCHSIZE=value`
* `--max_dlas=value` → `CM_MLPERF_NVIDIA_HARNESS_MAX_DLAS=value`
* `--mlperf_conf=value` → `CM_MLPERF_CONF=value`
* `--mode=value` → `CM_MLPERF_LOADGEN_MODE=value`
* `--multistream_target_latency=value` → `CM_MLPERF_LOADGEN_MULTISTREAM_TARGET_LATENCY=value`
* `--num_issue_query_threads=value` → `CM_MLPERF_NVIDIA_HARNESS_NUM_ISSUE_QUERY_THREADS=value`
* `--num_sort_segments=value` → `CM_MLPERF_NVIDIA_HARNESS_NUM_SORT_SEGMENTS=value`
* `--num_warmups=value` → `CM_MLPERF_NVIDIA_HARNESS_NUM_WARMUPS=value`
* `--offline_target_qps=value` → `CM_MLPERF_LOADGEN_OFFLINE_TARGET_QPS=value`
* `--output_dir=value` → `CM_MLPERF_OUTPUT_DIR=value`
* `--performance_sample_count=value` → `CM_MLPERF_LOADGEN_PERFORMANCE_SAMPLE_COUNT=value`
* `--power_setting=value` → `CM_MLPERF_NVIDIA_HARNESS_POWER_SETTING=value`
* `--rerun=value` → `CM_RERUN=value`
* `--run_infer_on_copy_streams=value` → `CM_MLPERF_NVIDIA_HARNESS_RUN_INFER_ON_COPY_STREAMS=value`
* `--scenario=value` → `CM_MLPERF_LOADGEN_SCENARIO=value`
* `--server_target_qps=value` → `CM_MLPERF_LOADGEN_SERVER_TARGET_QPS=value`
* `--singlestream_target_latency=value` → `CM_MLPERF_LOADGEN_SINGLESTREAM_TARGET_LATENCY=value`
* `--skip_postprocess=value` → `CM_MLPERF_NVIDIA_HARNESS_SKIP_POSTPROCESS=value`
* `--skip_preprocess=value` → `CM_SKIP_PREPROCESS_DATASET=value`
* `--skip_preprocessing=value` → `CM_SKIP_PREPROCESS_DATASET=value`
* `--soft_drop=value` → `CM_MLPERF_NVIDIA_HARNESS_SOFT_DROP=value`
* `--start_from_device=value` → `CM_MLPERF_NVIDIA_HARNESS_START_FROM_DEVICE=value`
* `--target_latency=value` → `CM_MLPERF_LOADGEN_TARGET_LATENCY=value`
* `--target_qps=value` → `CM_MLPERF_LOADGEN_TARGET_QPS=value`
* `--use_cuda_thread_per_device=value` → `CM_MLPERF_NVIDIA_HARNESS_USE_CUDA_THREAD_PER_DEVICE=value`
* `--use_deque_limit=value` → `CM_MLPERF_NVIDIA_HARNESS_USE_DEQUE_LIMIT=value`
* `--use_fp8=value` → `CM_MLPERF_NVIDIA_HARNESS_USE_FP8=value`
* `--use_graphs=value` → `CM_MLPERF_NVIDIA_HARNESS_USE_GRAPHS=value`
* `--use_small_tile_gemm_plugin=value` → `CM_MLPERF_NVIDIA_HARNESS_USE_SMALL_TILE_GEMM_PLUGIN=value`
* `--use_triton=value` → `CM_MLPERF_NVIDIA_HARNESS_USE_TRITON=value`
* `--user_conf=value` → `CM_MLPERF_USER_CONF=value`
* `--workspace_size=value` → `CM_MLPERF_NVIDIA_HARNESS_WORKSPACE_SIZE=value`
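A hedged example using several of the flags above (the values are illustrative only; each flag maps to the environment key shown in the list):

```bash
# Offline performance run with explicit batch size, copy streams and target QPS
cm run script --tags=reproduce,mlcommons,mlperf,inference,harness,nvidia-harness,nvidia \
    --scenario=Offline \
    --mode=performance \
    --gpu_batch_size=256 \
    --gpu_copy_streams=2 \
    --offline_target_qps=40000
```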
## Default environment

These keys can be updated via `--env.KEY=VALUE`, an `env` dictionary in `@input.json`, or the script flags above; see the example after this list.

* `CM_BATCH_COUNT`: `1`
* `CM_BATCH_SIZE`: `1`
* `CM_FAST_COMPILATION`: `yes`
* `CM_MLPERF_LOADGEN_SCENARIO`: `Offline`
* `CM_MLPERF_LOADGEN_MODE`: `performance`
* `CM_SKIP_PREPROCESS_DATASET`: `no`
* `CM_SKIP_MODEL_DOWNLOAD`: `no`
* `CM_MLPERF_SUT_NAME_IMPLEMENTATION_PREFIX`: `nvidia_original`
* `CM_MLPERF_SKIP_RUN`: `no`
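A hedged example of overriding two of these defaults from the command line:

```bash
# --env.KEY=VALUE updates the default environment keys listed above
cm run script --tags=reproduce,mlcommons,mlperf,inference,harness,nvidia-harness,nvidia \
    --env.CM_MLPERF_LOADGEN_SCENARIO=Server \
    --env.CM_MLPERF_LOADGEN_MODE=accuracy
```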
## Native script being run

No run file exists for Windows.
## Script output

```bash
cmr "reproduce mlcommons mlperf inference harness nvidia-harness nvidia [variations]" [--input_flags] -j
```