CM "script" automation specification
Please check the CM documentation for more details about the CM automation language.
See the automatically generated catalog of all CM scripts from MLCommons.
Understanding CM scripts
- A CM script is identified by a set of tags and by a unique ID.
- Further, each CM script can have multiple variations, which are identified by variation tags. Variation tags are treated in the same way as tags but carry an underscore (_) prefix.
CM script execution flow
- When a CM script is invoked (either by tags or by unique ID), its _cm.json is processed first to check for any deps scripts; if there are any, they are executed in order.
- Once all the deps scripts are executed, the customize.py file is checked and, if it exists, the preprocess function inside it is executed (if present).
- Then any prehook_deps CM scripts mentioned in _cm.json are executed, similarly to deps.
- After this, the keys in the env dictionary are exported as ENV variables, and the run file, if it exists, is executed.
- Once the run file execution is done, any posthook_deps CM scripts mentioned in _cm.json are executed, similarly to deps.
- Then the postprocess function inside customize.py is executed, if present.
- After this stage, any post_deps CM scripts mentioned in _cm.json are executed.
Note: If a script is already cached, the preprocess, run file and postprocess executions do not happen, and only the dependencies marked as dynamic are executed from deps, prehook_deps, posthook_deps and post_deps.
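The ordering above, including the cached case, can be sketched as a small simulation. This is a hypothetical model, not CM's implementation: meta is a plain dict shaped like _cm.json, customize is a dict of callables standing in for functions in customize.py, and execute_script only records the steps it would take.

```python
# Hypothetical model: meta mimics _cm.json, and customize is a dict of
# callables standing in for functions defined in customize.py.
def execute_script(meta, customize=None, run_file=None, cached=False):
    """Return the ordered list of steps a CM-like script would perform."""
    customize = customize or {}
    steps = []

    def run_deps(kind):
        for dep in meta.get(kind, []):
            # For a cached script, only dependencies marked dynamic still run.
            if cached and not dep.get("dynamic", False):
                continue
            steps.append(f"{kind}: {dep['tags']}")

    run_deps("deps")
    if not cached and "preprocess" in customize:
        customize["preprocess"]()
        steps.append("preprocess")
    run_deps("prehook_deps")
    if not cached and run_file is not None:
        # A real runner would export env as environment variables here.
        steps.append(f"export env, run {run_file}")
    run_deps("posthook_deps")
    if not cached and "postprocess" in customize:
        customize["postprocess"]()
        steps.append("postprocess")
    run_deps("post_deps")
    return steps


# Hypothetical dependency tags, for illustration only.
meta = {
    "deps": [{"tags": "detect,os"}],
    "prehook_deps": [{"tags": "get,python3", "dynamic": True}],
    "post_deps": [{"tags": "print,summary"}],
}
print(execute_script(meta, {"preprocess": lambda: None}, run_file="run.sh"))
print(execute_script(meta, {"preprocess": lambda: None}, run_file="run.sh", cached=True))
```

In the second call only the dependency marked dynamic is still recorded, mirroring the note on cached scripts.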
Input flags
When we run a CM script, we can also pass inputs to it, and any input listed in the input_mapping dictionary inside _cm.json is converted to the corresponding ENV variable.
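A minimal sketch of the input_mapping conversion, assuming inputs arrive as a plain dict and input_mapping maps input names to env keys. The model input and CM_MODEL key are hypothetical examples, and unmapped inputs are simply ignored by this helper.

```python
def inputs_to_env(inputs, input_mapping, env=None):
    """Copy each input listed in input_mapping into env under its mapped name."""
    env = dict(env or {})
    for name, value in inputs.items():
        if name in input_mapping:
            env[input_mapping[name]] = value
    # Inputs without an entry in input_mapping are not turned into env keys here.
    return env


# Hypothetical input_mapping entry, as it might appear in _cm.json.
input_mapping = {"model": "CM_MODEL"}
print(inputs_to_env({"model": "resnet50", "quiet": True}, input_mapping))
```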
Conditional execution of deps and post_deps
We can use the skip_if_env dictionary inside any deps, prehook_deps, posthook_deps or post_deps entry to make its execution conditional.
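A sketch of how a runner might evaluate skip_if_env. The matching rule is an assumption: each key maps to a list of accepted values, and the dependency is skipped only when every listed key is present in env with one of those values. The get,cuda tags and CM_DEVICE key are hypothetical.

```python
def should_skip(dep, env):
    """Return True when every skip_if_env condition of dep matches env."""
    conditions = dep.get("skip_if_env")
    if not conditions:
        return False
    # Assumption: each key maps to a list of values, and the dependency is
    # skipped only when all listed keys are present with a matching value.
    return all(key in env and env[key] in values for key, values in conditions.items())


dep = {"tags": "get,cuda", "skip_if_env": {"CM_DEVICE": ["cpu"]}}  # hypothetical
print(should_skip(dep, {"CM_DEVICE": "cpu"}))
print(should_skip(dep, {"CM_DEVICE": "cuda"}))
```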
Versions
We can request a specific version of a script using version. version_min and version_max are also possible options.
- When version_min is given, any version above it that is present in the cache or detected on the system can be chosen. If nothing is detected, default_version (if present and above version_min) will be used for installation. Otherwise, version_min will be used as version.
- When version_max is given, any version below it that is present in the cache or detected on the system can be chosen. If nothing is detected, default_version (if present and below version_max) will be used for installation. Otherwise, version_max_usable (an additional input required with version_max) will be used as version.
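The selection rules above can be sketched as follows. Several details are assumptions rather than documented behaviour: versions are plain dotted integers, the bounds are treated as inclusive, the newest matching candidate is picked when several qualify, and the version_min fallback is checked before the version_max one when both are given.

```python
def parse_version(v):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def select_version(available, version_min=None, version_max=None,
                   default_version=None, version_max_usable=None):
    """Return (version, action) following the version_min/version_max rules."""
    def in_range(v):
        # Assumption: bounds are inclusive.
        if version_min and parse_version(v) < parse_version(version_min):
            return False
        if version_max and parse_version(v) > parse_version(version_max):
            return False
        return True

    # 1. Reuse a cached or detected version in range (newest first is an
    #    assumption; the rules only say that any such version may be chosen).
    candidates = [v for v in available if in_range(v)]
    if candidates:
        return max(candidates, key=parse_version), "reuse"
    # 2. Fall back to default_version if it satisfies the bounds.
    if default_version and in_range(default_version):
        return default_version, "install"
    # 3. Otherwise install version_min, or version_max_usable for version_max.
    if version_min:
        return version_min, "install"
    if version_max and version_max_usable:
        return version_max_usable, "install"
    return None, "unresolved"


print(select_version(["1.2.0", "1.5.3"], version_min="1.3"))
print(select_version([], version_max="3.0", default_version="3.1", version_max_usable="2.9"))
```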
Variations
- Variations are used to customize a CM script, and each unique combination of variations uses a unique cache entry. Each variation can turn on env keys as well as any other meta, including dependencies specific to it. Variations are turned on like tags but with an underscore (_) prefix. For example, if a script has the tags "get,myscript", then to call the variation "test" inside it, we have to use the tags "get,myscript,_test".
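A minimal sketch of how the tag string from this example could be split into script tags and variations, and how the selected variations might turn on env keys. The variations_meta layout and the CM_MYSCRIPT_TEST key are hypothetical; cache_key only illustrates that each unique combination, regardless of tag order, maps to one cache entry.

```python
def split_tags(tags):
    """Split a CM-style tag string into script tags and variation names."""
    script_tags, variations = [], []
    for tag in (t.strip() for t in tags.split(",")):
        if not tag:
            continue
        if tag.startswith("_"):
            variations.append(tag[1:])  # drop the _ prefix
        else:
            script_tags.append(tag)
    return script_tags, variations


def variation_env(variations, variations_meta):
    """Merge the env keys turned on by each selected variation."""
    env = {}
    for name in variations:
        env.update(variations_meta[name].get("env", {}))
    return env


def cache_key(variations):
    """One cache entry per unique combination, independent of tag order."""
    return tuple(sorted(variations))


variations_meta = {"test": {"env": {"CM_MYSCRIPT_TEST": "yes"}}}  # hypothetical
tags, variations = split_tags("get,myscript,_test")
print(tags, variations, variation_env(variations, variations_meta))
```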
Variation groups
group is a key that maps variations into a group, and only one variation from a group can be used in the variation tags at any time. For example, cpu and cuda can be two variations under the device group; a user can use either cpu or cuda as a variation tag, but not both.
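A sketch of the one-variation-per-group check, assuming variations_meta maps each variation name to a dict with an optional group key. The device and precision groups here are hypothetical examples.

```python
def check_groups(selected, variations_meta):
    """Return {group: variation}, raising if a group is used more than once."""
    seen = {}
    for name in selected:
        group = variations_meta.get(name, {}).get("group")
        if group is None:
            continue  # variations without a group are unrestricted
        if group in seen:
            raise ValueError(
                f"variations '{seen[group]}' and '{name}' both belong to group '{group}'"
            )
        seen[group] = name
    return seen


# Hypothetical variation meta.
variations_meta = {
    "cpu": {"group": "device"},
    "cuda": {"group": "device"},
    "fp32": {"group": "precision"},
}
print(check_groups(["cpu", "fp32"], variations_meta))
```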
Dynamic variations
Sometimes it is impractical to add every variation a script needs, for example batch_size, which can take many different values. To handle this case, we support dynamic variations using '#', where '#' can be dynamically replaced by any string. For example, "_batch_size.8" can be used as a tag to turn on the dynamic variation "_batch_size.#".
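A sketch of resolving a dynamic variation such as "batch_size.8" (the tag "_batch_size.8" with its prefix removed) against a meta entry named "batch_size.#". Substituting the dynamic part into the variation's env values, and the CM_BATCH_SIZE key, are assumptions made for illustration.

```python
def resolve_variation(name, variations_meta):
    """Map a variation like 'batch_size.8' to its meta key and expanded env."""
    if name in variations_meta:  # a regular, statically declared variation
        return name, dict(variations_meta[name].get("env", {}))
    for key, meta in variations_meta.items():
        if not key.endswith(".#"):
            continue
        prefix = key[:-1]  # "batch_size.#" -> "batch_size."
        if name.startswith(prefix) and len(name) > len(prefix):
            value = name[len(prefix):]
            # Assumption: '#' inside the variation's env values is replaced
            # by the dynamic part of the tag.
            env = {k: str(v).replace("#", value) for k, v in meta.get("env", {}).items()}
            return key, env
    raise KeyError(f"unknown variation: {name}")


variations_meta = {"batch_size.#": {"env": {"CM_BATCH_SIZE": "#"}}}  # hypothetical
print(resolve_variation("batch_size.8", variations_meta))
```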
ENV flow during CM script execution
- During a given script execution, the incoming env dictionary is saved (saved_env) and all updates happen on a copy of it.
- Once a script execution is over (including all the dependent script executions), newly created keys and any updated keys are merged into saved_env, provided the keys are mentioned in new_env_keys.
- The same behaviour applies to the state dictionary.
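A minimal sketch of the save, copy and merge flow. The body callable stands in for the script and its dependencies; treating new_env_keys entries as shell-style patterns (via fnmatch) is an assumption, and deleted keys are not handled.

```python
import copy
from fnmatch import fnmatch


def run_with_env_scope(env, new_env_keys, body):
    """Run body on a copy of env; merge back only keys allowed by new_env_keys."""
    saved_env = copy.deepcopy(env)  # the incoming env, kept untouched
    working = copy.deepcopy(env)    # all updates happen on this copy
    body(working)                   # the script and its dependencies run here
    for key, value in working.items():
        changed = key not in saved_env or saved_env[key] != value
        # Assumption: entries may be exact names or shell-style patterns.
        if changed and any(fnmatch(key, pattern) for pattern in new_env_keys):
            saved_env[key] = value
    return saved_env


def body(env):
    # Hypothetical updates made while the script runs.
    env["CM_MODEL"] = "resnet50"
    env["CM_TMP_SCRATCH"] = "/tmp/x"
    env["CM_BATCH_SIZE"] = "8"


result = run_with_env_scope({"CM_BATCH_SIZE": "1"}, ["CM_MODEL", "CM_BATCH_*"], body)
print(result)
```

Because state is also a dictionary, the same helper could be reused for it.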
Special env keys
- Any env keys with the prefix CM_TMP_* or CM_GIT_* are not passed by default to any dependency. They can be force-passed by adding the key(s) to the force_env_keys list of the concerned dependency.
- Similarly, we can prevent any env key from being passed to a given dependency by adding the prefix of the key to the clean_env_keys list of the concerned dependency.
- --input is automatically converted to the CM_INPUT env key, version to CM_VERSION, version_min to CM_VERSION_MIN and version_max to CM_VERSION_MAX.
- If env['CM_GH_TOKEN']=TOKEN_VALUE is set, then git URLs (specified by CM_GIT_URL) are changed to include this token.
- If env['CM_GIT_SSH']=yes, then git URLs are changed from HTTPS to SSH.
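A sketch of the env filtering a dependency might receive under the first two rules. force_env_keys is treated as exact key names and clean_env_keys as key prefixes, as described above; letting clean_env_keys win when a key appears in both is an assumption, as are the example keys.

```python
HIDDEN_PREFIXES = ("CM_TMP_", "CM_GIT_")  # not passed to deps by default


def env_for_dependency(env, dep):
    """Build the env a dependency would receive from its parent's env."""
    force = set(dep.get("force_env_keys", []))
    clean = tuple(dep.get("clean_env_keys", []))
    passed = {}
    for key, value in env.items():
        if key.startswith(HIDDEN_PREFIXES) and key not in force:
            continue  # CM_TMP_* / CM_GIT_* keys are dropped unless forced
        if key.startswith(clean):
            continue  # assumption: clean_env_keys wins over force_env_keys
        passed[key] = value
    return passed


# Hypothetical parent env and dependency meta.
env = {"CM_TMP_X": "1", "CM_GIT_URL": "https://example.com/repo.git",
       "CM_MODEL": "resnet50", "CM_FOO_BAR": "b"}
dep = {"force_env_keys": ["CM_GIT_URL"], "clean_env_keys": ["CM_FOO_"]}
print(env_for_dependency(env, dep))
```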
Script Meta
Special keys in script meta
- TBD: reuse_version, inherit_variation_tags, update_env_tags_from_env
How does caching work?
- If cache=true is set in a script meta, the result of the script execution is cached for further use.
- For a cached script, env and state updates are done using the new_env and new_state dictionaries, which are stored in the cm-cached.json file inside the cached folder.
- By using the --new input, a new cache entry can be forced even when an old one exists.
- By default, no dependencies are run for a cached entry unless the dynamic key is set for them.
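A sketch of the caching behaviour, assuming a hypothetical directory layout: one folder per script and variation combination, one numbered subfolder per cache entry, each holding a cm-cached.json whose new_env and new_state fields are replayed on reuse. The layout, hashing scheme and JSON structure are illustrative assumptions, not CM's on-disk format.

```python
import hashlib
import json
import os
import tempfile


def _entry_root(cache_root, script_uid, variations):
    """Directory holding all cache entries for one script + variation combo."""
    key = script_uid + "|" + ",".join(sorted(variations))
    return os.path.join(cache_root, hashlib.sha256(key.encode()).hexdigest()[:16])


def run_cached(cache_root, script_uid, variations, env, state, execute, new=False):
    """Reuse a cm-cached.json entry unless new=True; otherwise run and store one."""
    root = _entry_root(cache_root, script_uid, variations)
    entries = sorted(os.listdir(root)) if os.path.isdir(root) else []
    if entries and not new:
        # Apply the stored new_env/new_state instead of re-running the script.
        with open(os.path.join(root, entries[-1], "cm-cached.json")) as f:
            saved = json.load(f)
        env.update(saved["new_env"])
        state.update(saved["new_state"])
        return "cached"
    new_env, new_state = execute()
    entry = os.path.join(root, f"{len(entries):04d}")  # next entry index
    os.makedirs(entry)
    with open(os.path.join(entry, "cm-cached.json"), "w") as f:
        json.dump({"new_env": new_env, "new_state": new_state}, f)
    env.update(new_env)
    state.update(new_state)
    return "executed"


calls = []


def fake_execute():
    # Hypothetical script result.
    calls.append("run")
    return {"CM_TOOL_PATH": "/opt/tool"}, {"tool": {"version": "1.0"}}


with tempfile.TemporaryDirectory() as tmp:
    env, state = {}, {}
    first = run_cached(tmp, "abc123", ["test"], env, state, fake_execute)
    second = run_cached(tmp, "abc123", ["test"], {}, {}, fake_execute)
    forced = run_cached(tmp, "abc123", ["test"], {}, {}, fake_execute, new=True)
    n_entries = len(os.listdir(_entry_root(tmp, "abc123", ["test"])))
print(first, second, forced, n_entries, len(calls))
```

The second call reuses the stored entry without running the script, while new=True (standing in for --new) creates a second entry alongside the first.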
Please see here for trying CM scripts.
© 2022-24 MLCommons