Commandline Help (Options)
Flags | Argument | Description |
---|---|---|
-accept, --accept | Accept ALL confirmations and silence prompts | |
-ap, --acid-partition-count | Set the limit of partitions that the ACID strategy will work with. '-1' means no-limit. | |
-asm, --avro-schema-migration | Migrate AVRO Schema Files referenced in TBLPROPERTIES by 'avro.schema.url'. Without migration it is expected that the file will exist on the other cluster and match the 'url' defined in the schema DDL. If it's not present, schema creation will FAIL. Specifying this option REQUIRES the LEFT and RIGHT cluster to be LINKED. See docs: https://github.com/cloudera-labs/hms-mirror#linking-clusters-storage-layers | |
-at, --auto-tune | Auto-tune Session Settings for SELECT's and DISTRIBUTION for Partition INSERT's. | |
-c, --concurrency | Set application concurrency, default is 10 | |
-cfg, --config | Config with details for the HMS-Mirror. Default: $HOME/.hms-mirror/cfg/default.yaml | |
-cine, --create-if-not-exist | CREATE table/partition statements will be adjusted to include 'IF NOT EXISTS'. This will ensure all remaining sql statements will be run. This can be used to sync partition definitions for existing tables. | |
-cs, --common-storage | Common Storage used with Data Strategy HYBRID, SQL, EXPORT_IMPORT. This will change the way these methods are implemented by using the specified storage location as an 'common' storage point between two clusters. In this case, the cluster do NOT need to be 'linked'. Each cluster DOES need to have access to the location and authorization to interact with the location. This may mean additional configuration requirements for 'hdfs' to ensure this seamless access. | |
-cto, --compress-text-output | Data movement (SQL/STORAGE_MIGRATION) of TEXT based file formats will be compressed in the new table. | |
-d, --data-strategy | Specify how the data will follow the schema. [DUMP, SCHEMA_ONLY, LINKED, SQL, EXPORT_IMPORT, HYBRID, CONVERT_LINKED, STORAGE_MIGRATION, COMMON, ICEBERG_CONVERSION] | |
-da, --downgrade-acid | Downgrade ACID tables to EXTERNAL tables with purge. | |
-db, --database | Comma separated list of Databases (upto 100). | |
-dbo, --database-only | Migrate the Database definitions as they exist from LEFT to RIGHT | |
-dbp, --db-prefix | Optional: A prefix to add to the RIGHT cluster DB Name. Usually used for testing. | |
-dbr, --db-rename | Optional: Rename target db to ... This option is only valid when '1' database is listed in | |
-dbRegEx, --database-regex | RegEx of Database to include in process. | |
-dbsp, --database-skip-properties | Comma separated list of database properties (regex) to skip during the migration process. This will prevent the property from being set on the target cluster. | |
-dc, --distcp | Build the 'distcp' workplans. Optional argument (PULL, PUSH) to define which cluster is running the distcp commands. Default is PULL. | |
-dms, --data-movement-strategy | Specify how the data will follow the schema. [SQL, DISTCP, MANUAL] | |
-dp, --decrypt-password | Used this in conjunction with '-pkey' to decrypt the generated passcode from | |
-ds, --dump-source | Specify which 'cluster' is the source for the DUMP strategy (LEFT\|RIGHT). | |
-dtd, --dump-test-data | Used to dump a data set that can be feed into the process for testing. | |
-e, --execute | Execute actions request, without this flag the process is a dry-run. | |
-ep, --export-partition-count | Set the limit of partitions that the EXPORT_IMPORT strategy will work with. | |
-epl, --evaluate-partition-location | For SCHEMA_ONLY and DUMP data-strategies, review the partition locations and build partition metadata calls to create them is they can't be located via 'MSCK'. | |
-ewd, --external-warehouse-directory | The external warehouse directory path. Should not include the namespace OR the database directory. This will be used to set the LOCATION database option. | |
-f, --flip | Flip the definitions for LEFT and RIGHT. Allows the same config to be used in reverse. | |
-fel, --force-external-location | Under some conditions, the LOCATION element for EXTERNAL tables is removed (ie: -rdl). In which case we rely on the settings of the database definition to control the EXTERNAL table data location. But for some older Hive versions, the LOCATION element in the database is NOT honored. Even when the database LOCATION is set, the EXTERNAL table LOCATION defaults to the system wide warehouse settings. This flag will ensure the LOCATION element remains in the CREATE definition of the table to force it's location. | |
-glm, --global-location-map | <key=value> | Comma separated key=value pairs of Locations to Map. IE: /myorig/data/finance=/data/ec/finance. This reviews 'EXTERNAL' table locations for the path '/myorig/data/finance' and replaces it with '/data/ec/finance'. Option can be used alone or with -rdl. Only applies to 'EXTERNAL' tables and if the tables location doesn't contain one of the supplied maps, it will be translated according to -rdl rules if -rdl is specified. If -rdl is not specified, the conversion for that table is skipped. |
-h, --help | Help | |
-ip, --in-place | Downgrade ACID tables to EXTERNAL tables with purge. | |
-is, --intermediate-storage | Intermediate Storage used with Data Strategy HYBRID, SQL, EXPORT_IMPORT. This will change the way these methods are implemented by using the specified storage location as an intermediate transfer point between two clusters. In this case, the cluster do NOT need to be 'linked'. Each cluster DOES need to have access to the location and authorization to interact with the location. This may mean additional configuration requirements for 'hdfs' to ensure this seamless access. | |
-itpo, --iceberg-table-property-overrides | <key=value> | Comma separated key=value pairs of Iceberg Table Properties to set/override. |
-iv, --iceberg-version | Specify the Iceberg Version to use. Specify 1 or 2. Default is 2. | |
-ltd, --load-test-data | Use the data saved by the | |
-ma, --migrate-acid | <bucket-threshold (2)> | Migrate ACID tables (if strategy allows). Optional: ArtificialBucketThreshold count that will remove the bucket definition if it's below this. Use this as a way to remove artificial bucket definitions that were added 'artificially' in legacy Hive. (default: 2) |
-mao, --migrate-acid-only | <bucket-threshold (2)> | Migrate ACID tables ONLY (if strategy allows). Optional: ArtificialBucketThreshold count that will remove the bucket definition if it's below this. Use this as a way to remove artificial bucket definitions that were added 'artificially' in legacy Hive. (default: 2) |
-mnn, --migrate-non-native | Migrate Non-Native tables (if strategy allows). These include table definitions that rely on external connections to systems like: HBase, Kafka, JDBC | |
-mnno, --migrate-non-native-only | Migrate Non-Native tables (if strategy allows). These include table definitions that rely on external connections to systems like: HBase, Kafka, JDBC | |
-np, --no-purge | For SCHEMA_ONLY, COMMON, and LINKED data strategies set RIGHT table to NOT purge on DROP | |
-o, --output-dir | Output Directory (default: $HOME/.hms-mirror/reports/<yyyy-MM-dd_HH-mm-ss>) | |
-p, --password | Used this in conjunction with '-pkey' to generate the encrypted password that you'll add to the configs for the JDBC connections. | |
-pkey, --password-key | The key used to encrypt / decrypt the cluster jdbc passwords. If not present, the passwords will be processed as is (clear text) from the config file. | |
-po, --property-overrides | <key=value> | Comma separated key=value pairs of Hive properties you wish to set/override. |
-pol, --property-overrides-left | <key=value> | Comma separated key=value pairs of Hive properties you wish to set/override for LEFT cluster. |
-por, --property-overrides-right | <key=value> | Comma separated key=value pairs of Hive properties you wish to set/override for RIGHT cluster. |
-pt, --pass-through | <key=value> | Key=value property to pass-through to the configuration. This will allow you to set properties that are not part of the HMS-Mirror configuration or part of spring's configuration. |
-q, --quiet | Reduce screen reporting output. Good for background processes with output redirects to a file | |
-rdl, --reset-to-default-location | Strip 'LOCATION' from all target cluster definitions. This will allow the system defaults to take over and define the location of the new datasets. | |
-replay, --replay | Use to replay process from the report output. | |
-rid, --right-is-disconnected | Don't attempt to connect to the 'right' cluster and run in this mode | |
-ro, --read-only | For SCHEMA_ONLY, COMMON, and LINKED data strategies set RIGHT table to NOT purge on DROP. Intended for use with replication distcp strategies and has restrictions about existing DB's on RIGHT and PATH elements. To simply NOT set the purge flag for applicable tables, use -np. | |
-rr, --reset-right | Use this for testing to remove the database on the RIGHT using CASCADE. | |
-s, --sync | For SCHEMA_ONLY, COMMON, and LINKED data strategies. Drop and Recreate Schema's when different. Best to use with RO to ensure table/partition drops don't delete data. When used WITHOUT | |
-scw, --suppress-cli-warnings | Suppress CLI Warnings from the final on-screen report. | |
-sdpi, --sort-dynamic-partition-inserts | Used to set | |
-sf, --skip-features | Skip Features evaluation. | |
-slc, --skip-link-check | Skip Link Check. Use when going between or to Cloud Storage to avoid having to configure hms-mirror with storage credentials and libraries. This does NOT preclude your Hive Server 2 and compute environment from such requirements. | |
-slt, --skip-legacy-translation | Skip Schema Upgrades and Serde Translations | |
-smn, --storage-migration-namespace | Optional: Used with the 'data strategy STORAGE_MIGRATION to specify the target namespace. | |
-sms, --storage-migration-strict | Use 'strict' location translations for storage migration. | |
-so, --skip-optimizations | Skip any optimizations during data movement, like dynamic sorting or distribute by | |
-sp, --sql-partition-count | Set the limit of partitions that the SQL strategy will work with. '-1' means no-limit. | |
-sql, --sql-output | . This option is no longer required to get SQL out in a report. That is the default behavior. | |
-ssc, --skip-stats-collection | Skip collecting basic FS stats for a table. This WILL affect the optimizer and our ability to determine the best strategy for moving data. | |
-su, --setup | Setup a default configuration file through a series of questions | |
-swt, --save-working-tables | Save working tables (shadow tables) created during the migration process. | |
-tef, --table-exclude-filter | Filter tables (excludes) with name matching RegEx. Comparison done with 'show tables' results. Check case, that's important. Hive tables are generally stored in LOWERCASE. Make sure you double-quote the expression on the commandline. | |
-tf, --table-filter | Filter tables (inclusive) with name matching RegEx. Comparison done with 'show tables' results. Check case, that's important. Hive tables are generally stored in LOWERCASE. Make sure you double-quote the expression on the commandline. | |
-tfp, --table-filter-partition-count-limit | Filter partition tables OUT that are have more than specified here. Non Partitioned table aren't filtered. | |
-tfs, --table-filter-size-limit | Filter tables OUT that are above the indicated size. Expressed in MB | |
-to, --transfer-ownership | If available (supported) on LEFT cluster, extract and transfer the tables owner to the RIGHT cluster. Note: This will make an 'exta' SQL call on the LEFT cluster to determine the ownership. This won't be supported on CDH 5 and some other legacy Hive platforms. Beware the cost of this extra call for EVERY table, as it may slow down the process for a large volume of tables. | |
-todb, --transfer-ownership-database | If available (supported) on LEFT cluster, extract and transfer the DB owner to the RIGHT cluster. Note: This will make an 'exta' SQL call on the LEFT cluster to determine the ownership. This won't be supported on CDH 5 and some other legacy Hive platforms. | |
-totbl, --transfer-ownership-table | If available (supported) on LEFT cluster, extract and transfer the tables owner to the RIGHT cluster. Note: This will make an 'exta' SQL call on the LEFT cluster to determine the ownership. This won't be supported on CDH 5 and some other legacy Hive platforms. Beware the cost of this extra call for EVERY table, as it may slow down the process for a large volume of tables. | |
-tt, --translation-type | Translation Strategy when migrating data. (ALIGNED\|RELATIVE) Default is RELATIVE | |
-v, --views-only | Process VIEWs ONLY | |
-wd, --warehouse-directory | The warehouse directory path. Should not include the namespace OR the database directory. This will be used to set the MANAGEDLOCATION database option. | |
-wps, --warehouse-plans | <db=ext-dir:mngd-dir[,db=ext-dir:mngd-dir]...> | The warehouse plans by database. Defines a plan for a database with 'external' and 'managed' directories. |