
# MinIO Batch Job
MinIO Batch Jobs is a MinIO object management feature that lets you manage objects at scale. Jobs currently supported by MinIO:

- Replicate objects between buckets on multiple sites

Upcoming jobs:

- Copy objects from NAS to MinIO
- Copy objects from HDFS to MinIO
## Replication Job

To perform replication via batch jobs, you create a job. The job consists of a job description YAML that describes:

- The source location from which the objects are copied
- The target location to which the objects are copied
- Optional fine-grained filtering to select which source objects are copied

The MinIO batch jobs framework also provides:

- Automatic retries of a failed job, driven by user-supplied settings
- Real-time monitoring of job progress
- Notifications upon completion or failure, sent to a user-configured target

The following YAML describes the structure of a replication job; each value is documented and self-describing.
```yaml
replicate:
  apiVersion: v1
  # source of the objects to be replicated
  source:
    type: TYPE # valid values are "minio"
    bucket: BUCKET
    prefix: PREFIX
    # NOTE: if source is remote then target must be "local"
    # endpoint: ENDPOINT
    # credentials:
    #   accessKey: ACCESS-KEY
    #   secretKey: SECRET-KEY
    #   sessionToken: SESSION-TOKEN # Available when rotating credentials are used

  # target where the objects must be replicated
  target:
    type: TYPE # valid values are "minio"
    bucket: BUCKET
    prefix: PREFIX
    # NOTE: if target is remote then source must be "local"
    # endpoint: ENDPOINT
    # credentials:
    #   accessKey: ACCESS-KEY
    #   secretKey: SECRET-KEY
    #   sessionToken: SESSION-TOKEN # Available when rotating credentials are used

  # optional flags based filtering criteria
  # for all source objects
  flags:
    filter:
      newerThan: "7d" # match objects newer than this value (e.g. 7d10h31s)
      olderThan: "7d" # match objects older than this value (e.g. 7d10h31s)
      createdAfter: "date" # match objects created after "date"
      createdBefore: "date" # match objects created before "date"

      ## NOTE: tags are not supported when "source" is remote.
      # tags:
      #   - key: "name"
      #     value: "pick*" # match objects with tag 'name', with all values starting with 'pick'

      ## NOTE: metadata filter not supported when "source" is non MinIO.
      # metadata:
      #   - key: "content-type"
      #     value: "image/*" # match objects with 'content-type', with all values starting with 'image/'

    notify:
      endpoint: "https://notify.endpoint" # notification endpoint to receive job status events
      token: "Bearer xxxxx" # optional authentication token for the notification endpoint

    retry:
      attempts: 10 # number of retries for the job before giving up
      delay: "500ms" # least amount of delay between each retry
```
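The two NOTE comments under `source` and `target` amount to one rule: at most one side of a replication job may point at a remote endpoint. The Go sketch below checks that rule before submitting a job. It assumes that an omitted `endpoint` means the local deployment, that local-to-local jobs are allowed (the NOTEs only forbid remote-to-remote), and that `"minio"` is the only accepted `type`; the `location` type and `validateLocations` function are hypothetical and are not MinIO's own validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// location models the `source` / `target` sections of the job YAML.
// Hypothetical type; field names follow the YAML keys.
type location struct {
	Type     string // only "minio" is documented
	Bucket   string
	Prefix   string
	Endpoint string // assumption: empty means the local MinIO deployment
}

func (l location) remote() bool { return l.Endpoint != "" }

// validateLocations applies the NOTE comments from the job YAML:
// if source is remote, target must be local, and vice versa.
func validateLocations(src, tgt location) error {
	for _, l := range []location{src, tgt} {
		if l.Type != "minio" {
			return fmt.Errorf("unsupported type %q", l.Type)
		}
	}
	if src.remote() && tgt.remote() {
		return errors.New("source and target cannot both be remote")
	}
	return nil
}

func main() {
	src := location{Type: "minio", Bucket: "photos"}
	tgt := location{Type: "minio", Bucket: "photos-backup", Endpoint: "https://site2.example.net"}
	fmt.Println(validateLocations(src, tgt)) // <nil>

	src.Endpoint = "https://site1.example.net" // now both sides are remote
	fmt.Println(validateLocations(src, tgt))   // source and target cannot both be remote
}
```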
You can create and run multiple replication jobs at the same time; there are no predefined limits.
## Batch Jobs Terminology
### Job
A job is the basic unit of work for MinIO Batch Jobs. A job is a self-describing YAML file; once this YAML is submitted and evaluated, MinIO performs the requested actions on each object that matches the criteria described in the job YAML file.
### Type
Type describes the job type, such as replicating objects between MinIO sites. Each job performs a single type of operation across all objects that match the job description criteria.
## Batch Jobs via the Command Line
[mc](http://github.com/minio/mc) provides the `mc batch` command to generate, start, and manage submitted jobs.
```
NAME:
  mc batch - manage batch jobs

USAGE:
  mc batch COMMAND [COMMAND FLAGS | -h] [ARGUMENTS...]

COMMANDS:
  generate  generate a new batch job definition
  start     start a new batch job
  list, ls  list all current batch jobs
  status    summarize job events on MinIO server in real-time
  describe  describe job definition for a job
```
### Generate a job YAML
```
mc batch generate alias/ replicate
```
### Start the batch job (returns the job ID, JID)
```
mc batch start alias/ ./replicate.yaml
Successfully start 'replicate' job `E24HH4nNMcgY5taynaPfxu` on '2022-09-26 17:19:06.296974771 -0700 PDT'
```
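When scripting around `mc batch start`, the JID can be pulled out of the success line, where it appears between backquotes. The Go sketch below parses a captured line of that output (for example, collected with `os/exec`). It assumes the message format shown above, which may change between `mc` releases; `extractJobID` is a hypothetical helper, not part of `mc`.

```go
package main

import (
	"fmt"
	"regexp"
)

// jobIDRe captures the backquoted job ID in the line printed by
// `mc batch start`, e.g. "... 'replicate' job `E24HH4nNMcgY5taynaPfxu` on ...".
var jobIDRe = regexp.MustCompile("job `([^`]+)`")

// extractJobID returns the JID and true, or "" and false if none is found.
func extractJobID(out string) (string, bool) {
	m := jobIDRe.FindStringSubmatch(out)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	out := "Successfully start 'replicate' job `E24HH4nNMcgY5taynaPfxu` on '2022-09-26 17:19:06.296974771 -0700 PDT'"
	id, ok := extractJobID(out)
	fmt.Println(id, ok) // E24HH4nNMcgY5taynaPfxu true
}
```

The extracted ID is what `mc batch status` and `mc batch describe` take as their last argument.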
### List all batch jobs
```
mc batch list alias/
ID                      TYPE        USER        STARTED
E24HH4nNMcgY5taynaPfxu  replicate   minioadmin  1 minute ago
```
### List all 'replicate' batch jobs
```
mc batch list alias/ --type replicate
ID                      TYPE        USER        STARTED
E24HH4nNMcgY5taynaPfxu  replicate   minioadmin  1 minute ago
```
### Real-time 'status' for a batch job
```
mc batch status myminio/ E24HH4nNMcgY5taynaPfxu
●∙∙
Objects: 28766
Versions: 28766
Throughput: 3.0 MiB/s
Transferred: 406 MiB
Elapsed: 2m14.227222868s
CurrObjName: share/doc/xml-core/examples/foo.xmlcatalogs
```
### 'describe' the batch job YAML
```
mc batch describe myminio/ E24HH4nNMcgY5taynaPfxu
replicate:
  apiVersion: v1
...
```