mirror of
https://github.com/minio/minio.git
synced 2025-11-25 03:56:17 -05:00
feat: implement support batch replication (#15554)
152
docs/batch-jobs/README.md
Normal file
@@ -0,0 +1,152 @@
# MinIO Batch Job
MinIO Batch Jobs is a MinIO object management feature that lets you manage objects at scale. Jobs currently supported by MinIO:

- Replicate objects between buckets on multiple sites

Upcoming jobs:

- Copy objects from NAS to MinIO
- Copy objects from HDFS to MinIO

## Replication Job
To perform replication via batch jobs, you create a job. The job consists of a job description YAML that describes:

- The source location from which the objects are copied
- The target location to which the objects are copied
- Fine-grained filters that select which source objects to copy

The MinIO batch jobs framework also provides:

- Automatic retries of a failed job, driven by user input
- Real-time monitoring of job progress
- Notifications on completion or failure, sent to a user-configured target

The following YAML describes the structure of a replication job; each value is documented and self-describing.

```yaml
replicate:
  apiVersion: v1
  # source of the objects to be replicated
  source:
    type: TYPE # valid values are "minio"
    bucket: BUCKET
    prefix: PREFIX
    # NOTE: if source is remote then target must be "local"
    # endpoint: ENDPOINT
    # credentials:
    #   accessKey: ACCESS-KEY
    #   secretKey: SECRET-KEY
    #   sessionToken: SESSION-TOKEN # Available when rotating credentials are used

  # target where the objects must be replicated
  target:
    type: TYPE # valid values are "minio"
    bucket: BUCKET
    prefix: PREFIX
    # NOTE: if target is remote then source must be "local"
    # endpoint: ENDPOINT
    # credentials:
    #   accessKey: ACCESS-KEY
    #   secretKey: SECRET-KEY
    #   sessionToken: SESSION-TOKEN # Available when rotating credentials are used

  # optional flags based filtering criteria
  # for all source objects
  flags:
    filter:
      newerThan: "7d" # match objects newer than this value (e.g. 7d10h31s)
      olderThan: "7d" # match objects older than this value (e.g. 7d10h31s)
      createdAfter: "date" # match objects created after "date"
      createdBefore: "date" # match objects created before "date"

      ## NOTE: tags are not supported when "source" is remote.
      # tags:
      #   - key: "name"
      #     value: "pick*" # match objects with tag 'name', with all values starting with 'pick'

      ## NOTE: metadata filter not supported when "source" is non MinIO.
      # metadata:
      #   - key: "content-type"
      #     value: "image/*" # match objects with 'content-type', with all values starting with 'image/'

    notify:
      endpoint: "https://notify.endpoint" # notification endpoint to receive job status events
      token: "Bearer xxxxx" # optional authentication token for the notification endpoint

    retry:
      attempts: 10 # number of retries for the job before giving up
      delay: "500ms" # least amount of delay between each retry
```
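
As a rough illustration of what the `retry` settings in the definition above express, the following Python sketch retries an operation a bounded number of times and waits at least a fixed delay between tries. It is not MinIO's implementation. `run_with_retry` and `flaky_copy` are hypothetical names, and treating `attempts` as the total number of tries is an assumption.

```python
import time


def run_with_retry(operation, attempts=10, delay_seconds=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    attempts and delay_seconds mirror retry.attempts and retry.delay ("500ms")
    from the job YAML. Treating attempts as the total number of tries is an
    assumption made for this sketch.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as err:  # a real job would likely retry only transient errors
            last_error = err
            if attempt < attempts:
                sleep(delay_seconds)  # wait at least `delay` before the next try
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error


# Hypothetical operation that fails twice and then succeeds.
calls = {"count": 0}


def flaky_copy():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "copied"


result = run_with_retry(flaky_copy, attempts=10, delay_seconds=0.0)
print(result, calls["count"])
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting.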

You can create and run multiple 'replication' jobs at a time; there are no predefined limits.

## Batch Jobs Terminology

### Job
A job is the basic unit of work for MinIO Batch Job. A job is a self-describing YAML file. Once this YAML is submitted and evaluated, MinIO performs the requested actions on each object that matches the criteria described in the job YAML file.

### Type
Type describes the job type, such as replicating objects between MinIO sites. Each job performs a single type of operation across all objects that match the job description criteria.
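
To make "objects that match the job description criteria" more concrete, here is a minimal Python sketch of how the `newerThan` / `olderThan` durations and the wildcard values from the job YAML could be evaluated. It is an illustration, not MinIO's code. The helper names are hypothetical, the supported duration units (`d`, `h`, `m`, `s`) are an assumption based on the `7d10h31s` example, and Python's `fnmatchcase` stands in for MinIO's own wildcard matching.

```python
import re
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase

# Assumed units, inferred from the "7d10h31s" example in the job YAML.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Parse a duration such as "7d" or "7d10h31s" into a timedelta."""
    parts = re.findall(r"(\d+)([dhms])", text)
    # Reject input that contains anything besides <number><unit> pairs.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(num) * UNIT_SECONDS[unit] for num, unit in parts))


def age_matches(mod_time, now, newer_than=None, older_than=None):
    """Apply newerThan / olderThan to an object's age (now - mod_time)."""
    age = now - mod_time
    if newer_than is not None and age >= parse_duration(newer_than):
        return False
    if older_than is not None and age <= parse_duration(older_than):
        return False
    return True


def value_matches(value, pattern):
    """Wildcard match, e.g. tag value "pick*" or content-type "image/*"."""
    return fnmatchcase(value, pattern)


now = datetime(2022, 9, 26, tzinfo=timezone.utc)
three_days_old = now - timedelta(days=3)

print(parse_duration("7d10h31s"))                          # 7 days, 10:00:31
print(age_matches(three_days_old, now, newer_than="7d"))   # True
print(age_matches(three_days_old, now, older_than="7d"))   # False
print(value_matches("pickles", "pick*"))                   # True
print(value_matches("image/png", "image/*"))               # True
```

How several tag or metadata entries combine is not stated in this document, so the sketch only checks one value against one pattern.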

## Batch Jobs via Commandline
[mc](http://github.com/minio/mc) provides the 'mc batch' command to create, start, and manage submitted jobs.

```
NAME:
  mc batch - manage batch jobs

USAGE:
  mc batch COMMAND [COMMAND FLAGS | -h] [ARGUMENTS...]

COMMANDS:
  generate  generate a new batch job definition
  start     start a new batch job
  list, ls  list all current batch jobs
  status    summarize job events on MinIO server in real-time
  describe  describe job definition for a job
```

### Generate a job yaml
```
mc batch generate alias/ replicate
```

### Start the batch job (returns the job ID)
```
mc batch start alias/ ./replicate.yaml
Successfully start 'replicate' job `E24HH4nNMcgY5taynaPfxu` on '2022-09-26 17:19:06.296974771 -0700 PDT'
```
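
When scripting around `mc batch start`, the job ID is needed for the later `status` and `describe` calls. The Python sketch below pulls the job type and ID out of the message shown above. It assumes the message keeps exactly this shape, which may differ between `mc` versions; `parse_start_output` is a hypothetical helper.

```python
import re

# Sample line copied from the `mc batch start` output above.
START_LINE = (
    "Successfully start 'replicate' job `E24HH4nNMcgY5taynaPfxu` "
    "on '2022-09-26 17:19:06.296974771 -0700 PDT'"
)


def parse_start_output(line):
    """Extract (job type, job ID) from the start message."""
    match = re.search(r"'(?P<type>[^']+)' job `(?P<job_id>[^`]+)`", line)
    if match is None:
        raise ValueError("unrecognized output from mc batch start")
    return match.group("type"), match.group("job_id")


job_type, job_id = parse_start_output(START_LINE)
print(job_type, job_id)
```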

### List all batch jobs
```
mc batch list alias/
ID                      TYPE       USER        STARTED
E24HH4nNMcgY5taynaPfxu  replicate  minioadmin  1 minute ago
```
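
The listing above is a whitespace-separated table whose last column can itself contain spaces. A small Python sketch for turning it into records follows. It assumes the first three columns never contain whitespace, and `parse_job_list` is a hypothetical helper rather than part of `mc`.

```python
# Sample table copied from the `mc batch list` output above.
LIST_OUTPUT = """\
ID                      TYPE       USER        STARTED
E24HH4nNMcgY5taynaPfxu  replicate  minioadmin  1 minute ago
"""


def parse_job_list(text):
    """Turn the table into a list of dicts keyed by lower-cased column name."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].lower().split()
    jobs = []
    for line in lines[1:]:
        # STARTED may contain spaces ("1 minute ago"), so stop splitting
        # once the earlier columns have been consumed.
        fields = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


jobs = parse_job_list(LIST_OUTPUT)
print(jobs[0]["id"], jobs[0]["started"])
```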

### List all 'replicate' batch jobs
```
mc batch list alias/ --type replicate
ID                      TYPE       USER        STARTED
E24HH4nNMcgY5taynaPfxu  replicate  minioadmin  1 minute ago
```

### Real-time 'status' for a batch job
```
mc batch status myminio/ E24HH4nNMcgY5taynaPfxu
●∙∙
Objects:        28766
Versions:       28766
Throughput:     3.0 MiB/s
Transferred:    406 MiB
Elapsed:        2m14.227222868s
CurrObjName:    share/doc/xml-core/examples/foo.xmlcatalogs
```
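
The status fields above can be sanity-checked with a little arithmetic: in this sample, `Transferred` divided by `Elapsed` comes out at the reported `Throughput`. The Python sketch below parses the `key: value` lines and performs that division. Whether `mc` actually computes throughput as a whole-job average is an assumption, and `parse_status` and `elapsed_seconds` are hypothetical helpers.

```python
import re

# Sample fields copied from the `mc batch status` output above.
STATUS_OUTPUT = """\
Objects:        28766
Versions:       28766
Throughput:     3.0 MiB/s
Transferred:    406 MiB
Elapsed:        2m14.227222868s
CurrObjName:    share/doc/xml-core/examples/foo.xmlcatalogs
"""


def parse_status(text):
    """Collect "key: value" lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def elapsed_seconds(text):
    """Convert a duration such as "2m14.227222868s" (h/m/s parts only) to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:([\d.]+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


status = parse_status(STATUS_OUTPUT)
mib = float(status["Transferred"].split()[0])   # 406 (MiB)
secs = elapsed_seconds(status["Elapsed"])       # about 134.23 seconds
print(f"{mib / secs:.1f} MiB/s")                # 3.0 MiB/s
```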

### 'describe' the batch job yaml.
```
mc batch describe myminio/ E24HH4nNMcgY5taynaPfxu
replicate:
  apiVersion: v1
...
```