Commit Graph

77 Commits

Author SHA1 Message Date
Anis Eleuch
fa5d9c02ef
batch: Set a default retry attempts and a prefix (#20452)
A batch job will fail if the retry attempt is not provided. The reason
is that the code mistakenly gets the retry attempts from the job status
rather than the job yaml file.

This will also set a default empty prefix for batch expiration.

Also this will avoid trimming the prefix since the yaml decoder already
does that if no quotes were provided, and we should not trim if quotes
were provided and the user provided a leading or a trailing space.
2024-09-18 10:59:03 -07:00
Poorna
060276932d
batch:repl fix copy from source -> remote (#20382)
completes fix started by  #20365
2024-09-05 04:57:23 -07:00
Anis Eleuch
aaf4fb1184
batch: repl: A missing prefix in the remote source will fail replication (#20365)
When the prefix field is not provided in the remote source of a yaml
replication job, the code fails to do listing and makes replication
successful. This commit fixes it.
2024-09-02 05:36:43 -07:00
rubyisrust
516af01a12
chore: fix some function names (#20243)
Signed-off-by: rubyisrust <rustrover@icloud.com>
2024-08-13 11:23:33 -07:00
jiuker
50a5ad48fc
feat: support batch replication prefix slice (#20033) 2024-08-01 05:53:30 -07:00
jiuker
c87a489514
fix: support prefix when batchJob replicate enable the snowball (#20178) 2024-07-29 00:59:50 -07:00
Klaus Post
1966668066
Avoid Batch Replication Job log spam (#20158)
Only print once per job and error location.

Set default retry to default 1 second wait, and use as minimum.
2024-07-26 05:55:50 -07:00
jiuker
b3a94c4e85
fix: Use xtime duration to parse batch job (#20117) 2024-07-23 00:05:53 -07:00
Anis Eleuch
2e5d792f0c
batch-expiry: Save progress regularly in the drives and at the end (#20098)
- Also, fix failure reporting at the end.
- Also, avoid parsing report objects when listing or resuming jobs, this
does not cause any bugs, it is only printing, not useful errors.
2024-07-17 09:42:32 -07:00
Krishnan Parthasarathi
380233d646
batch: Update job info object on success (#20053) 2024-07-08 18:45:54 -07:00
Harshavardhana
32d04091a2
resume any batch jobs in a goroutine (#20035)
Bonus: move batch job initialization to the last item after all other initialization, 
            allowing for faster startup time for different subsystems.
2024-07-03 00:16:05 -07:00
Anis Eleuch
757cf413cb
Add batch status API (#19679)
Currently the status of a completed or failed batch is held in the
memory, a simple restart will lose the status and the user will not
have any visibility of the job that was long running.

In addition to the metrics, add a new API that reads the batch status
from the drives. A batch job will be cleaned up three days after
completion.

Also add the batch type in the batch id, the reason is that the batch
job request is removed immediately when the job is finished, then we
do not know the type of batch job anymore, hence a difficulty to locate
the job report
2024-07-02 01:17:52 -07:00
Poorna
91faaa1387
fix panic in batch replicate (#20014)
Fixes:

```
panic: send on closed channel
	panic: close of closed channel

goroutine 878 [running]:
github.com/minio/minio/internal/ioutil.SafeClose[...](...)
	/Users/kp/code/src/github.com/minio/minio/internal/ioutil/ioutil.go:407
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func2.2()
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2229 +0xc0
panic({0x108c25e60?, 0x1090b28d0?})
	/usr/local/go/src/runtime/panic.go:770 +0x124
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func2.3({{0x1400e397316, 0x5}, {0x1400d88b8a8, 0x8}, {0x1f99d80, 0xede101c42, 0x0}, 0x3bc, 0x0, 0x0, ...})
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2235 +0xb4
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func2()
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2277 +0xabc
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk in goroutine 575
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2210 +0x33c
```
2024-06-28 18:20:47 -07:00
Harshavardhana
55aa431578
fix: on windows avoid ':' as part of the object name (#19907)
fixes #18865
avoid-colon
2024-06-10 20:13:30 -07:00
Poorna
5aaef9790f
replication: pass checksum headers to replica (#19834) 2024-06-06 02:36:42 -07:00
jiuker
d326ba52e9
feat: support batchJob for windows (#19877) 2024-06-05 08:44:53 -07:00
Aditya Manthramurthy
5f78691fcf
ldap: Add user DN attributes list config param (#19758)
This change uses the updated ldap library in minio/pkg (bumped
up to v3). A new config parameter is added for LDAP configuration to
specify extra user attributes to load from the LDAP server and to store
them as additional claims for the user.

A test is added in sts_handlers.go that shows how to access the LDAP
attributes as a claim.

This is in preparation for adding SSH pubkey authentication to MinIO's SFTP
integration.
2024-05-24 16:05:23 -07:00
Klaus Post
847ee5ac45
Make WalkDir return errors (#19677)
If used, 'opts.Marker` will cause many missed entries since results are returned 
unsorted, and pools are serialized.

Switch to fully concurrent listing and merging across pools to return sorted entries.
2024-05-06 13:27:52 -07:00
Harshavardhana
523bd769f1
add support for specific error response for InvalidRange (#19668)
fixes #19648

AWS S3 returns the actual object size as part of XML
response for InvalidRange error, this is used apparently
by SDKs to retry the request without the range.
2024-05-05 09:56:21 -07:00
Anis Eleuch
787c44c39d
batch-repl: Do not allow both source/target to be remote (#19434)
Return an error when the user specifies endpoints for both source
and target. This can generate many type of errors as the code considers
a deployment remote if its endpoint is specified.
2024-04-08 07:11:38 -07:00
Anis Eleuch
95bf4a57b6
logging: Add subsystem to log API (#19002)
Create new code paths for multiple subsystems in the code. This will
make maintaing this easier later.

Also introduce bugLogIf() for errors that should not happen in the first
place.
2024-04-04 05:04:40 -07:00
Anis Eleuch
97ce11cb6b
Avoid using a nil transport when the config is not initialized (#19405)
Make sure to pass a nil pointer as a Transport to minio-go  when the API config
is not initialized, this will make sure that we do not pass an interface
with a known type but a nil value.

This will also fix the update of the API remote_transport_deadline
configuration without requiring the cluster restart.
2024-04-03 11:27:05 -07:00
Poorna
7fd76dbbb7
fix batch snowball to close channel after listing finishes (#19316)
panic seen due to premature closing of slow channel while listing is still sending or
list has already closed on the sender's side:
```
panic: close of closed channel

goroutine 13666 [running]:
github.com/minio/minio/internal/ioutil.SafeClose[...](0x101ff51e4?)
	/Users/kp/code/src/github.com/minio/minio/internal/ioutil/ioutil.go:425 +0x24
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1()
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2142 +0x170
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk in goroutine 1189
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:1985 +0x228
```
2024-03-21 16:13:43 -07:00
Harshavardhana
1173b26fc8
avoid triggering heals on metacache files if any (#19299) 2024-03-19 20:21:15 -07:00
Anis Eleuch
68dd74c5ab
batch: Separate batch job request and batch job stats (#19205)
Currently, the progress of the batch job is saved in inside the job
request object, which is normally not supported by MinIO. Though there
is no apparent bug, it is better to fix this now.

Batch progress is saved in .minio.sys/batch-jobs/reports/

Co-authored-by: Anis Eleuch <anis@min.io>
2024-03-07 10:58:22 -08:00
Praveen raj Mani
cfd8645843
fix: update batch replication stats for snowball uploads (#19045) 2024-02-13 07:33:27 -08:00
Harshavardhana
1d3bd02089
avoid close 'nil' panics if any (#18890)
brings a generic implementation that
prints a stack trace for 'nil' channel
closes(), if not safely closes it.
2024-01-28 10:04:17 -08:00
Krishnan Parthasarathi
3a90af0bcd
Add line, col to types used in batch-expire (#18747) 2024-01-08 15:22:28 -08:00
Harshavardhana
53ce92b9ca
fix: use the right channel to feed the data in (#18605)
this PR fixes a regression in batch replication
where we weren't sending any data from the Walk()
results due to incorrect channels being used.
2023-12-06 18:17:03 -08:00
Krishnan Parthasarathi
a50f26b7f5
Implement batch-expiration for objects (#17946)
Based on an initial PR from -
https://github.com/minio/minio/pull/17792

But fully completes it with newer finalized YAML spec.
2023-12-02 02:51:33 -08:00
Harshavardhana
bd0819330d
avoid Walk() API listing objects without quorum (#18535)
This allows batch replication to basically do not
attempt to copy objects that do not have read quorum.

This PR also allows walk() to provide custom
values for quorum under batch replication, and
key rotation.
2023-11-27 17:20:04 -08:00
Anis Eleuch
fbc6f3f6e8
snowball-repl: Add support of immediate tiering (#18508)
Also, fix a possible crash when some fields are not added to the batch
snowball yaml
2023-11-22 16:33:11 -08:00
Anis Eleuch
70fbcfee4a
Implement batch snowball (#18485) 2023-11-22 10:51:46 -08:00
Anis Eleuch
02331a612c
batch-repl: Replicate missing metadata and standard headers (#18484)
- Replicate Expires when the source is local or remote
- Replicate metadata when the source is remote
2023-11-18 19:12:44 -08:00
Harshavardhana
8b1e819bf3
fix: make sure to purge all the completed in resume() (#18429)
currently previously completed jobs would re-run
even if they are completed, causing incorrect behavior.
2023-11-13 08:15:00 -08:00
Harshavardhana
54721b7c7b
fix: batch replication from source allow out of band deletes (#18423)
it is possible that ILM or Deletes got triggered on batch
of objects that we are attempting to batch replicate, ignore
this scenario as valid behavior.
2023-11-10 16:12:35 -08:00
Harshavardhana
3a90fb108c only look for metadata if batch replication asks for metadata filters (#18082)
This PR changes the StatObject() to be must have for non-minio source
to being a conditional API call.

- Calls StatObject() when needed
- Calls GetObjectTagging() when needed

These calls if we do without these conditionals can cause a lot of
delays, so we avoid them if not needed in more common scenario.
2023-09-22 11:31:57 -07:00
Harshavardhana
b8ebe54e53 Revert "skip tiered objects to GLACIER in batch replication (#18044)"
This reverts commit fd421ddd6f.

MinIO already provides `filter` based on metadata that would work
in this scenario already.
2023-09-19 00:05:40 -07:00
Harshavardhana
fd421ddd6f
skip tiered objects to GLACIER in batch replication (#18044)
tiered objects to GLACIER are not readable until
they are restored, we skip these as unreadable
2023-09-18 10:25:31 -07:00
jiuker
9947c01c8e
feat: SSE-KMS use uuid instead of read all data to md5. (#17958) 2023-09-18 10:00:54 -07:00
Harshavardhana
36385010f5
use optimized pathJoin instead of path.Join (#18042)
this avoids allocations in scanner routine, they are tiny but 
they allocate a lot over many cycles of the scanner.
2023-09-16 19:08:59 -07:00
Aditya Manthramurthy
cbc0ef459b
Fix policy package import name (#18031)
We do not need to rename the import of minio/pkg/v2/policy as iampolicy
any more.
2023-09-14 14:50:16 -07:00
Aditya Manthramurthy
1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Krishnan Parthasarathi
6a67c277eb
Reuse types for key-value, notification and retry (#17936) 2023-08-29 11:27:23 -07:00
Harshavardhana
3ba927edae
fix: batch status reporting after complete (#17852)
batch status can perpetually wait after completion
due to a race between the MetricsHandler() returning
the active metrics in intervals of 1sec and delete
of metrics after job completion.

this PR ensures that we keep the 'status' around
for a while, i.e upto 24hrs for all the batch jobs.
2023-08-15 12:22:30 -07:00
Harshavardhana
b760137e1d
fix: add proxyByNode for batch jobs as part of their jobId (#17844) 2023-08-11 13:12:35 -07:00
Harshavardhana
5f56f441bf
fix: apply common notification code with content-type (#17843) 2023-08-11 11:34:43 -07:00
Anis Eleuch
a3f00c5d5e
batch: Strict unmarshal yaml document to avoid user made typos (#17808)
// UnmarshalStrict is like Unmarshal except that any fields that are found
// in the data that do not have corresponding struct members, or mapping
// keys that are duplicates, will result in
// an error.
2023-08-05 13:51:48 -07:00
Harshavardhana
533cd8d6df
fix: batch replication pull must preserve versionID (#17805)
batch replication pull must preserve versionID regardless
of destination bucket versioning configuration.

This is similar to the issue with decommissioning and rebalancing
2023-08-04 12:09:10 -07:00
Aditya Manthramurthy
bb6921bf9c
Send AuditLog via new middleware fn for admin APIs (#17632)
A new middleware function is added for admin handlers, including options
for modifying certain behaviors. This admin middleware:

- sets the handler context via reflection in the request and sends AuditLog
- checks for object API availability (skipping it if a flag is passed)
- enables gzip compression (skipping it if a flag is passed)
- enables header tracing (adding body tracing if a flag is passed)

While the new function is a middleware, due to the flags used for
conditional behavior modification, which is used in each route registration
call.

To try to ensure that no regressions are introduced, the following
changes were done mechanically mostly with `sed` and regexp:

- Remove defer logger.AuditLog in admin handlers
- Replace newContext() calls with r.Context()
- Update admin routes registration calls

Bonus: remove unused NetSpeedtestHandler

Since the new adminMiddleware function checks for object layer presence
by default, we need to pass the `noObjLayerFlag` explicitly to admin
handlers that should work even when it is not available. The following
admin handlers do not require it:

- ServerInfoHandler
- StartProfilingHandler
- DownloadProfilingHandler
- ProfileHandler
- SiteReplicationDevNull
- SiteReplicationNetPerf
- TraceHandler

For these handlers adminMiddleware does not check for the object layer
presence (disabled by passing the `noObjLayerFlag`), and for all other
handlers, the pre-check ensures that the handler is not called when the
object layer is not available - the client would get a
ErrServerNotInitialized and can retry later.

This `noObjLayerFlag` is added based on existing behavior for these
handlers only.
2023-07-13 14:52:21 -07:00