minio

mirror of https://github.com/minio/minio.git synced 2025-11-28 13:09:09 -05:00

Author	SHA1	Message	Date
Shubhendu	708296ae1b	Heal buckets at node level (#18504)	2023-12-05 02:17:35 -08:00
Harshavardhana	fbb5e75e01	avoid run-away goroutine build-up in notification send, use channels (#18533) Use memory for async events when necessary and dequeue them as needed. For all synchronous events, customers must enable `MINIO_API_SYNC_EVENTS=on`. Async events can be lost, but it is up to the admin to decide what they want; instead of creating a run-away number of goroutines per event, we now queue them properly. The maximum number of async workers is currently set to `runtime.GOMAXPROCS(0)`, which is generally more than sufficient; it could be made configurable in the future, though that may not be needed.	2023-12-05 02:16:33 -08:00
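A minimal Go sketch of the bounded-queue idea in #18533. It assumes a hypothetical `Event` type and a drop-when-full policy for async events, and it is not MinIO's notification code. A fixed pool of workers, sized with `runtime.GOMAXPROCS(0)` as the commit describes, drains a buffered channel, so the number of goroutines stays constant no matter how many events arrive.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Event is a hypothetical stand-in for a bucket notification event.
type Event struct {
	Name string
}

// AsyncQueue delivers events with a fixed pool of workers instead of
// spawning one goroutine per event.
type AsyncQueue struct {
	events  chan Event
	wg      sync.WaitGroup
	mu      sync.Mutex
	dropped int
}

// NewAsyncQueue starts `workers` goroutines that call send for each queued event.
func NewAsyncQueue(capacity, workers int, send func(Event)) *AsyncQueue {
	q := &AsyncQueue{events: make(chan Event, capacity)}
	for i := 0; i < workers; i++ {
		q.wg.Add(1)
		go func() {
			defer q.wg.Done()
			for ev := range q.events {
				send(ev)
			}
		}()
	}
	return q
}

// Enqueue never blocks the caller. If the queue is full, the event is
// dropped (assumed policy: async events may be lost) and false is returned.
func (q *AsyncQueue) Enqueue(ev Event) bool {
	select {
	case q.events <- ev:
		return true
	default:
		q.mu.Lock()
		q.dropped++
		q.mu.Unlock()
		return false
	}
}

// Dropped reports how many events were discarded because the queue was full.
func (q *AsyncQueue) Dropped() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.dropped
}

// Close stops accepting events and waits until queued events are delivered.
func (q *AsyncQueue) Close() {
	close(q.events)
	q.wg.Wait()
}

func main() {
	var mu sync.Mutex
	delivered := 0
	q := NewAsyncQueue(100, runtime.GOMAXPROCS(0), func(Event) {
		mu.Lock()
		delivered++
		mu.Unlock()
	})
	for i := 0; i < 10; i++ {
		q.Enqueue(Event{Name: fmt.Sprintf("event-%d", i)})
	}
	q.Close()
	fmt.Println("delivered:", delivered, "dropped:", q.Dropped())
}
```

Dropping on overflow keeps the request path non-blocking. Callers that cannot tolerate loss correspond to the synchronous mode the commit gates behind `MINIO_API_SYNC_EVENTS=on`.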
Harshavardhana	f327b21557	handle crashes with ILM expiry changes (#18590)	2023-12-05 01:14:36 -08:00
Harshavardhana	45b7253f39	parallelize renameData() cleanup upon error (#18591)	2023-12-04 14:54:34 -08:00
Harshavardhana	05bb655efc	avoid caching metrics for timeout errors per drive (#18584) Bonus: combine the loop for drive/REST registration.	2023-12-04 11:54:13 -08:00
Harshavardhana	8fdfcfb562	upon RenameData() quorum error delete any partial success (#18586) There is potential for dangling writes when quorum fails and only some drives took a successful write; generally this is left to the healing routine to pick up. However, it is better to delete them right away to avoid potential quorum issues on the version signature when an object has many versions.	2023-12-04 11:33:39 -08:00
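The undo-on-quorum-failure idea from #18586 can be sketched as follows. It assumes a hypothetical `Drive` interface and an in-memory `fakeDrive`; none of these names are MinIO's storage API. Drives are visited sequentially for brevity, whereas a real implementation would fan out in parallel.

```go
package main

import (
	"errors"
	"fmt"
)

// errWriteQuorum is a hypothetical error value used only by this sketch.
var errWriteQuorum = errors.New("write quorum not met")

// Drive is a hypothetical, minimal view of a drive for this sketch.
type Drive interface {
	RenameData(object string) error
	Delete(object string) error
}

// renameWithQuorum commits object on every drive. If fewer than writeQuorum
// drives succeed, it immediately deletes the partial successes instead of
// leaving dangling writes for the healer.
func renameWithQuorum(drives []Drive, object string, writeQuorum int) error {
	errs := make([]error, len(drives))
	succeeded := 0
	for i, d := range drives {
		errs[i] = d.RenameData(object)
		if errs[i] == nil {
			succeeded++
		}
	}
	if succeeded >= writeQuorum {
		return nil
	}
	for i, d := range drives {
		if errs[i] == nil {
			_ = d.Delete(object) // best-effort undo of the partial write
		}
	}
	return errWriteQuorum
}

// fakeDrive is an in-memory Drive used to exercise the sketch.
type fakeDrive struct {
	offline bool
	objects map[string]bool
}

func newFakeDrive(offline bool) *fakeDrive {
	return &fakeDrive{offline: offline, objects: map[string]bool{}}
}

func (f *fakeDrive) RenameData(object string) error {
	if f.offline {
		return errors.New("drive offline")
	}
	f.objects[object] = true
	return nil
}

func (f *fakeDrive) Delete(object string) error {
	delete(f.objects, object)
	return nil
}

func main() {
	a, b := newFakeDrive(false), newFakeDrive(false)
	drives := []Drive{a, b, newFakeDrive(true), newFakeDrive(true)}
	err := renameWithQuorum(drives, "bucket/object", 3)
	fmt.Println(err, a.objects["bucket/object"], b.objects["bucket/object"])
}
```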
Harshavardhana	e7c144eeac	avoid double MRF heal when there is versions disparity (#18585)	2023-12-04 11:13:50 -08:00
Harshavardhana	e98172d72d	avoid hot-tier SLA to be tied to warm-tier SLA (#18581) It is okay if the warm tier cannot keep up; we should continue to take I/O at the hot tier, and only fail or block the hot tier when the disk is full. Bonus: add a metrics counter for these missed tasks, so we know for sure whether one of the nodes is lagging behind or losing too many tasks during transitioning.	2023-12-02 13:02:12 -08:00
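A small Go sketch of "keep taking hot-tier I/O even if the warm tier lags." It assumes a hypothetical `TransitionTask` type and a `missed` counter standing in for the new metric. The enqueue uses `select` with a `default` branch, so a full queue increments the counter instead of stalling the request. What later happens to missed tasks is outside this sketch. `atomic.Int64` requires Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// TransitionTask is a hypothetical unit of work that moves an object to the warm tier.
type TransitionTask struct {
	Object string
}

// transitionQueue decouples hot-tier writes from warm-tier throughput.
type transitionQueue struct {
	tasks  chan TransitionTask
	missed atomic.Int64 // tasks not queued because the warm tier was lagging
}

func newTransitionQueue(size int) *transitionQueue {
	return &transitionQueue{tasks: make(chan TransitionTask, size)}
}

// queue never blocks the hot-tier request path: when the queue is full,
// the task is counted as missed instead of stalling the PUT.
func (q *transitionQueue) queue(t TransitionTask) bool {
	select {
	case q.tasks <- t:
		return true
	default:
		q.missed.Add(1)
		return false
	}
}

func main() {
	q := newTransitionQueue(2) // nothing drains it, simulating a slow warm tier
	for i := 0; i < 5; i++ {
		q.queue(TransitionTask{Object: fmt.Sprintf("obj-%d", i)})
	}
	fmt.Println("queued:", len(q.tasks), "missed:", q.missed.Load())
}
```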
Krishnan Parthasarathi	a50f26b7f5	Implement batch-expiration for objects (#17946) Based on an initial PR, https://github.com/minio/minio/pull/17792, but fully completes it with the newer, finalized YAML spec.	2023-12-02 02:51:33 -08:00
Klaus Post	69294cf98a	Disable DMA optimization on windows (#18575) It appears that Windows can lock up when errors occur. Use regular copy here.	2023-12-01 16:13:19 -08:00
Krishnan Parthasarathi	c397fb6c7a	Minor fixes to bucket replication (#18578)	2023-12-01 16:13:08 -08:00
Klaus Post	961b0b524e	Do not require restart when a disk is unreachable during node boot (#18576) Previously, a disk that could not initialize when an instance started would never have a handler registered, so a user had to restart the node after fixing the disk. This change also prevents showing the wrong 'upgrade is needed.' error message in that case. While the disk is still failing, an error is printed every 30 minutes; disk reconnection is retried every 30 seconds. Co-authored-by: Anis Elleuch <anis@min.io>	2023-12-01 12:01:14 -08:00
Harshavardhana	109a9e3f35	skip ILM expired objects from healing (#18569)	2023-12-01 07:56:24 -08:00
Klaus Post	5f971fea6e	Fix Mux Connect Error (#18567) `OpMuxConnectError` was not handled correctly. Remove local checks for single request handlers so they can run before being registered locally. Bonus: Only log IAM bootstrap on startup.	2023-12-01 00:18:04 -08:00
Klaus Post	94fbcd8ebe	Add TLS cert checksum (#18557) It allows validation of whether all certs match across clusters.	2023-11-30 12:13:50 -08:00
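One way to compare certificates across nodes is to hash their raw bytes and check that every node reports the same digest. The Go sketch below assumes SHA-256 over the raw certificate bytes; the commit does not state which hash or input MinIO actually uses. The helper names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// certChecksum returns a hex-encoded SHA-256 digest of raw certificate bytes
// (for example a PEM file's contents). SHA-256 is an assumption of this sketch.
func certChecksum(raw []byte) string {
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

// certsMatch reports whether every node reported the same checksum.
func certsMatch(byNode map[string]string) bool {
	first, seen := "", false
	for _, sum := range byNode {
		if !seen {
			first, seen = sum, true
			continue
		}
		if sum != first {
			return false
		}
	}
	return true
}

func main() {
	cert := []byte("example certificate bytes")
	sums := map[string]string{
		"node1": certChecksum(cert),
		"node2": certChecksum(cert),
	}
	fmt.Println("all certs match:", certsMatch(sums))
}
```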
Harshavardhana	879d5dd236	site replication must heal policy mappings with correct userType (#18563)	2023-11-30 10:34:18 -08:00
Harshavardhana	0ee722f8c3	cleanup handling of STS isAllowed and simplifies the PolicyDBGet() (#18554)	2023-11-29 16:07:35 -08:00
Anis Eleuch	b7d11141e1	rename Force to Immediate for clarity (#18540)	2023-11-28 22:35:16 -08:00
Klaus Post	bea0b050cd	Improve env var config error reporting (#18549) Env vars that were set on the current server but not on remotes were not reported in errors. Add these.	2023-11-28 10:39:02 -08:00
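The missing check in #18549 boils down to a set difference: variables present locally but absent on a remote. The Go sketch below shows only that direction. The `MINIO_EXAMPLE_*` names are placeholders, and the way MinIO gathers env vars from remotes is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// missingOnRemote lists variables set on the local server but absent on a
// remote, so they can be included in the config-mismatch error.
func missingOnRemote(local, remote map[string]string) []string {
	var missing []string
	for name := range local {
		if _, ok := remote[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing) // map order is random; sort for stable error messages
	return missing
}

func main() {
	local := map[string]string{"MINIO_EXAMPLE_A": "1", "MINIO_EXAMPLE_B": "2"}
	remote := map[string]string{"MINIO_EXAMPLE_B": "2"}
	fmt.Println("set locally but not on remote:", missingOnRemote(local, remote))
}
```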
Shubhendu	ce62980d4e	Fixed transition rules getting overwritten while healing (#18542) While healing the latest expiry-rule changes across sites, if the target had pre-existing transition rules, they were overwritten because the cloned latest expiry rules from the remote site were written as is. Fixed this and added test cases. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-11-28 10:38:35 -08:00
Klaus Post	dc88865908	fix: shadowed error in getObjectFileInfo() (#18548) The shadowed error resulted in `done <- err == nil` always sending true on this path, which seems unintentional.	2023-11-28 09:47:41 -08:00
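The Go shadowing pitfall behind #18548 can be reproduced in a few lines. This is a generic illustration, not the code of `getObjectFileInfo()`, and `readMeta` is a hypothetical stand-in. Inside the closure, `:=` declares a new `err`, so the outer `err` stays nil and the channel always receives true.

```go
package main

import (
	"errors"
	"fmt"
)

var errRead = errors.New("read failed")

// readMeta is a hypothetical stand-in that always fails.
func readMeta() ([]byte, error) { return nil, errRead }

// buggy shows the shadowing pattern: `:=` inside the closure declares a new
// err, so the outer err is never assigned and stays nil.
func buggy() bool {
	done := make(chan bool, 1)
	var err error
	func() {
		_, err := readMeta() // BUG: shadows the outer err
		if err != nil {
			return
		}
	}()
	done <- err == nil // always true
	return <-done
}

// fixed assigns to the outer err with `=`, so the failure is reported.
func fixed() bool {
	done := make(chan bool, 1)
	var err error
	func() {
		_, err = readMeta()
	}()
	done <- err == nil
	return <-done
}

func main() {
	fmt.Println("buggy reports success:", buggy())
	fmt.Println("fixed reports success:", fixed())
}
```

Linters such as the `shadow` analyzer for `go vet` can flag this pattern, although it is not enabled by default.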
Krishnan Parthasarathi	9fbd931058	Skip versions expired by DeleteAllVersionsAction (#18537) Object versions expired by DeleteAllVersionsAction must not be counted toward data-usage accounting.	2023-11-28 08:39:21 -08:00
jiuker	b0264bdb90	preserve null version delete marker on suspended bucket version (#18547)	2023-11-28 08:31:33 -08:00
bestgopher	95d6f43cc8	fix(cmd/notification.go): no error when retry successful (#18530)	2023-11-27 22:41:03 -08:00
Anis Eleuch	9cb94eb4a9	cleaning up will delete instead of rename to trash with full disk err (#18534) The moveToTrash() function moves a folder to .trash, for example during object deletions: a data dir with many parts is renamed into the trash folder. However, ENOSPC is a valid error from rename(), and it can cripple a user trying to free space on a full disk. Therefore, this commit falls back to a recursive delete in that case.	2023-11-27 17:36:02 -08:00
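A Go sketch of the ENOSPC fallback from #18534. This `moveToTrash` is a simplified stand-in, not MinIO's function. The `rename` parameter exists only so the sketch can simulate a full disk, and the directory layout is hypothetical. `errors.Is` sees through `*os.LinkError`, which is what `os.Rename` returns, to the underlying `syscall.ENOSPC`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// moveToTrash renames dir into trashDir. If rename fails with ENOSPC (disk
// full), it falls back to deleting dir recursively so space can still be freed.
func moveToTrash(dir, trashDir string, rename func(oldpath, newpath string) error) error {
	err := rename(dir, filepath.Join(trashDir, filepath.Base(dir)))
	if errors.Is(err, syscall.ENOSPC) {
		return os.RemoveAll(dir)
	}
	return err
}

// simulateFullDisk creates a data dir, fakes an ENOSPC rename, and reports
// whether the fallback removed the directory.
func simulateFullDisk() (bool, error) {
	root, err := os.MkdirTemp("", "trash-sketch")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)
	dataDir := filepath.Join(root, "data-dir")
	if err := os.MkdirAll(filepath.Join(dataDir, "part.1"), 0o755); err != nil {
		return false, err
	}
	enospc := func(oldpath, newpath string) error {
		return &os.LinkError{Op: "rename", Old: oldpath, New: newpath, Err: syscall.ENOSPC}
	}
	if err := moveToTrash(dataDir, filepath.Join(root, ".trash"), enospc); err != nil {
		return false, err
	}
	_, statErr := os.Stat(dataDir)
	return os.IsNotExist(statErr), nil
}

func main() {
	removed, err := simulateFullDisk()
	fmt.Println("removed via fallback:", removed, "err:", err)
}
```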
Harshavardhana	bd0819330d	avoid Walk() API listing objects without quorum (#18535) This allows batch replication to skip copying objects that do not have read quorum. This PR also allows walk() to use custom quorum values for batch replication and key rotation.	2023-11-27 17:20:04 -08:00
Harshavardhana	8d9e83fd99	support passing signatureAge conditional (#18529) This PR allows the following policy:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Deny a presigned URL request if the signature is more than 10 min old",
      "Effect": "Deny",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET1/*",
      "Condition": {
        "NumericGreaterThan": {
          "s3:signatureAge": 600000
        }
      }
    }
  ]
}
```
This effectively disables all presigned URLs whose signature is more than 10 minutes (600000 ms) old.	2023-11-27 11:30:19 -08:00
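The policy in #18529 compares `s3:signatureAge`, measured in milliseconds, against 600000 using `NumericGreaterThan`. The Go sketch below shows only that arithmetic. It assumes the signing time is already known as a Unix timestamp in milliseconds. How MinIO derives it from the request is not shown, and the helper names are hypothetical. `time.Time.UnixMilli` requires Go 1.17 or newer.

```go
package main

import (
	"fmt"
	"time"
)

// signatureAgeMillis returns how long ago the request was signed, in
// milliseconds, the unit the s3:signatureAge condition is compared in.
func signatureAgeMillis(signedAtMs, nowMs int64) int64 {
	return nowMs - signedAtMs
}

// denyBySignatureAge mirrors NumericGreaterThan: deny only when the age is
// strictly greater than maxAgeMs.
func denyBySignatureAge(signedAtMs, nowMs, maxAgeMs int64) bool {
	return signatureAgeMillis(signedAtMs, nowMs) > maxAgeMs
}

func main() {
	now := time.Now()
	signedAt := now.Add(-11 * time.Minute)
	fmt.Println("deny:", denyBySignatureAge(signedAt.UnixMilli(), now.UnixMilli(), 600000))
}
```

Because the operator is strictly "greater than," a signature exactly 10 minutes old is still allowed.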
jiuker	be02333529	feat: drive sub-sys to max timeout reload (#18501)	2023-11-27 09:15:06 -08:00
Harshavardhana	506f121576	remove frivolous logging in transition object (#18526) AWS S3 frequently closes keep-alive connections, leading to frivolous log entries filling up the MinIO logs when the transition tier is an AWS S3 bucket. Ignore such transient errors and let MinIO retry when it can.	2023-11-26 22:18:09 -08:00
Klaus Post	ca488cce87	Add detailed parameter tracing + custom prefix (#18518) Allow a custom prefix per handler; add automatic parameter extraction.	2023-11-26 01:32:59 -08:00
Shireesh Anjal	11dc723324	Pass SUBNET URL to console (#18503) When MinIO runs with MINIO_CI_CD=on, it is expected to communicate with a locally running SUBNET. MinIO already does this via its call-home functionality. However, the SUBNET-related functionality inside the console continued to talk to the SUBNET production URL, so the console could not be tested against a locally running SUBNET. Set the env variable CONSOLE_SUBNET_URL correctly in such cases. (The console already has code to use the value of this variable as the SUBNET URL.)	2023-11-24 09:59:35 -08:00
Shubhendu	dd6ea18901	fix: No shallow copy needed when looking at r.Form (#18499) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-11-24 09:46:55 -08:00
Harshavardhana	9032f49f25	DiskInfo() must return errDiskNotFound not internal errors (#18514)	2023-11-24 09:07:14 -08:00
Anis Eleuch	fbc6f3f6e8	snowball-repl: Add support of immediate tiering (#18508) Also, fix a possible crash when some fields are missing from the batch snowball YAML.	2023-11-22 16:33:11 -08:00
Harshavardhana	fba883839d	feat: bring new HDD related performance enhancements (#18239) Optionally allows customers to enable an external cache for GET/HEAD responses, and to enable skipping disks that are slow to respond to GET/HEAD once quorum has already been achieved.	2023-11-22 13:46:17 -08:00
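"Skip slow disks once quorum is reached" can be sketched in Go as a concurrent fan-out that returns after `quorum` successful replies and cancels the rest. The `Disk` function type and helpers are hypothetical, and the sketch ignores the external-cache half of #18239. The result channel is buffered so goroutines for disks that answer late never block.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errReadQuorum = errors.New("read quorum not met")

// Disk is a hypothetical metadata read against one drive.
type Disk func(ctx context.Context) (string, error)

// readWithQuorum queries all disks concurrently and returns as soon as quorum
// of them have answered, without waiting for slow disks.
func readWithQuorum(disks []Disk, quorum int) ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // signals slow disks to stop once we have an answer
	type result struct {
		val string
		err error
	}
	results := make(chan result, len(disks)) // buffered: late replies never block
	for _, d := range disks {
		go func(d Disk) {
			v, err := d(ctx)
			results <- result{v, err}
		}(d)
	}
	var got []string
	failed := 0
	for range disks {
		r := <-results
		if r.err != nil {
			failed++
			if len(disks)-failed < quorum {
				return nil, errReadQuorum // quorum is no longer reachable
			}
			continue
		}
		got = append(got, r.val)
		if len(got) == quorum {
			return got, nil
		}
	}
	return nil, errReadQuorum
}

func fastDisk(v string) Disk {
	return func(context.Context) (string, error) { return v, nil }
}

func slowDisk() Disk {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(time.Minute):
			return "late", nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

func failingDisk() Disk {
	return func(context.Context) (string, error) { return "", errors.New("drive offline") }
}

func main() {
	start := time.Now()
	got, err := readWithQuorum([]Disk{fastDisk("v1"), slowDisk(), fastDisk("v1"), fastDisk("v1")}, 3)
	fmt.Println(got, err, "took", time.Since(start).Round(time.Millisecond))
}
```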
Krishnan Parthasarathi	a93214ea63	ilm: ObjectSizeLessThan and ObjectSizeGreaterThan (#18500)	2023-11-22 13:42:39 -08:00
Klaus Post	e6b0fc465b	tweak healing to include version-id in healing result (#18225)	2023-11-22 12:30:31 -08:00
Anis Eleuch	70fbcfee4a	Implement batch snowball (#18485)	2023-11-22 10:51:46 -08:00
Sveinn	d67e4d5b17	fix: check for bucket existence before FTP upload (#18496)	2023-11-21 21:36:32 -08:00
Harshavardhana	fe3e49c4eb	use Access(F_OK) do not need to check for permissions (#18492)	2023-11-21 15:08:41 -08:00
Shubhendu	58306a9d34	Replicate Expiry ILM configs while site replication (#18130) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-11-21 09:48:06 -08:00
Harshavardhana	a4cfb5e1ed	return errors if dataDir is missing during HeadObject() (#18477) Bonus: allow replication to attempt Deletes/Puts when the remote returns quorum errors of some kind; this ensures that MinIO can rewrite the namespace with the latest version that exists on the source.	2023-11-20 21:33:47 -08:00
Klaus Post	51aa59a737	perf: websocket grid connectivity for all internode communication (#18461) This PR adds a WebSocket grid feature that allows servers to communicate via a single two-way connection.

There are two request types:

* Single requests, which are `[]byte => ([]byte, error)`. These are for efficient small roundtrips with small payloads.
* Streaming requests, which are `[]byte, chan []byte => chan []byte (and error)`. These allow for different combinations of full two-way streams with an initial payload.

Only a single stream is created between two machines, so there is no server/client relation: both sides can initiate and handle requests. Which server initiates the request is decided deterministically from the server names.

Requests are made through a mux client and server, which handle message passing, congestion, cancellation, timeouts, etc.

If a connection is lost, all requests are canceled, and the calling server will try to reconnect. Registered handlers can operate directly on byte slices or use a higher-level generics abstraction.

There is no versioning of handlers/clients; incompatible changes should be handled by adding new handlers. The request path can be changed to a new one for any protocol changes.

First, all servers create a "Manager." The manager must know its own address as well as all remote addresses, and it manages all connections. To get a connection to any remote, ask the manager for it by remote address, using:

```Go
func (m Manager) Connection(host string) Connection
```

All server-side handlers must also be registered on the manager. This makes sure that all incoming requests are served. For streaming requests, the number of in-flight requests and responses must also be given.

The returned "Connection" manages the mux clients. Requests issued to the connection are sent to the remote.

* `func (c Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)` performs a single request and returns the result. Any deadline provided on the request is forwarded to the server, and canceling the context makes the function return at once.
* `func (c Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st Stream, err error)` initiates a remote call and sends the initial payload.

```Go
// A Stream is a two-way stream.
// All responses must be read by the caller.
// If the call is canceled through the context,
// the appropriate error will be returned.
type Stream struct {
	// Responses from the remote server.
	// Channel will be closed after an error or when the remote closes.
	// All responses must be read by the caller until either an error is returned or the channel is closed.
	// Canceling the context will cause the context cancellation error to be returned.
	Responses <-chan Response

	// Requests sent to the server.
	// If the handler is defined with 0 incoming capacity this will be nil.
	// Channel must be closed to signal the end of the stream.
	// If the request context is canceled, the stream will no longer process requests.
	Requests chan<- []byte
}

type Response struct {
	Msg []byte
	Err error
}
```

There are generic versions of the server/client handlers that allow type-safe implementations for data types that support msgpack marshal/unmarshal.	2023-11-20 17:09:35 -08:00
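To make the single-request semantics concrete, here is an in-process Go sketch of a handler registry keyed by handler ID. The handler receives the caller's context, and `Request` returns as soon as that context ends. It is not the grid package: there is no WebSocket, no streaming, and no remote peer. `Mux`, `SingleHandler`, `roundTrip`, and the handler IDs are hypothetical names chosen to echo the description above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"sync"
	"time"
)

// HandlerID identifies a registered handler (a hypothetical stand-in).
type HandlerID uint8

// SingleHandler processes one []byte request and returns a []byte response.
type SingleHandler func(ctx context.Context, req []byte) ([]byte, error)

var errUnknownHandler = errors.New("unknown handler")

// Mux maps handler IDs to handlers and honors the caller's context.
type Mux struct {
	mu       sync.RWMutex
	handlers map[HandlerID]SingleHandler
}

func NewMux() *Mux { return &Mux{handlers: map[HandlerID]SingleHandler{}} }

func (m *Mux) Register(id HandlerID, h SingleHandler) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.handlers[id] = h
}

// Request runs the handler with the caller's context (so deadlines propagate)
// and returns at once if that context is canceled.
func (m *Mux) Request(ctx context.Context, id HandlerID, req []byte) ([]byte, error) {
	m.mu.RLock()
	h, ok := m.handlers[id]
	m.mu.RUnlock()
	if !ok {
		return nil, errUnknownHandler
	}
	type reply struct {
		resp []byte
		err  error
	}
	done := make(chan reply, 1) // buffered so the handler goroutine never blocks after cancel
	go func() {
		resp, err := h(ctx, req)
		done <- reply{resp, err}
	}()
	select {
	case r := <-done:
		return r.resp, r.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

const (
	handlerUpper HandlerID = iota + 1
	handlerSlow
)

func newDemoMux() *Mux {
	m := NewMux()
	m.Register(handlerUpper, func(_ context.Context, req []byte) ([]byte, error) {
		return []byte(strings.ToUpper(string(req))), nil
	})
	m.Register(handlerSlow, func(ctx context.Context, _ []byte) ([]byte, error) {
		<-ctx.Done() // pretend to work until the caller gives up
		return nil, ctx.Err()
	})
	return m
}

// roundTrip sends payload to handler id on a fresh demo mux.
func roundTrip(id HandlerID, payload string) (string, error) {
	resp, err := newDemoMux().Request(context.Background(), id, []byte(payload))
	return string(resp), err
}

// canceledRequestReturns reports whether a slow request returns with
// context.DeadlineExceeded once its deadline passes.
func canceledRequestReturns() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	_, err := newDemoMux().Request(ctx, handlerSlow, nil)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	out, err := roundTrip(handlerUpper, "hello")
	fmt.Println(out, err, canceledRequestReturns())
}
```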
Anis Eleuch	02331a612c	batch-repl: Replicate missing metadata and standard headers (#18484) Replicate Expires when the source is local or remote; replicate metadata when the source is remote.	2023-11-18 19:12:44 -08:00
Anis Eleuch	8317557f70	decom: Fix listing quorum to be equal to deletion quorum (#18476) With an odd number of drives per erasure set, the write quorum is half + 1; however, the decommissioning listing would still list those objects and not consider them stale. Fix it by using the (N+1)/2 formula. Co-authored-by: Anis Elleuch <anis@min.io>	2023-11-17 21:09:09 -08:00
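The arithmetic behind #18476: for an odd drive count N = 2k + 1, integer "half + 1" is k + 1, and (N+1)/2 is also k + 1, so the listing quorum now matches the write/deletion quorum. The Go sketch below assumes the write quorum is exactly half + 1, as the commit states. Real MinIO quorums also depend on the parity configuration, and the function names are hypothetical.

```go
package main

import "fmt"

// writeQuorumOdd returns half + 1 (integer division) for an odd drive count,
// the write quorum described in the commit message.
func writeQuorumOdd(drives int) int { return drives/2 + 1 }

// listingQuorum is the (N+1)/2 formula the commit switches the listing to.
func listingQuorum(drives int) int { return (drives + 1) / 2 }

func main() {
	for _, n := range []int{3, 5, 7, 9} {
		fmt.Printf("N=%d write quorum=%d listing quorum=%d\n", n, writeQuorumOdd(n), listingQuorum(n))
	}
}
```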
Anis Eleuch	1bb7a2a295	Immediate transition ILM to avoid quick deferring to the scanner (#18475) Immediate transition is mostly used to fill the warm backend with a lot of data when a new deployment is created. Currently, if the transition queue is full, the transition is deferred to the scanner; change this behavior by blocking the PUT request until the transition queue has room for a new transition task.	2023-11-17 16:16:46 -08:00
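Blocking the PUT until the transition queue has room maps naturally onto a `select` between the channel send and the request context. The Go sketch below assumes a plain `chan string` as the queue and hypothetical helper names. In contrast to the non-blocking enqueue used for the hot tier, this call waits, and gives up only when the request context ends.

```go
package main

import (
	"context"
	"fmt"
)

// queueImmediate blocks the caller (the PUT) until the transition queue has
// room, rather than deferring the task to the scanner. It gives up only if
// the request context is canceled.
func queueImmediate(ctx context.Context, q chan<- string, object string) error {
	select {
	case q <- object:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// waitsForRoom fills the queue, lets a worker take one task, and shows that
// the blocked enqueue then succeeds.
func waitsForRoom() (string, error) {
	q := make(chan string, 1)
	q <- "older-object" // queue is full
	go func() { <-q }() // a transition worker frees a slot
	if err := queueImmediate(context.Background(), q, "new-object"); err != nil {
		return "", err
	}
	return <-q, nil
}

// canceledWhileFull shows the PUT giving up when its context ends first.
func canceledWhileFull() error {
	q := make(chan string, 1)
	q <- "older-object"
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return queueImmediate(ctx, q, "new-object")
}

func main() {
	fmt.Println(waitsForRoom())
	fmt.Println(canceledWhileFull())
}
```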
Harshavardhana	0a286153bb	remove checking for BucketInfo() peer call for every PUT() (#18464) We already validate whether the bucket exists in RenameData(), which handles this cleanly, instead of making a network call and returning errors.	2023-11-17 05:29:50 -08:00
Anis Eleuch	22d59e757d	Remove stale data in HEAD/GET object (#18460) Currently, if the object does not exist on a quorum of disks in an erasure set, the dangling code is never called because the returned error is errFileNotFound or errFileVersionNotFound. With this commit, when errFileNotFound or errFileVersionNotFound is returned while calculating the quorum of a given object, the code checks whether any disk returned nil, which means a stale object exists on that disk; this triggers the deleteIfDangling() function.	2023-11-16 08:39:53 -08:00
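The decision described in #18460 can be sketched as a predicate over the quorum-level error and the per-disk errors. `errFileNotFound` and `errFileVersionNotFound` below are local stand-ins named after the commit's errors, and `shouldCheckDangling` is a hypothetical helper. The actual deletion done by `deleteIfDangling()` is not shown.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins named after the errors in the commit message.
var (
	errFileNotFound        = errors.New("file not found")
	errFileVersionNotFound = errors.New("file version not found")
)

// shouldCheckDangling returns true when the quorum-level error says the
// object (or version) is not found, yet at least one disk returned nil,
// meaning that disk still holds a stale copy worth cleaning up.
func shouldCheckDangling(quorumErr error, diskErrs []error) bool {
	if !errors.Is(quorumErr, errFileNotFound) && !errors.Is(quorumErr, errFileVersionNotFound) {
		return false
	}
	for _, err := range diskErrs {
		if err == nil {
			return true
		}
	}
	return false
}

func main() {
	diskErrs := []error{errFileNotFound, nil, errFileNotFound, errFileNotFound}
	fmt.Println("check dangling:", shouldCheckDangling(errFileNotFound, diskErrs))
}
```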
Andreas Auernhammer	0daa2dbf59	health: split liveness and readiness handler (#18457) This commit splits the liveness and readiness handler into two separate handlers. In K8S, a liveness probe is used to determine whether the pod is in a "live" state and functioning at all. In contrast, the readiness probe is used to determine whether the pod is ready to serve requests. A failing liveness probe causes pod restarts, while a failing readiness probe causes k8s to stop routing traffic to the pod. Hence, a liveness probe should be as robust as possible, while a readiness probe should be used for load balancing. Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ Signed-off-by: Andreas Auernhammer <github@aead.dev>	2023-11-16 01:51:27 -08:00
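A Go `net/http` sketch of the liveness/readiness split. Liveness has no dependencies, so it rarely fails and does not trigger needless restarts. Readiness reflects a `ready` flag that gates traffic. The two paths and the `ready` flag are assumptions for illustration, not MinIO's handler code. `atomic.Bool` requires Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready is a hypothetical flag flipped once the server can serve requests.
var ready atomic.Bool

// livenessHandler only proves the process is up; it has no dependencies, so
// it rarely fails and does not cause needless pod restarts.
func livenessHandler(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// readinessHandler tells the load balancer whether to route traffic here.
func readinessHandler(w http.ResponseWriter, _ *http.Request) {
	if !ready.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func newHealthMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/minio/health/live", livenessHandler)
	mux.HandleFunc("/minio/health/ready", readinessHandler)
	return mux
}

// probe returns the HTTP status code a GET to path would receive.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newHealthMux()
	fmt.Println("live:", probe(mux, "/minio/health/live"), "ready:", probe(mux, "/minio/health/ready"))
	ready.Store(true)
	fmt.Println("ready after startup:", probe(mux, "/minio/health/ready"))
}
```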
Praveen raj Mani	38f35463b7	Load bucket configs during the metadata refresh (#18449) This patch takes care of loading the bucket configs of failed buckets during the periodic refresh. This makes sure the event notifiers and remote bucket targets are properly initialized.	2023-11-15 12:43:25 -08:00


5712 Commits