minio

mirror of https://github.com/minio/minio.git synced 2024-12-25 22:55:54 -05:00

Author	SHA1	Message	Date
Anis Eleuch	bf1769d3e0	xl: Avoid marking a drive offline after one part read failure (#19779 ) This commit fixes a rare case where a multipart object that can be read in theory caused the GetObject API to return an error. It turned out that six-year-old code was marking a drive offline whenever bitrot streaming failed to read a part from that drive, regardless of the error. This could break reading a subsequent part: even though enough shards were available, the part could not be reconstructed because one drive had been marked offline earlier. This commit removes the code that marks the drive offline. It also closes the bitrot streaming reader before setting it to nil.	2024-05-21 07:36:21 -07:00
Klaus Post	d4b391de1b	Add PutObject Ring Buffer (#19605 ) Replace the `io.Pipe` from streamingBitrotWriter -> CreateFile with a fixed-size ring buffer.
This will add an output buffer for encoded shards to be written to disk - potentially via RPC. This will remove blocking when `(*streamingBitrotWriter).Write` is called to write hashes and data.
With current settings, the write looks like this:
```
                                                                                                              Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │    Parr.    │                │  (http body)  │               │                      │                │
│ Bitrot Hash       │    Write    │      Pipe      │     Read      │  HTTP buffer  │   Write (syscall)    │   TCP Buffer   │
│ Erasure Shard     │ ──────────► │  (unbuffered)  │ ────────────► │   (64K Max)   │ ───────────────────► │     (4MB)      │
│                   │             │                │               │   (io.Copy)   │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```
We write a Hash (32 bytes). Since the pipe is unbuffered, it will block until the 32 bytes have been delivered to the TCP buffer and the next Read hits the Pipe.
Then we write the shard data. This will typically be bigger than 64KB, so it will block until two blocks have been read from the pipe.
When we insert a ring buffer:
```
                                                                                                              Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │             │                │  (http body)  │               │                      │                │
│ Bitrot Hash       │    Write    │  Ring Buffer   │     Read      │  HTTP buffer  │   Write (syscall)    │   TCP Buffer   │
│ Erasure Shard     │ ──────────► │     (2MB)      │ ────────────► │   (64K Max)   │ ───────────────────► │     (4MB)      │
│                   │             │                │               │   (io.Copy)   │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```
The hash+shard will fit within the ring buffer, so writes will not block - but will complete after a memcopy. Reads can fill the 64KB buffer if there is data for it.
If the network is congested, the ring buffer will become filled, and all syscalls will be on full buffers. Only when the ring buffer is filled will erasure coding start blocking.
Since there is always "space" to write output data, we remove the parallel writing: we are always writing to memory now, and the goroutine synchronization overhead is probably not worth taking. If the output was blocked in the existing code, we would still wait for it to unblock in parallel write, so it would make no difference there - except now the ring buffer smooths out the load.
There are some micro-optimizations we could look at later. The biggest is that, in most cases, we could encode directly to the ring buffer - if we are not at a boundary. Also, "force filling" the Read requests (i.e., blocking until a full read can be completed) could be investigated and maybe allow concurrent memory on read and write.	2024-05-14 17:11:04 -07:00
Klaus Post	ec816f3840	Reduce parallelReader allocs (#19558 )	2024-04-19 09:44:59 -07:00
Harshavardhana	caac9d216e	remove all the frivolous logs that may or may not be actionable (#18922 ) For actionable inspections we have `mc support inspect`; we do not need double logging. Healing will report relevant errors, if any, in terms of quorum lost etc.	2024-01-30 18:11:45 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) Brings a generic implementation that prints a stack trace when close() is called on a 'nil' channel; otherwise it closes the channel safely.	2024-01-28 10:04:17 -08:00
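The generic close helper described in commit 1d3bd02089 can be sketched as follows, assuming Go 1.18+ generics. The name `safeClose` is hypothetical; the point is that `close` on a nil channel panics in Go, so the helper prints a stack trace to locate the caller instead of crashing.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// safeClose is a hypothetical generic helper: it closes c when it is
// non-nil; for a nil channel it prints the current goroutine's stack
// to stderr instead of panicking, so the bad caller can be found.
func safeClose[T any](c chan T) {
	if c != nil {
		close(c)
		return
	}
	debug.PrintStack()
}

func main() {
	done := make(chan struct{})
	safeClose(done)
	_, open := <-done
	fmt.Println("closed:", !open) // closed: true

	var missing chan struct{} // nil channel: close(missing) would panic
	safeClose(missing)        // prints a stack trace and returns
}
```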
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Harshavardhana	45fb375c41	allow healing to prefer local disks over remote (#17788 )	2023-08-03 02:18:18 -07:00
Kaan Kabalak	21fbe88e1f	Print certain log messages once per error (#17484 )	2023-06-24 20:29:13 -07:00
Anis Eleuch	54c5c88fe6	Add number of offline disks in quorum errors (#16822 )	2023-05-25 09:39:06 -07:00
Harshavardhana	38ccc4f672	fix: make sure to avoid calling RenameData() on disconnected disks. (#14094 ) Large clusters with multiple sets, or multi-pool setups, might at times fail and report unexpected "file not found" errors. This can become a problem during the startup sequence when some files need to be created at multiple locations. - This PR ensures that we nil the erasure writers so that they are skipped in the RenameData() call. - RenameData() doesn't need to make "Access()" calls for `.minio.sys` folders; they always exist. - Make sure PutObject() never returns ObjectNotFound{} for any error, and that it always returns "WriteQuorum" when renameData() fails with ObjectNotFound{}. Return appropriate errors for all other cases.	2022-01-12 18:49:01 -08:00
jiangfucheng	7460fb8349	fix padding error and stay compatible with uploaded objects (#13803 )	2021-12-03 09:26:30 -08:00
Harshavardhana	ec8d93f756	fix: add missing readTriggerCh close (#12593 )	2021-06-29 08:47:15 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	d84261aa6d	fix: ensure proper usage of DataDir (#12300 ) - GetObject() should always use a common dataDir to read from when it starts reading; this allows the code in erasure decoding to have sane expectations. - Healing should always heal on the common dataDir; this allows the code in dangling object detection to purge dangling content. Both situations can happen under certain types of retries during PUT, when the server is restarting etc., where some namespace entries might be left over.	2021-05-14 16:50:47 -07:00
Harshavardhana	091845df39	fix: return quorum error upon decode failures (#12184 )	2021-04-29 10:00:03 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	6160188bf3	fix: erasure index based reading based on actual ParityBlocks (#11792 ) In some setups with ordering issues in the drive configuration, we should rely on the expected parityBlocks instead of `len(disks)/2`.	2021-03-15 20:03:13 -07:00
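The point of commit 6160188bf3, deriving the number of data shards from the configured parity rather than from `len(disks)/2`, can be shown with a small sketch. The helper `shardLayout` is hypothetical, and it assumes the common Reed-Solomon convention that data shards occupy the lowest indexes and parity shards follow.

```go
package main

import "fmt"

// shardLayout is a hypothetical helper: it splits the shard indexes of
// an erasure set into data and parity using the configured parity count,
// assuming data shards come first. It does not assume parity is half.
func shardLayout(totalDrives, parityBlocks int) (data, parity []int) {
	dataBlocks := totalDrives - parityBlocks // not totalDrives/2
	for i := 0; i < totalDrives; i++ {
		if i < dataBlocks {
			data = append(data, i)
		} else {
			parity = append(parity, i)
		}
	}
	return data, parity
}

func main() {
	data, parity := shardLayout(12, 4)
	fmt.Println(len(data), len(parity))      // 8 4
	fmt.Println("len(disks)/2 would give", 12/2) // len(disks)/2 would give 6
}
```

With 12 drives and parity 4, halving the drive count would treat 6 shards as data instead of 8, so any index-based reading would look at the wrong shards.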
Harshavardhana	e019f21bda	fix: trigger heal if one of the parts is not found (#11358 ) Previously we added a heal trigger when bit-rot checks failed; now extend that to also heal when parts are not found. This healing is only triggered if we can successfully decode the object, i.e. read quorum is still satisfied for the object.	2021-01-27 10:21:14 -08:00
Harshavardhana	c4b1d394d6	erasure: avoid io.Copy in hotpaths to reduce allocation (#11213 )	2021-01-03 16:27:34 -08:00
Anis Elleuch	677e80c0f8	xl: Remove check-dir in ReadVersion (#11200 ) The only purpose of the check-dir flag in ReadVersion is to return 404 when an object has xl.meta but no data. This causes an extra call to the disk, which can be costly on a busy system where disks receive many concurrent accesses.	2021-01-02 10:35:57 -08:00
Poorna Krishnamoorthy	1ebf6f146a	Add support for ILM transition (#10565 ) This PR adds transition support for ILM to transition data to another MinIO target represented by a storage class ARN. Subsequent GET or HEAD requests for that object will be streamed from the transition tier. If the PostRestoreObject API is invoked, the transitioned object can be restored to the source cluster for the specified duration.	2020-11-19 18:47:17 -08:00
Harshavardhana	2f681bed57	fix: pop entries from each drives in parallel (#9918 )	2020-06-25 23:20:12 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support this; it moves the entire marshaling logic to the POSIX layer, and the top layer always consumes a common FileInfo construct, which simplifies metadata reads. - Implement list object versions. - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	2020-06-12 20:04:01 -07:00
Klaus Post	4a007e3767	Prefer local disks when fetching data blocks (#9563 ) If the requested server is part of the set, this will always read from the local disk, even if the disk contains a parity shard. In the default setup there is a 50% chance that at least one shard that otherwise would have been fetched remotely will be read locally instead. It basically trades RPC call overhead for Reed-Solomon reconstruction. On distributed localhost this seems to be fairly break-even, with a very small gain in throughput and latency. However, on networked servers this should be a bigger gain.
1MB objects, before:
```
Operation: GET. Concurrency: 32. Hosts: 4.
Requests considered: 76257:
 * Avg: 25ms 50%: 24ms 90%: 32ms 99%: 42ms Fastest: 7ms Slowest: 67ms
 * First Byte: Average: 23ms, Median: 22ms, Best: 5ms, Worst: 65ms
Throughput:
 * Average: 1213.68 MiB/s, 1272.63 obj/s (59.948s, starting 14:45:44 CEST)
```
After:
```
Operation: GET. Concurrency: 32. Hosts: 4.
Requests considered: 78845:
 * Avg: 24ms 50%: 24ms 90%: 31ms 99%: 39ms Fastest: 8ms Slowest: 62ms
 * First Byte: Average: 22ms, Median: 21ms, Best: 6ms, Worst: 57ms
Throughput:
 * Average: 1255.11 MiB/s, 1316.08 obj/s (59.938s, starting 14:43:58 CEST)
```
Bonus fix: Only ask for heal once on an object.	2020-05-26 16:47:23 -07:00
Bala FA	95e89f1712	proactive deep heal object when a bitrot is detected (#9192 )	2020-04-01 12:14:00 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00
Krishna Srinivas	730ac5381c	Simplify parallelReader.Read() (#7109 ) Simplify parallelReader.Read(), which also fixes the previous implementation, where it returned before all the parallel reading goroutines had terminated, causing race conditions.	2019-01-18 21:18:24 +05:30
Krishna Srinivas	98c950aacd	Streaming bitrot verification support (#7004 )	2019-01-17 18:28:18 +05:30
Krishna Srinivas	52f6d5aafc	Rename of structs and methods (#6230 ) Rename of ErasureStorage to Erasure (and rename of related variables and methods)	2018-08-23 23:35:37 -07:00

29 Commits