minio

Commit Graph

Author	SHA1	Message	Date
Klaus Post	05aebc52c2	feat: Implement listing version 3.0 (#12605) Co-authored-by: Harshavardhana <harsha@minio.io>	2021-07-05 15:34:41 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`.	2021-06-01 14:59:40 -07:00
Harshavardhana	81d5688d56	move the dependency to minio/pkg for common libraries (#12397)	2021-05-28 15:17:01 -07:00
Harshavardhana	ebf75ef10d	fix: remove all unused code (#12360)	2021-05-24 09:28:19 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	a8e4f64ff3	Revert "fix: remove persistence layer for metacache store in memory (#11538)" This reverts commit `b23659927c`.	2021-02-24 22:24:51 -08:00
Harshavardhana	b23659927c	fix: remove persistence layer for metacache store in memory (#11538) Store the cache in memory instead of on disks to avoid large write amplification for list-heavy workloads, and let it expire automatically.	2021-02-24 15:51:41 -08:00
Andreas Auernhammer	d4b822d697	pkg/etag: add new package for S3 ETag handling (#11577) This commit adds a new package `etag` for dealing with S3 ETags. Even though an ETag is often viewed as the MD5 checksum of an object, handling S3 ETags correctly is a surprisingly complex task. While it is true that the ETag corresponds to the MD5 for the most basic S3 API operations, there are many exceptions in the case of multipart uploads or encryption. Worse, some S3 clients expect very specific behavior when it comes to ETags. For example, some clients expect the ETag to be a double-quoted string and fail otherwise. Non-AWS-compliant ETag handling has been a source of many bugs in the past. Therefore, this commit adds a dedicated `etag` package that provides functionality for parsing, generating and converting S3 ETags. Further, this commit removes the ETag computation from the `hash` package. Instead, the `hash` package (i.e. `hash.Reader`) should focus only on computing and verifying the content-sha256. One core feature of this commit is a mechanism to communicate a computed ETag from a low-level `io.Reader` to a high-level `io.Reader`. This problem occurs when an S3 server receives a request and has to compute the ETag of the content. However, the server may also wrap the initial body with several other `io.Reader`s, e.g. when encrypting or compressing the content: `reader := Encrypt(Compress(ETag(content)))`. In such a case, the ETag should be accessible by the high-level `io.Reader`. The `etag` package provides a mechanism to wrap `io.Reader` implementations such that the ETag can be accessed by a type check. This technique is applied to the PUT, COPY and Upload handlers.	2021-02-23 12:31:53 -08:00
Harshavardhana	b3c56b53fb	fix: metacache should only rename entries during cleanup (#11503) To avoid large delays in metacache cleanup, use rename instead of recursive delete calls. Renames are cheaper: move the content to minioMetaTmpBucket, then clean up this folder once every 24 hours instead. If the new cache can replace an existing one, let it do so, since the new one is currently being saved anyway; this avoids a pile-up of thousands of metacache entries for the same listing calls, which do not need to be stored on disk.	2021-02-11 10:22:03 -08:00
Krishnan Parthasarathi	b87fae0049	Simplify PutObjReader for plain-text reader usage (#11470) This change moves away from a unified constructor for plaintext and encrypted usage. NewPutObjReader is simplified for plain-text reader use. For encrypted reader use, WithEncryption should be called on an initialized PutObjReader. Plaintext: `func NewPutObjReader(rawReader hash.Reader) PutObjReader`. The hash.Reader is used to provide the payload size and md5sum to downstream consumers. This differs from the previous version in that there is no need to pass nil values for unused parameters. Encrypted: `func WithEncryption(encReader hash.Reader, key crypto.ObjectKey) (*PutObjReader, error)`. This method sets up the encrypted reader along with the key used to seal the md5sum produced by the plain-text reader (already set up when NewPutObjReader was called). Usage: call `pReader := NewPutObjReader(rawReader)`, run the other object handler code, then prepare the encrypted hashed reader with `pReader, err = pReader.WithEncryption(encReader, objEncKey)`.	2021-02-10 08:52:50 -08:00
Klaus Post	9b10118d34	Metacache: add absolute entry limit (#11483) Add an absolute limit to the number of metacaches for a bucket. Delete excess caches if they haven't been handed out in an hour.	2021-02-08 11:36:16 -08:00
Harshavardhana	8bb580abfc	fix: use getObjectNInfo to avoid bytes.Buffer usage (#11428) A few places were still using the legacy call GetObject(), which was mainly designed for the client response writer; use GetObjectNInfo() for internal calls instead.	2021-02-05 09:57:30 -08:00
Harshavardhana	1debd722b5	rename last remaining Zone->Pool	2021-01-26 20:47:42 -08:00
Harshavardhana	5982965839	fix: re-use bytes.Buffer using sync.Pool (#11156)	2020-12-22 23:22:37 -08:00
Harshavardhana	f714840da7	add _MINIO_SERVER_DEBUG env for enabling debug messages (#11128)	2020-12-17 16:52:47 -08:00
Harshavardhana	b390a2a0b9	fix: reuse timers in erasure set hotpaths (#11106) Reuse timers in connectDisks() monitoring and in healMRFRoutine() channel timeouts.	2020-12-16 14:33:05 -08:00
Klaus Post	f6fb27e8f0	Don't copy interesting ids, clean up logging (#11102) When searching the caches, don't copy the ids; instead, inline the loop. Before: `Benchmark_bucketMetacache_findCache-32 19200 63490 ns/op 8303 B/op 5 allocs/op`; after: `Benchmark_bucketMetacache_findCache-32 20338 58609 ns/op 111 B/op 4 allocs/op`. Add a reasonable, but still simplistic, benchmark. Bonus: nicer zero-alloc logging.	2020-12-14 13:13:33 -08:00
Klaus Post	82e2be4239	metacache: Speed up cleanup operation (#11078) Perform cleanup operations on copied data. This avoids read-locking the data while determining which caches to keep. Also, reduce the log(N*N) operation to log(N*M), where M is the number of caches with the same root or below, when checking potential replacements.	2020-12-10 12:30:28 -08:00
Klaus Post	e65ed2e44f	listcache: Add path index (#11063) Add a root path index. Before: `Benchmark_bucketMetacache_findCache-32 10000 730737 ns/op`; with excluded prints: `Benchmark_bucketMetacache_findCache-32 10000 207100 ns/op`; with the root path: `Benchmark_bucketMetacache_findCache-32 705765 1943 ns/op`. Benchmark used (not linear): `func Benchmark_bucketMetacache_findCache(b *testing.B) { bm := newBucketMetacache("", false); for i := 0; i < b.N; i++ { bm.findCache(listPathOptions{ID: mustGetUUID(), Bucket: "", BaseDir: "prefix/" + mustGetUUID(), Prefix: "", FilterPrefix: "", Marker: "", Limit: 0, AskDisks: 0, Recursive: false, Separator: slashSeparator, Create: true, CurrentCycle: 0, OldestCycle: 0}) } }`. Replaces #11058.	2020-12-09 08:37:43 -08:00
Harshavardhana	4ec45753e6	rename server sets to server pools	2020-12-01 13:50:33 -08:00
Klaus Post	990d074f7d	metacache: Allow prefix filtering (#10920) Do listings with a prefix filter when the bloom filter is dirty. This forwards the prefix filter to the lister, so it only scans the folders/objects with the specified prefix. If the bloom filter is clean, we try to build a more generally useful cache, so in that case we list all objects/folders.	2020-11-18 10:44:18 -08:00
Harshavardhana	ca88ca753c	ignore typed errors correctly in list cache layer (#10879) Bonus: write the bucket metadata cache with enough quorum. Possible fix for #10868.	2020-11-12 09:28:56 -08:00
Klaus Post	0724205f35	metacache: Add option for life extension (#10837) Add `MINIO_API_EXTEND_LIST_CACHE_LIFE`, which extends the life of generated caches for a while. Caches now remain valid until no updates have been received for the specified time plus a fixed margin. Caches are also no longer invalidated when the first set finishes; instead, they remain valid until the last set has finished and the specified time has passed.	2020-11-05 11:49:56 -08:00
Klaus Post	bd77f29fc4	Don't replace caches that are receiving updates (#10834) Keep caches while they are receiving updates. Move update code to a separate function.	2020-11-05 07:34:08 -08:00
Klaus Post	f0819cce75	Keep transient lists while they are updating (#10826) On extremely long-running listings, keep the transient list for 15 minutes after the last update instead of using the start time. Also, don't do overlap checks on transient lists.	2020-11-04 08:01:33 -08:00
Klaus Post	b9277c8030	metacache: Add trashcan (#10820) Add a trashcan that keeps recently updated lists after bucket deletion. Previously, all caches were deleted once a bucket was deleted, so caches that were still running would report errors. Now they are canceled. Fix `.minio.sys` not being transient.	2020-11-03 12:47:52 -08:00
Klaus Post	fe9f23e632	Recreate bucket metacache if corrupted (#10800) If bucket metadata cannot be read, clean up the existing cache and create a new one.	2020-10-31 10:26:16 -07:00
Harshavardhana	5e5cdc581d	remove unnecessary logging and move to log once (#10798) The current master logs far too much when a node is down; instead, log once and move on.	2020-10-30 14:55:50 -07:00
Klaus Post	6135f072d2	Fix invalidated metacaches (#10784) * Fix caches having EOF marked as a failure. * Simplify cache updates. * Provide context for checkMetacacheState failures. * Log 499 when the client disconnects.	2020-10-30 09:33:16 -07:00
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler are used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable, so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	2020-10-28 09:18:35 -07:00

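Commit 5982965839 mentions reusing `bytes.Buffer` values through `sync.Pool`. The following is a minimal sketch of that standard-library idiom; the `render` helper is hypothetical and is not MinIO code.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render formats a line using a pooled buffer and returns a copy of the result.
func render(name string, n int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous user
	defer bufPool.Put(buf)

	fmt.Fprintf(buf, "%s=%d", name, n)
	return buf.String() // String copies the bytes, so the result is safe after Put
}

func main() {
	fmt.Println(render("objects", 42))
}
```

Resetting on `Get` matters because the pool may return a previously used buffer. A common refinement, not shown here, is to skip `Put` for buffers that have grown very large so they do not pin memory.
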
30 Commits