minio

mirror of https://github.com/minio/minio.git synced 2024-12-25 22:55:54 -05:00

Author	SHA1	Message	Date
Harshavardhana	75741dbf4a	xl: remove cleanupDir instead use Delete() (#11880 ) use a single call to remove directly at disk instead of doing recursively at network layer.	2021-03-24 09:08:05 -07:00
Anis Elleuch	0eb146e1b2	add additional metrics per disk API latency, API call counts #11250 ) ``` mc admin info --json ``` provides these details, for now, we shall eventually expose this at Prometheus level eventually. Co-authored-by: Harshavardhana <harsha@minio.io>	2021-03-16 20:06:57 -07:00
Klaus Post	3ff5f55dcb	Fetch fileinfo concurrently (#11700 ) For non-erasure setups fetch up to 10 fileinfos concurrently. Fixes #11625	2021-03-08 11:30:43 -08:00
Harshavardhana	9ccc483df6	[feat]: change erasure coding default block size from 10MiB to 1MiB (#11721 ) major performance improvements in range GETs to avoid large read amplification when ranges are tiny and random ``` ------------------- Operation: GET Operations: 142014 -> 339421 Duration: 4m50s -> 4m56s * Average: +139.41% (+1177.3 MiB/s) throughput, +139.11% (+658.4) obj/s * Fastest: +125.24% (+1207.4 MiB/s) throughput, +132.32% (+612.9) obj/s * 50% Median: +139.06% (+1175.7 MiB/s) throughput, +133.46% (+660.9) obj/s * Slowest: +203.40% (+1267.9 MiB/s) throughput, +198.59% (+753.5) obj/s ``` TTFB from 10MiB BlockSize ``` * First Access TTFB: Avg: 81ms, Median: 61ms, Best: 20ms, Worst: 2.056s ``` TTFB from 1MiB BlockSize ``` * First Access TTFB: Avg: 22ms, Median: 21ms, Best: 8ms, Worst: 91ms ``` Full object reads however do see a slight change which won't be noticeable in real world, so not doing any comparisons TTFB still had improvements with full object reads with 1MiB ``` * First Access TTFB: Avg: 68ms, Median: 35ms, Best: 11ms, Worst: 1.16s ``` v/s TTFB with 10MiB ``` * First Access TTFB: Avg: 388ms, Median: 98ms, Best: 20ms, Worst: 4.156s ``` This change should affect all new uploads, previous uploads should continue to work with business as usual. But dramatic improvements can be seen with these changes.	2021-03-06 14:09:34 -08:00
Harshavardhana	76e2713ffe	fix: use buffers only when necessary for io.Copy() (#11229 ) Use separate sync.Pool for writes/reads Avoid passing buffers for io.CopyBuffer() if the writer or reader implement io.WriteTo or io.ReadFrom respectively then its useless for sync.Pool to allocate buffers on its own since that will be completely ignored by the io.CopyBuffer Go implementation. Improve this wherever we see this to be optimal. This allows us to be more efficient on memory usage. ``` 385 // copyBuffer is the actual implementation of Copy and CopyBuffer. 386 // if buf is nil, one is allocated. 387 func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) { 388 // If the reader has a WriteTo method, use it to do the copy. 389 // Avoids an allocation and a copy. 390 if wt, ok := src.(WriterTo); ok { 391 return wt.WriteTo(dst) 392 } 393 // Similarly, if the writer has a ReadFrom method, use it to do the copy. 394 if rt, ok := dst.(ReaderFrom); ok { 395 return rt.ReadFrom(src) 396 } ``` From readahead package ``` // WriteTo writes data to w until there's no more data to write or when an error occurs. // The return value n is the number of bytes written. // Any error encountered during the write is also returned. func (a *reader) WriteTo(w io.Writer) (n int64, err error) { if a.err != nil { return 0, a.err } n = 0 for { err = a.fill() if err != nil { return n, err } n2, err := w.Write(a.cur.buffer()) a.cur.inc(n2) n += int64(n2) if err != nil { return n, err } ```	2021-01-06 09:36:55 -08:00
Harshavardhana	17a5ff51ff	fix: move context timeout closer to network for Delete calls (#10897 ) allowing for disconnects to be limited to the drive themselves instead of disconnecting all drives.	2020-11-13 16:56:45 -08:00
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler is used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	2020-10-28 09:18:35 -07:00
Harshavardhana	029758cb20	fix: retain the previous UUID for newly replaced drives (#10759 ) only newly replaced drives get the new `format.json`, this avoids disks reloading their in-memory reference format, ensures that drives are online without reloading the in-memory reference format. keeping reference format in-tact means UUIDs never change once they are formatted.	2020-10-26 10:29:29 -07:00
Harshavardhana	6a8c62f9fd	make sure to preserve UUID from reference format (#10748 ) reference format should be source of truth for inconsistent drives which reconnect, add them back to their original position remove automatic fix for existing offline disk uuids	2020-10-24 13:23:08 -07:00
Harshavardhana	18063bf25c	fix: cleanup old directory handling code (#10633 ) we don't need them anymore, remove legacy code.	2020-10-06 12:03:57 -07:00
Klaus Post	2d58a8d861	Add storage layer contexts (#10321 ) Add context to all (non-trivial) calls to the storage layer. Contexts are propagated through the REST client. - `context.TODO()` is left in place for the places where it needs to be added to the caller. - `endWalkCh` could probably be removed from the walkers, but no changes so far. The "dangerous" part is that now a caller disconnecting will propagate down, so a "delete" operation will now be interrupted. In some cases we might want to disconnect this functionality so the operation completes if it has started, leaving the system in a cleaner state.	2020-09-04 09:45:06 -07:00
Harshavardhana	d19b434ffc	fix: bring back delayed leaf detection in listing (#10346 )	2020-08-25 12:26:48 -07:00
Klaus Post	17a1eda702	Disregard healing disks in crawling (#10349 ) When crawling never use a disk we know is healing. Most of the change involves keeping track of the original endpoint on xlStorage and this also fixes DiskInfo.Endpoint never being populated. Heal master will print `data-crawl: Disk "http://localhost:9001/data/mindev/data2/xl1" is Healing, skipping` once on a cycle (no more often than every 5m).	2020-08-25 10:55:15 -07:00
Harshavardhana	35212b673e	add unformatted disk as part of the error list (#10128 ) these errors should be ignored for quorum error calculation to ensure that we don't prematurely return unformatted disk error as part of API calls	2020-07-24 13:16:11 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	2020-06-12 20:04:01 -07:00
Anis Elleuch	9baeda781a	fix storage info output with unordered endpoints arguments (#9610 ) Shuffling arguments that we pass to MinIO server are supported. However, when that happens, Prometheus returns wrong information about disks usage and online/offline status. The commit fixes the issue by avoiding relying on xl.endpoints since it is not ordered.	2020-05-19 14:27:20 -07:00
Harshavardhana	bd032d13ff	migrate all bucket metadata into a single file (#9586 ) this is a major overhaul by migrating off all bucket metadata related configs into a single object '.metadata.bin' this allows us for faster bootups across 1000's of buckets and as well as keeps the code simple enough for future work and additions. Additionally also fixes #9396, #9394	2020-05-19 13:53:54 -07:00
Harshavardhana	6ac48a65cb	fix: use unused cacheMetrics code in prometheus (#9588 ) remove all other unusued/deadcode	2020-05-13 08:15:26 -07:00
Harshavardhana	a1de9cec58	cleanup object-lock/bucket tagging for gateways (#9548 ) This PR is to ensure that we call the relevant object layer APIs for necessary S3 API level functionalities allowing gateway implementations to return proper errors as NotImplemented{} This allows for all our tests in mint to behave appropriately and can be handled appropriately as well.	2020-05-08 13:44:44 -07:00
Harshavardhana	27d716c663	simplify usage of mutexes and atomic constants (#9501 )	2020-05-03 22:35:40 -07:00
Anis Elleuch	0af62d35a0	xl: Implement posix.DeletePrefixes to enhance delete perf (#9100 ) Bulk delete API was using cleanupObjectsBulk() which calls posix listing and delete API to remove objects internal files in the backend (xl.json and parts) one by one. Add DeletePrefixes in the storage API to remove the content of a directory in a single call. Also use a remove goroutine for each disk to accelerate removal.	2020-03-11 08:56:36 -07:00
Harshavardhana	6f66f1a910	close channel upon error in Walk()'er (#9042 )	2020-02-25 19:58:58 -08:00
Harshavardhana	23a8411732	Add a generic Walk()'er to list a bucket, optinally prefix (#9026 ) This generic Walk() is used by likes of Lifecyle, or KMS to rotate keys or any other functionality which relies on this functionality.	2020-02-25 21:22:28 +05:30
Klaus Post	9990464cd5	Fix recursive deep scan of buckets (#8900 )	2020-01-30 17:20:07 +05:30
Harshavardhana	cc02bf0442	Remove old ListenBucketNotification API (#8645 )	2019-12-13 11:33:11 -08:00
Anis Elleuch	555969ee42	Add data usage collect with its new admin API (#8553 ) Admin data usage info API returns the following (Only FS & XL, for now) - Number of buckets - Number of objects - The total size of objects - Objects histogram - Bucket sizes	2019-12-12 06:02:37 -08:00
Nitish Tiwari	3df7285c3c	Add Support for Cache and S3 related metrics in Prometheus endpoint (#8591 ) This PR adds support below metrics - Cache Hit Count - Cache Miss Count - Data served from Cache (in Bytes) - Bytes received from AWS S3 - Bytes sent to AWS S3 - Number of requests sent to AWS S3 Fixes #8549	2019-12-05 23:16:06 -08:00
Harshavardhana	e9b2bf00ad	Support MinIO to be deployed on more than 32 nodes (#8492 ) This PR implements locking from a global entity into a more localized set level entity, allowing for locks to be held only on the resources which are writing to a collection of disks rather than a global level. In this process this PR also removes the top-level limit of 32 nodes to an unlimited number of nodes. This is a precursor change before bring in bucket expansion.	2019-11-13 12:17:45 -08:00
Harshavardhana	9e7a3e6adc	Extend further validation of config values (#8469 ) - This PR allows config KVS to be validated properly without being affected by ENV overrides, rejects invalid values during set operation - Expands unit tests and refactors the error handling for notification targets, returns error instead of ignoring targets for invalid KVS - Does all the prep-work for implementing safe-mode style operation for MinIO server, introduces a new global variable to toggle safe mode based operations NOTE: this PR itself doesn't provide safe mode operations	2019-10-30 23:39:09 -07:00
Krishna Srinivas	980bf78b4d	Detect underlying disk mount/unmount (#8408 )	2019-10-25 10:37:53 -07:00
Harshavardhana	d48fd6fde9	Remove unusued params and functions (#8399 )	2019-10-15 18:35:41 -07:00
Nitish Tiwari	1cd801b2e9	Fix DeleteObjects() to remove renamed objects inside (#8072 )	2019-08-14 11:15:25 -07:00
Harshavardhana	e6d8e272ce	Use const slashSeparator instead of "/" everywhere (#8028 )	2019-08-06 12:08:58 -07:00
Harshavardhana	54eded2e6f	Do not assume all HTTP errors as Network errors (#7983 ) In situations such as when client uploading data, prematurely disconnects from server such as pressing ctrl-c before uploading all the data. Under this situation in distributed setup we prematurely disconnect disks causing a reconnect loop. This has an adverse affect we end up leaving a lot of files in temporary location which ideally should have been cleaned up when Put() prematurely fails. This is also a regression which got introduced in #7610	2019-07-29 14:48:18 -07:00
Krishna Srinivas	a2e904b966	Support any string as delimiter for listing (#7882 )	2019-07-05 14:06:12 -07:00
Anis Elleuch	7abadfccc2	Add self-healing feature (#7604 ) - Background Heal routine receives heal requests from a channel, either to heal format, buckets or objects - Daily sweeper lists all objects in all buckets, these objects don't necessarly have read quorum so they can be removed if these objects are unhealable - Heal daily ops receives objects from the daily sweeper and send them to the heal routine.	2019-06-08 22:14:07 -07:00
Anis Elleuch	9c90a28546	Implement bulk delete (#7607 ) Bulk delete at storage level in Multiple Delete Objects API In order to accelerate bulk delete in Multiple Delete objects API, a new bulk delete is introduced in storage layer, which will accept a list of objects to delete rather than only one. Consequently, a new API is also need to be added to Object API.	2019-05-13 12:25:49 -07:00
Harshavardhana	64998fc4ab	Remove delayIsLeaf requirement simplify ListObjects further (#7593 )	2019-05-02 10:36:57 +05:30
Harshavardhana	f767a2538a	Optimize listing with leaf check offloaded to posix (#7541 ) Other listing optimizations include - remove double sorting while filtering object entries - improve error message when upload-id is not in quorum - use jsoniter for full unmarshal json, instead of gjson - remove unused code	2019-04-23 14:54:28 -07:00
Harshavardhana	620e462413	Implement S3-HDFS gateway (#7440 ) - [x] Support bucket and regular object operations - [x] Supports Select API on HDFS - [x] Implement multipart API support - [x] Completion of ListObjects support	2019-04-17 09:52:08 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00
Harshavardhana	396d78352d	Support HTTP/2.0 (#7204 ) Fixes #6704	2019-02-14 17:53:46 -08:00
Krishna Srinivas	81bee93b8d	Move remote disk StorageAPI abstraction from RPC to REST (#6464 )	2018-10-04 17:44:06 -07:00
Anis Elleuch	6d5f2a4391	Better support of empty directories (#5890 ) Better support of HEAD and listing of zero sized objects with trailing slash (a.k.a empty directory). For that, isLeafDir function is added to indicate if the specified object is an empty directory or not. Each backend (xl, fs) has the responsibility to store that information. Currently, in both of XL & FS, an empty directory is represented by an empty directory in the backend. isLeafDir() checks if the given path is an empty directory or not, since dir listing is costly if the latter contains too many objects, readDirN() is added in this PR to list only N number of entries. In isLeadDir(), we will only list one entry to check if a directory is empty or not.	2018-05-09 01:38:21 -07:00
Bala FA	0d52126023	Enhance policy handling to support SSE and WORM (#5790 ) - remove old bucket policy handling - add new policy handling - add new policy handling unit tests This patch brings support to bucket policy to have more control not limiting to anonymous. Bucket owner controls to allow/deny any rest API. For example server side encryption can be controlled by allowing PUT/GET objects with encryptions including bucket owner.	2018-04-24 15:53:30 -07:00
kannappanr	f8a3fd0c2a	Create logger package and rename errorIf to LogIf (#5678 ) Removing message from error logging Replace errors.Trace with LogIf	2018-04-05 15:04:40 -07:00
poornas	a3e806ed61	Add disk based edge caching support. (#5182 ) This PR adds disk based edge caching support for minio server. Cache settings can be configured in config.json to take list of disk drives, cache expiry in days and file patterns to exclude from cache or via environment variables MINIO_CACHE_DRIVES, MINIO_CACHE_EXCLUDE and MINIO_CACHE_EXPIRY Design assumes that Atime support is enabled and the list of cache drives is fixed. - Objects are cached on both GET and PUT/POST operations. - Expiry is used as hint to evict older entries from cache, or if 80% of cache capacity is filled. - When object storage backend is down, GET, LIST and HEAD operations fetch object seamlessly from cache. Current Limitations - Bucket policies are not cached, so anonymous operations are not supported in offline mode. - Objects are distributed using deterministic hashing among list of cache drives specified.If one or more drives go offline, or cache drive configuration is altered - performance could degrade to linear lookup. Fixes #4026	2018-03-28 14:14:06 -07:00
Krishna Srinivas	9ede179a21	Use context.Background() instead of nil Rename Context[Get\|Set] -> [Get\|Set]Context	2018-03-15 16:28:25 -07:00
Krishna Srinivas	9083bc152e	Flat multipart backend implementation for Erasure backend (#5447 )	2018-03-15 13:55:23 -07:00
Bala FA	0e4431725c	make notification as separate package (#5294 ) * Remove old notification files * Add net package * Add event package * Modify minio to take new notification system	2018-03-15 13:03:41 -07:00

1 2

67 Commits