minio

Commit Graph

Author	SHA1	Message	Date
Anis Elleuch	d2d49f6c6c	xl: Avoid removing directory content in Delete API (#5548) The Delete and Multi Delete APIs should not try to remove directory content. The only permitted case is a zero-size object with a trailing slash in its name.	2018-02-20 15:33:26 -08:00
Harshavardhana	fb96779a8a	Add large bucket support for erasure coded backend (#5160) This PR implements an object layer that combines input erasure sets of XL layers into a unified namespace. This object layer extends the existing erasure-coded implementation. The design assumes that providing more than 16 disks is also a static configuration, i.e. if you started the setup with 32 disks as 4 sets of 8 disks per pack, you must always provide 4 sets. Some design details and restrictions: - Objects are distributed using consistent ordering to a unique erasure-coded layer. - Each pack has its own dsync, so locks are synchronized properly at the pack (erasure layer) level. - Each pack still has a maximum of 16 disks; you can start with multiple such sets statically. - Sets of disks are static and cannot be changed; no elastic expansion is allowed. - Sets of disks are static and cannot be changed; no elastic removal is allowed. - ListObjects() across sets can be noticeably slower, since listing happens on all servers and is merged at the sets layer. Fixes #5465 Fixes #5464 Fixes #5461 Fixes #5460 Fixes #5459 Fixes #5458 Fixes #5488 Fixes #5489 Fixes #5497 Fixes #5496	2018-02-15 17:45:57 -08:00
poornas	4f73fd9487	Unify gateway and object layer. (#5487) Bring bucket policies into the object layer.	2018-02-09 15:19:30 -08:00
Harshavardhana	0c880bb852	Deprecate and remove in-memory object caching (#5481) In-memory caching cannot be cleanly implemented without access to the GC, which Go doesn't naturally provide. At times we have seen that object caching is more of a hindrance than a boon for our use cases, so this removes it completely from our implementation. Related to #5160 and #5182	2018-02-02 10:17:13 -08:00
Harshavardhana	1ebbc2ce88	Make sure to convert the disk errors to object errors (#5480) Fixes a bug introduced in the directory support PR; with this fix, s3fs works properly.	2018-02-02 14:04:15 +05:30
Krishna Srinivas	2afd196c83	Quorum based listing for XL (#5475) Fixes #5380	2018-02-01 10:47:49 -08:00
Harshavardhana	3ea28e9771	Support creating directories on erasure coded backend (#5443) This PR continues from #5049, where we started supporting directories for the erasure-coded backend.	2018-01-30 08:13:13 +05:30
poornas	0bb6247056	Move nslocking from s3 layer to object layer (#5382) Fixes #5350	2018-01-13 10:04:52 +05:30
kannappanr	20584dc08f	Remove unnecessary errors printed on the console (#5386) Some of the errors printed on the server console can be removed, as those error messages are unnecessary. Fixes #5385	2018-01-11 11:42:05 -08:00
Nitish Tiwari	1e5fb4b79a	Fix storage class related issues (#5338) - Update the startup banner to print storage classes in capitals. This makes it easier to identify the different storage classes available. - Update response metadata to not send the STANDARD storage class. This is in accordance with AWS S3 behaviour. - Update the minio-go library to bring in storage class related changes. This is needed for transparent translation of storage class headers in the Minio S3 Gateway.	2018-01-04 11:44:45 +05:30
Harshavardhana	c0721164be	Automatically set goroutines based on shardSize (#5346) Update the reedsolomon library to enable the feature that automatically sets the number of goroutines based on the input shard size, since shard size is effectively constant in Minio for objects > 10MiB (the default block size). klauspost reported around a 15-20% performance improvement on older systems, such as those with AVX and SSE3. Benchmarks (name: old speed → new speed, delta): Encode10x2x10000-8: 5.45GB/s ± 1% → 6.22GB/s ± 1%, +14.20% (p=0.000 n=9+9); Encode100x20x10000-8: 1.44GB/s ± 1% → 1.64GB/s ± 1%, +13.77% (p=0.000 n=10+10); Encode17x3x1M-8: 10.0GB/s ± 5% → 12.0GB/s ± 1%, +19.88% (p=0.000 n=10+10); Encode10x4x16M-8: 7.81GB/s ± 5% → 8.56GB/s ± 5%, +9.58% (p=0.000 n=10+9); Encode5x2x1M-8: 15.3GB/s ± 2% → 19.6GB/s ± 2%, +28.57% (p=0.000 n=9+10); Encode10x2x1M-8: 12.2GB/s ± 5% → 15.0GB/s ± 5%, +22.45% (p=0.000 n=10+10); Encode10x4x1M-8: 7.84GB/s ± 1% → 9.03GB/s ± 1%, +15.19% (p=0.000 n=9+9); Encode50x20x1M-8: 1.73GB/s ± 4% → 2.09GB/s ± 4%, +20.59% (p=0.000 n=10+9); Encode17x3x16M-8: 10.6GB/s ± 1% → 11.7GB/s ± 4%, +10.12% (p=0.000 n=8+10)	2018-01-03 13:47:22 -08:00
Nitish Tiwari	545a9e4a82	Fix storage class related issues (#5322) - Add storage class metadata validation for request headers - Change storage class header values to be consistent with AWS S3 - Refactor internal method to take only the required argument	2017-12-27 10:06:16 +05:30
Nitish Tiwari	1a3dbbc9dd	Add x-amz-storage-class support (#5295) This adds configurable data and parity options on a per-object basis. To use variable parity: - Users can set environment variables to configure variable parity. - Then add the x-amz-storage-class header to PutObject requests with the relevant storage class values. Fixes #4997	2017-12-22 16:58:13 +05:30
Harshavardhana	8efa82126b	Convert errors tracer into a separate package (#5221)	2017-11-25 11:58:29 -08:00
Harshavardhana	5eb210dd2e	Set etag properly to calculated value if available (#5106) Fixes #5100	2017-10-24 12:25:42 -07:00
Harshavardhana	1d8a8c63db	Simplify data verification with HashReader. (#5071) Verify() was being called by the caller after the data had been successfully read up to io.EOF. This separation opens a race under concurrent access to such an object. Verification is not necessary outside the Read() call; we can simply do checksum verification right inside Read() at io.EOF. This approach simplifies the usage.	2017-10-22 11:00:34 +05:30
Frank Wessels	f598f4fd1b	Fix typo in comment (#5088)	2017-10-20 15:08:15 +05:30
A. Elleuch	b919462610	fix: Avoid teeing data into a null cache buffer (#5070) In some cases, the cache manager returns an ErrCacheFull error when creating a new cache buffer, but the code still sends object data to the nil cache buffer.	2017-10-18 14:42:10 -07:00
Harshavardhana	0b546ddfd4	Return errors in PutObject()/PutObjectPart() if input size is -1. (#5015) The Amazon S3 API expects every incoming stream to have a Content-Length set, so it was superfluous for our object layer to also support streams of unknown size. This PR removes that requirement and explicitly errors out if the input stream size is less than zero.	2017-10-06 09:38:01 -07:00
Andreas Auernhammer	02af37a394	optimize memory allocs during reconstruct (#4964) The reedsolomon library now avoids allocations during reconstruction. This change exploits that to reduce memory allocs and GC pressure during healing and reading.	2017-09-27 10:29:42 -07:00
Andreas Auernhammer	79ba4d3f33	refactor ObjectLayer PutObject and PutObjectPart (#4925) This change refactors the ObjectLayer PutObject and PutObjectPart functions. Instead of passing an io.Reader and a size to PUT operations, the ObjectLayer now expects a HashReader. A HashReader verifies the MD5 sum (and SHA256 sum if required) of the object. This change updates all PutObject(Part) calls and removes unnecessary code in all ObjectLayer implementations. Fixes #4923	2017-09-19 12:40:27 -07:00
Harshavardhana	db5af1b126	fix: tests error conditions should be used properly. (#4833)	2017-08-23 17:58:52 -07:00
Harshavardhana	2e6ee68409	fix: [minor] Avoid unnecessary typecasting. (#4828) We don't need to typecast identifiers from their base type to the same type again. This is not a bug and the compiler is fine to skip it, but it is better to avoid it if not needed.	2017-08-18 11:45:16 -07:00
Andreas Auernhammer	85fcee1919	erasure: simplify XL backend operations (#4649) (#4758) This change provides new implementations of the XL backend operations: - create file - read file - heal file Further, this change adds table-based tests for all three operations. This also affects the bitrot algorithm integration: algorithms are now integrated in an idiomatic way (like crypto.Hash). Fixes #4696 Fixes #4649 Fixes #4359	2017-08-14 18:08:42 -07:00
Frank Wessels	46897b1100	Name return values to prevent the need (and unnecessary code bloat) (#4576) Named return values remove the need to explicitly instantiate objects for every return statement.	2017-06-21 19:53:09 -07:00
Anis Elleuch	af8071c86a	xl: Fix rare freeze after many disk/network errors (#4438) xl.storageDisks is sometimes passed to low-level XL functions, and some disks in xl.storageDisks are set to nil when they encounter errors. This means all elements in xl.storageDisks eventually become nil, which leads to an unusable XL.	2017-06-14 17:14:27 -07:00
Aditya Manthramurthy	8975da4e84	Add new ReadFileWithVerify storage-layer API (#4349) This is an enhancement to the XL/distributed-XL mode. FS mode is unaffected. The ReadFileWithVerify storage-layer call is similar to ReadFile with the additional functionality of performing bit-rot checking. It accepts additional parameters for a hashing algorithm to use and the expected hex-encoded hash string. This patch provides a significant performance improvement because it: 1. combines the step of reading the file (during erasure-decoding/reconstruction) with bit-rot verification; 2. limits the number of file reads; and 3. avoids transferring the file over the network for bit-rot verification. The ReadFile API is implemented as ReadFileWithVerify with empty hashing arguments. Credits to AB and Harsha for the algorithmic improvement. Fixes #4236.	2017-05-16 14:21:52 -07:00
Harshavardhana	155a90403a	fs/erasure: Rename meta 'md5Sum' as 'etag'. (#4319) This PR also changes the backend format from 1.0.0 to 1.0.1. Backward-compatible code is still kept to read the 'md5Sum' key, but all new objects will be stored with the same details under 'etag'. Fixes #4312	2017-05-14 12:05:51 -07:00
Harshavardhana	298b470f69	fs/erasure: Ignore objects with / even for DeleteObject() (#4303) Additionally, GetObject() also returns errFileNotFound, similar to HeadObject(). Fixes #4302	2017-05-09 14:32:24 -07:00
Bala FA	1c97dcb10a	Add UTCNow() function. (#3931) This patch adds a UTCNow() function, which returns the current UTC time. It is equivalent to time.Now().UTC().	2017-03-18 11:28:41 -07:00
Anis Elleuch	a5e60706a2	xl,fs: Return 404 if object ends with a separator (#3897) HEAD Object for FS and XL was returning an invalid object name error when an object name had a trailing slash separator. This PR changes the behavior to always return 404 object not found, which guarantees better compatibility with the S3 spec.	2017-03-13 22:20:46 -07:00
Anis Elleuch	a2eae54d11	xl: Respect min. space by checking PrepareFile err (#3867) It was possible to upload a big file that violates the minimum disk space limit in XL: PrepareFile was actually checking for disk space, but we weren't checking its returned error. This patch fixes this behavior.	2017-03-07 14:48:56 -08:00
Anis Elleuch	dce0345f8f	Set disk to nil after write which needs quorum (#3795) Ignore a disk that wasn't able to successfully perform an action, to avoid eventual perturbations when the disk comes back in the middle of a write.	2017-02-26 11:58:32 -08:00
Harshavardhana	bcc5b6e1ef	xl: Rename getOrderedDisks as shuffleDisks appropriately. (#3796) This PR is a readability cleanup: - getOrderedDisks renamed to shuffleDisks - getOrderedPartsMetadata renamed to shufflePartsMetadata Distribution is now the second argument instead of being the primary input argument, for brevity. Also replace type-casted int64(0) with a direct type declaration such as `var variable int64` everywhere.	2017-02-24 09:20:40 -08:00
Harshavardhana	cc28765025	xl/multipart: Make sure to delete temp renamed object. (#3785) In completeMultipart, existing objects are renamed to a temp location before being overwritten. We make sure to delete them even if subsequent calls fail. Additionally, move the check for whether the parent dir is a file earlier, so that it fails the entire operation. Ref #3784	2017-02-21 19:43:44 -08:00
Harshavardhana	7ea1de8245	copyObject: Be case sensitive for windows only server. (#3766) For case-sensitive platforms we should honor case. Fixes #3765 Example session: 1) `python s3cmd -c s3cfg_localminio put logo.png s3://testbucket/xyz/etc2/logo.PNG` 2) `python s3cmd -c s3cfg_localminio ls s3://testbucket/xyz/etc2/` lists `2017-02-18 10:58 22059 s3://testbucket/xyz/etc2/logo.PNG` 3) `python s3cmd -c s3cfg_localminio cp s3://testbucket/xyz/etc2/logo.PNG s3://testbucket/xyz/etc2/logo.png` prints `remote copy: 's3://testbucket/xyz/etc2/logo.PNG' -> 's3://testbucket/xyz/etc2/logo.png'` 4) `python s3cmd -c s3cfg_localminio ls s3://testbucket/xyz/etc2/` lists `2017-02-18 10:58 22059 s3://testbucket/xyz/etc2/logo.PNG` and `2017-02-18 11:10 22059 s3://testbucket/xyz/etc2/logo.png`	2017-02-18 13:41:59 -08:00
Harshavardhana	22909c849e	objcache: Return io.ReaderAt to avoid Seeking and Reading. (#3735)	2017-02-11 17:17:58 -08:00
Harshavardhana	6a6c930f5b	xl: Abort multipart upload should honor quorum properly. (#3670) The current implementation didn't honor quorum properly and didn't handle the generated errors properly. This patch addresses that and also moves the common code `cleanupMultipartUploads` into an XL-specific private function. Fixes #3665	2017-02-01 11:16:17 -08:00
Harshavardhana	1b30a3be2b	xl/utils: getPartSizeFromIdx should return error. (#3669)	2017-01-31 15:34:49 -08:00
Anis Elleuch	e9394dc22d	xl PutObject: Split object into parts (#3651) This gives a faster time-to-first-byte when downloading a big object.	2017-01-30 15:44:42 -08:00
Krishna Srinivas	b288eaddb3	xl: bit-rot algo was not set in get-object. (#3652) Fixes #3650	2017-01-30 14:25:28 -08:00
Anis Elleuch	e1bc99e4fe	xl: Fix GET of an empty multiparted object (#3646) GetObject returned an unsatisfiable range error when downloading an empty object uploaded using the multipart mechanism.	2017-01-27 10:51:02 -08:00
Harshavardhana	51fa4f7fe3	Make PutObject a nop for an object which ends with "/" and size is '0' (#3603) This helps the majority of S3-compatible applications by not returning an error upon a directory create request. Fixes #2965	2017-01-20 16:33:01 -08:00
Harshavardhana	98a6a2bcab	obj: Return objectInfo for CompleteMultipartUpload(). (#3587) This patch avoids an extra GetObjectInfo() call, in the same way we did for PutObject().	2017-01-16 19:23:43 -08:00
Harshavardhana	1c699d8d3f	fs: Re-implement object layer to remember the fd (#3509) This patch rewrites the FS backend to support a shared backend, sharing locks for safe concurrent access across multiple servers.	2017-01-16 17:05:00 -08:00
Harshavardhana	69559aa101	objAPI: Implement CopyObject API. (#3487) This simplifies our handler code and provides a way to update only the metadata instead of the data when the source and destination in a CopyObject request are the same. Fixes #3316	2016-12-26 16:29:26 -08:00
Harshavardhana	15b4c49621	fs/xl: Simplify bucket metadata reading. (#3486) ObjectLayer GetObject() now returns the entire object if the starting offset is 0 and the length is negative. This also simplifies handler layer code, where we always had to use GetObjectInfo() before reading bucket metadata files such as `policy.json`, and saves the overhead of one additional call.	2016-12-21 11:29:32 -08:00
Harshavardhana	faa6b1e925	vendorize deps for snappy, blake2b and sha256 (#3476) Brings in new optimizations and portability changes. Fixes https://github.com/minio/minio-go/issues/578	2016-12-19 19:32:55 -08:00
Harshavardhana	4daa0d2cee	lock: Moving locking to handler layer. (#3381) This is implemented so that issues like the following flow don't affect the behavior of an operation: `GetObjectInfo()` → time window for mutation (no lock held) → `GetObject()`. When two simultaneous uploads are made to the same object, wrong object info could be returned to the client. Another classic example is the "CopyObject" API itself, which reads from a source object and copies to a destination object. Fixes #3370 Fixes #2912	2016-12-10 16:15:12 -08:00
Harshavardhana	b363709c11	caching: Optimize memory allocations. (#3405) This change brings in changes at multiple places: - Reuse buffers at almost all locations, ranging from rpc, fs, xl, checksum, etc. - Change caching behavior to disable itself under low-memory conditions, i.e. < 8GB of RAM. - Only objects up to 1/10th the size of the cache are cached; for example, if the cache size is 4GB, the maximum size of an object that will be cached is 400MB. This optimization caches more objects rather than a few larger objects. - If the object cache is enabled, the default GC percent has been reduced to 20%, in line with newly found GC behavior. If cache utilization reaches 75% of the maximum value, the GC percent is reduced to 10% to make GC more aggressive. - Do not use bytes.Buffer due to its growth requirements. For every allocation, bytes.Buffer allocates an additional buffer for its internal purposes. This is undesirable for us, so we implemented a new cappedWriter, which is capped to a desired size; beyond that, all writes are rejected. Possible fix for #3403.	2016-12-08 20:35:07 -08:00
Harshavardhana	ff4ce0ee14	fs/xl: Combine input checks into re-usable functions. (#3383) Repeated code across both object layers is moved and combined into simple reusable functions.	2016-12-01 23:15:17 -08:00
Bala FA	1d4ac4b084	Rename getUUID() into mustGetUUID() (#3320) In case of a UUID generation failure, mustGetUUID() will panic rather than retrying infinitely in a for loop.	2016-11-22 16:52:37 -08:00
Harshavardhana	5197649081	utils: reduceErrs returns and validates quorum errors. (#3300) This is needed, as explained by @krisis. Let's say we have the following errors: `[]error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccessDenied}`. Since the last two errors are filtered, the maximum is either nil or errFileNotFound, depending on map order. Let's say we get nil from reduceErrs. Clearly at this point we don't have a quorum of nodes agreeing about the data, yet since GetObject only requires N/2 (read quorum), isDiskQuorum would have returned true. This is problematic and can lead to undesirable consequences. Fixes #3298	2016-11-21 01:47:26 -08:00
Krishnan Parthasarathi	eed9ab0464	XL: pickValidXLMeta should return error instead of panic'ing (#3277)	2016-11-20 20:56:44 -08:00
Harshavardhana	0b9f0d14a1	auth/rpc: Take remote disk offline after maximum allowed attempts. (#3288) When disks are offline for a long period of time, we should ignore the disk after trying Login up to 5 times. This reduces network chattiness and also the overall time spent on `net.Dial`. Fixes #3286	2016-11-20 16:57:12 -08:00
Anis Elleuch	ffbee70e04	Avoid removing 'tmp' directory inside '.minio.sys' (#3294)	2016-11-20 14:25:43 -08:00
Aditya Manthramurthy	dd0698d14c	Improve namespace lock API: (#3203) - abstract out instrumentation information. - use separate lockInstance type that encapsulates the nsMutex, volume, path and opsID as the frontend or top-level lock object.	2016-11-09 10:58:41 -08:00
Anis Elleuch	a47ce7ab22	Add support of fallocate for FS and XL backends (#3032)	2016-10-29 12:44:44 -07:00
Krishnan Parthasarathi	6fc81dc162	Delete temp object/part when PutObject{,Part} fails (#3004)	2016-10-19 22:52:03 -07:00
Anis Elleuch	2208992e6a	More informative message when erasure fails to read a part of an object (#2989)	2016-10-18 13:09:26 -07:00
Harshavardhana	39331b6b4e	xl: GetCheckSumInfo() shouldn't fail if hash not available. (#2984) In a multipart upload scenario, disks going down and coming back up can lead to certain parts missing on the disk/server that was down. This is a valid case, since these blocks can be missing and should be healed through a heal operation. But we should not fail prematurely, since we have enough data on the other disks within read quorum. This fix relaxes the previous assumption and fixes a major corruption issue reproduced by @vadmeste. Fixes #2976	2016-10-18 11:13:25 -07:00
Krishnan Parthasarathi	b89609dc2e	XL: Filter out md5Sum from user defined headers (#2962)	2016-10-17 08:41:33 -07:00
Harshavardhana	fee3f99a6e	xl: heal bucket should validate if bucket exists first. (#2953) Fixes #2944	2016-10-17 02:10:23 -07:00
Harshavardhana	f22862aa28	heal: Refactor heal command. (#2901) - return errors for heal operation through rpc replies. - implement rotating wheel for healing status. Fixes #2491	2016-10-14 19:57:40 -07:00
Krishna Srinivas	0320a77dc0	HealBucket: create the bucket if it is missing in one of the disks. (#2924)	2016-10-14 11:12:17 -07:00
Harshavardhana	6494b77d41	server: Add more elaborate startup messages. (#2731) These messages are based on our prep stage during XL and print more informative messages regarding drive information. This change also does some much-needed refactoring.	2016-10-05 12:48:07 -07:00
Krishna Srinivas	61a18ed48f	sha256: Verify sha256 along with md5sum, signature is verified on the request early. (#2813)	2016-10-02 15:51:49 -07:00
Aditya Manthramurthy	10d2ef5449	Remove comments relating to deprecated MINIO_DEBUG envvar (#2797)	2016-09-27 18:28:46 -07:00
Krishnan Parthasarathi	669783f875	Purge stale object cache entry (#2770)	2016-09-23 19:55:28 -07:00
Karthic Rao	8bd78fbdfb	performance: gjson parsing for readXLMeta, listParts, getObjectInfo. (#2631) - Using gjson for constructing xlMetaV1{} in readXLMeta. - Test for constructing xlMetaV1{} by parsing with gjson. - Changes made since benchmarks showed 30-40% improvement in speed. - See the follow-up comments in issue https://github.com/minio/minio/issues/2208 for more details. - gjson parsing of parts from xl.json for listParts. - gjson parsing of statInfo from xl.json for getObjectInfo. - Vendorizing gjson dependency.	2016-09-13 21:18:30 -07:00
Krishna Srinivas	b4e4846e9f	PutObject: object layer now returns ObjectInfo instead of md5sum to avoid extra GetObjectInfo call. (#2599) After PutObject, the S3 layer was calling GetObjectInfo for bucket notification. This can be avoided if PutObject returns ObjectInfo. Fixes #2567	2016-09-13 21:18:30 -07:00
Krishna Srinivas	9358ee011b	logging: Print stack trace in case of errors. Fixes #1827	2016-09-13 21:18:30 -07:00
Karthic Rao	07d232c7b4	instrumentation: instrumentation for locks. (#2584) - Instrumentation for locks. - Detailed test coverage. - Adding RPC control handler to fetch lock instrumentation. - RPC control handlers suite tests with a test RPC server.	2016-09-13 21:18:30 -07:00
Harshavardhana	bccf549463	server: Move all the top level files into cmd folder. (#2490) This change mirrors a change done for the 'mc' package, to allow for a clean repo and a cleaner GitHub drop-in experience.	2016-08-18 16:23:42 -07:00
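
The large bucket support commit (#5160) says objects are "distributed using consistent ordering to a unique erasure coded layer" and that the number of sets is static. A minimal sketch of one way to get that property, assuming the set index is a CRC32 (IEEE) checksum of the object name modulo the set count; `setIndexFor` is a hypothetical helper, not Minio's actual function.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// setIndexFor is a hypothetical helper that maps an object name to one of
// setCount erasure sets. The same name always lands in the same set as long
// as setCount never changes, which is why the set layout must stay static.
func setIndexFor(object string, setCount int) int {
	if setCount <= 0 {
		return -1 // no sets configured
	}
	sum := crc32.ChecksumIEEE([]byte(object))
	return int(sum % uint32(setCount))
}

func main() {
	const sets = 4 // e.g. 32 disks as 4 sets of 8 disks
	for _, name := range []string{"photos/a.png", "photos/b.png", "logs/2018-02-15.txt"} {
		fmt.Printf("%s -> set %d\n", name, setIndexFor(name, sets))
	}
}
```

Because the index depends only on the name and the set count, adding or removing a set would remap most objects, which matches the commit's "no elastic expansion / removal" restriction.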

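The reduceErrs commit (#3300) explains why picking the most frequent error is unsafe unless that value also meets quorum, and why the result must not depend on map iteration order. A minimal sketch under those rules, assuming hypothetical names `reduceQuorumErrs` and `errQuorumNotMet`; ties are broken by first occurrence, which is a design choice of this sketch rather than something the commit states.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errFileNotFound     = errors.New("file not found")
	errDiskAccessDenied = errors.New("disk access denied")
	// errQuorumNotMet is a hypothetical sentinel: no value reached quorum.
	errQuorumNotMet = errors.New("quorum not met")
)

// reduceQuorumErrs returns the most frequent value in errs (nil counts as a
// value), skipping errors listed in ignored. If that value is not shared by
// at least quorum entries, it returns errQuorumNotMet instead.
func reduceQuorumErrs(errs []error, ignored []error, quorum int) error {
	counts := make(map[error]int)
	var order []error // first-seen order, so ties never depend on map order
	for _, err := range errs {
		if isIgnored(err, ignored) {
			continue
		}
		if _, seen := counts[err]; !seen {
			order = append(order, err)
		}
		counts[err]++
	}
	var maxErr error
	maxCount := 0
	for _, err := range order {
		if counts[err] > maxCount {
			maxErr, maxCount = err, counts[err]
		}
	}
	if maxCount < quorum {
		return errQuorumNotMet
	}
	return maxErr
}

func isIgnored(err error, ignored []error) bool {
	for _, ig := range ignored {
		if err == ig {
			return true
		}
	}
	return false
}

func main() {
	// The example from the commit: 4 disks, read quorum N/2 = 2.
	errs := []error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccessDenied}
	// Only one disk returned nil, so the result is errQuorumNotMet.
	fmt.Println(reduceQuorumErrs(errs, []error{errDiskAccessDenied}, len(errs)/2))
}
```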

124 Commits
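
The caching commit (#3405) replaces bytes.Buffer with a cappedWriter that is "capped to a desired size, beyond this all writes rejected". A minimal sketch of that behavior, assuming a buffer preallocated once and a hypothetical `errCapacityExceeded` error; Minio's real cappedWriter may differ in detail.

```go
package main

import (
	"errors"
	"fmt"
)

// errCapacityExceeded is a hypothetical error for writes past the cap.
var errCapacityExceeded = errors.New("cappedWriter: capacity exceeded")

// cappedWriter writes into a buffer allocated once with a fixed capacity.
// Unlike bytes.Buffer it never grows: a write that would exceed the cap is
// rejected entirely and none of its bytes are stored.
type cappedWriter struct {
	buf []byte
}

func newCappedWriter(capacity int) *cappedWriter {
	return &cappedWriter{buf: make([]byte, 0, capacity)}
}

func (w *cappedWriter) Write(p []byte) (int, error) {
	if len(w.buf)+len(p) > cap(w.buf) {
		return 0, errCapacityExceeded
	}
	w.buf = append(w.buf, p...) // no reallocation: capacity was checked above
	return len(p), nil
}

// Bytes returns the data written so far.
func (w *cappedWriter) Bytes() []byte { return w.buf }

func main() {
	w := newCappedWriter(8)
	fmt.Println(w.Write([]byte("minio"))) // 5 <nil>
	fmt.Println(w.Write([]byte("-xl!")))  // rejected, since 5+4 > 8
	fmt.Println(string(w.Bytes()))        // minio
}
```

Returning 0 with a non-nil error satisfies the io.Writer contract, which requires an error whenever fewer than len(p) bytes are written.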