minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	4550ac6fff	fix: refactor locks to apply them uniquely per node (#11052 ) This refactor is done for few reasons below - to avoid deadlocks in scenarios when number of nodes are smaller < actual erasure stripe count where in N participating local lockers can lead to deadlocks across systems. - avoids expiry routines to run 1000s of separate network operations and routes per disk where as each of them are still accessing one single local entity. - it is ideal to have a single globalLockServer per instance. - In a 32node deployment however, each server group is still concentrated towards the same set of lockers that participate during the write/read phase, unlike previous minio/dsync implementation - this potentially avoids sending 32 requests instead we will still send at max requests of unique nodes participating in a write/read phase. - reduces overall chattiness on smaller setups.	2020-12-10 07:28:37 -08:00
Klaus Post	2294e53a0b	Don't retain context in locker (#10515 ) Use the context for internal timeouts, but disconnect it from outgoing calls so we always receive the results and cancel it remotely.	2020-11-04 08:25:42 -08:00
Harshavardhana	eafa775952	fix: add lock ownership to expire locks (#10571 ) - Add owner information for expiry, locking, unlocking a resource - TopLocks returns now locks in quorum by default, provides a way to capture stale locks as well with `?stale=true` - Simplify the quorum handling for locks to avoid from storage class, because there were challenges to make it consistent across all situations. - And other tiny simplifications to reset locks.	2020-09-25 19:21:52 -07:00
Harshavardhana	83a82d818e	allow lock tolerance to match storage-class drive tolerance (#10270 )	2020-08-14 18:17:14 -07:00
Harshavardhana	d55f4336ae	preserve context per request for local locks (#9828 ) In the Current bug we were re-using the context from previously granted lockers, this would lead to lock timeouts for existing valid read or write locks, leading to premature timeout of locks. This bug affects only local lockers in FS or standalone erasure coded mode. This issue is rather historical as well and was present in lsync for some time but we were lucky to not see it. Similar changes are done in dsync as well to keep the code more familiar Fixes #9827	2020-06-14 07:43:10 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	2020-06-12 20:04:01 -07:00
Harshavardhana	febe9cc26a	fix: avoid timer leaks in dsync/lsync (#9781 ) At a customer setup with lots of concurrent calls it can be observed that in newRetryTimer there were lots of tiny allocations which are not relinquished upon retries, in this codepath we were only interested in re-using the timer and use it wisely for each locker. ``` (pprof) top Showing nodes accounting for 8.68TB, 97.02% of 8.95TB total Dropped 1198 nodes (cum <= 0.04TB) Showing top 10 nodes out of 79 flat flat% sum% cum cum% 5.95TB 66.50% 66.50% 5.95TB 66.50% time.NewTimer 1.16TB 13.02% 79.51% 1.16TB 13.02% github.com/ncw/directio.AlignedBlock 0.67TB 7.53% 87.04% 0.70TB 7.78% github.com/minio/minio/cmd.xlObjects.putObject 0.21TB 2.36% 89.40% 0.21TB 2.36% github.com/minio/minio/cmd.(*posix).Walk 0.19TB 2.08% 91.49% 0.27TB 2.99% os.statNolog 0.14TB 1.59% 93.08% 0.14TB 1.60% os.(*File).readdirnames 0.10TB 1.09% 94.17% 0.11TB 1.25% github.com/minio/minio/cmd.readDirN 0.10TB 1.07% 95.23% 0.10TB 1.07% syscall.ByteSliceFromString 0.09TB 1.03% 96.27% 0.09TB 1.03% strings.(*Builder).grow 0.07TB 0.75% 97.02% 0.07TB 0.75% path.(*lazybuf).append ```	2020-06-08 11:28:40 -07:00
Harshavardhana	6de410a0aa	fix: possibility of double write lockers on same resource (#9616 ) To avoid this issue with refCounter refactor the code such that - locker() always increases refCount upon success - unlocker() always decrements refCount upon success (as a special case removes the resource if the refCount is zero) By these two assumptions we are able to see that we are never granted two write lockers in any situation. Thanks to @vcabbage for writing a nice reproducer.	2020-05-18 17:33:35 -07:00
Harshavardhana	b730bd1396	fix: possible race in FS local lockMap (#9598 )	2020-05-14 23:59:07 -07:00
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280 ) This change is throughout the codebase to ensure that all codepaths honor GlobalContext	2020-04-09 09:30:02 -07:00
Harshavardhana	ab7d3cd508	fix: Speed up multi-object delete by taking bulk locks (#8974 ) Change distributed locking to allow taking bulk locks across objects, reduces usually 1000 calls to 1. Also allows for situations where multiple clients sends delete requests to objects with following names ``` {1,2,3,4,5} ``` ``` {5,4,3,2,1} ``` will block and ensure that we do not fail the request on each other.	2020-02-21 11:29:57 +05:30
Harshavardhana	347b29d059	Implement bucket expansion (#8509 )	2019-11-19 17:42:27 -08:00
Harshavardhana	e9b2bf00ad	Support MinIO to be deployed on more than 32 nodes (#8492 ) This PR implements locking from a global entity into a more localized set level entity, allowing for locks to be held only on the resources which are writing to a collection of disks rather than a global level. In this process this PR also removes the top-level limit of 32 nodes to an unlimited number of nodes. This is a precursor change before bring in bucket expansion.	2019-11-13 12:17:45 -08:00
Harshavardhana	4e63e0e372	Return appropriate errors API versions changes across REST APIs (#8480 ) This PR adds code to appropriately handle versioning issues that come up quite constantly across our API changes. Currently we were also routing our requests wrong which sort of made it harder to write a consistent error handling code to appropriately reject or honor requests. This PR potentially fixes issues - old mc is used against new minio release which is incompatible returns an appropriate for client action. - any older servers talking to each other, report appropriate error - incompatible peer servers should report error and reject the calls with appropriate error	2019-11-04 09:30:59 -08:00
Harshavardhana	d48fd6fde9	Remove unused params and functions (#8399 )	2019-10-15 18:35:41 -07:00
Harshavardhana	36e12a6038	Assume local endpoints appropriately in k8s deployments (#8375 ) On Kubernetes/Docker setups DNS resolves inappropriately sometimes where there are situations same endpoints with multiple disks come online indicating either one of them is local and some of them are not local. This situation can never happen and its only a possibility in orchestrated deployments with dynamic DNS. Following code ensures that we treat if one of the endpoint says its local for a given host it is true for all endpoints for the same host. Following code ensures that this assumption is true and it works in all scenarios and it is safe to assume for a given host. This PR also adds validation such that we do not crash the server if there are bugs in the endpoints list in dsync initialization. Thanks to Daniel Valdivia <hola@danielvaldivia.com> for reproducing this, this fix is needed as part of the https://github.com/minio/m3 project.	2019-10-10 10:14:17 +05:30
Krishna Srinivas	2ab0681c0c	Do not ignore Lock()'s return value (#8142 )	2019-08-28 16:12:57 -07:00
Krishna Srinivas	338e9a9be9	Put object client disconnect (#7824 ) Fail putObject and postpolicy in case client prematurely disconnects Use request's context to cancel lock requests on client disconnects	2019-06-28 22:09:17 -07:00
Harshavardhana	2c0b3cadfc	Update go mod with sem versions of our libraries (#7687 )	2019-05-29 16:35:12 -07:00
kannappanr	d2f42d830f	Lock: Use REST API instead of RPC (#7469 ) In distributed mode, use REST API to acquire and manage locks instead of RPC. RPC has been completely removed from MinIO source. Since we are moving from RPC to REST, we cannot use rolling upgrades as the nodes that have not yet been upgraded cannot talk to the ones that have been upgraded. We expect all minio processes on all nodes to be stopped and then the upgrade process to be completed. Also force http1.1 for inter-node communication	2019-04-17 23:16:27 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00
Krishna Srinivas	ef791764e0	Do not access nsLockMap.lockMap when using dsync (#7464 ) There is no need to access nsLockMap.lockMap when using dsync	2019-04-02 12:27:20 -07:00
Harshavardhana	df35d7db9d	Introduce staticcheck for stricter builds (#7035 )	2019-02-13 18:29:36 +05:30
kannappanr	ce870466ff	Top Locks command implementation (#7052 ) API to list locks used in distributed XL mode	2019-01-24 07:22:14 -08:00
Pontus Leitzler	df60b3c733	Remove unnecessary contexts passed as data to FatalIf. No need to log an empty context. (#6487 )	2018-09-21 16:04:11 -07:00
Harshavardhana	13fbb96736	Hold locks granularly in nslockMap (#6242 ) With benchmarks increases the performance for small files by almost 4x times the previous releases.	2018-08-06 08:55:25 +05:30
Harshavardhana	556a51120c	Deprecate ListLocks and ClearLocks (#6233 ) No locks are ever left in memory, we also have a periodic interval of clearing stale locks anyways. The lock instrumentation was not complete and was seldom used. Deprecate this for now and bring it back later if it is really needed. This also in-turn seems to improve performance slightly.	2018-08-02 23:09:42 +05:30
Harshavardhana	28d526bc68	Change CriticalIf to FatalIf for proper error message (#6040 ) During startup until the object layer is initialized logger is disabled to provide for a cleaner UI error message. CriticalIf is disabled, use FatalIf instead. Also never call os.Exit(1) on running servers where you can return error to client in handlers.	2018-06-14 10:17:07 -07:00
Bala FA	6a53dd1701	Implement HTTP POST based RPC (#5840 ) Added support for new RPC support using HTTP POST. RPC's arguments and reply are Gob encoded and sent as HTTP request/response body. This patch also removes Go RPC based implementation.	2018-06-06 14:21:56 +05:30
kannappanr	f8a3fd0c2a	Create logger package and rename errorIf to LogIf (#5678 ) Removing message from error logging Replace errors.Trace with LogIf	2018-04-05 15:04:40 -07:00
Harshavardhana	fb96779a8a	Add large bucket support for erasure coded backend (#5160 ) This PR implements an object layer which combines input erasure sets of XL layers into a unified namespace. This object layer extends the existing erasure coded implementation, it is assumed in this design that providing > 16 disks is a static configuration as well i.e if you started the setup with 32 disks with 4 sets 8 disks per pack then you would need to provide 4 sets always. Some design details and restrictions: - Objects are distributed using consistent ordering to a unique erasure coded layer. - Each pack has its own dsync so locks are synchronized properly at pack (erasure layer). - Each pack still has a maximum of 16 disks requirement, you can start with multiple such sets statically. - Static sets set of disks and cannot be changed, there is no elastic expansion allowed. - Static sets set of disks and cannot be changed, there is no elastic removal allowed. - ListObjects() across sets can be noticeably slower since List happens on all servers, and is merged at this sets layer. Fixes #5465 Fixes #5464 Fixes #5461 Fixes #5460 Fixes #5459 Fixes #5458 Fixes #5460 Fixes #5488 Fixes #5489 Fixes #5497 Fixes #5496	2018-02-15 17:45:57 -08:00
Harshavardhana	f3f09ed14e	Fix a bug in dsync initialization and communication (#5428 ) In current implementation we used as many dsync clients as per number of endpoints(along with path) which is not the expected implementation. The implementation of Dsync was expected to be just for the endpoint Host alone such that if you have 4 servers and each with 4 disks we need to only have 4 dsync clients and 4 dsync servers. But we currently had 8 clients, servers which in-fact is unexpected and should be avoided. This PR brings the implementation back to its original intention. This issue was found #5160	2018-01-22 10:25:10 -08:00
ebozduman	24d9d7e5fa	Removes logrus package and refactors logging messages (#5293 ) This fix removes logrus package dependency and refactors the console logging as the only logging mechanism by removing file logging support. It rearranges the log message format and adds stack trace information whenever trace information is not available in the error structure. It also adds `--json` flag support for server logging. When minio server is started with `--json` flag, all log messages are displayed in json format, with no start-up and informational log messages. Fixes #5265 #5220 #5197	2018-01-17 07:24:46 -08:00
poornas	0bb6247056	Move nslocking from s3 layer to object layer (#5382 ) Fixes #5350	2018-01-13 10:04:52 +05:30
Krishna Srinivas	14e6c5ec08	Simplify the steps to make changes to config.json (#5186 ) This change introduces following simplified steps to follow during config migration. ``` // Steps to move from version N to version N+1 // 1. Add new struct serverConfigVN+1 in config-versions.go // 2. Set configCurrentVersion to "N+1" // 3. Set serverConfigCurrent to serverConfigVN+1 // 4. Add new migration function (ex. func migrateVNToVN+1()) in config-migrate.go // 5. Call migrateVNToVN+1() from migrateConfig() in config-migrate.go // 6. Make changes in config-current_test.go for any test change ```	2017-11-29 13:12:47 -08:00
Harshavardhana	3d0dced23c	Remove go1.9 specific code for windows (#5033 ) Following fix https://go-review.googlesource.com/#/c/41834/ has been merged upstream and released with go1.9.	2017-10-13 15:31:15 +05:30
Frank Wessels	61e0b1454a	Add support for timeouts for locks (#4377 )	2017-08-31 14:43:59 -07:00
Aditya Manthramurthy	986aa8fabf	Bypass network in lock requests to local server (#4465 ) This makes lock RPCs similar to other RPCs where requests to the local server bypass the network. Requests to the local lock-subsystem may bypass the network layer and directly access the locking data-structures. This incidentally fixes #4451.	2017-06-05 12:25:04 -07:00
Frank	cae4683971	Make clearing of stale debug lock info independent of deleting map entry of lock itself. (#4353 ) This is believed to address issue #4337 where stale information for debug locks is shown.	2017-05-16 07:19:17 -07:00
Bala FA	de204a0a52	Add extensive endpoints validation (#4019 )	2017-04-11 15:44:27 -07:00
Harshavardhana	1b1b9e4801	lock/rpc: change rpcPath to be called serviceEndpoint. (#4088 ) This is a cleanup to ensure proper naming.	2017-04-11 10:25:21 -07:00
Bala FA	2df8160f6a	server: handle command line and env variables at one place. (#3975 )	2017-03-30 11:21:19 -07:00
Bala FA	d3cb79a57c	Refactor logger (#3924 ) This patch fixes below * Previously fatalIf() never writes log other than first logging target. * quiet flag is not honored to show progress messages other than startup messages. * Removes console package usage for progress messages.	2017-03-23 16:36:00 -07:00
Harshavardhana	34d9a6b46a	Make sure client initializes to proper lock RPC path. (#3763 ) Fixes a regression introduced in previous commit.	2017-02-18 02:52:11 -08:00
Harshavardhana	1c699d8d3f	fs: Re-implement object layer to remember the fd (#3509 ) This patch re-writes FS backend to support shared backend sharing locks for safe concurrent access across multiple servers.	2017-01-16 17:05:00 -08:00
Harshavardhana	08b6cfb082	ssl: Set a global boolean to enable SSL across Minio (#3558 ) We have been using `isSSL()` everywhere we can set a global value once and re-use it again.	2017-01-11 13:59:51 -08:00
Bala.FA	6d10f4c19a	Adopt dsync interface changes and major cleanup on RPC server/client. * Rename GenericArgs to AuthRPCArgs * Rename GenericReply to AuthRPCReply * Remove authConfig.loginMethod and add authConfig.ServiceName * Rename loginServer to AuthRPCServer * Rename RPCLoginArgs to LoginRPCArgs * Rename RPCLoginReply to LoginRPCReply * Version and RequestTime are added to LoginRPCArgs and verified by server side, not client side. * Fix data race in lockMaintainence loop.	2017-01-02 20:57:42 +05:30
Krishnan Parthasarathi	36fd317eb2	Clean up lock-instrumentation and improve comments (#3499 ) - Add a lockStat type to group counters - Remove unnecessary helper functions - Fix stats computation on force unlock - Removed unnecessary checks and cleaned up comments	2016-12-26 10:29:55 -08:00
Bala FA	e8ce3b64ed	Generate and use access/secret keys properly (#3498 )	2016-12-26 10:21:23 -08:00
Harshavardhana	4daa0d2cee	lock: Moving locking to handler layer. (#3381 ) This is implemented so that the issues like in the following flow don't affect the behavior of operation. ``` GetObjectInfo() .... --> Time window for mutation (no lock held) .... --> Time window for mutation (no lock held) GetObject() ``` This happens when two simultaneous uploads are made to the same object the object has returned wrong info to the client. Another classic example is "CopyObject" API itself which reads from a source object and copies to destination object. Fixes #3370 Fixes #2912	2016-12-10 16:15:12 -08:00

75 Commits