Commit Graph

285 Commits

Author SHA1 Message Date
Harshavardhana
b52a3e523c Avoid using fastjson parser pool, move back to jsoniter (#8190)
It looks like from implementation point of view fastjson
parser pool doesn't behave the same way as expected
when dealing many `xl.json` from multiple disks.

The fastjson parser pool usage ends up returning incorrect
xl.json entries for checksums, with references pointing
to older entries. This led to the subtle bug where checksum
info is duplicated from a previous xl.json read of a different
file from different disk.
2019-09-06 04:21:27 +05:30
Harshavardhana
9ca7470ccc
Avoid using jsoniter, move to fastjson (#8063)
This is to avoid using unsafe.Pointer type
code dependency for MinIO, this causes
crashes on ARM64 platforms

Refer #8005 collection of runtime crashes due
to unsafe.Pointer usage incorrectly. We have
seen issues like this before when using
jsoniter library in the past.

This PR hopes to fix this using fastjson
2019-08-19 08:35:52 -10:00
Harshavardhana
e6d8e272ce
Use const slashSeparator instead of "/" everywhere (#8028) 2019-08-06 12:08:58 -07:00
Harshavardhana
ac82798d0a Remove uneeded calls on FS (#7967) 2019-07-24 15:59:13 +05:30
Krishnan Parthasarathi
559a59220e Add initial support for bucket lifecycle (#7563)
This PR is based off @sinhaashish's PR for object lifecycle
management, which includes support only for,
- Expiration of object
- Filter using object prefix (_not_ object tags)

N B the code for actual expiration of objects will be included in a
subsequent PR.
2019-07-19 21:20:33 +01:00
Krishna Srinivas
338e9a9be9 Put object client disconnect (#7824)
Fail putObject  and postpolicy in case client prematurely disconnects
Use request's context to cancel lock requests on client disconnects
2019-06-28 22:09:17 -07:00
Harshavardhana
38224a4c1a Ignore errors reading fs.json (#7777) 2019-06-12 16:42:03 -07:00
Anis Elleuch
7abadfccc2 Add self-healing feature (#7604)
- Background Heal routine receives heal requests from a channel, either to
heal format, buckets or objects
- Daily sweeper lists all objects in all buckets, these objects
don't necessarly have read quorum so they can be removed if
these objects are unhealable
- Heal daily ops receives objects from the daily sweeper
and send them to the heal routine.
2019-06-08 22:14:07 -07:00
Harshavardhana
2c0b3cadfc Update go mod with sem versions of our libraries (#7687) 2019-05-29 16:35:12 -07:00
Anis Elleuch
9c90a28546 Implement bulk delete (#7607)
Bulk delete at storage level in Multiple Delete Objects API

In order to accelerate bulk delete in Multiple Delete objects API,
a new bulk delete is introduced in storage layer, which will accept
a list of objects to delete rather than only one. Consequently,
a new API is also need to be added to Object API.
2019-05-13 12:25:49 -07:00
Praveen raj Mani
d9a7f80f68 Remove duplicate checkPutObjectArgs in PutObject and (#7396)
Fixes #7384
2019-05-13 10:12:06 -07:00
Harshavardhana
64998fc4ab Remove delayIsLeaf requirement simplify ListObjects further (#7593) 2019-05-02 10:36:57 +05:30
Harshavardhana
f767a2538a
Optimize listing with leaf check offloaded to posix (#7541)
Other listing optimizations include

- remove double sorting while filtering object entries
- improve error message when upload-id is not in quorum
- use jsoniter for full unmarshal json, instead of gjson
- remove unused code
2019-04-23 14:54:28 -07:00
Harshavardhana
620e462413 Implement S3-HDFS gateway (#7440)
- [x] Support bucket and regular object operations
- [x] Supports Select API on HDFS
- [x] Implement multipart API support
- [x] Completion of ListObjects support
2019-04-17 09:52:08 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Harshavardhana
0188009c7e Expose total and available disk space (#7453) 2019-04-05 09:51:50 +05:30
Harshavardhana
c184038b6a Add proper custom errors object creations (#7387)
In scenario 1

```
- bucket/object-prefix
- bucket/object-prefix/object
```

Server responds with `XMinioParentIsObject`

In scenario 2

```
- bucket/object-prefix/object
- bucket/object-prefix
```

Server responds with `XMinioObjectExistsAsDirectory`

Fixes #6566
2019-03-20 13:06:53 -07:00
Anis Elleuch
facbd653ba Add normal/deep type of heal scanning (#7251)
Healing scan used to read all objects parts to check for bitrot
checksum. This commit will add a quicker way of healing scan
by only checking if parts are actually present in disks or not.
2019-03-14 13:08:51 -07:00
Harshavardhana
7079abc931 Implement HealObjects API to simplify healing (#7351) 2019-03-13 17:35:09 -07:00
Anis Elleuch
b05825ffe8 s3: Fix precondition failed in CopyObjectPart when src is encrypted (#7276)
CopyObject precondition checks into GetObjectReader
in order to perform SSE-C pre-condition checks using the
last 32 bytes of encrypted ETag rather than the decrypted
ETag

This also necessitates moving precondition checks for
gateways to gateway layer rather than object handler check
2019-03-06 12:38:41 -08:00
kannappanr
c57159a0fe
fs mode: List already existing buckets with capital letters (#7244)
if a bucket with `Captialized letters` is created, `InvalidBucketName` error
will be returned. 
In the case of pre-existing buckets, it will be listed.

Fixes #6938
2019-03-05 10:42:32 -08:00
poornas
8022a6efd9 Return ETag for 0-byte object prefixes (#7291)
Fixes: #7290
2019-02-26 15:09:14 -08:00
Harshavardhana
df35d7db9d Introduce staticcheck for stricter builds (#7035) 2019-02-13 18:29:36 +05:30
Harshavardhana
082f777281 Revamp bucket metadata healing (#7208)
Bucket metadata healing in the current code was executed multiple
times each time for a given set. Bucket metadata just like
objects are hashed in accordance with its name on any given set,
to allow hashing to play a role we should let the top level
code decide where to navigate.

Current code also had 3 bucket metadata files hardcoded, whereas
we should make it generic by listing and navigating the .minio.sys
to heal such objects.

We also had another bug where due to isObjectDangling changes
without pre-existing bucket metadata files, we were erroneously
reporting it as grey/corrupted objects.

This PR fixes all of the above items.
2019-02-11 09:23:13 +05:30
poornas
40b8d11209 Move metadata into ObjectOptions for NewMultipart and PutObject (#7060) 2019-02-09 11:01:06 +05:30
Harshavardhana
30135eed86 Redo how to handle stale dangling files (#7171)
foo.CORRUPTED should never be created because when
multiple sets are involved we would hash the file
to wrong a location, this PR removes the code.

But allows DeleteBucket() to work properly to delete
dangling buckets/objects. Also adds another option
to Healing where a user needs to specify `--remove`
such that all dangling objects will be deleted with
user confirmation.
2019-02-05 17:58:48 -08:00
poornas
5a80cbec2a Add double encryption at S3 gateway. (#6423)
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).

If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.

When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.

Fixes #6323, #6696
2019-01-05 14:16:42 -08:00
Anis Elleuch
632022971b s3: Don't set NextMarker when listing is not truncated (#7012)
Setting NextMarker when IsTruncated is not set seems to be confusing
AWS C++ SDK, this commit will avoid setting any string in NextMarker.
2018-12-20 13:30:25 -08:00
poornas
f6980c4630 fix ConfigSys and NotificationSys initialization for NAS (#6920) 2018-12-05 14:03:42 -08:00
Nitish Tiwari
2a810c7da2
Improve du thread performance (#6849) 2018-11-26 10:35:14 +05:30
poornas
5f6d717b7a Fix: Preserve MD5Sum for SSE encrypted objects (#6680)
To conform with AWS S3 Spec on ETag for SSE-S3 encrypted objects,
encrypt client sent MD5Sum and store it on backend as ETag.Extend
this behavior to SSE-C encrypted objects.
2018-11-14 17:36:41 -08:00
kannappanr
c872c1f1dc
Return default ETag if fs.json is empty (#6787) 2018-11-09 10:34:59 -08:00
Harshavardhana
54ae364def Introduce STS client grants API and OPA policy integration (#6168)
This PR introduces two new features

- AWS STS compatible STS API named AssumeRoleWithClientGrants

```
POST /?Action=AssumeRoleWithClientGrants&Token=<jwt>
```

This API endpoint returns temporary access credentials, access
tokens signature types supported by this API

  - RSA keys
  - ECDSA keys

Fetches the required public key from the JWKS endpoints, provides
them as rsa or ecdsa public keys.

- External policy engine support, in this case OPA policy engine

- Credentials are stored on disks
2018-10-09 14:00:01 -07:00
Praveen raj Mani
c7722fbb1b Simplify pkg mimedb (#6549)
Content-Type resolution can now use a function `TypeByExtension(extension)` 
to resolve to the respective content-type.
2018-10-02 11:48:17 +05:30
Praveen raj Mani
ce9d36d954 Add object compression support (#6292)
Add support for streaming (golang/LZ77/snappy) compression.
2018-09-28 09:06:17 +05:30
poornas
ed703c065d Add ObjectOptions to GetObjectNInfo (#6533) 2018-09-27 15:36:45 +05:30
Anis Elleuch
aa4e2b1542 Use GetObjectNInfo in CopyObject and CopyObjectPart (#6489) 2018-09-25 12:39:46 -07:00
Aditya Manthramurthy
36e51d0cee Add GetObjectNInfo to object layer (#6449)
The new call combines GetObjectInfo and GetObject, and returns an
object with a ReadCloser interface.

Also adds a number of end-to-end encryption tests at the handler
level.
2018-09-20 19:22:09 -07:00
poornas
5c0b98abf0 Add ObjectOptions to ObjectLayer calls (#6382) 2018-09-10 09:42:43 -07:00
Harshavardhana
4487f70f08 Revert all GetObjectNInfo related PRs (#6398)
* Revert "Encrypted reader wrapped in NewGetObjectReader should be closed (#6383)"

This reverts commit 53a0bbeb5b.

* Revert "Change SelectAPI to use new GetObjectNInfo API (#6373)"

This reverts commit 5b05df215a.

* Revert "Implement GetObjectNInfo object layer call (#6290)"

This reverts commit e6d740ce09.
2018-08-31 13:10:12 -07:00
Aditya Manthramurthy
e6d740ce09 Implement GetObjectNInfo object layer call (#6290)
This combines calling GetObjectInfo and GetObject while returning a
io.ReadCloser for the object's body. This allows the two operations to
be under a single lock, fixing a race between getting object info and
reading the object body.
2018-08-27 15:28:23 +05:30
Krishna Srinivas
52f6d5aafc Rename of structs and methods (#6230)
Rename of ErasureStorage to Erasure (and rename of related variables and methods)
2018-08-23 23:35:37 -07:00
Harshavardhana
1ffa6adcd4 Ignore io.EOF returned by ReadFrom for zero byte fs.json (#6346)
Fixes #6256
2018-08-24 11:34:21 +05:30
Harshavardhana
556a51120c Deprecate ListLocks and ClearLocks (#6233)
No locks are ever left in memory, we also
have a periodic interval of clearing stale locks
anyways. The lock instrumentation was not complete
and was seldom used.

Deprecate this for now and bring it back later if
it is really needed. This also in-turn seems to improve
performance slightly.
2018-08-02 23:09:42 +05:30
Harshavardhana
ad86454580 Make sure to handle FaultyDisks in listing ops (#6204)
Continuing from PR 157ed65c35

Our posix.go implementation did not handle  I/O errors
properly on the disks, this led to situations where
top-level callers such as ListObjects might return early
without even verifying all the available disks.

This commit tries to address this in Kubernetes, drbd/nbd based
persistent volumes which can disconnect under load and
result in the situations with disks return I/O errors.

This commit also simplifies listing operation, listing
never returns any error. We can avoid this since we pretty
much ignore most of the errors anyways. When objects are
accessed directly we return proper errors.
2018-07-27 15:32:19 -07:00
Anis Elleuch
be1700f595 Avoid startup abort when a notify target is down (#6126)
Minio server was preventing itself to start when any notification
target is down and not running. The PR changes the behavior by
avoiding startup abort in that case, so the user will still
be able to access Minio server using mc admin commands after
a restart or set config commands.
2018-07-10 07:20:31 +05:30
wd256
25f9b0bc3b Handle ListObjectsV2 start-after parameter in ObjectLayer (#6078) 2018-07-01 09:52:45 +05:30
Harshavardhana
e5e522fc61
docs: fix all Chinese doc links for the new docs site (#6097)
Additionally fix typos, default to US locale words
2018-06-28 16:02:02 -07:00
Harshavardhana
de251483d1 Avoid ticker timer to simplify disk usage (#6101)
This PR simplifies the code to avoid tracking
any running usage events. This PR also brings
in an upper threshold of upto 1 minute suspend
the usage function after which the usage would
proceed without waiting any longer.
2018-06-28 15:05:45 -07:00
Praveen raj Mani
ea76e72054 Incorrect error message for insufficient volume fix (#6099)
Reply back with appropriate error message when the server is spawn
with volume of insufficient size (< 1GiB).

Fixes #5993.
2018-06-28 12:01:05 -07:00
Harshavardhana
25de775560 disable disk-usage when export is root mount path (#6091)
disk usage crawling is not needed when a tenant
is not sharing the same disk for multiple other
tenants. This PR adds an optimization when we
see a setup uses entire disk, we simply rely on
statvfs() to give us total usage.

This PR also additionally adds low priority
scheduling for usage check routine, such that
other go-routines blocked will be automatically
unblocked and prioritized before usage.
2018-06-27 18:59:38 -07:00
Harshavardhana
abf209b1dd load bucket policies using object layer API (#6084)
This PR fixes an issue during gateway mode
where underlying policies were not translated
into meaningful policies.
2018-06-27 12:29:48 +05:30
Nitish Tiwari
ad79c626c6
Throw 404 for head requests for prefixes without trailing "/" (#5966)
Minio server returns 403 (access denied) for head requests to prefixes
without trailing "/", this is different from S3 behaviour. S3 returns
404 in such cases.

Fixes #6080
2018-06-26 06:54:00 +05:30
Ashish Kumar Sinha
0bbdd02a57 Updating disk storage for FS/Erasure mode (#6081)
Updating the disk storage stats for FS/Erasure coded backend
2018-06-25 10:46:48 -07:00
Harshavardhana
6fb0604502 Allow usage check to be configurable (#6006) 2018-06-04 18:35:41 -07:00
Harshavardhana
000e360196 Deprecate showing drive capacity and total free (#5976)
This addresses a situation that we shouldn't be
displaying Total/Free anymore, instead we should simply
show the total usage.
2018-05-23 17:30:25 -07:00
Harshavardhana
e6ec645035 Implement support for calculating disk usage per tenant (#5969)
Fixes #5961
2018-05-23 15:41:29 +05:30
Bala FA
4eb788df79 rename checkPathValid() to getValidPath() (#5949) 2018-05-17 07:27:07 -07:00
Krishna Srinivas
bb34bd91f1 Fix unnecessary log messages to avoid flooding the logs (#5900) 2018-05-09 01:38:27 -07:00
Anis Elleuch
6d5f2a4391 Better support of empty directories (#5890)
Better support of HEAD and listing of zero sized objects with trailing
slash (a.k.a empty directory). For that, isLeafDir function is added
to indicate if the specified object is an empty directory or not. Each
backend (xl, fs) has the responsibility to store that information.
Currently, in both of XL & FS, an empty directory is represented by
an empty directory in the backend.

isLeafDir() checks if the given path is an empty directory or not,
since dir listing is costly if the latter contains too many objects,
readDirN() is added in this PR to list only N number of entries.
In isLeadDir(), we will only list one entry to check if a directory
is empty or not.
2018-05-09 01:38:21 -07:00
Anis Elleuch
32700fca52 Enhance fatal errors printing of common issues seen by users (#5878) 2018-05-08 19:04:36 -07:00
Harshavardhana
c98d8cb1c7 fs: fix a regression allow reading of existing files (#5889) 2018-05-07 17:00:44 -07:00
kannappanr
fe126de98b
Regenerate fs.json if it is corrupted in FS mode (#5778)
Also return a default e-tag for pre-existing objects.
Fixes #5712
2018-04-24 17:36:43 -07:00
Bala FA
0d52126023 Enhance policy handling to support SSE and WORM (#5790)
- remove old bucket policy handling
- add new policy handling
- add new policy handling unit tests

This patch brings support to bucket policy to have more control not
limiting to anonymous.  Bucket owner controls to allow/deny any rest
API.

For example server side encryption can be controlled by allowing
PUT/GET objects with encryptions including bucket owner.
2018-04-24 15:53:30 -07:00
Harshavardhana
ccdb7bc286 Fix s3 compatibility fixes for getBucketLocation,headBucket,deleteBucket (#5842)
- getBucketLocation
- headBucket
- deleteBucket

Should return 404 or NoSuchBucket even for invalid bucket names, invalid
bucket names are only validated during MakeBucket operation
2018-04-24 08:57:33 +05:30
kannappanr
cef992a395
Remove error package and cause functions (#5784) 2018-04-10 09:36:37 -07:00
Harshavardhana
217fb470a7 Add a check to check if disk is writable (#5662)
This check is a pre-emptive check to return
error early before we attempt to use the disk
for any other operations later.

refer #5645
2018-04-10 09:26:09 +05:30
Harshavardhana
1d31ad499f Make sure to re-load reference format after HealFormat (#5772)
This PR introduces ReloadFormat API call at objectlayer
to facilitate this. Previously we repurposed HealFormat
but we never ended up updating our reference format on
peers.

Fixes #5700
2018-04-09 22:55:41 +05:30
kannappanr
f8a3fd0c2a
Create logger package and rename errorIf to LogIf (#5678)
Removing message from error logging
Replace errors.Trace with LogIf
2018-04-05 15:04:40 -07:00
Krishna Srinivas
804a4f9c15 Fix backend format for disk-cache - not to use FS format.json (#5732) 2018-03-29 14:38:26 -07:00
poornas
af024a9c69 Remove deadcode related to multipart cleanup for fs (#5716)
The cleanup code is no longer needed as we moved to lockfree 
multipart backend for fs
2018-03-29 08:26:52 +05:30
poornas
a3e806ed61 Add disk based edge caching support. (#5182)
This PR adds disk based edge caching support for minio server.

Cache settings can be configured in config.json to take list of disk drives,
cache expiry in days and file patterns to exclude from cache or via environment
variables MINIO_CACHE_DRIVES, MINIO_CACHE_EXCLUDE and MINIO_CACHE_EXPIRY

Design assumes that Atime support is enabled and the list of cache drives is
fixed.
 - Objects are cached on both GET and PUT/POST operations.
 - Expiry is used as hint to evict older entries from cache, or if 80% of cache
   capacity is filled.
 - When object storage backend is down, GET, LIST and HEAD operations fetch
   object seamlessly from cache.

Current Limitations
 - Bucket policies are not cached, so anonymous operations are not supported in
   offline mode.
 - Objects are distributed using deterministic hashing among list of cache
   drives specified.If one or more drives go offline, or cache drive
   configuration is altered - performance could degrade to linear lookup.

Fixes #4026
2018-03-28 14:14:06 -07:00
poornas
76d1e8bbcd change fs.json format to include checksum fields (#5685) 2018-03-27 17:23:10 -07:00
Bala FA
3ebe61abdf Quick support to server level WORM (#5602)
This is a trival fix to support server level WORM.  The feature comes
with an environment variable `MINIO_WORM`.

Usage:
```
$ export MINIO_WORM=on
$ minio server endpoint
```
2018-03-27 16:44:45 -07:00
Krishna Srinivas
e452377b24 Add context to the object-interface methods.
Make necessary changes to xl fs azure sia
2018-03-15 16:28:25 -07:00
Krishna Srinivas
9083bc152e Flat multipart backend implementation for Erasure backend (#5447) 2018-03-15 13:55:23 -07:00
Bala FA
0e4431725c make notification as separate package (#5294)
* Remove old notification files

* Add net package

* Add event package

* Modify minio to take new notification system
2018-03-15 13:03:41 -07:00
Harshavardhana
29ef7d29e4 Fix deadlock in in-place CopyObject decryption/encryption (#5637)
In-place decryption/encryption already holds write
locks on them, attempting to acquire a read lock would
fail.
2018-03-12 13:52:38 -07:00
Harshavardhana
7aaf01eb74 Save ETag when updating metadata (#5626)
Fixes #5622
2018-03-09 10:50:39 -08:00
Harshavardhana
52eea7b9c1
Support SSE-C multipart source objects in CopyObject (#5603)
Current code didn't implement the logic to support
decrypting encrypted multiple parts, this PR fixes
by supporting copying encrypted multipart objects.
2018-03-02 17:24:02 -08:00
Anis Elleuch
120b061966 Add multipart support in SSE-C encryption (#5576)
*) Add Put/Get support of multipart in encryption
*) Add GET Range support for encryption
*) Add CopyPart encrypted support
*) Support decrypting of large single PUT object
2018-03-01 11:37:57 -08:00
Harshavardhana
7cc678c653 Support encryption for CopyObject, GET-Range requests (#5544)
- Implement CopyObject encryption support
- Handle Range GETs for encrypted objects

Fixes #5193
2018-02-23 15:07:21 -08:00
Harshavardhana
0ea54c9858 Change CopyObject{Part} to single srcInfo argument (#5553)
Refactor such that metadata and etag are
combined to a single argument `srcInfo`.

This is a precursor change for #5544 making
it easier for us to provide encryption/decryption
functions.
2018-02-21 14:18:47 +05:30
poornas
25107c2e11 Add NAS gateway support (#5516) 2018-02-20 12:21:12 -08:00
Harshavardhana
fb96779a8a Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
2018-02-15 17:45:57 -08:00
Harshavardhana
91101b11bb Converge repeated code to common deleteBucketMetadata() (#5508) 2018-02-12 18:34:30 -08:00
poornas
4f73fd9487 Unify gateway and object layer. (#5487)
* Unify gateway and object layer. Bring bucket policies into
object layer.
2018-02-09 15:19:30 -08:00
Harshavardhana
033cfb5cef Remove stale code from minio server (#5479) 2018-01-31 18:28:28 -08:00
Krishna Srinivas
3b2486ebaf Lock free multipart backend implementation for FS (#5401) 2018-01-31 13:17:24 -08:00
Harshavardhana
3ea28e9771 Support creating directories on erasure coded backend (#5443)
This PR continues from #5049 where we started supporting
directories for erasure coded backend
2018-01-30 08:13:13 +05:30
Aditya Manthramurthy
a337ea4d11 Move admin APIs to new path and add redesigned heal APIs (#5351)
- Changes related to moving admin APIs
   - admin APIs now have an endpoint under /minio/admin
   - admin APIs are now versioned - a new API to server the version is
     added at "GET /minio/admin/version" and all API operations have the
     path prefix /minio/admin/v1/<operation>
   - new service stop API added
   - credentials change API is moved to /minio/admin/v1/config/credential
   - credentials change API and configuration get/set API now require TLS
     so that credentials are protected
   - all API requests now receive JSON
   - heal APIs are disabled as they will be changed substantially

- Heal API changes
   Heal API is now provided at a single endpoint with the ability for a
   client to start a heal sequence on all the data in the server, a
   single bucket, or under a prefix within a bucket.

   When a heal sequence is started, the server returns a unique token
   that needs to be used for subsequent 'status' requests to fetch heal
   results.

   On each status request from the client, the server returns heal result
   records that it has accumulated since the previous status request. The
   server accumulates upto 1000 records and pauses healing further
   objects until the client requests for status. If the client does not
   request any further records for a long time, the server aborts the
   heal sequence automatically.

   A heal result record is returned for each entity healed on the server,
   such as system metadata, object metadata, buckets and objects, and has
   information about the before and after states on each disk.

   A client may request to force restart a heal sequence - this causes
   the running heal sequence to be aborted at the next safe spot and
   starts a new heal sequence.
2018-01-22 14:54:55 -08:00
Aditya Manthramurthy
aa7e5c71e9 Remove upload healing related dead code (#5404) 2018-01-15 18:20:39 -08:00
Harshavardhana
12f67d47f1 Fix a possible race during PutObject() (#5376)
Under any concurrent removeObjects in progress
might have removed the parents of the same prefix
for which there is an ongoing putObject request.
An inconsistent situation may arise as explained
below even under sufficient locking.

PutObject is almost successful at the last stage when
a temporary file is renamed to its actual namespace
at `a/b/c/object1`. Concurrently a RemoveObject is
also in progress at the same prefix for an `a/b/c/object2`.

To create the object1 at location `a/b/c` PutObject has
to create all the parents recursively.

```
a/b/c - os.MkdirAll loops through has now created
        'a/' and 'b/' about to create 'c/'
a/b/c/object2 - at this point 'c/' and 'object2'
        are deleted about to delete b/
```

Now for os.MkdirAll loop the expected situation is
that top level parent 'a/b/' exists which it created
, such that it can create 'c/' - since removeObject
and putObject do not compete for lock due to holding
locks at different resources. removeObject proceeds
to delete parent 'b/' since 'c/' is not yet present,
once deleted 'os.MkdirAll' would receive an error as
syscall.ENOENT which would fail the putObject request.

This PR tries to address this issue by implementing
a safer/guarded approach where we would retry an operation
such as `os.MkdirAll` and `os.Rename` if both operations
observe syscall.ENOENT.

Fixes #5254
2018-01-13 22:43:02 +05:30
poornas
0bb6247056 Move nslocking from s3 layer to object layer (#5382)
Fixes #5350
2018-01-13 10:04:52 +05:30
kannappanr
20584dc08f
Remove unnecessary errors printed on the console (#5386)
Some of the errors printed on server console can be
removed as those error message is unnecessary.

Fixes #5385
2018-01-11 11:42:05 -08:00
Harshavardhana
490c30f853
erasure: Support cleaning up of stale multipart objects (#5250)
Just like our single directory/disk setup, this PR brings
the functionality to cleanup stale multipart objects
older > 2 weeks.
2017-11-30 18:11:42 -08:00
Harshavardhana
8efa82126b
Convert errors tracer into a separate package (#5221) 2017-11-25 11:58:29 -08:00
Nitish Tiwari
f7b6f7b22f Update getObjectInfo to stat for objects with trailing / (#5179)
Apache Spark sends getObject requests with trailing "/".
This PR updates the getObjectInfo to stat for files
even if they are sent with trailing "/".

Fixes #2965
2017-11-16 16:00:27 -08:00
Harshavardhana
5eb210dd2e Set etag properly to calculated value if available (#5106)
Fixes #5100
2017-10-24 12:25:42 -07:00
Harshavardhana
1d8a8c63db Simplify data verification with HashReader. (#5071)
Verify() was being called by caller after the data
has been successfully read after io.EOF. This disconnection
opens a race under concurrent access to such an object.
Verification is not necessary outside of Read() call,
we can simply just do checksum verification right inside
Read() call at io.EOF.

This approach simplifies the usage.
2017-10-22 11:00:34 +05:30
Harshavardhana
b2cbade477 Support creating empty directories. (#5049)
Every so often we get requirements for creating
directories/prefixes and we end up rejecting
such requirements. This PR implements this and
allows empty directories without any new file
addition to backend.

Existing lower APIs themselves are leveraged to provide
this behavior. Only FS backend supports this for
the time being as desired.
2017-10-16 17:20:54 -07:00
Harshavardhana
3d0dced23c Remove go1.9 specific code for windows (#5033)
Following fix https://go-review.googlesource.com/#/c/41834/ has
been merged upstream and released with go1.9.
2017-10-13 15:31:15 +05:30
Harshavardhana
0b546ddfd4 Return errors in PutObject()/PutObjectPart() if input size is -1. (#5015)
Amazon S3 API expects all incoming stream has a content-length
set it was superflous for us to support object layer which supports
unknown sized stream as well, this PR removes such requirements
and explicitly error out if input stream is less than zero.
2017-10-06 09:38:01 -07:00
Harshavardhana
d3eb5815d9 Avoid DDOS in PutObject() when objectName is '/' and size '0' (#4962)
It can happen that an incoming PutObject() request might
have inputs of following form eg:-

 - bucketName is 'testbucket'
 - objectName is '/'

bucketName exists and was previously created but there
are no other objects in this bucket. In a situation like
this parentDirIsObject() goes into an infinite loop.

Verifying that if '/' is an object fails on both backends
but the resulting `path.Dir('/')` returns `'/'` this causes
the closure to loop onto itself.

Fixes #4940
2017-09-25 14:47:58 -07:00
Andreas Auernhammer
79ba4d3f33 refactor ObjectLayer PutObject and PutObjectPart (#4925)
This change refactor the ObjectLayer PutObject and PutObjectPart
functions. Instead of passing an io.Reader and a size to PUT operations
ObejectLayer expects an HashReader.
A HashReader verifies the MD5 sum (and SHA256 sum if required) of the object.
This change updates all all PutObject(Part) calls and removes unnecessary code
in all ObjectLayer implementations.

Fixes #4923
2017-09-19 12:40:27 -07:00
Frank Wessels
61e0b1454a Add support for timeouts for locks (#4377) 2017-08-31 14:43:59 -07:00
Harshavardhana
1bb9d49eaa fs: ListObjects() was reading ETag at wrong offsets (#4846)
Current code was just using io.ReadAll() on an fd()
which might have moved underneath due to a concurrent
read operation. Subsequent read will result in EOF
We should always seek back and read again. pread()
is allowed on all platforms use io.SectionReader to
read from the beginning of the file.

Fixes #4842
2017-08-23 17:59:14 -07:00
Harshavardhana
d864e00e24 posix: Deprecate custom removeAll/mkdirAll implementations. (#4808)
Since go1.8 os.RemoveAll and os.MkdirAll both support long
path names i.e UNC path on windows. The code we are carrying
was directly borrowed from `pkg/os` package and doesn't need
to be in our repo anymore. As a side affect this also
addresses our codecoverage issue.

Refer #4658
2017-08-12 19:25:43 -07:00
Harshavardhana
3544e5ad01 fs: Fix Shutdown() behavior and handle tests properly. (#4796)
Fixes #4795
2017-08-10 14:11:57 -07:00
Harshavardhana
e7cdd8f02c fs: Avoid non-idempotent code flow in ListBuckets() (#4798)
Under the call flow

```
Readdir
   +
   |
   |
   | path-entry
   |
   |
   v
StatDir
```

Existing code was written in a manner where say
a bucket/top-level directory was indeed deleted
between Readdir() and before StatDir() we would
ignore certain errors. This is not a plausible
situation and might not happen in almost all
practical cases. We do not have to look for
or interpret these errors returned by StatDir()
instead we can just collect the successful
values and return back to the client. We do not
need to pre-maturely decide on bucket access
we just let filesystem decide subsequently for
real I/O operations.

Refer #4658
2017-08-10 13:36:11 -07:00
Krishnan Parthasarathi
75c43bfb6c ListMultipartUploads, ListObjectParts return empty response (#4694)
Also, periodically removes incomplete multipart uploads older than 2 weeks.
2017-08-04 10:45:57 -07:00
Harshavardhana
cc8a8cb877 posix: Check for min disk space and inodes (#4618)
This is needed such that we don't start or
allow writing to a posix disk which doesn't
have minimum total disk space available.

One part fix for #4617
2017-07-10 18:14:48 -07:00
Frank Wessels
46897b1100 Name return values to prevent the need (and unnecessary code bloat) (#4576)
This is done to explicitly instantiate objects for every return statement.
2017-06-21 19:53:09 -07:00
Harshavardhana
353f2d3a6e fs: Hold format.json readLock ref to avoid GC. (#4532)
Looks like if we follow pattern such as

```
_ = rlk
```

Go can potentially kick in GC and close the fd when
the reference is lost, only speculation is that
the cause here is `SetFinalizer` which is set on
`os.close()` internally in `os` stdlib.

This is unexpected and unsual endeavour for Go, but
we have to make sure the reference is never lost
and always dies with the server.

Fixes #4530
2017-06-13 08:29:07 -07:00
Harshavardhana
075b8903d7 fs: Add safe locking semantics for format.json (#4523)
This patch also reverts previous changes which were
merged for migration to the newer disk format. We will
be bringing these changes in subsequent releases. But
we wish to add protection in this release such that
future release migrations are protected.

Revert "fs: Migration should handle bucketConfigs as regular objects. (#4482)"
This reverts commit 976870a391.

Revert "fs: Migrate object metadata to objects directory. (#4195)"
This reverts commit 76f4f20609.
2017-06-12 17:40:28 -07:00
Harshavardhana
976870a391 fs: Migration should handle bucketConfigs as regular objects. (#4482)
Current code failed to anticipate the existence of files
which could have been created to corrupt the namespace such
as `policy.json` file created at the bucket top level.

In the current release creating such as file conflicts
with the namespace for future bucket policy operations.
We implemented migration of backend format to avoid situations
such as these.

This PR handles this situation, makes sure that the
erroneous files should have been moved properly.

Fixes #4478
2017-06-06 12:15:35 -07:00
poornas
18c4e5d357 Enable browser support for gateway (#4425) 2017-06-01 09:43:20 -07:00
Harshavardhana
072fcf3ba6 fs: Make sure to validate bucket first in PutObject() (#4427)
Currently even when bucket doesn't exist we wrongly
return success, when an object is a directory prefix with
 '/' as suffix and is of size 0.

This PR fixes this behavior.
2017-05-25 09:22:43 -07:00
Harshavardhana
155a90403a fs/erasure: Rename meta 'md5Sum' as 'etag'. (#4319)
This PR also does backend format change to 1.0.1
from 1.0.0.  Backward compatible changes are still
kept to read the 'md5Sum' key. But all new objects
will be stored with the same details under 'etag'.

Fixes #4312
2017-05-14 12:05:51 -07:00
Harshavardhana
fa3f6d75b6 fs: Verify if parent is an object before i/o. (#4304)
PutObject() needs to verify and fail.

Fixes #4301
2017-05-09 17:46:46 -07:00
Harshavardhana
298b470f69 fs/erasure: Ignore objects with / even for DeleteObject() (#4303)
Additionally GetObject() also returns errFileNotFound similar
to HeadObject().

Fixes #4302
2017-05-09 14:32:24 -07:00
Harshavardhana
76f4f20609 fs: Migrate object metadata to objects directory. (#4195)
Fixes #3352
2017-05-05 08:49:09 -07:00
Harshavardhana
f0b5c0ec7c windows: Support all REPARSE_POINT attrib files properly. (#4203)
This change adopts the upstream fix in this regard at
https://go-review.googlesource.com/#/c/41834/ for Minio's
purposes.

Go's current os.Stat() lacks support for lot of strange
windows files such as

 - share symlinks on SMB2
 - symlinks on docker nanoserver
 - de-duplicated files on NTFS de-duplicated volume.

This PR attempts to incorporate the change mentioned here

   https://blogs.msdn.microsoft.com/oldnewthing/20100212-00/?p=14963/

The article suggests to use Windows I/O manager to
dereference the symbolic link.

Fixes #4122
2017-05-02 02:35:27 -07:00
Anis Elleuch
14f0047295 fs: Remove fs meta lock when PutObject() fails (#4114)
Removing the fs meta lock file when PutObject() encounters any error
during its execution, such as upload getting permatuerly cancelled
by the client.
2017-04-14 12:06:24 -07:00
Harshavardhana
4747adfcb4 fs: Enable returning ETag along with ListObjects() (#4042)
This is to comply with S3 behavior, we previously removed
reading `fs.json` for optimization reasons but we have a
reason to believe that providing ETag and using gjson
provides needed benefit of not having to deal with
unmarshalling overhead of golang stdlib.

Fixes #4028
2017-04-04 09:14:03 -07:00
Krishnan Parthasarathi
2bd694dbc8 Add disksUnavailable healStatus const (#3990)
`disksUnavailable` healStatus constant indicates that a given object
needs healing but one or more of disks requiring heal are offline. This
can be used by admin heal API consumers to distinguish between a
successful heal and a no-op since the outdated disks were offline.
2017-03-31 17:55:15 -07:00
Krishnan Parthasarathi
051f9bb5c6 Implement list uploads heal admin API (#3885) 2017-03-16 00:15:06 -07:00
Anis Elleuch
a5e60706a2 xl,fs: Return 404 if object ends with a separator (#3897)
HEAD Object for FS and XL was returning invalid object name when
an object name has a trailing slash separator, this PR changes the
behavior and will always return 404 object not found, this guarantees
a better compatibility with S3 spec.
2017-03-13 22:20:46 -07:00
Harshavardhana
47ac410ab0 Code cleanup - simplify server side code. (#3870)
Fix all the issues reported by `gosimple` tool.
2017-03-08 10:00:47 -08:00
Anis Elleuch
79e0b9e69a Relax minio server start when disk threshold is reached and adds space check in FS (#3865)
* fs: Rename tempObjPath variable in fsCreateFile()
* fs/posix: Factor checkDiskFree() function
* fs: Add disk free check in fsCreateFile()
* posix: Move free disk check to createFile()
* xl: Relax free disk check in POSIX initialization
* fs: checkDiskFree checks for space to store data
2017-03-07 12:25:40 -08:00
Harshavardhana
bcc5b6e1ef xl: Rename getOrderedDisks as shuffleDisks appropriately. (#3796)
This PR is for readability cleanup

- getOrderedDisks as shuffleDisks
- getOrderedPartsMetadata as shufflePartsMetadata

Distribution is now a second argument instead being the
primary input argument for brevity.

Also change the usage of type casted int64(0), instead
rely on direct type reference as `var variable int64` everywhere.
2017-02-24 09:20:40 -08:00
Harshavardhana
7ea1de8245 copyObject: Be case sensitive for windows only server. (#3766)
For case sensitive platforms we should honor case.

Fixes #3765

```
1) python s3cmd -c s3cfg_localminio put logo.png s3://testbucket/xyz/etc2/logo.PNG

2) python s3cmd -c s3cfg_localminio ls s3://testbucket/xyz/etc2/
2017-02-18 10:58     22059   s3://testbucket/xyz/etc2/logo.PNG

3) python s3cmd -c s3cfg_localminio cp s3://testbucket/xyz/etc2/logo.PNG s3://testbucket/xyz/etc2/logo.png
remote copy: 's3://testbucket/xyz/etc2/logo.PNG' -> 's3://testbucket/xyz/etc2/logo.png'

4) python s3cmd -c s3cfg_localminio ls s3://testbucket/xyz/etc2/
2017-02-18 10:58     22059   s3://testbucket/xyz/etc2/logo.PNG
2017-02-18 11:10     22059   s3://testbucket/xyz/etc2/logo.png
```
2017-02-18 13:41:59 -08:00
Harshavardhana
50b4e54a75 fs: Do not return reservedBucket names in ListBuckets() (#3754)
Make sure to skip reserved bucket names in `ListBuckets()`
current code didn't skip this properly and also generalize
this behavior for both XL and FS.
2017-02-16 14:52:14 -08:00
Krishna Srinivas
152cdf1c05 fs: Move traceError() to lower functions where possible. (#3633) 2017-01-26 15:40:10 -08:00
Krishna Srinivas
17dd1c19df cleanup: refactor common code between FS and XL listDirFactory. (#3639) 2017-01-26 15:39:22 -08:00
Harshavardhana
dafdc74605 fs: if fs.json is empty ignore it while reading metadata. (#3634)
This is needed so that we don't send wrong errors
on previously failed PutObject() which would have
left a stale `fs.json` entry.
2017-01-26 10:19:07 -08:00
Krishna Srinivas
82373e3d50 fs: cleanup - do not cache size of metafiles (#3630)
* Remove Size() method and size field from lock.LockedFile
* WriteTo method of fsMeta and uploadsV1 now takes concrete type *lock.LockedFile
2017-01-25 12:29:06 -08:00
Harshavardhana
51fa4f7fe3 Make PutObject a nop for an object which ends with "/" and size is '0' (#3603)
This helps majority of S3 compatible applications while not returning
an error upon directory create request.

Fixes #2965
2017-01-20 16:33:01 -08:00
Andrei Kopats
c3f7d1026f fs: start even if there are not enough free space (#3606) 2017-01-20 09:30:20 -08:00
Jeffery Utter
9e1f1b50e0 Don't Check Available Inodes on NFS (#3598)
In some cases (such as with VirutualBox, this value gets hardcoded
to 1000, which is less than the required minimum of 10000.

Fixes #3592
2017-01-19 10:39:44 -08:00
Anis Elleuch
0715032598 heal: Add ListBucketsHeal object API (#3563)
ListBucketsHeal will list which buckets that need to be healed:
  * ListBucketsHeal() (buckets []BucketInfo, err error)
2017-01-19 09:34:18 -08:00
Harshavardhana
62f8343879 Add constants for commonly used values. (#3588)
This is a consolidation effort, avoiding usage
of naked strings in codebase. Whenever possible
use constants which can be repurposed elsewhere.

This also fixes `goconst ./...` reported issues.
2017-01-18 12:24:34 -08:00
Harshavardhana
1c699d8d3f fs: Re-implement object layer to remember the fd (#3509)
This patch re-writes FS backend to support shared backend sharing locks for safe concurrent access across multiple servers.
2017-01-16 17:05:00 -08:00
Harshavardhana
69559aa101 objAPI: Implement CopyObject API. (#3487)
This is written so that to simplify our handler code
and provide a way to only update metadata instead of
the data when source and destination in CopyObject
request are same.

Fixes #3316
2016-12-26 16:29:26 -08:00
Harshavardhana
15b4c49621 fs/xl: Simplify bucket metadata reading. (#3486)
ObjectLayer GetObject() now returns the entire object
if starting offset is 0 and length is negative. This
also allows to simplify handler layer code where
we always had to use GetObjectInfo() before proceeding
to read bucket metadata files examples `policy.json`.

This also reduces one additional call overhead.
2016-12-21 11:29:32 -08:00
Harshavardhana
faa6b1e925 vendorize deps for snappy, blake2b and sha256 (#3476)
Bring in new optimization and portability changes.

Fixes https://github.com/minio/minio-go/issues/578
2016-12-19 19:32:55 -08:00
Harshavardhana
4daa0d2cee lock: Moving locking to handler layer. (#3381)
This is implemented so that the issues like in the
following flow don't affect the behavior of operation.

```
GetObjectInfo()
.... --> Time window for mutation (no lock held)
.... --> Time window for mutation (no lock held)
GetObject()
```

This happens when two simultaneous uploads are made
to the same object the object has returned wrong
info to the client.

Another classic example is "CopyObject" API itself
which reads from a source object and copies to
destination object.

Fixes #3370
Fixes #2912
2016-12-10 16:15:12 -08:00
Harshavardhana
cf17fc7774 fs: PutObject create 0byte objects properly. (#3387)
Current code always appends to a file only if 1byte or
more was sent on the wire was affecting both PutObject
and PutObjectPart uploads.

This patch fixes such a situation and resolves #3385
2016-12-03 11:53:12 -08:00
Harshavardhana
ff4ce0ee14 fs/xl: Combine input checks into re-usable functions. (#3383)
Repeated code around both object layers are moved
and combined into simple re-usable functions.
2016-12-01 23:15:17 -08:00
Harshavardhana
834007728c fs: Do not print redundant md5Sum response header. (#3369)
For both GET and HEAD requests.
2016-11-29 16:47:01 -08:00