Commit Graph

324 Commits

Author SHA1 Message Date
Harshavardhana 040ac5cad8
fix: when logger queue is full exit quickly upon doneCh (#14928)
Additionally only reload requested sub-system not everything
2022-05-16 16:10:51 -07:00
Anis Elleuch 05685863e3
Cancel old logger/audit targets outside lock (#14927)
When configuring a new target, such as an audit target, the server waits
until all audit events are sent to the audit target before doing the
swap from the old to the new audit target. Therefore current S3 operations
can suffer from this since the audit swap lock will be held.

This behavior is unnecessary as the new audit target can enter in a
functional mode immediately and the old audit will just cancel itself
at its own pace.
2022-05-16 13:32:36 -07:00
Anis Elleuch b0e2c2da78
lifecycle: Support tags with special characters (#14906)
Object tags can have special characters such as whitespace. However
the current code doesn't properly consider those characters while
evaluating the lifecycle document.

ObjectInfo.UserTags contains an url encoded form of object tags
(e.g. key+1=val)

This commit fixes the issue by using the tags package to parse object tags.
2022-05-14 10:25:55 -07:00
Harshavardhana 9341201132
logger lock should be more granular (#14901)
This PR simplifies few things by splitting
the locks between audit, logger targets to
avoid potential contention between them.

any failures inside audit/logger HTTP
targets must only log to console instead
of other targets to avoid cyclical dependency.

avoids unneeded atomic variables instead
uses RWLock to differentiate a more common
read phase v/s lock phase.
2022-05-12 07:20:58 -07:00
Aditya Manthramurthy 83071a3459
Add support for Access Management Plugin (#14875)
- This change renames the OPA integration as Access Management Plugin - there is
nothing specific to OPA in the integration, it is just a webhook.

- OPA configuration is automatically migrated to Access Management Plugin and
OPA specific configuration is marked as deprecated.

- OPA doc is updated and moved.
2022-05-10 17:14:55 -07:00
Harshavardhana 5cffd3780a
fix: multiple fixes in prefix exclude implementation (#14877)
- do not need to restrict prefix exclusions that do not
  have `/` as suffix, relax this requirement as spark may
  have staging folders with other autogenerated characters
  , so we are better off doing full prefix March and skip. 

- multiple delete objects was incorrectly creating a
  null delete marker on a versioned bucket instead of
  creating a proper versioned delete marker.

- do not suspend paths on the excluded prefixes during
  delete operations to avoid creating `null` delete markers,
  honor suspension of versioning only at bucket level for
  delete markers.
2022-05-07 22:06:44 -07:00
Krishnan Parthasarathi ad8e611098
feat: implement prefix-level versioning exclusion (#14828)
Spark/Hadoop workloads which use Hadoop MR 
Committer v1/v2 algorithm upload objects to a 
temporary prefix in a bucket. These objects are 
'renamed' to a different prefix on Job commit. 
Object storage admins are forced to configure 
separate ILM policies to expire these objects 
and their versions to reclaim space.

Our solution:

This can be avoided by simply marking objects 
under these prefixes to be excluded from versioning, 
as shown below. Consequently, these objects are 
excluded from replication, and don't require ILM 
policies to prune unnecessary versions.

-  MinIO Extension to Bucket Version Configuration
```xml
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> 
        <Status>Enabled</Status>
        <ExcludeFolders>true</ExcludeFolders>
        <ExcludedPrefixes>
          <Prefix>app1-jobs/*/_temporary/</Prefix>
        </ExcludedPrefixes>
        <ExcludedPrefixes>
          <Prefix>app2-jobs/*/__magic/</Prefix>
        </ExcludedPrefixes>

        <!-- .. up to 10 prefixes in all -->     
</VersioningConfiguration>
```
Note: `ExcludeFolders` excludes all folders in a bucket 
from versioning. This is required to prevent the parent 
folders from accumulating delete markers, especially
those which are shared across spark workloads 
spanning projects/teams.

- To enable version exclusion on a list of prefixes

```
mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/_magic," --exclude-prefix-marker myminio/test
```
2022-05-06 19:05:28 -07:00
Aditya Manthramurthy e55104a155
Reorganize OpenID config (#14871)
- Split into multiple files
- Remove JSON unmarshaler for Config and providerCfg types (unused)
2022-05-05 13:40:06 -07:00
Klaus Post 111745c564
Add "enable" to config help (#14866)
Most help sections were missing "enable", which means it
is filtered out with `mc admin config get --json`.

Add it where missing.
2022-05-05 04:17:04 -07:00
Aditya Manthramurthy 2b7e75e079
Add OPA doc and remove deprecation marking (#14863) 2022-05-04 23:53:42 -07:00
Anis Elleuch 44a3b58e52
Add audit log for decommissioning (#14858) 2022-05-04 00:45:27 -07:00
Harshavardhana c3f689a7d9
JWKS should be parsed before usage (#14842)
fixes #14811
2022-04-30 15:23:53 -07:00
Aditya Manthramurthy 0e502899a8
Add support for multiple OpenID providers with role policies (#14223)
- When using multiple providers, claim-based providers are not allowed. All
providers must use role policies.

- Update markdown config to allow `details` HTML element
2022-04-28 18:27:09 -07:00
Harshavardhana 5a9a898ba2
allow forcibly creating metadata on buckets (#14820)
introduce x-minio-force-create environment variable
to force create a bucket and its metadata as required,
it is useful in some situations when bucket metadata
needs recovery.
2022-04-27 04:44:07 -07:00
Sidhartha Mani fe1fbe0005
standardize config help defaults (#14788) 2022-04-26 20:11:37 -07:00
Harshavardhana d087e28dce
start using t.SetEnv instead of os.Setenv (#14787) 2022-04-23 15:33:45 -07:00
Klaus Post 96adfaebe1
Make storage class config dynamic (#14791)
Updating the storage class is already thread safe, so we can do this safely.
2022-04-21 12:07:33 -07:00
Aditya Manthramurthy e8e48e4c4a
S3 select switch to new parquet library and reduce locking (#14731)
- This change switches to a new parquet library
- SelectObjectContent now takes a single lock at the beginning and holds it
during the operation. Previously the operation took a lock every time the
parquet library performed a Seek on the underlying object stream.
- Add basic support for LogicalType annotations for timestamps.
2022-04-14 06:54:47 -07:00
Harshavardhana eda34423d7 update gofumpt -w - new changes 2022-04-13 12:00:11 -07:00
Harshavardhana 153a612253
fetch bucket retention config once for ILM evalAction (#14727)
This is mainly an optimization, does not change any
existing functionality.
2022-04-11 13:25:32 -07:00
Anis Elleuch 16431d222c
heal: Enable periodic bitrot scan configuration (#14464) 2022-04-07 08:10:40 -07:00
Andreas Auernhammer 6b1c62133d
listing: improve listing of encrypted objects (#14667)
This commit improves the listing of encrypted objects:
 - Use `etag.Format` and `etag.Decrypt`
 - Detect SSE-S3 single-part objects in a single iteration
 - Fix batch size to `250`
 - Pass request context to `DecryptAll` to not waste resources
   when a client cancels the operation.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-04-04 11:42:03 -07:00
Andreas Auernhammer b9d1698d74
etag: add `Format` and `Decrypt` functions (#14659)
This commit adds two new functions to the
internal `etag` package:
 - `ETag.Format`
 - `Decrypt`

The `Decrypt` function decrypts an encrypted
ETag using a decryption key. It returns not
encrypted / multipart ETags unmodified.

The `Decrypt` function is mainly used when
handling SSE-S3 encrypted single-part objects.
In particular, the ETag of an SSE-S3 encrypted
single-part object needs to be decrypted since
S3 clients expect that this ETag is equal to the
content MD5.

The `ETag.Format` method also covers SSE ETag handling.
MinIO encrypts all ETags of SSE single part objects.
However, only the ETag of SSE-S3 encrypted single part
objects needs to be decrypted.
The ETag of an SSE-C or SSE-KMS single part object
does not correspond to its content MD5 and can be
a random value.
The `ETag.Format` function formats an ETag such that
it is an AWS S3 compliant ETag. In particular, it
returns non-encrypted ETags (single / multipart)
unmodified. However, for encrypted ETags it returns
the trailing 16 bytes as ETag. For encrypted ETags
the last 16 bytes will be a random value.

The main purpose of `Format` is to format ETags
such that clients accept them as well-formed AWS S3
ETags.
It differs from the `String` method since `String`
will return string representations for encrypted
ETags that are not AWS S3 compliant.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-04-03 13:29:13 -07:00
Andreas Auernhammer e955aa7f2a
kes: add support for encrypted private keys (#14650)
This commit adds support for encrypted KES
client private keys.

Now, it is possible to encrypt the KES client
private key (`MINIO_KMS_KES_KEY_FILE`) with
a password.

For example, KES CLI already supports the
creation of encrypted private keys:
```
kes identity new --encrypt --key client.key --cert client.crt MinIO
```

To decrypt an encrypted private key, the password
needs to be provided:
```
MINIO_KMS_KES_KEY_PASSWORD=<password>
```

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-29 09:53:33 -07:00
Harshavardhana ecfae074dc
do not crash when KMS is not enabled (#14634)
KMS when not enabled might crash when listing
an object that previously had SSE-S3 enabled,
fail appropriately in such situations.
2022-03-27 08:54:01 -07:00
Andreas Auernhammer 062f3ea43a
etag: fix incorrect multipart detection (#14631)
This commit fixes a subtle bug in the ETag
`IsEncrypted` implementation.

An encrypted ETag may contain random bytes,
i.e. some randomness used for encryption.
This random value can contain a '-' byte
simple due to being randomly generated.

Before, the `IsEncrypted` implementation
incorrectly assumed that an encrypted ETag
cannot contain a '-' since it would be a
multipart ETag. Multipart ETags have a
16 byte value followed by a '-' and the part number.
For example:
```
059ba80b807c3c776fb3bcf3f33e11ae-2
```

However, the following encrypted ETag
```
20000f00db2d90a7b40782d4cff2b41a7799fc1e7ead25972db65150118dfbe2ba76a3c002da28f85c840cd2001a28a9
```
also contains a '-' byte but is not a multipart ETag.

This commit fixes the `IsEncrypted` implementation
simply by checking whether the ETag is at least 32
bytes long. A valid multipart ETag is never 32 bytes
long since a part number must be <= 10000.

However, an encrypted ETag must be at least 32 bytes
long. It contains the encrypted ETag bytes (16 bytes)
and the authentication tag added by the AEAD cipher (again
16 bytes).

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-25 18:21:01 -07:00
Harshavardhana 5cfedcfe33
askDisks for strict quorum to be equal to read quorum (#14623) 2022-03-25 16:29:45 -07:00
Andreas Auernhammer 4d2fc530d0
add support for SSE-S3 bulk ETag decryption (#14627)
This commit adds support for bulk ETag
decryption for SSE-S3 encrypted objects.

If KES supports a bulk decryption API, then
MinIO will check whether its policy grants
access to this API. If so, MinIO will use
a bulk API call instead of sending encrypted
ETags serially to KES.

Note that MinIO will not use the KES bulk API
if its client certificate is an admin identity.

MinIO will process object listings in batches.
A batch has a configurable size that can be set
via `MINIO_KMS_KES_BULK_API_BATCH_SIZE=N`.
It defaults to `500`.

This env. variable is experimental and may be
renamed / removed in the future.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-25 15:01:41 -07:00
Aditya Manthramurthy 79ba458051
fix: free up reader resources in S3Select properly (#14600) 2022-03-23 20:58:53 -07:00
Avimitin fb9b53026d
Add riscv64 support (#14601)
In riscv64, the `syscall.Uname` function will return a uint8 slice.

    func main() {
      var buf syscall.Utsname
      fmt.Printf("Buffer Type: %T\n", buf.Release)
    }

    output:
      Buffer Type: [65]uint8

This is tested in the Arch Linux RISC-V 64 QEMU environment.

Signed-off-by: Avimitin <avimitin@gmail.com>
2022-03-22 20:36:59 -07:00
Klaus Post 472c2d828c
Fix waitgroup add after wait on config reload (#14584)
Fix `panic: "POST /minio/peer/v21/signalservice?signal=2": sync: WaitGroup is reused before previous Wait has returned`

Log entries already on the channel would cause `logEntry` to increment the
 waitgroup when sending messages, after Cancel has been called.

Instead of tracking every single message, just check the send goroutine. Faster 
and safe, since it will not decrement until the channel is closed.

Regression from #14289
2022-03-19 09:15:45 -07:00
Anis Elleuch b20ecc7b54
Add support of TLS session tickets with KES server (#14577)
Reduce overhead for communication between MinIO server and KES server.
2022-03-18 15:14:10 -07:00
Harshavardhana 43eb5a001c
re-use transport for AdminInfo() call (#14571)
avoids creating new transport for each `isServerResolvable`
request, instead re-use the available global transport and do
not try to forcibly close connections to avoid TIME_WAIT
build upon large clusters.

Never use httpClient.CloseIdleConnections() since that can have
a drastic effect on existing connections on the transport pool.

Remove it everywhere.
2022-03-17 16:20:10 -07:00
Aditya Manthramurthy ce97313fda
Add extra LDAP configuration validation (#14535)
- The result now contains suggestions on fixing common configuration issues.
- These suggestions will subsequently be exposed in console/mc
2022-03-16 19:57:36 -07:00
Harshavardhana ae3b369fe1
logger webhook failure can overrun the queue_size (#14556)
PR introduced in #13819 was incorrect and was not
handling the situation where a buffer is full can
cause incessant amount of logs that would keep the
logger webhook overrun by the requests.

To avoid this only log failures to console logger
instead of all targets as it can cause self reference,
leading to an infinite loop.
2022-03-15 17:45:51 -07:00
Klaus Post c07af89e48
select: Add ScanRange to CSV&JSON (#14546)
Implements https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ScanRange

Fixes #14539
2022-03-14 09:48:36 -07:00
Aditya Manthramurthy b7ed3b77bd
Indicate required fields in LDAP configuration correctly (#14526) 2022-03-10 19:03:38 -08:00
Poorna 75b925c326
Deprecate root disk for disk caching (#14527)
This PR modifies #14513 to issue a deprecation
warning rather than reject settings on startup.
2022-03-10 18:42:44 -08:00
Harshavardhana 91d419ee6c
warn issues about large block I/O performance for Linux older than 4.0.0 (#14524)
This PR simply adds a warning message when it detects older kernel
versions and warn's them about potential performance issues on this
kernel.

The issue can be seen only with parallel I/O across all drives
on denser setups such as 90 drives or 45 drives per server configurations.
2022-03-10 17:36:13 -08:00
Poorna 7ce91ea1a1
Disallow root disk to be used for cache drives (#14513) 2022-03-10 02:45:31 -08:00
Klaus Post b890bbfa63
Add local disk health checks (#14447)
The main goal of this PR is to solve the situation where disks stop 
responding to operations. This generally causes an FD build-up and 
eventually will crash the server.

This adds detection of hung disks, where calls on disk get stuck.

We add functionality to `xlStorageDiskIDCheck` where it keeps 
track of the number of concurrent requests on a given disk.

A total number of 100 operations are allowed. If this limit is reached 
we will block (but not reject) new requests, but we will monitor the 
state of the disk.

If no requests have been completed or updated within a 15-second 
window, we mark the disk as offline. Requests that are blocked will be 
unblocked and return an error as "faulty disk".

New requests will be rejected until the disk is marked OK again.

Once a disk has been marked faulty, a check will run every 5 seconds that 
will attempt to write and read back a file. As long as this fails the disk will 
remain faulty.

To prevent lots of long-running requests to mark the disk faulty we 
implement a callback feature that allows updating the status as parts 
of these operations are running.

We add a reader and writer wrapper that will update the status of each 
successful read/write operation. This should allow fine enough granularity 
that a slow, but still operational disk will not reach 15 seconds where 
50 operations have not progressed.

Note that errors themselves are not enough to mark a disk faulty. 
A nil (or io.EOF) error will mark a disk as "good".

* Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`.

* de-couple IsOnline() from disk health tracker

The purpose of IsOnline() is to ensure that we
reconnect the drive only when the "drive" was

- disconnected from network we need to validate
  if the drive is "correct" and is the same drive
  which belongs to this server.

- drive was replaced we have to format it - we
  support hot swapping of the drives.

IsOnline() is not meant for taking the drive offline
when it is hung, it is not useful we can let the
drive be online instead "return" errors for relevant
calls.

* return errFaultyDisk for DiskInfo() call

Co-authored-by: Harshavardhana <harsha@minio.io>

Possible future Improvements:

* Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly.
* Allow reads/writes to be aborted by the context.
* Add usage stats, concurrent count, blocked operations, etc.
2022-03-09 11:38:54 -08:00
Klaus Post 7060c809c0
Add authorization header to HEAD requests (#14510)
Add Authorization to network check requests.

Fixes #14507
2022-03-09 10:48:56 -08:00
Harshavardhana 0e3bafcc54
improve logs, fix banner formatting (#14456) 2022-03-03 13:21:16 -08:00
Andreas Auernhammer b48f719b8e
kes: remove unnecessary error conversion (#14459)
This commit removes some duplicate code that
converts KES API errors.

This code was added since KES `0.18.0` changed
some exported API errors. However, the KES SDK
handles this error conversion itself.
Therefore, it is not necessary to duplicate this
behavior in MinIO.

See: 21555fa624/error.go (L94)

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-03 09:42:37 -08:00
Lenin Alevski 289fcbd08c
KES dependency upgrade (#14454)
- Updating KES dependency to v.0.18.0
- Fixing incompatibility issue when checking for errors during KES key creation

Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
2022-03-02 23:03:40 -08:00
Harshavardhana f6875bb893
fix: regression from refactor in AMQP notification (#14455)
fixes a regression introduced in #14269 that refactored
the notification registration logic, all the amqp targets
however online will not be available for use anymore.

fixes #14451
2022-03-02 21:35:48 -08:00
Klaus Post b030ef1aca
tests: Clean up dsync package (#14415)
Add non-constant timeouts to dsync package.

Reduce test runtime by minutes. Hopefully not too aggressive.
2022-03-01 11:14:28 -08:00
Klaus Post 88fd1cba71
select: add MISSING operator support (#14406)
Probably not full support, but for regular checks it should work.

Fixes #14358
2022-02-25 12:31:19 -08:00
hellivan 5307e18085
use keycloak_realm properly for keycloak user lookups (#14401)
In case a user-defined a value for the MINIO_IDENTITY_OPENID_KEYCLOAK_REALM 
environment variable, construct the path properly.
2022-02-24 10:16:53 -08:00
Klaus Post 2cea944cdb
select: Allow lower case 'is' (#14405)
Ref: #14358
2022-02-24 09:10:48 -08:00
Shireesh Anjal 3934700a08
Make audit webhook and kafka config dynamic (#14390) 2022-02-24 09:05:33 -08:00
hellivan 0913eb6655
fix: openid config provider not initialized correctly (#14399)
Up until now `InitializeProvider` method of `Config` struct was
implemented on a value receiver which is why changes on `provider`
field where never reflected to method callers. In order to fix this
issue, the method was implemented on a pointer receiver.
2022-02-23 23:42:37 -08:00
Harshavardhana 1bfbe354f5
fix: clientId must be unique for all servers (#14398)
This is a regression from #14037, distributed setups
with MQTT was not working anymore. According to MQTT
spec it is expected this is unique per server.

We shall proceed to use unix nano timestamp hex
value instead here.
2022-02-23 20:19:59 -08:00
Shireesh Anjal 25144fedd5
Send deployment id and minio version in http header (#14378) 2022-02-23 13:36:01 -08:00
Shireesh Anjal 94d37d05e5
Apply dynamic config at sub-system level (#14369)
Currently, when applying any dynamic config, the system reloads and
re-applies the config of all the dynamic sub-systems.

This PR refactors the code in such a way that changing config of a given
dynamic sub-system will work on only that sub-system.
2022-02-22 10:59:28 -08:00
Shireesh Anjal c1437c7b46
allow `config reset api` to work by overloading default values (#14368)
The `LookupConfig` code was not using `GetWithDefault`, because of which
some of the config values were being returned as empty string, and calls
like `strconv.Atoi` and `time.ParseDuration` on these were failing.
2022-02-21 15:50:45 -08:00
Aditya Manthramurthy bc110d8055
fix: mysql notification target table creation (#14350)
Add a generated hash column as the primary key for the key name as 
MySQL does not allow indexes on long VARCHAR columns.
2022-02-18 12:13:49 -08:00
Harshavardhana 65b1a4282e
fix: console logger regression with dynamic logger webhook registration (#14346)
fixes a regression from #14289
2022-02-17 17:50:10 -08:00
Shireesh Anjal 28f188e3ef
Make logger webhook config dynamic (#14289)
It should not be required to restart the 
server after setting the logger webhook config.
2022-02-17 11:11:15 -08:00
Shireesh Anjal 1a5496eced
Add `enable` key to logger webhook help (#14326)
This key is supported by the logger webhook config - but is not returned in the help.
2022-02-16 11:59:50 -08:00
Shireesh Anjal 16939ca192
Mark SUBNET credentials as sensitive (#14320)
So that they are redacted in the health report
2022-02-16 08:40:34 -08:00
Klaus Post 5ec57a9533
Add GetObject gzip option (#14226)
Enabled with `mc admin config set alias/ api gzip_objects=on`

Standard filtering applies (1K response minimum, not compressed content 
type, not range request, gzip accepted by client).
2022-02-14 09:19:01 -08:00
Shireesh Anjal 9890f579f8
Add subsystem level validation on `config set` (#14269)
When setting a config of a particular sub-system, validate the existing
config and notification targets of only that sub-system, so that
existing errors related to one sub-system (e.g. notification target
offline) do not result in errors for other sub-systems.
2022-02-08 10:36:41 -08:00
Shireesh Anjal 3882da6ac5
Add subnet proxy config (#14225)
Will store the HTTP(S) proxy URL to use for connecting to SUBNET.
2022-02-01 09:52:38 -08:00
Harshavardhana c39eb3bacd
fix: possible crash if private.key is empty (#14208)
Before
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x9f54f7]

goroutine 1 [running]:
crypto/x509.IsEncryptedPEMBlock(...)
	crypto/x509/pem_decrypt.go:105
github.com/minio/minio/internal/config.LoadX509KeyPair({0xc00061e270, 0x0}, {0xc00061e2d0, 0x25})
	github.com/minio/minio/internal/config/certs.go:88 +0xf7
github.com/minio/pkg/certs.(*Manager).AddCertificate(0xc000576150, {0xc00061e270, 0x25}, {0xc00061e2d0, 0x25})
	github.com/minio/pkg@v1.1.15/certs/certs.go:132 +0x368
github.com/minio/pkg/certs.NewManager({0x51f5910, 0xc00053e140}, {0xc00061e270, 0xc000580400}, {0xc00061e2d0, 0x25}, 0x4dc5880)
	github.com/minio/pkg@v1.1.15/certs/certs.go:97 +0x170
github.com/minio/minio/cmd.getTLSConfig()
```

After
```
ERROR Unable to load the TLS configuration: The private key is not readable
      > Please check your certificate
```
2022-01-30 12:55:21 -08:00
Poorna a4be47d7ad
Validate config before saving changes after config reset (#14203) 2022-01-27 18:28:16 -08:00
Aditya Manthramurthy 7dfa565d00
Identity LDAP: Allow multiple search base DNs (#14191)
This change allows the MinIO server to lookup users in different directory
sub-trees by allowing specification of multiple search bases separated by
semicolons.
2022-01-26 15:05:59 -08:00
Poorna 295730408b
Disallow delete replication for tag based rules (#14167) 2022-01-24 15:22:20 -08:00
Harshavardhana 5a9f133491
speed up startup sequence for all operations (#14148)
This speed-up is intended for faster startup times
for almost all MinIO operations. Changes here are

- Drives are not re-read for 'format.json' on a regular
  basis once read during init is remembered and refreshed
  at 5 second intervals.

- Do not do O_DIRECT tests on drives with existing 'format.json'
  only fresh setups need this check.

- Parallelize initializing erasureSets for multiple sets.

- Avoid re-reading format.json when migrating 'format.json'
  from really old V1->V2->V3

- Keep a copy of local drives for any given server in memory
  for a quick lookup.
2022-01-24 11:28:45 -08:00
Anis Elleuch 3e9bd931ed
tests: Remove RPC wording from the code (#14142)
The lock was using net/rpc in the past but it got replaced with a REST API. 
This commit will fix function names/comments to avoid confusion.
2022-01-20 09:36:09 -08:00
Harshavardhana 1a56ebea70
cleanup dsync tests and remove net/rpc references (#14118) 2022-01-18 12:44:38 -08:00
Harshavardhana 70e1cbda21
allow disabling O_DIRECT in certain environments for reads (#14115)
repeated reads on single large objects in HPC like
workloads, need the following option to disable
O_DIRECT for a more effective usage of the kernel
page-cache.

However this optional should be used in very specific
situations only, and shouldn't be enabled on all
servers.

NVMe servers benefit always from keeping O_DIRECT on.
2022-01-17 08:34:14 -08:00
Anis Elleuch b106b1c131
lock: Fix decision when a lock needs to be removed (#14095)
The code was not properly deciding if a lock needs to be removed 
when it doesn't have quorum anymore. After this commit, a lock will be
forcefully unlocked if nodes reporting they are not able to find a lock
internally breaks the quorum.

Simplify the code as well.
2022-01-14 10:33:08 -08:00
Harshavardhana 76b21de0c6
feat: decommission feature for pools (#14012)
```
λ mc admin decommission start alias/ http://minio{1...2}/data{1...4}
```

```
λ mc admin decommission status alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐
│ ID  │ Pools                           │ Capacity                         │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │
│ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB]
Time Remaining: 4 hours (started 3 hours ago)
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
ERROR: This pool is not scheduled for decommissioning currently.
```

```
λ mc admin decommission cancel alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐
│ ID  │ Pools                           │ Capacity                         │ Status   │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │
└─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘
```

> NOTE: Canceled decommission will not make the pool active again, since we might have
> Potentially partial duplicate content on the other pools, to avoid this scenario be
> very sure to start decommissioning as a planned activity.

```
λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4}
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐
│ ID  │ Pools                           │ Capacity                         │ Status             │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘
```
2022-01-10 09:07:49 -08:00
Aditya Manthramurthy 1981fe2072
Add internal IDP and OIDC users support for site-replication (#14041)
- This allows site-replication to be configured when using OpenID or the
  internal IDentity Provider.

- Internal IDP IAM users and groups will now be replicated to all members of the
  set of replicated sites.

- When using OpenID as the external identity provider, STS and service accounts
  are replicated.

- Currently this change dis-allows root service accounts from being
  replicated (TODO: discuss security implications).
2022-01-06 15:52:43 -08:00
Minio Trusted 76877eb6fa move gofumpt to golang-ci 2022-01-06 13:08:21 -08:00
Harshavardhana 0d3ae3810f
make sure to comply with MQTT spec (#14037)
- keep-alive cannot be 0 by default anymore
- client_id cannot be empty

fixes #13993
2022-01-06 11:25:39 -08:00
Anis Elleuch 9d91d32d82
typo: Low capital in some JSON field names in log/audit output (#14020)
Use a low capital in some fields in JSON log/audit output to follow
other fields names.
2022-01-03 09:26:26 -08:00
Harshavardhana a60ac7ca17
fix: audit log to support object names in multipleObjectNames() handler (#14017) 2022-01-03 01:28:52 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana 46fd9f4a53 fix: update storage-class properly
fixes #14005
2021-12-28 22:49:06 -08:00
Harshavardhana 9ad6012782
simplify logger time and avoid possible crashes (#13986)
time.Format() is not necessary prematurely for JSON
marshalling, since JSON marshalling indeed defaults
to RFC3339Nano.

This also ensures the 'time' is remembered until its
logged and it is the same time when the 'caller'
invoked 'log' functions.
2021-12-23 15:33:54 -08:00
Harshavardhana 416977436e rename MINIO_CACHE_.._MASTER_KEY to MINIO_CACHE_.._SECRET_KEY
fixes #13975
2021-12-22 12:11:07 -08:00
Klaus Post ebd78e983f
Limit key size to 3K (#13974)
User is reporting `Error 1071 :Specified key was too long,max key 
length is 3072 bytes`.

Regression caused by #13414
2021-12-22 11:41:51 -08:00
Harshavardhana 499872f31d
Add configurable channel queue_size for audit/logger webhook targets (#13819)
Also log all the missed events and logs instead of silently
swallowing the events.

Bonus: Extend the logger webhook to support mTLS
similar to audit webhook target.
2021-12-20 13:16:53 -08:00
Poorna K 111c6177d2
Deprecate caching for erasure/distributed mode (#13909)
Fixes: #13907

Also removing default value of `writethrough` for cache commit
which was interfering with cache_after setting
2021-12-15 16:48:34 -08:00
Klaus Post 91f72f25ab
select: Return early from bool AND, OR (#13914)
Return as soon as an AND fails and whenever an OR succeeds. Faster and more flexible.

For example makes `select * from S3object where _2 != '' AND _2 > 1` able to operate on empty fields.

Followup to #13900
2021-12-15 16:47:21 -08:00
Klaus Post a8d4042853
select: Add IS (NOT) operators (#13906)
Add `IS` and `IS NOT` as comparison operators.

This may be a bit wider than the S3 spec, but we can rather 
easily remove the forwarding.
2021-12-14 09:54:50 -08:00
Krishnan Parthasarathi 44a9339c0a
Newer noncurrent versions (#13815)
- Rename MaxNoncurrentVersions tag to NewerNoncurrentVersions

Note: We apply overlapping NewerNoncurrentVersions rules such that 
we honor the highest among applicable limits. e.g if 2 overlapping rules 
are configured with 2 and 3 noncurrent versions to be retained, we 
will retain 3.

- Expire newer noncurrent versions after noncurrent days
- MinIO extension: allow noncurrent days to be zero, allowing expiry 
  of noncurrent version as soon as more than configured 
  NewerNoncurrentVersions are present.
- Allow NewerNoncurrentVersions rules on object-locked buckets
- No x-amz-expiration when NewerNoncurrentVersions configured
- ComputeAction should skip rules with NewerNoncurrentVersions > 0
- Add unit tests for lifecycle.ComputeAction
- Support lifecycle rules with MaxNoncurrentVersions
- Extend ExpectedExpiryTime to work with zero days
- Fix all-time comparisons to be relative to UTC
2021-12-14 09:41:44 -08:00
Harshavardhana 113c7ff49a
add code to parse secrets natively instead of shell scripts (#13883) 2021-12-13 18:23:31 -08:00
Harshavardhana 8591d17d82
return appropriate errors upon parseErrors (#13831) 2021-12-05 11:36:26 -08:00
Klaus Post f56cac6381
jwt: Parse standard claims faster (#13821)
* Use structless/allocationless decoding for header (note "typ" isn't used)
* Create custom unmarshal code using jsonparser for StandardClaims.

Before/After:

```
BenchmarkParseJWTStandardClaims-32    	 4270724	       294.0 ns/op	     706 B/op	      16 allocs/op
BenchmarkParseJWTStandardClaims-32    	 5634847	       214.7 ns/op	     544 B/op	       9 allocs/op

BenchmarkParseJWTMapClaims-32    	 2763045	       428.6 ns/op	    1251 B/op	      29 allocs/op
BenchmarkParseJWTMapClaims-32    	 2839455	       410.9 ns/op	    1219 B/op	      26 allocs/op
```
2021-12-03 13:19:38 -08:00
Aditya Manthramurthy 4f35054d29
Ensure that role ARNs don't collide (#13817)
This is to prepare for multiple providers enhancement.
2021-12-03 13:15:56 -08:00
Shireesh Anjal d29df6714a
Introduce new config `subnet api_key` (#13793)
The earlier approach of using a license token for 
communicating with SUBNET is being replaced 
with a simpler mechanism of API keys. Unlike the 
license which is a JWT token, these API keys will 
be simple UUID tokens and don't have any embedded 
information in them. SUBNET would generate the 
API key on cluster registration, and then it would 
be saved in this config, to be used for subsequent 
communication with SUBNET.
2021-12-03 09:32:11 -08:00
Harshavardhana 24d904d194
reload certs from disk upon SIGHUP (#13792) 2021-12-01 00:38:32 -08:00
Klaus Post d6fe0f61a9
do not panic when input cannot be parsed (#13791)
Fix cases where `s3Select.Open` fails and doesn't set the recordReader.

Fixes #13786
2021-11-30 08:42:42 -08:00
Harshavardhana e49c184595
add configurable 'shutdown-timeout' for HTTP server (#13771)
fixes #12317
2021-11-29 09:06:56 -08:00
Aditya Manthramurthy 4c0f48c548
Add role ARN support for OIDC identity provider (#13651)
- Allows setting a role policy parameter when configuring OIDC provider

- When role policy is set, the server prints a role ARN usable in STS API requests

- The given role policy is applied to STS API requests when the roleARN parameter is provided.

- Service accounts for role policy are also possible and work as expected.
2021-11-26 19:22:40 -08:00
Aditya Manthramurthy 4ce6d35e30
Add new `site` config sub-system intended to replace `region` (#13672)
- New sub-system has "region" and "name" fields.

- `region` subsystem is marked as deprecated, however still works, unless the
new region parameter under `site` is set - in this case, the region subsystem is
ignored. `region` subsystem is hidden from top-level help (i.e. from `mc admin
config set myminio`), but appears when specifically requested (i.e. with `mc
admin config set myminio region`).

- MINIO_REGION, MINIO_REGION_NAME are supported as legacy environment variables for server region.

- Adds MINIO_SITE_REGION as the current environment variable to configure the
server region and MINIO_SITE_NAME for the site name.
2021-11-25 13:06:25 -08:00
Harshavardhana fee3f88cb5
use acceptedResponseStatusCode everywhere in HTTP logger (#13755) 2021-11-24 13:53:11 -08:00
Harshavardhana 08f4a0a816
fix: make sure esClient is allocated before use (#13727) 2021-11-22 12:46:46 -08:00
Krishnan Parthasarathi 3da9ee15d3
Add MaxNoncurrentVersions to NoncurrentExpiration action (#13580)
This unit allows users to limit the maximum number of noncurrent 
versions of an object.

To enable this rule you need the following *ilm.json*
```
cat >> ilm.json <<EOF
{
    "Rules": [
        {
            "ID": "test-max-noncurrent",
            "Status": "Enabled",
            "Filter": {
                "Prefix": "user-uploads/"
            },
            "NoncurrentVersionExpiration": {
                "MaxNoncurrentVersions": 5
            }
        }
    ]
}
EOF
mc ilm import myminio/mybucket < ilm.json
```
2021-11-19 17:54:10 -08:00
Harshavardhana fb268add7a
do not flush if Write() failed (#13597)
- Go might reset the internal http.ResponseWriter() to `nil`
  after Write() failure if the go-routine has returned, do not
  flush() such scenarios and avoid spurious flushes() as
  returning handlers always flush.
- fix some racy tests with the console 
- avoid ticker leaks in certain situations
2021-11-18 17:19:58 -08:00
Harshavardhana 9c5d9ae376
fallback O_DIRECT if not supported, do regular reads() (#13680) 2021-11-17 15:48:47 -08:00
Harshavardhana 8378bc9958
support dynamic redirect_uri based on incoming 'host' header (#13666)
This feature is useful in situations when console is exposed
over multiple intranent or internet entities when users are
connecting over local IP v/s going through load balancer.

Related console work was merged here

373bfbfe3f
2021-11-16 18:40:39 -08:00
Harshavardhana 661b263e77
add gocritic/ruleguard checks back again, cleanup code. (#13665)
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2021-11-16 09:28:29 -08:00
Shireesh Anjal d008e90d50
Support dynamic reset of minio config (#13626)
If a given MinIO config is dynamic (can be changed without restart),
ensure that it can be reset also without restart.

Signed-off-by: Shireesh Anjal <shireesh@minio.io>
2021-11-10 10:01:32 -08:00
Harshavardhana ea820b30bf
fix: use equalFold() instead of lower and compare (#13624) 2021-11-10 08:12:50 -08:00
Harshavardhana 0a6f9bc1eb
allocate new highwayhash for each string hash (#13623)
fixes #13622
2021-11-09 15:28:08 -08:00
Harshavardhana 520037e721
move to jwt-go v4 with correct releases (#13586) 2021-11-05 12:20:08 -07:00
Aditya Manthramurthy 947c423824
fix: user DN filtering that causes some unnecessary logs (#13584)
Additionally, remove the unnecessary `isUsingLookupBind` field in the LDAP struct
2021-11-04 13:11:20 -07:00
Pavel M 112f9ae087
claim exp should be integer (#13582)
claim exp can be 

- float64
- json.Number

As per OIDC spec https://openid.net/specs/openid-connect-core-1_0.html#IDToken

Avoid using strings since the upstream library only supports these two types now.
2021-11-04 12:03:43 -07:00
Harshavardhana 34680c5ccf
fix: SQL select to honor limits properly for array queries (#13568)
added tests to cover the scenarios as well.
2021-11-02 19:14:46 -07:00
Poorna K 7c33a33ef3
cache: fix commit value lookup in config (#13551) 2021-11-02 14:20:52 -07:00
Poorna K 3dfcca68e6
fix race in TestComputeActions test (#13564) 2021-11-02 14:20:15 -07:00
Harshavardhana 14d8a931fe
re-use io.Copy buffers with 32k pools (#13553)
Borrowed idea from Go's usage of this
optimization for ReadFrom() on client
side, we should re-use the 32k buffers
io.Copy() allocates for generic copy
from a reader to writer.

the performance increase for reads for
really tiny objects is at this range
after this change.

> * Fastest: +7.89% (+1.3 MiB/s) throughput, +7.89% (+1308.1) obj/s
2021-11-02 08:11:50 -07:00
Poorna K 15dcacc1fc
Add support for caching multipart in writethrough mode (#13507) 2021-11-01 08:11:58 -07:00
Klaus Post 8ed7346273
Disable AVX512 on Darwin (#13550)
Preemptively disable AVX512 until https://github.com/golang/go/issues/49233 has been resolved.

This potentially affects reedsolomon, simdjson, sha256-simd, md5-simd packages.

Init order requires a separate package since main itself is initialized last, but imports are initialized in the order they are imported from main (confirmed).
2021-11-01 08:03:16 -07:00
Klaus Post 9424dca9e4
jwt: Improve allocations (#13532)
Avoid string -> byte allocations.

```
BenchmarkParseJWTStandardClaims-32       3527152           343.2 ns/op      1489 B/op         21 allocs/op
BenchmarkParseJWTStandardClaims-32       4713199           259.2 ns/op       706 B/op         16 allocs/op

BenchmarkParseJWTMapClaims-32        2666668           448.7 ns/op      1883 B/op         32 allocs/op
BenchmarkParseJWTMapClaims-32        3120709           377.1 ns/op      1227 B/op         28 allocs/op
```
2021-10-28 17:04:48 -07:00
Harshavardhana db84bb9bd3
avoid atomics for self contained reader/writers (#13531)
read/writers are not concurrent in handlers
and self contained - no need to use atomics on
them.

avoids unnecessary contentions where it's not
required.
2021-10-28 17:03:00 -07:00
Klaus Post d9c1d79e30
Protect logger targets (#13529)
Logger targets were not race protected against concurrent updates from for example `HTTPConsoleLoggerSys`.

Restrict direct access to targets and make slices immutable so a returned slice can be processed safely without locks.
2021-10-28 07:35:28 -07:00
moon d158607f8e
fix(AuditLog): panic while st is nil (#13510) 2021-10-27 09:29:42 -07:00
Aditya Manthramurthy 29d885b40f
Add IAM system tests (#13487)
For internal IDP user, policy and groups
2021-10-22 01:33:28 -07:00
Harshavardhana 087dc13965
fix: server in shutdown should return 503 instead of 403 (#13496)
various situations where the client is retrying the request
server going through shutdown might incorrectly send 403
which is a non-retriable error, this PR allows for clients
when they retry an attempt to go to another healthy pod
or server in a distributed cluster - assuming it is a properly
load-balanced setup.
2021-10-22 01:30:27 -07:00
Harshavardhana ac36a377b0
fix: remove deprecated jwks_url from config KV (#13477) 2021-10-20 11:31:09 -07:00
Krishnan Parthasarathi 45d145a823
fix: immediate tiering for NoncurrentVersionTransition (#13464) 2021-10-18 17:24:30 -07:00
Anis Elleuch d86513cbba
tls: Better error message when certificate curve is not supported (#13462) 2021-10-18 09:32:16 -07:00
Klaus Post c2eb60df4a
bz2: limit max concurrent CPU (#13458)
Ensure that bz2 decompression will never take more than 50% CPU.
2021-10-18 08:44:36 -07:00
Harshavardhana 838de23357
re-use rand.New() do not repeat allocate. (#13448)
also simplify readerLocks to be just like
writeLocks, DRWMutex() is never shared
and there are order guarantees that need
for such a thing to work for RLock's
2021-10-18 08:39:59 -07:00
Anis Elleuch d7b7040408
tls: Avoid 3DES cipher (#13459)
3DES is enabled by default in Golang, this commit will use
tls.CipherSuites() which returns all ciphers excluding those with
security issues, such as 3DES.
2021-10-18 08:39:15 -07:00
Klaus Post 5e53f767c4
Use concurrent bz2 decompression (#13360)
Testing with `mc sql --compression BZIP2 --csv-input "rd=\n,fh=USE,fd=;" --query="select COUNT(*) from S3Object" local2/testbucket/nyc-taxi-data-10M.csv.bz2`

Before 96.98s, after 10.79s. Uses about 70% CPU while running.
2021-10-14 11:11:07 -07:00
Klaus Post 974073a2e5
directio: Check if buffers are set. (#13440)
Check if directio buffers have actually been fetched and prevent errors on double Close. Return error on Read after Close.

Fixes

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xf8582f]

goroutine 210 [running]:
github.com/minio/minio/internal/ioutil.(*ODirectReader).Read(0xc0054f8320, {0xc0014560b0, 0xa8, 0x44d012})
	github.com/minio/minio/internal/ioutil/odirect_reader.go:88 +0x10f
io.ReadAtLeast({0x428c5c0, 0xc0054f8320}, {0xc0014560b0, 0xa8, 0xa8}, 0xa8)
	io/io.go:328 +0x9a
io.ReadFull(...)
	io/io.go:347
github.com/minio/minio/internal/ioutil.ReadFile({0xc001bf60e0, 0x6})
	github.com/minio/minio/internal/ioutil/read_file.go:48 +0x19b
github.com/minio/minio/cmd.(*FSObjects).scanBucket.func1({{0xc00444e1e0, 0x4d}, 0x0, {0xc0040cf240, 0xe}, {0xc0040cf24f, 0x18}, {0xc0040cf268, 0x18}, 0x0, ...})
	github.com/minio/minio/cmd/fs-v1.go:366 +0x1ea
github.com/minio/minio/cmd.(*folderScanner).scanFolder.func1({0xc00474a6a8, 0xc0065d6793}, 0x0)
	github.com/minio/minio/cmd/data-scanner.go:494 +0xb15
github.com/minio/minio/cmd.readDirFn({0xc002803e80, 0x34}, 0xc000670270)
	github.com/minio/minio/cmd/os-readdir_unix.go:172 +0x638
github.com/minio/minio/cmd.(*folderScanner).scanFolder(0xc002deeb40, {0x42dc9d0, 0xc00068cbc0}, {{0xc001c6e2d0, 0x27}, 0xc0023db8e0, 0x1}, 0xc0001c7ab0)
	github.com/minio/minio/cmd/data-scanner.go:427 +0xa8f
github.com/minio/minio/cmd.(*folderScanner).scanFolder.func2({{0xc001c6e2d0, 0x27}, 0xc0023db8e0, 0x27})
	github.com/minio/minio/cmd/data-scanner.go:549 +0xd0
github.com/minio/minio/cmd.(*folderScanner).scanFolder(0xc002deeb40, {0x42dc9d0, 0xc00068cbc0}, {{0xc0013fa9e0, 0xe}, 0x0, 0x1}, 0xc000670dd8)
	github.com/minio/minio/cmd/data-scanner.go:623 +0x205d
github.com/minio/minio/cmd.scanDataFolder({_, _}, {_, _}, {{{0xc0013fa9e0, 0xe}, 0x802, {0x210f15d2, 0xed8f903b8, 0x5bc0e80}, ...}, ...}, ...)
	github.com/minio/minio/cmd/data-scanner.go:333 +0xc51
github.com/minio/minio/cmd.(*FSObjects).scanBucket(_, {_, _}, {_, _}, {{{0xc0013fa9e0, 0xe}, 0x802, {0x210f15d2, 0xed8f903b8, ...}, ...}, ...})
	github.com/minio/minio/cmd/fs-v1.go:364 +0x305
github.com/minio/minio/cmd.(*FSObjects).NSScanner(0x42dc9d0, {0x42dc9d0, 0xc00068cbc0}, 0x0, 0xc003bcfda0, 0x802)
	github.com/minio/minio/cmd/fs-v1.go:307 +0xa16
github.com/minio/minio/cmd.runDataScanner({0x42dc9d0, 0xc00068cbc0}, {0x436a6c0, 0xc000bfcf50})
	github.com/minio/minio/cmd/data-scanner.go:150 +0x749
created by github.com/minio/minio/cmd.initDataScanner
	github.com/minio/minio/cmd/data-scanner.go:73 +0xb0
```
2021-10-14 10:19:17 -07:00
Aditya Manthramurthy 91a0e7bdaa
update mysql notification key length, character set and collation (#13414)
fixes #13227
2021-10-11 17:40:11 -07:00
Harshavardhana b07e309627 fix: ignore empty values while parsing tlsEnabled value 2021-10-11 17:04:02 -07:00
Harshavardhana 9ea45399ce
fix: enable AssumeRoleWithCertificate API only when asked (#13410)
This is a breaking change but we need to do this to avoid
issues discussed in #13409 based on discussions from #13371

fixes #13371
fixes #13409
2021-10-11 14:23:51 -07:00
Klaus Post 9f652708ee
Fix Elastic crash with no index (#13406)
Removed naked assert.

Fixes #13389
2021-10-11 10:07:38 -07:00
David Regla a188554fe1
Add missing keys to API config help (#13255)
Added missing `apiClusterDeadline` and `apiListQuorum` to API config.HelpKVS structure
2021-10-10 09:52:21 -07:00
Harshavardhana acc9645249
allow more socket listeners per instance for multi-core setups (#13385) 2021-10-08 16:58:24 -07:00
Harshavardhana 60f961dfe8
allow disabling strict sha256 validation with some broken clients (#13383)
with some broken clients allow non-strict validation
of sha256 when ContentLength > 0, it has been found in
the wild some applications that need this behavior. This
shall be only allowed if `--no-compat` is used.
2021-10-08 12:40:34 -07:00
Harshavardhana d57b57bddc
feat: Add RX/TX to audit logging (#13382)
add additional values for audit logging
2021-10-07 19:03:46 -07:00
Harshavardhana cb2c2905c5
fix: do not make TLS strict based on serverName (#13372)
LDAP TLS dialer shouldn't be strict with ServerName, there
maybe many certs talking to common DNS endpoint it is
better to allow Dialer to choose appropriate public cert.
2021-10-06 14:19:32 -07:00
Harshavardhana 3d5750f31c
update and use rs/dnscache implementation instead of custom (#13348)
additionally optimize for IP only setups, avoid doing
unnecessary lookups if the Dial addr is an IP.

allow support for multiple listeners on same socket,
this is mainly meant for future purposes.
2021-10-05 10:13:04 -07:00
Harshavardhana fabf60bc4c
fix: allow configuring cleanup of stale multipart uploads (#13354)
allow dynamically changing cleanup of stale multipart
uploads, their expiry and how frequently its checked.

Improves #13270
2021-10-04 10:52:28 -07:00
Klaus Post 75699a3825
Add basic scanner metrics (#13317)
Add number of objects/versions/folders scanned as well as ILM action outcomes.
2021-10-02 09:31:05 -07:00
Krishnan Parthasarathi f3aeed77e5
Add immediate inline tiering support (#13298) 2021-10-01 11:58:17 -07:00
Harshavardhana ffd497673f
internode lockArgs should use messagepack (#13329)
it would seem like using `bufio.Scan()` is very
slow for heavy concurrent I/O, ie. when r.Body
is slow , instead use a proper
binary exchange format, to marshal and unmarshal
the LockArgs datastructure in a cleaner way.

this PR increases performance of the locking
sub-system for tiny repeated read lock requests
on same object.

```
BenchmarkLockArgs
BenchmarkLockArgs-4              6417609               185.7 ns/op            56 B/op          2 allocs/op
BenchmarkLockArgsOld
BenchmarkLockArgsOld-4           1187368              1015 ns/op            4096 B/op          1 allocs/op
```
2021-09-30 11:53:01 -07:00
Harshavardhana d00ff3c453
use O_DIRECT for all ReadFileStream (#13324)
This PR also removes #13312 to ensure
that we can use a better mechanism to
handle page-cache, using O_DIRECT
even for Range GETs.
2021-09-29 16:40:28 -07:00
Harshavardhana 38027c8f52
use fadvise to control Linux page-cache (#13312)
This PR brings two optimizations mainly
for page-cache build-up and how to avoid
getting OOM killed in the process. Although
these memories are reclaimable Linux is not
fast enough to reclaim them as needed on a
very busy system. fadvise is a system call
implemented in Linux to advise page-cache to
avoid overload as we get significant amount
of requests on the server.

- FADV_SEQUENTIAL tells that all I/O from now
  is going to be sequential, allowing for more
  resposive throughput.

- FADV_NOREUSE tells kernel to start removing
  things for this 'fd' from page-cache.
2021-09-28 10:02:56 -07:00
Harshavardhana 3c70eca758
enable SO_REUSEPORT sockets, allow cleaner reuse of time_waits (#13307)
Refer here https://lwn.net/Articles/542629/
2021-09-27 09:27:16 -07:00
Harshavardhana 200caab82b
fix: multi-pool setup make sure acquire locks properly (#13280)
This was a regression introduced in '14bb969782'
this has the potential to cause corruption when
there are concurrent overwrites attempting to update
the content on the namespace.

This PR adds a situation where PutObject(), CopyObject()
compete properly for the same locks with NewMultipartUpload()
however it ends up turning off competing locks for the actual
object with GetObject() and DeleteObject() - since they do not
compete due to concurrent I/O on a versioned bucket it can lead
to loss of versions.

This PR fixes this bug with multi-pool setup with replication
that causes corruption of inlined data due to lack of competing
locks in a multi-pool setup.

Instead CompleteMultipartUpload holds the necessary
locks when finishing the transaction, knowing the exact
location of an object to schedule the multipart upload
doesn't need to compete in this manner, a pool id location
for existing object.
2021-09-22 21:46:24 -07:00