2021-04-18 15:41:13 -04:00
|
|
|
// Copyright (c) 2015-2021 MinIO, Inc.
|
|
|
|
//
|
|
|
|
// This file is part of MinIO Object Storage stack
|
|
|
|
//
|
|
|
|
// This program is free software: you can redistribute it and/or modify
|
|
|
|
// it under the terms of the GNU Affero General Public License as published by
|
|
|
|
// the Free Software Foundation, either version 3 of the License, or
|
|
|
|
// (at your option) any later version.
|
|
|
|
//
|
|
|
|
// This program is distributed in the hope that it will be useful
|
|
|
|
// but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
// GNU Affero General Public License for more details.
|
|
|
|
//
|
|
|
|
// You should have received a copy of the GNU Affero General Public License
|
|
|
|
// along with this program. If not, see <http://www.gnu.org/licenses/>.
|
2017-10-22 01:30:34 -04:00
|
|
|
|
|
|
|
package hash
|
|
|
|
|
|
|
|
import (
|
|
|
|
"bytes"
|
2023-09-18 13:00:54 -04:00
|
|
|
"context"
|
2017-12-21 20:27:33 -05:00
|
|
|
"encoding/base64"
|
2017-10-22 01:30:34 -04:00
|
|
|
"encoding/hex"
|
|
|
|
"errors"
|
|
|
|
"hash"
|
|
|
|
"io"
|
2022-08-29 19:57:16 -04:00
|
|
|
"net/http"
|
2018-01-17 13:54:31 -05:00
|
|
|
|
2021-06-01 17:59:40 -04:00
|
|
|
"github.com/minio/minio/internal/etag"
|
2022-05-27 09:00:19 -04:00
|
|
|
"github.com/minio/minio/internal/hash/sha256"
|
2023-05-05 22:53:12 -04:00
|
|
|
"github.com/minio/minio/internal/ioutil"
|
2017-10-22 01:30:34 -04:00
|
|
|
)
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
// A Reader wraps an io.Reader and computes the MD5 checksum
|
|
|
|
// of the read content as ETag. Optionally, it also computes
|
|
|
|
// the SHA256 checksum of the content.
|
|
|
|
//
|
|
|
|
// If the reference values for the ETag and content SHA26
|
|
|
|
// are not empty then it will check whether the computed
|
|
|
|
// match the reference values.
|
2017-10-22 01:30:34 -04:00
|
|
|
type Reader struct {
|
2023-05-15 17:08:54 -04:00
|
|
|
src io.Reader
|
|
|
|
bytesRead int64
|
|
|
|
expectedMin int64
|
|
|
|
expectedMax int64
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
|
2018-09-27 23:36:17 -04:00
|
|
|
size int64
|
|
|
|
actualSize int64
|
2017-10-22 01:30:34 -04:00
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
checksum etag.ETag
|
|
|
|
contentSHA256 []byte
|
|
|
|
|
2022-08-29 19:57:16 -04:00
|
|
|
// Content checksum
|
|
|
|
contentHash Checksum
|
|
|
|
contentHasher hash.Hash
|
2023-07-05 06:16:05 -04:00
|
|
|
disableMD5 bool
|
2022-08-29 19:57:16 -04:00
|
|
|
|
2023-05-05 22:53:12 -04:00
|
|
|
trailer http.Header
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
sha256 hash.Hash
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
|
|
|
|
2023-07-05 06:16:05 -04:00
|
|
|
// Options are optional arguments to NewReaderWithOpts, Options
|
|
|
|
// simply converts positional arguments to NewReader() into a
|
|
|
|
// more flexible way to provide optional inputs. This is currently
|
|
|
|
// used by the FanOut API call mostly to disable expensive md5sum
|
|
|
|
// calculation repeatedly under hash.Reader.
|
|
|
|
type Options struct {
|
|
|
|
MD5Hex string
|
|
|
|
SHA256Hex string
|
|
|
|
Size int64
|
|
|
|
ActualSize int64
|
|
|
|
DisableMD5 bool
|
2023-09-18 13:00:54 -04:00
|
|
|
ForceMD5 []byte
|
2023-07-05 06:16:05 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
// NewReaderWithOpts is like NewReader but takes `Options` as argument, allowing
|
|
|
|
// callers to indicate if they want to disable md5sum checksum.
|
2023-09-18 13:00:54 -04:00
|
|
|
func NewReaderWithOpts(ctx context.Context, src io.Reader, opts Options) (*Reader, error) {
|
2023-07-05 06:16:05 -04:00
|
|
|
// return hard limited reader
|
2023-09-18 13:00:54 -04:00
|
|
|
return newReader(ctx, src, opts.Size, opts.MD5Hex, opts.SHA256Hex, opts.ActualSize, opts.DisableMD5, opts.ForceMD5)
|
2023-07-05 06:16:05 -04:00
|
|
|
}
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
// NewReader returns a new Reader that wraps src and computes
|
|
|
|
// MD5 checksum of everything it reads as ETag.
|
|
|
|
//
|
|
|
|
// It also computes the SHA256 checksum of everything it reads
|
|
|
|
// if sha256Hex is not the empty string.
|
|
|
|
//
|
|
|
|
// If size resp. actualSize is unknown at the time of calling
|
|
|
|
// NewReader then it should be set to -1.
|
2023-05-16 16:14:37 -04:00
|
|
|
// When size is >=0 it *must* match the amount of data provided by r.
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
//
|
|
|
|
// NewReader may try merge the given size, MD5 and SHA256 values
|
|
|
|
// into src - if src is a Reader - to avoid computing the same
|
|
|
|
// checksums multiple times.
|
2023-05-12 14:19:08 -04:00
|
|
|
// NewReader enforces S3 compatibility strictly by ensuring caller
|
|
|
|
// does not send more content than specified size.
|
2023-09-18 13:00:54 -04:00
|
|
|
func NewReader(ctx context.Context, src io.Reader, size int64, md5Hex, sha256Hex string, actualSize int64) (*Reader, error) {
|
|
|
|
return newReader(ctx, src, size, md5Hex, sha256Hex, actualSize, false, nil)
|
2023-05-12 14:19:08 -04:00
|
|
|
}
|
|
|
|
|
2023-09-18 13:00:54 -04:00
|
|
|
func newReader(ctx context.Context, src io.Reader, size int64, md5Hex, sha256Hex string, actualSize int64, disableMD5 bool, forceMD5 []byte) (*Reader, error) {
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
MD5, err := hex.DecodeString(md5Hex)
|
|
|
|
if err != nil {
|
|
|
|
return nil, BadDigest{ // TODO(aead): Return an error that indicates that an invalid ETag has been specified
|
|
|
|
ExpectedMD5: md5Hex,
|
|
|
|
CalculatedMD5: "",
|
|
|
|
}
|
|
|
|
}
|
|
|
|
SHA256, err := hex.DecodeString(sha256Hex)
|
|
|
|
if err != nil {
|
|
|
|
return nil, SHA256Mismatch{ // TODO(aead): Return an error that indicates that an invalid Content-SHA256 has been specified
|
|
|
|
ExpectedSHA256: sha256Hex,
|
|
|
|
CalculatedSHA256: "",
|
|
|
|
}
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
2017-10-24 15:25:42 -04:00
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
// Merge the size, MD5 and SHA256 values if src is a Reader.
|
|
|
|
// The size may be set to -1 by callers if unknown.
|
|
|
|
if r, ok := src.(*Reader); ok {
|
|
|
|
if r.bytesRead > 0 {
|
|
|
|
return nil, errors.New("hash: already read from hash reader")
|
|
|
|
}
|
2022-08-29 19:57:16 -04:00
|
|
|
if len(r.checksum) != 0 && len(MD5) != 0 && !etag.Equal(r.checksum, MD5) {
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
return nil, BadDigest{
|
|
|
|
ExpectedMD5: r.checksum.String(),
|
|
|
|
CalculatedMD5: md5Hex,
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if len(r.contentSHA256) != 0 && len(SHA256) != 0 && !bytes.Equal(r.contentSHA256, SHA256) {
|
|
|
|
return nil, SHA256Mismatch{
|
|
|
|
ExpectedSHA256: hex.EncodeToString(r.contentSHA256),
|
|
|
|
CalculatedSHA256: sha256Hex,
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if r.size >= 0 && size >= 0 && r.size != size {
|
2023-05-15 17:08:54 -04:00
|
|
|
return nil, SizeMismatch{Want: r.size, Got: size}
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
}
|
2017-10-22 01:30:34 -04:00
|
|
|
|
2022-08-29 19:57:16 -04:00
|
|
|
r.checksum = MD5
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
r.contentSHA256 = SHA256
|
|
|
|
if r.size < 0 && size >= 0 {
|
2023-05-16 16:14:37 -04:00
|
|
|
r.src = etag.Wrap(ioutil.HardLimitReader(r.src, size), r.src)
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
r.size = size
|
2019-05-08 21:35:40 -04:00
|
|
|
}
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
if r.actualSize <= 0 && actualSize >= 0 {
|
|
|
|
r.actualSize = actualSize
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
return r, nil
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
|
|
|
|
if size >= 0 {
|
2023-05-16 16:14:37 -04:00
|
|
|
r := ioutil.HardLimitReader(src, size)
|
2023-07-05 06:16:05 -04:00
|
|
|
if !disableMD5 {
|
|
|
|
if _, ok := src.(etag.Tagger); !ok {
|
2023-09-18 13:00:54 -04:00
|
|
|
src = etag.NewReader(ctx, r, MD5, forceMD5)
|
2023-07-05 06:16:05 -04:00
|
|
|
} else {
|
|
|
|
src = etag.Wrap(r, src)
|
|
|
|
}
|
2021-05-27 11:18:41 -04:00
|
|
|
} else {
|
2023-07-05 06:16:05 -04:00
|
|
|
src = r
|
2021-05-27 11:18:41 -04:00
|
|
|
}
|
|
|
|
} else if _, ok := src.(etag.Tagger); !ok {
|
2023-07-05 06:16:05 -04:00
|
|
|
if !disableMD5 {
|
2023-09-18 13:00:54 -04:00
|
|
|
src = etag.NewReader(ctx, src, MD5, forceMD5)
|
2023-07-05 06:16:05 -04:00
|
|
|
}
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
}
|
2022-08-29 19:57:16 -04:00
|
|
|
var h hash.Hash
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
if len(SHA256) != 0 {
|
2022-08-29 19:57:16 -04:00
|
|
|
h = sha256.New()
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
}
|
|
|
|
return &Reader{
|
2021-05-27 11:18:41 -04:00
|
|
|
src: src,
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
size: size,
|
|
|
|
actualSize: actualSize,
|
2022-08-29 19:57:16 -04:00
|
|
|
checksum: MD5,
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
contentSHA256: SHA256,
|
2022-08-29 19:57:16 -04:00
|
|
|
sha256: h,
|
2023-07-05 06:16:05 -04:00
|
|
|
disableMD5: disableMD5,
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
}, nil
|
|
|
|
}
|
2023-05-12 14:19:08 -04:00
|
|
|
|
2022-08-29 19:57:16 -04:00
|
|
|
// ErrInvalidChecksum is returned when an invalid checksum is provided in headers.
|
|
|
|
var ErrInvalidChecksum = errors.New("invalid checksum")
|
|
|
|
|
2023-05-15 17:08:54 -04:00
|
|
|
// SetExpectedMin set expected minimum data expected from reader
|
|
|
|
func (r *Reader) SetExpectedMin(expectedMin int64) {
|
|
|
|
r.expectedMin = expectedMin
|
|
|
|
}
|
|
|
|
|
|
|
|
// SetExpectedMax set expected max data expected from reader
|
|
|
|
func (r *Reader) SetExpectedMax(expectedMax int64) {
|
|
|
|
r.expectedMax = expectedMax
|
|
|
|
}
|
|
|
|
|
2022-08-29 19:57:16 -04:00
|
|
|
// AddChecksum will add checksum checks as specified in
|
|
|
|
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
|
|
|
|
// Returns ErrInvalidChecksum if a problem with the checksum is found.
|
|
|
|
func (r *Reader) AddChecksum(req *http.Request, ignoreValue bool) error {
|
2023-05-23 10:58:33 -04:00
|
|
|
cs, err := GetContentChecksum(req.Header)
|
2022-08-29 19:57:16 -04:00
|
|
|
if err != nil {
|
|
|
|
return ErrInvalidChecksum
|
|
|
|
}
|
|
|
|
if cs == nil {
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
r.contentHash = *cs
|
2023-05-05 22:53:12 -04:00
|
|
|
if cs.Type.Trailing() {
|
|
|
|
r.trailer = req.Trailer
|
|
|
|
}
|
2023-05-23 10:58:33 -04:00
|
|
|
return r.AddNonTrailingChecksum(cs, ignoreValue)
|
|
|
|
}
|
|
|
|
|
2023-05-25 01:51:07 -04:00
|
|
|
// AddChecksumNoTrailer will add checksum checks as specified in
|
|
|
|
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
|
|
|
|
// Returns ErrInvalidChecksum if a problem with the checksum is found.
|
|
|
|
func (r *Reader) AddChecksumNoTrailer(headers http.Header, ignoreValue bool) error {
|
|
|
|
cs, err := GetContentChecksum(headers)
|
|
|
|
if err != nil {
|
|
|
|
return ErrInvalidChecksum
|
|
|
|
}
|
|
|
|
if cs == nil {
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
r.contentHash = *cs
|
|
|
|
return r.AddNonTrailingChecksum(cs, ignoreValue)
|
|
|
|
}
|
|
|
|
|
2023-05-23 10:58:33 -04:00
|
|
|
// AddNonTrailingChecksum will add a checksum to the reader.
|
|
|
|
// The checksum cannot be trailing.
|
|
|
|
func (r *Reader) AddNonTrailingChecksum(cs *Checksum, ignoreValue bool) error {
|
|
|
|
if cs == nil {
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
r.contentHash = *cs
|
2023-05-05 22:53:12 -04:00
|
|
|
if ignoreValue {
|
|
|
|
// Do not validate, but allow for transfer
|
2022-08-29 19:57:16 -04:00
|
|
|
return nil
|
|
|
|
}
|
2023-05-05 22:53:12 -04:00
|
|
|
|
2022-08-29 19:57:16 -04:00
|
|
|
r.contentHasher = cs.Type.Hasher()
|
|
|
|
if r.contentHasher == nil {
|
|
|
|
return ErrInvalidChecksum
|
|
|
|
}
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
func (r *Reader) Read(p []byte) (int, error) {
|
|
|
|
n, err := r.src.Read(p)
|
2020-05-14 17:01:31 -04:00
|
|
|
r.bytesRead += int64(n)
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
if r.sha256 != nil {
|
|
|
|
r.sha256.Write(p[:n])
|
|
|
|
}
|
2022-08-29 19:57:16 -04:00
|
|
|
if r.contentHasher != nil {
|
|
|
|
r.contentHasher.Write(p[:n])
|
|
|
|
}
|
2017-10-22 01:30:34 -04:00
|
|
|
|
2024-12-09 21:59:22 -05:00
|
|
|
// If we have reached our expected size,
|
|
|
|
// do one more read to ensure we are at EOF
|
|
|
|
// and that any trailers have been read.
|
|
|
|
attempts := 0
|
|
|
|
for err == nil && r.size >= 0 && r.bytesRead >= r.size {
|
|
|
|
attempts++
|
|
|
|
if r.bytesRead > r.size {
|
|
|
|
return 0, SizeTooLarge{Want: r.size, Got: r.bytesRead}
|
|
|
|
}
|
|
|
|
var tmp [1]byte
|
|
|
|
var n2 int
|
|
|
|
n2, err = r.src.Read(tmp[:])
|
|
|
|
if n2 > 0 {
|
|
|
|
return 0, SizeTooLarge{Want: r.size, Got: r.bytesRead}
|
|
|
|
}
|
|
|
|
if attempts == 100 {
|
|
|
|
return 0, io.ErrNoProgress
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
if err == io.EOF { // Verify content SHA256, if set.
|
2023-05-15 17:08:54 -04:00
|
|
|
if r.expectedMin > 0 {
|
|
|
|
if r.bytesRead < r.expectedMin {
|
|
|
|
return 0, SizeTooSmall{Want: r.expectedMin, Got: r.bytesRead}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if r.expectedMax > 0 {
|
|
|
|
if r.bytesRead > r.expectedMax {
|
|
|
|
return 0, SizeTooLarge{Want: r.expectedMax, Got: r.bytesRead}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
if r.sha256 != nil {
|
|
|
|
if sum := r.sha256.Sum(nil); !bytes.Equal(r.contentSHA256, sum) {
|
|
|
|
return n, SHA256Mismatch{
|
|
|
|
ExpectedSHA256: hex.EncodeToString(r.contentSHA256),
|
|
|
|
CalculatedSHA256: hex.EncodeToString(sum),
|
|
|
|
}
|
|
|
|
}
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
2022-08-29 19:57:16 -04:00
|
|
|
if r.contentHasher != nil {
|
2023-05-05 22:53:12 -04:00
|
|
|
if r.contentHash.Type.Trailing() {
|
|
|
|
var err error
|
|
|
|
r.contentHash.Encoded = r.trailer.Get(r.contentHash.Type.Key())
|
|
|
|
r.contentHash.Raw, err = base64.StdEncoding.DecodeString(r.contentHash.Encoded)
|
|
|
|
if err != nil || len(r.contentHash.Raw) == 0 {
|
|
|
|
return 0, ChecksumMismatch{Got: r.contentHash.Encoded}
|
|
|
|
}
|
|
|
|
}
|
2022-10-15 14:58:47 -04:00
|
|
|
if sum := r.contentHasher.Sum(nil); !bytes.Equal(r.contentHash.Raw, sum) {
|
2022-08-29 19:57:16 -04:00
|
|
|
err := ChecksumMismatch{
|
|
|
|
Want: r.contentHash.Encoded,
|
|
|
|
Got: base64.StdEncoding.EncodeToString(sum),
|
|
|
|
}
|
|
|
|
return n, err
|
|
|
|
}
|
|
|
|
}
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
if err != nil && err != io.EOF {
|
|
|
|
if v, ok := err.(etag.VerifyError); ok {
|
|
|
|
return n, BadDigest{
|
|
|
|
ExpectedMD5: v.Expected.String(),
|
|
|
|
CalculatedMD5: v.Computed.String(),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return n, err
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
// Size returns the absolute number of bytes the Reader
|
|
|
|
// will return during reading. It returns -1 for unlimited
|
|
|
|
// data.
|
|
|
|
func (r *Reader) Size() int64 { return r.size }
|
|
|
|
|
2018-09-27 23:36:17 -04:00
|
|
|
// ActualSize returns the pre-modified size of the object.
|
|
|
|
// DecompressedSize - For compressed objects.
|
|
|
|
func (r *Reader) ActualSize() int64 { return r.actualSize }
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
// ETag returns the ETag computed by an underlying etag.Tagger.
|
|
|
|
// If the underlying io.Reader does not implement etag.Tagger
|
|
|
|
// it returns nil.
|
|
|
|
func (r *Reader) ETag() etag.ETag {
|
|
|
|
if t, ok := r.src.(etag.Tagger); ok {
|
|
|
|
return t.ETag()
|
|
|
|
}
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
|
|
|
|
// MD5Current returns the MD5 checksum of the content
|
|
|
|
// that has been read so far.
|
|
|
|
//
|
|
|
|
// Calling MD5Current again after reading more data may
|
|
|
|
// result in a different checksum.
|
2017-10-24 15:25:42 -04:00
|
|
|
func (r *Reader) MD5Current() []byte {
|
2023-07-05 06:16:05 -04:00
|
|
|
if r.disableMD5 {
|
|
|
|
return r.checksum
|
|
|
|
}
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
return r.ETag()[:]
|
2017-10-24 15:25:42 -04:00
|
|
|
}
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
// SHA256 returns the SHA256 checksum set as reference value.
|
|
|
|
//
|
|
|
|
// It corresponds to the checksum that is expected and
|
|
|
|
// not the actual SHA256 checksum of the content.
|
2017-10-22 01:30:34 -04:00
|
|
|
func (r *Reader) SHA256() []byte {
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
return r.contentSHA256
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
// SHA256HexString returns a hex representation of the SHA256.
|
2017-10-22 01:30:34 -04:00
|
|
|
func (r *Reader) SHA256HexString() string {
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
return hex.EncodeToString(r.contentSHA256)
|
2017-10-22 01:30:34 -04:00
|
|
|
}
|
2020-05-14 17:01:31 -04:00
|
|
|
|
2022-08-29 19:57:16 -04:00
|
|
|
// ContentCRCType returns the content checksum type.
|
|
|
|
func (r *Reader) ContentCRCType() ChecksumType {
|
|
|
|
return r.contentHash.Type
|
|
|
|
}
|
|
|
|
|
|
|
|
// ContentCRC returns the content crc if set.
|
|
|
|
func (r *Reader) ContentCRC() map[string]string {
|
|
|
|
if r.contentHash.Type == ChecksumNone || !r.contentHash.Valid() {
|
|
|
|
return nil
|
|
|
|
}
|
2023-05-05 22:53:12 -04:00
|
|
|
if r.contentHash.Type.Trailing() {
|
|
|
|
return map[string]string{r.contentHash.Type.String(): r.trailer.Get(r.contentHash.Type.Key())}
|
|
|
|
}
|
2022-08-29 19:57:16 -04:00
|
|
|
return map[string]string{r.contentHash.Type.String(): r.contentHash.Encoded}
|
|
|
|
}
|
|
|
|
|
2024-09-19 08:59:07 -04:00
|
|
|
// Checksum returns the content checksum if set.
|
|
|
|
func (r *Reader) Checksum() *Checksum {
|
|
|
|
if !r.contentHash.Type.IsSet() || !r.contentHash.Valid() {
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
return &r.contentHash
|
|
|
|
}
|
|
|
|
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
var _ io.Closer = (*Reader)(nil) // compiler check
|
2020-05-14 17:01:31 -04:00
|
|
|
|
|
|
|
// Close and release resources.
|
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 15:31:53 -05:00
|
|
|
func (r *Reader) Close() error { return nil }
|