azure.storage.blob.blockblobservice module

class azure.storage.blob.blockblobservice.BlockBlobService(account_name=None, account_key=None, sas_token=None, is_emulated=False, protocol='https', endpoint_suffix='core.windows.net', custom_domain=None, request_session=None, connection_string=None, socket_timeout=None)

Bases: azure.storage.blob.baseblobservice.BaseBlobService

Block blobs let you upload large blobs efficiently. Block blobs are composed of blocks, each of which is identified by a block ID. You create or modify a block blob by writing a set of blocks and committing them by their block IDs. Each block can be a different size, up to a maximum of 4 MB, and a block blob can include up to 50,000 blocks. The maximum size of a block blob is therefore slightly more than 195 GB (4 MB × 50,000 blocks). These limits reflect older service versions; as noted under MAX_BLOCK_SIZE, the service now supports blocks of up to 100 MB. If you are writing a block blob that is no more than 64 MB in size, you can upload it in its entirety with a single write operation; see create_blob_from_bytes.
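As a quick check of the arithmetic in the paragraph above, the sketch below multiplies the stated limits. It assumes binary units (1 MB = 1,048,576 bytes), which is why the maximum comes out at about 195 GB rather than 209 GB.

```python
# Arithmetic behind the block blob limits quoted above.
# Binary units are assumed: 1 MB = 1024 * 1024 bytes, 1 GB = 1024 MB.
MB = 1024 * 1024
GB = 1024 * MB

BLOCK_LIMIT = 4 * MB          # per-block maximum stated in the paragraph
MAX_BLOCKS = 50_000           # maximum number of blocks per blob
SINGLE_WRITE_LIMIT = 64 * MB  # largest blob uploadable in a single write

max_blob_bytes = BLOCK_LIMIT * MAX_BLOCKS
max_blob_gb = max_blob_bytes / GB

print(max_blob_bytes)         # 209715200000
print(round(max_blob_gb, 2))  # 195.31
print(SINGLE_WRITE_LIMIT)     # 67108864
```

The single-write limit computed here equals the MAX_SINGLE_PUT_SIZE value listed for this class.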

Variables:
  • MAX_SINGLE_PUT_SIZE (int) – The largest upload size supported in a single put call. This is used by the create_blob_from_* methods if the content length is known and is less than this value.
  • MAX_BLOCK_SIZE (int) – The size of the blocks put by create_blob_from_* methods if the content length is unknown or is larger than MAX_SINGLE_PUT_SIZE. Smaller blocks may be put. The maximum block size the service supports is 100 MB.
  • MIN_LARGE_BLOCK_UPLOAD_THRESHOLD (int) – The minimum block size at which the memory-optimized block upload algorithm is considered. This algorithm applies only to the create_blob_from_file and create_blob_from_stream methods and avoids fully buffering each block. Besides meeting the block-size threshold, content MD5 validation and encryption must both be disabled, because these options require the blocks to be buffered.
Parameters:
  • account_name (str) – The storage account name. This is used to authenticate requests signed with an account key and to construct the storage endpoint. It is required unless a connection string is given or a custom domain is used with anonymous authentication.
  • account_key (str) – The storage account key. This is used for shared key authentication. If neither an account key nor a SAS token is specified, anonymous access will be used.
  • sas_token (str) – A shared access signature token to use to authenticate requests instead of the account key. If both an account key and a SAS token are specified, the account key will be used to sign. If neither is specified, anonymous access will be used.
  • is_emulated (bool) – Whether to use the emulator. Defaults to False. If True, this overrides all other parameters except connection string and request session.
  • protocol (str) – The protocol to use for requests. Defaults to https.
  • endpoint_suffix (str) – The host base component of the URL, minus the account name. Defaults to Azure (core.windows.net). Override this to use the China cloud (core.chinacloudapi.cn).
  • custom_domain (str) – The custom domain to use. This can be set in the Azure Portal. For example, ‘www.mydomain.com’.
  • request_session (requests.Session) – The session object to use for http requests.
  • connection_string (str) – If specified, this will override all other parameters besides request session. See http://azure.microsoft.com/en-us/documentation/articles/storage-configure-connection-string/ for the connection string format.
  • socket_timeout (int) – If specified, this will override the default socket timeout. The timeout specified is in seconds. See DEFAULT_SOCKET_TIMEOUT in _constants.py for the default value.
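The precedence rules for account_key and sas_token, and the role of endpoint_suffix, can be sketched as follows. choose_auth and blob_endpoint are hypothetical helpers, not part of the SDK. The `<account>.blob.<suffix>` host pattern is the standard Azure blob endpoint form; the custom-domain branch is a simplifying assumption. In real code you would pass these values to BlockBlobService(account_name=..., account_key=...) and let the SDK build the endpoint.

```python
from typing import Optional


def choose_auth(account_key: Optional[str], sas_token: Optional[str]) -> str:
    """Apply the documented precedence: account key, then SAS token, then anonymous."""
    if account_key:
        return "shared_key"
    if sas_token:
        return "sas"
    return "anonymous"


def blob_endpoint(account_name: Optional[str], protocol: str = "https",
                  endpoint_suffix: str = "core.windows.net",
                  custom_domain: Optional[str] = None) -> str:
    """Hypothetical helper: build the blob service URL from constructor-style arguments."""
    if custom_domain:
        # Assumption: a custom domain replaces the account-based host entirely.
        return f"{protocol}://{custom_domain}"
    if not account_name:
        raise ValueError("account_name is required unless a custom domain is used")
    return f"{protocol}://{account_name}.blob.{endpoint_suffix}"


print(choose_auth("dummy-key", "dummy-sas-token"))  # shared_key
print(choose_auth(None, None))                      # anonymous
print(blob_endpoint("myaccount"))                   # https://myaccount.blob.core.windows.net
print(blob_endpoint("myaccount", endpoint_suffix="core.chinacloudapi.cn"))
# https://myaccount.blob.core.chinacloudapi.cn
```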
MAX_BLOCK_SIZE = 4194304
MAX_SINGLE_PUT_SIZE = 67108864
MIN_LARGE_BLOCK_UPLOAD_THRESHOLD = 4194305
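A minimal sketch of how the three constants could steer an upload, based only on the descriptions above: a known length below MAX_SINGLE_PUT_SIZE means a single put; otherwise the content is split into MAX_BLOCK_SIZE blocks. plan_upload and streaming_path_eligible are hypothetical helpers, not SDK functions. Note the defaults: MAX_BLOCK_SIZE (4194304) is one byte below MIN_LARGE_BLOCK_UPLOAD_THRESHOLD (4194305), so with unchanged defaults the memory-optimized path is not selected.

```python
MAX_BLOCK_SIZE = 4194304                    # 4 MB
MAX_SINGLE_PUT_SIZE = 67108864              # 64 MB
MIN_LARGE_BLOCK_UPLOAD_THRESHOLD = 4194305  # 4 MB + 1 byte


def plan_upload(size, max_block_size=MAX_BLOCK_SIZE):
    """Hypothetical: return the upload strategy and the number of blocks it needs."""
    if size is not None and size < MAX_SINGLE_PUT_SIZE:
        return ("single_put", 1)
    if size is None:
        return ("blocks", None)  # unknown length: blocks are put until the stream ends
    return ("blocks", -(-size // max_block_size))  # ceiling division


def streaming_path_eligible(seekable, validate_content, require_encryption,
                            max_block_size=MAX_BLOCK_SIZE):
    """Hypothetical: the documented conditions for the memory-optimized upload."""
    return (seekable and not validate_content and not require_encryption
            and max_block_size >= MIN_LARGE_BLOCK_UPLOAD_THRESHOLD)


print(plan_upload(10 * 1024 * 1024))    # ('single_put', 1)
print(plan_upload(100 * 1024 * 1024))   # ('blocks', 25)
print(streaming_path_eligible(True, False, False))                   # False
print(streaming_path_eligible(True, False, False, 8 * 1024 * 1024))  # True
```

Under this reading, a block size above the threshold, together with a seekable stream and validation and encryption disabled, is what would make the streaming path eligible.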
create_blob_from_bytes(container_name, blob_name, blob, index=0, count=None, content_settings=None, metadata=None, validate_content=False, progress_callback=None, max_connections=2, lease_id=None, if_modified_since=None, if_unmodified_since=None, if_match=None, if_none_match=None, timeout=None)

Creates a new blob from an array of bytes, or updates the content of an existing blob, with automatic chunking and progress notifications.

Parameters:
  • container_name (str) – Name of existing container.
  • blob_name (str) – Name of blob to create or update.
  • blob (bytes) – Content of blob as an array of bytes.
  • index (int) – Start index in the array of bytes.
  • count (int) – Number of bytes to upload. Set to None or a negative value to upload all bytes starting from index.
  • content_settings (ContentSettings) – ContentSettings object used to set blob properties.
  • metadata (a dict mapping str to str) – Name-value pairs associated with the blob as metadata.
  • validate_content (bool) – If true, calculates an MD5 hash for each chunk of the blob. The storage service compares the hash of the content that arrives against the hash that was sent. This is primarily valuable for detecting bit flips on the wire when using HTTP instead of HTTPS, because HTTPS (the default) already validates the data. Note that this MD5 hash is not stored with the blob.
  • progress_callback (callback function in format of func(current, total)) – Callback for progress with signature function(current, total), where current is the number of bytes transferred so far and total is the size of the blob, or None if the total size is unknown.
  • max_connections (int) – Maximum number of parallel connections to use when the blob size exceeds 64MB.
  • lease_id (str) – Required if the blob has an active lease.
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.
  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.
  • if_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag matches the value specified.
  • if_none_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag does not match the value specified. Specify the wildcard character (*) to perform the operation only if the resource does not exist, and fail the operation if it does exist.
  • timeout (int) – The timeout parameter is expressed in seconds. This method may make multiple calls to the Azure service and the timeout will apply to each call individually.
Returns:

ETag and last modified properties for the Block Blob

Return type:

ResourceProperties
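The index and count parameters select a slice of blob before upload. select_payload is a hypothetical helper that mirrors the documented semantics; under that reading, create_blob_from_bytes(container_name, blob_name, b"abcdef", index=2, count=3) would upload b"cde".

```python
def select_payload(blob: bytes, index: int = 0, count=None) -> bytes:
    """Hypothetical helper: the bytes that index/count select for upload."""
    if count is None or count < 0:
        # None or a negative count means "everything from index onward".
        return blob[index:]
    return blob[index:index + count]


print(select_payload(b"abcdef", 2))      # b'cdef'
print(select_payload(b"abcdef", 2, 3))   # b'cde'
print(select_payload(b"abcdef", 1, -1))  # b'bcdef'
```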

create_blob_from_path(container_name, blob_name, file_path, content_settings=None, metadata=None, validate_content=False, progress_callback=None, max_connections=2, lease_id=None, if_modified_since=None, if_unmodified_since=None, if_match=None, if_none_match=None, timeout=None)

Creates a new blob from a file path, or updates the content of an existing blob, with automatic chunking and progress notifications.

Parameters:
  • container_name (str) – Name of existing container.
  • blob_name (str) – Name of blob to create or update.
  • file_path (str) – Path of the file to upload as the blob content.
  • content_settings (ContentSettings) – ContentSettings object used to set blob properties.
  • metadata (a dict mapping str to str) – Name-value pairs associated with the blob as metadata.
  • validate_content (bool) – If true, calculates an MD5 hash for each chunk of the blob. The storage service compares the hash of the content that arrives against the hash that was sent. This is primarily valuable for detecting bit flips on the wire when using HTTP instead of HTTPS, because HTTPS (the default) already validates the data. Note that this MD5 hash is not stored with the blob.
  • progress_callback (callback function in format of func(current, total)) – Callback for progress with signature function(current, total), where current is the number of bytes transferred so far and total is the size of the blob, or None if the total size is unknown.
  • max_connections (int) – Maximum number of parallel connections to use when the blob size exceeds 64MB.
  • lease_id (str) – Required if the blob has an active lease.
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.
  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.
  • if_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag matches the value specified.
  • if_none_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag does not match the value specified. Specify the wildcard character (*) to perform the operation only if the resource does not exist, and fail the operation if it does exist.
  • timeout (int) – The timeout parameter is expressed in seconds. This method may make multiple calls to the Azure service and the timeout will apply to each call individually.
Returns:

ETag and last modified properties for the Block Blob

Return type:

ResourceProperties
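The if_modified_since and if_unmodified_since parameters treat naive datetimes as UTC and convert aware ones to UTC. The sketch below shows that normalization with the standard library and formats the result as an RFC 1123 date, the usual form of HTTP conditional headers. to_utc and http_date are hypothetical helpers; the SDK's own conversion code is not reproduced here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def to_utc(value: datetime) -> datetime:
    """Naive datetimes are taken to be UTC; aware ones are converted to UTC."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def http_date(value: datetime) -> str:
    """Format as an RFC 1123 HTTP date, e.g. for an If-Modified-Since header."""
    return format_datetime(to_utc(value), usegmt=True)


naive = datetime(2024, 1, 1, 12, 0)
plus_two = datetime(2024, 1, 1, 14, 0, tzinfo=timezone(timedelta(hours=2)))
print(http_date(naive))     # Mon, 01 Jan 2024 12:00:00 GMT
print(http_date(plus_two))  # Mon, 01 Jan 2024 12:00:00 GMT
```

Both inputs describe the same instant, so both produce the same header value.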

create_blob_from_stream(container_name, blob_name, stream, count=None, content_settings=None, metadata=None, validate_content=False, progress_callback=None, max_connections=2, lease_id=None, if_modified_since=None, if_unmodified_since=None, if_match=None, if_none_match=None, timeout=None, use_byte_buffer=False)

Creates a new blob from a file/stream, or updates the content of an existing blob, with automatic chunking and progress notifications.

Parameters:
  • container_name (str) – Name of existing container.
  • blob_name (str) – Name of blob to create or update.
  • stream (io.IOBase) – Opened file/stream to upload as the blob content.
  • count (int) – Number of bytes to read from the stream. This is optional, but should be supplied for optimal performance.
  • content_settings (ContentSettings) – ContentSettings object used to set blob properties.
  • metadata (a dict mapping str to str) – Name-value pairs associated with the blob as metadata.
  • validate_content (bool) – If true, calculates an MD5 hash for each chunk of the blob. The storage service compares the hash of the content that arrives against the hash that was sent. This is primarily valuable for detecting bit flips on the wire when using HTTP instead of HTTPS, because HTTPS (the default) already validates the data. Note that this MD5 hash is not stored with the blob.
  • progress_callback (callback function in format of func(current, total)) – Callback for progress with signature function(current, total), where current is the number of bytes transferred so far and total is the size of the blob, or None if the total size is unknown.
  • max_connections (int) – Maximum number of parallel connections to use when the blob size exceeds 64MB. Note that parallel upload requires the stream to be seekable.
  • lease_id (str) – Required if the blob has an active lease.
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.
  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.
  • if_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag matches the value specified.
  • if_none_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag does not match the value specified. Specify the wildcard character (*) to perform the operation only if the resource does not exist, and fail the operation if it does exist.
  • timeout (int) – The timeout parameter is expressed in seconds. This method may make multiple calls to the Azure service and the timeout will apply to each call individually.
  • use_byte_buffer (bool) – If True, forces the original upload path, which fully buffers each block. By default this value is False, and a memory-efficient streaming upload algorithm is used when all of the following hold: the provided stream is seekable, 'require_encryption' is False, content MD5 validation is disabled (see MIN_LARGE_BLOCK_UPLOAD_THRESHOLD), and MAX_BLOCK_SIZE >= MIN_LARGE_BLOCK_UPLOAD_THRESHOLD. This approach has drawbacks. To achieve memory efficiency, an IOBase stream or file-like object is segmented into logical blocks using a SubStream wrapper. To read the correct data, each SubStream must acquire a lock so that it can safely seek to the right position on the shared underlying stream. If max_connections > 1, the concurrency results in a considerable amount of seeking on the underlying stream. For the most common inputs, such as a file-like stream object, seeking is inexpensive and this is not much of a concern; for other kinds of streams, it may be costly. Weigh the memory savings against the cost of seeking on your input stream. The SubStream class attempts to buffer up to 4 MB internally to reduce the number of seek and read calls on the underlying stream, which is particularly beneficial when uploading larger blocks.
Returns:

ETag and last modified properties for the Block Blob

Return type:

ResourceProperties
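The SubStream approach described under use_byte_buffer can be illustrated with a minimal sketch: each logical block is a view over a shared seekable stream, and a single lock guards every seek-and-read pair. SubStreamSketch and split_into_views are hypothetical names; unlike the SDK's SubStream, this version does no internal buffering.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor


class SubStreamSketch:
    """Hypothetical read-only view of bytes [offset, offset + length) of a shared stream."""

    def __init__(self, wrapped, offset, length, lock):
        self._wrapped = wrapped  # shared, seekable underlying stream
        self._offset = offset    # absolute start of this view
        self._length = length    # number of bytes in this view
        self._lock = lock        # one lock shared by all views of the stream
        self._position = 0       # read position relative to the view

    def read(self, size=-1):
        remaining = self._length - self._position
        if size is None or size < 0 or size > remaining:
            size = remaining
        if size == 0:
            return b""
        # Seek and read under the lock so another thread cannot move the
        # shared stream position between the two calls.
        with self._lock:
            self._wrapped.seek(self._offset + self._position)
            data = self._wrapped.read(size)
        self._position += len(data)
        return data


def split_into_views(stream, total_size, block_size):
    """Segment a seekable stream into logical blocks that share one lock."""
    lock = threading.Lock()
    return [SubStreamSketch(stream, start, min(block_size, total_size - start), lock)
            for start in range(0, total_size, block_size)]


data = bytes(range(256)) * 40  # 10,240 bytes of sample content
views = split_into_views(io.BytesIO(data), len(data), 4096)

# Read the logical blocks concurrently, as a parallel upload would.
with ThreadPoolExecutor(max_workers=2) as pool:
    blocks = list(pool.map(lambda view: view.read(), views))

print([len(b) for b in blocks])  # [4096, 4096, 2048]
print(b"".join(blocks) == data)  # True
```

Every read triggers a seek on the shared stream, which is the cost the use_byte_buffer description asks you to weigh for non-file streams.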

create_blob_from_text(container_name, blob_name, text, encoding='utf-8', content_settings=None, metadata=None, validate_content=False, progress_callback=None, max_connections=2, lease_id=None, if_modified_since=None, if_unmodified_since=None, if_match=None, if_none_match=None, timeout=None)

Creates a new blob from str/unicode, or updates the content of an existing blob, with automatic chunking and progress notifications.

Parameters:
  • container_name (str) – Name of existing container.
  • blob_name (str) – Name of blob to create or update.
  • text (str) – Text to upload to the blob.
  • encoding (str) – Python encoding to use to convert the text to bytes.
  • content_settings (ContentSettings) – ContentSettings object used to set blob properties.
  • metadata (a dict mapping str to str) – Name-value pairs associated with the blob as metadata.
  • validate_content (bool) – If true, calculates an MD5 hash for each chunk of the blob. The storage service compares the hash of the content that arrives against the hash that was sent. This is primarily valuable for detecting bit flips on the wire when using HTTP instead of HTTPS, because HTTPS (the default) already validates the data. Note that this MD5 hash is not stored with the blob.
  • progress_callback (callback function in format of func(current, total)) – Callback for progress with signature function(current, total), where current is the number of bytes transferred so far and total is the size of the blob, or None if the total size is unknown.
  • max_connections (int) – Maximum number of parallel connections to use when the blob size exceeds 64MB.
  • lease_id (str) – Required if the blob has an active lease.
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.
  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.
  • if_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag matches the value specified.
  • if_none_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag does not match the value specified. Specify the wildcard character (*) to perform the operation only if the resource does not exist, and fail the operation if it does exist.
  • timeout (int) – The timeout parameter is expressed in seconds. This method may make multiple calls to the Azure service and the timeout will apply to each call individually.
Returns:

ETag and last modified properties for the Block Blob

Return type:

ResourceProperties
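A progress_callback is any callable accepting (current, total). The sketch below is a hypothetical callback that records progress and handles total being None; the calls to it are simulated rather than made by the SDK. It also shows why, for create_blob_from_text, the byte count can differ from the character count: the text is encoded (UTF-8 by default) before upload, and the blob size is measured in bytes.

```python
progress_log = []


def report_progress(current, total):
    """Progress callback matching the documented func(current, total) signature."""
    if total is None:
        progress_log.append(f"{current} bytes sent (total unknown)")
    else:
        percent = 100.0 * current / total if total else 100.0
        progress_log.append(f"{current}/{total} bytes ({percent:.1f}%)")


text = "na\u00efve caf\u00e9"    # "naïve café"
payload = text.encode("utf-8")  # what encoding='utf-8' turns the text into
print(len(text), len(payload))  # 10 12

# Simulated calls; during a real upload the SDK invokes the callback.
report_progress(6, len(payload))
report_progress(12, len(payload))
report_progress(5, None)
print(progress_log)
# ['6/12 bytes (50.0%)', '12/12 bytes (100.0%)', '5 bytes sent (total unknown)']
```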

get_block_list(container_name, blob_name, snapshot=None, block_list_type=None, lease_id=None, timeout=None)

Retrieves the list of blocks that have been uploaded as part of a block blob. There are two block lists maintained for a blob:

Committed Block List:
The list of blocks that have been successfully committed to a given blob with Put Block List.
Uncommitted Block List:
The list of blocks that have been uploaded for a blob using Put Block, but that have not yet been committed. These blocks are stored in Azure in association with a blob, but do not yet form part of the blob.
Parameters:
  • container_name (str) – Name of existing container.
  • blob_name (str) – Name of existing blob.
  • snapshot (str) – An opaque DateTime value that, when present, specifies the snapshot of the blob whose block list is retrieved.
  • block_list_type (str) – Specifies whether to return the list of committed blocks, the list of uncommitted blocks, or both lists together. Valid values are: committed, uncommitted, or all.
  • lease_id (str) – Required if the blob has an active lease.
  • timeout (int) – The timeout parameter is expressed in seconds.
Returns:

The committed and/or uncommitted blocks of the block blob

Return type:

BlobBlockList
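To make the committed and uncommitted lists concrete, here is a hypothetical in-memory model; it is not the service's implementation. put_block stages data in the uncommitted list, put_block_list builds the committed list from the IDs given while preferring the most recently uploaded version of each block, and get_block_list filters by block_list_type. Discarding leftover uncommitted blocks after a commit is a simplifying assumption of the model.

```python
class BlockListModel:
    """Hypothetical in-memory model of a block blob's committed and uncommitted lists."""

    def __init__(self):
        self.committed = {}    # block_id -> bytes of committed blocks
        self.order = []        # committed block IDs in blob order
        self.uncommitted = {}  # block_id -> bytes staged by put_block

    def put_block(self, block_id, data):
        # Uploading a block only stages it; the blob content is unchanged.
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids):
        # Prefer the most recently uploaded (uncommitted) version of each
        # block, falling back to the committed one.
        new_committed = {}
        for block_id in block_ids:
            if block_id in self.uncommitted:
                new_committed[block_id] = self.uncommitted[block_id]
            elif block_id in self.committed:
                new_committed[block_id] = self.committed[block_id]
            else:
                raise KeyError(f"block {block_id!r} was never uploaded")
        self.committed = new_committed
        self.order = list(block_ids)
        self.uncommitted.clear()  # modeling assumption: leftovers are discarded

    def get_block_list(self, block_list_type="committed"):
        """Return (committed, uncommitted) lists of (block_id, size) pairs."""
        committed = [(b, len(self.committed[b])) for b in self.order]
        uncommitted = [(b, len(d)) for b, d in self.uncommitted.items()]
        if block_list_type == "committed":
            return committed, []
        if block_list_type == "uncommitted":
            return [], uncommitted
        if block_list_type == "all":
            return committed, uncommitted
        raise ValueError("block_list_type must be committed, uncommitted, or all")

    def content(self):
        return b"".join(self.committed[b] for b in self.order)


blob = BlockListModel()
blob.put_block("b0", b"hello ")
blob.put_block("b1", b"world")
print(blob.get_block_list("all"))  # ([], [('b0', 6), ('b1', 5)])
blob.put_block_list(["b0", "b1"])
print(blob.content())              # b'hello world'
blob.put_block("b1", b"there")     # re-upload one block only
blob.put_block_list(["b0", "b1"])
print(blob.content())              # b'hello there'
```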

put_block(container_name, blob_name, block, block_id, validate_content=False, lease_id=None, timeout=None)

Creates a new block to be committed as part of a blob.

Parameters:
  • container_name (str) – Name of existing container.
  • blob_name (str) – Name of blob to which the block belongs. The blob does not need to exist yet.
  • block (io.IOBase or bytes) – Content of the block.
  • block_id (str) – A valid Base64 string value that identifies the block. Prior to encoding, the string must be less than or equal to 64 bytes in size. For a given blob, every block ID must have the same length. Note that the Base64 string must be URL-encoded.
  • validate_content (bool) – If true, calculates an MD5 hash of the block content. The storage service compares the hash of the content that arrives against the hash that was sent. This is primarily valuable for detecting bit flips on the wire when using HTTP instead of HTTPS, because HTTPS (the default) already validates the data. Note that this MD5 hash is not stored with the blob.
  • lease_id (str) – Required if the blob has an active lease.
  • timeout (int) – The timeout parameter is expressed in seconds.
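Because every block ID of a blob must have the same length and be at most 64 bytes before Base64 encoding, a common pattern is to derive IDs from a zero-padded block index. make_block_id is a hypothetical helper. This page does not say whether the SDK Base64-encodes block_id for you, so check that before passing pre-encoded values to put_block.

```python
import base64
from urllib.parse import quote


def make_block_id(index: int, width: int = 6) -> str:
    """Hypothetical: fixed-width, Base64-encoded ID for the block at `index`."""
    if not 0 <= index < 10 ** width:
        raise ValueError("index must fit in the fixed width so all IDs share a length")
    raw = f"block-{index:0{width}d}".encode("utf-8")
    if len(raw) > 64:
        raise ValueError("block IDs must be at most 64 bytes before encoding")
    return base64.b64encode(raw).decode("ascii")


ids = [make_block_id(i) for i in (0, 7, 123456)]
print({len(block_id) for block_id in ids})  # {16}
print(base64.b64decode(ids[1]))             # b'block-000007'

# URL-encode for use in a query string ('+', '/' and '=' are escaped).
url_safe_ids = [quote(block_id, safe="") for block_id in ids]
```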
put_block_list(container_name, blob_name, block_list, content_settings=None, metadata=None, validate_content=False, lease_id=None, if_modified_since=None, if_unmodified_since=None, if_match=None, if_none_match=None, timeout=None)

Writes a blob by specifying the list of block IDs that make up the blob. In order to be written as part of a blob, a block must have been successfully written to the server in a prior Put Block operation.

You can call Put Block List to update a blob by uploading only those blocks that have changed, then committing the new and existing blocks together. You can do this by specifying whether to commit a block from the committed block list or from the uncommitted block list, or to commit the most recently uploaded version of the block, whichever list it may belong to.

Parameters:
  • container_name (str) – Name of existing container.
  • blob_name (str) – Name of blob to create or update.
  • block_list (list of BlobBlock) – A list of BlobBlock objects containing the block IDs and block states.
  • content_settings (ContentSettings) – ContentSettings object used to set properties on the blob.
  • metadata (a dict mapping str to str) – Name-value pairs associated with the blob as metadata.
  • validate_content (bool) – If true, calculates an MD5 hash of the block list content. The storage service compares the hash of the block list content that arrives against the hash that was sent. This is primarily valuable for detecting bit flips on the wire when using HTTP instead of HTTPS, because HTTPS (the default) already validates the data. Note that this check applies to the block list content, not to the content of the blob itself.
  • lease_id (str) – Required if the blob has an active lease.
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.
  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.
  • if_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag matches the value specified.
  • if_none_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag does not match the value specified. Specify the wildcard character (*) to perform the operation only if the resource does not exist, and fail the operation if it does exist.
  • timeout (int) – The timeout parameter is expressed in seconds.
Returns:

ETag and last modified properties for the updated Block Blob

Return type:

ResourceProperties
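The update pattern described for put_block_list (upload only the blocks that changed, then commit new and existing blocks together) can be planned as below. plan_update is a hypothetical helper: it compares SHA-256 digests of fixed-size blocks, reuses a block from the committed list when its content is unchanged, and marks the rest for upload with put_block. Each (block_id, list) pair would then become a BlobBlock carrying that state in the put_block_list call. Positional block IDs and the tiny block size are illustrative assumptions.

```python
import hashlib


def split_blocks(data: bytes, block_size: int):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def plan_update(old_digests, new_data: bytes, block_size: int):
    """Hypothetical: pair each block ID with the list it should be committed from.

    old_digests holds SHA-256 digests of the blocks already committed, in order,
    so unchanged blocks can be reused without uploading them again.
    """
    plan, to_upload = [], []
    for i, block in enumerate(split_blocks(new_data, block_size)):
        block_id = f"{i:06d}"  # illustrative; real IDs follow put_block's rules
        if i < len(old_digests) and old_digests[i] == hashlib.sha256(block).digest():
            plan.append((block_id, "committed"))
        else:
            plan.append((block_id, "uncommitted"))
            to_upload.append((block_id, block))  # would be sent with put_block first
    return plan, to_upload


old = b"aaaabbbbcccc"
old_digests = [hashlib.sha256(b).digest() for b in split_blocks(old, 4)]
plan, to_upload = plan_update(old_digests, b"aaaaXXXXccccdd", 4)
print(plan)
# [('000000', 'committed'), ('000001', 'uncommitted'), ('000002', 'committed'), ('000003', 'uncommitted')]
print(to_upload)  # [('000001', b'XXXX'), ('000003', b'dd')]
```

Fixed-size positional blocks suit in-place edits; an insertion that shifts later bytes marks every following block as changed.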