TileDB Python API Reference

Modules

Typical usage of the Python interface to TileDB will use the top-level module tiledb, e.g.

import tiledb

There is also a submodule libtiledb which contains the necessary bindings to the underlying TileDB native library. Most of the time you will not need to interact with tiledb.libtiledb unless you need native-library specific information, e.g. the version number:

import tiledb
tiledb.libtiledb.version()  # Native TileDB library version number

Getting Started

Arrays may be opened with the tiledb.open function:

tiledb.open(uri, mode='r', key=None, attr=None, config=None, timestamp=None, ctx=None)

Open a TileDB array at the given URI

Parameters:
  • uri – any TileDB supported URI

  • timestamp – array timestamp to open, int or None. See the TileDB time traveling documentation for detailed functionality description.

  • key – encryption key, str or None

  • mode (str) – (default ‘r’) Open the array in read ‘r’, write ‘w’, modify-exclusive ‘m’, or delete ‘d’ mode

  • attr – attribute name to select from a multi-attribute array, str or None

  • config – TileDB config dictionary, dict or None

Returns:

open TileDB {Sparse,Dense}Array object

Data import helpers

tiledb.from_numpy(uri, array, config=None, ctx=None, **kwargs)

Write a NumPy array into a TileDB DenseArray, returning a read-only DenseArray instance.

Parameters:
  • uri (str) – URI for the TileDB array (any supported TileDB URI)

  • array (numpy.ndarray) – dense numpy array to persist

  • config – TileDB config dictionary, dict or None

  • ctx (tiledb.Ctx) – A TileDB Context

  • kwargs – additional arguments to pass to the DenseArray constructor

Return type:

tiledb.DenseArray

Returns:

An open DenseArray (read mode) with a single anonymous attribute

Raises:
  • TypeError – cannot convert uri to unicode string

  • tiledb.TileDBError

Keyword Arguments:
  • full_domain - Dimensions should be created with full range of the dtype (default: False)

  • mode - Creation mode, one of ‘ingest’ (default), ‘schema_only’, ‘append’

  • append_dim - The dimension along which the NumPy array is appended (default: 0).

  • start_idx - The starting index to append to. By default, append to the end of the existing data.

  • timestamp - Write TileDB array at specific timestamp.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     # Creates array 'array' on disk.
...     with tiledb.from_numpy(tmp + "/array",  np.array([1.0, 2.0, 3.0])) as A:
...         pass

tiledb.from_csv(uri: str, csv_file: str | List[str], **kwargs)

Create TileDB array at given URI from a CSV file or list of files

Parameters:
  • uri – URI for new TileDB array

  • csv_file – input CSV file or list of CSV files. Note: multi-file ingestion requires a chunksize argument. Files will be read in batches of at least chunksize rows before writing to the TileDB array.

Keyword Arguments:
  • Any pandas.read_csv supported keyword argument

  • ctx - A TileDB context

  • sparse - (default True) Create sparse schema

  • index_dims (List[str]) – List of column name(s) to use as dimension(s) in TileDB array schema. This is the recommended way to create dimensions. (Note: the Pandas read_csv argument index_col will be passed through if provided, which results in indexes that will be converted to dimensions by default; however, index_dims is preferred.)

  • allows_duplicates - Generated schema should allow duplicates

  • mode - Creation mode, one of ‘ingest’ (default), ‘schema_only’, ‘append’

  • attr_filters - FilterList to apply to Attributes: FilterList or Dict[str -> FilterList] for any attribute(s). Unspecified attributes will use default.

  • dim_filters - FilterList to apply to Dimensions: FilterList or Dict[str -> FilterList] for any dimensions(s). Unspecified dimensions will use default.

  • offsets_filters - FilterList to apply to all offsets

  • full_domain - Dimensions should be created with full range of the dtype

  • tile - Dimension tiling: accepts either an int that applies the tiling to all dimensions or a dict(“dim_name”: int) to specifically assign tiling to a given dimension

  • row_start_idx - Start index to start new write (for row-indexed ingestions).

  • fillna - Value to use to fill holes

  • column_types - Dictionary of {column_name: dtype} to apply dtypes to columns

  • varlen_types - A set of {dtypes}; any column within the set is converted to a variable length attribute

  • capacity - Schema capacity.

  • date_spec - Dictionary of {column_name: format_spec} to apply to date/time columns which are not correctly inferred by pandas ‘parse_dates’. Format must be specified using the Python format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

  • cell_order - (default ‘row-major’) Schema cell order: ‘row-major’, ‘col-major’, or ‘hilbert’

  • tile_order - (default ‘row-major’) Schema tile order: ‘row-major’ or ‘col-major’

  • timestamp - Write TileDB array at specific timestamp.

Returns:

None

Example:

>>> import tiledb
>>> tiledb.from_csv("iris.tldb", "iris.csv")
>>> tiledb.object_type("iris.tldb")
'array'
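
The date_spec values are ordinary Python strftime/strptime format strings. The sketch below uses only the standard library to check a format string against a sample cell value before handing it to from_csv; the column name sample_date and the sample value are hypothetical.

```python
from datetime import datetime

# Hypothetical date_spec mapping: column name -> strptime format string.
date_spec = {"sample_date": "%d/%m/%Y %H:%M"}

# Check the format against a sample cell value before ingesting.
sample_value = "31/12/2020 23:59"
parsed = datetime.strptime(sample_value, date_spec["sample_date"])
print(parsed.isoformat())
```

If the format does not match the data, strptime raises ValueError, which is cheaper to discover here than partway through an ingestion.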

tiledb.from_pandas(uri, dataframe, **kwargs)

Create TileDB array at given URI from a Pandas dataframe

Supports most Pandas series types, including nullable integers and bools.

Parameters:
  • uri – URI for new TileDB array

  • dataframe – pandas DataFrame

Keyword Arguments:
  • Any pandas.read_csv supported keyword argument

  • ctx - A TileDB context

  • sparse - (default True) Create sparse schema

  • chunksize - (default None) Maximum number of rows to read at a time. Note that this is also a pandas.read_csv argument, which tiledb.read_csv checks for in order to correctly read a file batchwise.

  • index_dims (List[str]) – List of column name(s) to use as dimension(s) in TileDB array schema. This is the recommended way to create dimensions.

  • allows_duplicates - Generated schema should allow duplicates

  • mode - Creation mode, one of ‘ingest’ (default), ‘schema_only’, ‘append’

  • attr_filters - FilterList to apply to Attributes: FilterList or Dict[str -> FilterList] for any attribute(s). Unspecified attributes will use default.

  • dim_filters - FilterList to apply to Dimensions: FilterList or Dict[str -> FilterList] for any dimensions(s). Unspecified dimensions will use default.

  • offsets_filters - FilterList to apply to all offsets

  • full_domain - Dimensions should be created with full range of the dtype

  • tile - Dimension tiling: accepts either an int that applies the tiling to all dimensions or a dict(“dim_name”: int) to specifically assign tiling to a given dimension

  • row_start_idx - Start index to start new write (for row-indexed ingestions).

  • fillna - Value to use to fill holes

  • column_types - Dictionary of {column_name: dtype} to apply dtypes to columns

  • varlen_types - A set of {dtypes}; any column within the set is converted to a variable length attribute

  • capacity - Schema capacity.

  • date_spec - Dictionary of {column_name: format_spec} to apply to date/time columns which are not correctly inferred by pandas ‘parse_dates’. Format must be specified using the Python format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

  • cell_order - (default ‘row-major’) Schema cell order: ‘row-major’, ‘col-major’, or ‘hilbert’

  • tile_order - (default ‘row-major’) Schema tile order: ‘row-major’ or ‘col-major’

  • timestamp - Write TileDB array at specific timestamp.

Raises:

tiledb.TileDBError

Returns:

None

Context

class tiledb.Ctx(config: Config | None = None)

Class representing a TileDB context.

A TileDB context wraps a TileDB storage manager.

Parameters:

config (tiledb.Config or dict) – Initialize Ctx with given config parameters

config()

Returns the Config instance associated with the Ctx.

get_stats(print_out: bool = True, json: bool = False)

Retrieves the stats from a TileDB context.

Parameters:
  • print_out – Print string to console (default True), or return as string

  • json – Return stats JSON object (default: False)

set_tag(key: str, value: str)

Sets a (string, string) “tag” on the Ctx (internal).

tiledb.default_ctx(config: Config | dict | None = None) → Ctx

Returns, and optionally initializes, the default tiledb.Ctx context variable.

This Ctx object is used by Python API functions when no ctx keyword argument is provided. Most API functions accept an optional ctx kwarg, but that is typically only necessary in advanced usage with multiple contexts per program.

For initialization, this function must be called before any other tiledb functions. The initialization call accepts a tiledb.Config object to override the defaults for process-global parameters.

Parameters:

config – tiledb.Config object or dictionary with config parameters.

Returns:

Ctx

Config

class tiledb.Config(params: dict | None = None, path: str | None = None)

TileDB Config class

The Config object stores configuration parameters for both TileDB Embedded and TileDB-Py.

For TileDB Embedded parameters, see:

The following configuration options are supported by TileDB-Py:

  • py.init_buffer_bytes:

    Initial allocation size in bytes for attribute and dimension buffers. If the result size exceeds the pre-allocated buffer(s), the query will return incomplete, and TileDB-Py will allocate larger buffers and resubmit. Specifying a sufficiently large buffer size will often improve performance. Default 10 MB (1024**2 * 10).

  • py.use_arrow:

    Use pyarrow from the Apache Arrow project to convert query results into Pandas dataframe format when requested. Default True.

  • py.deduplicate:

    Attempt to deduplicate Python objects during buffer conversion to Python. Deduplication may reduce memory usage for datasets with many identical strings, at the cost of some performance reduction due to hash calculation/lookup for each object.

Unknown parameters will be ignored!

Parameters:
  • params (dict) – Set parameter values from a dict-like object

  • path (str) – Set parameter values from persisted Config parameter file

clear()

Unsets all Config parameters (returns them to their default values)

dict(prefix: str = '')

Returns a dict representation of a Config object

Parameters:

prefix (str) – return only parameters with a given prefix

Return type:

dict

Returns:

Config parameters / values as a Python dict

from_file(path: str)

Update a Config object from a persisted config file

Parameters:

path – A local Config file path

get(self: tiledb.cc.Config, arg0: str) → str

items(prefix: str = '')

Returns an iterator object over Config parameters, values

Parameters:

prefix (str) – return only parameters with a given prefix

Return type:

ConfigItems

Returns:

iterator over Config parameter, value tuples

keys(prefix: str = '')

Returns an iterator object over Config parameters (keys)

Parameters:

prefix (str) – return only parameters with a given prefix

Return type:

ConfigKeys

Returns:

iterator over Config parameter string keys

static load(uri: str)

Constructs a Config class instance from config parameters loaded from a local Config file

Parameters:

uri (str) – a local URI config file path

Return type:

tiledb.Config

Returns:

A TileDB Config instance with persisted parameter values

Raises:
  • TypeError – uri cannot be converted to a unicode string

  • tiledb.TileDBError

save(uri: str)

Persist Config parameter values to a config file

Parameters:

uri (str) – a local URI config file path

Raises:
  • TypeError – uri cannot be converted to a unicode string

  • tiledb.TileDBError

update(odict: dict)

Update a Config object with parameters and values from a dict-like object

Parameters:

odict – dict-like object containing the parameters and values to update in the Config.

values(prefix: str = '')

Returns an iterator object over Config values

Parameters:

prefix (str) – return only parameters with a given prefix

Return type:

ConfigValues

Returns:

iterator over Config string values

Array Schema

class tiledb.ArraySchema(domain: Domain | None = None, attrs: Sequence[Attr] = (), cell_order: str = 'row-major', tile_order: str = 'row-major', capacity: int = 0, coords_filters: FilterList | Sequence[Filter] | None = None, offsets_filters: FilterList | Sequence[Filter] | None = None, validity_filters: FilterList | Sequence[Filter] | None = None, allows_duplicates: bool = False, sparse: bool = False, dim_labels={}, enums=None, ctx: Ctx | None = None)

Schema class for TileDB dense / sparse array representations

Parameters:
  • domain – Domain of schema

  • cell_order ('row-major' (default) or 'C', 'col-major' or 'F' or 'hilbert') – TileDB label for cell layout

  • tile_order ('row-major' (default) or 'C', 'col-major' or 'F') – TileDB label for tile layout

  • capacity (int) – tile cell capacity

  • offsets_filters (tiledb.FilterList) – (default None) offsets filter list

  • validity_filters (tiledb.FilterList) – (default None) validity filter list

  • allows_duplicates (bool) – True if duplicates are allowed

  • sparse (bool) – True if schema is sparse, else False (set by SparseArray and DenseArray derived classes)

  • dim_labels – dict(dim_index, dict(dim_name, tiledb.DimSchema))

  • ctx (tiledb.Ctx) – A TileDB Context

Raises:

tiledb.TileDBError

property allows_duplicates: bool

Returns True if the (sparse) array allows duplicates.

attr(key: str | int) → Attr

Returns an Attr instance given an int index or string label

Parameters:

key (int or str) – attribute index (positional or associative)

Return type:

tiledb.Attr

Returns:

The ArraySchema attribute at index or with the given name (label)

Raises:

TypeError – invalid key type

property capacity: int

The array capacity

Return type:

int

Raises:

tiledb.TileDBError

property cell_order: str

The cell order layout of the array.

Return type:

str

check() → bool

Checks the correctness of the array schema

Return type:

None

Raises:

tiledb.TileDBError if invalid

property coords_filters: FilterList

The FilterList for the array’s coordinates

Return type:

tiledb.FilterList

Raises:

tiledb.TileDBError

property ctx: Ctx

The array schema’s context

Return type:

tiledb.Ctx

dim_label(name: str) → DimLabel

Returns a TileDB DimensionLabel given the label name

Parameters:

name – name of the dimension label

Returns:

The dimension label associated with the given name

property domain: Domain

The Domain associated with the array.

Return type:

tiledb.Domain

Raises:

tiledb.TileDBError

dump()

Dumps a string representation of the array object to standard output (stdout)

classmethod from_file(uri: str | None = None, ctx: Ctx | None = None)

Create an ArraySchema for a Filestore Array from a given file. If a uri is not given, then create a default schema.

has_attr(name: str) → bool

Returns true if the given name is an Attribute of the ArraySchema

Parameters:

name – attribute name

Return type:

boolean

has_dim_label(name: str) → bool

Returns true if the given name is a DimensionLabel of the ArraySchema

Note: If using a version of libtiledb that does not support dimension labels, this will return false.

Parameters:

name – dimension label name

Return type:

boolean

property nattr: int

The number of array attributes.

Return type:

int

Raises:

tiledb.TileDBError

property ndim: int

The number of array domain dimensions.

Return type:

int

property offsets_filters: FilterList

The FilterList for the array’s variable-length attribute offsets

Return type:

tiledb.FilterList

Raises:

tiledb.TileDBError

property shape: Tuple[dtype, dtype]

The array’s shape

Return type:

tuple(numpy scalar, numpy scalar)

Raises:

TypeError – floating point (inexact) domain

property sparse: bool

True if the array is a sparse array representation

Return type:

bool

Raises:

tiledb.TileDBError

property tile_order: str

The tile order layout of the array.

Return type:

str

Raises:

tiledb.TileDBError

property validity_filters: FilterList

The FilterList for the array’s validity

Return type:

tiledb.FilterList

Raises:

tiledb.TileDBError

property version: int

The array’s schema (storage) version.

Return type:

int

Raises:

tiledb.TileDBError

tiledb.empty_like(uri, arr, config=None, key=None, tile=None, ctx=None, dtype=None)

Create and return an empty, writeable DenseArray with a schema based on a NumPy array-like object.

Parameters:
  • uri – array URI

  • arr – NumPy ndarray, or shape tuple

  • config – (optional, deprecated) configuration to apply to new Ctx

  • key – (optional) encryption key, if applicable

  • tile – (optional) tiling of generated array

  • ctx – (optional) TileDB Ctx

  • dtype – (optional) required if arr is a shape tuple

Returns:

An empty, writeable DenseArray

Attribute

class tiledb.Attr(name: str = '', dtype: numpy.dtype = numpy.float64, fill: Any | None = None, var: bool | None = None, nullable: bool = False, filters: FilterList | Sequence[Filter] | None = None, enum_label: str | None = None, ctx: Ctx | None = None)

Represents a TileDB attribute.

property dtype: dtype

Return numpy dtype object representing the Attr type

Return type:

numpy.dtype

dump()

Dumps a string representation of the Attr object to standard output (stdout)

property fill: Any

Fill value for unset cells of this attribute

Return type:

depends on dtype

Raises:

tiledb.TileDBError

property filters: FilterList

FilterList of the TileDB attribute

Return type:

tiledb.FilterList

Raises:

tiledb.TileDBError

property isanon: bool

True if attribute is an anonymous attribute

Return type:

bool

property isascii: bool

True if the attribute is TileDB dtype TILEDB_STRING_ASCII

Return type:

bool

Raises:

tiledb.TileDBError

property isnullable: bool

True if the attribute is nullable

Return type:

bool

Raises:

tiledb.TileDBError

property isvar: bool

True if the attribute is variable length

Return type:

bool

Raises:

tiledb.TileDBError

property name: str

Attribute string name, empty string if the attribute is anonymous

Return type:

str

Raises:

tiledb.TileDBError

property ncells: int

The number of cells (scalar values) for a given attribute value

Return type:

int

Raises:

tiledb.TileDBError

Filters

class tiledb.FilterList(filters: Sequence[Filter] | None = None, chunksize: int | None = None, ctx: Ctx | None = None)

An ordered list of Filter objects for filtering TileDB data.

FilterLists contain zero or more Filters, used for filtering attribute data, the array coordinate data, etc.

Parameters:
  • ctx (tiledb.Ctx) – A TileDB context

  • filters – An iterable of Filter objects to add.

  • chunksize (int) – (default None) chunk size used by the filter list in bytes

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     # Create several filters
...     gzip_filter = tiledb.GzipFilter()
...     bw_filter = tiledb.BitWidthReductionFilter()
...     # Create a filter list that will first perform bit width reduction, then gzip compression.
...     filters = tiledb.FilterList([bw_filter, gzip_filter])
...     a1 = tiledb.Attr(name="a1", dtype=np.int64, filters=filters)
...     # Create a second attribute filtered only by gzip compression.
...     a2 = tiledb.Attr(name="a2", dtype=np.int64,
...                      filters=tiledb.FilterList([gzip_filter]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1, a2))
...     tiledb.DenseArray.create(tmp + "/array", schema)

__getitem__(idx: int) → Filter
__getitem__(idx: slice) → List[Filter]

Gets a copy of the filter in the list at the given index

Parameters:

idx (int or slice) – index into the FilterList

Returns:

A filter at given index / slice

Raises:

IndexError – invalid index

Raises:

tiledb.TileDBError

__len__() → int

Return type:

int

Returns:

Number of filters in the FilterList

append(filter: Filter)

Parameters:

filter (Filter) – the filter to append into the FilterList

Raises:

ValueError – filter argument incorrect type

class tiledb.GzipFilter(level: int = -1, ctx: Ctx | None = None)

Filter that compresses using gzip.

Parameters:
  • ctx (tiledb.Ctx) – TileDB Ctx

  • level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.GzipFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

class tiledb.ZstdFilter(level: int = -1, ctx: Ctx | None = None)

Filter that compresses using zstd.

Parameters:
  • ctx (tiledb.Ctx) – TileDB Ctx

  • level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ZstdFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

class tiledb.LZ4Filter(level: int = -1, ctx: Ctx | None = None)

Filter that compresses using lz4.

Parameters:
  • ctx (tiledb.Ctx) – TileDB Ctx

  • level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.LZ4Filter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

class tiledb.Bzip2Filter(level: int = -1, ctx: Ctx | None = None)

Filter that compresses using bzip2.

Parameters:

level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.Bzip2Filter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

class tiledb.RleFilter(level: int = -1, ctx: Ctx | None = None)

Filter that compresses using run-length encoding (RLE).

Parameters:

level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.RleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
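
For readers unfamiliar with run-length encoding, the following pure-Python sketch shows the general idea: runs of equal values collapse into (value, run length) pairs. It is a conceptual illustration only and makes no claim about how TileDB implements RleFilter or lays out the encoded data; rle_encode and rle_decode are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, length in pairs:
        out.extend([value] * length)
    return out


data = [7, 7, 7, 7, 2, 2, 9]
encoded = rle_encode(data)
print(encoded)  # [(7, 4), (2, 2), (9, 1)]
```

The encoding only pays off when the data contains long runs, which is why this kind of filter suits attributes with many repeated consecutive values.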

class tiledb.DoubleDeltaFilter(level: int = -1, reinterp_dtype: dtype | DataType | None = None, ctx: Ctx | None = None)

Filter that performs double-delta encoding.

Parameters:
  • level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.

  • reinterp_dtype – (optional) sets the compressor to compress the data, treating it as the new datatype.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.DoubleDeltaFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

class tiledb.BitShuffleFilter(ctx: Ctx | None = None)

Filter that performs a bit shuffle transformation.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.BitShuffleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

class tiledb.ByteShuffleFilter(ctx: Ctx | None = None)

Filter that performs a byte shuffle transformation.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ByteShuffleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

class tiledb.BitWidthReductionFilter(window: int = -1, ctx: Ctx | None = None)

Filter that performs bit-width reduction.

Parameters:
  • ctx (tiledb.Ctx) – A TileDB Context

  • window (int) – -1 (default) sets the max window size for the filter to the default window size as specified in TileDB core. Otherwise, sets the max window size to the given value.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.BitWidthReductionFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

class tiledb.PositiveDeltaFilter(window: int = -1, ctx: Ctx | None = None)

Filter that performs positive-delta encoding.

Parameters:
  • ctx (tiledb.Ctx) – A TileDB Context

  • window (int) – -1 (default) sets the max window size for the filter to the default window size as specified in TileDB core. Otherwise, sets the max window size to the given value.

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.PositiveDeltaFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)

Dimension

class tiledb.Dim(name: str = '__dim_0', domain: Tuple[Any, Any] | None = None, tile: Any | None = None, filters: FilterList | Sequence[Filter] | None = None, dtype: numpy.dtype = numpy.uint64, var: bool | None = None, ctx: Ctx | None = None)

Represents a TileDB dimension.

create_label_schema(order: str = 'increasing', dtype: numpy.dtype = numpy.uint64, tile: Any | None = None, filters: FilterList | Sequence[Filter] | None = None)

Creates a dimension label schema for a dimension label on this dimension

Parameters:
  • order – Order or sort of the label data (‘increasing’ or ‘decreasing’).

  • dtype – Datatype of the label data.

  • tile – Tile extent for the dimension of the dimension label. If None, it will use the tile extent of this dimension.

  • filters – Filter list for the attribute storing the label data.

Return type:

DimLabelSchema

property domain: Tuple[generic, generic]

The dimension (inclusive) domain.

The dimension’s domain is defined by a (lower bound, upper bound) tuple.

Return type:

tuple(numpy scalar, numpy scalar)

property dtype: dtype

Numpy dtype representation of the dimension type.

Return type:

numpy.dtype

property filters: FilterList

FilterList of the TileDB dimension

Return type:

tiledb.FilterList

Raises:

tiledb.TileDBError

property isanon: bool

True if the dimension is anonymous

Return type:

bool

property isvar: bool

True if the dimension is variable length

Return type:

bool

Raises:

tiledb.TileDBError

property name: str

The dimension label string.

Anonymous dimensions return a default string representation based on the dimension index.

Return type:

str

property shape: Tuple[generic, generic]

The shape of the dimension given the dimension’s domain.

Note: The shape is only valid for integer and datetime dimension domains.

Return type:

tuple(numpy scalar, numpy scalar)

Raises:

TypeError – floating point (inexact) domain

property size: int

The size of the dimension domain (number of cells along dimension).

Return type:

int

Raises:

TypeError – floating point (inexact) domain

property tile: generic

The tile extent of the dimension.

Return type:

numpy scalar or np.timedelta64

Domain

class tiledb.Domain(*dims: Dim, ctx: Ctx | None = None)

Represents a TileDB domain.

dim(dim_id)

Returns a Dim object from the domain given the dimension’s index or name.

Parameters:

dim_id – dimension index (int) or name (str)

Raises:

tiledb.TileDBError

property dtype

The numpy dtype of the domain’s dimension type.

Return type:

numpy.dtype

dump()

Dumps a string representation of the domain object to standard output (STDOUT)

has_dim(name)

Returns true if the Domain has a Dimension with the given name

Parameters:

name – name of Dimension

Return type:

bool

Returns:

True if the Domain has a Dimension with the given name

property homogeneous

Returns True if the domain’s dimension types are homogeneous.

property ndim

The number of dimensions of the domain.

Return type:

int

property shape

The domain’s shape, valid only for integer domains.

Return type:

tuple

Raises:

TypeError – floating point (inexact) domain

property size

The domain’s size (number of cells), valid only for integer domains.

Return type:

int

Raises:

TypeError – floating point (inexact) domain

Array

class tiledb.libtiledb.Array(uri, mode='r', key=None, timestamp=None, attr=None, ctx=None)

Base class for TileDB array objects.

Defines common properties/functionality for the different array types. When an Array instance is initialized, the array is opened with the specified mode.

Parameters:
  • uri (str) – URI of array to open

  • mode (str) – (default ‘r’) Open the array object in read ‘r’, write ‘w’, or delete ‘d’ mode

  • key (str) – (default None) If not None, encryption key to decrypt the array

  • timestamp (tuple) – (default None) If int, open the array at a given TileDB timestamp. If tuple, open at the given start and end TileDB timestamps.

  • attr (str) – (default None) open one attribute of the array; indexing a dense array will return a Numpy ndarray directly rather than a dictionary.

  • ctx (Ctx) – TileDB context

attr(self, key)

Returns an Attr instance given an int index or string label

Parameters:

key (int or str) – attribute index (positional or associative)

Return type:

Attr

Returns:

The array attribute at index or with the given name (label)

Raises:

TypeError – invalid key type

close(self)

Closes this array, flushing all buffered data.

consolidate(self, config=None, key=None, fragment_uris=None, timestamp=None)

Consolidates fragments of an array object for increased read performance.

Overview: https://docs.tiledb.com/main/concepts/internal-mechanics/consolidation

Parameters:
  • config (tiledb.Config) – The TileDB Config with consolidation parameters set

  • key (str or bytes) – (default None) encryption key to decrypt an encrypted array

  • fragment_uris – (default None) Consolidate the array using a list of fragment file names

  • timestamp (tuple (int, int)) – (default None) If not None, consolidate the array using the given tuple(int, int) UNIX seconds range (inclusive). This argument will be ignored if fragment_uris is passed.

Raises:

tiledb.TileDBError

Rather than passing the timestamp into this function, it may be set with the config parameters “sm.vacuum.timestamp_start” and “sm.vacuum.timestamp_end”, which take a time in UNIX seconds. If both the config parameters and this function’s timestamp argument are set, the timestamp argument is used.

coords_dtype

Deprecated in 0.8.10

classmethod create(cls, uri, schema, key=None, overwrite=False, ctx=None)

Creates a TileDB Array at the given URI

Parameters:
  • uri (str) – URI at which to create the new empty array.

  • schema (ArraySchema) – Schema for the array

  • key (str) – (default None) Encryption key to use for array

  • overwrite (bool) – (default False) Overwrite the array if it already exists

  • ctx (Ctx) – (default None) Optional TileDB Ctx used when creating the array, by default uses the ArraySchema’s associated context (not necessarily tiledb.default_ctx).

static delete_array(uri, ctx=None)

Delete the given array.

Parameters:
  • uri (str) – The URI of the array

  • ctx (Ctx) – TileDB context

Example:

>>> import tiledb, tempfile, numpy as np
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.zeros(4), timestamp=1) as A:
...     pass
>>> tiledb.array_exists(path)
True
>>> tiledb.Array.delete_array(path)
>>> tiledb.array_exists(path)
False
delete_fragments(self, timestamp_start, timestamp_end)

Delete a range of fragments from timestamp_start to timestamp_end. The array needs to be opened in ‘m’ mode as shown in the example below.

Parameters:
  • timestamp_start (int) – the first fragment to delete in the range

  • timestamp_end (int) – the last fragment to delete in the range

Example:

>>> import tiledb, tempfile, numpy as np
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.zeros(4), timestamp=1) as A:
...     pass
>>> with tiledb.open(path, 'w', timestamp=2) as A:
...     A[:] = np.ones(4, dtype=np.int64)
>>> with tiledb.open(path, 'r') as A:
...     A[:]
array([1., 1., 1., 1.])
>>> with tiledb.open(path, 'm') as A:
...     A.delete_fragments(2, 2)
>>> with tiledb.open(path, 'r') as A:
...     A[:]
array([0., 0., 0., 0.])
df

Retrieve data cells as a Pandas dataframe, with multi-range, domain-inclusive indexing using multi_index.

Parameters:

selection (list) – Per dimension, a scalar, slice, or list of scalars or slice objects. Scalars and slice components should match the type of the underlying Dimension.

Returns:

Pandas dataframe of the selected cells. Coords are included by default for Sparse arrays only (use Array.query(coords=<>) to select).

Raises:

IndexError – invalid or unsupported index selection

Raises:

tiledb.TileDBError

df[] accepts, for each dimension, a scalar, slice, or list of scalars or slice objects. Each item is interpreted as a point (scalar) or range (slice) used to query the array on the corresponding dimension.

Example:

>>> import tiledb, tempfile, numpy as np, pandas as pd
>>>
>>> with tempfile.TemporaryDirectory() as tmp:
...    data = {'col1_f': np.arange(0.0,1.0,step=0.1), 'col2_int': np.arange(10)}
...    df = pd.DataFrame.from_dict(data)
...    tiledb.from_pandas(tmp, df)
...    A = tiledb.open(tmp)
...    A.df[1]
...    A.df[1:5]
      col1_f  col2_int
   1     0.1         1
      col1_f  col2_int
   1     0.1         1
   2     0.2         2
   3     0.3         3
   4     0.4         4
   5     0.5         5
dim(self, dim_id)

Returns a Dim instance given a dim index or name

Parameters:

dim_id (int or str) – dimension index (positional or associative)

Return type:

Dim

Returns:

The array dimension at index or with the given name (label)

Raises:

TypeError – invalid key type

domain

The Domain of this array.

dtype

The NumPy dtype of the specified attribute

dump(self)
enum(self, name)

Return the Enumeration from the attribute name.

Parameters:

name – attribute name

Return type:

Enumeration

isopen

True if this array is currently open.

iswritable

This array is currently opened as writable.

label_index(self, labels)

Retrieve data cells with multi-range, domain-inclusive indexing by label. Returns the cross-product of the ranges.

Accepts a scalar, slice, or list of scalars per label for querying on the corresponding dimensions. For multidimensional arrays that query by labels on only a subset of dimensions, : should be passed in place of any labels preceding custom ranges.

Example:

>>> import tiledb, tempfile, numpy as np
>>>
>>> dim1 = tiledb.Dim("d1", domain=(1, 4))
>>> dim2 = tiledb.Dim("d2", domain=(1, 3))
>>> dom = tiledb.Domain(dim1, dim2)
>>> att = tiledb.Attr("a1", dtype=np.int64)
>>> dim_labels = {
...     0: {"l1": dim1.create_label_schema("decreasing", np.int64)},
...     1: {
...         "l2": dim2.create_label_schema("increasing", np.int64),
...         "l3": dim2.create_label_schema("increasing", np.float64),
...     },
... }
>>> schema = tiledb.ArraySchema(domain=dom, attrs=(att,), dim_labels=dim_labels)
>>> with tempfile.TemporaryDirectory() as tmp:
...     tiledb.Array.create(tmp, schema)
...
...     a1_data = np.reshape(np.arange(1, 13), (4, 3))
...     l1_data = np.arange(4, 0, -1)
...     l2_data = np.arange(-1, 2)
...     l3_data = np.linspace(0, 1.0, 3)
...
...     with tiledb.open(tmp, "w") as A:
...         A[:] = {"a1": a1_data, "l1": l1_data, "l2": l2_data, "l3": l3_data}
...
...     with tiledb.open(tmp, "r") as A:
...         A.label_index(["l1"])[3:4]
...         A.label_index(["l1", "l3"])[2, 0.5:1.0]
...         A.label_index(["l2"])[:, -1:0]
...         A.label_index(["l3"])[:, 0.5:1.0]
OrderedDict([('l1', array([4, 3])), ('a1', array([[1, 2, 3],
       [4, 5, 6]]))])
OrderedDict([('l3', array([0.5, 1. ])), ('l1', array([2])), ('a1', array([[8, 9]]))])
OrderedDict([('l2', array([-1,  0])), ('a1', array([[ 1,  2],
       [ 4,  5],
       [ 7,  8],
       [10, 11]]))])
OrderedDict([('l3', array([0.5, 1. ])), ('a1', array([[ 2,  3],
       [ 5,  6],
       [ 8,  9],
       [11, 12]]))])
Parameters:
  • labels – List of labels to use when querying. Can only use at most one label per dimension.

  • selection (list) – Per dimension, a scalar, slice, or list of scalars. Each item is interpreted as a point (scalar) or range (slice) used to query the array on the corresponding dimension.

Returns:

dict of {‘label/attribute’: result}.

Raises:

tiledb.TileDBError

static load_typed(uri, mode='r', key=None, timestamp=None, attr=None, ctx=None)

Return a {Dense,Sparse}Array instance from a pre-opened Array (internal)

meta

Return array metadata instance

Return type:

tiledb.Metadata

mode

The mode this array was opened with.

multi_index

Retrieve data cells with multi-range, domain-inclusive indexing. Returns the cross-product of the ranges.

Parameters:

selection (list) – Per dimension, a scalar, slice, or list of scalars or slice objects. Scalars and slice components should match the type of the underlying Dimension.

Returns:

dict of {‘attribute’: result}. Coords are included by default for Sparse arrays only (use Array.query(coords=<>) to select).

Raises:

IndexError – invalid or unsupported index selection

Raises:

tiledb.TileDBError

multi_index[] accepts, for each dimension, a scalar, slice, or list of scalars or slice objects. Each item is interpreted as a point (scalar) or range (slice) used to query the array on the corresponding dimension.

Unlike NumPy array indexing, multi_index respects TileDB’s range semantics: slice ranges include both the start point and the end point, and negative ranges do not wrap around (because a TileDB dimension may have a negative domain).

See also: https://docs.tiledb.com/main/api-usage/reading-arrays/multi-range-subarrays
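The difference from Python and NumPy slicing can be sketched without TileDB. The snippet below is a plain-Python illustration, not TileDB code: inclusive_range is a hypothetical helper that models a 1-D dimension whose first coordinate is domain_start, and it assumes one list element per coordinate.

```python
def inclusive_range(cells, domain_start, start, stop):
    """Return the cells whose coordinates lie in [start, stop], both ends included.

    cells[i] is assumed to hold the value at coordinate domain_start + i.
    """
    lo = start - domain_start       # coordinate -> list offset
    hi = stop - domain_start + 1    # +1 because the end point is inclusive
    if lo < 0 or hi > len(cells):
        raise IndexError("range falls outside the dimension's domain")
    return cells[lo:hi]


# Hypothetical dimension with domain (-2, 2): coordinates -2, -1, 0, 1, 2
cells = ["a", "b", "c", "d", "e"]

# Inclusive range over coordinates -1..1: no wrap-around, end point kept
tiledb_style = inclusive_range(cells, -2, -1, 1)

# Plain Python slicing treats -1 as "last element" and excludes the end point
python_style = cells[-1:1]
```

Here the inclusive range yields the three cells at coordinates -1, 0, and 1, while the Python slice wraps -1 around to the last element and comes back empty.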

Example:

>>> import tiledb, tempfile, numpy as np
>>>
>>> with tempfile.TemporaryDirectory() as tmp:
...    A = tiledb.from_numpy(tmp, np.eye(4) * [1,2,3,4])
...    A.multi_index[1]
...    A.multi_index[1,1]
...    # return rows 0 and 2
...    A.multi_index[[0,2]]
...    # return rows 0 and 2 intersecting column 2
...    A.multi_index[[0,2], 2]
...    # return rows 0:2 intersecting columns 0:2
...    A.multi_index[slice(0,2), slice(0,2)]
OrderedDict([('', array([[0., 2., 0., 0.]]))])
OrderedDict([('', array([[2.]]))])
OrderedDict([('', array([[1., 0., 0., 0.], [0., 0., 3., 0.]]))])
OrderedDict([('', array([[0.], [3.]]))])
OrderedDict([('', array([[1., 0., 0.], [0., 2., 0.], [0., 0., 3.]]))])
nattr

The number of attributes of this array.

ndim

The number of dimensions of this array.

nonempty_domain(self)

Return the minimum bounding domain which encompasses nonempty values.

Return type:

tuple(tuple(numpy scalar, numpy scalar), …)

Returns:

A tuple of (inclusive) domain extent tuples that together contain all nonempty cells
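As a rough illustration of what "minimum bounding domain" means, the plain-Python sketch below computes one inclusive (min, max) pair per dimension from a set of written coordinates. It does not call TileDB; bounding_domain and written_cells are hypothetical names, and the coordinates are made up.

```python
def bounding_domain(coords):
    """Return one inclusive (min, max) pair per dimension for the given cells."""
    per_dimension = zip(*coords)  # regroup the coordinates by dimension
    return tuple((min(values), max(values)) for values in per_dimension)


# Hypothetical coordinates of three written cells in a 2-D array
written_cells = [(1, 4), (2, 2), (3, 3)]

domain = bounding_domain(written_cells)
```

For these cells the result spans rows 1 to 3 and columns 2 to 4, even though most cells inside that rectangle were never written.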

reopen(self, timestamp=None)

Reopens this array.

This is useful when the array is updated after it was opened. To sync up with the updates, the user must either close the array and open it again, or use reopen() without closing. reopen() is generally faster than a close followed by an open.

schema

The ArraySchema for this array.

set_query(self, serialized_query)
shape

The shape of this array.

subarray(self, selection, attrs=None, coords=False, order=None)
timestamp

Deprecated in 0.9.2.

Use timestamp_range

timestamp_range

Returns the timestamp range the array is opened at

Return type:

tuple

Returns:

TileDB timestamp range at which the array was opened

upgrade_version(self, config=None)

Upgrades an array to the latest format version.

Parameters:

config – (default None) Configuration parameters for the upgrade (None means default, which uses the config from ctx).

Raises:

tiledb.TileDBError

uri

Returns the URI of the array

view_attr

The view attribute of this array.

tiledb.consolidate(uri, key=None, config=None, ctx=None, fragment_uris=None, timestamp=None)

Consolidates TileDB array fragments for improved read performance

Parameters:
  • uri (str) – URI to the TileDB Array

  • key (str) – (default None) Key to decrypt array if the array is encrypted

  • config (tiledb.Config) – The TileDB Config with consolidation parameters set

  • ctx (tiledb.Ctx) – (default None) The TileDB Context

  • fragment_uris – (default None) Consolidate the array using a list of fragment file names

  • timestamp – (default None) If not None, consolidate the array using the given tuple(int, int) UNIX seconds range (inclusive). This argument will be ignored if fragment_uris is passed.

Return type:

str or bytes

Returns:

path (URI) to the consolidated TileDB Array

Raises:

TypeError – cannot convert path to unicode string

Raises:

tiledb.TileDBError

Rather than passing the timestamp into this function, it may be set with the config parameters “sm.vacuum.timestamp_start” and “sm.vacuum.timestamp_end”, which take a time in UNIX seconds. If both the config parameters and this function’s timestamp argument are set, the timestamp argument is used.

Example:

>>> import tiledb, tempfile, numpy as np, os
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.zeros(4), timestamp=1) as A:
...     pass
>>> with tiledb.open(path, 'w', timestamp=2) as A:
...     A[:] = np.ones(4, dtype=np.int64)
>>> with tiledb.open(path, 'w', timestamp=3) as A:
...     A[:] = np.ones(4, dtype=np.int64)
>>> with tiledb.open(path, 'w', timestamp=4) as A:
...     A[:] = np.ones(4, dtype=np.int64)
>>> len(tiledb.array_fragments(path))
4
>>> fragment_names = [
...     os.path.basename(f) for f in tiledb.array_fragments(path).uri
... ]
>>> array_uri = tiledb.consolidate(
...    path, fragment_uris=[fragment_names[1], fragment_names[3]]
... )
>>> len(tiledb.array_fragments(path))
3
tiledb.vacuum(uri, config=None, ctx=None, timestamp=None)

Vacuum underlying array fragments after consolidation.

Parameters:
  • uri (str) – URI of array to be vacuumed

  • config – Override the context configuration for vacuuming. Defaults to None, inheriting the context parameters.

  • ctx (tiledb.Ctx) – (optional) Context. Defaults to tiledb.default_ctx().

Raises:

TypeError – cannot convert uri to unicode string

Raises:

tiledb.TileDBError

The operation of this function is controlled by the “sm.vacuum.mode” parameter, which accepts the values fragments, fragment_meta, and array_meta. Rather than passing the timestamp into this function, it may be set by using “sm.vacuum.timestamp_start” and “sm.vacuum.timestamp_end”, which take a time in UNIX seconds. If both the config parameters and this function’s timestamp argument are set, the timestamp argument is used.

Example:

>>> import tiledb, numpy as np
>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.random.rand(4)) as A:
...     pass # make sure to close
>>> with tiledb.open(path, 'w') as A:
...     for i in range(4):
...         A[:] = np.ones(4, dtype=np.int64) * i
>>> paths = tiledb.VFS().ls(path)
>>> # should be 12 (2 base files + 2*5 fragment+ok files)
>>> (); len(paths); () 
(...)
>>> () ; tiledb.consolidate(path) ; () 
(...)
>>> tiledb.vacuum(path)
>>> paths = tiledb.VFS().ls(path)
>>> # should now be 4 (2 base files + 2 fragment+ok files)
>>> (); len(paths); () 
(...)

Dense Array

tiledb.DenseArray

alias of DenseArrayImpl

Sparse Array

tiledb.SparseArray

alias of SparseArrayImpl

Query Condition

class tiledb.QueryCondition(expression: str, ctx: tiledb.ctx.Ctx = <factory>)

Class representing a TileDB query condition object for attribute and dimension (sparse arrays only) filtering pushdown.

A query condition is set with a string representing an expression as defined by the grammar below. A more straightforward usage example is given beneath it.

When querying a sparse array, only the values that satisfy the given condition are returned (coupled with their associated coordinates). An example may be found in examples/query_condition_sparse.py.

For dense arrays, the given shape of the query matches the shape of the output array. Values that DO NOT satisfy the given condition are filled with the TileDB default fill value. Different attribute and dimension types have different default fill values as outlined here (https://docs.tiledb.com/main/background/internal-mechanics/writing#default-fill-values). An example may be found in examples/query_condition_dense.py.
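The two result shapes can be sketched in plain Python. This is only an illustration of the behavior described above, not TileDB code: the data is made up, cond stands in for a query condition such as "a > 5", and FILL is a hypothetical placeholder for the attribute's default fill value (the real value depends on the attribute type).

```python
coords = [0, 1, 2, 3]
values = [3, 8, 1, 9]
FILL = -1  # hypothetical stand-in for the TileDB default fill value


def cond(value):
    """Stand-in for a query condition such as "a > 5"."""
    return value > 5


# Sparse read: only matching cells come back, paired with their coordinates
sparse_result = [(c, v) for c, v in zip(coords, values) if cond(v)]

# Dense read: the output keeps the queried shape; non-matching cells are filled
dense_result = [v if cond(v) else FILL for v in values]
```

The sparse result shrinks to the two matching cells, while the dense result keeps all four positions and marks the non-matching ones with the fill value.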

BNF:

A query condition is made up of one or more Boolean expressions. Multiple Boolean expressions are chained together with Boolean operators. The or_op Boolean operators are given lower precedence than the and_op operators.

query_cond ::= bool_term | query_cond or_op bool_term

bool_term ::= bool_expr | bool_term and_op bool_expr

Logical and and bitwise & Boolean operators are given equal precedence.

and_op ::= and | &

Likewise, or and | are given equal precedence.

or_op ::= or | |
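Python's own grammar ranks and above or in the same way, so the grouping can be inspected with the standard-library ast module. The sketch below is an illustration of the precedence rule only and makes no claim about how TileDB parses conditions; the expression and its names (foo, bar, baz) are made up. The & and | spellings are left out because plain Python gives them a different precedence than the grammar above.

```python
import ast

expr = "foo > 5 or bar == 'asdf' and baz <= 1.0"

# Parse the expression and look at the outermost Boolean operator
tree = ast.parse(expr, mode="eval").body

# The top-level node is the `or`, so the `and` was grouped first
top_op = type(tree.op).__name__

# The right-hand operand of the `or` is itself the `and` expression
right_op = type(tree.values[1].op).__name__

# Re-emit the right-hand operand to show what was grouped together
grouped = ast.unparse(tree.values[1]) if hasattr(ast, "unparse") else None
```

In other words, the expression reads as foo > 5 or (bar == 'asdf' and baz <= 1.0).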

We intend to support not in future releases.

A Boolean expression may either be a comparison expression or membership expression.

bool_expr ::= compare_expr | member_expr

A comparison expression contains a comparison operator. The operator works on a TileDB attribute or dimension name (hereafter referred to as a “TileDB variable”) and a value.

compare_expr ::= var compare_op val | val compare_op var | val compare_op var compare_op val

All comparison operators are supported.

compare_op ::= < | > | <= | >= | == | !=

If an attribute name has special characters in it, you can wrap namehere in attr("namehere").

A membership expression contains the membership operator, in. The operator works on a TileDB variable and list of values.

member_expr ::= var in <list>

TileDB variable names are valid Python variable names or strings wrapped in attr() or dim().

var ::= <variable> | attr(<str>) | dim(<str>)

Values are any Python-valid number or string. datetime64 values should first be cast to UNIX seconds. Values may also be cast with val().

val ::= <num> | <str> | val(val)

Example:

>>> with tiledb.open(uri, mode="r") as A:
...     # Select cells where `foo` is greater than 5, or where
...     # `b a r` equals the string "asdf" and `baz` is at most 1.0.
...     # Note precedence is equivalent to:
...     # tiledb.QueryCondition("foo > 5 or ('asdf' == attr('b a r') and baz <= val(1.0))")
...     qc = tiledb.QueryCondition("foo > 5 or 'asdf' == attr('b a r') and baz <= val(1.0)")
...     A.query(cond=qc)
...
...     # Select cells where the values for `foo` are equal to 1, 2, or 3.
...     # Note this is equivalent to:
...     # tiledb.QueryCondition("foo == 1 or foo == 2 or foo == 3")
...     A.query(cond=tiledb.QueryCondition("foo in [1, 2, 3]"))

Group

class tiledb.Group(uri: str, mode: str = 'r', config: Config | None = None, ctx: Ctx | None = None)

Support for organizing multiple arrays in arbitrary directory hierarchies.

Group members may be any number of nested groups and arrays. Members are stored as tiledb.Objects which indicate the member’s URI and type.

Groups may contain associated metadata similar to array metadata where keys are strings. Singleton values may be of type int, float, str, or bytes. Multiple values of the same type may be placed in containers of type list, tuple, or 1-D np.ndarray. The values within containers are limited to type int or float.

See more at: https://docs.tiledb.com/main/background/key-concepts-and-data-format#arrays-and-groups

Parameters:
  • uri (str) – The URI to the Group

  • mode (str) – Read mode (‘r’), write mode (‘w’), or modify exclusive (‘m’)

  • config (Config or dict) – A TileDB config

  • ctx (tiledb.Ctx) – A TileDB context

Example:

>>> # Create a group
>>> grp_path = "root_group"
>>> tiledb.Group.create(grp_path)
>>> grp = tiledb.Group(grp_path, "w")
>>>
>>> # Create an array and add as a member to the group
>>> array_path = "array.tdb"
>>> domain = tiledb.Domain(tiledb.Dim(domain=(1, 8), tile=2))
>>> a1 = tiledb.Attr("val", dtype="f8")
>>> schema = tiledb.ArraySchema(domain=domain, attrs=(a1,))
>>> tiledb.Array.create(array_path, schema)
>>> grp.add(array_path)
>>>
>>> # Create a group and add as a subgroup
>>> subgrp_path = "sub_group"
>>> tiledb.Group.create(subgrp_path)
>>> grp.add(subgrp_path)
>>>
>>> # Add metadata to the group
>>> grp.meta["ints"] = [1, 2, 3]
>>> grp.meta["str"] = "string_metadata"
>>> grp.close()
>>>
>>> grp.open("r")
>>> # Dump all the members in string format
>>> mbrs_repr = grp
>>> # Or create a list of Objects in the Group
>>> mbrs_iter = list(grp)
>>> # Get the first member's uri and type
>>> member_uri, member_type = grp[0].uri, grp[0].type
>>> grp.close()
>>>
>>> # Remove the subgroup
>>> grp.open("w")
>>> grp.remove(subgrp_path)
>>> grp.close()
>>>
>>> # Delete the subgroup
>>> grp.open("m")
>>> grp.delete(subgrp_path)
>>> grp.close()
__getitem__(member)

Retrieve a member from the Group as an Object.

Parameters:

member (Union[int, str]) – The index or name of the member

Returns:

The member as an Object

Return type:

Object

__delitem__(uri)

Remove a member from the group.

Parameters:

uri (str) – The URI to the member

__contains__(member)
Returns:

Whether the Group contains a member with the given name

Return type:

bool

__len__() → int
Return type:

int

Returns:

Number of members in the Group

class GroupMetadata(group: Group)

Holds metadata for the associated Group in a dictionary-like structure.

clear() → None

Remove all items from the metadata.

dump()

Output information about all group metadata to stdout.

pop(k[, d]) → v

Remove the specified key and return the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.

popitem() → (k, v)

Remove and return some (key, value) pair as a 2-tuple; raise KeyError if the metadata is empty.

setdefault(k[, d]) → D.get(k, d)

Also set D[k] = d if k is not in D, where D is the metadata object.

add(uri: str, name: str | None = None, relative: bool = False)

Adds a member to the Group.

Parameters:
  • uri (str) – The URI of the member to add

  • relative (bool) – Whether the path of the URI is a relative path (default=False)

  • name (str) – An optional name for the member (default=None)

close()

Close a Group.

static consolidate_metadata(uri: str, config: Config | None = None, ctx: Ctx | None = None)

Consolidate the group metadata.

Parameters:
  • uri (str) – The URI of the TileDB group to be consolidated

  • config (Config) – Optional configuration parameters for the consolidation

  • ctx (Ctx) – Optional TileDB context

static create(uri: str, ctx: Ctx | None = None)

Create a new Group.

Parameters:
  • uri (str) – The URI of the Group to be created

  • ctx (tiledb.Ctx) – A TileDB context

delete(recursive: bool = False)

Delete a Group. The group needs to be opened in ‘m’ mode.

Parameters:

recursive (bool) – (default False) Whether to delete the Group recursively

is_relative(name: str) → bool
Parameters:

name (str) – Name of the member whose relative indicator is retrieved

Returns:

Whether the member’s URI is relative

Return type:

bool

property isopen: bool
Returns:

Whether or not the Group is open

Return type:

bool

property meta: GroupMetadata
Returns:

The Group’s metadata as a key-value structure

Return type:

GroupMetadata

property mode: str
Returns:

Read mode (‘r’), write mode (‘w’), or modify exclusive (‘m’)

Return type:

str

open(mode: str = 'r')

Open a Group in read mode (“r”), write mode (“w”), or modify exclusive mode (“m”).

Parameters:

mode (str) – Read mode (‘r’), write mode (‘w’), or modify exclusive (‘m’)

remove(member: str)

Remove a member from the Group.

Parameters:

member (str) – The URI or name of the member

set_config(cfg: Config)
Parameters:

cfg (Config) – Config to set on the Group

property uri: str
Returns:

URI of the Group

Return type:

str

static vacuum_metadata(uri: str, config: Config | None = None, ctx: Ctx | None = None)

Vacuum the group metadata.

Parameters:
  • uri (str) – The URI of the TileDB group to be vacuumed

  • config (Config) – Optional configuration parameters for the vacuuming

  • ctx (Ctx) – Optional TileDB context

class tiledb.Group.GroupMetadata(group: Group)

Holds metadata for the associated Group in a dictionary-like structure.

__setitem__(key, value)
Parameters:
  • key (str) – Key for the Group metadata entry

  • value (Union[int, float, str, bytes, np.ndarray]) – Value for the Group metadata entry

__getitem__(key)
Parameters:

key (str) – Key of the Group metadata entry

Return type:

Union[int, float, str, bytes, np.ndarray]

Returns:

The value associated with the key

__delitem__(key)

Removes the entry from the Group metadata.

Parameters:

key (str) – Key of the Group metadata entry

__contains__(key)
Parameters:

key (str) – Key of the Group metadata entry

Return type:

bool

Returns:

True if the key is in the Group metadata, otherwise False

__len__() → int
Return type:

int

Returns:

Number of entries in the Group metadata

clear() → None

Remove all items from the metadata.

dump()

Output information about all group metadata to stdout.

pop(k[, d]) → v

Remove the specified key and return the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.

popitem() → (k, v)

Remove and return some (key, value) pair as a 2-tuple; raise KeyError if the metadata is empty.

setdefault(k[, d]) → D.get(k, d)

Also set D[k] = d if k is not in D, where D is the metadata object.

Object

class tiledb.Object(type: ObjectType, uri: str, name: str | None = None)

Represents a TileDB object which may be of type Array, Group, or Invalid.

uri
Returns:

URI of the Object.

Return type:

str

type
Returns:

Valid TileDB object types are Array and Group.

Return type:

type

Object Management

tiledb.array_exists(uri, isdense=False, issparse=False)

Check if an array exists and can be opened at the given URI.

Optionally restrict to isdense or issparse array types.

tiledb.group_create(uri: str, ctx: Ctx | None = None)

Create a new Group.

Parameters:
  • uri (str) – The URI of the Group to be created

  • ctx (tiledb.Ctx) – A TileDB context

tiledb.object_type(uri, ctx=None)

Returns the TileDB object type at the specified path (URI)

Parameters:
  • uri (str) – path (URI) of the TileDB resource

  • ctx (tiledb.Ctx) – The TileDB Context

Return type:

str

Returns:

object type string

Raises:

TypeError – cannot convert path to unicode string

tiledb.remove(uri, ctx=None)

Removes (deletes) the TileDB object at the specified path (URI)

Parameters:
  • uri (str) – URI of the TileDB resource

  • ctx (tiledb.Ctx) – The TileDB Context

Raises:

TypeError – uri cannot be converted to a unicode string

Raises:

tiledb.TileDBError

tiledb.move(old_uri, new_uri, ctx=None)

Moves a TileDB resource (group, array, key-value).

Parameters:
  • ctx (tiledb.Ctx) – The TileDB Context

  • old_uri (str) – path (URI) of the TileDB resource to move

  • new_uri (str) – path (URI) of the destination

Raises:

TypeError – uri cannot be converted to a unicode string

Raises:

tiledb.TileDBError

tiledb.ls(path, func, ctx=None)

Lists TileDB resources that have a prefix of path (one level deep) and applies a callback to each of them.

Parameters:
  • path (str) – URI of TileDB group object

  • func (function) – callback to execute on every listed TileDB resource; the resource’s URI path and object type label are passed as arguments to the callback

  • ctx (tiledb.Ctx) – TileDB context

Raises:

TypeError – cannot convert path to unicode string

Raises:

tiledb.TileDBError

tiledb.walk(path, func, order='preorder', ctx=None)

Recursively visits TileDB resources and applies a callback to resources that have a prefix of path

Parameters:
  • path (str) – URI of TileDB group object

  • func (function) – callback to execute on every listed TileDB resource; the resource’s URI path and object type label are passed as arguments to the callback

  • ctx (tiledb.Ctx) – The TileDB context

  • order (str) – ‘preorder’ (default) or ‘postorder’ tree traversal

Raises:

tiledb.TileDBError
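The two traversal orders differ only in whether a resource is reported before or after the resources nested under it. The plain-Python sketch below illustrates this on a nested dict; it does not call TileDB, walk_sketch and the hierarchy are hypothetical, and the callback here receives only a path, whereas the tiledb.walk callback is also passed the object type label.

```python
def walk_sketch(tree, func, order="preorder", prefix=""):
    """Visit every entry of a nested dict, calling func(path) for each one."""
    if order not in ("preorder", "postorder"):
        raise ValueError("order must be 'preorder' or 'postorder'")
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if order == "preorder":
            func(path)  # report the parent first, then its children
        walk_sketch(children, func, order, path)
        if order == "postorder":
            func(path)  # report the children first, then the parent


# Hypothetical hierarchy: a group "sub" holding "arr1", next to a top-level "arr2"
tree = {"sub": {"arr1": {}}, "arr2": {}}

pre, post = [], []
walk_sketch(tree, pre.append, order="preorder")
walk_sketch(tree, post.append, order="postorder")
```

With preorder the group "sub" is visited before "sub/arr1"; with postorder the nested array comes first, which is the safer order for operations such as deleting a hierarchy from the leaves upward.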

Fragment Info

class tiledb.FragmentInfoList(array_uri, include_mbrs=False, ctx=None)

Class representing an ordered list of FragmentInfo objects.

Parameters:
  • array_uri (str) – URI for the TileDB array (any supported TileDB URI)

  • include_mbrs (bool) – (default False) include minimum bounding rectangles in FragmentInfo result

  • ctx (tiledb.Ctx) – A TileDB context

Variables:
  • uri – URIs of fragments

  • version – Fragment version of each fragment

  • nonempty_domain – Non-empty domain of each fragment

  • cell_num – Number of cells in each fragment

  • timestamp_range – Timestamp range of when each fragment was written

  • sparse – For each fragment, True if fragment is sparse, else False

  • has_consolidated_metadata – For each fragment, True if fragment has consolidated fragment metadata, else False

  • unconsolidated_metadata_num – Number of unconsolidated metadata fragments in each fragment

  • to_vacuum – URIs of already consolidated fragments to vacuum

  • mbrs – (TileDB Embedded 2.5.0+ only) The minimum bounding rectangle of each fragment; only present when include_mbrs=True

  • array_schema_name – (TileDB Embedded 2.5.0+ only) The array schema’s name

Example:

>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     # The array will be 4x4 with dimensions "rows" and "cols", with domain [1,4] and space tiles 2x2
...     dom = tiledb.Domain(
...         tiledb.Dim(name="rows", domain=(1, 4), tile=2, dtype=np.int32),
...         tiledb.Dim(name="cols", domain=(1, 4), tile=2, dtype=np.int32),
...     )
...     # The array will be dense with a single attribute "a" so each (i,j) cell can store an integer.
...     schema = tiledb.ArraySchema(
...         domain=dom, sparse=False, attrs=[tiledb.Attr(name="a", dtype=np.int32)]
...     )
...     # Set URI of the array
...     uri = tmp + "/array"
...     # Create the (empty) array on disk.
...     tiledb.Array.create(uri, schema)
...
...     # Write three fragments to the array
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[1:3, 1:5] = np.array(([[1, 2, 3, 4], [5, 6, 7, 8]]))
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[2:4, 2:4] = np.array(([101, 102], [103, 104]))
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[3:4, 4:5] = np.array(([202]))
...
...     # tiledb.array_fragments() requires TileDB-Py version > 0.8.5
...     fragments_info = tiledb.array_fragments(uri)
...
...     "====== FRAGMENTS  INFO ======"
...     f"number of fragments: {len(fragments_info)}"
...     f"nonempty domains: {fragments_info.nonempty_domain}"
...     f"sparse fragments: {fragments_info.sparse}"
...
...     for fragment in fragments_info:
...         f"===== FRAGMENT NUMBER {fragment.num} ====="
...         f"is sparse: {fragment.sparse}"
...         f"cell num: {fragment.cell_num}"
...         f"has consolidated metadata: {fragment.has_consolidated_metadata}"
...         f"nonempty domain: {fragment.nonempty_domain}"
'====== FRAGMENTS  INFO ======'
'number of fragments: 3'
'nonempty domains: (((1, 2), (1, 4)), ((2, 3), (2, 3)), ((3, 3), (4, 4)))'
'sparse fragments: (False, False, False)'
'===== FRAGMENT NUMBER 0 ====='
'is sparse: False'
'cell num: 8'
'has consolidated metadata: False'
'nonempty domain: ((1, 2), (1, 4))'
'===== FRAGMENT NUMBER 1 ====='
'is sparse: False'
'cell num: 16'
'has consolidated metadata: False'
'nonempty domain: ((2, 3), (2, 3))'
'===== FRAGMENT NUMBER 2 ====='
'is sparse: False'
'cell num: 4'
'has consolidated metadata: False'
'nonempty domain: ((3, 3), (4, 4))'
class tiledb.FragmentInfo(fragments: FragmentInfoList, num)

Class representing the metadata for a single fragment. See tiledb.FragmentInfoList for example of usage.

Variables:
  • uri – URI of the fragment

  • version – Fragment version of the fragment

  • nonempty_domain – Non-empty domain of the fragment

  • cell_num – Number of cells in the fragment

  • timestamp_range – Timestamp range of when the fragment was written

  • sparse – True if the fragment is sparse, else False

  • has_consolidated_metadata – True if the fragment has consolidated fragment metadata, else False

  • unconsolidated_metadata_num – Number of unconsolidated metadata fragments

  • to_vacuum – URIs of already consolidated fragments to vacuum

  • mbrs – (TileDB Embedded 2.5.0+ only) The minimum bounding rectangles of the fragment; only present when include_mbrs=True

  • array_schema_name – (TileDB Embedded 2.5.0+ only) The array schema’s name

Exceptions

exception tiledb.TileDBError

VFS

class tiledb.VFS(config: Config | dict | None = None, ctx: Ctx | None = None)

TileDB VFS class

Encapsulates the TileDB VFS module instance with a specific configuration (config).

Parameters:
  • ctx (tiledb.Ctx) – The TileDB Context

  • config (tiledb.Config or dict) – Override ctx VFS configurations with updated values in config.

close(file: FileHandle)

Closes a VFS FileHandle object.

Parameters:

file (FileIO) – An opened VFS FileIO

Return type:

FileIO

Returns:

closed VFS FileHandle

Raises:

tiledb.TileDBError

config() → Config
Return type:

tiledb.Config

Returns:

config associated with the VFS object

copy_dir(old_uri: str | bytes | PathLike, new_uri: str | bytes | PathLike)

Copies a TileDB directory from an old URI to a new URI.

Parameters:
  • old_uri (str) – Input of the old directory URI

  • new_uri (str) – Input of the new directory URI

copy_file(old_uri: str | bytes | PathLike, new_uri: str | bytes | PathLike)

Copies a TileDB file from an old URI to a new URI.

Parameters:
  • old_uri (str) – Input of the old file URI

  • new_uri (str) – Input of the new file URI

create_bucket(uri: str | bytes | PathLike)

Creates an object store bucket with the input URI.

Parameters:

uri (str) – Input URI of the bucket

create_dir(uri: str | bytes | PathLike)

Creates a directory with the input URI.

Parameters:

uri (str) – Input URI of the directory

ctx() Ctx
Return type:

tiledb.Ctx

Returns:

context associated with the VFS object

dir_size(uri: str | bytes | PathLike) int
Parameters:

uri (str) – Input URI of the directory

Return type:

int

Returns:

The size of a directory with the input URI

empty_bucket(uri: str | bytes | PathLike)

Empty an object store bucket.

Parameters:

uri (str) – Input URI of the bucket

file_size(uri: str | bytes | PathLike) int
Parameters:

uri (str) – Input URI of the file

Return type:

int

Returns:

The size of a file with the input URI

is_bucket(uri: str | bytes | PathLike) bool
Parameters:

uri (str) – Input URI of the bucket

Return type:

bool

Returns:

True if an object store bucket with the input URI exists, False otherwise

is_dir(uri: str | bytes | PathLike) bool
Parameters:

uri (str) – Input URI of the directory

Return type:

bool

Returns:

True if a directory with the input URI exists, False otherwise

is_empty_bucket(uri: str | bytes | PathLike) bool
Parameters:

uri (str) – Input URI of the bucket

Return type:

bool

Returns:

True if an object store bucket is empty, False otherwise

is_file(uri: str | bytes | PathLike) bool
Parameters:

uri (str) – Input URI of the file

Return type:

bool

Returns:

True if a file with the input URI exists, False otherwise

isdir(uri: str | bytes | PathLike) bool
Parameters:

uri (str) – Input URI of the directory

Return type:

bool

Returns:

True if a directory with the input URI exists, False otherwise

isfile(uri: str | bytes | PathLike) bool
Parameters:

uri (str) – Input URI of the file

Return type:

bool

Returns:

True if a file with the input URI exists, False otherwise

ls(uri: str | bytes | PathLike) List[str]

Retrieves the children in directory uri. This function is non-recursive, i.e., it lists only the entries one level below uri.

Parameters:

uri (str) – Input URI of the directory

Return type:

List[str]

Returns:

The children in directory uri

move_dir(old_uri: str | bytes | PathLike, new_uri: str | bytes | PathLike)

Renames a TileDB directory from an old URI to a new URI.

Parameters:
  • old_uri (str) – Input of the old directory URI

  • new_uri (str) – Input of the new directory URI

move_file(old_uri: str | bytes | PathLike, new_uri: str | bytes | PathLike)

Renames a TileDB file from an old URI to a new URI.

Parameters:
  • old_uri (str) – Input of the old file URI

  • new_uri (str) – Input of the new file URI

open(uri: str | bytes | PathLike, mode: str = 'rb')

Opens a VFS file resource for reading / writing / appends at URI.

If the file did not exist upon opening, a new file is created.

Parameters:
  • uri (str) – URI of VFS file resource

  • mode (str) – ‘rb’ for opening the file to read, ‘wb’ to write, ‘ab’ to append

Return type:

FileHandle

Returns:

TileDB FileIO

Raises:

tiledb.TileDBError

read(file: FileHandle, offset: int, nbytes: int) bytes

Read nbytes from an opened VFS FileHandle at a given offset.

Parameters:
  • file (FileHandle) – An opened VFS FileHandle in ‘rb’ mode

  • offset (int) – offset position in bytes to read from

  • nbytes (int) – number of bytes to read

Return type:

bytes

Returns:

read bytes

Raises:

tiledb.TileDBError

remove_bucket(uri: str | bytes | PathLike)

Deletes an object store bucket with the input URI.

Parameters:

uri (str) – Input URI of the bucket

remove_dir(uri: str | bytes | PathLike)

Removes a directory (recursively) with the input URI.

Parameters:

uri (str) – Input URI of the directory

remove_file(uri: str | bytes | PathLike)

Removes a file with the input URI.

Parameters:

uri (str) – Input URI of the file

size(uri: str | bytes | PathLike) int
Parameters:

uri (str) – Input URI of the file

Return type:

int

Returns:

The size of a file with the input URI

supports(scheme: str) bool

Returns true if the given URI scheme (storage backend) is supported.

Parameters:

scheme (str) – scheme component of a VFS resource URI (ex. ‘file’ / ‘hdfs’ / ‘s3’)

Return type:

bool

Returns:

True if the linked libtiledb version supports the storage backend, False otherwise

Raises:

ValueError – VFS storage backend is not supported

touch(uri: str | bytes | PathLike)

Touches a file with the input URI, i.e., creates a new empty file.

Parameters:

uri (str) – Input URI of the file

write(file: FileHandle, buff: str | bytes)

Writes buffer to opened VFS FileHandle.

Parameters:
  • file (FileHandle) – An opened VFS FileHandle in ‘wb’ mode

  • buff – a Python object that supports the byte buffer protocol

Raises:

TypeError – cannot convert buff to bytes

Raises:

tiledb.TileDBError

class tiledb.FileIO(vfs: VFS, uri: str | bytes | PathLike, mode: str = 'rb')

TileDB FileIO class that encapsulates files opened by tiledb.VFS. The file operations are meant to mimic Python’s built-in file I/O methods.

__len__()
Return type:

int

Returns:

Number of bytes in file

property closed: bool
Return type:

bool

Returns:

True if the file is closed, otherwise False

flush()

Force the data to be written to the file.

property mode: str
Return type:

str

Returns:

Whether the file is in read mode (“rb”), write mode (“wb”), or append mode (“ab”)

read(size: int = -1) bytes

Read the file from the current pointer position.

Parameters:

size (int) – Number of bytes to read. By default, size is set to -1, which will read until the end of the file.

Return type:

bytes

Returns:

The bytes in the file

readable() bool
Return type:

bool

Returns:

True if the file is readable (i.e. “rb” mode), otherwise False

readinto(buff: ndarray) int

Read bytes into a pre-allocated, writable bytes-like object buff, and return the number of bytes read.

Parameters:

buff (np.ndarray) – A pre-allocated, writable bytes-like object

seek(offset: int, whence: int = 0)
Parameters:
  • offset (int) – Byte position to set the file pointer

  • whence (int) – Reference point. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0.

seekable()

All tiledb.FileIO objects are seekable.

Return type:

bool

Returns:

True

tell() int
Return type:

int

Returns:

The current position in the file represented as number of bytes

writable() bool
Return type:

bool

Returns:

True if the file is writable (i.e. “wb” or “ab” mode), otherwise False

write(buff: bytes)
Parameters:

buff (bytes) – Write the given bytes to the file

Filestore

class tiledb.Filestore(uri: str, ctx: Ctx | None = None)

Functions to set and get data to and from a TileDB Filestore Array.

A Filestore Array may be created using ArraySchema.from_file combined with Array.create.

Parameters:
  • uri (str) – The URI to the TileDB Filestore Array

  • ctx (tiledb.Ctx) – A TileDB context

__len__() int
Return type:

int

Returns:

Bytes in the Filestore Array

static copy_from(filestore_array_uri: str, file_uri: str, mime_type: str = 'AUTODETECT', ctx: Ctx | None = None) None

Copy data from a file to a Filestore Array.

Parameters:
  • filestore_array_uri (str) – The URI to the TileDB Filestore Array

  • file_uri (str) – URI of the file to import

  • mime_type (str) – MIME types are “AUTODETECT” (default), “image/tiff”, “application/pdf”

  • ctx (tiledb.Ctx) – A TileDB context

static copy_to(filestore_array_uri: str, file_uri: str, ctx: Ctx | None = None) None

Copy data from a Filestore Array to a file.

Parameters:
  • filestore_array_uri (str) – The URI to the TileDB Filestore Array

  • file_uri (str) – URI of the destination file

  • ctx (tiledb.Ctx) – A TileDB context

read(offset: int = 0, size: int = -1) bytes
Parameters:
  • offset (int) – Byte position to begin reading. Defaults to beginning of filestore.

  • size (int) – Total number of bytes to read. Defaults to -1 which reads the entire filestore.

Return type:

bytes

Returns:

Data from the Filestore Array

write(buffer: ByteString, mime_type: str = 'AUTODETECT') None

Import data from an object that supports the buffer protocol to a Filestore Array.

Parameters:
  • buffer (ByteString) – Data of type bytes, bytearray, memoryview, etc.

  • mime_type (str) – MIME types are “AUTODETECT” (default), “image/tiff”, “application/pdf”

Version

tiledb.libtiledb.version()

Return the version of the linked libtiledb shared library

Return type:

tuple

Returns:

Semver version (major, minor, rev)

Statistics

tiledb.stats_enable()

Enable TileDB internal statistics.

tiledb.stats_disable()

Disable TileDB internal statistics.

tiledb.stats_reset()

Reset all TileDB internal statistics to 0.

tiledb.stats_dump(version=True, print_out=True, include_python=True, json=False, verbose=True)

Print TileDB internal statistics to the console, or return them as a string when print_out is False.

Parameters:
  • include_python – Include TileDB-Py statistics

  • print_out – Print string to console (default True), or return as string

  • version – Include TileDB Embedded and TileDB-Py versions (default: True)

  • json – Return stats JSON object (default: False)

  • verbose – Print extended internal statistics (default: True)