.. include:: references.txt
.. |add_index| replace:: :func:`~astropy.table.Table.add_index`
.. |index_mode| replace:: :func:`~astropy.table.Table.index_mode`

.. _table-indexing:

Table indexing
**************

Once a |Table| has been created, it is possible to create indexes on one or
more columns of the table. An index internally sorts the rows of a table based
on the index column(s), allowing for element retrieval by column value and
improved performance for certain table operations.

Creating an index
=================

To create an index on a table, use the |add_index| method::

    >>> from astropy.table import Table
    >>> t = Table([(2, 3, 2, 1), (8, 7, 6, 5)], names=('a', 'b'))
    >>> t.add_index('a')

The optional argument "unique" may be specified to create an index with
uniquely valued elements.

To create a composite index on multiple columns, pass a list of columns
instead::

    >>> t.add_index(['a', 'b'])

In particular, the first index created using the |add_index| method is
considered the default index or the "primary key". To retrieve an index from
a table, use the `~astropy.table.Table.indices` property::

    >>> t.indices['a']
     a  rows
    --- ----
      1    3
      2    0
      2    2
      3    1
    >>> t.indices['a', 'b']
     a   b  rows
    --- --- ----
      1   5    3
      2   6    2
      2   8    0
      3   7    1

Row retrieval using indices
===========================

Row retrieval can be accomplished using two table properties:
`~astropy.table.Table.loc` and `~astropy.table.Table.iloc`. The
`~astropy.table.Table.loc` property can be indexed either by column value,
range of column values (*including* the bounds), or a list or ndarray of
column values::

    >>> t = Table([(1, 2, 3, 4), (10, 1, 9, 9)], names=('a', 'b'), dtype=['i8', 'i8'])
    >>> t.add_index('a')
    >>> t.loc[2]
      a     b
    int64 int64
    ----- -----
        2     1
    >>> t.loc[[1, 4]]
      a     b
    int64 int64
    ----- -----
        1    10
        4     9
    >>> t.loc[1:3]
      a     b
    int64 int64
    ----- -----
        1    10
        2     1
        3     9
    >>> t.loc[:]
      a     b
    int64 int64
    ----- -----
        1    10
        2     1
        3     9
        4     9

Note that by default, `~astropy.table.Table.loc` uses the primary index, which
here is column 'a'. To use a different index, pass the indexed column name
before the retrieval data::

    >>> t.add_index('b')
    >>> t.loc['b', 8:10]
      a     b
    int64 int64
    ----- -----
        3     9
        4     9
        1    10

The property `~astropy.table.Table.iloc` works similarly, except that the
retrieval information must be either an int or a slice, and relates to the
sorted order of the index rather than column values. For example::

    >>> t.iloc[0] # smallest row by value 'a'
      a     b
    int64 int64
    ----- -----
        1    10
    >>> t.iloc['b', 1:] # all but smallest value of 'b'
      a     b
    int64 int64
    ----- -----
        3     9
        4     9
        1    10

Effects on performance
======================

Table operations change somewhat when indices are present, and there are a
number of factors to consider when deciding whether the use of indices will
improve performance. In general, indexing offers the following advantages:

* Table grouping and sorting based on indexed column(s) become faster
* Retrieving values by index is faster than custom searching

There are certain caveats, however:

* Creating an index requires time and memory
* Table modifications become slower due to automatic index updates
* Slicing a table becomes slower due to index relabeling

See `here `_ for an IPython notebook profiling various aspects of table
indexing.

Index modes
===========

The |index_mode| method allows for some flexibility in the behavior of table
indexing by allowing the user to enter a specific indexing mode via a context
manager. There are currently three indexing modes: *freeze*,
*copy_on_getitem*, and *discard_on_copy*.

The *freeze* mode prevents automatic index updates whenever a column of the
index is modified, and all indices refresh themselves after the context ends::

    >>> with t.index_mode('freeze'):
    ...     t['a'][0] = 0
    ...     print(t.indices['a']) # unmodified
     a  rows
    --- ----
      1    0
      2    1
      3    2
      4    3
    >>> print(t.indices['a']) # modified
     a  rows
    --- ----
      0    0
      2    1
      3    2
      4    3

The *copy_on_getitem* mode forces columns to copy and relabel their indices
upon slicing. In the absence of this mode, table slices will preserve indices
while column slices will not::

    >>> ca = t['a'][[1, 3]]
    >>> ca.info.indices
    []
    >>> with t.index_mode('copy_on_getitem'):
    ...     ca = t['a'][[1, 3]]
    ...     print(ca.info.indices)
    [ a  rows
    --- ----
      2    0
      4    1]

The *discard_on_copy* mode prevents indices from being copied whenever a
column or table is copied::

    >>> t2 = Table(t)
    >>> t2.indices['a']
     a  rows
    --- ----
      0    0
      2    1
      3    2
      4    3
    >>> t2.indices['b']
     b  rows
    --- ----
      1    1
      9    2
      9    3
     10    0
    >>> with t.index_mode('discard_on_copy'):
    ...     t2 = Table(t)
    ...     print(t2.indices)
    []

Updating rows using indices
===========================

Rows can be updated by assigning a complete row or a list of rows to the
`~astropy.table.Table.loc` table property::

    >>> t = Table([('w', 'x', 'y', 'z'), (10, 1, 9, 9)], names=('a', 'b'), dtype=['str', 'i8'])
    >>> t.add_index('a')
    >>> t.loc['x']
     a     b
    str1 int64
    ---- -----
       x     1
    >>> t.loc['x'] = ['a', 12]
    >>> t
     a     b
    str1 int64
    ---- -----
       w    10
       a    12
       y     9
       z     9
    >>> t.loc[['w', 'y']]
     a     b
    str1 int64
    ---- -----
       w    10
       y     9
    >>> t.loc[['w', 'z']] = [['b', 23], ['c', 56]]
    >>> t
     a     b
    str1 int64
    ---- -----
       b    23
       a    12
       y     9
       c    56

Retrieving the location of rows using indices
=============================================

The location of rows can be retrieved using the
`~astropy.table.Table.loc_indices` table property. Like
`~astropy.table.Table.loc`, it can be indexed either by column value, range of
column values (*including* the bounds), or a list or ndarray of column
values::

    >>> t = Table([('w', 'x', 'y', 'z'), (10, 1, 9, 9)], names=('a', 'b'), dtype=['str', 'i8'])
    >>> t.add_index('a')
    >>> t.loc_indices['x']
    1

Engines
=======

When creating an index via |add_index|, the keyword argument "engine" may be
specified to use a particular indexing engine. The available engines are

* `~astropy.table.SortedArray`, a sorted array engine using an underlying
  sorted Table
* `~astropy.table.SCEngine`, a sorted list engine using the
  `Sorted Containers `_ package
* `~astropy.table.FastRBT`, a C-based red-black tree engine
* `~astropy.table.FastBST`, a C-based binary search tree engine
* `~astropy.table.BST`, a Python-based binary search tree engine

Note that FastRBT and FastBST depend on the bintrees package; without it,
both classes fall back to `~astropy.table.BST`. SCEngine depends on the
sortedcontainers package.

In practice, `~astropy.table.SortedArray` (the default engine) is usually the
best choice, although `~astropy.table.SCEngine` may be more appropriate for an
index created on an empty column, since adding new values to it is quicker.
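
To give a feel for what a sorted-array engine does, the following is a
minimal conceptual sketch in plain Python. It is not the astropy
implementation: the class ``SortedArrayIndexSketch`` and its methods ``find``
and ``find_range`` are hypothetical names invented for this illustration, and
it uses only the standard-library ``bisect`` module. It shows how keeping the
column values sorted alongside their original row numbers allows lookup by
value and by inclusive range through binary search.

```python
from bisect import bisect_left, bisect_right


class SortedArrayIndexSketch:
    """Hypothetical stand-in for a sorted-array index; not astropy code."""

    def __init__(self, values):
        # Pair each value with its original row number, then sort by value.
        pairs = sorted((value, row) for row, value in enumerate(values))
        self.keys = [value for value, _ in pairs]
        self.rows = [row for _, row in pairs]

    def find(self, value):
        """Return the row numbers whose key equals ``value``."""
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return self.rows[lo:hi]

    def find_range(self, lower, upper):
        """Return row numbers with lower <= key <= upper (bounds included)."""
        lo = bisect_left(self.keys, lower)
        hi = bisect_right(self.keys, upper)
        return self.rows[lo:hi]


# Values of column 'a' from the first example on this page.
index = SortedArrayIndexSketch([2, 3, 2, 1])
print(index.keys, index.rows)   # sorted values and their original rows
print(index.find(2))            # rows where the value is 2
print(index.find_range(2, 3))   # rows where 2 <= value <= 3
```

In this sketch the sorted keys ``[1, 2, 2, 3]`` map to rows ``[3, 0, 2, 1]``,
which matches the ``rows`` column shown for ``t.indices['a']`` in the first
example. Lookups are binary searches, whereas inserting a new value into a
sorted list means shifting elements, which may be part of why a different
engine can suit an index that starts empty and grows.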