Data Tables (astropy.table)
Introduction
astropy.table provides functionality for storing and manipulating heterogeneous tables of data in a way that is familiar to numpy users. A few notable capabilities of this package are:
Initialize a table from a wide variety of input data structures and types.
Modify a table by adding or removing columns, changing column names, or adding new rows of data.
Handle tables containing missing values.
Include table and column metadata as flexible data structures.
Specify a description, units and output formatting for columns.
Interactively scroll through long tables similar to using more.
Create a new table by selecting rows or columns from a table.
Perform Table operations like database joins, concatenation, and binning.
Maintain a table index for fast retrieval of table items or ranges.
Manipulate multidimensional columns.
Handle non-native (mixin) column types within table.
Methods for reading and writing Table objects to files.
Hooks for subclassing Table and its component classes.
Currently astropy.table is used when reading an ASCII table using astropy.io.ascii. Future releases of Astropy are expected to use the Table class for other subpackages such as astropy.io.votable and astropy.io.fits.
Getting Started
The basic workflow for creating a table, accessing table elements, and modifying the table is shown below. These examples show a very simple case, while the full astropy.table documentation is available from the Using table section.
First create a simple table with three columns of data named a, b, and c. These columns have integer, float, and string values respectively:
>>> from astropy.table import Table
>>> a = [1, 4, 5]
>>> b = [2.0, 5.0, 8.2]
>>> c = ['x', 'y', 'z']
>>> t = Table([a, b, c], names=('a', 'b', 'c'), meta={'name': 'first table'})
If you have row-oriented input data such as a list of records, use the rows
keyword. In this example we also explicitly set the data types for each column:
>>> data_rows = [(1, 2.0, 'x'),
... (4, 5.0, 'y'),
... (5, 8.2, 'z')]
>>> t = Table(rows=data_rows, names=('a', 'b', 'c'), meta={'name': 'first table'},
... dtype=('i4', 'f8', 'U1'))
There are a few ways to examine the table. You can get detailed information about the table values and column definitions as follows:
>>> t
<Table length=3>
a b c
int32 float64 str1
----- ------- ----
1 2.0 x
4 5.0 y
5 8.2 z
You can also assign a unit to a column. If any column has a unit assigned, the units are shown in an extra header row:
>>> t['b'].unit = 's'
>>> t
<Table length=3>
a b c
s
int32 float64 str1
----- ------- ----
1 2.0 x
4 5.0 y
5 8.2 z
Finally, you can get summary information about the table as follows:
>>> t.info
<Table length=3>
name dtype unit
---- ------- ----
a int32
b float64 s
c str1
A column with a unit works with, and can easily be converted to, a Quantity object (but see Quantity and QTable for a way to natively use Quantity objects in tables):
>>> t['b'].quantity
<Quantity [2. , 5. , 8.2] s>
>>> t['b'].to('min')
<Quantity [0.03333333, 0.08333333, 0.13666667] min>
From within the IPython notebook, the table is displayed as a formatted HTML table (details of how it appears can be changed by altering the astropy.table.default_notebook_table_class configuration item).
You can also get a fancier notebook interface with in-browser search and sort using show_in_notebook.
If you print the table (either from the notebook or in a text console session) then a formatted version appears:
>>> print(t)
a b c
s
--- --- ---
1 2.0 x
4 5.0 y
5 8.2 z
If you do not like the format of a particular column, you can change it:
>>> t['b'].info.format = '7.3f'
>>> print(t)
a b c
s
--- ------- ---
1 2.000 x
4 5.000 y
5 8.200 z
For a long table you can scroll up and down through the table one page at a time:
>>> t.more()
You can also display it as an HTML-formatted table in the browser:
>>> t.show_in_browser()
or as an interactive (searchable & sortable) javascript table:
>>> t.show_in_browser(jsviewer=True)
Now examine some high-level information about the table:
>>> t.colnames
['a', 'b', 'c']
>>> len(t)
3
>>> t.meta
{'name': 'first table'}
Access the data by column or row using familiar numpy structured array syntax:
>>> t['a'] # Column 'a'
<Column name='a' dtype='int32' length=3>
1
4
5
>>> t['a'][1] # Row 1 of column 'a'
4
>>> t[1] # Row object for table row index=1
<Row index=1>
a b c
s
int32 float64 str1
----- ------- ----
4 5.000 y
>>> t[1]['a'] # Column 'a' of row 1
4
You can retrieve a subset of a table by rows (using a slice) or columns (using column names), where the subset is returned as a new table:
>>> print(t[0:2]) # Table object with rows 0 and 1
a b c
s
--- ------- ---
1 2.000 x
4 5.000 y
>>> print(t['a', 'c']) # Table with cols 'a', 'c'
a c
--- ---
1 x
4 y
5 z
Modifying table values in place is flexible and works as one would expect:
>>> t['a'][:] = [-1, -2, -3] # Set all column values in place
>>> t['a'][2] = 30 # Set row 2 of column 'a'
>>> t[1] = (8, 9.0, "W") # Set all row values
>>> t[1]['b'] = -9 # Set column 'b' of row 1
>>> t[0:2]['b'] = 100.0 # Set column 'b' of rows 0 and 1
>>> print(t)
a b c
s
--- ------- ---
-1 100.000 x
8 100.000 W
30 8.200 z
Replace, add, remove, and rename columns with the following:
>>> t['b'] = ['a', 'new', 'dtype'] # Replace column b (different from in place)
>>> t['d'] = [1, 2, 3] # Add column d
>>> del t['c'] # Delete column c
>>> t.rename_column('a', 'A') # Rename column a to A
>>> t.colnames
['A', 'b', 'd']
Add a new row of data to the table as follows:
>>> t.add_row([-8, -9, 10])
>>> len(t)
4
You can create a table with support for missing values, for example by setting masked=True:
>>> t = Table([a, b, c], names=('a', 'b', 'c'), masked=True, dtype=('i4', 'f8', 'U1'))
>>> t['a'].mask = [True, True, False]
>>> t
<Table masked=True length=3>
a b c
int32 float64 str1
----- ------- ----
-- 2.0 x
-- 5.0 y
5 8.2 z
You can include certain object types like Time, SkyCoord or Quantity in your table. These “mixin” columns behave like a hybrid of a regular Column and the native object type (see Mixin columns). For example:
>>> from astropy.time import Time
>>> from astropy.coordinates import SkyCoord
>>> tm = Time(['2000:002', '2002:345'])
>>> sc = SkyCoord([10, 20], [-45, +40], unit='deg')
>>> t = Table([tm, sc], names=['time', 'skycoord'])
>>> t
<Table length=2>
time skycoord
deg,deg
object object
--------------------- ----------
2000:002:00:00:00.000 10.0,-45.0
2002:345:00:00:00.000 20.0,40.0
The QTable class is a variant of Table in which Quantity objects are used natively, instead of being converted to Column. This means their units are taken into account in numerical operations, etc. In this class Column is still used for all unit-less arrays (see Quantity and QTable for details):
>>> from astropy.table import QTable
>>> import astropy.units as u
>>> t = QTable()
>>> t['dist'] = [1, 2] * u.m
>>> t['velocity'] = [3, 4] * u.m / u.s
>>> t['flag'] = [True, False]
>>> t
<QTable length=2>
dist velocity flag
m m / s
float64 float64 bool
------- -------- -----
1.0 3.0 True
2.0 4.0 False
>>> t.info()
<QTable length=2>
name dtype unit class
-------- ------- ----- --------
dist float64 m Quantity
velocity float64 m / s Quantity
flag bool Column
Note
The only difference between QTable and Table is the behavior when adding a column that has a specified unit. With QTable such a column is always converted to a Quantity object before being added to the table. Likewise, if a unit is specified for an existing unit-less Column in a QTable, then the column is converted to Quantity.
The converse is that if one adds a Quantity column to an ordinary Table then it gets converted to an ordinary Column with the corresponding unit attribute.
Using table
The details of using astropy.table are provided in the following sections:
Construct table
Access table
Table operations
Indexing
I/O with tables
Mixin columns
Implementation
Performance Tips
Constructing Table objects row-by-row using add_row() can be very slow:
>>> from astropy.table import Table
>>> t = Table(names=['a', 'b'])
>>> for i in range(100):
... t.add_row((1, 2))
If you do need to loop in your code to create the rows, a much faster approach is to construct a list of rows and then create the Table object at the very end:
>>> rows = []
>>> for i in range(100):
... rows.append((1, 2))
>>> t = Table(rows=rows, names=['a', 'b'])
Writing a Table with MaskedColumn to .ecsv using write() can be very slow:
>>> from astropy.table import Table
>>> import numpy as np
>>> x = np.arange(10000, dtype=float)
>>> tm = Table([x], masked=True)
>>> tm.write('tm.ecsv', overwrite=True)
To speed up writing .ecsv with write(), pass serialize_method='data_mask'. This writes the non-masked version of the data, which is faster:
>>> tm.write('tm.ecsv', overwrite=True, serialize_method='data_mask')
Reference/API
astropy.table Package
Functions
Stack tables along columns (horizontally)
Perform a join of the left table with the right table on specified keys.
Represent input Table
Take a set difference of table rows.
Returns the unique rows of a table.
Stack tables vertically (along rows)
Classes
A basic binary search tree in pure Python, used as an engine for indexing.
Define a data column for use in a Table object.
Container for meta information like name, description, format.
Configuration parameters for
alias of
alias of
Provides an interactive HTML export of a Table.
Define a masked data column for use in a Table object.
Mixin column class to allow storage of arbitrary numpy ndarrays within a Table.
A class to represent tables of heterogeneous data.
A class to represent one row of a Table object.
Fast tree-based implementation for indexing, using the
Subclass of dict that is used in the representation to contain the name (and possible other info) for a mixin attribute (either primary data or an array-like attribute) that is serialized as a column in the table.
Implements a sorted array container using a list of numpy arrays.
Warning class for when a string column is assigned a value that gets truncated because the base (numpy) string length is too short.
A class to represent tables of heterogeneous data.
OrderedDict subclass for a set of columns.
Warning class for cases when a table column is replaced via the Table.__setitem__ syntax e.g.