Welcome to astrodbkit’s documentation!
This documentation describes a toolkit of classes, methods, and functions useful for CRUD operations and analysis of data from an SQL database.
To install astrodbkit, do:
pip install astrodbkit
Alternatively, you can update your existing installation with:
pip install --upgrade astrodbkit
Creating a Database
To create a database from scratch, do:
from astrodbkit import astrodb
dbpath = '/desired/path/to/my_new_database.db'
astrodb.create_database(dbpath)
For access to the full dataset, an email request must be made to a BDNYC group admin.
Accessing the Database
To start using the database, launch Python, import the module, then initialize the database with the astrodb.Database() class like so:
from astrodbkit import astrodb
db = astrodb.Database(dbpath)
You are now ready to use the database.
The path to the database can be either the binary database file (typically ending in .db) or an .sql file that contains the schema for the database.
In the latter case, the database is constructed by referring to the schema and the directory where the tables of data are located (by default, ‘tabledata’), and a .db file is created.
You can use the save() method to output a database to a schema file and the individual tables to a specified directory.
This workflow can better facilitate version control.
Querying the Database
It can be daunting to start using the database without a lot of prior knowledge of the contents of the database or the functionality of astrodbkit. For this purpose we have created two utility methods:
The first lists the names of the files that have been loaded as well as the contents of the database: every table and the number of sources in it.
The second gives a brief overview of what astrodb.Database() is and summarizes the more widely used methods. More details on each method can be obtained by using Python’s help system.
Now that you have the database loaded, you’ll want to get some information out of it. There are a variety of ways to extract information with astrodbkit.
The schema for any table can be quickly examined with the
You can see an inventory of all data for a specific source by passing an integer id to the inventory() method:
data = db.inventory(86)
This will retrieve the data across all tables with the specified source_id for visual inspection. Setting fetch=True will return the data as a dictionary of Astropy tables so that table and column keys can be used to access the results. For example:
will return a table of the band, magnitude and uncertainty for all records in the sources table with that source_id.
You can search any table in the database with the search() method by supplying a string, integer, or (ra, dec) coordinates along with the table to search. For example, to find all the records in the SOURCES table in the HR 8799 system:
Or all the papers published by Joe Filippazzo:
When supplying coordinates, you can also specify the search radius to use, in degrees:
db.search((338.673, 40.694), 'sources', radius=5)
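To make the meaning of a radius in degrees concrete, here is a minimal sketch of a cone filter written with the standard library only. It is not astrodbkit's implementation: the source does not say how search() measures distance, so the great-circle (haversine) separation, the helper names angular_separation_deg and cone_filter, and the sample rows are all assumptions made for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (ra, dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays numerically stable for small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_filter(rows, center, radius):
    """Keep rows (dicts with 'ra' and 'dec' in degrees) within `radius` degrees of `center`."""
    ra0, dec0 = center
    return [r for r in rows
            if angular_separation_deg(ra0, dec0, r['ra'], r['dec']) <= radius]


# Made-up rows, for illustration only (not real database contents)
rows = [
    {'id': 1, 'ra': 338.673, 'dec': 40.694},
    {'id': 2, 'ra': 340.0, 'dec': 42.0},
    {'id': 3, 'ra': 10.0, 'dec': -30.0},
]

# Same center and radius as the search() call above
matches = cone_filter(rows, (338.673, 40.694), radius=5)
```

A radius of 5 degrees is a wide cone, so a smaller value is usually a better fit when looking for a single object.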
The references() method can be used to search for all entries that match the publication record. For example:
You can also pass SQL queries wrapped in double quotes (") to the query() method:
data = db.query( "SQL_query_goes_here" )
For example, you can get an Astropy table of all the records with a spectral type of L2 with:
db.query("SELECT * FROM spectral_types WHERE spectral_type=12", fmt='table')
By default, this returns the data as a list of arrays. Alternative options for the fmt flag include ‘dict’ for a list of Python dictionaries, ‘table’ for an Astropy Table, and ‘pandas’ for a pandas DataFrame.
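The difference between row-style and dictionary-style results can be sketched with Python's built-in sqlite3 module standing in for astrodbkit. This is only an illustration under stated assumptions: the three-column spectral_types table and its rows are invented, and nothing here claims to reproduce how query() builds its output.

```python
import sqlite3

# In-memory stand-in database; the table layout and rows are hypothetical
conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE spectral_types "
    "(id INTEGER PRIMARY KEY, source_id INTEGER, spectral_type REAL)"
)
conn.executemany(
    "INSERT INTO spectral_types (source_id, spectral_type) VALUES (?, ?)",
    [(86, 12.0), (87, 15.5), (88, 12.0)],
)

sql = "SELECT * FROM spectral_types WHERE spectral_type=12"

# Plain rows: each record is a tuple, addressed by position
rows = conn.execute(sql).fetchall()

# Dictionary-style rows: each record is addressed by column name
conn.row_factory = sqlite3.Row
dict_rows = [dict(r) for r in conn.execute(sql)]

conn.close()
```

Name-based access, as in dict_rows[0]['source_id'], tends to survive schema changes better than positional access such as rows[0][1].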
For more general SQL commands beyond SELECT and PRAGMA, you can use the modify() method:
db.modify("UPDATE spectra SET wavelength_units='A' WHERE id=4")
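As a rough illustration of what such an UPDATE does to the stored rows, the same statement can be run against a throwaway in-memory database with the standard sqlite3 module. The two-column spectra table and its contents are made up for this sketch, and it says nothing about how modify() handles commits or confirmation internally.

```python
import sqlite3

# Hypothetical, minimal stand-in for the spectra table
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE spectra (id INTEGER PRIMARY KEY, wavelength_units TEXT)")
conn.executemany(
    "INSERT INTO spectra (id, wavelength_units) VALUES (?, ?)",
    [(3, 'um'), (4, 'um'), (5, 'A')],
)

select_units = "SELECT wavelength_units FROM spectra WHERE id=4"
before = conn.execute(select_units).fetchone()[0]

# The WHERE clause limits the change to one record
cur = conn.execute("UPDATE spectra SET wavelength_units='A' WHERE id=4")
conn.commit()
changed = cur.rowcount  # number of rows the UPDATE touched

after = conn.execute(select_units).fetchone()[0]
untouched = conn.execute(
    "SELECT wavelength_units FROM spectra WHERE id=3"
).fetchone()[0]
conn.close()
```

Checking the affected row count before relying on a change is a cheap guard against a WHERE clause that matches more (or fewer) rows than intended.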
There are two main ways to add data to a database with astrodbkit: by passing a properly formatted ascii file or by passing the data directly in a list of lists.
To add data from a file, you want to create a file with the following format:
Each entry should be its own row, with the first row denoting the columns to be populated. Note that the column names in the ascii file need not be in the same order as the table. Also, only the column names that match will be added and non-matching or missing column names will be ignored. Assuming this file is called data.txt in the working directory, we can add this new data to the SOURCES table with:
db.add_data('data.txt', 'sources', delim='|')
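The column-matching rules described above can be sketched with the standard csv module. This is not astrodbkit's own loader: the table_columns list and the extra not_a_column field are hypothetical, and the file text is an assumed pipe-delimited layout with a header row followed by one data row.

```python
import csv
import io

# Hypothetical column list for the target table
table_columns = ['id', 'ra', 'dec', 'shortname', 'publication_id']

# Assumed file contents: header row first, columns in a different order
# than the table, plus one column the table does not have
file_text = (
    "dec|ra|publication_id|not_a_column\n"
    "-34|123|5|ignored\n"
)

reader = csv.DictReader(io.StringIO(file_text), delimiter='|')

records = []
for row in reader:
    # Keep only columns that exist in the table; file order does not matter,
    # and table columns missing from the file are simply left out
    records.append({col: row[col] for col in table_columns if col in row})
```

Values read this way are still strings, so a real loader would also need to convert them to the column types defined in the schema.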
To add the same data without creating the file, you would do the following:
data = [['ra', 'dec', 'publication_id'],[123, -34, 5]]
db.add_data(data, 'sources')
The query() method provides an option to export your query to a file:
By default, this will save the results to an ascii file.
VOTables are another way to store data in a format that can be read by other programs, such as TOPCAT.
The votools module can generate a VOTable from your SQL query. It is called directly by the query() method when the filename ends in .xml or .vot. For example:
from astrodbkit import astrodb
db = astrodb.Database('/path/to/database')
txt = 'SELECT s.id, s.ra, s.dec, s.shortname, p.source_id, p.band, p.magnitude FROM sources as s ' \
      'JOIN photometry as p ON s.id=p.source_id WHERE s.dec<=-10 AND (p.band IN ("J","H","Ks","W1"))'
data = db.query(txt, export='votable.xml')
You can import and call votools directly, which has additional options you can set.
Special characters (such as accents or Greek letters) can cause astropy, and thus file output, to fail in Python 2. Python 3 handles these characters differently and will not fail in this instance.
Saving the Full Database
If changes have been made to the database, such as by adding new data or modifying existing entries, you will want to use the save() method to dump the contents of the database to ascii files.
save() writes a schema file and outputs all tables to individual files in a directory of your choice (by default, ‘tabledata’).
This directory and the schema file can be version controlled with, for example, git, to facilitate tracking changes in the database.
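The idea of a schema file plus one text file per table can be sketched with the standard library alone. This is not how save() is implemented: the file names schema.sql and sources.txt, the pipe delimiter for the table files, and the sample sources table are assumptions made for illustration; only the ‘tabledata’ directory name comes from the text above.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical stand-in database with one small table
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, ra REAL, dec REAL)")
conn.execute("INSERT INTO sources (ra, dec) VALUES (123, -34)")

outdir = tempfile.mkdtemp()
tabledir = os.path.join(outdir, 'tabledata')
os.makedirs(tabledir)

# 1. Write the CREATE TABLE statements to a single schema file
tables = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
schema_path = os.path.join(outdir, 'schema.sql')
with open(schema_path, 'w') as f:
    for _, create_sql in tables:
        f.write(create_sql + ';\n')

# 2. Write each table to its own delimited text file, header row first
for name, _ in tables:
    cur = conn.execute('SELECT * FROM "{}"'.format(name))
    header = [d[0] for d in cur.description]
    table_path = os.path.join(tabledir, name + '.txt')
    with open(table_path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='|', lineterminator='\n')
        writer.writerow(header)
        writer.writerows(cur)

conn.close()
```

Plain text files like these produce readable line-by-line diffs under git, which a binary .db file cannot.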
When finished working with the database, the close() method will close the connection.
close() will prompt to save the database to the default directory.