Tables

class pg_utils.table.table.Table(*args, **kwargs)[source]

This class is used for representing table metadata.

Variables:
  • conn (pg_utils.connection.Connection) – A connection to be used by this table.
  • name (str) – The fully-qualified name of this table.
  • column_names (tuple[str]) – A list of column names for the table, as found in the database.
  • columns (tuple[Column]) – A tuple of Column objects.
  • numeric_columns (tuple[str]) – A list of column names corresponding to the column_names in the table that have some kind of number datatype (int, float8, numeric, etc).
Parameters:
  • table_name (str) – The name of the table in the database. If it’s qualified with a schema, then leave the schema argument alone.
  • schema (None|str) – The name of the schema in which this table lies. If unspecified and the value of table_name doesn’t include a schema, then the (OS-specified) username of the given user is taken to be the schema.
  • conn (None|pg_utils.connection.Connection) – A connection object that’s used to fetch data and metadata. If not specified, a new connection is made with default arguments provided for username, password, etc.
  • columns (str|list[str]|tuple[str]) – An iterable of specified column names. It’s used by the __getitem__ magic method, so you shouldn’t need to fiddle with this.
  • check_existence (bool) – If enabled, an extra check is made to ensure that the table referenced by this object actually exists in the database.
  • debug (bool) – Enable to get some extra logging that’s useful for debugging stuff.
classmethod create(*args, **kwargs)[source]

This is the constructor that’s easiest to use when creating a new table.

Parameters:
  • table_name (str) – As mentioned above.
  • create_stmt (str) – A string of SQL (presumably including a “CREATE TABLE” statement for the corresponding database table) that will be executed before __init__ is run.

Note

The statement drop table if exists schema.table_name; is executed before the SQL in create_stmt is executed.

Parameters:
  • conn (None|pg_utils.connection.Connection) – A Connection object to use for creating the table. If not specified, a new connection will be created with no arguments. Look at the docs for the Connection object for more information.
  • schema (None|str) – A specified schema (optional).
  • args – Other positional arguments to pass to the initializer.
  • kwargs – Other keyword arguments to pass to the initializer.
Returns:

The corresponding Table object after the create_stmt is executed.

classmethod from_table(table, *args, **kwargs)[source]

This class method constructs a table from a given table. Used to give a fresh Table object with different columns, but all other parameters the same as the given table.

If the columns attribute only specifies one column, then a Column object will be returned. :param Table table: The table object from which the output will be created. :param list args: Any positional arguments (if any). :param dict kwargs: Any keyword arguments to pass along (if any). :return: Either a fresh Table or Column, depending on whether the columns parameter is restricted to just a single column. :rtype: Column|Table

count[source]

Returns the number of rows in the corresponding database table.

head(num_rows=10, **read_sql_kwargs)[source]

Returns some of the rows, returning a corresponding Pandas DataFrame.

Parameters:
  • num_rows (int|str) – The number of rows to fetch, or "all" to fetch all of the rows.
  • read_sql_kwargs (dict) – Any other keyword arguments that you’d like to pass into pandas.read_sql (as documented here).
Returns:

The resulting data frame.

Return type:

pandas.core.frame.DataFrame

shape[source]

As in the property of Pandas DataFrames by the same name, this gives a tuple showing the dimensions of the table: (number of rows, number of columns)

dtypes[source]

Mimics the pandas.DataFrame.dtypes property, giving a Series of dtypes (given as strings) corresponding to each column. :return: The Series of dtypes. :rtype: pd.Series

insert(row, columns=None)[source]

Inserts a single tuple into the table.

Parameters:
  • row (list|pandas.Series) – A list or Series of items to insert. If a list, its length must match up with the number of columns that we’ll insert. If it’s a series, the column names must be contained within the index.
  • columns (None|list[str]|tuple[str]) – An iterable of column names to use, that must be contained within this table. If not specified, all of the columns are taken.
Returns:

Returns a boolean indicating success.

Return type:

bool

insert_csv(file_name, columns=None, header=True, sep=', ', null='', size=8192)[source]

A wrapper around the copy_expert method of the psycopg2 cursor class to do a bulk insert into the table.

Parameters:
  • file_name (str) – The name of the CSV file.
  • columns (None|list[str]|tuple[str]) – An iterable of column names to use, that must be contained within this table. If not specified, all of the columns are taken.
  • header (bool) – Indicates whether or not the file has a header.
  • sep (str) – The separator character.
  • null (str) – The string used to indicate null values.
  • size (int) – The size of the buffer that psycopg2.cursor.copy_expert uses.
insert_dataframe(data_frame, encoding='utf8', **csv_kwargs)[source]

Does a bulk insert of a given pandas DataFrame, writing it to a (temp) CSV file, and then importing it.

Parameters:
  • data_frame (pd.DataFrame) – The DataFrame that is to be inserted into this table.
  • encoding (str) – The encoding of the CSV file.
  • csv_kwargs – Other keyword arguments that are passed to the insert_csv method.
sort_values(by, ascending=True, **sql_kwargs)[source]

Mimicks the pandas.DataFrame.sort_values method.

Parameters:
  • by (str|list[str]) – A string or list of strings representing one or more column names by which to sort.
  • ascending (bool|list[bool]) – Whether to sort ascending or descending. This must match the number of columns by which we’re sorting, although if it’s just a single value, it’ll be used for all columns.
  • sql_kwargs (dict) – A dictionary of keyword arguments passed into pandas.read_sql.
Returns:

Values of the sorted DataFrame.

Return type:

pandas.DataFrame

describe(columns=None, percentiles=None, type_='continuous')[source]

Mimics the pandas.DataFrame.describe method, getting basic statistics of each numeric column.

Parameters:
  • columns (None|list[str]) – A list of column names to which the description should be restricted. If not specified, then all numeric columns will be included.
  • percentiles (list[float]|None) – A list of percentiles (given as numbers between 0 and 1) to compute. If not specified, quartiles will be used (ie 0.25, 0.5, 0.75).
  • type (str) – Specifies whether the percentiles are to be taken as discrete or continuous. Must be one of “discrete” or “continuous”.
Returns:

A series representing the statistical description for each column. The format is the same as the output of pandas.DataFrame.describe.

Return type:

pd.DataFrame

pairplot(*args, **kwargs)[source]

Yields a Seaborn pairplot for all of the columns of this table that are of a numeric datatype.

Parameters:kwargs (dict) – Optional keyword arguments to pass into seaborn.pairplot.
Returns:The grid of plots.
numeric_columns[source]

A tuple of names belonging to columns that have a numeric datatype.

numeric_array_columns[source]

A tuple of names belonging to columns that are an array of a numeric datatype (eg int[], double precision[], etc).

column_data_types[source]

A dictionary mapping column names to their corresponding datatypes.

schema

The name of the schema.

table_name

The name of the table (without the schema included).

name

The fully-qualified name of the table.

drop()[source]

Drops the table and deletes this object (by calling del on it).

static exists(*args, **kwargs)[source]

A static method that returns whether or not the given table exists.

Parameters:
  • table_name (str) – The name of the table.
  • schema (None|str) – The name of the schema (or current username if not provided).
  • conn (None|pg_utils.connection.Connection) – A connection to the database. If not provided, a new connection is created with default arguments.
Returns:

Whether or not the table exists.

Return type:

bool