Columns

class pg_utils.column.base.Column(name, parent_table)[source]

In Pandas, a column of a DataFrame is represented as a Series.

Similarly, a column in a database table is represented by an object from this class.

Note that the Series represented by these columns have the default index (i.e., non-negative, consecutive integers starting at zero). Thus, for the portion of the Pandas Series API mocked here, we need not worry about multilevel (hierarchical) indices.

Parameters:
  • name (str) – The name of the column. Required.
  • parent_table (pg_utils.table.Table) – The table to which this column belongs. Required.

select_all_query()[source]

Provides the SQL used when selecting everything from this column.

Returns:The SQL statement.
Return type:str

sort_values(ascending=True, limit=None, **sql_kwargs)[source]

Mimics the method pandas.Series.sort_values.

Parameters:
  • limit (int|None) – Either a positive integer for the number of rows to take or None to take all.
  • ascending (bool) – Sort ascending vs descending.
  • sql_kwargs (dict) – A dictionary of keyword arguments passed into pandas.read_sql.
Returns:

The resulting series.

Return type:

pandas.Series
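
As a rough illustration of how ascending and limit could translate into SQL, here is a minimal sketch. The helper sorted_column_sql and the table people are hypothetical, SQLite from the standard library stands in for PostgreSQL, and the result is a plain list rather than the pandas.Series that the documented method returns.

```python
import sqlite3

def sorted_column_sql(column, table, ascending=True, limit=None):
    """Hypothetical helper: SQL that sorts one column, optionally capped at `limit` rows."""
    direction = "asc" if ascending else "desc"
    sql = f'select "{column}" from "{table}" order by "{column}" {direction}'
    if limit is not None:
        if not isinstance(limit, int) or limit <= 0:
            raise ValueError("limit must be a positive integer or None")
        sql += f" limit {limit}"
    return sql

# Stand-in database; the real library talks to PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table people (age integer)")
conn.executemany("insert into people values (?)", [(31,), (25,), (47,), (19,)])

# The two largest ages, in descending order.
query = sorted_column_sql("age", "people", ascending=False, limit=2)
top_two = [row[0] for row in conn.execute(query)]
print(top_two)
```

Passing limit=None leaves off the limit clause, which matches the documented meaning of "take all".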

unique()[source]

Returns an array of unique values in this column. Includes null (represented as None).

Returns:The unique values.
Return type:np.array
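
The treatment of null can be seen with a small sketch. It assumes the unique values come from a select distinct query (the source does not show the SQL), and it uses SQLite and a hypothetical people table as a stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table people (city text)")
conn.executemany(
    "insert into people values (?)",
    [("Oslo",), ("Lima",), (None,), ("Oslo",), (None,)],
)

# select distinct keeps a single NULL, which the driver hands back as None.
# Row order is not guaranteed without an order by clause.
unique_values = [row[0] for row in conn.execute("select distinct city from people")]
print(unique_values)
```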

head(num_rows=10)[source]

Fetches some values of this column.

Parameters:num_rows (int|str) – Either a positive integer number of values or the string “all” to fetch all values.
Returns:A NumPy array of the values
Return type:np.array

is_unique[source]

Determines whether or not the values of this column are all unique (i.e., whether this column is a unique identifier for the table).

Returns:Whether or not this column contains unique values.
Return type:bool
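
One way such a check could be done on the database side is to compare the number of values with the number of distinct values. This is only a sketch: column_is_unique is a hypothetical helper, SQLite stands in for PostgreSQL, and the choice to ignore NULLs is an assumption, since the source does not say how they are treated.

```python
import sqlite3

def column_is_unique(conn, column, table):
    """Hypothetical check: True when no non-NULL value appears more than once."""
    # count(col) and count(distinct col) both skip NULLs.
    total, distinct = conn.execute(
        f'select count("{column}"), count(distinct "{column}") from "{table}"'
    ).fetchone()
    return total == distinct

conn = sqlite3.connect(":memory:")
conn.execute("create table people (id integer, city text)")
conn.executemany(
    "insert into people values (?, ?)",
    [(1, "Oslo"), (2, "Lima"), (3, "Oslo")],
)

print(column_is_unique(conn, "id", "people"))    # ids never repeat
print(column_is_unique(conn, "city", "people"))  # "Oslo" appears twice
```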

dtype[source]

The dtype of this column (represented as a string).

Returns:The dtype.
Return type:str

describe(percentiles=None, type_='continuous')[source]

This mocks the method pandas.Series.describe, and provides a series with the same data (just calculated by the database).

Parameters:
  • percentiles (None|list[float]) – A list of percentiles to evaluate (with numbers between 0 and 1). If not specified, quartiles (0.25, 0.5, 0.75) are used.
  • type_ (str) – Specifies whether the percentiles are to be taken as discrete or continuous. Must be one of “discrete” or “continuous”.
Returns:

A series containing the description of the column, in the same format as pandas.Series.describe.

Return type:

pandas.Series
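
The difference between discrete and continuous percentiles can be illustrated in plain Python. The sketch below assumes the two options follow the semantics of PostgreSQL's percentile_disc and percentile_cont aggregates. The source does not say which SQL the library emits, and the function names here are illustrative only.

```python
import math

def percentile_cont(values, fraction):
    """Continuous percentile: interpolate linearly between neighbouring values."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight

def percentile_disc(values, fraction):
    """Discrete percentile: the first actual value at or beyond the fraction."""
    ordered = sorted(values)
    index = max(math.ceil(fraction * len(ordered)) - 1, 0)
    return ordered[index]

data = [10, 20, 30, 40]
for q in (0.25, 0.5, 0.75):
    print(q, percentile_disc(data, q), percentile_cont(data, q))
```

With this data the discrete quartiles are 10, 20 and 30, all of which occur in the column, while the continuous quartiles are 17.5, 25.0 and 32.5, none of which do.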

distplot(*args, **kwargs)[source]

Produces a distplot. See the seaborn docs on distplot for more information.

Note that this requires Seaborn in order to function.

Parameters:
  • bins (int|None) – The number of bins to use. If unspecified, the Freedman-Diaconis rule will be used to determine the number of bins.
  • kwargs (dict) – A dictionary of options to pass on to seaborn.distplot.
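
For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3) and derives the number of bins from the range of the data. The following is a minimal sketch of that rule, not the library's implementation: freedman_diaconis_bins is a hypothetical name, and both the quartile method and the fallback for a zero interquartile range are assumptions.

```python
import math
import statistics

def freedman_diaconis_bins(values):
    """Hypothetical helper: number of histogram bins under the Freedman-Diaconis rule."""
    n = len(values)
    # Quartiles by linear interpolation over the observed data (an assumption).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return 1  # degenerate case: the rule yields no usable width
    width = 2 * iqr / n ** (1 / 3)
    return max(1, math.ceil((max(values) - min(values)) / width))

print(freedman_diaconis_bins(list(range(1, 101))))
```

For the integers 1 through 100 the interquartile range is 49.5, the width is roughly 21.3, and the rule gives 5 bins.
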

values[source]

Mocks the property pandas.Series.values, returning a simple NumPy array consisting of the values of this column.

Returns:The NumPy array containing the values.
Return type:np.array

mean[source]

Mocks the pandas.Series.mean method to give the mean of the values in this column.

Returns:The mean.
Return type:float

max[source]

Mocks the pandas.Series.max method to give the maximum of the values in this column.

Returns:The maximum.
Return type:float

min[source]

Mocks the pandas.Series.min method to give the minimum of the values in this column.

Returns:The minimum.
Return type:float

size[source]

Mocks the pandas.Series.size property to give a count of the values in this column.

Returns:The count.
Return type:int
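
The four summary attributes above (mean, max, min and size) correspond naturally to SQL aggregate functions. The sketch below shows that correspondence under stated assumptions: SQLite stands in for PostgreSQL, the people table is hypothetical, and the source does not confirm which aggregates the library actually issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table people (age integer)")
conn.executemany("insert into people values (?)", [(31,), (25,), (47,), (19,)])

# count(age) skips NULLs, whereas count(*) would include them; the source
# does not say which behaviour size follows. This sample has no NULLs.
mean, maximum, minimum, size = conn.execute(
    "select avg(age), max(age), min(age), count(age) from people"
).fetchone()
print(mean, maximum, minimum, size)
```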