fasttrips.Util

class fasttrips.Util[source]

Bases: object

Util class.

Collect useful stuff here that doesn’t belong in any particular existing class.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

add_new_id(input_df, id_colname, …[, …]) Passing a pandas.DataFrame input_df with an ID column called id_colname, adds the numeric id as a column named newid_colname and returns it.
add_numeric_column(input_df, id_colname, …) Method to create numerical ID to map to an existing ID.
calculate_distance_miles(dataframe, …) Given a dataframe with columns origin_lat, origin_lon, destination_lat, destination_lon, calculates the distance in miles between origin and destination based on Haversine.
calculate_pathweight_costs(df, result_col) Calculates the weighted cost for a given pandas.DataFrame given an impedance function and value.
datetime64_formatter(x) Formatter to convert numpy.datetime64 to string that looks like HH:MM:SS
datetime64_min_formatter(x) Formatter to convert numpy.datetime64 to minutes after midnight (with two decimal places)
exponential_integration(penalty_min, growth_rate) Returns the integrated value of an exponential function.
get_fast_trips_config() Adds additional nodes to the Partridge graph to support Fast-Trips extension files.
get_process_mem_use_bytes() Returns the process memory usage in bytes
get_process_mem_use_str() Returns a string representing the process memory use.
logarithmic_integration(penalty_min, growth_rate) Returns the integrated value of a logarithmic function.
logistic_integration(penalty_minute, …) Returns the integrated value of a logistic function.
merge_two_dicts(x, y) A helper method for Python 2.7 to merge (‘zip’) two dictionary objects
parse_boolean(val)
parse_minutes_to_time(minutes)
pretty(df) Make a pretty version of the dataframe and return it.
read_end_time(x)
read_time(x[, end_of_day])
remove_null_columns(input_df[, inplace]) Remove columns from the dataframe if they’re all null since they’re not useful and make things harder to look at.
timedelta_formatter(x) Formatter to convert numpy.timedelta64 to string that looks like 4m 35.6s
write_dataframe(df, name, output_file[, …]) Convenience method to write a dataframe but make some of the fields more usable.

Attributes

DROP_DEBUG_COLUMNS Debug columns to drop
DROP_PATHFINDING_COLUMNS
TIMEDELTA_COLUMNS_TO_UNITS Maps timedelta columns to units for Util.write_dataframe()
static add_new_id(input_df, id_colname, newid_colname, mapping_df, mapping_id_colname, mapping_newid_colname, warn=False, warn_msg=None, drop_failures=True)[source]

Passing a pandas.DataFrame input_df with an ID column called id_colname, adds the numeric id as a column named newid_colname and returns it.

mapping_df defines the mapping from an ID (mapping_id_colname) to a numeric ID (mapping_newid_colname).

If warn is True, IDs that fail to map are logged and processing continues. Otherwise, an exception is raised.
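
The warn behavior described above can be sketched without pandas. This is only an illustration: rows stands in for input_df as a list of dicts, mapping stands in for mapping_df as a plain dict, and the handling of drop_failures (dropping rows whose ID has no numeric counterpart) is an assumption, since this page does not describe it.

```python
import logging

logger = logging.getLogger(__name__)


def add_new_id_sketch(rows, id_colname, newid_colname, mapping,
                      warn=False, drop_failures=True):
    """Hypothetical stand-in for Util.add_new_id using plain Python containers."""
    result = []
    failures = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not modified
        new_row[newid_colname] = mapping.get(row[id_colname])
        if new_row[newid_colname] is None:
            failures.append(row[id_colname])
            if drop_failures:  # assumed meaning: leave unmapped rows out
                continue
        result.append(new_row)
    if failures:
        if not warn:
            raise KeyError("no numeric ID for: %s" % failures)
        logger.warning("no numeric ID for: %s", failures)
    return result
```

With warn=True the unmapped IDs are only logged; with the default warn=False the same input raises.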

static add_numeric_column(input_df, id_colname, numeric_newcolname)[source]

Method to create numerical ID to map to an existing ID. Pass in a dataframe with JUST an ID column and we’ll add a numeric ID column for ya!

Returns the dataframe with the new column.
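
As a rough illustration of the idea, the sketch below assigns one number to each distinct ID in order of first appearance. The function name, the use of a plain dict, and the starting value of 1 are all assumptions; this page does not say how Util.add_numeric_column numbers the IDs.

```python
def number_ids(ids, start=1):
    """Map each distinct ID to a sequential integer (hypothetical helper)."""
    mapping = {}
    for value in ids:
        if value not in mapping:
            # The next number is the start value plus the count assigned so far.
            mapping[value] = start + len(mapping)
    return mapping
```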

static calculate_distance_miles(dataframe, origin_lat, origin_lon, destination_lat, destination_lon, distance_colname)[source]

Given a dataframe with columns origin_lat, origin_lon, destination_lat, destination_lon, calculates the distance in miles between origin and destination based on Haversine. Results are added to the dataframe in a column called distance_colname.
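
For reference, the Haversine formula for a single origin/destination pair can be sketched as follows. The function name and the Earth radius constant are assumptions; the library may use a different radius and operates on dataframe columns rather than scalars.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean Earth radius; the library's constant may differ


def haversine_miles(origin_lat, origin_lon, destination_lat, destination_lon):
    """Great-circle distance in miles between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (origin_lat, origin_lon, destination_lat, destination_lon))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```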

static calculate_pathweight_costs(df, result_col)[source]

Calculates the weighted cost for a given pandas.DataFrame given an impedance function and value.

Sets the result into the column called result_col.

column name          column type  description
var_value            float64      The value to weight
growth_type          str          One of constant, exponential, logarithmic, logistic
growth_log_base      float64      Logarithmic only: log base for the logarithmic growth function
growth_logistic_max  float64      Logistic only: maximum asymptotic value of the logistic curve
growth_logistic_mid  float64      Logistic only: x-axis location of the midpoint of the curve
static datetime64_formatter(x)[source]

Formatter to convert numpy.datetime64 to string that looks like HH:MM:SS

static datetime64_min_formatter(x)[source]

Formatter to convert numpy.datetime64 to minutes after midnight (with two decimal places)
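
The conversion can be illustrated with the standard library's datetime instead of numpy.datetime64. The function name is hypothetical, and returning a string (rather than a float) is an assumption based on the word "formatter".

```python
from datetime import datetime


def minutes_after_midnight(dt):
    """Format a datetime as minutes after midnight with two decimal places."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    return "%.2f" % ((dt - midnight).total_seconds() / 60.0)
```

For example, 08:30:15 is 510 minutes plus a quarter of a minute, so it formats as 510.25.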

static exponential_integration(penalty_min, growth_rate)[source]

Returns the integrated value of an exponential function.

Growth Function: (1 + Growth Rate) ** Penalty Minutes
Integrated Growth Function: ((1 + Growth Rate) ** Penalty Minutes - 1) / LN(1 + Growth Rate), dx=Penalty Minutes

Parameters:
  • penalty_min – float or pandas.Series of floats
  • growth_rate (float) – Exponential growth factor
Returns:

float or pandas.Series of floats depending on inputs
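
The integrated growth function given for exponential_integration can be transcribed directly for scalar inputs. This is a sketch of the stated formula, not the library's code; the function name is hypothetical and pandas.Series inputs are not handled.

```python
import math


def exponential_integration_sketch(penalty_min, growth_rate):
    """Integral of (1 + growth_rate) ** x for x from 0 to penalty_min."""
    return ((1.0 + growth_rate) ** penalty_min - 1.0) / math.log(1.0 + growth_rate)
```

With growth_rate=1.0 the growth function is 2 ** x, so integrating over 3 minutes gives (8 - 1) / ln(2).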

static get_fast_trips_config()[source]

Adds additional nodes to the Partridge graph to support Fast-Trips extension files.

Returns: Partridge configuration customized for Fast-Trips (_ft) loads and type casting.
static get_process_mem_use_bytes()[source]

Returns the process memory usage in bytes

static get_process_mem_use_str()[source]

Returns a string representing the process memory use. Uses SI prefixes (not binary prefixes): 1 KB = 1000 bytes.
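
The difference from binary prefixes is that each step divides by 1000 rather than 1024. A minimal sketch follows; the function name, the two-decimal output format, and the list of units are assumptions, since the exact string returned by the library is not shown here.

```python
def format_bytes_si(num_bytes):
    """Format a byte count using SI prefixes (1 KB = 1000 bytes)."""
    value = float(num_bytes)
    unit = "bytes"
    for next_unit in ("KB", "MB", "GB", "TB"):
        if value < 1000.0:
            break
        value /= 1000.0  # SI step: 1000, not 1024
        unit = next_unit
    return "%.2f %s" % (value, unit)
```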

static logarithmic_integration(penalty_min, growth_rate, log_base=2.718281828459045)[source]

Returns the integrated value of a logarithmic function.

Growth Function: Growth Rate * LOG((Penalty Minutes + 1), Log Base)
Integrated Growth Function: Growth Rate * ((Penalty Minutes + 1) * LN(Penalty Minutes + 1) - Penalty Minutes) / LN(Log Base), dx=Penalty Minutes

Parameters:
  • penalty_min – float or pandas.Series of floats
  • growth_rate – log growth factor
  • log_base – log base to impact shape of curve
Returns:

float or pandas.Series of floats depending on inputs
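
The integrated growth function given for logarithmic_integration can be written out for scalar inputs as below. The function name is hypothetical and this is a transcription of the stated formula rather than the library's implementation.

```python
import math


def logarithmic_integration_sketch(penalty_min, growth_rate, log_base=math.e):
    """Integral of growth_rate * log(x + 1, log_base) for x from 0 to penalty_min."""
    numerator = (penalty_min + 1.0) * math.log(penalty_min + 1.0) - penalty_min
    return growth_rate * numerator / math.log(log_base)
```

The "- Penalty Minutes" term comes from evaluating the antiderivative (x + 1) * ln(x + 1) - (x + 1) at both bounds: the lower bound at x = 0 contributes -(-1) = +1, which combines with -(Penalty Minutes + 1) from the upper bound.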

static logistic_integration(penalty_minute, growth_rate, max_logit, sigmoid_mid)[source]

Returns the integrated value of a logistic function.

Growth Function: Max Logit / (1 + e^(-Growth Rate * (Penalty Min - Sigmoid Mid)))
Integrated Growth Function: Upper Bound Integral - Lower Bound Integral (lower=0)
Upper Bound Integral: (Max Logit / Growth Rate) * ln(e^(Growth Rate * Penalty Min) + e^(Growth Rate * Sigmoid Mid))
Lower Bound Integral: (Max Logit / Growth Rate) * ln(1 + e^(Growth Rate * Sigmoid Mid))

Parameters:
  • penalty_minute – float or pandas.Series of floats
  • growth_rate – growth factor of the logistic curve
  • max_logit – asymptotic max value of curve
  • sigmoid_mid – x-midpoint of curve
Returns:

float or pandas.Series of floats depending on inputs
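
The upper and lower bound integrals given for logistic_integration can be combined for scalar inputs as in this sketch. The function name is hypothetical; large values of growth_rate * penalty_minute would overflow math.exp, which this sketch does not guard against.

```python
import math


def logistic_integration_sketch(penalty_minute, growth_rate, max_logit, sigmoid_mid):
    """Integral of max_logit / (1 + e^(-growth_rate * (x - sigmoid_mid))) for x from 0 to penalty_minute."""
    scale = max_logit / growth_rate
    upper = scale * math.log(math.exp(growth_rate * penalty_minute)
                             + math.exp(growth_rate * sigmoid_mid))
    lower = scale * math.log(1.0 + math.exp(growth_rate * sigmoid_mid))
    return upper - lower
```

A quick sanity check: when penalty_minute is twice sigmoid_mid, the curve is symmetric about the midpoint over the integration range, so the result equals max_logit * sigmoid_mid.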

static merge_two_dicts(x, y)[source]

A helper method for Python 2.7 to merge (‘zip’) two dictionary objects

Parameters:
  • x (dict.) – This dictionary will be copied and y appended to the copy.
  • y (dict.) – This dictionary will be appended to a copy of x.
Returns:

dict
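
The copy-then-append behavior described in the parameters corresponds to the common copy-and-update idiom, sketched below under a hypothetical name. That values from y win when both dictionaries share a key is a property of dict.update, assumed here to match the library's behavior.

```python
def merge_two_dicts_sketch(x, y):
    """Return a new dict with the items of x, then the items of y."""
    z = x.copy()   # shallow copy, so x itself is left unchanged
    z.update(y)    # keys present in both take their value from y
    return z
```

On Python 3.5 and later the same result can be written as {**x, **y}.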

static parse_boolean(val)[source]
static parse_minutes_to_time(minutes)[source]
static pretty(df)[source]

Make a pretty version of the dataframe and return it.

static read_end_time(x)[source]
static read_time(x, end_of_day=False)[source]
static remove_null_columns(input_df, inplace=True)[source]

Remove columns from the dataframe if they’re all null since they’re not useful and make things harder to look at.

static timedelta_formatter(x)[source]

Formatter to convert numpy.timedelta64 to string that looks like 4m 35.6s
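
The target format can be illustrated with a duration given as a number of seconds. The function name is hypothetical, and the treatment of negative durations or durations of an hour or more is not covered by this page, so the sketch simply reports whole minutes plus remaining seconds.

```python
def format_duration(total_seconds):
    """Format a non-negative duration in seconds as e.g. '4m 35.6s'."""
    minutes = int(total_seconds // 60)
    seconds = total_seconds - 60 * minutes
    return "%dm %.1fs" % (minutes, seconds)
```

A numpy.timedelta64 value x can be converted to seconds for this purpose with x / numpy.timedelta64(1, 's').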

static write_dataframe(df, name, output_file, append=False, keep_duration_columns=False, drop_debug_columns=True, drop_pathfinding_columns=True)[source]

Convenience method to write a dataframe but make some of the fields more usable.

Parameters:
  • df (pandas.DataFrame) – The dataframe to write
  • name (str) – Name of the dataframe. Just used for logging.
  • output_file (str) – The name of the file to which the dataframe will be written
  • append (bool) – Pass True to append to the existing output file, False otherwise
  • keep_duration_columns (bool) – Pass True to keep the original duration columns (e.g. “0 days 00:12:00.000000000”)
  • drop_debug_columns (bool) – Pass True to drop debug columns specified in Util.DROP_DEBUG_COLUMNS
  • drop_pathfinding_columns (bool) – Pass True to drop pathfinding columns specified in Util.DROP_PATHFINDING_COLUMNS

For columns that are numpy.timedelta64 fields, instead of writing “0 days 00:12:00.000000000”, times will be converted to the units specified in Util.TIMEDELTA_COLUMNS_TO_UNITS. The original duration columns will be kept if keep_duration_columns is True.
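
The unit conversion can be sketched with the standard library's timedelta in place of numpy.timedelta64. The unit names below are the values that appear in Util.TIMEDELTA_COLUMNS_TO_UNITS; the function and dictionary names are hypothetical, and the library's actual conversion code is not shown on this page.

```python
from datetime import timedelta

# One unit of each kind, keyed by the unit names used in TIMEDELTA_COLUMNS_TO_UNITS.
UNIT_LENGTHS = {
    "min": timedelta(minutes=1),
    "seconds": timedelta(seconds=1),
    "milliseconds": timedelta(milliseconds=1),
}


def convert_duration(value, unit):
    """Express a timedelta as a float number of the given unit."""
    return value / UNIT_LENGTHS[unit]
```

For example, a 12 minute duration written with unit 'min' becomes 12.0 instead of “0 days 00:12:00.000000000”.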

DROP_DEBUG_COLUMNS = ['A_lat', 'A_lon', 'B_lat', 'B_lon', 'trip_list_id_num', 'trip_id_num', 'A_id_num', 'B_id_num', 'mode_num', 'bump_iter', 'bumpstop_boarded', 'alight_delay_min']

Debug columns to drop

DROP_PATHFINDING_COLUMNS = ['pf_iteration', 'pf_A_time', 'pf_B_time', 'pf_linktime', 'pf_linkcost', 'pf_linkdist', 'pf_waittime', 'pf_linkfare', 'pf_cost', 'pf_fare', 'pf_initcost', 'pf_initfare']
TIMEDELTA_COLUMNS_TO_UNITS = {'new_linktime': 'min', 'new_waittime': 'min', 'pf_linkcost': 'min', 'pf_linktime': 'min', 'pf_waittime': 'min', 'step_duration': 'seconds', 'time enumerating': 'milliseconds', 'time labeling': 'milliseconds'}

Maps timedelta columns to units for Util.write_dataframe()