fasttrips.Util

class fasttrips.Util

Bases: object

Util class.

Collect useful stuff here that doesn’t belong in any particular existing class.

__init__(): Initialize self. See help(type(self)) for accurate signature.

Methods

`add_new_id`(input_df, id_colname, …[, …])	Passing a `pandas.DataFrame` input_df with an ID column called id_colname, adds the numeric id as a column named newid_colname and returns it.
`add_numeric_column`(input_df, id_colname, …)	Method to create numerical ID to map to an existing ID.
`calculate_distance_miles`(dataframe, …)	Given a dataframe with columns origin_lat, origin_lon, destination_lat, destination_lon, calculates the distance in miles between origin and destination based on Haversine.
`calculate_pathweight_costs`(df, result_col)	Calculates the weighted cost for a given `pandas.DataFrame` given an impedance function and value.
`datetime64_formatter`(x)	Formatter to convert `numpy.datetime64` to string that looks like HH:MM:SS
`datetime64_min_formatter`(x)	Formatter to convert `numpy.datetime64` to minutes after midnight (with two decimal places)
`exponential_integration`(penalty_min, growth_rate)	Returns the integrated value of an exponential function.
`get_fast_trips_config`()	Adds additional nodes to the Partridge graph to support Fast Trip extension files.
`get_process_mem_use_bytes`()	Returns the process memory usage in bytes
`get_process_mem_use_str`()	Returns a string representing the process memory use.
`logarithmic_integration`(penalty_min, growth_rate)	Returns the integrated value of a logarithmic function.
`logistic_integration`(penalty_minute, …)	Returns the integrated value of a logistic function.
`merge_two_dicts`(x, y)	A helper method for Python 2.7 to ‘zip’ two dictionary objects
`parse_boolean`(val)
`parse_minutes_to_time`(minutes)
`pretty`(df)	Make a pretty version of the dataframe and return it.
`read_end_time`(x)
`read_time`(x[, end_of_day])
`remove_null_columns`(input_df[, inplace])	Remove columns from the dataframe if they’re all null since they’re not useful and make things harder to look at.
`timedelta_formatter`(x)	Formatter to convert `numpy.timedelta64` to string that looks like 4m 35.6s
`write_dataframe`(df, name, output_file[, …])	Convenience method to write a dataframe but make some of the fields more usable.

Attributes

`DROP_DEBUG_COLUMNS`	Debug columns to drop
`DROP_PATHFINDING_COLUMNS`
`TIMEDELTA_COLUMNS_TO_UNITS`	Maps timedelta columns to units for `Util.write_dataframe()`

static add_new_id(input_df, id_colname, newid_colname, mapping_df, mapping_id_colname, mapping_newid_colname, warn=False, warn_msg=None, drop_failures=True)

Passing a pandas.DataFrame input_df with an ID column called id_colname, adds the numeric id as a column named newid_colname and returns it.

mapping_df defines the mapping from an ID (mapping_id_colname) to a numeric ID (mapping_newid_colname).

If warn is True, IDs that fail to map are logged and processing continues. Otherwise, an exception is raised.
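
The lookup described above can be sketched with plain Python lists and dicts instead of pandas.DataFrame objects, which keeps the example self-contained. The name attach_numeric_id is hypothetical, and the reading of drop_failures (rows whose ID has no mapping are discarded) is an assumption, not something this page states.

```python
import logging

def attach_numeric_id(rows, id_colname, newid_colname, mapping,
                      warn=False, warn_msg=None, drop_failures=True):
    """Hypothetical stand-in for Util.add_new_id using dicts, not DataFrames.

    rows    -- list of dicts, each with an id_colname key
    mapping -- dict from ID to numeric ID
    """
    result, failures = [], []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not modified
        new_row[newid_colname] = mapping.get(row[id_colname])
        if new_row[newid_colname] is None:
            failures.append(new_row)
            if drop_failures:
                continue  # assumption: unmapped rows are dropped
        result.append(new_row)

    if failures:
        if not warn:
            raise KeyError("no numeric ID for: %s"
                           % [f[id_colname] for f in failures])
        logging.warning("%s: %d rows failed to map",
                        warn_msg or "add_new_id", len(failures))
    return result
```

With warn=True a row with an unknown ID is logged and, under the assumption above, left out of the result. With warn=False the same input raises.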

static add_numeric_column(input_df, id_colname, numeric_newcolname)

Method to create numerical ID to map to an existing ID. Pass in a dataframe with JUST an ID column and we’ll add a numeric ID column for ya!

Returns the dataframe with the new column.
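
As a rough illustration of the idea, the sketch below numbers distinct IDs in first-seen order using only built-in types. The helper name number_ids and the starting value of 1 are assumptions. This page does not say how fasttrips orders or starts its numeric IDs.

```python
def number_ids(ids, start=1):
    """Hypothetical helper: map each distinct ID to a sequential integer."""
    numbering = {}
    for value in ids:
        if value not in numbering:
            # The next number is the start value plus the count assigned so far.
            numbering[value] = start + len(numbering)
    return numbering
```

For example, number_ids(["b", "a", "b"]) assigns 1 to "b" and 2 to "a". The repeated "b" keeps its first number.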

static calculate_distance_miles(dataframe, origin_lat, origin_lon, destination_lat, destination_lon, distance_colname): Given a dataframe with columns origin_lat, origin_lon, destination_lat, destination_lon, calculates the distance in miles between origin and destination based on Haversine. Results are added to the dataframe in a column called distance_colname.
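
For reference, a scalar version of the Haversine formula is sketched below with the standard library only. The function name haversine_miles and the Earth radius of 3959 miles are assumptions. The radius used by fasttrips is not given on this page, and the real method operates on dataframe columns rather than single values.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean Earth radius

def haversine_miles(origin_lat, origin_lon, destination_lat, destination_lon):
    """Great-circle distance in miles between two points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(
        math.radians,
        (origin_lat, origin_lon, destination_lat, destination_lon))
    # Haversine term: squared half-chord length between the two points.
    a = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

Under the assumed radius, one degree of longitude along the equator comes out at roughly 69.1 miles.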

static calculate_pathweight_costs(df, result_col)

Calculates the weighted cost for a given pandas.DataFrame given an impedance function and value.

Sets the result into the column called result_col.

column name	column type	description
`var_value`	float64	The value to weight
`growth_type`	str	one of `constant`, `exponential`, `logarithmic`, `logistic`
`growth_log_base`	float64	(logarithmic only) Log base for the logarithmic function
`growth_logistic_max`	float64	(logistic only) Maximum asymptotic value of the logistic curve
`growth_logistic_mid`	float64	(logistic only) X-axis location of the midpoint of the curve

static datetime64_formatter(x): Formatter to convert numpy.datetime64 to string that looks like HH:MM:SS

static datetime64_min_formatter(x): Formatter to convert numpy.datetime64 to minutes after midnight (with two decimal places)
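
The two output formats can be illustrated with the standard library's datetime type standing in for numpy.datetime64. The function names below are hypothetical, and returning the minutes as a string is an assumption.

```python
from datetime import datetime

def hms(x):
    """Format a datetime as HH:MM:SS."""
    return x.strftime("%H:%M:%S")

def minutes_after_midnight(x):
    """Format a datetime as minutes after midnight with two decimal places."""
    minutes = x.hour * 60 + x.minute + x.second / 60.0 + x.microsecond / 60e6
    return "%.2f" % minutes
```

A timestamp of 07:30:45 formats as 07:30:45 in the first form and as 450.75 in the second.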

static exponential_integration(penalty_min, growth_rate): Returns the integrated value of an exponential function.

Growth Function: (1 + Growth Rate) ** Penalty Minutes
Integrated Growth Function: ((1 + Growth Rate) ** Penalty Minutes - 1) / LN(1 + Growth Rate), dx=Penalty Minutes

Parameters:

penalty_min – float or pandas.Series of floats
growth_rate (float) – Exponential growth factor

Returns:	float or pandas.Series of floats depending on inputs
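
The closed form above is the integral of the growth function from 0 to Penalty Minutes. The sketch below is a scalar-only reimplementation for illustration, not the fasttrips source, and it checks the closed form against a midpoint Riemann sum. The helper names are hypothetical.

```python
import math

def exponential_integration_sketch(penalty_min, growth_rate):
    """Integral of (1 + growth_rate) ** x for x from 0 to penalty_min."""
    return ((1.0 + growth_rate) ** penalty_min - 1.0) / math.log(1.0 + growth_rate)

def midpoint_integral(func, upper, steps=10000):
    """Numerical check: midpoint Riemann sum of func over [0, upper]."""
    width = upper / steps
    return sum(func((i + 0.5) * width) for i in range(steps)) * width

closed_form = exponential_integration_sketch(10.0, 0.05)
numeric = midpoint_integral(lambda x: 1.05 ** x, 10.0)
# The two values agree to well within 1e-6.
```

Note that the formula divides by LN(1 + Growth Rate), so it is undefined for a growth rate of exactly 0.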

static get_fast_trips_config()

Adds additional nodes to the Partridge graph to support Fast Trip extension files.

Returns:	Partridge configuration customized for Fast-Trips (_ft) loads and type casting.

static get_process_mem_use_bytes(): Returns the process memory usage in bytes

static get_process_mem_use_str(): Returns a string representing the process memory use. Uses SI prefixes (not binary prefixes): 1 KB = 1000 bytes.
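
To make the SI convention concrete, the sketch below formats a byte count using powers of 1000. The function name format_bytes_si and the exact output layout (two decimal places, a space before the unit) are assumptions. The string that fasttrips actually returns may look different.

```python
def format_bytes_si(num_bytes):
    """Hypothetical formatter: bytes to a string with SI (power of 1000) prefixes."""
    value = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if value < 1000.0:
            return "%.2f %s" % (value, unit)
        value /= 1000.0  # SI step: 1000, not 1024
    return "%.2f TB" % value
```

With this convention 1000 bytes is reported as 1.00 KB, whereas a binary-prefix formatter would still report it as less than one unit of 1024 bytes.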

static logarithmic_integration(penalty_min, growth_rate, log_base=2.718281828459045): Returns the integrated value of a logarithmic function.

Growth Function: Growth Rate * LOG((Penalty Minutes + 1), Log Base)
Integrated Growth Function: Growth Rate * ((Penalty Minutes + 1) * LN(Penalty Minutes + 1) - Penalty Minutes) / LN(Log Base), dx=Penalty Minutes

Parameters:

penalty_min – float or pandas.Series of floats
growth_rate – log growth factor
log_base – log base to impact shape of curve

Returns:	float or pandas.Series of floats depending on inputs
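
A scalar-only sketch of the closed form above follows. It is a reimplementation for illustration rather than the fasttrips source, and the function name is hypothetical. A convenient sanity check: with the default base e and a growth rate of 1, integrating up to e - 1 minutes gives exactly 1, because (e * LN(e) - (e - 1)) / LN(e) = 1.

```python
import math

def logarithmic_integration_sketch(penalty_min, growth_rate, log_base=math.e):
    """Integral of growth_rate * log(x + 1, log_base) for x from 0 to penalty_min."""
    shifted = penalty_min + 1.0
    return growth_rate * (shifted * math.log(shifted) - penalty_min) / math.log(log_base)
```

Changing log_base only rescales the result by 1 / LN(Log Base), so a base of 10 yields the natural-log result divided by LN(10).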

static logistic_integration(penalty_minute, growth_rate, max_logit, sigmoid_mid): Returns the integrated value of a logistic function.

Growth Function: Max Logit / (1 + e^(-Growth Rate * (Penalty Min - Sigmoid Mid)))
Integrated Growth Function: Upper Bound Integral - Lower Bound Integral (lower=0)
Upper Bound Integral: (Max Logit / Growth Rate) * ln(e^(Growth Rate * Penalty Min) + e^(Growth Rate * Sigmoid Mid))
Lower Bound Integral: (Max Logit / Growth Rate) * ln(1 + e^(Growth Rate * Sigmoid Mid))

Parameters:

penalty_minute – float or pandas.Series of floats
growth_rate – log growth factor
max_logit – asymptotic max value of curve
sigmoid_mid – x-midpoint of curve

Returns:	float or pandas.Series of floats depending on inputs
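
The upper and lower bound terms above can be sketched for scalar inputs as follows. This is an illustrative reimplementation with a hypothetical function name, not the fasttrips source. Because the logistic curve is symmetric about its midpoint, integrating from 0 to twice the midpoint should give max_logit * sigmoid_mid, which makes a simple check.

```python
import math

def logistic_integration_sketch(penalty_minute, growth_rate, max_logit, sigmoid_mid):
    """Integral of max_logit / (1 + e^(-growth_rate * (x - sigmoid_mid))) over [0, penalty_minute]."""
    scale = max_logit / growth_rate
    upper = scale * math.log(math.exp(growth_rate * penalty_minute)
                             + math.exp(growth_rate * sigmoid_mid))
    lower = scale * math.log(1.0 + math.exp(growth_rate * sigmoid_mid))
    return upper - lower
```

For example, with growth_rate=0.5, max_logit=3.0 and sigmoid_mid=10.0, integrating up to 20 minutes gives 30.0 up to floating-point error. For large products of growth rate and minutes, math.exp can overflow, so this direct form is only suitable for moderate inputs.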

static merge_two_dicts(x, y)

A helper method for Python 2.7 to ‘zip’ two dictionary objects

Parameters:

x (dict) – This dictionary will be copied and y appended to the copy.
y (dict) – This dictionary will be appended to the copy of x.

Returns:	dict
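
The parameter descriptions match the common copy-then-update idiom, sketched below under a hypothetical name. That fasttrips implements the method exactly this way is an assumption. In this idiom, a key present in both dictionaries takes its value from y.

```python
def merge_two_dicts_sketch(x, y):
    """Return a new dict with the entries of x, then the entries of y on top."""
    z = x.copy()   # shallow copy, so x itself is left unmodified
    z.update(y)    # keys in y overwrite same-named keys copied from x
    return z
```

On Python 3.5 and later the same result can be written as {**x, **y}, and on Python 3.9 and later as x | y.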

static parse_boolean(val)

static parse_minutes_to_time(minutes)

static pretty(df): Make a pretty version of the dataframe and return it.

static read_end_time(x)

static read_time(x, end_of_day=False)

static remove_null_columns(input_df, inplace=True): Remove columns from the dataframe if they’re all null since they’re not useful and make things harder to look at.

static timedelta_formatter(x): Formatter to convert numpy.timedelta64 to string that looks like 4m 35.6s
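
The target format can be illustrated with the standard library's timedelta standing in for numpy.timedelta64. The function name format_min_sec is hypothetical, and the handling of durations of an hour or more (minutes simply keep counting past 59) is an assumption.

```python
from datetime import timedelta

def format_min_sec(x):
    """Format a timedelta as whole minutes plus seconds with one decimal place."""
    total_seconds = x.total_seconds()
    minutes = int(total_seconds // 60)
    seconds = total_seconds - minutes * 60
    return "%dm %.1fs" % (minutes, seconds)
```

A duration of 4 minutes and 35.6 seconds formats as 4m 35.6s.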

static write_dataframe(df, name, output_file, append=False, keep_duration_columns=False, drop_debug_columns=True, drop_pathfinding_columns=True)

Convenience method to write a dataframe but make some of the fields more usable.

Parameters:

df (pandas.DataFrame) – The dataframe to write
name (str) – Name of the dataframe. Just used for logging.
output_file (str) – The name of the file to which the dataframe will be written
append (bool) – Pass True to append to the existing output file, False otherwise
keep_duration_columns (bool) – Pass True to keep the original duration columns (e.g. “0 days 00:12:00.000000000”)
drop_debug_columns (bool) – Pass True to drop debug columns specified in Util.DROP_DEBUG_COLUMNS
drop_pathfinding_columns (bool) – Pass True to drop pathfinding columns specified in Util.DROP_PATHFINDING_COLUMNS

For columns that are numpy.timedelta64 fields, instead of writing “0 days 00:12:00.000000000”, times will be converted to the units specified in Util.TIMEDELTA_COLUMNS_TO_UNITS. The original duration columns will be kept if keep_duration_columns is True.
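
The duration conversion can be sketched on a single row represented as a dict, with the standard library's timedelta standing in for numpy.timedelta64. The mapping below copies three entries from Util.TIMEDELTA_COLUMNS_TO_UNITS. The function name, the per-unit table, and the naming of the converted column (original name followed by the unit) are assumptions for illustration.

```python
from datetime import timedelta

# Three entries copied from Util.TIMEDELTA_COLUMNS_TO_UNITS.
UNITS = {"pf_linktime": "min", "step_duration": "seconds", "time labeling": "milliseconds"}
SECONDS_PER_UNIT = {"min": 60.0, "seconds": 1.0, "milliseconds": 0.001}

def convert_durations(row, keep_duration_columns=False):
    """Hypothetical helper: replace timedelta values with floats in the mapped unit."""
    out = {}
    for name, value in row.items():
        unit = UNITS.get(name)
        if unit is None or not isinstance(value, timedelta):
            out[name] = value  # not a mapped duration column: pass through
            continue
        if keep_duration_columns:
            out[name] = value  # keep the original duration alongside the number
        # Assumed naming convention for the converted column.
        out["%s %s" % (name, unit)] = value.total_seconds() / SECONDS_PER_UNIT[unit]
    return out
```

Under these assumptions a pf_linktime of 12 minutes is written as the number 12.0 instead of “0 days 00:12:00.000000000”.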

DROP_DEBUG_COLUMNS = ['A_lat', 'A_lon', 'B_lat', 'B_lon', 'trip_list_id_num', 'trip_id_num', 'A_id_num', 'B_id_num', 'mode_num', 'bump_iter', 'bumpstop_boarded', 'alight_delay_min']: Debug columns to drop

DROP_PATHFINDING_COLUMNS = ['pf_iteration', 'pf_A_time', 'pf_B_time', 'pf_linktime', 'pf_linkcost', 'pf_linkdist', 'pf_waittime', 'pf_linkfare', 'pf_cost', 'pf_fare', 'pf_initcost', 'pf_initfare']

TIMEDELTA_COLUMNS_TO_UNITS = {'new_linktime': 'min', 'new_waittime': 'min', 'pf_linkcost': 'min', 'pf_linktime': 'min', 'pf_waittime': 'min', 'step_duration': 'seconds', 'time enumerating': 'milliseconds', 'time labeling': 'milliseconds'}: Maps timedelta columns to units for Util.write_dataframe()