fasttrips.Util
-
class fasttrips.Util
Bases: object
Util class.
Collect useful stuff here that doesn’t belong in any particular existing class.
-
__init__()
Initialize self. See help(type(self)) for accurate signature.
Methods
add_new_id(input_df, id_colname, …[, …]) – Passing a pandas.DataFrame input_df with an ID column called id_colname, adds the numeric id as a column named newid_colname and returns it.
add_numeric_column(input_df, id_colname, …) – Method to create numerical ID to map to an existing ID.
calculate_distance_miles(dataframe, …) – Given a dataframe with columns origin_lat, origin_lon, destination_lat, destination_lon, calculates the distance in miles between origin and destination based on Haversine.
calculate_pathweight_costs(df, result_col) – Calculates the weighted cost for a given pandas.DataFrame given an impedance function and value.
datetime64_formatter(x) – Formatter to convert numpy.datetime64 to string that looks like HH:MM:SS
datetime64_min_formatter(x) – Formatter to convert numpy.datetime64 to minutes after midnight (with two decimal places)
exponential_integration(penalty_min, growth_rate) – Returns the integrated value of an exponential function.
get_fast_trips_config() – Adds additional nodes to the Partridge graph to support Fast Trip extension files.
get_process_mem_use_bytes() – Returns the process memory usage in bytes
get_process_mem_use_str() – Returns a string representing the process memory use.
logarithmic_integration(penalty_min, growth_rate) – Returns the integrated value of a logarithmic function.
logistic_integration(penalty_minute, …) – Returns the integrated value of a logistic function.
merge_two_dicts(x, y) – A helper method for Python 2.7 to ‘zip’ two dictionary objects
parse_boolean(val)
parse_minutes_to_time(minutes)
pretty(df) – Make a pretty version of the dataframe and return it.
read_end_time(x)
read_time(x[, end_of_day])
remove_null_columns(input_df[, inplace]) – Remove columns from the dataframe if they’re all null since they’re not useful and make things harder to look at.
timedelta_formatter(x) – Formatter to convert numpy.timedelta64 to string that looks like 4m 35.6s
write_dataframe(df, name, output_file[, …]) – Convenience method to write a dataframe but make some of the fields more usable.
Attributes
DROP_DEBUG_COLUMNS – Debug columns to drop
DROP_PATHFINDING_COLUMNS
TIMEDELTA_COLUMNS_TO_UNITS – Maps timedelta columns to units for Util.write_dataframe()
-
static add_new_id(input_df, id_colname, newid_colname, mapping_df, mapping_id_colname, mapping_newid_colname, warn=False, warn_msg=None, drop_failures=True)
Passing a pandas.DataFrame input_df with an ID column called id_colname, adds the numeric id as a column named newid_colname and returns it.
mapping_df defines the mapping from an ID (mapping_id_colname) to a numeric ID (mapping_newid_colname).
If warn is True, IDs that fail to map are logged and processing moves on. Otherwise, an exception is raised.
-
static add_numeric_column(input_df, id_colname, numeric_newcolname)
Method to create numerical ID to map to an existing ID. Pass in a dataframe with just an ID column and a numeric ID column will be added.
Returns the dataframe with the new column.
-
static calculate_distance_miles(dataframe, origin_lat, origin_lon, destination_lat, destination_lon, distance_colname)
Given a dataframe with columns origin_lat, origin_lon, destination_lat, destination_lon, calculates the distance in miles between origin and destination based on Haversine. Results are added to the dataframe in a column called distance_colname.
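As a rough illustration of the Haversine calculation for a single origin/destination pair, here is a minimal standalone sketch. It does not call fasttrips: the function name haversine_miles and the Earth radius constant are assumptions made for this example, and the library's own constant may differ.

```python
import math

# Assumption for this sketch: mean Earth radius in miles.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(origin_lat, origin_lon, destination_lat, destination_lon):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1 = math.radians(origin_lat)
    phi2 = math.radians(destination_lat)
    dphi = phi2 - phi1
    dlambda = math.radians(destination_lon - origin_lon)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2.0 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# One degree of latitude along a meridian is R * pi / 180 miles.
print(haversine_miles(0.0, 0.0, 1.0, 0.0))
```

The library method works on whole dataframe columns; this scalar version only shows the formula.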
-
static calculate_pathweight_costs(df, result_col)
Calculates the weighted cost for a given pandas.DataFrame given an impedance function and value. Sets the result into the column called result_col.
column name           column type   description
var_value             float64       The value to weight
growth_type           str           one of constant, exponential, logarithmic, logistic
growth_log_base       float64       logarithmic only; log base for the logarithmic function
growth_logistic_max   float64       logistic only; maximum asymptotic value for logistic curve
growth_logistic_mid   float64       logistic only; x-axis location of the midpoint of the curve
-
static datetime64_formatter(x)
Formatter to convert numpy.datetime64 to string that looks like HH:MM:SS
-
static datetime64_min_formatter(x)
Formatter to convert numpy.datetime64 to minutes after midnight (with two decimal places)
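The two output formats described by the datetime formatters can be sketched with the standard library alone. Note the substitution: this example uses datetime.datetime rather than numpy.datetime64, and the function names format_hms and format_minutes_after_midnight are hypothetical, chosen only for illustration.

```python
import datetime


def format_hms(dt):
    """Format a datetime as HH:MM:SS."""
    return dt.strftime("%H:%M:%S")


def format_minutes_after_midnight(dt):
    """Format a datetime as minutes after midnight with two decimal places."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes = (dt - midnight).total_seconds() / 60.0
    return "%.2f" % minutes


stamp = datetime.datetime(2016, 1, 1, 8, 30, 30)
print(format_hms(stamp))                     # 08:30:30
print(format_minutes_after_midnight(stamp))  # 510.50
```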
-
static exponential_integration(penalty_min, growth_rate)
Returns the integrated value of an exponential function.
Growth Function: (1 + Growth Rate) ** Penalty Minutes
Integrated Growth Function: ((1 + Growth Rate) ** Penalty Minutes - 1) / LN(1 + Growth Rate), dx=Penalty Minutes
Parameters:
- penalty_min – float or pandas.Series of floats
- growth_rate (float) – Exponential growth factor
Returns: float or pandas.Series of floats depending on inputs
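The closed form above is the definite integral of the growth function from 0 to Penalty Minutes. A minimal scalar sketch, assuming plain floats rather than a pandas.Series and using the hypothetical name exponential_integral_sketch, is:

```python
import math


def exponential_integral_sketch(penalty_min, growth_rate):
    """Integral of (1 + growth_rate) ** x for x from 0 to penalty_min."""
    base = 1.0 + growth_rate
    return (base ** penalty_min - 1.0) / math.log(base)


# Compare the closed form against a midpoint Riemann sum.
steps = 20000
width = 10.0 / steps
approx = sum(1.1 ** ((i + 0.5) * width) for i in range(steps)) * width
print(exponential_integral_sketch(10.0, 0.1), approx)
```

This sketch does not handle growth_rate equal to 0, where LN(1 + Growth Rate) is zero and the division fails.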
-
static get_fast_trips_config()
Adds additional nodes to the Partridge graph to support Fast Trip extension files.
Returns: Partridge configuration customized for Fast-Trips (_ft) loads and type casting.
-
static get_process_mem_use_str()
Returns a string representing the process memory use. Uses SI prefixes (not binary prefixes): 1KB = 1000 bytes.
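To make the SI-prefix convention concrete, here is a small sketch that formats a byte count with powers of 1000. The helper name format_bytes_si, the unit labels, and the one-decimal output format are assumptions for illustration; the exact string returned by the library method is not specified here.

```python
# Assumed unit labels; each step up is a factor of 1000, not 1024.
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def format_bytes_si(num_bytes):
    """Format a byte count using SI prefixes (1 KB = 1000 bytes)."""
    value = float(num_bytes)
    unit_index = 0
    while value >= 1000.0 and unit_index < len(UNITS) - 1:
        value /= 1000.0
        unit_index += 1
    return "%.1f %s" % (value, UNITS[unit_index])


print(format_bytes_si(1500000))  # 1.5 MB
```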
-
static logarithmic_integration(penalty_min, growth_rate, log_base=2.718281828459045)
Returns the integrated value of a logarithmic function.
Growth Function: Growth Rate * LOG((Penalty Minutes + 1), Log Base)
Integrated Growth Function: Growth Rate * ((Penalty Minutes + 1) * LN(Penalty Minutes + 1) - Penalty Minutes) / LN(Log Base), dx=Penalty Minutes
Parameters:
- penalty_min – float or pandas.Series of floats
- growth_rate – log growth factor
- log_base – log base to impact shape of curve
Returns: float or pandas.Series of floats depending on inputs
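The integrated form follows from the antiderivative of LN(x + 1), which is (x + 1) * LN(x + 1) - x, divided by LN(Log Base) to change base. A minimal scalar sketch, assuming plain floats and the hypothetical name logarithmic_integral_sketch, is:

```python
import math


def logarithmic_integral_sketch(penalty_min, growth_rate, log_base=math.e):
    """Integral of growth_rate * log(x + 1, log_base) for x from 0 to penalty_min."""
    shifted = penalty_min + 1.0
    return growth_rate * (shifted * math.log(shifted) - penalty_min) / math.log(log_base)


# Compare the closed form against a midpoint Riemann sum.
steps = 20000
width = 10.0 / steps
approx = sum(0.5 * math.log((i + 0.5) * width + 1.0, 2.0) for i in range(steps)) * width
print(logarithmic_integral_sketch(10.0, 0.5, 2.0), approx)
```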
-
static logistic_integration(penalty_minute, growth_rate, max_logit, sigmoid_mid)
Returns the integrated value of a logistic function.
Growth Function: Max Value / (1 + e^(-Growth Rate * (Penalty Min - Sigmoid Mid)))
Integrated Growth Function: Upper Bound Integral - Lower Bound Integral (lower=0)
Upper Bound Integral: (Max Value / Growth Rate) * ln(e^(Growth Rate * Penalty Min) + e^(Growth Rate * Sigmoid Mid))
Lower Bound Integral: (Max Value / Growth Rate) * ln(1 + e^(Growth Rate * Sigmoid Mid))
Here Max Value is the max_logit parameter.
Parameters:
- penalty_minute – float or pandas.Series of floats
- growth_rate – log growth factor
- max_logit – asymptotic max value of curve
- sigmoid_mid – x-midpoint of curve
Returns: float or pandas.Series of floats depending on inputs
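The upper and lower bound expressions are the same antiderivative evaluated at Penalty Min and at 0. A minimal scalar sketch, assuming plain floats and the hypothetical name logistic_integral_sketch, is:

```python
import math


def logistic_integral_sketch(penalty_minute, growth_rate, max_logit, sigmoid_mid):
    """Integral of max_logit / (1 + exp(-growth_rate * (x - sigmoid_mid))) from 0 to penalty_minute."""
    scale = max_logit / growth_rate
    mid_term = math.exp(growth_rate * sigmoid_mid)
    upper = scale * math.log(math.exp(growth_rate * penalty_minute) + mid_term)
    lower = scale * math.log(1.0 + mid_term)
    return upper - lower


def logistic_growth(x, growth_rate, max_logit, sigmoid_mid):
    """The growth function being integrated."""
    return max_logit / (1.0 + math.exp(-growth_rate * (x - sigmoid_mid)))


# Compare the closed form against a midpoint Riemann sum.
steps = 20000
width = 10.0 / steps
approx = sum(logistic_growth((i + 0.5) * width, 0.5, 3.0, 4.0) for i in range(steps)) * width
print(logistic_integral_sketch(10.0, 0.5, 3.0, 4.0), approx)
```

For large values of Growth Rate * Penalty Min the exponentials in this sketch can overflow a float; it is meant only to show the formula.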
-
static merge_two_dicts(x, y)
A helper method for Python 2.7 to merge (‘zip’) two dictionary objects
Parameters:
- x (dict) – This dictionary will be copied and y appended to the copy.
- y (dict) – This dictionary will be appended to a copy of x.
Returns: dict
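The copy-then-append behavior described above matches the common copy/update idiom. The sketch below assumes that idiom (it is not taken from the fasttrips source) and uses the hypothetical name merge_two_dicts_sketch; with dict.update, values from y win when a key appears in both.

```python
def merge_two_dicts_sketch(x, y):
    """Return a new dict holding the items of x, then the items of y."""
    merged = x.copy()   # shallow copy, so x itself is left untouched
    merged.update(y)    # keys present in both take the value from y
    return merged


defaults = {"mode": "transit", "iterations": 1}
overrides = {"iterations": 3}
print(merge_two_dicts_sketch(defaults, overrides))
print(defaults)  # unchanged
```

On Python 3.5 and later the same result can be written as {**x, **y}, and on Python 3.9 and later as x | y.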
-
static remove_null_columns(input_df, inplace=True)
Remove columns from the dataframe if they’re all null since they’re not useful and make things harder to look at.
-
static timedelta_formatter(x)
Formatter to convert numpy.timedelta64 to string that looks like 4m 35.6s
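A standard-library sketch of the "4m 35.6s" output format follows. Note the substitution: it takes a datetime.timedelta instead of a numpy.timedelta64, and the name format_duration is hypothetical. How the library formats durations of an hour or more, or negative durations, is not covered by this sketch.

```python
import datetime


def format_duration(delta):
    """Format a timedelta as whole minutes plus seconds with one decimal place."""
    total_seconds = delta.total_seconds()
    minutes = int(total_seconds // 60)
    seconds = total_seconds - 60 * minutes
    return "%dm %.1fs" % (minutes, seconds)


print(format_duration(datetime.timedelta(minutes=4, seconds=35.6)))  # 4m 35.6s
```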
-
static write_dataframe(df, name, output_file, append=False, keep_duration_columns=False, drop_debug_columns=True, drop_pathfinding_columns=True)
Convenience method to write a dataframe but make some of the fields more usable.
Parameters:
- df (pandas.DataFrame) – The dataframe to write
- name (str) – Name of the dataframe. Just used for logging.
- output_file (str) – The name of the file to which the dataframe will be written
- append (bool) – Pass True to append to the existing output file, False otherwise
- keep_duration_columns (bool) – Pass True to keep the original duration columns (e.g. “0 days 00:12:00.000000000”)
- drop_debug_columns (bool) – Pass True to drop debug columns specified in Util.DROP_DEBUG_COLUMNS
- drop_pathfinding_columns (bool) – Pass True to drop pathfinding columns specified in Util.DROP_PATHFINDING_COLUMNS
For columns that are numpy.timedelta64 fields, instead of writing “0 days 00:12:00.000000000”, times will be converted to the units specified in Util.TIMEDELTA_COLUMNS_TO_UNITS. The original duration columns will be kept if keep_duration_columns is True.
-
DROP_DEBUG_COLUMNS = ['A_lat', 'A_lon', 'B_lat', 'B_lon', 'trip_list_id_num', 'trip_id_num', 'A_id_num', 'B_id_num', 'mode_num', 'bump_iter', 'bumpstop_boarded', 'alight_delay_min']
Debug columns to drop
-
DROP_PATHFINDING_COLUMNS = ['pf_iteration', 'pf_A_time', 'pf_B_time', 'pf_linktime', 'pf_linkcost', 'pf_linkdist', 'pf_waittime', 'pf_linkfare', 'pf_cost', 'pf_fare', 'pf_initcost', 'pf_initfare']
-
TIMEDELTA_COLUMNS_TO_UNITS = {'new_linktime': 'min', 'new_waittime': 'min', 'pf_linkcost': 'min', 'pf_linktime': 'min', 'pf_waittime': 'min', 'step_duration': 'seconds', 'time enumerating': 'milliseconds', 'time labeling': 'milliseconds'}
Maps timedelta columns to units for Util.write_dataframe()
-