Utilities SubPackage

This subpackage collects various utilities

transitionMatrix.utils.converters module

Converter utilities to help switch between various formats

transitionMatrix.utils.converters.datetime_to_float(dataframe, time_column='Time', format=None)

datetime_to_float() converts dates from string format to the canonical float format

Parameters:

time_column – the column label of the observation times
dataframe – Pandas dataframe with dates in string format

Returns:

Pandas dataframe with dates in float format

Return type:

object

Note

The date string must be recognizable by the pandas to_datetime function.
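The idea of mapping date strings to floats can be sketched with the standard library alone. The text above does not say what the canonical float format is, so the normalization used here (elapsed days divided by the total span, giving values in [0, 1]) is an assumption, and `dates_to_fraction` is a hypothetical helper rather than the package function.

```python
from datetime import datetime


def dates_to_fraction(date_strings, fmt="%Y-%m-%d"):
    """Map date strings to floats in [0, 1] by elapsed days.

    Illustrative only: the scaling is an assumption, not the
    documented behaviour of datetime_to_float().
    """
    dates = [datetime.strptime(s, fmt) for s in date_strings]
    start, end = min(dates), max(dates)
    total_days = (end - start).days
    if total_days == 0:
        # All observations fall on the same day
        return [0.0 for _ in dates]
    return [(d - start).days / total_days for d in dates]


values = dates_to_fraction(["2020-01-01", "2020-07-01", "2021-01-01"])
print(values)
```

The earliest date maps to 0.0 and the latest to 1.0, with the remaining dates falling in between.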

transitionMatrix.utils.converters.frame_to_array(dataframe)

Convert a pandas dataframe to a numpy array

Parameters: dataframe
Returns: numpy array

transitionMatrix.utils.converters.to_canonical(dataframe)

to_canonical() converts a dataframe that is in compact form into a canonical form

Parameters: dataframe – a dataframe in compact form
Returns: dataframe in canonical form

transitionMatrix.utils.converters.to_compact(dataframe)

to_compact() converts a dataframe that is in canonical form into a compact form

Parameters: dataframe – a dataframe in canonical form
Returns: dataframe in compact form

transitionMatrix.utils.preprocessing module

module transitionMatrix.utils - helper classes and functions

transitionMatrix.utils.preprocessing.bin_timestamps(sorted_data, cohorts, output_format=0, remove_stale=False)

Bin timestamped data in a dataframe so as to have ingoing and outgoing states per cohort interval

Parameters:

sorted_data (pandas dataframe) – the dataframe to cohort, already sorted
cohorts (int) – the number of cohorts
output_format (int) – how to structure the outputs (0=cohorts, 1=event_list)
remove_stale (bool) – whether to remove successive observations with identical state

Returns:

dataframe with the cohorted data, together with the cohort intervals

Note

The ‘ID’ and ‘Time’ column labels are used by default.

Warning

Cohorting is a ‘lossy’ operation: Timestamps are discretised (binned) and any intermediate state transitions are lost.

Warning

The data must be sorted already
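The lossy nature of cohorting can be illustrated with a minimal sketch. It assumes that each cohort interval is summarized by its first (ingoing) and last (outgoing) observed state; `summarize_interval` is a hypothetical helper and not part of the package.

```python
def summarize_interval(events):
    """Return the (ingoing, outgoing) states for one cohort interval.

    `events` is a list of (time, state) pairs that fall inside the
    interval. Only the first and last states survive (an assumption
    made for this illustration).
    """
    ordered = sorted(events)
    return ordered[0][1], ordered[-1][1]


# Three observations inside a single interval: A -> B -> A
observations = [(0.1, "A"), (0.4, "B"), (0.9, "A")]
summary = summarize_interval(observations)
print(summary)  # ('A', 'A')
```

The excursion to state B disappears from the summary, which is the kind of intermediate transition the warning refers to.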

transitionMatrix.utils.preprocessing.generate_cohort_bounds(data, cohorts)

Generate cohort intervals given an input transition dataframe and the desired number of cohorts. The function finds the range of timestamps and divides it into intervals of equal length.

Parameters:

data – a pandas dataframe
cohorts (int) – the number of cohorts

Returns:

cohort_bounds – the boundaries of the cohort intervals
dt – the cohort interval

Warning

The ‘Time’ column must be in float format
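The equal division of the timestamp range can be sketched as follows. This is a minimal illustration on a plain list of float timestamps; `cohort_bounds_sketch` is a hypothetical name, and returning `cohorts + 1` boundary values is an assumption about the layout of `cohort_bounds`.

```python
def cohort_bounds_sketch(times, cohorts):
    """Split [min(times), max(times)] into `cohorts` equal intervals.

    Returns the list of interval boundaries and the interval length.
    """
    t_min, t_max = min(times), max(times)
    dt = (t_max - t_min) / cohorts
    # cohorts intervals are delimited by cohorts + 1 boundaries
    bounds = [t_min + k * dt for k in range(cohorts + 1)]
    return bounds, dt


bounds, dt = cohort_bounds_sketch([0.0, 1.5, 3.0, 4.0], cohorts=4)
print(bounds)  # [0.0, 1.0, 2.0, 3.0, 4.0]
print(dt)      # 1.0
```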

transitionMatrix.utils.preprocessing.generate_event_dict(data, dt, cohort_bounds)

Loop over all events and construct a dictionary in the following format:

event_dict = {
  (entity_id, cohort interval) : [(time, state), ..., (time, state)],
  (entity_id, cohort interval) : [(time, state), ..., (time, state)]
}

Create a unique key as per (entity, interval)
Find the interval of each event (the cohort it belongs to)
Add (time, state) pairs as variable length list

This data structure allows applying arbitrary state assignment to each cohort interval

Parameters:

data – a pandas dataframe
dt – the cohort interval
cohort_bounds – the boundaries of the cohort intervals

Returns:

dict
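The dictionary layout described above can be sketched in plain Python. The sketch assumes events arrive as (entity_id, time, state) tuples and that an interval is identified by its index within `cohort_bounds`; `build_event_dict` is a hypothetical helper and does not take the `dt` argument of the package function.

```python
from bisect import bisect_right
from collections import defaultdict


def build_event_dict(events, cohort_bounds):
    """Group (time, state) pairs under an (entity_id, interval) key."""
    last_interval = len(cohort_bounds) - 2
    event_dict = defaultdict(list)
    for entity_id, time, state in events:
        # Index of the interval containing `time`; the final boundary
        # is clamped into the last interval
        k = min(bisect_right(cohort_bounds, time) - 1, last_interval)
        event_dict[(entity_id, k)].append((time, state))
    return dict(event_dict)


events = [(1, 0.0, 0), (1, 0.4, 1), (1, 2.5, 2), (2, 1.0, 0), (2, 4.0, 1)]
bounds = [0.0, 1.0, 2.0, 3.0, 4.0]
event_dict = build_event_dict(events, bounds)
print(event_dict[(1, 0)])  # [(0.0, 0), (0.4, 1)]
```

Because each key holds a variable-length list, any rule for assigning a single state to an interval (first, last, most frequent) can be applied afterwards.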

transitionMatrix.utils.preprocessing.remove_stale_events(data)

Parse an event dictionary and remove transitions to the same state:

event_dict = {
  (entity_id, cohort interval) : [(time, state), ..., (time, state)],
  (entity_id, cohort interval) : [(time, state), ..., (time, state)]
}

Parameters: data – a pandas dataframe
Returns: dict
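Removing transitions to the same state can be sketched on an event dictionary of the shape shown above. This illustration only compares successive events within each list, which is an assumption about the scope of the check; `drop_repeated_states` is a hypothetical helper.

```python
def drop_repeated_states(event_dict):
    """Drop events whose state equals the previously kept state."""
    cleaned = {}
    for key, events in event_dict.items():
        kept = []
        for time, state in events:
            # Keep the first event and every genuine change of state
            if not kept or kept[-1][1] != state:
                kept.append((time, state))
        cleaned[key] = kept
    return cleaned


raw = {(1, 0): [(0.0, 0), (0.2, 0), (0.4, 1), (0.6, 1)]}
cleaned = drop_repeated_states(raw)
print(cleaned)  # {(1, 0): [(0.0, 0), (0.4, 1)]}
```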

transitionMatrix.utils.preprocessing.total_timestamps(data)

Count total number of timestamps in a dataframe

Parameters: data – dataframe. The ‘Time’ column is used by default
Returns: an integer

transitionMatrix.utils.preprocessing.transitions_summary(dataframe)

Calculate some summary statistics about transitions

Parameters: dataframe – input dataframe
Returns: dict

transitionMatrix.utils.preprocessing.unique_entities(data)

Identify unique entities in a dataframe

Parameters: data – dataframe. The ‘ID’ column is used by default
Returns: a numpy array

transitionMatrix.utils.preprocessing.unique_states(data)

Identify unique states in a dataframe

Parameters: data – dataframe. The ‘State’ column is used by default for the compact format, with the ‘From’ column as a fallback for the canonical format
Returns: a numpy array

transitionMatrix.utils.preprocessing.unique_timestamps(data)

Identify unique timestamps in a dataframe

Parameters: data – dataframe. The ‘Time’ column is used by default
Returns: a sorted numpy array

transitionMatrix.utils.preprocessing.validate_absorbing_state(dataframe, state)

Validate whether a given state is actually absorbing (there should be no transitions to another state)

Parameters:

dataframe – an input data frame
state (int) – the state to validate

Returns:

a list of exceptions
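The absorbing-state check can be sketched without pandas. The sketch assumes observations are (entity_id, time, state) tuples sorted by time within each entity, and it reports every observation that leaves the supposedly absorbing state; `absorbing_violations` is a hypothetical helper, and the shape of its return value is an assumption.

```python
def absorbing_violations(observations, state):
    """List observations where an entity leaves `state`."""
    violations = []
    previous = {}  # entity_id -> most recently observed state
    for entity_id, time, current in observations:
        # A violation is any move out of the absorbing state
        if previous.get(entity_id) == state and current != state:
            violations.append((entity_id, time, current))
        previous[entity_id] = current
    return violations


observations = [
    (1, 0.0, 0), (1, 1.0, 2), (1, 2.0, 1),  # entity 1 leaves state 2
    (2, 0.0, 0), (2, 1.0, 2),               # entity 2 stays in state 2
]
violations = absorbing_violations(observations, state=2)
print(violations)  # [(1, 2.0, 1)]
```

An empty list means no entity was observed leaving the state, which is consistent with the state being absorbing in this data.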