Backend documentation
Preprocessing module
this module is in charge of:
- supporting event log imports from XES/CSV files.
- formatting the event log so that it can later be used by the nn_manager module. In particular, the timestamps are encoded as integers, the case ids and activity names are encoded, and the rows are sorted by case id and timestamp.
- splitting the event log into training and testing sublogs.
- calculating important values such as the number of activities and the absolute frequency distribution, which are also required for the neural network's training.
- formatting is done automatically after importing, but it can be disabled by setting the corresponding parameter.
- other preprocessing operations, such as replacing NaN values, adding a unique start/end activity to the log, and removing duplicate rows.
Note that this module does not bring the event log into the input format for the RNN; that is done by the module util.py in the subpackage RMTPP_torch.
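The formatting steps described above (label-encoding ids and activities, integer timestamps, sorting by case id and timestamp) can be sketched with plain pandas. This is an illustration of the technique, not the module's actual code, and the XES-style column names are assumptions:

```python
import pandas as pd

def format_event_log(df, case_id="case:concept:name",
                     activity="concept:name", timestamp="time:timestamp"):
    """Label-encode case ids and activities, encode timestamps as
    integers (nanoseconds since epoch), and sort the rows."""
    df = df.copy()
    # label encoding: map each distinct value to an integer code
    df[case_id] = df[case_id].astype("category").cat.codes
    df[activity] = df[activity].astype("category").cat.codes
    # encode timestamps as integers (nanoseconds since the epoch)
    df[timestamp] = pd.to_datetime(df[timestamp]).astype("int64")
    # sort by case id and timestamp, as the nn_manager expects
    return df.sort_values([case_id, timestamp]).reset_index(drop=True)

log = pd.DataFrame({
    "case:concept:name": ["B", "A", "A"],
    "concept:name": ["pay", "order", "ship"],
    "time:timestamp": ["2024-01-02", "2024-01-01", "2024-01-03"],
})
out = format_event_log(log)
```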
Preprocessing
This is the preprocessing unit for our server, which implements all of the above-mentioned functionality.
Source code in server/preprocessing.py, lines 37-364
add_unique_start_end_activity()
if there is no unique start/end activity, add an artificial start and end activity
Source code in server/preprocessing.py, lines 351-364
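The idea can be sketched as follows (toy data; the real method first checks whether a unique start/end activity already exists, and the marker names are assumptions):

```python
def add_unique_start_end(cases, start="<start>", end="<end>"):
    """Prepend/append artificial markers so that every case shares one
    unique start activity and one unique end activity."""
    return {cid: [start] + events + [end] for cid, events in cases.items()}

cases = {"c1": ["a", "b"], "c2": ["b", "c"]}
padded = add_unique_start_end(cases)
```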
encode_df_columns()
- encode the markers and case ids with integers (label encoding)
- encode the timestamps
- returns nothing, but modifies self.event_df
The following holds for self.event_df after this function is called:
- all rows are sorted by case id and timestamp
- the case ids and markers are encoded with integers
- the timestamps are encoded as floats; timezone information is removed.
Source code in server/preprocessing.py, lines 211-270
find_end_activities()
" find the end activities of all cases for an existing log and return a dict with end activities as keys and value is the count of this activity
Source code in server/preprocessing.py, lines 312-321
find_start_activities()
find the start activities of all cases for an existing log and return a dict with start activities as keys and their counts as values
Source code in server/preprocessing.py, lines 305-310
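Both lookups amount to counting the first and the last activity of each case; a minimal stdlib sketch on hypothetical data:

```python
from collections import Counter

# each case is an ordered list of activities (toy data)
cases = {
    "c1": ["register", "check", "pay"],
    "c2": ["register", "pay"],
    "c3": ["call", "check", "pay"],
}

# dicts mapping activity -> number of cases starting/ending with it
start_activities = Counter(events[0] for events in cases.values())
end_activities = Counter(events[-1] for events in cases.values())
```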
get_sample_case()
returns a sample of a case
Source code in server/preprocessing.py, lines 324-329
handle_import(is_xes, path, case_id, timestamp, activity, time_precision=time_precision.TimePrecision.NS, sep=',', formatting=True)
handles the import of the event log.
Parameters:

Name | Type | Description | Default
---|---|---|---
is_xes | bool | If True, the event log is in XES format. If False, it is in CSV format. | required
path | str | Path to the event log. | required
case_id | str | Case id column name. | required
timestamp | str | Timestamp column name. | required
activity | str | Activity column name. | required
time_precision | TimePrecision | Time precision. Defaults to TimePrecision.NS. Note that this functionality is incomplete. | NS
sep | str | Separator. Defaults to ",". | ','
formatting | bool | If True, the event log is formatted so that it can be used by the RNN. Defaults to True. | True
Source code in server/preprocessing.py, lines 69-90
import_event_log(formatting)
helper function for import_event_log_csv and import_event_log_xes.
- generates an EventLog object so that other pm4py functions can use it
- removes all columns other than the three main ones
- removes all NaN entries
- formats the dataframe using pm4py
Effects:
- rows sorted by case id and timestamp
Parameters:

Name | Type | Description | Default
---|---|---|---
formatting | bool | If True, the event log is formatted so that it can be used by the RNN. | required
Source code in server/preprocessing.py, lines 142-197
import_event_log_csv(path, sep, formatting=True)
This is an adapter for format_dataframe such that the event data can be properly used by the RNN.
Parameters:

Name | Type | Description | Default
---|---|---|---
path | str | Path to the event log. | required
sep | str | Separator. | required
formatting | bool | If True, the event log is formatted so that it can be used by the RNN. Defaults to True. | True
Source code in server/preprocessing.py, lines 111-121
import_event_log_dataframe(df, case_id, activity_key, timestamp_key, formatting=True)
This is an adapter for format_dataframe such that the event data can be properly used by the RNN model.
Parameters:

Name | Type | Description | Default
---|---|---|---
df | DataFrame | The event log as a dataframe. | required
case_id | str | Case id column name. | required
activity_key | str | Activity column name. | required
timestamp_key | str | Timestamp column name. | required
formatting | bool | If True, the event log is formatted so that it can be used by the RNN. Defaults to True. | True
Source code in server/preprocessing.py, lines 124-139
import_event_log_xes(path, formatting=True)
Imports an event log in XES format.
Parameters:

Name | Type | Description | Default
---|---|---|---
path | str | Path to the XES file. | required
formatting | bool | If True, the event log is formatted so that it can be used by the RNN. Defaults to True. | True

Effects:
- The event_df dataframe is generated.
- The generated dataframe has 3 columns: case id (string), label (string), and timestamp (datetime64).
- Event log object: its correctness is assumed from the pm4py library and is therefore not tested.
Source code in server/preprocessing.py, lines 93-108
replace_activity_nan_with_mode()
replaces NaN values in the activity column with the mode
Source code in server/preprocessing.py, lines 333-341
split_train_test(train_percentage)
This is a helper function for splitting the event log into training and testing data.
Parameters:

Name | Type | Description | Default
---|---|---|---
train_percentage | float | The percentage of data to be used for training. | required

Returns:

Name | Type | Description
---|---|---
 | tuple | A tuple containing two event logs (dataframes) for training and testing, the number of classes (for the markers), and the absolute frequency distribution for each class in the whole event log.
Source code in server/preprocessing.py, lines 272-301
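A case-level split along these lines can be sketched with pandas. This is illustrative only (the real method's exact split strategy may differ, and the column names are assumptions):

```python
from collections import Counter

import pandas as pd

def split_train_test(df, case_col, act_col, train_percentage):
    """Split by whole cases, and compute the class count and the
    absolute frequency distribution over the whole log."""
    cases = df[case_col].unique()                  # order of appearance
    n_train = int(len(cases) * train_percentage)
    train_cases = set(cases[:n_train])
    train = df[df[case_col].isin(train_cases)]
    test = df[~df[case_col].isin(train_cases)]
    no_classes = df[act_col].nunique()             # number of distinct markers
    freq = Counter(df[act_col])                    # absolute frequencies
    return train, test, no_classes, freq

df = pd.DataFrame({"case": [0, 0, 1, 1, 2, 2],
                   "act":  ["a", "b", "a", "c", "b", "c"]})
train, test, k, freq = split_train_test(df, "case", "act", 2 / 3)
```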
string_to_index(df, column)
translate each marker into a specific integer index.
Source code in server/preprocessing.py, lines 201-208
xes_helper(path)
just a testing function
Source code in server/preprocessing.py, lines 61-66
NN management module
This module is in charge of training the NN Model and also testing it.
The generated model may be exported and also imported later on by this class.
It supports manual training, random search, and grid search.
Config
class containing the configuration for the model.
Attributes:

Name | Type | Description
---|---|---
seq_len | int | The sequence length used for the sliding window.
emb_dim | int | The embedding dimension.
hid_dim | int | The hidden dimension.
mlp_dim | int | The MLP dimension used for the LSTM.
batch_size | int | The batch size.
alpha | float | The alpha value.
dropout | float | The dropout value.
time_precision | TimePrecision | The time precision. Only NS is supported.
lr | float | The learning rate.
epochs | int | The number of epochs.
importance_weight | str | The importance weight (set to a default value as in the RMTPP implementation).
verbose_step | int | The verbose step, just for logging purposes.
cuda | bool | Whether to use the GPU.
absolute_frequency_distribution | Counter | The absolute frequency distribution of the classes.
case_id_le | LabelEncoder | The case ID label encoder.
activity_le | LabelEncoder | The activity label encoder.
exponent | int | The exponent used for the time conversion (see the preprocessing module).
number_classes | int | The number of possible activities in the data.
case_activity_key | str | The case activity key.
case_timestamp_key | str | The case timestamp key.
case_id_key | str | The case ID key.
Source code in server/nn_manager.py, lines 24-139
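The asdict/load_config round-trip can be sketched with a dataclass. This is a simplified stand-in (the real Config also serializes the label encoders and the frequency Counter, and its field set is larger):

```python
from dataclasses import dataclass, asdict, fields

@dataclass
class Config:
    seq_len: int = 10
    emb_dim: int = 32
    hid_dim: int = 64
    mlp_dim: int = 16
    lr: float = 1e-3
    epochs: int = 5
    cuda: bool = False

def load_config(dic):
    """Rebuild a Config from an exported dictionary."""
    # keep only known fields so stale keys in an exported file are ignored
    known = {f.name for f in fields(Config)}
    return Config(**{k: v for k, v in dic.items() if k in known})

cfg = Config(hid_dim=128)
restored = load_config(asdict(cfg))
```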
asdict()
used for exporting as a dictionary
Source code in server/nn_manager.py, lines 73-99
dict_to_encoder(dic)
cast the dictionary to an encoder
Source code in server/nn_manager.py, lines 132-139
encoder_to_dict(encoder)
cast the encoder to a dictionary
Source code in server/nn_manager.py, lines 126-130
load_config(dic)
used for importing
Source code in server/nn_manager.py, lines 100-124
NNManagement
This is the NNManagement class.
Provided functionality:
- Train the model based on the event log.
- Test the model based on the event log.
- Set params.
Source code in server/nn_manager.py, lines 141-376
evaluate()
This is the testing function for the model. It prints out the time_error, precision, recall, and f1 score.
Returns:

Name | Type | Description
---|---|---
time_error | float | The time error.
acc | float | The accuracy.
recall | float | The recall.
f1 | float | The F1 score.
Source code in server/nn_manager.py, lines 159-194
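The reported classification metrics can be computed as follows; this is a stand-in for the real evaluation loop (which also measures the time error), using plain Python and macro averaging as an assumption:

```python
def evaluate_markers(y_true, y_pred):
    """Accuracy, macro recall, and macro F1 over predicted markers."""
    labels = sorted(set(y_true))
    recalls, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        rec = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        recalls.append(rec)
        f1s.append(f1)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return acc, sum(recalls) / len(recalls), sum(f1s) / len(f1s)

acc, rec, f1 = evaluate_markers([0, 0, 1, 1], [0, 1, 1, 1])
```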
export_nn_model(name='trained_model.pt')
generates the .pt file containing the trained model.
The model state dict contains the optimizer state dict.
Source code in server/nn_manager.py, lines 229-243
get_training_statistics()
Returns:

Name | Type | Description
---|---|---
 | str | The accuracy, recall, and F1 score as a JSON object in string format.
Source code in server/nn_manager.py, lines 196-210
grid_search(search_parameters)
Grid search for the best hyperparameters.
We only do this for the hid_dim, mlp_dim and emb_dim parameters. (decided arbitrarily, can be extended to other parameters as well.)
Parameters:

Name | Type | Description | Default
---|---|---|---
search_parameters | dict | Dictionary containing the search parameters: 'hid_dim': [start, end, step], 'mlp_dim': [start, end, step], 'emb_dim': [start, end, step]. | required

Returns:

Name | Type | Description
---|---|---
 | float | The best accuracy.
Source code in server/nn_manager.py, lines 282-313
import_nn_model(path)
imports a .pt file
Source code in server/nn_manager.py, lines 212-226
load_data(train_data, test_data, case_id, timestamp_key, event_key)
imports training and testing sublogs that were preprocessed by the preprocessing module.
It applies the sliding window algorithm to produce subsequences of the same fixed length. The output is passed to the respective DataLoader object, which computes the time differences, casts the input to tensors, and generates the batches.
Source code in server/nn_manager.py, lines 315-334
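The sliding-window step can be sketched as follows (toy sequence; the real method then hands the windows to the DataLoader):

```python
def sliding_windows(seq, seq_len):
    """Return all contiguous subsequences of length seq_len."""
    return [seq[i:i + seq_len] for i in range(len(seq) - seq_len + 1)]

events = [3, 1, 4, 1, 5, 9]
windows = sliding_windows(events, 4)
```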
random_search(search_parameters, iterations)
Random search for the best hyperparameters. Saves the best model in the class.
We only do this for the hid_dim, mlp_dim and emb_dim parameters. (decided arbitrarily, can be extended to other parameters as well.)
Parameters:

Name | Type | Description | Default
---|---|---|---
search_parameters | dict | Dictionary containing the search parameters: 'hid_dim': [start, end], 'mlp_dim': [start, end], 'emb_dim': [start, end]. | required
iterations | int | Number of iterations. | required

Returns:

Name | Type | Description
---|---|---
 | float | The best accuracy.
Source code in server/nn_manager.py, lines 245-280
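Both search strategies over hid_dim, mlp_dim and emb_dim can be sketched as follows; the objective below is a stand-in for training and evaluating the model, and is an assumption for illustration only:

```python
import itertools
import random

def grid_search(params, objective):
    """Exhaustive search over [start, end, step] ranges."""
    best_acc, best_cfg = float("-inf"), None
    grids = [range(*params[k]) for k in ("hid_dim", "mlp_dim", "emb_dim")]
    for hid, mlp, emb in itertools.product(*grids):
        acc = objective(hid, mlp, emb)
        if acc > best_acc:
            best_acc, best_cfg = acc, (hid, mlp, emb)
    return best_acc, best_cfg

def random_search(params, iterations, objective, seed=0):
    """Sample uniformly from [start, end] ranges (inclusive)."""
    rng = random.Random(seed)
    best_acc, best_cfg = float("-inf"), None
    for _ in range(iterations):
        cfg = tuple(rng.randint(*params[k])
                    for k in ("hid_dim", "mlp_dim", "emb_dim"))
        acc = objective(*cfg)
        if acc > best_acc:
            best_acc, best_cfg = acc, cfg
    return best_acc, best_cfg

# stand-in objective: accuracy peaks at hid=64, mlp=32, emb=16
objective = lambda h, m, e: -abs(h - 64) - abs(m - 32) - abs(e - 16)
acc, cfg = grid_search({"hid_dim": [32, 96, 32],
                        "mlp_dim": [16, 48, 16],
                        "emb_dim": [8, 24, 8]}, objective)
rs_acc, rs_cfg = random_search({"hid_dim": [32, 96],
                                "mlp_dim": [16, 48],
                                "emb_dim": [8, 24]}, 50, objective)
```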
train()
This is the main training function.
Parameters:

Name | Type | Description | Default
---|---|---|---
train_data | DataFrame | The training data. | required
test_data | DataFrame | The test data. | required
case_id | str | The column name of the case ID in the data. | required
timestamp_key | str | The key of the timestamp in the data. | required
no_classes | int | The number of known markers. | required
Source code in server/nn_manager.py, lines 338-376
Prediction management module
This module is in charge of administrating prediction generation.
The following two kinds of predictions can be made:
- single predictions (one step into the future; yields the most likely (event, timestamp) pair)
- multiple predictions (generate a predictive tree; these can be saved in a file)
Predictions are also decoded.
This module is also used by the process_model_manager module, which calls the multiple prediction manager repeatedly. Since that manager supports different options for how the cut sequences should be restored, the parametrized function multiple_prediction_linear is implemented, which grants some runtime benefits.
PredictionManager
Source code in server/prediction_manager.py, lines 33-451
__init__(model, case_id_key, activity_key, timestamp_key, config)
Initializes the PredictionManager object.
Parameters:

Name | Type | Description | Default
---|---|---|---
model | object | The model used for doing predictions. | required
case_id_key | str | The case id key of the log. | required
activity_key | str | The activity key of the log. | required
timestamp_key | str | The timestamp key of the log. | required
config | Config | The configuration used for training and important hyperparameters. | required
Source code in server/prediction_manager.py, lines 34-61
append_one_difference_array(lst)
Appends one difference array to self.recursive_time_diffs.
Parameters:

Name | Type | Description | Default
---|---|---|---
lst | list | List used for calculating the contiguous differences. | required
Source code in server/prediction_manager.py, lines 166-177
append_to_log(time, event)
Appends a window and a difference array to an existing list instead of calling ATMDataset and Dataloader on each iterative call of the prediction generator.
Parameters:

Name | Type | Description | Default
---|---|---|---
time | float | The newly predicted timestamp. | required
event | | The newly predicted event. | required
Source code in server/prediction_manager.py, lines 324-340
backtracking_prediction_tree(c_t, c_e, c_d, depth, degree, current_path)
use backtracking to generate all paths from the given last timestamp and marker, considering the input degree as a branching threshold and the maximum depth for the generated tree.
Source code in server/prediction_manager.py, lines 275-293
check_input_uniqueness()
the input df must contain only one process; hence check if there is one unique case_id
Source code in server/prediction_manager.py, lines 84-88
decode_paths()
used for decoding the events and timestamps in the generated paths. The timestamps are NOT decoded, since the predictions are TIMEDELTAS
Source code in server/prediction_manager.py, lines 350-365
get_differences()
calculates time differences.
Source code in server/prediction_manager.py, lines 154-163
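Computing contiguous time differences for a window can be sketched as (toy timestamps; 0 is assumed for the first event, which has no predecessor):

```python
def contiguous_differences(times):
    """Pairwise differences between consecutive timestamps,
    with 0 for the first event (no predecessor)."""
    return [0] + [b - a for a, b in zip(times, times[1:])]

diffs = contiguous_differences([1.0, 2.5, 4.0, 7.0])
```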
get_dummy_process(df, case_id_column)
just used for testing; create a dummy input df.
Source code in server/prediction_manager.py, lines 64-82
jsonify_paths()
note that we just save the probability of the last pair (time, event) in the path, since the nn calculates lambda*(t), which is the probability of the last predicted event happening in the predicted time t.
Path markers are assumed to be decoded.
Source code in server/prediction_manager.py, lines 367-415
jsonify_single(time_pred, event_pred, prob)
note that we just save the probability of the last pair (time, event) in the path, since the nn calculates lambda*(t) (see paper), which is the probability of the last predicted event happening in the predicted time t.
Source code in server/prediction_manager.py, lines 120-150
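Serializing a single (time, event, probability) prediction can be sketched as follows; the JSON field names here are hypothetical, not the module's actual schema:

```python
import json

def jsonify_single(time_pred, event_pred, prob):
    # only the probability of the last (time, event) pair is stored,
    # matching the lambda*(t) quantity from the RMTPP formulation
    return json.dumps({"time": time_pred, "event": event_pred,
                       "probability": prob})

payload = jsonify_single(3.5, "ship", 0.82)
```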
linear_iterative_predictor(depth, start_time, start_event)
makes predictions linearly (i.e. no backtracking and branching degree = 1) and iteratively (no recursion)
Source code in server/prediction_manager.py, lines 230-243
linear_iterative_predictor_non_stop(start_time, start_event, upper=float('inf'))
Predicts the path of events iteratively until an end activity is found or the upper bound is reached.
Parameters:

Name | Type | Description | Default
---|---|---|---
start_time | float | The start time of the path. | required
start_event | | The start event of the path. | required
upper | float | The upper bound for the number of iterations. Defaults to float("inf"). | float('inf')
Source code in server/prediction_manager.py, lines 206-227
multiple_prediction(depth, degree)
Get a list of possible paths starting at the last timestamp and event pair.
Parameters:

Name | Type | Description | Default
---|---|---|---
depth | int | The number of steps in the future to be predicted. | required
degree | int | The number of predictions on each step to be considered. | required
This method loads data, gets windows, computes paths, and decodes paths. It requires the configuration used for the NN, which is required by the ATM Dataset.
Source code in server/prediction_manager.py, lines 245-272
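The depth/degree expansion with backtracking can be sketched over a stand-in next-step predictor; the transitions table below replaces the RNN and is purely illustrative:

```python
def predict_paths(state, depth, degree, top_k, path=None, out=None):
    """Enumerate all paths of length `depth`, branching over the
    `degree` most likely next events at each step."""
    path = [] if path is None else path
    out = [] if out is None else out
    if depth == 0:
        out.append(list(path))
        return out
    for event, prob in top_k(state, degree):
        path.append((event, prob))
        predict_paths(event, depth - 1, degree, top_k, path, out)
        path.pop()  # backtrack: restore the partial path
    return out

# stand-in predictor: from any state, its successors sorted by likelihood
transitions = {"a": [("b", 0.6), ("c", 0.4)],
               "b": [("c", 0.9), ("a", 0.1)],
               "c": [("a", 0.5), ("b", 0.5)]}
top_k = lambda state, k: transitions[state][:k]
paths = predict_paths("a", depth=2, degree=2, top_k=top_k)
```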
multiple_prediction_dataframe(depth, degree, df, linear=False, non_stop=False, upper=30)
make multiple predictions given a dataframe. The preprocessor is in charge of reading/importing from CSV, XES, command line, etc. It is assumed that the event log contains only one case id.
Source code in server/prediction_manager.py, lines 417-451
multiple_prediction_linear(depth, nonstop, upper)
this is a special case of multiple prediction where the degree = 1. We avoid backtracking and recursion for efficiency reasons.
Source code in server/prediction_manager.py, lines 179-203
pop_from_log()
used for backtracking to restore the old path
Source code in server/prediction_manager.py, lines 342-348
single_prediction()
make one prediction given a partial process.
Source code in server/prediction_manager.py, lines 104-118
single_prediction_dataframe(df)
make one prediction given a dataframe. The preprocessor is in charge of reading/importing from CSV, XES, command line, etc.
Source code in server/prediction_manager.py, lines 90-102
Process model manager module
This module implements all necessary functions for conformance checking and fitness analysis.
Functions:

Name | Description
---|---
cut_event_log_tail | Cuts each case in the event log from the tail.
cut_event_log_random | Cuts each case in the event log at random indices.
reconstruct_event_log | Reconstructs the event log using the prediction manager.
process_mining | Applies a process mining algorithm to the reconstructed event log.
conformance_checking_token_based | Performs token-based conformance checking on the reconstructed event log.
conformance_checking_alignment_based | Performs alignment-based conformance checking on the reconstructed event log.
import_petri_net | Imports a Petri net.
export_petri_net | Exports a Petri net.
decode_predictions | Decodes the predictions in the event log.
This module allows for the analysis of fitness by cutting the event log, reconstructing it using predictions, and applying process mining and conformance checking algorithms.
ProcessModelManager
Source code in server/process_model_manager.py, lines 26-386
alpha_miner(path)
Run alpha miner on the predictive log and generate a Petri net.
Parameters:

Name | Type | Description | Default
---|---|---|---
path | str | Path used for saving the generated Petri net. | required
Source code in server/process_model_manager.py, lines 324-340
decode_df(df)
decodes the predictive df; inverse transform timestamps and event names.
Source code in server/process_model_manager.py, lines 233-258
decode_sequence(sequence)
decodes the input sequence that contains a df.
:return: sequence that has been decoded.
Source code in server/process_model_manager.py, lines 205-211
fill_up_log(upper, non_stop, random_cuts, cut_length, input_sequences, cuts)
do the predictions for each cut sequence and extend the event log so that it now contains the predictions.
Source code in server/process_model_manager.py, lines 109-162
format_columns()
exporting to csv changes the datetime types to object, but we need them to be datetime.
Source code in server/process_model_manager.py, lines 294-302
generate_predictive_log(new_log_path, max_len=15, upper=30, non_stop=False, random_cuts=False, cut_length=0)
generates a predictive log. Each process is cut at some given index, and the model is used to
reconstruct the rest of the process. There are so far three possible modes for cutting and prediction generation:
- for tail cuts: set the cut_length value and set random_cuts to false
- for random cuts with cut memory: set random_cuts to true and non_stop to false
- for random cuts non-stop: set random_cuts to true and non_stop to true
Parameters:

Name | Type | Description | Default
---|---|---|---
max_len | int | Max length for the cut sequences, i.e. the max input sequence length. | required
upper | | Upper bound for the non-stop random cutter, i.e. how many iterations a non-stop iterative predictor should run before reaching an end state. | 30
non_stop | | Must be set to true if the predictions are done until reaching a final marking. | False
random_cuts | | Set to true to cut at random indices. | False
cut_length | | When cutting fixed tail lengths, the tail length to cut for all sequences. | 0
Source code in server/process_model_manager.py, lines 164-196
handle_nat(group)
the inverse transformation for timestamps is lossy and might lead to NaT entries. A timedelta of k seconds with respect to the last valid timestamp is set as the timestamp value for the k-th NaT entry.
:param group: a group in the predictive df that contains only one case id.
:return: the same group, now with valid timestamps.
Source code in server/process_model_manager.py, lines 213-231
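The repair rule (the k-th NaT becomes last valid timestamp + k seconds) can be sketched with pandas, assuming the group's first timestamp is valid:

```python
import pandas as pd

def handle_nat(timestamps):
    """Replace the k-th consecutive NaT after the last valid
    timestamp with last_valid + k seconds."""
    fixed, last_valid, k = [], None, 0
    for ts in timestamps:
        if pd.isna(ts):
            k += 1
            ts = last_valid + pd.Timedelta(seconds=k)
        else:
            last_valid, k = ts, 0
        fixed.append(ts)
    return fixed

ts = [pd.Timestamp("2024-01-01 10:00:00"), pd.NaT, pd.NaT,
      pd.Timestamp("2024-01-01 11:00:00")]
out = handle_nat(ts)
```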
heuristic_miner(path, dependency_threshold=0.5, and_threshold=0.65, loop_two_threshold=0.5, view=False)
Run heuristic miner on the predictive log and generate a Petri net.
Parameters:

Name | Type | Description | Default
---|---|---|---
path | str | Path used for saving the generated Petri net. | required
dependency_threshold | float | Dependency threshold parameter for heuristic miner. | 0.5
and_threshold | float | AND threshold parameter for heuristic miner. | 0.65
loop_two_threshold | float | Loop two threshold parameter for heuristic miner. | 0.5
Source code in server/process_model_manager.py, lines 270-292
import_predictive_df(path)
used for importing a predictive df.
Source code in server/process_model_manager.py, lines 260-264
inductive_miner(path, noise_threshold=0)
Run inductive miner on the predictive log and generate a Petri net.
Parameters:

Name | Type | Description | Default
---|---|---|---
path | str | Path used for saving the generated Petri net. | required
noise_threshold | float | Noise threshold parameter for inductive miner. | 0
Source code in server/process_model_manager.py, lines 304-322
initialize_variables()
initialize variables for the predictive log generator
Source code in server/process_model_manager.py, lines 45-58
prefix_tree_miner(path)
Run prefix tree miner on the predictive log and generate a Petri net.
Parameters:

Name | Type | Description | Default
---|---|---|---
path | str | Path used for saving the generated Petri net. | required
Source code in server/process_model_manager.py, lines 342-358
random_cutter(case_id_counts, max_len, cuts, input_sequences)
Cuts each sequence contained in input_sequences at random indices.
Parameters:

Name | Type | Description | Default
---|---|---|---
cuts | dict | The cut index and cut length are preserved. | required
case_id_counts | Series | Number of rows for each case_id. | required
max_len | int | Max length that the input sequence can have. Can be set to improve runtime. TODO: allow INF for max_len. | required
input_sequences | list | List of sequences to be cut. | required
Source code in server/process_model_manager.py, lines 86-107
tail_cutter(case_id_counts, cut_length, cuts, input_sequences)
cut sequences cut_length steps from the tail.
:param cut_length: how many steps to cut from the tail of each sequence.
:param case_id_counts: number of steps in each case_id.
:param input_sequences: list of sequences to be cut.
Side effect: the predictive_df is extended with the cut sequences.
Source code in server/process_model_manager.py, lines 62-84
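Both cutters can be sketched on toy sequences; this is an illustration only (the real methods also record the cuts and extend the predictive df):

```python
import random

def tail_cut(seq, cut_length):
    """Drop cut_length events from the tail (keep at least one event)."""
    return seq[:max(1, len(seq) - cut_length)]

def random_cut(seq, rng, max_len):
    """Cut at a random index, keeping at most max_len events."""
    cut = rng.randint(1, min(len(seq), max_len))
    return seq[:cut]

seq = ["a", "b", "c", "d", "e"]
head = tail_cut(seq, 2)
rnd = random_cut(seq, random.Random(0), max_len=3)
```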