Datasets
The transitionMatrix distribution includes a number of datasets to support testing / training objectives. Datasets come in two main types:
- State Transition Data (used in estimation). There are both dummy (synthetic) examples and some actual data. Transition data are usually in CSV format.
- Transition Matrices and Multi-period Sets of matrices (again both dummy and actual examples). Transition matrices are usually in JSON format.
State Transition Data
The example scripts are located in examples/python. For testing purposes, all examples can be run using the run_examples.py script located in the root directory. Some scripts have an example flag that selects alternative input data or estimators.
| File | Format | Events | Entities | States | Generator | Description |
|---|---|---|---|---|---|---|
| rating_data_raw.csv | Compact | 4000 | 1829 | 9 | Extract | A typical credit rating dataset |
| rating_data.csv | Compact | 3780 | 1642 | 9 | Data cleaning script | A typical credit rating dataset |
| scenario_data.csv | Compact | 550 | 50 | 5 | | |
| synthetic_data.csv | Compact | 100 | 10 | 2 | | |
| synthetic_data1.csv | Compact | 100 | 1 | 4 | Generator(=1) | Duration type datasets (Compact format) |
| synthetic_data2.csv | Compact | 10000 | 1000 | 2 | Generator(=2) | Duration type datasets (Compact format) |
| synthetic_data3.csv | Compact | 2000 | 100 | 7 | Generator(=3) | Duration type datasets (Compact format) |
| synthetic_data4.csv | Compact | 10000 | 1000 | 8 | Generator(=4) | Cohort type dataset (Generic Rating Matrix). Offers a semi-realistic example |
| synthetic_data5.csv | Compact | 50000 | 10000 | 3 | Generator(=5) | Large cohort type dataset useful for testing convergence |
| synthetic_data6.csv | Compact | 20000 | 1000 | 2 | Generator(=6) | Cohort type datasets |
| synthetic_data7.csv | Canonical | 1295 | 1000 | 8 | Generator(=7) | Duration type datasets in Long Format |
| synthetic_data8.csv | Canonical | 10000 | 10000 | 2 | Generator(=8) | Duration type datasets in Long Format |
| synthetic_data9.csv | Canonical | 1338 | 1000 | 8 | Generator(=9) | Duration type datasets in Long Format |
| synthetic_data10.csv | Canonical | 12000 | 2000 | 9 | Generator(=10) | Credit Rating Migrations in Long Format / Compact Form |
| test.csv | Compact | 14 | 7 | 3 | | |
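
The Events, Entities and States figures listed for each file can be checked with a few lines of standard-library Python. The following is a minimal sketch, not part of the transitionMatrix API. It assumes each row records one event and that the file has `ID`, `Time` and `State` columns. Those column names, the `summarize` helper and the in-memory sample are illustrative assumptions, so adjust them to the header of the file you actually load.

```python
import csv
import io

# Tiny in-memory stand-in for a compact-format file. The column names
# (ID, Time, State) are assumed for illustration; check the real header.
SAMPLE = """ID,Time,State
0,0.0,0
0,1.5,1
1,0.0,1
1,2.0,0
1,3.5,1
"""


def summarize(handle, id_col="ID", state_col="State"):
    """Count events (rows), distinct entities and distinct states."""
    events = 0
    entities = set()
    states = set()
    for row in csv.DictReader(handle):
        events += 1
        entities.add(row[id_col])
        states.add(row[state_col])
    return {"events": events, "entities": len(entities), "states": len(states)}


summary = summarize(io.StringIO(SAMPLE))
print(summary)  # {'events': 5, 'entities': 2, 'states': 2}
```

To run the same check on a file from the distribution, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` sample.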
Transition Matrices
- generic_monthly
- generic_multiperiod
- JLT
- sp 2017
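
Before using a matrix loaded from JSON, it is worth confirming that it is a valid transition matrix: square, non-negative, and with every row summing to one. The sketch below assumes the matrix is stored as a plain list of rows. That layout, the sample values and the `is_row_stochastic` helper are hypothetical, and the files shipped with the distribution may be structured differently.

```python
import json

# Hypothetical 3-state matrix stored as a list of rows. The layout of the
# shipped JSON files may differ, so treat this as an illustration only.
RAW = "[[0.9, 0.08, 0.02], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]"


def is_row_stochastic(matrix, tol=1e-9):
    """Return True if the matrix is square, non-negative and rows sum to 1."""
    n = len(matrix)
    for row in matrix:
        if len(row) != n:
            return False  # not square
        if any(p < 0 for p in row):
            return False  # negative probability
        if abs(sum(row) - 1.0) > tol:
            return False  # row does not sum to one
    return True


matrix = json.loads(RAW)
valid = is_row_stochastic(matrix)
print(valid)  # True
```

For a multi-period set stored as a list of such matrices, the same check can be applied to each element in turn.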