individual-claims.Rmd
This article outlines the conventions for simulators that generate individual claims data for P&C (also known as general or non-life) insurance.
Note: these guidelines are under active development. Please open an issue or drop by Slack if you have suggestions!
Each row should represent the state of a claim at one point in time.
Good:
#> # A tibble: 3 x 4
#> claim_id accident_year development_year paid_loss
#> <chr> <int> <int> <dbl>
#> 1 00001 2000 1 250
#> 2 00001 2000 2 150
#> 3 00001 2000 3 50
Bad:
#> # A tibble: 1 x 5
#> claim_id accident_year paid_loss_1 paid_loss_2 paid_loss_3
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 00001 2000 250 150 50
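If a simulator naturally produces the wide layout shown under "Bad", it can be reshaped into the preferred long layout before being returned. The following is a minimal sketch, assuming the tidyr and tibble packages are available; the object names `wide` and `long` are illustrative only.

```r
library(tidyr)

# Wide layout: one column per development year (the "Bad" example).
wide <- tibble::tibble(
  claim_id      = "00001",
  accident_year = 2000L,
  paid_loss_1   = 250,
  paid_loss_2   = 150,
  paid_loss_3   = 50
)

# Long layout: one row per claim per development year (the "Good" example).
long <- pivot_longer(
  wide,
  cols            = starts_with("paid_loss_"),
  names_to        = "development_year",
  names_prefix    = "paid_loss_",
  names_transform = list(development_year = as.integer),
  values_to       = "paid_loss"
)

long
```

The result has the columns `claim_id`, `accident_year`, `development_year`, and `paid_loss`, with one row for each of the three development years.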
Output should be at the most granular level available, so that users can aggregate as needed. For example, if the simulator produces accident dates and transaction dates rather than accident years, it should output those dates instead of accident and development periods.
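To illustrate why granular output is preferable, the sketch below shows how a user might roll transaction-level data up to accident year and development year using base R. The column names `accident_date` and `transaction_date`, the sample values, and the convention that development year 1 is the calendar year of the accident are all assumptions made for this example, not conventions fixed by this article.

```r
# Hypothetical transaction-level output: one row per payment.
transactions <- data.frame(
  claim_id         = c("00001", "00001", "00001", "00002"),
  accident_date    = as.Date(c("2000-03-15", "2000-03-15", "2000-03-15", "2000-11-02")),
  transaction_date = as.Date(c("2000-06-01", "2000-09-20", "2001-02-10", "2001-01-05")),
  paid_loss        = c(100, 150, 150, 80),
  stringsAsFactors = FALSE
)

# Extract the calendar year of a Date as an integer.
year_of <- function(d) as.integer(format(d, "%Y"))

transactions$accident_year <- year_of(transactions$accident_date)

# Assumed convention: development year 1 is the calendar year of the accident.
transactions$development_year <-
  year_of(transactions$transaction_date) - transactions$accident_year + 1L

# Sum incremental payments within each claim / accident year / development year.
yearly <- aggregate(
  paid_loss ~ claim_id + accident_year + development_year,
  data = transactions,
  FUN  = sum
)
yearly <- yearly[order(yearly$claim_id, yearly$development_year), ]

yearly
```

Going in the other direction is not possible: once payments have been summed by year, the individual transaction dates cannot be recovered.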
Variables with redundant information should not be included. E.g. if `accident_year` and `reporting_delay` are included, `reporting_lag` should not be included.
In the following table, we list the preferred names for various variables that may be included in the output of a simulator. If you’re writing a conjuror implementation that introduces new names, please open an issue or PR with the proposal.
Description | Preferred Pattern | Example | Type |
---|---|---|---|
Accident period | `accident_*` | `accident_year` | integer |
Identifiers | `*_id` | `claim_id` | character |
Development period | `development_*` | `development_year` | integer |
Incremental paid loss | `paid_loss` | | double |
Categorical claim and policy characteristics | | `claimant_age` | character |
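A simulator author may want to check output column names against the preferred patterns before proposing new ones. The sketch below is one way to do that in base R. The helper `unrecognized_columns()` and the `preferred_patterns` vector are hypothetical and not part of conjuror; the regular expressions are an assumed translation of the wildcard patterns in the table.

```r
# Hypothetical helper, not part of conjuror.
# Regular-expression versions of the wildcard patterns in the table.
preferred_patterns <- c(
  accident    = "^accident_.+",
  identifier  = ".+_id$",
  development = "^development_.+",
  paid_loss   = "^paid_loss$"
)

# Return the column names that match none of the preferred patterns.
unrecognized_columns <- function(column_names, patterns = preferred_patterns) {
  matched <- vapply(
    column_names,
    function(name) any(vapply(patterns, grepl, logical(1), x = name)),
    logical(1)
  )
  column_names[!matched]
}

unrecognized_columns(
  c("claim_id", "accident_year", "development_year", "paid_loss", "loss_paid")
)
```

A flagged name is not necessarily wrong. Categorical characteristics such as `claimant_age` have no pattern in the table and would also be flagged, so the result is best read as a list of names to review.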