This article outlines the conventions for simulators that generate individual claims data for P&C (also known as general or non-life) insurance.

Note: these guidelines are under active development. Please open an issue or drop by Slack if you have suggestions!

Data Format

Each row should represent the state of a claim at one point in time.

Good:

#> # A tibble: 3 x 4
#>   claim_id accident_year development_year paid_loss
#>   <chr>            <int>            <int>     <dbl>
#> 1 00001             2000                1       250
#> 2 00001             2000                2       150
#> 3 00001             2000                3        50

Bad:

#> # A tibble: 1 x 5
#>   claim_id accident_year paid_loss_1 paid_loss_2 paid_loss_3
#>   <chr>            <int>       <dbl>       <dbl>       <dbl>
#> 1 00001             2000         250         150          50
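If a simulator natively produces the wide layout, it can be reshaped into the preferred long layout before being returned. The following is a minimal base R sketch, assuming the wide columns follow a paid_loss_<development year> naming pattern as in the "Bad" example; the object names wide and long are illustrative only.

```r
# Hypothetical wide output: one row per claim, one column per development year.
wide <- data.frame(
  claim_id = "00001",
  accident_year = 2000L,
  paid_loss_1 = 250,
  paid_loss_2 = 150,
  paid_loss_3 = 50,
  stringsAsFactors = FALSE
)

# Columns holding incremental paid losses, identified by their prefix.
loss_cols <- grep("^paid_loss_", names(wide), value = TRUE)

# Build one block of rows per development year and stack the blocks.
long <- do.call(rbind, lapply(loss_cols, function(col) {
  data.frame(
    claim_id = wide$claim_id,
    accident_year = wide$accident_year,
    # The development year is the numeric suffix of the column name.
    development_year = as.integer(sub("^paid_loss_", "", col)),
    paid_loss = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# Order by claim, then by development year, and reset the row names.
long <- long[order(long$claim_id, long$development_year), ]
rownames(long) <- NULL
```

Each row of long now describes one claim at one point in time, matching the "Good" layout.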

Output should be at the most granular level the simulator supports, so that users can aggregate as needed. For example, if the simulator produces accident dates and transaction dates rather than accident years, include those dates instead of accident and development periods.
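To illustrate why granular output loses nothing, here is a sketch of how a user might roll transaction-level data up to accident and development years. The column names accident_date and transaction_date, the helper year_of, and the calendar-year definition of development year are all assumptions made for this example, not conventions set by this article.

```r
# Hypothetical transaction-level output: one row per payment.
transactions <- data.frame(
  claim_id = c("00001", "00001", "00001"),
  accident_date = as.Date(c("2000-03-15", "2000-03-15", "2000-03-15")),
  transaction_date = as.Date(c("2000-06-01", "2000-11-20", "2001-02-10")),
  paid_loss = c(100, 150, 150),
  stringsAsFactors = FALSE
)

# Helper (illustrative): extract the calendar year of a Date as an integer.
year_of <- function(d) as.integer(format(d, "%Y"))

# Assumed definition: development year 1 is the calendar year of the accident.
transactions$accident_year <- year_of(transactions$accident_date)
transactions$development_year <-
  year_of(transactions$transaction_date) - transactions$accident_year + 1L

# Sum incremental payments within each claim / accident year / development year.
yearly <- aggregate(
  paid_loss ~ claim_id + accident_year + development_year,
  data = transactions,
  FUN = sum
)
```

Going the other way, from yearly totals back to individual transactions, is not possible, which is why the finer level is preferred as output.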

Do not include variables that carry redundant information. For example, if accident_year and reporting_delay are included, reporting_lag should not be included.

Column Names and Types

In the following table, we list the preferred names for various variables that may be included in the output of a simulator. If you’re writing a conjuror implementation that introduces new names, please open an issue or PR with the proposal.

Description                                   Preferred Pattern  Example           Type
Accident period                               accident_*         accident_year     integer
Identifiers                                   *_id               claim_id          character
Development period                            development_*      development_year  integer
Incremental paid loss                         paid_loss                            double
Categorical claim and policy characteristics                     claimant_age      character
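The table can also be turned into a lightweight check on simulator output. The sketch below is one possible approach, not part of conjuror: expected_types and check_column_types are hypothetical names, and only the concrete example columns from the table are checked, since wildcard patterns such as accident_* may also match date columns.

```r
# Expected storage types for the example columns in the table (illustrative).
expected_types <- c(
  claim_id = "character",
  accident_year = "integer",
  development_year = "integer",
  paid_loss = "double"
)

# Hypothetical helper: stop if any known column has an unexpected type.
# Columns that are absent from `claims` are skipped.
check_column_types <- function(claims, expected = expected_types) {
  present <- intersect(names(expected), names(claims))
  actual <- vapply(claims[present], typeof, character(1))
  mismatched <- present[actual != expected[present]]
  if (length(mismatched) > 0) {
    stop("Unexpected type for column(s): ", paste(mismatched, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage: a conforming data frame passes silently.
claims <- data.frame(
  claim_id = "00001",
  accident_year = 2000L,
  development_year = 1L,
  paid_loss = 250,
  stringsAsFactors = FALSE
)
check_column_types(claims)
```

A common failure this catches is a year stored as a double (2000 rather than 2000L), which prints identically but violates the integer convention.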