
01_Interactive_Outlier_Detection

Outlier Detection in Medical Claims

This workflow identifies outliers in medical claim data, such as claims with an unusually high cost for a certain disease. First, the input data is grouped by the target column (disease). Second, within each group, the interquartile range (IQR), i.e. the difference between the 3rd and 1st quartile (Q3 - Q1), is computed for the numerical column in question (cost). Outliers are all records that do not lie inside the permitted interval [Q1 - k*IQR, Q3 + k*IQR], where the factor k is specified by the analyst. The target and numerical columns can be defined in the configuration dialogs of the components. The lower branch of the workflow refines this approach and allows outliers to be identified across several target columns, e.g. an unusually long or short length of stay for a given combination of disease and payment amount.
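The grouped IQR rule described above can be sketched in plain Python as follows. This is a minimal illustration, not the workflow's implementation. It assumes each claim is a dict, uses hypothetical field names, and computes quartiles with `statistics.quantiles(..., method="inclusive")`, which may differ from the quartile estimation used inside the KNIME components. A record is flagged when its value falls outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] of its own group. Passing more than one group column corresponds to the pair-column variant of the lower branch.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_outliers(records, group_cols, value_col, k=1.5):
    """Split records into (lower, upper) outliers per group.

    A record is an outlier if record[value_col] lies outside
    [Q1 - k*IQR, Q3 + k*IQR], where Q1/Q3 are computed only over
    the records sharing the same values in group_cols.
    """
    # Bucket records by the combination of their group column values.
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[c] for c in group_cols)
        groups[key].append(rec)

    lower, upper = [], []
    for recs in groups.values():
        values = [r[value_col] for r in recs]
        if len(values) < 2:
            continue  # quartiles are not meaningful for a single value
        # Assumption: "inclusive" quartile estimation; KNIME may differ.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
        for r in recs:
            if r[value_col] < low_fence:
                lower.append(r)
            elif r[value_col] > high_fence:
                upper.append(r)
    return lower, upper
```

For the single-column case, group by disease only, e.g. `iqr_outliers(claims, ["drg"], "cost", k=1.5)`; for the pair-column case, pass two group columns, e.g. `iqr_outliers(claims, ["drg", "pmt_cd"], "los")`. All field names here are hypothetical placeholders. A larger k widens the fences and flags fewer records.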

Workflow annotations:
Configure the component to prefilter the input data.
Standard preprocessing, e.g. label replacement.
Configure the components to select the group values (e.g. diseases) for which detailed information is shown.
Configure the components to specify the group and outlier columns (e.g. disease and payment amount) as well as the outlier factor.

General Information
Press the green double arrow above to run the complete workflow. The grey nodes are components that contain a sub-workflow, which you can inspect and adapt to your needs with Ctrl + double-click. Inside a component you will find configuration nodes, which pass settings to subsequent nodes. To change the preset settings, select "Configure" in the component's context menu, or simply double-click the component. The File Reader node uses a workflow-relative path to read the 2008_BSA_Inpatient_Claims_PUF file, which is located in the data folder inside the workflow folder. The workflow uses shared components to avoid duplicating logic. This simplifies maintenance, because a change to one shared component is reflected in its other instances.

Data Description
The workflow analyses the Basic Stand Alone (BSA) Inpatient Public Use File (PUF) named "CMS 2008 BSA Inpatient Claims PUF", which contains information from 2008 Medicare inpatient claims. This is a claim-level file in which each record is an inpatient claim incurred by a 5% sample of Medicare beneficiaries. The PUF provides several demographic and claim-related variables, detailed below. However, because beneficiary identities are not provided, claims that belong to the same beneficiary cannot be linked in the CMS 2008 BSA Inpatient Claims PUF.

Data fields:
The file contains seven (7) variables: a primary claim key indexing the records and six (6) analytic variables, listed below. One of the analytic variables, claim cost, is provided in two forms: (a) as an integer category and (b) as a dollar average. These two versions are essentially equivalent. As they can be treated as one variable, there are six (6) rather than seven (7) analytic variables, in addition to the claim ID.
1.) Primary claim key.
2.) Age (BENE_AGE_CAT_CD): the beneficiary's age, reported in six categories: (1) under 65, (2) 65-69, (3) 70-74, (4) 75-79, (5) 80-84, (6) 85 and above.
3.) Gender (BENE_SEX_IDENT_CD): (1) male or (2) female.
4.) Base DRG (IP_CLM_BASE_DRG_CD): a set of 311 possible codes, numbered 1-311, derived from MS-DRG codes. It identifies a basic diagnosis or a set of diagnoses. A base DRG code may comprise up to three MS-DRG codes.
5.) ICD-9 primary procedure code (IP_CLM_ICD9_PRCDR_CD): "International Classification of Diseases", version 9. This is a two-digit code reported as 00-99; 85 such codes are observed in the PUF. This is the only variable with missing values (about 47% missing), which means that the claim has no primary procedure.
6.) Length (IP_CLM_DAYS_CD): the length of stay, reported in four categories: (1) 1 day, (2) 2-4 days, (3) 5-7 days, and (4) 8 or more days.
7.) Amount (IP_DRG_QUINT_PMT_AVG and IP_DRG_QUINT_PMT_CD): up to five (5) categories for each base DRG code. Within each base DRG code, the original claim amounts in the entire population (excluding negative payments) are broken into approximate quintiles (identified by IP_DRG_QUINT_PMT_CD, from 1 to 5).

The goal of the workflow is to identify outliers in the medical claim data, such as claims with an unusually high cost for a certain disease.
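The "label replacement" preprocessing step can be illustrated with a small lookup built from the category codes listed in the data fields. This is a hypothetical sketch, not the Preprocess Labels component itself: it assumes the coded fields are read as integers and covers only the age, gender and length-of-stay codes documented here. The actual component may map other fields (for example, base DRG codes to disease names) using lookup tables that this description does not include.

```python
# Code-to-label mappings taken from the data field descriptions.
AGE_LABELS = {1: "under 65", 2: "65-69", 3: "70-74", 4: "75-79", 5: "80-84", 6: "85 and above"}
SEX_LABELS = {1: "male", 2: "female"}
DAYS_LABELS = {1: "1 day", 2: "2-4 days", 3: "5-7 days", 4: "8 or more days"}

# Which column uses which mapping (column names as given in the PUF).
LABEL_MAPS = {
    "BENE_AGE_CAT_CD": AGE_LABELS,
    "BENE_SEX_IDENT_CD": SEX_LABELS,
    "IP_CLM_DAYS_CD": DAYS_LABELS,
}


def replace_ids_with_names(record):
    """Return a copy of record with coded fields replaced by readable labels.

    Codes without a known label (and columns not in LABEL_MAPS) are kept as-is.
    """
    out = dict(record)
    for column, labels in LABEL_MAPS.items():
        if column in out:
            out[column] = labels.get(out[column], out[column])
    return out
```

Leaving unknown codes unchanged, rather than raising an error, keeps the sketch tolerant of values outside the documented categories.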
[Workflow diagram: File Reader (2008_BSA_Inpatient_Claims_PUF), Preprocess Labels (replace IDs with names), Parameterized Data Filtering, Single Column Outlier Detection and Pair Column Outlier Detection (outputs: 1: all outliers, 2: lower top x groups, 3: upper top x groups), and two Details for Group components (detailed records for selected diseases; outputs: 1: lower outlier records, 2: upper outlier records).]
