Paying physician group practices for quality: A statewide quasi-experiment

Vol. 1, No. 3-4
December 2013
Conrad, D.A., Grembowski, D., Perry, L. et al.

This article presents the results of a unique quasi-experiment on the effects of a large-scale pay-for-performance (P4P) program implemented by a leading health insurer in Washington state during 2001–2007. The authors received external funding to provide an objective impact evaluation of the program. The program was unique in several respects: (1) It was designed dynamically, with two discrete intervention periods: one in which payment incentives were based on relative performance (the “contest” period) and a second in which payment incentives were based on absolute performance compared to achievable benchmarks. (2) The program was designed in collaboration with large multispecialty group practices, with an explicit run-in period to test the quality metrics. Public reporting of the quality scorecard for all participating medical groups was introduced one year before the quality incentive payment program's inception and continued throughout 2002–2007. (3) The program was implemented in stages with distinct medical groups. A control group of comparable group practices was also assembled, and difference-in-differences methodology was applied to estimate program effects. Case mix measures were included in all multivariate analyses.

The regression design permitted a contrast of intervention effects between the “contest” approach in the 2003–2004 sub-period and the absolute-standard, “achievable benchmarks of care” approach in the 2005–2007 sub-period. Most of the statistically significant quality incentive program coefficients were small and negative (opposite to program intent). No consistent pattern of differential intervention impact between the sub-periods emerged.
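
As a rough illustration of the difference-in-differences logic behind this sub-period contrast, the Python sketch below compares the change in a binary quality indicator for intervention groups with the change for control groups, separately for each sub-period. It is a simplified stand-in that uses raw differences in means; it does not reproduce the probit regressions or case mix adjustment used in the study. The function name `did_estimate`, the dictionary layout, and all numbers are hypothetical and chosen only for illustration; none of them come from the evaluation data.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Unadjusted difference-in-differences of group means.

    Each argument is a list of 0/1 outcomes (1 = quality measure met).
    Returns (change in treated mean) minus (change in control mean).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up outcomes for illustration only (NOT data from the study).
baseline = {"treated": [1, 0, 1, 1], "control": [1, 0, 1, 0]}
contest_period = {"treated": [1, 0, 1, 1], "control": [1, 1, 1, 0]}
benchmark_period = {"treated": [1, 1, 1, 1], "control": [1, 1, 1, 0]}

# One estimate per intervention sub-period, each measured against baseline.
contest_effect = did_estimate(
    baseline["treated"], contest_period["treated"],
    baseline["control"], contest_period["control"],
)
benchmark_effect = did_estimate(
    baseline["treated"], benchmark_period["treated"],
    baseline["control"], benchmark_period["control"],
)

print(f"contest sub-period DiD:   {contest_effect:+.2f}")
print(f"benchmark sub-period DiD: {benchmark_effect:+.2f}")
```

A negative estimate here means the intervention groups improved less than the control groups over the same interval, which is the sense in which a program coefficient can run opposite to program intent.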

Cumulatively, the probit regression estimates indicate that neither the quality scorecard nor the quality incentive payment program had a significant positive effect on general clinical quality. Based on key informant interviews with medical leaders, practicing physicians, and administrators of the participating groups, the authors conclude that several factors likely combined to dampen program effects: (1) modest size of the incentive; (2) use of rewards only, rather than a balance of rewards and penalties; and (3) targeting incentive payments to the group, thus potentially weakening incentive effects at the individual level.

