Assessing Dangerousness
Sex Offenders and Risk Assessment
Sex offenders present a special problem, and researchers have attempted to predict sexual recidivism apart from general violent or
criminal recidivism. Hanson and Bussiere (1998) systematically reviewed the research literature on sexual reoffending. Using meta-analysis
(a statistical technique that allows the researcher to combine the results of many separate studies) and 61 datasets representing more
than 20,000 sex offenders, Hanson and Bussiere identified seven factors that showed a significant correlation with reoffense. Subsequent
analyses showed that four variables (number of prior sex offense charges, age upon release, any male victims, any unrelated victims)
predicted reoffense as well as all seven did; in other words, the other three variables were redundant. Hanson formalized his findings
into the Rapid Risk Assessment for Sexual Offense Recidivism (RRASOR; pronounced "razor"). Unlike most such instruments, a probation officer
or anyone else sufficiently familiar with the scoring criteria can validly score the RRASOR. Although other, more recent measures have shown
slightly higher predictive accuracy, the RRASOR remains a viable instrument, partly for its ease of use.
Several other specialized instruments exist for predicting sexual recidivism. Hanson and Thornton (1999) created the Static-99, which adds
six items to the four items on the RRASOR. The authors of the VRAG have created a parallel version for sex offenders, the Sex Offender Risk
Appraisal Guide (SORAG). Several different variables are incorporated, and one changes direction from its use in
the VRAG: Degree of injury to the victim raises the predicted risk level in the SORAG rather than lowering it, as it does in the VRAG.
This is probably because victim injury among sexual offenders, other than rapists, is rare and reflects sadism and escalating violence.
Among criminal offenders in general, by contrast, murderers typically have a low recidivism rate and many do not have a criminal background.
Thus they are not high risk relative to their peers. Lastly, the authors of the HCR-20 have created a similar instrument designed to assess
sexual reoffending, the SVR-20. As with the SORAG, similar variables are considered, but sexual history items are emphasized, particularly
sexual offenses. There is less research on the SORAG and SVR-20 than on their general violence prediction cousins, and at least one study
found disappointing results for the SVR-20. A recent comparison of risk assessment instruments for sex offenders (Barbaree, Seto, Langton, &
Peacock, 2001) found the RRASOR, Static-99, VRAG, and SORAG to be comparable in predicting sexual, "serious," or "any" recidivism,
although the PCL-R was a weak predictor of sexual recidivism.
How Accurate Are Risk Assessment Instruments?
This is a surprisingly difficult question to answer. Take the 1982 statement from the American Psychiatric Association, where the good
doctors of the time admitted that when they predicted violence, they were likely to be wrong two out of three times. Although this is
not a stellar performance (and there is reason to believe that many acts of violence were not detected), it does not tell the whole
story. Missing from this statement are patients who were implicitly predicted not to be violent. Even if psychiatrists were wrong
two of three times when predicting a violent outcome, they may have been accurate 100% of the time when they predicted someone would not be
violent. Overall accuracy is the number of correct decisions divided by the total number of decisions made. In
this example, that figure might equal 66% or 97%, depending on how many nonviolent patients there were and how accurate the predictions
were for them. These relationships are illustrated in Table 5.
Table 5 - Types of Decision Outcomes
| | Violence Predicted: Yes | Violence Predicted: No |
| Actually Violent: Yes | Hits (True Positives) | Misses (False Negatives) |
| Actually Violent: No | Misses (False Positives) | Hits (True Negatives) |
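As a rough illustration of how the overall figure depends on the patients who were predicted not to be violent, the short Python sketch below computes overall accuracy from the four cells of Table 5. The counts are hypothetical, chosen only so that one prediction of violence in three is correct; they are not taken from any study.

```python
def overall_accuracy(tp, fn, fp, tn):
    """Proportion of all decisions that were correct: hits / total decisions."""
    total = tp + fn + fp + tn
    return (tp + tn) / total


# Hypothetical counts: 30 patients predicted violent, only 10 of them correctly.
# Scenario A: just 30 other patients, all correctly predicted nonviolent.
small_sample = overall_accuracy(tp=10, fn=0, fp=20, tn=30)

# Scenario B: 600 other patients, all correctly predicted nonviolent.
large_sample = overall_accuracy(tp=10, fn=0, fp=20, tn=600)

print(f"Scenario A: {small_sample:.1%}")
print(f"Scenario B: {large_sample:.1%}")
```

With these made-up counts the first scenario comes out at about two thirds and the second at about 97%, even though the predictions of violence are identical in both.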
Usually social scientists express the degree of relation between two variables with a correlation coefficient. A correlation coefficient
can range from -1.0 to 1.0. A correlation of 1.0 indicates one variable is perfectly predictable from the other. A correlation coefficient
of -1.0 is just as useful, but indicates the relationship is inverse: as the value of one variable goes up, the other goes down. For
example, the authors of the VRAG found that the severity of the index offense was negatively correlated (-.18) with risk of future arrest.
This (-.18) is a negative correlation of low to moderate magnitude. The best measures, such as the PCL-R and VRAG, typically show
correlations with future violence on the order of .25 to .45. However, there are two problems with correlations in this context. One is
that the criterion in risk prediction studies is often dichotomous: did the person commit a violent act or not? Correlations work best on
continuous variables. When one is constrained to only two values (violent or not), the maximum correlation possible is not 1.0, but .67.
In other words, the degree of association can be substantially underestimated. Correlations are also strongly affected by baserates, in
this case the percentage of the total sample that is actually violent during the follow-up period. If the baserate of a behavior is very
low (say 0.1%), it is nearly impossible to achieve a high correlation, or overall accuracy rate, unless you predict everyone will behave
as most of the people in the group do.
For example, assume you are approached by the Catholic Church, which wants you to screen applicants for the convent for potential
dangerousness. You would probably be accurate 99.99% of the time by predicting that every nun you evaluate is not a risk. If you were to
identify one nun as possibly violent, it is very likely you would be wrong and lower your accuracy rate. If you want to make sure you
identify that one potentially violent nun in 10,000, you would probably want to select the ten, twenty, or fifty that you think are the
most likely candidates. However, you would add at least nine, nineteen, or forty-nine (respectively) false positives to your batting
average, and lower it.
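The arithmetic behind the convent example can be sketched in a few lines of Python. The function name and its defaults are illustrative assumptions; it takes the best case, in which the one violent applicant is always among those flagged.

```python
def screening_accuracy(n_flagged, population=10_000, n_violent=1):
    """Best-case overall accuracy when n_flagged applicants are labeled risky.

    Assumes (optimistically) that the truly violent applicants are among
    those flagged whenever enough applicants are flagged.
    """
    true_positives = min(n_flagged, n_violent)
    false_positives = n_flagged - true_positives
    false_negatives = n_violent - true_positives
    true_negatives = population - n_violent - false_positives
    assert true_positives + false_positives + false_negatives + true_negatives == population
    return (true_positives + true_negatives) / population


for n_flagged in (0, 10, 20, 50):
    print(n_flagged, f"{screening_accuracy(n_flagged):.2%}")
```

Flagging no one yields 99.99% accuracy. Flagging fifty applicants yields at most 99.51%, even though it is the only strategy of the two that can catch the one violent applicant.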
There are three other indices in widespread use in risk assessment, two of which are currently the most widespread and seemingly gaining
prominence. These are the odds ratio, the effect size, and the Receiver Operating Characteristic (ROC) curve. For more thorough (but still
accessible) discussion of these topics, see Douglas and Webster (1999) or Quinsey, Harris, Rice, and Cormier (1998).
The odds ratio is calculated by multiplying the number of True Positives by the number of True Negatives and then dividing by the number of
False Positives times the number of False Negatives. In other words, the odds ratio is a weighted ratio of hits to misses. A typical value
for a good predictor like the PCL-R or VRAG is around 4.0, although values as high as 13 are sometimes observed (Douglas, Ogloff, Nichols, &
Grant, 1999). However, this may underestimate actual usefulness, because it is based on dichotomizing scores (30 or above vs. below 30). In an
actual case, a score in the 0 to 10 range on the PCL-R is a whole lot better than a score of 29, even though both fall below 30.
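A minimal Python sketch of the odds ratio calculation described above follows. The cell counts are invented for illustration and were picked so that the result lands on the "typical" value of 4.0.

```python
def odds_ratio(tp, fn, fp, tn):
    """(TP x TN) / (FP x FN) for a 2 x 2 table of decision outcomes."""
    if fp == 0 or fn == 0:
        raise ValueError("odds ratio is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


# Hypothetical follow-up sample of 240 offenders, split at a cut score.
example = odds_ratio(tp=40, fn=20, fp=60, tn=120)
print(example)
```

Because the calculation uses only the four cells, every score above the cut is treated alike and every score below it is treated alike, which is the loss of information noted above.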
The effect size starts by taking the difference in the mean scores on a measure such as the VRAG between (in this case) those who were
violent and those who were not. This number is then divided by the standard deviation of the nonviolent group. The standard deviation (SD)
is a measure of how much the scores of different members of a group differ from the average. A group with a large SD has a lot of variation
among its members. For prediction purposes, we want a large difference in the average scores between those who will be violent and those who
will not be. We also want very little variation within both the groups. We can artificially create this situation by a thought experiment.
We select two groups: con men who are also sadistic sex offenders, and nuns. We obtain PCL-R scores on all the offenders and the nuns. We
will probably find the offenders have an average score of about 30-34, the nuns around 0-5. It is unlikely any of the offenders will score
below 20, or that any of the nuns will score above 10 (which is about what Bill Clinton scores, incidentally). The PCL-R score will be an
excellent predictor in this group, and will show a huge effect size of approximately 12 to 16. This is because of the large difference in
average scores between nuns and sex offenders, and the small range of scores among nuns. In real life, separating out violent from
nonviolent offenders, the VRAG achieves an effect size of about 1.0. Although much smaller than our example, this is both a sizeable and
useful degree of accuracy.
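The effect size as defined here (mean difference divided by the SD of the nonviolent group, a form sometimes called Glass's delta) can be sketched in Python as follows. The scores are invented for illustration, and the use of the sample SD rather than the population SD is an assumption.

```python
from statistics import mean, stdev


def effect_size(violent_scores, nonviolent_scores):
    """Mean difference between groups, in units of the nonviolent group's SD."""
    difference = mean(violent_scores) - mean(nonviolent_scores)
    return difference / stdev(nonviolent_scores)  # sample SD (n - 1 denominator)


# Hypothetical risk scores for two small groups.
violent = [6, 8, 8, 8, 9, 9, 11, 13]
nonviolent = [2, 4, 4, 4, 5, 5, 7, 9]

d = effect_size(violent, nonviolent)
print(round(d, 2))
```

With these made-up scores the group means differ by 4 points and the effect size is a little under 1.9. Shrinking the spread of the nonviolent scores while keeping the means fixed would push it higher, which is what happens in the nun example.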
Receiver Operating Characteristic (ROC) analysis is a statistical technique that grew out of signal detection work, such as helping sonar
operators detect enemy submarines. Submariners deal with background noise and possible signals. Their task is to determine what is a contact
and what is static. Turning up the sensitivity of their equipment will help detect a faint signal, but will also increase interference. The
trade-off between sensitivity and specificity (being able to accurately distinguish the signal from background noise) is inevitable and is
illustrated by the ROC curve, which graphically plots the relationship between hit rate and false alarm rate as the operator adjusts the
equipment (see Figure 1). If the equipment (or test) is of no value, you will be right half the time by chance. This is represented by a
diagonal line from the origin to the far corner. Under these conditions, the test has no power to separate hits from false alarms and the
two values are equal at all possible criteria points (cut scores). The diagonal line divides the top half from the lower half of the chart
and represents an Area Under the Curve (AUC) of .50. As the test becomes more accurate, the curve bends toward the upper left corner and
encompasses a larger area of the chart under the curve. If the test is perfectly accurate, it will have an AUC of 1.00. The VRAG
typically achieves an AUC value of .75, which is halfway between useless and perfect. The MacArthur study, using its multiple
classification tree approach, obtained an AUC value of .88. This is about midway between the VRAG's value and perfection. The AUC value
represents the probability that a randomly selected violent person will score higher on the test (VRAG) than a randomly selected nonviolent
person. In this case, there is a 75% probability that a violent person will outscore a nonviolent one, whatever cutoff is eventually chosen.
Figure 1 - Illustration of an ROC Curve
The big advantage of the ROC approach is that it is not affected by baserates. Because of this, AUC values are comparable across studies
and different measures.
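The sketch below, in Python, shows one way to compute the points of an ROC curve and the AUC from raw scores. It uses the standard rank-based reading of the AUC, namely the probability that a randomly chosen violent person scores higher than a randomly chosen nonviolent one, with ties counted as half. The scores and function names are hypothetical.

```python
def roc_points(violent, nonviolent):
    """(cutoff, hit rate, false alarm rate) for each possible cut score."""
    points = []
    for cutoff in sorted(set(violent) | set(nonviolent)):
        hit_rate = sum(s >= cutoff for s in violent) / len(violent)
        false_alarm_rate = sum(s >= cutoff for s in nonviolent) / len(nonviolent)
        points.append((cutoff, hit_rate, false_alarm_rate))
    return points


def auc(violent, nonviolent):
    """Share of (violent, nonviolent) pairs in which the violent person scores higher."""
    wins = 0.0
    for v in violent:
        for n in nonviolent:
            if v > n:
                wins += 1.0
            elif v == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(violent) * len(nonviolent))


violent = [3, 5, 6, 8]        # hypothetical scores of those who were violent
nonviolent = [1, 2, 4, 5, 7]  # hypothetical scores of those who were not

print(auc(violent, nonviolent))
# Ten times as many nonviolent cases (a much lower baserate), same score pattern.
print(auc(violent, nonviolent * 10))
```

Both calls return the same value, because the AUC depends only on how the two groups' scores are ordered relative to each other, not on how many people fall in each group.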
All indices of effect size are limited by the ability to define good comparison groups. If not all acts of violence or sexual recidivism
are detected, the criterion group of supposed non-offenders will be contaminated to an unknown degree. If the "successful" group is
actually composed of a mixture of undetected recidivists and those who did not reoffend, it will not be possible to make a test that can
perfectly separate true successes from failures. No matter how good a test is, it cannot predict a criterion that cannot be reliably measured. In fact, we
may have already reached the point with sex offenders where the tests are better than the criteria available to evaluate them.
The Death Penalty and Assessment of Dangerousness
When the court asks if a person presents an ongoing danger within prison, this is a distinct, empirical question. Studies of violence
prediction based on release to the community cannot be assumed to generalize to prison, especially when someone's life clearly hangs in the
balance. The validity of the vaunted PCL-R for predicting prison violence has recently been challenged (Edens, Petrila, & Buffington-Vollum,
2001), and others have reported that prisoners who are given life without possibility of parole or death have lower rates of institutional
violence than other inmates (Cunningham & Reidy, 1998). Although there is evidence that risk assessments can identify those more likely to
have both major and minor infractions (Kroner & Mills, 2001), most of these infractions do not involve violence. And the absolute
incidence of violence among death row inmates is much smaller than most would assume. Marquart, Ekland-Olsson, and Sorenson (1994)
tracked violent acts among 421 Texas death row inmates from 1974 to 1988. Assaults were committed by only 10.7% of these inmates, although
there were two murders of other death row inmates. Still, over a fifteen-year period, nearly 90% of death row inmates committed no
reported acts of violence. In the absence of direct evidence that risk instruments can make accurate predictions of prison violence, the
best appraisal of future dangerousness will come from the individual's history of behavior in similar settings and the base rate of
violence for the setting where he will be living.
Who Benefits from a Risk Assessment?
Some people are unfairly perceived to be dangerous when they are not. This includes some mentally ill offenders. An attorney may wish to
do his or her own quick and dirty assessment of the client in terms of the factors listed in the two tables in this article. A
defendant with a low Psychopathy Checklist score (10 or less) and only a few items checked on the HCR-20 is unlikely to be a danger to
society. In addition, the factors that indicate low dangerousness also predict ability to succeed on supervision and to benefit from
psychotherapy.
Let's return to the example at the beginning of the article. When referred for the evaluation, the young man was highly defensive and
brusque. Although he admitted to occasional feelings of severe depression during the interview, he denied all problems on a standardized
personality inventory of psychiatric symptoms. He had failed on probation, but only because he visited his wife and their infant child.
There were no complaints from her and she wanted him back home very badly. Apart from his erratic behavior in the offense, the young man
was stably employed in a good job. He had no criminal record or history of violence, was not impulsive or psychopathic (PCL-R score = 5),
and did not use substances. His VRAG score indicated a 12-17% chance of any further incident involving violence in the next seven years.
Noting the positive changes he and his wife had made, I recommended that he be allowed to have contact with his wife, and the judge agreed.
Conclusion
The science of assessing dangerousness is advancing rapidly. At the same time, there is a movement afoot that encourages a more
individualized approach. A thorough risk assessment will necessarily utilize both types of information: scores on measures such as the
PCL-R and VRAG as well as an examination of the individual's own, unique pattern of aggressive behavior and its triggers. An emerging
area is the role of protective factors, which may help prevent future incidents of violence.
A risk assessment by a forensic psychologist or psychiatrist can provide an attractive alternative to PSI reports and assessments of risk
from probation or parole boards. Because there is substantial overlap in the factors that predict violence, general recidivism, and
supervision failure, these other issues can easily be addressed as well.
References
Barbaree, H. E., Seto, M. C., Langton, C. M., & Peacock, E. J. (2001). Evaluating the predictive accuracy of six risk assessment instruments. Criminal Justice and Behavior, 28(4), 490-521.
Boer, D. P., Hart, S. D., Kropp, P. R., & Webster, C. D. (1998). Manual for the Sexual Violence Risk-20. Psychological Assessment Resources, Inc.
Cunningham, M. D., & Reidy, T. J. (1998). Integrating base rate data in violence risk assessments at capital sentencing. Behavioral Sciences & the Law, 16(1), 71-95.
Douglas, K. S., & Webster, C. D. (1999). The HCR-20 violence risk assessment scheme: Concurrent validity in a sample of incarcerated offenders. Criminal Justice and Behavior, 26(1), 3-19.
Douglas, K. S., & Webster, C. D. (1999). Predicting violence in mentally and personality disordered individuals. In R. Roesch, S. D. Hart, & J. R. P. Ogloff (Eds.), Psychology and law: The state of the discipline. New York: Kluwer Academic/Plenum.
Edens, J. F., Petrila, J., & Buffington-Vollum, J. K. (2001). Psychopathy and the death penalty: Can the Psychopathy Checklist-Revised identify offenders who represent a continuing threat to society? Journal of Psychiatry & Law, 29(4), 433-448.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanistic, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293-323.
Hanson, R. K., & Bussiere, M. T. (1998). Predicting relapse: A meta-analysis of sexual offender recidivism studies. Journal of Consulting and Clinical Psychology, 66, 348-362.
Hanson, R. K., & Thornton, D. (2000). Improving risk assessments for sex offenders: A comparison of three actuarial scales. Law and Human Behavior, 24, 119-136.
Hare, R. D. (1991). Manual for the Hare Psychopathy Checklist-Revised. Toronto: Multi-Health Systems.
Kroner, D. G., & Mills, J. F. (2001). The accuracy of five risk appraisal instruments in predicting institutional misconduct and new convictions. Criminal Justice and Behavior, 28(4), 471-489.
Marquart, J. W., Ekland-Olsson, S., & Sorenson, J. R. (1994). The rope, the chair, and the needle: Capital punishment in Texas, 1923-1990.
McGinnis, K., & Austin, J. (2001). Texas Board of Pardons and Paroles Guidelines Project Contact No. 696-PD-0-P-024. August 1, 2001, Security Response Technologies, Inc.
Monahan, J., Steadman, H. J., Silver, E., Appelbaum, P. S., Robbins, P. C., Mulvey, E. P., Roth, L. H., Grisso, T., & Banks, S. (2001). Rethinking risk assessment: The MacArthur study of mental disorder and violence. New York: Oxford University Press.
Quinsey, V. L., Harris, G. T., Rice, M. E., & Cormier, C. A. (1998). Violent offenders: Appraising and managing risk. Washington, DC: American Psychological Association.
Silver, E., Mulvey, E. P., & Monahan, J. (1999). Assessing violence risk among discharged psychiatric patients: Toward an ecological approach. Law and Human Behavior, 23(2), 237-255.