National Samples, Sexual Abuse in Childhood, and Adjustment in Adulthood
A Commentary on Najman, Dunne, Purdie, Boyle, and Coxeter (2005)
Bruce Rind and Philip Tromovitch
Archives of Sexual Behavior; December 2006
Abstract
This article comments on the Najman, Dunne, Purdie, Boyle, and Coxeter (2005) study on the relationship between childhood sexual abuse (CSA) and later sexual functioning in an Australian national sample. We note the value of the Najman et al. study, being well conducted and using a generalizable sample, but critique Najman et al.’s interpretation that their study showed “significant impairment” due to the CSA. We computed effect sizes to show that the “effects” were small, and then showed, using meta-analysis, that these small effects were consistent with results in a series of national samples from other countries. We argue that Najman et al.’s causal statement about CSA’s “impairment” effect was unwarranted given their lack of causal analysis, the well-established fact in other research that CSA is often confounded with third variables, and the fact that CSA was confounded with a key third variable in Najman et al.’s study. Given the hyperbole that surrounds the issue of CSA, we emphasize the need for researchers to adhere to valid scientific principles in inference and precision when reporting the results of CSA research.
Introduction
One of the major weaknesses of research on the relation between childhood sexual abuse (CSA) and later psychological adjustment has been an over-reliance on clinical samples. Authors of these studies have frequently compounded the problem by inappropriately generalizing their findings to the nonclinical population. Such generalization is problematic because it cannot be assumed that patients seeking help for maladjustment, who have a history of CSA, are representative of nonpatients in the general population not seeking help, who also have a history of CSA. It is for this reason that CSA research using nonclinical samples is important, especially when these samples are selected to be representative of the entire population (Rind & Tromovitch, 1997). Thus, the recent study by Najman, Dunne, Purdie, Boyle, and Coxeter (2005) deserves special attention, given that it was based on a large national sample (n=1793) designed to be representative of Australian adults aged 18–59. We offer commentary on this important study, using as a frame of reference our previous meta-analysis of national samples examining CSA-psychological adjustment relations (Rind & Tromovitch, 1997). Overall, the study was well done and contributes valuable additional data to understanding the relationship between CSA and later adjustment. Nevertheless, the study has several shortcomings that need addressing. These points of concern involve causal inference and statistical precision.
Causal inference
In non-experimental designs, causal analysis typically involves statistical control via regression analysis. This analysis ranges in sophistication from controlling for a single third variable (e.g., first-order partial or semipartial correlation) to conducting structural equation modeling involving numerous third variables (Pedhazur, 1997). These techniques, even the most sophisticated ones, are not definitive because, among other reasons, true causal variables may not have been examined.
Nevertheless, statistical control can vastly advance a valid understanding of causation, and without it causal conclusions from correlational data are much more dubious (Keith, 2006).
Najman et al. did not perform a causal analysis. They did not statistically control for any third variables (e.g., family environment) to examine whether obtained correlations between CSA and adjustment were causal or spurious. Appropriately, they pointed out this limitation in their Discussion. Therefore, it was contradictory, surprising, and quite unwarranted for them to conclude in their Abstract that CSA “contributes to significant impairment in the sexual functioning of adults, especially women.”
This statement is a causal assertion. They are telling readers that CSA “contributes to significant impairment” in some areas, but its “consequences” do not appear to obtain in other areas. This statement is particularly problematic, because abstracts are widely read, and the conclusion that CSA causes significant impairment in a national population is likely to be the key take-home message of the article. But it is inappropriate to give such an impression, because it is well-known in CSA research that CSA is often confounded with negative home and peer environments. Therefore, to offer causal conclusions, even tentative ones, researchers must give serious attention to these confounds. In our meta-analysis of CSA-adjustment correlates from national samples (Rind & Tromovitch, 1997), we examined the issue of causality in detail. Though the national samples studies were sparse on third variable analyses, the evidence that was available was not supportive of inferring causality. For instance, in the most relevant study on national samples, Ageton (1988) showed, using a longitudinal analysis, that negative social environment factors, such as family and school normlessness, peer support for delinquent behavior, exposure to delinquent peers, and attitudes towards deviance, predicted later CSA experiences in a female sample. We noted that Ageton’s results were consistent with the possibility that a negative social environment leads to both poorer adjustment and to CSA, so that the relation between the latter two variables may be spurious. Our meta-analysis on CSA-adjustment relations in college samples (Rind, Tromovitch, & Bauserman, 1998) was motivated, in part, by the attempt to improve causal analysis, which the much larger data bank in the college studies permitted. In our Introduction, we were critical of CSA researchers who had been lax in use of causal language, when their studies’ designs and analyses did not permit causal inference. 
In our analysis of the college data, averaging across numerous studies, we showed that poorer family environment was consistently confounded with CSA, and that it predicted adjustment problems on average nine times more strongly than CSA did. Poorer family environment predicted most specific symptoms substantially better than CSA did, including sexual problems, the focus of the Najman et al. study. Statistical control analyses across numerous studies eliminated most of the statistically significant relations between CSA and adjustment. In our Discussion, we extensively reviewed other research that indicated that negative family environment factors were consistently confounded with CSA, and that emotional abuse and neglect in particular, but also physical abuse, appeared to be substantially more important factors in accounting for later adjustment problems than CSA. The important point is that scientific reporting of CSA findings has generally shown a bias in inferring harm when not warranted. Given the extreme concern currently expressed in our society over CSA, especially compared to potentially more damaging experiences such as emotional abuse and neglect or physical abuse, and the strong tendency to sensationalize and hyperbolize CSA (Jenkins, 1998), it is paramount that scientific reporting in this area adhere to more cautious statements that reflect strict scientifically appropriate inference.
Statistical considerations and precision
As noted previously, Najman et al. (2005) ended their abstract by stating that CSA “contributes to significant impairment in the sexual functioning of adults, especially women” (our emphasis). To evaluate the actual degree of “impairment,” it is valuable to compute effect sizes. Had Najman et al.
done these computations, they would not only have reached a more precise measure of the extent of difference between sexual adjustment in the CSA and control groups, but they could have compared their results to those of other national samples, as provided in our meta-analysis of national samples (Rind & Tromovitch, 1997). In this section, we present computations of effect sizes in Najman et al.’s study, and then compare them with those in our 1997 meta-analysis.
Significance testing and p-values
First, we note that presenting p-values associated with statistical tests tells only half the story (e.g., whether an independent and dependent variable are related). But it is also important to determine the magnitude of the relationship. To illustrate, we consider a non-CSA study and then two CSA studies. Cohen (1990) noted that, in 1986, the New York Times reported that a study found a “definite” link between children’s height (adjusted for age and sex) and IQ. Though the link was reported to be “highly significant” based on its associated p-value, Cohen noted that the magnitude of the link was so small that raising a child’s IQ from 100 to 130 would require growing the child by 14 feet. Rind and Tromovitch (1997) and Rind et al. (1998) both found that the relationship between CSA and adult psychological adjustment in the general population was “highly significant” in terms of p-values, but that the magnitude of the relationship was small. The difference between the CSA and control populations was estimated to be nearly two-tenths of a SD. Assuming that individuals with clinically significant problems are at least two SD above the mean, the results of the meta-analyses indicate that, while about two out of 100 control individuals fall in the clinical range, about three out of 100 CSA individuals will.
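The clinical-range arithmetic above can be checked with a short calculation. The following is an illustrative sketch only, not the computation used in the meta-analyses: it assumes normally distributed adjustment scores with equal variance in both groups, takes the group difference to be 0.2 SD (the upper end of "nearly two-tenths"), and uses a hypothetical function name, `clinical_fraction`.

```python
from statistics import NormalDist


def clinical_fraction(mean_shift_sd: float, cutoff_sd: float = 2.0) -> float:
    """Share of a unit-variance normal population, centred at
    mean_shift_sd, that scores above cutoff_sd.

    Both arguments are expressed in control-group SD units.
    """
    return 1.0 - NormalDist(mu=mean_shift_sd, sigma=1.0).cdf(cutoff_sd)


# Control group: mean 0. CSA group: mean shifted by an assumed 0.2 SD.
control = clinical_fraction(0.0)
csa = clinical_fraction(0.2)

print(f"control above cutoff: {control:.3f}")
print(f"CSA above cutoff:     {csa:.3f}")
```

Under these assumptions, roughly 2 in 100 controls and between 3 and 4 in 100 CSA individuals exceed the two-SD cutoff; a slightly smaller shift moves the CSA figure closer to 3 in 100.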
Put another way, the vast majority of CSA individuals will not fall in the clinical range, which contradicts widespread popular and professional impressions that CSA produces pervasive, devastating harm. The important point is that “highly significant,” based on p-values, can be highly misleading in terms of degree of difference between or among groups. Regarding the p-values that Najman et al. did present, it is important to point out that multiple tests can inflate to an unacceptably high level the probability of Type I errors (e.g., falsely inferring that CSA is related to symptoms when it is not). In Najman et al., 10 statistical tests were performed for men, and 12 for women, to assess whether CSA was related to poorer adjustment. In this case, the probability of making at least one Type I error, assuming all null hypotheses were true and individual significance tests were independent, would be 40% for the analyses of the male data, and 46% for the female data. For correlated significance tests, as was likely the case in the Najman et al. study, these probabilities could be substantially higher. Thus, with this many tests, some type of procedure for controlling Type I errors is indicated (Hays, 1994). In our meta-analysis of CSA in national samples (Rind & Tromovitch, 1997), we corrected for Type I errors by using a Bonferroni procedure. This procedure divides alpha, the level of statistical significance, by the number of tests. Najman et al. used multiple tests to examine sexual dysfunction in relation to 7 specific experiences for men and 9 for women, finding one statistically significant result for men and 8 for women at the conventional .05 level. If we apply the Bonferroni procedure just to these multiple tests, the corrected alphas would eliminate the one statistically significant finding for men and half the statistically significant findings for women.
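The Type I error figures and the Bonferroni adjustment described above follow from two simple formulas, sketched here for illustration. The function names are hypothetical, and the familywise calculation assumes independent tests with every null hypothesis true, as stated in the text.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests
    independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05

# Overall tests of CSA and adjustment: 10 for men, 12 for women.
for label, n in (("men", 10), ("women", 12)):
    print(f"{label}: familywise error = {familywise_error(ALPHA, n):.2f}")

# Tests of specific sexual dysfunction items: 7 for men, 9 for women.
for label, n in (("men", 7), ("women", 9)):
    print(f"{label}: corrected alpha = {bonferroni_alpha(ALPHA, n):.4f}")
```

With alpha at .05, the familywise probabilities come to about .40 and .46, matching the percentages given above, and the corrected per-test alphas are about .0071 for men and .0056 for women.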
The Bonferroni procedure is conservative (Hays, 1994), but still suggests caution regarding Najman et al.’s claim in their Abstract that CSA contributes to “significant impairment in sexual functioning of adults, especially women.” Even without a correction procedure, the claim should omit men altogether, where 6 out of 7 of these tests were not statistically significant.
Effect sizes and meta-analysis
An effect size quantifies the degree of difference between an exposure and control group in a precise manner – here, between mean responses by CSA versus control participants. Effect sizes have the important advantage over p-values of permitting precise comparisons across studies, as well as combining results across studies for an overall average when these results are comparable (i.e., homogeneous). These techniques are important components of meta-analysis. Najman et al. noted that they focused on participants with three or more symptoms of sexual dysfunction as an indicator of poor sexual adjustment. Consistent with this cutoff, we used Najman et al.’s data in their Table 3 to compute effect sizes by dividing participants into the two categories of 0, 1, or 2 symptoms versus 3 or more. Consistent with previous meta-analyses, we combined the non-penetrative and penetrative CSA groups into one CSA group, which could then be contrasted with participants without CSA. Thus, we created 2×2 contingency tables for sexual dysfunction, one for men and the other for women. We computed chi-squares, and then converted these statistics to Pearson rs as the measure of effect size. We used Pearson rs so as to compare results with the Rind and Tromovitch (1997) meta-analysis. The chi-square analyses yielded the following results.
Najman et al. also reported results for participants’ physical and emotional satisfaction derived from sexual interactions. Participants indicated that their reactions were extremely, very, moderately, or less than moderately satisfying in physical satisfaction and then in emotional satisfaction. We created two categories by combining “extremely” with “very” and “moderately” with “less than moderately.” The result was a series of four 2×2 contingency tables, two for men and two for women, one each for physical and for emotional satisfaction. We computed chi-squares and then effect sizes (rs).
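The conversion from a 2×2 contingency table to an effect size r can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of our computations: it uses the uncorrected Pearson chi-square (no continuity correction), the standard relation r = sqrt(chi-square / N) for a 1-df test, hypothetical function names, and made-up cell counts that are not data from Najman et al.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2x2 table

                  outcome present   outcome absent
        CSA              a                 b
        no CSA           c                 d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one case")
    return n * (a * d - b * c) ** 2 / denom


def r_from_chi_square(chi2: float, n: int) -> float:
    """Effect size r for a 1-df chi-square. The square root drops the
    sign, so direction must be read from the table itself."""
    return math.sqrt(chi2 / n)


# Hypothetical counts, for illustration only (NOT data from Najman et al.).
a, b, c, d = 30, 170, 80, 720
chi2 = chi_square_2x2(a, b, c, d)
r = r_from_chi_square(chi2, a + b + c + d)
print(f"chi-square = {chi2:.2f}, r = {r:.3f}")
```

With these invented counts the chi-square exceeds the .05 critical value of 3.84 even though r is below .07, which illustrates how a statistically significant result can correspond to a small effect.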
Chi-square analyses yielded the following.
Table 1 CSA-sexual adjustment effect sizes (rs) in Najman et al. (2005)
Table 2 Study-level effect sizes of measures of adjustment from studies using national probability samples to examine correlates of child sexual abuse
Finally, we averaged the two satisfaction rs for each sex for an overall satisfaction effect size (r=.02 for men; r=.04 for women). We then averaged these overall values with the sexual dysfunction rs to obtain study-level effect sizes for men (r=.06) and women (r=.10) (see Table 1). Table 2 shows the study-level effect sizes for the five national samples from the Rind and Tromovitch (1997) meta-analysis, along with the study-level effect sizes from the Najman et al. study. As the table shows, the effect sizes from Najman et al. were quite similar to those in the Rind and Tromovitch study. Finally, Table 3 presents a meta-analysis of the earlier five national samples studies plus the Najman et al. study, done for men and women separately. The overall effect sizes (i.e., unbiased effect size estimates) were identical to those in the Rind and Tromovitch (1997) meta-analysis (r_u=.07 for men; r_u=.10 for women). The interpretation, then, remains that the association between CSA and later poorer adjustment is small.
Table 3 Meta-analyses of psychological and sexual adjustment correlates of CSA by gender from national probability samples
* p < .05 in the chi-square test.
Causal inference revisited
Najman et al. reported results for one third variable – number of sexual partners. We computed effect sizes for men and then for women for number of lifetime partners. To do so, we collapsed certain categories – for CSA, we combined non-penetrative and penetrative cases; for number of partners, we combined 1 with 2–5 partners. Thus, for each gender, we produced a 2×2 contingency table with the variables CSA (yes or no) and number of partners (1–5 versus 6 or more); we then computed a z-score, contrasting proportions of CSA and non-CSA participants who had had 6 or more partners; finally, we converted this z-score to a Pearson r effect size.
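The two steps just described, contrasting two proportions with a z-score and converting that z to r, can be sketched as below. The sketch rests on assumptions not stated in the text: a pooled-proportion standard error for the z-test, the standard conversion r = z / sqrt(N), hypothetical function names, and invented counts that are not data from Najman et al.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z-score contrasting the proportions x1/n1 and x2/n2, using the
    pooled proportion for the standard error (an assumed choice)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


def r_from_z(z: float, n_total: int) -> float:
    """Effect size r from a z-score based on n_total participants."""
    return z / math.sqrt(n_total)


# Hypothetical counts, for illustration only (NOT data from Najman et al.):
# participants reporting 6 or more partners in each group.
csa_high, csa_n = 90, 200
non_high, non_n = 280, 800

z = two_proportion_z(csa_high, csa_n, non_high, non_n)
r = r_from_z(z, csa_n + non_n)
print(f"z = {z:.2f}, r = {r:.3f}")
```

For a 2×2 table, the square of this pooled z equals the uncorrected chi-square, so the resulting r has the same magnitude as one obtained from a chi-square and additionally preserves the direction of the difference.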
Earlier, we reported that the effect sizes for the CSA-sexual dysfunction relations for men and women were, respectively, r=.09 and r=.15, both statistically significant. The finding that CSA, for both men and women, was related to number of partners and to amount of sexual dysfunction to almost exactly the same degree demonstrates confounding and casts suspicion on a causal relation between CSA and sexual dysfunction. Furthermore, it is noteworthy that Laumann, Gagnon, Michael, and Michaels (1994), in their U.S. national sample, also measured number of sexual partners (10 or more); in our meta-analysis (Rind & Tromovitch, 1997), we showed that the associations between CSA and number of partners in that study were r=.12 for men and r=.17 for women, quite similar to the corresponding values in the Najman et al. study. Because of this confounding of CSA with number of partners, Laumann et al. advised caution in making any causal attributions regarding CSA and sexual dysfunction, a caveat missing from the Najman et al. study. Empirical examination is needed, but one speculation as to why having more sex partners is related to more sexual dysfunction complaints is that having more partners creates greater opportunity to experience negative reactions of one sort or another. Such reactions may carry over to later partners or later sexual situations.
Discussion
Returning to Najman et al.’s (2005) conclusion in their Abstract that CSA “contributes to significant impairment” in sexual functioning, the foregoing commentary and analyses suggest that we should be agnostic with respect to CSA’s contribution to impairment and that, whatever difference there is between CSA and control participants in sexual functioning, it is not as large as the term “significant” implies.
Najman et al. noted that “conceptualization and measurement of sexual dysfunction has generated some interesting debate,” with concern being expressed over whether factors such as failing to engage in sex or to enjoy it constitute dysfunction. They later noted the finding from other research that sexual dysfunction is associated with reduced physical and emotional pleasure obtained from sex. Contrary to this association, they further noted that there was no evidence for such a connection in their national sample. It could be argued that the bottom line in sexual functioning is the degree of pleasure, physically and emotionally, that one derives from it. Given the criticisms that have been leveled against the sexual dysfunction construct as measured by Najman et al. and others, along with the lack of correspondence between this controversial construct and the more face-valid construct of physical and emotional pleasure, it follows that the dysfunction construct in this national sample is suspect in terms of validly assessing current sexual functioning. This further weakens the confidence we can have in the claim that CSA led to “substantial impairment” in sexual functioning. It is also important to note that, in terms of sexual dysfunction, although it is true that CSA participants were nearly twice as likely as controls to report three or more symptoms, as Najman et al. emphasized, a substantial majority of CSA participants reported few or no symptoms. Najman et al. did not point out this latter fact, but it is noteworthy that Laumann et al. (1994), who had similar results in their U.S. national study, did. They did so to caution readers against misinterpreting the results as indicating widespread harm. It is also important to point out that the vast majority of CSA participants in the Najman et al. study experienced current sexual relations as extremely or very satisfying both physically and emotionally. 
This finding further cautions against inferring that the Najman et al. data indicate widespread harm from the CSA. In fact, after making adjustments for Type I errors, performing focused statistical tests, computing effect sizes, paying attention to valid causal inference, and looking not just at rates of poorer adjustment but also rates of normal adjustment, it is difficult to avoid the conclusion that the effects of CSA on adult sexual adjustment in the general population are typically minor at most, particularly for men. This conclusion is enormously at odds with the stereotype repeatedly expressed in the media and politics that CSA “ruins” its victims for life. Though this stereotype arose in large part for political and ideological reasons (Jenkins, 1998), it is important to note that exaggerated, unwarranted claims made by CSA researchers have contributed substantially as well. It is incumbent upon scientific researchers to transcend popular trends and/or prejudices and to express as precisely as they can the empirical reality regarding CSA and its correlates or effects. National samples provide an especially good opportunity to do this, given their representativeness of the general population. The Najman et al. study is improved by precision with respect to correlates of CSA and caution regarding causal inference. Based on the foregoing analyses, a more appropriate ending to their Abstract would be: “CSA in the Australian population is common but, according to the data in the current study, is only weakly associated with poorer sexual functioning in adulthood. Whether this association is causal, however, needs further study.” Finally, it is important to discuss the term “CSA.” Najman et al. used an operational definition of CSA that included both noncontact and contact sexual episodes that occurred before the participant reached age 16, which the participant did not want. As Rind et al.
(1998) documented, this definition is just one of many that have been used in research. Others have restricted CSA to sex events using various different age cut-points (e.g., under 14 or under 12). Many definitions have required contact to have taken place. Still other definitions have allowed for willing participation, provided the other partner was older (e.g., at least 5 years older or an adult). It seems commonsensical that variations in the different operational criteria just listed will correlate with differential reactions and outcomes to CSA experiences. The Rind et al. (1998) meta-analytic review is just one notable example of a study that confirmed this expectation. In that review, the association between CSA and adjustment was higher for men when CSA was restricted to unwanted cases than when CSA also included willing cases; only the former association (unwanted cases) was statistically significant. The clear implication is that research studies using particular definitional criteria for CSA should not infer to CSA experiences characterized by different definitional criteria. It may be just as invalid to generalize from findings involving unwanted CSA in early childhood to wanted CSA in adolescence as it is to generalize from clinical samples to the general population. A strength of the Najman et al. study was that it avoided the latter inferential problem, but a weakness concerned the former inferential problem, as discussed next. While many CSA researchers have inappropriately generalized from highly unrepresentative samples to the general population, Najman et al., in discussing caveats to their findings, carefully described to what population their results do apply. On the other hand, Najman et al., like many in this field, did not sufficiently distinguish among types of CSA in terms of associated adjustment correlates. 
Specifically, in their discussion of caveats to their study, they noted that their study excluded “wanted” cases of sex before age 16 with adults, which they stated “should still be classified as sexual abuse” (p. 524). Rather than speculate on how this exclusion might have affected findings on sexual adjustment, the chief focus of their investigation, they speculated instead on how this exclusion may have affected prevalence rates of CSA. They speculated that prevalence rates were underestimated, but it would have been more relevant, given the focus of their study, to speculate that sexual adjustment differences may have been overestimated. It is problematic to define both unwanted and willing adult-nonadult sex as CSA, study only unwanted cases, but then offer conclusions that apply to both. In the scientific literature, too little attention has been paid to the biasing problems that overinclusive definitions of CSA can produce. In the Rind et al. (1998) meta-analysis, we ended the article by specifically addressing this issue. We offered the recommendation to scientific researchers to be more discriminating in the use of the term CSA, using neutral language rather than the value-laden “abuse” in certain circumstances. Our recommendation stemmed not only from the long history of misuse of moralizing language in scientific discussions of sexual behaviors (e.g., masturbation as “self-abuse,” homosexuality as “perversion”), but from practical concerns of achieving better construct and predictive validity. The term “child sexual abuse,” when applied in scientific discussions to all sexual interactions between adults and minors under age 18, which is common practice, is arguably construct invalid, because adolescents are very different from children in sexuality, yet are included as “children” in this term (Rind, 2004). 
Labeling willing relations as abuse mixes them together with unwanted cases, which may bias CSA-adjustment correlations, creating the false impression of greater problems for willing relations and lesser problems for unwanted relations, thereby weakening predictive validity. The fact that this recommendation was vehemently attacked by religious conservatives, certain psychoanalytically-oriented therapists, and talk show hosts, and was later condemned by the U.S. Congress, says nothing about its validity (see Rind, Tromovitch, & Bauserman, 2001, for a full discussion). Scientific validity needs scrutinizing attention perhaps most when the topic of investigation deals with deeply taboo and emotionally charged issues, as it does with CSA. The important Najman et al. (2005) study has offered the opportunity to do just this in the areas of statistical precision and causal attributions, as well as construct and predictive validity.
References