- Manual Therapy

Manual Therapy 14 (2009) 152–159
Original Article
Interobserver reliability of physical examination of shoulder girdle
Jettie G. Nomden a,b,*, Anton J. Slagers a,b, Gert J.D. Bergman c, Jan C. Winters c,
Thomas J.B. Kropmans d, Pieter U. Dijkstra a,b,e
a Department of Rehabilitation, University Medical Center Groningen, University of Groningen, P.O. Box 30.001, 9700 RB Groningen, The Netherlands
b Share, Graduate School for Health Research, University Medical Center Groningen, University of Groningen, The Netherlands
c Department of General Practice, University Medical Center Groningen, University of Groningen, The Netherlands
d Department of Medical Informatics & Medical Education, University of Ireland, Galway, Ireland
e Department of Oral and Maxillofacial Surgery, University Medical Center Groningen, University of Groningen, The Netherlands
Received 2 March 2007; received in revised form 20 December 2007; accepted 6 January 2008
Abstract
The object of this study was to assess interobserver reliability in 23 tests concerning physical examination of the shoulder girdle.
A physical therapist and a physical therapist/manual therapist independently performed a physical examination of the shoulder
girdle in 91 patients with shoulder complaints of varying severity and duration.
The observers assessed 23 items in total: active and passive abductions, passive external rotation, hand in neck (HIN) test, hand
in back (HIB) test, impingement test according to Neer, springing test of the first rib and joint play test of the acromioclavicular
joint. The interobserver reliability was evaluated by means of a Cohen’s Kappa, the weighted Kappa and the intraclass correlation
(ICC). Criteria for acceptable reliability were: Kappa value ≥0.60, ICC ≥0.75 or an absolute agreement ≥80%.
The results showed that Kappa values varied from 0.09 (springing test first rib, stiffness) to 0.66 (springing test first rib, pain),
weighted Kappa varied from 0.35 (pain during HIB) to 0.73 (range of motion HIB) and ICC varied from 0.54 (abduction passive
starting point painful arc) to 0.96 (active and passive ranges of motion in abduction). In total 11 (48%) items fulfilled the criteria of
acceptable reliability.
In conclusion, there appears to be a great deal of variation in the reliability of the tests used in the physical examination of the
shoulder girdle. Over 50% of the tests did not meet the statistical criteria for acceptable reliability.
© 2008 Elsevier Ltd. All rights reserved.
Keywords: Reliability; Observer; Shoulder girdle; Physical examination
* Corresponding author. +31 50 3613651.
E-mail address: [email protected] (J.G. Nomden).
doi:10.1016/j.math.2008.01.005
1. Introduction
Shoulder complaints are common in the locomotor
system. The yearly prevalence of shoulder complaints
ranges from 100 to 160 per 1000 patients in the general
population (Winters et al., 1999). The diagnosis in
patients with shoulder complaints is difficult because
currently no uniformity exists as to how shoulder
complaints should be labelled or defined (Green et al.,
1998a). Diagnostic criteria for defining shoulder disorders are neither consistently nor reliably applied (Green
et al., 2003). According to the Guidelines for Shoulder
Complaints of the Dutch College of General Practitioners (Winters et al., 1999) most shoulder complaints
are elicited by shoulder disorders, probably resulting
from strain, aseptic inflammation or degeneration of
soft tissues of the glenohumeral joint or of structures
in the immediate surroundings. In most cases it cannot
be determined accurately which structure is affected.
Hence, the term ‘shoulder complaints’ is used as a working as well as a final diagnosis (Winters et al., 1999).
Shoulder complaints may result in considerable disability (Green et al., 2003). Shoulder pain often impairs
the ability to sleep, and restricted and/or painful range
of motion of the shoulder influences performance of
activities of daily living (Green et al., 2003).
Treatment of shoulder complaints is aimed at reducing symptoms such as pain and restricted range of
motion, increasing functional activities and re-starting
participation in work and social activities.
In order to focus treatment and to evaluate effectiveness of treatment, reliable tests are an important prerequisite. Reliability of assessment of shoulder complaints
and function of the shoulder differs per study, ranging
from low to moderate (Green et al., 1998b; de Winter,
1999; Hoving et al., 2002; Terwee et al., 2005).
Recently, movement tests of the shoulder and shoulder
girdle, as recommended in the Guidelines for Shoulder
Complaints of the Dutch College of General Practitioners (Winters et al., 1999), together with additional
functional tests were used as outcome measures in a randomised controlled trial (Bergman et al., 2002). Thus these
tests were used for evaluation of treatment efficacy. To
interpret the outcomes of this study it is important to
evaluate the reliability of the tests used. Differences found
in the trial within or between groups may be caused by
differences in treatment effects but also by differences
between observers.
The aim of the present study is to determine the interobserver reliability of the physical examination of the
shoulder girdle as performed in the above-mentioned
randomised controlled trial.
2. Methods
Consecutive patients eligible for participation in the
randomised controlled trial were invited to participate
in this reliability study. Inclusion criteria for patients
in that trial were presence of shoulder complaints, not
being treated for these complaints in the past 3 months
and aged over 18 yrs. Shoulder complaints were defined
as pain at rest or provoked or aggravated by movement
in the area between neck and elbow. Informed consent
was obtained from all patients. Extension of the pain
to the region between the scapulae, to the cervical spine
or to the lower part of the arm was not an exclusion
criterion. Exclusion criteria for patients were presence
of specific rheumatic disorders, shoulder complaints
caused by acute severe trauma or previous surgery, signs
of cervical nerve root compression, or shoulder complaints related to general internal pathologic conditions
of thoracic and abdominal organs (Bergman et al.,
2002). Most patients included in the randomised controlled trial also participated in this reliability study.
Physical examinations were performed independently
by a physical therapist and a physical therapist/manual
therapist (JGN and AJS, 27 and 12 yrs practice experience, respectively). Before the study (clinical trial and
reliability study) all tests were standardised and the observers received training in the application of the tests.
The diagnosis was unknown to the observers. The order of examination by the two observers varied. Each
observer examined about half of the patients as first observer followed by the second observer who performed
the same examination a few minutes later. Patients
were sitting upright during all examinations. All tests
were performed in the morning. During the study the
two physical therapists did not exchange information
concerning the outcome of the assessments. Patients
were instructed not to give any comment about the previous examination.
2.1. Examination of shoulder girdle
The examination of the shoulder girdle was based
upon the Guidelines for Shoulder Complaints of the
Dutch College of General Practitioners (Winters et al.,
1999). The examination was focused on range of motion
of the shoulder (visually assessed to the nearest 5°), on
pain experienced (four point ordinal scale: no pain, little
pain, much pain, and excruciating pain) and on occurrence of pain during movement.
The following movements were examined:
2.1.1. Functional tests: hand in neck (HIN) test and
hand in back (HIB) test
Both tests were slightly modified from the tests
described by Solem-Bertoft et al. (1996) (Appendix 1).
The HIN and HIB were graded into a score (range
0–7) based upon the end point reached. Additionally,
during the HIN and HIB pain was assessed on a four
point ordinal scale: no pain, little pain, much pain,
and excruciating pain.
2.1.2. Active abduction
The starting position of the patient was arm stretched
alongside the body, held in external rotation with the thumb
directed sideways. The patient lifted his extended arm
sideways and upwards in the frontal plane until it was
beside his head. The range of motion and pain were
assessed.
2.1.3. Painful arc during active abduction
Presence of a painful arc was assessed and, if present,
its starting point and end point were visually estimated.
2.1.4. Passive abduction
The starting position of the patient was arm stretched
alongside the body, held in external rotation with the thumb
directed sideways. The patient was asked to keep the
shoulder arm muscles relaxed. The observer lifted the
extended arm sideways and upwards in the frontal plane
until it was beside the patient’s head. The range of
motion and pain were assessed.
2.1.5. Painful arc during passive abduction
Presence of a painful arc was assessed and, if present,
its starting point and end point were visually estimated.
2.1.6. Passive external rotation
The starting position of the upper arm was 0° elevation, elbow held in 90° of flexion and forearm in neutral position.
The patient was asked to keep the shoulder arm muscles
relaxed. The observer supported the arm at the wrist,
locked the elbow, held the arm bent at 90° and
rotated it outwards in the transversal plane. Range of
motion and pain were assessed.
2.1.7. Impingement test
The impingement test was only performed if no glenohumeral restrictions were found. The starting position was similar to passive abduction. During the test
scapular rotation was prevented with one hand by the
observer, while the other hand of the observer raised
the patient’s arm in abduction, causing the greater tuberosity to impinge against the acromion. The results
of the tests were interpreted as positive or negative
(Neer, 1983).
2.1.8. Springing test of the first rib
The observer exerted force with the second metacarpophalangeal joint on the first rib of the patient, assessing range of motion (normal or restricted), pain (present
or absent), and joint stiffness (present or absent) (Jirout,
1986).
2.1.9. Acromioclavicular joint assessment
Visual assessment of swelling (present or absent)
and joint play test of the acromioclavicular joint. The
observer manipulated the joint in the sagittal plane
assessing presence of pain (present or absent).
According to the protocol of the randomised clinical
trial, each observer made the assessment of each active and passive test during one or two executions of the movement. No verbal encouragement was given by the observers during active tests.
2.2. Statistical analysis
Data analyses were performed in SPSS (version 12).
Percentage of absolute agreement (calculated as the
number of observations in which both observers agreed
with each other divided by the total number of observations), Cohen’s Kappa and weighted Cohen’s Kappa
were calculated to quantify the interobserver agreement
for dichotomous data and ordinal data. Regarding
range of motion of the shoulder, t-tests for related
samples were performed and intraclass correlations
(ICCs) were calculated. Additionally Bland and Altman
(1986) plots were made for range of motion of the shoulder to analyse if the differences between observers were
consistent across the range of measurements. Criteria
for acceptable reliability were a Kappa value ≥0.60
and an ICC ≥0.75 (Landis and Koch, 1977; Brouwer
et al., 2003). A poor Kappa value can occur even though
absolute agreement is very high, probably because of a lack
of variation in cell filling (nearly all observations falling into a single category). Therefore, an absolute agreement of ≥80% was also a criterion for an acceptable
agreement.
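The agreement measures described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: the function names, the linear weighting scheme for the weighted Kappa (the paper does not name one) and the example ratings are all assumptions made for illustration.

```python
from collections import Counter


def absolute_agreement(r1, r2):
    """Share of observations on which both observers gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2, categories, weighted=False):
    """Cohen's Kappa for two observers.

    With weighted=True, disagreements are weighted linearly by the distance
    between ordered categories (an assumed scheme, not taken from the paper).
    Returns None when the chance-expected disagreement is zero, i.e. when
    Kappa is undefined.
    """
    n, k = len(r1), len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    joint = Counter((idx[a], idx[b]) for a, b in zip(r1, r2))
    marg1 = Counter(idx[a] for a in r1)
    marg2 = Counter(idx[b] for b in r2)
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1) if weighted else float(i != j)
            observed += w * joint[(i, j)] / n
            expected += w * (marg1[i] / n) * (marg2[j] / n)
    if expected == 0:
        return None
    return 1.0 - observed / expected


# Invented ratings for 91 patients: one category is used almost exclusively.
obs1 = ["absent"] * 91
obs2 = ["absent"] * 90 + ["present"]
agreement = absolute_agreement(obs1, obs2)
kappa = cohen_kappa(obs1, obs2, ["absent", "present"])
print(f"agreement = {agreement:.1%}, kappa = {kappa:.2f}")
```

In this invented example the observers agree on 90 of 91 patients, yet Kappa is zero because the observed disagreement equals the disagreement expected by chance. This is the situation that motivates the additional absolute-agreement criterion.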
This study was approved by the Medical Ethics Committee of the University Medical Center Groningen,
University of Groningen, The Netherlands.
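For readers who want to see how a one-way random ICC of the kind reported for range of motion is formed, the following is a minimal sketch. It assumes the single-measures form, ICC(1,1), computed from a one-way analysis of variance; the paper states only "one way random", so this choice, the function name and the example angles are assumptions.

```python
def icc_one_way(scores):
    """ICC(1,1): one-way random effects, single measures.

    scores: one list per patient, each holding the ratings of the k observers.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-patient and within-patient mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical abduction angles (degrees) from two observers in five patients.
angles = [[160, 165], [90, 95], [175, 170], [120, 130], [180, 180]]
print(round(icc_one_way(angles), 2))
```

Because the one-way model treats every difference between observers as error, a systematic offset between two observers lowers this ICC in the same way as random disagreement does.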
3. Results
A total of 91 participants were included in the study.
Table 1 shows baseline characteristics of the patients.
The most common duration of shoulder complaints was 3–5 weeks
(30.8% of patients). Many patients (65.9%) had had previous
periods of shoulder complaints. In total 76 participants
were assessed 6 weeks after inclusion in the trial and 15
participants were assessed 12 weeks after inclusion in
the trial. Table 2 shows Cohen’s Kappa and absolute
agreement for dichotomous data. For one test (‘acromioclavicular swelling’) Cohen’s Kappa could not be
calculated because of incomplete filling of the 2 × 2
tables.
For two tests (‘acromioclavicular swelling’ and
‘springing test first rib pain’) acceptable reliability (absolute agreement > 80%) was found. Table 3 shows the
results in absolute agreement for ordinal data. In two
functional tests (‘pain HIN’ and ‘pain HIB’) the absolute agreement was less than 80%. In the other seven
tests the reliability was acceptable. Data of the differences between observers, results of t-tests for differences
in mean range of motion between observers, and the
corresponding ICC are shown in Table 4. For the tests
‘abduction passive starting point of painful arc’ and
‘passive external rotation’ the difference between the observers was statistically significant. For these outcome
variables no plots were made because systematic differences between the observers exist (Bland and Altman,
1986). In Figs. 1 and 2 Bland and Altman plots are
shown for ‘abduction range of motion active’ and ‘abduction active starting point of painful arc’ to illustrate
the magnitude and direction of differences across the
range of measurements. No funnel shape was observed
in the plots. Similar results are found in Bland and
Altman plots for ‘abduction passive range of motion’,
‘abduction active end point of painful arc’ and for
‘abduction passive end point of painful arc’. Thus differences between observers were consistent across the range
of measurements for these tests. In two tests (range of
motion in active and passive abductions) an ICC of
>0.75 was observed. For these tests the interobserver reliability was acceptable.
Table 1
Baseline characteristics of the participating patients.
Variables                                        N = 91
Age in years, mean (SD)                          48.5 (11.8)
Male                                             43 (47.3%)
Female                                           48 (52.7%)
Duration complaints
  0–2 weeks                                      9 (9.9%)
  3–5 weeks                                      28 (30.8%)
  6–8 weeks                                      13 (14.3%)
  9–11 weeks                                     11 (12.1%)
  12–26 weeks                                    12 (13.2%)
  >26 weeks                                      18 (19.8%)
Previous periods of shoulder complaints
  No                                             31 (34.1%)
  Yes, left shoulder                             23 (25.3%)
  Yes, right shoulder                            28 (30.8%)
  Yes, both shoulders                            9 (9.9%)
Previous neck complaints (minimally 1 week)
  No                                             36 (39.6%)
  Yes                                            55 (60.4%)
Development of complaints
  Rapid/acute                                    28 (31%)
  Gradual                                        63 (69%)
Shoulder pain (range 0–10)                       3.4 (2.2)
Shoulder restrictions (range 0–10)               4.5 (2.8)
Table 2
Cohen’s Kappa and absolute agreement for dichotomous data.
Variables                                                 Kappa   Absolute agreement (%)
Active painful arc (present, absent)                      0.46    74
Passive painful arc (present, absent)                     0.52    76
Impingement (present, absent)                             0.47    74 b
Acromioclavicular swelling (present, absent)              –       99 a
Springing test first rib range of motion
  (normal, restricted)                                    0.26    66
Springing test first rib stiff (present, absent)          0.09    68
Springing test first rib pain (present, absent)           0.66    82 a
–: Cohen’s Kappa could not be calculated because of incomplete filling of the 2 × 2 tables.
a Tests fulfilling criteria for acceptable reliability.
b Test only performed if no restrictions in glenohumeral range of motion were found.
Table 3
Weighted Kappa and absolute agreement for ordinal data.
Variables                          Kappa   Absolute agreement (%)
Range of motion
  HIN                              0.52    85 a
  HIB                              0.73    94 a
Pain
  HIN                              0.52    79
  HIB                              0.35    73
  Abduction active                 0.65    90 a
  Abduction passive                0.69    91 a
  External rotation passive        0.50    82 a
  Impingement                      0.62    91 a
  Acromioclavicular joint          0.51    90 a
a Tests fulfilling criteria for acceptable reliability.
In summary, 11 of the 23 tests (48%) had an acceptable interobserver reliability.
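The count of 11 acceptable tests can be retraced from the values in Tables 2 to 4. The sketch below transcribes those values and applies the stated criteria; the variable and function names are illustrative, and the thresholds are treated as inclusive, which is an assumption that does not affect the count because no value lies exactly on a threshold.

```python
# (Kappa, absolute agreement %) per test, transcribed from Tables 2 and 3.
# None marks the Kappa that could not be calculated.
TABLE_2 = {
    "active painful arc": (0.46, 74),
    "passive painful arc": (0.52, 76),
    "impingement": (0.47, 74),
    "acromioclavicular swelling": (None, 99),
    "first rib range of motion": (0.26, 66),
    "first rib stiffness": (0.09, 68),
    "first rib pain": (0.66, 82),
}
TABLE_3 = {
    "range of motion HIN": (0.52, 85),
    "range of motion HIB": (0.73, 94),
    "pain HIN": (0.52, 79),
    "pain HIB": (0.35, 73),
    "pain abduction active": (0.65, 90),
    "pain abduction passive": (0.69, 91),
    "pain external rotation passive": (0.50, 82),
    "pain impingement": (0.62, 91),
    "pain acromioclavicular joint": (0.51, 90),
}
# ICC per interval-level test, transcribed from Table 4.
TABLE_4_ICC = {
    "abduction active range of motion": 0.96,
    "abduction passive range of motion": 0.96,
    "active arc starting point": 0.72,
    "active arc end point": 0.57,
    "passive arc starting point": 0.54,
    "passive arc end point": 0.72,
    "external rotation passive": 0.70,
}


def acceptable_categorical(kappa, agreement):
    """Kappa of at least 0.60 or absolute agreement of at least 80%."""
    return (kappa is not None and kappa >= 0.60) or agreement >= 80


def acceptable_interval(icc):
    """ICC of at least 0.75."""
    return icc >= 0.75


n_dichotomous = sum(acceptable_categorical(k, a) for k, a in TABLE_2.values())
n_ordinal = sum(acceptable_categorical(k, a) for k, a in TABLE_3.values())
n_interval = sum(acceptable_interval(v) for v in TABLE_4_ICC.values())
total_tests = len(TABLE_2) + len(TABLE_3) + len(TABLE_4_ICC)
total_acceptable = n_dichotomous + n_ordinal + n_interval
print(n_dichotomous, n_ordinal, n_interval, total_acceptable, total_tests)
```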
4. Discussion
Substantial variation in the interobserver reliability,
ranging from poor to good reliability in the tests of
physical examination of the shoulder girdle was found
in this study. In the 23 tests performed 11 (48%) fulfilled
the criteria of an acceptable reliability. For the tests on
dichotomous data two out of seven tests showed acceptable reliability, for tests on ordinal data seven out of
nine tests showed acceptable reliability and for tests on
interval data two out of seven tests showed acceptable
reliability (Tables 2–4). Thus, tests on ordinal data
showed a higher reliability than tests on dichotomous
or interval data.
One might consider several explanations for the overall moderate reliability reported in this study. These explanations are related to the data level of the physical
examination, training effects within patients, difference
between observers and changes of the outcome as a result
of the first physical examination.
4.1. Data level
An explanation for better reliability results of tests at
ordinal data level could be that patients prefer more response options. Answering on a more gradual, ordinal,
scale (no pain, little pain, much pain, and excruciating
pain) might be easier than answering on a dichotomous
scale: pain absent or present. On a gradual scale patients
can indicate more precisely how they experience the pain
during the test.
The tests producing interval data were all based
on visual estimation by the observer of active/passive
range of motion and of the starting/end point of a painful
arc. In accordance with the trial protocol, the examiner had to make
the assessment during at most two movements. For active
and passive abductions a good reliability was found
despite the large standard deviations of the mean difference between the observers. For the observer it may be
more difficult (i.e. less reliable) to assess range of motion
during the movement, as for instance the starting point
or end point of a painful arc, than in an end position of
active and passive abductions. A significant difference
between the assessments of the two observers was found
in ‘abduction passive starting point of painful arc’ and
‘passive external rotation’. The standard deviations of
the mean difference between the observers provide an indication of the range of differences found between these
observers. These differences are illustrated in the Bland
and Altman plots (Figs. 1 and 2).
Table 4
Differences between observer 1 and observer 2, results of t-test for related samples and ICCs.
Variable                                    Observer 1     Observer 2     Difference     p value    ICC (one
                                            mean (SD)      mean (SD)      mean (SD)                 way random)
Abduction range of motion
  Active                                    160.2 (40.0)   160.2 (38.8)   0.0 (11.1)     1.000      0.96 a
  Passive                                   165.9 (33.0)   165.0 (34.3)   1.0 (10.0)     0.346      0.96 a
Abduction active
  Starting point of painful arc             104.8 (39.2)   110.7 (37.2)   5.9 (28.5)     0.180      0.72
  End point of painful arc                  158.0 (26.4)   153.0 (31.4)   5.0 (26.7)     0.226      0.57
Abduction passive
  Starting point of painful arc             114.7 (35.2)   126.9 (36.3)   12.2 (33.1)    0.032 b    0.54
  End point of painful arc                  162.6 (24.8)   160.9 (26.5)   1.6 (19.5)     0.617      0.72
External rotation range of motion passive   55.5 (19.4)    63.2 (21.5)    7.7 (14.2)     <0.001 b   0.70
a Tests fulfilling criteria for acceptable reliability between observers.
b Tests showing significant differences between observers.
The standard deviation of the mean difference between
the observers for ‘abduction active’ (11.1°) indicates
that if two observers measure the same patients, a difference of up to 2 × 11.1° (about 22°) is to be expected in 95% of patients. For
‘abduction passive end point of painful arc’ (standard deviation 19.5°) a difference
of up to 2 × 19.5° (39°) is to be expected in 95% of
patients. These differences are considerable in the light
of the total range measured.
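The arithmetic behind these ranges can be sketched as limits of agreement around the mean difference. The multiplier of 2 follows the text above; the function names and the example angles are assumptions, and the means and standard deviations are those printed in Table 4, where the sign of the mean difference may not have survived in the source.

```python
from statistics import mean, stdev


def limits_of_agreement(mean_diff, sd_diff, multiplier=2.0):
    """Interval expected to contain about 95% of between-observer differences."""
    half_width = multiplier * sd_diff
    return mean_diff - half_width, mean_diff + half_width


def difference_summary(obs1, obs2):
    """Mean and sample standard deviation of the paired differences."""
    diffs = [a - b for a, b in zip(obs1, obs2)]
    return mean(diffs), stdev(diffs)


# Values as printed in Table 4 (degrees).
active_abduction = limits_of_agreement(0.0, 11.1)
passive_arc_end = limits_of_agreement(1.6, 19.5)
print(active_abduction, passive_arc_end)

# Hypothetical paired measurements, to show the full calculation.
m, s = difference_summary([160, 150, 170], [150, 150, 160])
print(limits_of_agreement(m, s))
```

With a mean difference of 0.0° and a standard deviation of 11.1°, the limits are roughly -22° and +22°, a band about 44° wide.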
Fig. 1. Bland and Altman plot of the mean (of the two observers) active range of motion abduction plotted against the difference in active
range of motion abduction between observers. Note that some data
points represent more than one observation.
Fig. 2. Bland and Altman plot of the mean (of the two observers) starting point of painful arc abduction active plotted against the difference
between observers of starting point of painful arc abduction active.
Note that some data points represent more than one observation.
4.2. Training effects
It is remarkable that the tests on an ordinal scale ‘pain
HIN’ and ‘pain HIB’ did not show an acceptable reliability. It is possible that a training effect occurs within the
patient during the physical examinations because ‘pain
HIN’ and ‘pain HIB’ tests were the first tests in the examination. Patients may find it difficult, initially, to indicate
the experienced pain level (no pain, little pain, much
pain, and excruciating pain) during the test.
4.3. Observer differences
Examinations were carried out by two experienced
physical therapists, who had been trained extensively
in performing the tests. However, one of them was
also a manual therapist. Manual therapy is a postgraduate course undertaken following a physical therapy
course. Manual therapists are specialised in diagnosing
and treatment of dysfunction of the musculoskeletal
system. Therefore, it is possible that the physical signs
and symptoms were interpreted differently by the two
observers.
Practical issues dictated which of the two observers
performed the first or the second examination. In
a post-hoc analysis the influence of observer sequence
was analysed for differences in active and passive abductions, for passive external rotation and for start
and end of painful arc, active and passive. Only for
two movements, passive external rotation and start of
the painful arc (passive) did the sequence have a significant influence on the differences between the observers.
It is not clear why this phenomenon occurred only in
these two movements. For all other movements the observer sequence had no effect on the differences between
observers.
4.4. Systematic changes of the outcome as a result
of the first examination
It is possible that the first examination induces
a change in magnitude or presence of an outcome measure and as a consequence the results of the second
examination will differ from those of the first. For instance, pain provoked during the first examination of
active abduction may increase pain perception during
the second examination or may even influence the outcome of the assessment of the range of motion.
4.5. Random changes of outcome as a result of the
first examination
Finally it is possible that the differences between the
first and the second examinations are based on random
changes within the outcome variables assessed. An explanation for these differences cannot be given.
Theoretically it might be possible that current neck
pain influenced reliability of physical examination.
This influence would only be possible if the influence
of neck pain were different for the two observers and
thereby inducing a difference in outcomes of the observers. This differential influence of neck pain on reliability results was not analysed in this study.
4.6. Other considerations
The tests analysed in the reliability study are all tests
commonly used in physical therapy practice and in clinical medical practice. The choice to include a test in this
reliability study was pragmatic. Retrospectively it might
have been more interesting or clinically more relevant if
other tests focussing on functional limitations or pathophysiology had been investigated.
For the tests in this study no technical instruments
were used, which make these tests suitable for use in
daily practice. Some reliability studies on shoulder
movement have been performed using instruments (Riddle et al., 1987; Green et al., 1998b; Hoving
et al., 2002), but it has not been incontestably shown that using
instruments results in higher reliability. In Tables 5
and 6 an overview of the results of studies similar to
the current is presented.
Comparing the present results with those of other
studies is difficult because of differences in research
methodology, for instance differences in diagnostic tests
applied, joints assessed, active and passive motions, testing positions, and the profession of the observers
(Riddle et al., 1987; Croft et al., 1994; Green et al.,
1998b; de Winter, 1999; Hoving et al., 2002; Terwee
et al., 2005). Within these studies and in the current
study a similar variability was found concerning interobserver reliability (Tables 5 and 6). In the studies by
Green et al. (1998b) and Hoving et al. (2002) the same
design was used for a similar patient group. The physiotherapists achieved overall better results for interobserver reliability than the rheumatologists. Perhaps the
training of physical therapists in physical examination
during these studies was more extensive than that of
the rheumatologists. In the study by Terwee et al.
(2005) five movements of the shoulder were estimated
visually. Three tests and test conditions were similar to
those in the current study. Active and passive abductions showed acceptable reliability in the current study
as well as in Terwee’s study. The mean difference
and the standard deviation for active and passive abductions were higher in Terwee’s study than in the current
study. In de Winter’s (1999) study interobserver agreement of the examination of the shoulder joint was assessed, and Kappas and absolute agreement were
calculated. Five tests in that study were similar to the
tests in the current study and similar reliability results
were found (Table 6).
In the current study two observers were used for
logistical reasons. Because of the use of two observers
we felt obligated to investigate interobserver differences.
Within the time limits of this trial it was not possible to
assess additionally the intraobserver reliability. In daily
practice it is possible that two colleagues may temporarily take over each other’s duties. In that case, interobserver reliability assessed in this study is important.
Differences in assessment results may be caused by
improvements of the complaints but it may also reflect
interobserver differences.
The strength of the current study is the substantial
number of patients (n = 91) that participated. All
patients who were asked to participate in this reliability
study actually participated. However, not all patients
participating in the trial of Bergman et al. (2002) could
be recruited because of logistical reasons. The authors
have no reason to believe that the selection of the patients
for the reliability study may have influenced the results.
Interobserver reliability of physical tests was moderate in this study as well as in other studies. Differences
in assessments performed by two observers on the same
subject do not automatically indicate actual change in
Table 5
ICC reliability in similar shoulder movements in different studies.
                          Riddle           Croft           Croft           Green           Hoving            Terwee          Nomden
                          (Riddle et al.,  (Croft et al.,  (Croft et al.,  (Green et al.,  (Hoving et al.,   (Terwee et al., current
                          1987)            1994) study 1   1994) study 2   1998b)          2002)             2005)
Observers: n              16               6               6               6               6                 2               2
Patients: n               50               6               6               6               6                 201             91
Year                      1987             1994            1994            1998            2002              2005            current
Profession                PT               PT              PT              PT/MT           Rheumatologists   PT              PT/MT
Professional experience   6.3 yrs (mean)   –               –               Experienced     Experienced       3 and 10 yrs    27 and 15 yrs
Method                    Goniometer,      Visual          Visual          Inclinometer    Inclinometer      Visual          Visual
                          large, small
Standardization           –                Yes             Yes             Yes             Yes               Yes             Yes
Time interval             –                15 min          –               1 hr            1 hr              <1 hr           <5 min
Movements/comparable
movements: n              7/2              2/2             2 (4 pos)/2     8/2             8/2               5/3             5
Flexion, act.
Abduction, act.
Abduction, pass.
0.72
0.77a
0.72
0.49
0.88a
0.87a
0.96a
0.96a
4.7 (20.1)
4.1 (22.7)
0.0 (11.1)
1.0 (10.0)
0.88a (lying)
0.29
0.73
0.70
11.2 (12.0)
7.7 (14.2)
0.87a (large),
0.84a (small)
External
rotation, act.
External
0.88a (large),
0.90a (small)
rotation, pass.
Hand behind back
0.95a
0.99a
0.43
0.37
0.80a
0.73
Terwee
(Terwee et al.,
2005) mean
difference
(SD)
Nomden
current
mean
difference
(SD)
94a
(abs. agr.)
–: not reported.
a Acceptable reliability.
the outcome measures of that subject. Determining
improvement or deterioration is not easy. It is still not
clear which (combination of) tests should be used in diagnosing shoulder disorders and evaluation of shoulder
treatment.
It is recommended that more interobserver reliability
studies should be carried out on tests producing ordinal
data in order to analyse sources of measurement
variation.
5. Conclusion
A great variability in reliability exists in physical tests
of the shoulder girdle. Despite the use of a standardised
protocol to assess physical examination of the shoulder
girdle, acceptable interobserver reliability was hard to
achieve. In this study overall reliability was moderate.
The most reliable tests in the study were tests at ordinal
data level.
In other reliability studies substantial variability has
also been found in interobserver reliability. Unfortunately, it is difficult to compare these studies. Further
investigations have to be carried out to find out which
(combination of) tests is most suitable to assess shoulder
complaints.
Clinicians and researchers should interpret outcomes
of physical examination of the shoulder girdle cautiously
because outcomes might be biased by observer differences, but also by other sources of variation.
Table 6
Kappa and absolute agreement in shoulder tests.
                                   de Winter (1999)   Nomden (current)   de Winter (1999)   Nomden (current)
Patients (n)                       201                91                 201                91
Statistics                         Kappa              Kappa              Abs. agreement     Abs. agreement
Abduction active, pain             0.73 a             0.65 a             95% a              90% a
Abduction passive, pain            0.44               0.69 a             89% a              91% a
External rotation passive, pain    0.45               0.50               80% a              82% a
Presence painful arc active        0.67 a             0.46               88% a              74%
Presence painful arc passive       0.59               0.52               89% a              76%
a Acceptable reliability.
Appendix 1
HIN and HIB as assessed in the randomised controlled trial concerning the effectiveness of manual therapy of the shoulder girdle
HIN, an external rotation movement pattern
Score 1: From hand on thigh up to and including HIN on affected side, underarm in sagittal plane (90° flexed elbow fixed against hip)
Score 2: From HIN at affected side and underarm in sagittal plane just to touching with fingertips processus spinosi C7 and underarm (about) in sagittal plane
Score 3: From fingertips on processus spinosi C7 with underarm (about) in sagittal plane just to elbow in frontal plane
Score 4: From fingertips on processus spinosi C7 and underarm in frontal plane just to fingertips at heterolateral angulus superior scapulae with underarm in sagittal plane
Score 5: From fingertips on heterolateral angulus superior scapulae with underarm in sagittal plane just to elbow in frontal plane
Score 6: From fingertips on heterolateral angulus superior scapulae with elbow in frontal plane just to (almost) full abduction/elevation, but painful terminal passive abduction/elevation
Score 7: Active full abduction/elevation and (almost) painless terminal abduction/elevation
HIB, an internal rotation movement pattern
Score 1: From hand on thigh till lateral side thigh-bone with palm of the hand
Score 2: From palm of the hand on lateral side of thigh-bone till back of the hand on homolateral buttock
Score 3: From back of the hand on homolateral buttock till back of the hand on lumbosacral crossing (the height of processus spinosus L5)
Score 4: From back of the hand on lumbosacral crossing till fist on waist (the height of processus spinosi L3)
Score 5: From fist on waist till back of the hand on thoracolumbal crossing (the height of processus spinosi Th 12)
Score 6: From back of the hand on thoracolumbal crossing to fingertips on heterolateral angulus inferior scapulae
Score 7: From fingertips on heterolateral angulus inferior scapulae till back of the hand between scapulae (the height of processus spinosi Th 7)
HIN and HIB slightly modified from Solem-Bertoft et al. (1996).
References
Bergman GJ, Winters JC, van der Heijden GJ, Postema K, Meyboom-de Jong B. Groningen manipulation study. The effect of manipulation of the structures of the shoulder girdle as additional treatment for symptom relief and for prevention of chronicity or recurrence of shoulder symptoms. Design of a randomized controlled trial within a comprehensive prognostic cohort study. Journal of Manipulative and Physiological Therapeutics 2002;25(9):543–9.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1(8476):307–10.
Brouwer S, Reneman MF, Dijkstra PU, Groothoff JW, Schellekens JM, Goeken LN. Test–retest reliability of the Isernhagen work systems functional capacity evaluation in patients with chronic low back pain. Journal of Occupational Rehabilitation 2003;13(4):207–18.
Croft P, Pope D, Boswell R, Rigby A, Silman A. Observer variability in measuring elevation and external rotation of the shoulder. Primary Care Rheumatology Society Shoulder Study Group. British Journal of Rheumatology 1994;33(10):942–6.
Green S, Buchbinder R, Glazier R, Forbes A. Systematic review of randomised controlled trials of interventions for painful shoulder: selection criteria, outcome assessment, and efficacy. BMJ 1998a;316(7128):354–60.
Green S, Buchbinder R, Forbes A, Bellamy N. A standardized protocol for measurement of range of movement of the shoulder using the Plurimeter-V inclinometer and assessment of its intrarater and interrater reliability. Arthritis Care and Research 1998b;11(1):43–52.
Green S, Buchbinder R, Hetrick S. Physiotherapy interventions for shoulder pain. Cochrane Database of Systematic Reviews 2003;2:CD004258.
Hoving JL, Buchbinder R, Green S, Forbes A, Bellamy N, Brand C, et al. How reliably do rheumatologists measure shoulder movement? Annals of the Rheumatic Diseases 2002;61(7):612–6.
Jirout J. X-ray studies on the dynamics of the first rib. Manual Medicine 1986;2:59–61.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33(1):159–74.
Neer CS. Impingement lesions. Clinical Orthopaedics and Related Research 1983;173:70–7.
Riddle DL, Rothstein JM, Lamb RL. Goniometric reliability in a clinical setting. Shoulder measurements. Physical Therapy 1987;67(5):668–73.
Solem-Bertoft E, Lundh I, Westerberg CE. Pain is a major determinant of impaired performance in standardized active motor tests. A study in patients with fracture of the proximal humerus. Scandinavian Journal of Rehabilitation Medicine 1996;28(2):71–8.
Terwee CB, de Winter AF, Scholten RJ, Jans MP, Deville W, van Schaardenburg D, et al. Interobserver reproducibility of the visual estimation of range of motion of the shoulder. Archives of Physical Medicine and Rehabilitation 2005;86(7):1356–61.
de Winter AF. Diagnosis and classification of shoulder complaints. Vrije Universiteit; 1999. p. 23–37.
Winters JC, Sobel JS, van der Windt DAWM, Jonquiere M, de Winter AF, van der Heijden GJ, et al. NHG Standaard Schouderklachten (versie 1999) (Guidelines for shoulder Complaints of the Dutch College of General Practitioners (version 1999)). Huisarts en Wetenschap 1999;42:222–31.