Validating the Chinese Diabetes Health Scale: A Psychometric Analysis
Dove Medical Press
January 20, 2026

AI-Generated Summary
The Diabetes Health Literacy Scale (DHLS) was psychometrically validated for Chinese adults with type 2 diabetes. The study confirmed its reliability and a three-factor structure assessing informational, numeracy, and communicative health literacy. The Chinese DHLS is deemed a suitable tool for assessing diabetes-specific health literacy, aiding in targeted interventions.
Introduction
Background
Between 1990 and 2022, the global prevalence of diabetes among adults (including both type 1 and type 2) doubled, rising from approximately 7% to about 14%, with the largest increase in low- and middle-income countries (LMICs). However, diabetes treatment rates in LMICs have stagnated at low levels, whereas North America, Central and Western Europe, and the Pacific region have seen substantial improvements in treatment access. This has widened the global disparity in diabetes care, leaving nearly 450 million people with diabetes (59%) untreated worldwide in 2022.1 Addressing this gap requires vigorous implementation of health programs in LMICs that facilitate diagnosis and effective management. China currently has the largest diabetic population in the world, accounting for approximately one-quarter of all global cases, and type 2 diabetes mellitus (T2DM) represents over 90% of these cases. Although T2DM is highly preventable, its high incidence and persistently rising prevalence pose significant challenges for chronic disease management.2
Effective self-management is the foundation of diabetes control, and health literacy (HL) is widely recognized as a critical factor influencing self-management behaviors and health outcomes.3–6 The European Health Literacy Project (HLS-EU) Consortium defines HL as the capacity of individuals to access, understand, evaluate, and apply health information to make decisions regarding healthcare, disease prevention, and health promotion.7
Literature Survey
Adequate HL empowers patients with diabetes to navigate complex medical information, adhere to prescribed treatments, and engage in sound dietary and exercise planning, all of which contribute to improved glycemic control and a reduced risk of complications.8–10 A variety of general HL assessment tools are currently available.11–15 Among them, the Rapid Estimate of Adult Literacy in Medicine and the Test of Functional Health Literacy in Adults are the most frequently adopted in clinical research. The former assesses HL through a word recognition test, while the latter incorporates a numeracy component.16 However, these generic instruments often fail to capture the specific knowledge and skills required for diabetes self-management. The 14-item Health Literacy Scale (HL-14), by contrast, is a concise instrument specifically designed to assess the HL of diabetic patients in clinical settings.17 Its primary strength lies in evaluating three key dimensions of how patients handle health information (functional, communicative, and critical), and it has demonstrated good reliability and validity.18,19 Nevertheless, a notable limitation of the HL-14 is its inability to assess numeracy skills, which are essential for diabetes management tasks such as interpreting blood glucose levels and calculating carbohydrate intake from food packaging labels. To address this gap, Lee et al developed the comprehensive Diabetes Health Literacy Scale (DHLS),20 which measures the integrated dimensions of informational, numeracy, and communicative HL. The DHLS has demonstrated robust psychometric properties, meeting four validity indicators (content, construct, convergent, and criterion-related validity) and two reliability indicators (internal consistency and test-retest reliability).
The psychometric properties of the scale have been examined in Persian and Malaysian versions, with studies reporting satisfactory reliability and validity.21,22 Furthermore, Shirooka et al applied the DHLS and demonstrated its value in reducing distress and burnout, enhancing self-management skills, fostering supportive networks, and ultimately improving patient quality of life.23 Similar findings were reported in other studies.24,25 Although the DHLS has been used internationally, its reliability and validity among Chinese patients with type 2 diabetes have not yet been evaluated. Given the association between HL and metabolic outcomes in the Chinese population,26 developing culturally appropriate assessment tools is crucial.
Study Aim
Therefore, this study aimed to introduce the Chinese version of the DHLS and evaluate its psychometric properties in patients with T2DM.
Method
Design and Sample
A cross-sectional study was conducted in Wuhu City from May 2023 to June 2024, involving hospitalized patients with type 2 diabetes from the First Affiliated Hospital of Wannan Medical College. The inclusion criteria were as follows: (1) age ≥ 45 years; (2) clinically confirmed diagnosis of T2DM according to the International Classification of Diseases (10th Revision); (3) clear consciousness with intact mobility and cognitive ability; and (4) willingness to participate and complete the questionnaire. Patients were excluded if they met any of the following criteria: (1) presence of severe mental disorders or intellectual impairment; (2) severe diabetes-related complications or loss of self-care ability; (3) other critical illnesses (eg, severe cardiovascular diseases, serious infectious diseases, or cancer) or auditory/visual impairment due to T2DM complications; or (4) pregnancy or other specific types of diabetes.
The sample size was determined using the criterion proposed by Kendall, which recommends 10 to 20 participants per variable (item).27 For the 14-item questionnaire, the lower bound gives 14 × 10 = 140 participants; inflating this estimate by 20% to allow for potential non-response or invalid data yields a minimum target of 168 participants. Ultimately, a total of 251 valid questionnaires were collected.
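The arithmetic behind this target can be reproduced in a few lines. This minimal sketch assumes, as the figures above imply, that the 20% inflation was applied multiplicatively to the 10-per-item lower bound; `kendall_sample_size` is a hypothetical helper name, and integer arithmetic is used so the ceiling cannot be thrown off by floating-point rounding.

```python
def kendall_sample_size(n_items: int, per_item: int = 10, inflation_pct: int = 20) -> int:
    """Hypothetical helper: items x participants per item, inflated for non-response.

    Integer arithmetic keeps the ceiling exact (no floating-point drift).
    """
    base = n_items * per_item
    # Ceiling of base * (100 + inflation_pct) / 100, computed with floor division.
    return -(-base * (100 + inflation_pct) // 100)


print(kendall_sample_size(14))               # 168 (14 x 10 = 140, plus 20%)
print(kendall_sample_size(14, per_item=20))  # 336 (upper end of Kendall's range)
```

Dividing by the expected completion rate instead (140 / 0.8) is another common convention and would give a somewhat larger target.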
The Instrument
The DHLS comprises 14 self-report items across three dimensions of diabetes-specific HL: informational (Q1–Q7), numeracy (Q8–Q11), and communicative HL (Q12–Q14). Respondents rate each item on a five-point Likert scale (0 = strongly disagree to 4 = strongly agree), choosing the option most consistent with their current state. The original scale demonstrated good reliability, with a Cronbach’s alpha of 0.91 and a test-retest intraclass correlation coefficient of 0.89.
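Subscale and total scores can be computed once responses are coded 0 to 4. The sketch below assumes simple summed scores: the text does not state the aggregation rule, although summation is consistent with the pre-survey total range of 17 to 47 on a possible 0 to 56 scale. The item keys `"Q1"` to `"Q14"` and the function name `score_dhls` are hypothetical.

```python
# Item groupings follow the dimension ranges given in the text.
SUBSCALES = {
    "informational": [f"Q{i}" for i in range(1, 8)],    # Q1-Q7
    "numeracy": [f"Q{i}" for i in range(8, 12)],        # Q8-Q11
    "communicative": [f"Q{i}" for i in range(12, 15)],  # Q12-Q14
}


def score_dhls(responses: dict) -> dict:
    """Hypothetical scorer: sums 0-4 Likert responses per subscale and overall."""
    scores = {}
    for name, items in SUBSCALES.items():
        for item in items:
            if responses[item] not in range(5):
                raise ValueError(f"{item} must be an integer from 0 to 4")
        scores[name] = sum(responses[item] for item in items)
    scores["total"] = sum(scores[name] for name in SUBSCALES)
    return scores


example = {f"Q{i}": 2 for i in range(1, 15)}  # every item answered "2"
print(score_dhls(example))
```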
Translation Procedure
The cross-cultural adaptation was conducted with permission from the original DHLS developers and followed the Brislin model used in other studies.28,29 The process consisted of four stages: (1) Forward translation: The original scale was independently translated by a Ph.D. in Nursing and a Doctor of Medicine in Endocrinology, both native Chinese speakers proficient in English. The research team discussed and reconciled discrepancies to produce a consensus version. (2) Back-translation: This consensus version was independently back-translated into English by two medical English teachers who were blind to the original DHLS. This process was iterative, ensuring semantic and content equivalence with the original, and resulted in the preliminary Chinese version of the DHLS. (3) Expert review: A multidisciplinary panel comprising three nursing professionals and two psychologists evaluated each item for cultural and linguistic appropriateness. (4) Pilot testing: The pre-final version was administered to 30 diabetic patients, and their feedback was used to refine the wording and finalize the Chinese DHLS. The final content of the DHLS is provided in Supplementary Table 1.
The Stage of Pre-Survey
A pre-survey was conducted in May 2023 using a convenience sample of 30 patients with T2DM who met the inclusion criteria, recruited from the First Affiliated Hospital of Wannan Medical College. During the pre-survey, participant comprehension of each questionnaire item was assessed through interviews. The results indicated a total scale score ranging from 17 to 47, with a mean of 31.27 ± 8.29. The time to complete the questionnaire ranged from 3 to 5 minutes, averaging 3.53 minutes.
Data Collection
This study used a questionnaire that collected sociodemographic information, diabetes-related conditions, and the DHLS. A multistage sampling approach was adopted. First, the First Affiliated Hospital of Wannan Medical College was randomly selected from all tertiary hospitals in Wuhu City, Anhui Province. Second, four departments were purposively chosen within the hospital: endocrinology, geriatrics, traditional Chinese medicine, and dermatology. Third, survey stations were set up in these departments to randomly recruit patients with T2DM. Data collection was carried out in two phases: a pre-survey and a formal survey. All questionnaires were distributed and collected on-site, with each participant completing the questionnaire only once. A total of 260 questionnaires were distributed, and 251 valid responses were returned, giving a valid response rate of 96.54%. To ensure data quality, all investigators received standardized training prior to the survey, which included instruction on patient communication strategies and scale scoring criteria. After obtaining written informed consent, questionnaires were administered in a one-on-one setting and completed independently by participants.
Statistical Analysis
Descriptive statistics were used to summarize the demographic and clinical characteristics of the participants. Normally distributed continuous variables were presented as means and standard deviations, non-normally distributed continuous variables as medians and interquartile ranges, and categorical variables as frequencies and percentages. Group comparisons of scale scores across gender and educational attainment were performed using independent samples t-tests. No missing data were encountered in the final sample of 251 participants.
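For reference, the independent samples t statistic can be sketched in plain Python. This assumes the classic Student version with pooled variance; the text does not say whether pooled or Welch tests were used, and `student_t` is a hypothetical name. For p-values, `scipy.stats.ttest_ind(a, b, equal_var=True)` computes the same statistic together with its two-sided p-value.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n_a + n_b - 2


t, df = student_t([1, 2, 3], [4, 5, 6])  # toy scores, not study data
print(round(t, 3), df)
```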
The psychometric validation was conducted using a two-stage analytical strategy that integrates Classical Test Theory (CTT) and Item Response Theory (IRT). This combined approach advances beyond previous validations by evaluating both scale-level properties (reliability, factor structure) and item-level functioning (discrimination, difficulty, precision).
Reliability of the Chinese version of the DHLS was assessed through internal consistency, measured by Cronbach’s alpha coefficient. An alpha value greater than 0.70 was considered acceptable. Item-total correlations were also examined, with a threshold of 0.50 used to indicate adequate correlation between each item and the total scale score.
Validity was evaluated through multiple approaches. Construct validity was examined using both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), with the same sample used for both. The factorability of the data was confirmed by the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity before conducting EFA with principal axis factoring.30–32 EFA (principal component analysis with Varimax rotation) was used to examine structural validity. Each item was required to load at least 0.40 on its primary factor, and the extracted common factors were required to explain more than 40% of the cumulative variance. The three-factor structure was then examined using CFA, with model fit assessed by multiple indices: a chi-square to degrees of freedom ratio (χ2/df) < 5, comparative fit index (CFI) and Tucker-Lewis index (TLI) both > 0.90, root mean square error of approximation (RMSEA) < 0.08, and standardized root mean square residual (SRMR) < 0.08. Convergent validity was assessed using the average variance extracted (AVE; criterion > 0.50) and composite reliability (CR; criterion > 0.70) for each subscale. Discriminant validity was assessed using the Fornell-Larcker criterion, whereby the square root of the AVE for each construct was required to exceed the correlations between that construct and the other constructs in the model.
IRT analysis was conducted to provide a more nuanced examination of item-level characteristics.33–35 Prior to IRT analysis, we assessed whether the data met the assumption of unidimensionality. This was evaluated using the ratio of the first to second eigenvalues from exploratory factor analysis, with a ratio greater than 3:1 considered indicative of a dominant general factor sufficient for unidimensional IRT modeling. Both the Graded Response Model (GRM) and Generalized Partial Credit Model (GPCM) were fitted, with model selection based on the smaller Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The selected model was subsequently used to estimate item discrimination (α) and difficulty parameters (β). Model fit was assessed at both the scale level (M2, RMSEA, TLI, CFI, SRMSR) and item level using Orlando-Thissen S-X2 statistics. Item information curves (IICs) and the total (scale) information curve were examined to evaluate the measurement precision across different levels of the latent trait.
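To make the GRM concrete: under Samejima's graded response model, the probability of responding in category k or higher is a logistic function of a(θ − b_k), and each category probability is the difference between adjacent cumulative curves. The sketch below computes these probabilities in the discrimination/threshold (a/b) parameterization; the parameter values in the usage line are hypothetical, not the Table 3 estimates. Note that mirt reports slope-intercept parameters by default, and `coef(model, IRTpars = TRUE)` converts them to this form.

```python
import math


def grm_category_probs(theta, a, b):
    """Graded response model: P(X = k | theta) for k = 0..len(b).

    a: item discrimination; b: strictly increasing thresholds b_1 < ... < b_m.
    """
    if any(b[i] >= b[i + 1] for i in range(len(b) - 1)):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative curves P(X >= k), padded with P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cumulative = [1.0] + [1 / (1 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(b) + 1)]


# Hypothetical five-category item (scores 0-4), evaluated at the trait mean.
print([round(p, 3) for p in grm_category_probs(0.0, 1.5, [-2.0, -1.0, 0.0, 1.0])])
```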
All statistical analyses were performed using R software (version 4.5.1). The following packages were utilized for specific analyses: psych for EFA; lavaan for CFA; mirt for IRT modeling; and ggplot2 for scree plots. A two-sided P-value of less than 0.05 was considered statistically significant.
Results
Descriptive Statistics
This study included a total of 251 patients with T2DM, whose characteristics are described in Table 1. The mean age was 64.62 ± 10.37 years; 142 participants (56.57%) were male and 109 (43.43%) were female. The mean BMI was 24.21 ± 3.26 kg/m2. Among the participants, 98.8% were married, 44.62% had an educational level of primary school or below, and 59.76% resided in urban areas. Regarding medical insurance, 43.82% had employee health insurance. In terms of diabetes-related characteristics, 49.00% of patients had complications, 15.94% had a disease duration exceeding 20 years, and 62.15% reported a family history of diabetes.
Table 1 Frequency Distribution of Patient Characteristics (n=251)
As shown in Supplementary Table 2, DHLS scores differed by gender and educational attainment. Males scored higher than females on all three subscales and on the total score (all P < 0.001). Patients with primary school education or lower scored lower than those with higher educational attainment on all subscales and the total score (all P < 0.001).
Reliability Analysis
The reliability analysis of the Chinese version of the DHLS demonstrated excellent internal consistency (Supplementary Table 3), with a Cronbach’s alpha coefficient of 0.938. All items were retained in the final scale as deletion of any individual item did not improve the overall alpha coefficient, and all item-total correlations exceeded the recommended threshold of 0.5.
Validity Analysis
Exploratory Factor Analysis
The KMO measure was 0.90, and Bartlett’s test of sphericity was significant (χ2 = 3119.70; P < 0.001), supporting the factorability of the data. The scree plot supported the extraction of three factors (Figure 1), which collectively explained 65% of the total variance, with individual contributions of 34%, 12%, and 19%. All items loaded substantially (0.49–0.99) on their respective factors, exceeding the 0.40 threshold with no cross-loadings (Supplementary Table 4), which supports retaining all items of the Chinese version without modification.
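Both factorability checks can be computed directly from a correlation matrix R. KMO compares squared correlations with squared partial correlations (obtained from the inverse of R), and Bartlett's statistic is −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, which is 91 for the 14 DHLS items. This NumPy sketch uses a hypothetical 3 × 3 matrix rather than study data; in R, the psych package named in the text provides `KMO()` and `cortest.bartlett()` for the same purpose.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)             # partial correlations (squared below)
    off_diag = ~np.eye(R.shape[0], dtype=bool)  # exclude the diagonal
    r2 = np.sum(R[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and df (H0: R is an identity matrix)."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("R must be positive definite")
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df


R_toy = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]  # hypothetical matrix
print(round(kmo(R_toy), 3), bartlett_sphericity(R_toy, n=100))
```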
Confirmatory Factor Analysis
CFA was initially conducted to test the original three-factor structure of the Chinese version of the DHLS. The initial model demonstrated an unacceptable fit to the data (see Table 2 and Figure 2a). The model was subsequently refined using modification indices to enhance model fit. The revised three-factor model showed acceptable fit across multiple indices: χ2/df = 3.522, SRMR=0.051, RMSEA=0.079 (0.066, 0.093), TLI = 0.926, and CFI = 0.943 (see Table 2 and Figure 2b).
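The revised model can be checked against the cutoffs listed in the Statistical Analysis section with a small lookup. This sketch simply encodes those cutoffs and the reported revised-model indices; the dictionary keys and `check_fit` are hypothetical names. Note that RMSEA passes by a narrow margin: the point estimate (0.079) is just under 0.08, while the upper bound of its reported interval (0.093) is not.

```python
# Cutoffs as stated in the Statistical Analysis section.
CUTOFFS = {
    "chi2/df": ("<", 5.0),
    "CFI": (">", 0.90),
    "TLI": (">", 0.90),
    "RMSEA": ("<", 0.08),
    "SRMR": ("<", 0.08),
}


def check_fit(indices):
    """Return {index name: True/False} for whether each cutoff is met."""
    results = {}
    for name, (direction, cutoff) in CUTOFFS.items():
        value = indices[name]
        results[name] = value < cutoff if direction == "<" else value > cutoff
    return results


# Reported indices for the revised three-factor model.
revised = {"chi2/df": 3.522, "CFI": 0.943, "TLI": 0.926, "RMSEA": 0.079, "SRMR": 0.051}
print(check_fit(revised))
```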
The AVE values for the three subscales were 0.760, 0.575, and 0.524, respectively, all exceeding the recommended threshold of 0.50. The corresponding CR values were 0.899, 0.764, and 0.705, all above the 0.70 benchmark, thus supporting satisfactory convergent validity. Discriminant validity was also established. The square roots of the AVE values for each subscale (0.872, 0.758, 0.724) were greater than the correlations between any pair of subscales (r12=0.744, r13=0.714, r23=0.692), confirming that the constructs are distinct from one another.
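The convergent and discriminant validity criteria reduce to short formulas on standardized loadings: AVE is the mean squared loading, and CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). This sketch assumes standardized loadings and uncorrelated residuals (correlated errors added through modification indices are not reflected in these simple formulas). Because per-item loadings are not reported here, the Fornell-Larcker check is run on the reported AVEs and subscale correlations.

```python
import math


def ave(loadings):
    """Average variance extracted: mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)."""
    s = sum(loadings)
    residual = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + residual)


def fornell_larcker(aves, corr):
    """True if sqrt(AVE) of every construct exceeds its correlations with all others."""
    k = len(aves)
    return all(
        math.sqrt(aves[i]) > abs(corr[i][j])
        for i in range(k) for j in range(k) if i != j
    )


# Reported AVEs and inter-subscale correlations (r12, r13, r23).
aves = [0.760, 0.575, 0.524]
corr = [[1.0, 0.744, 0.714],
        [0.744, 1.0, 0.692],
        [0.714, 0.692, 1.0]]
print([round(math.sqrt(v), 3) for v in aves])  # [0.872, 0.758, 0.724]
print(fornell_larcker(aves, corr))             # True
```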
Item Response Theory Models
The assumption of unidimensionality was supported by a first-to-second eigenvalue ratio of 5.59, indicating a strong dominant factor underlying the scale. Model fit comparison between the GRM and the GPCM supported the selection of the GRM, which showed lower values on both the AIC and the BIC (GRM: AIC = 5374.689, BIC = 5593.268; GPCM: AIC = 5381.763, BIC = 5600.341).
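Both decisions in this step are mechanical. The sketch below computes the first-to-second eigenvalue ratio of a correlation matrix and selects the model with the lower information criteria, using the reported AIC and BIC values. The text does not say whether Pearson or polychoric correlations were used, so the eigenvalue function accepts any matrix; returning `None` when AIC and BIC disagree is a design choice of this hypothetical helper, not a rule from the study.

```python
import numpy as np


def eigenvalue_ratio(R):
    """Ratio of the first to the second eigenvalue of a symmetric correlation matrix."""
    eig = np.sort(np.linalg.eigvalsh(np.asarray(R, dtype=float)))[::-1]
    return eig[0] / eig[1]


def select_model(fits):
    """Pick the model with the lowest AIC, provided BIC agrees; otherwise None."""
    by_aic = min(fits, key=lambda m: fits[m]["AIC"])
    by_bic = min(fits, key=lambda m: fits[m]["BIC"])
    return by_aic if by_aic == by_bic else None


# Reported information criteria.
fits = {"GRM": {"AIC": 5374.689, "BIC": 5593.268},
        "GPCM": {"AIC": 5381.763, "BIC": 5600.341}}
print(select_model(fits))  # GRM

# Hypothetical 4-item matrix with all correlations 0.5: eigenvalues 2.5, 0.5, 0.5, 0.5.
R_toy = np.full((4, 4), 0.5) + 0.5 * np.eye(4)
print(round(eigenvalue_ratio(R_toy), 2))  # 5.0
```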
The overall fit of the GRM was acceptable across multiple indices: M2 = 60 (P = 0.043), SRMSR = 0.094, RMSEA = 0.077 (0.060, 0.094), TLI = 0.906, and CFI = 0.914. Although M2 was statistically significant, this is commonly observed with moderate to large samples, so it was interpreted alongside the approximate fit indices, which collectively suggest acceptable model-data fit. Item-level fit was assessed using the Orlando-Thissen S-X2 statistic. All 14 items demonstrated satisfactory fit (all P > 0.05), with item-level RMSEA values ranging from 0.000 to 0.073. All estimated discrimination parameters (α) fell within an acceptable range of 1.065 to 62.831 (Table 3). Furthermore, the difficulty parameters (β) for each item increased monotonically from threshold β1 to β4, as expected (Table 3). The steep slopes of the item characteristic curves indicate high discrimination, and the ordered thresholds (β1 to β4) are evidenced by the sequential rightward shift of the category response curves for each item (Figure 3a).
Analysis of the IICs revealed a multi-peaked distribution, with Items 1 and 2 providing the greatest informational value (Figure 3b). Collectively, the total (scale) information curve peaked within the ability range of −1 to 1 standard deviation, confirming that the scale offers optimal precision for assessing diabetic patients with moderate levels of the latent trait (Figure 3c).
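The information curves follow from the GRM category probabilities. Under Samejima's formula, item information is I(θ) = Σₖ [P′ₖ(θ)]² / Pₖ(θ), where each cumulative curve has derivative a·P*(1 − P*), and the scale information curve is the sum of the item curves. The sketch below implements this for hypothetical items (the parameters are illustrative, not the Table 3 estimates); with a single threshold it reduces to the familiar a²·P·(1 − P).

```python
import math


def grm_item_information(theta, a, b):
    """Samejima's item information for a graded response item with thresholds b."""
    def p_star(bk):
        return 1 / (1 + math.exp(-a * (theta - bk)))

    cumulative = [1.0] + [p_star(bk) for bk in b] + [0.0]
    # Derivative of each cumulative curve: a * P* * (1 - P*); zero for the fixed bounds.
    deriv = [a * p * (1 - p) for p in cumulative]
    info = 0.0
    for k in range(len(b) + 1):
        prob = cumulative[k] - cumulative[k + 1]
        slope = deriv[k] - deriv[k + 1]
        if prob > 0:
            info += slope ** 2 / prob
    return info


# Hypothetical items; scale information is the sum of item information.
items = [(2.0, [-1.5, -0.5, 0.5, 1.5]), (1.2, [-1.0, 0.0, 1.0, 2.0])]
grid = [x / 10 for x in range(-30, 31)]
scale_info = [sum(grm_item_information(t, a, b) for a, b in items) for t in grid]
peak = grid[scale_info.index(max(scale_info))]
print(peak)
```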
Discussion
This study introduced and localized the DHLS and evaluated its psychometric properties among Chinese patients with T2DM using a three-factor structural equation model and IRT modeling. The results demonstrated that the Chinese version of the DHLS exhibits good reliability and validity, with its three-factor structure validated, establishing it as a reliable and effective measurement tool.
In terms of scale reliability, both the total DHLS score and its three subscales demonstrated high internal consistency, with Cronbach’s alpha coefficients all exceeding 0.80. This finding aligns closely with the original DHLS development study, which reported a Cronbach’s alpha of 0.91 and excellent test-retest reliability (intraclass correlation coefficient = 0.89);20 the present study, however, did not assess temporal stability. The Persian and Malaysian versions likewise demonstrated robust psychometric properties. The Persian version achieved an overall alpha of 0.919, with 0.879 for the numeracy subscale, 0.784 for the communicative subscale, and 0.865 for the informational subscale.23 The Malaysian version reported reliability indices of 0.962, 0.836, and 0.828 for its three components, based on Cronbach’s alpha,22 indicating that the scale maintains strong measurement stability across different cultural contexts.
The combined application of factor analysis and IRT in this study offers complementary validation perspectives and represents a methodological advancement over previous CTT-based approaches to the DHLS in the Chinese context. While factor analysis confirms structural validity at the scale level, IRT delivers essential insights into item-level functioning by revealing how effectively each item discriminates across the HL spectrum and pinpointing where measurement precision is greatest. This dual-method framework advances the field by shifting psychometric validation from confirming scale structure to precisely characterizing measurement performance, thereby establishing a more rigorous and clinically actionable standard for future HL instrument development.
The three-factor structure of the Chinese version was robustly supported, which is consistent with the theoretical model established during the original scale’s development20 and similar to other linguistic adaptations.22,23 Both exploratory and confirmatory factor analyses supported the model: EFA yielded three well-defined factors (informational, numeracy, and communicative literacy) without cross-loadings, and CFA indicated an acceptable fit for the modified model. This collective evidence demonstrates that the three target HL constructs are distinct and structurally stable in the Chinese cultural context, aligning well with the core competencies required for diabetes self-management. A cross-cultural comparison of item retention further highlights the scale’s adaptability. Both the present study and the Persian version validation retained all scale items, with all factor loadings exceeding the 0.4 threshold.23 In contrast, the Malaysian version required the deletion of item Q8 (Calculate the next time to take diabetes medication) due to a factor loading exceeding 1.0.22 The researchers hypothesized that this anomaly might reflect the specific context of long-term medication management among the studied population, where extensive prior education on medication timing could have rendered this item less relevant for measuring health literacy competencies,36 suggesting a potential ceiling effect or cultural specificity in its application.37 Furthermore, differences in DHLS scores across gender and educational groups may reflect sociodemographic disparities in HL access and self-management engagement. These findings underscore the importance of considering population characteristics when interpreting scale scores and designing targeted interventions.
Subsequently, this study applied IRT to conduct an in-depth, item-level evaluation of the scale’s measurement properties, an approach not used in previous validation studies of other language versions of the DHLS. The GRM yielded satisfactory discrimination parameters for all 14 items, indicating that each item differentiates well between respondents at different levels of the underlying HL trait. The difficulty parameters increased monotonically across response categories, confirming the logical ordering of the response options and supporting the validity of the item design. The scale’s information function showed that the DHLS measures most precisely among patients with moderate HL. The scale therefore most effectively distinguishes the large group of patients whose HL is neither extremely high nor extremely low, who are precisely those most in need of targeted education and support. For patients at the extremes of the HL continuum, however, scores should be interpreted with caution or supplemented by other assessment methods.
The validation of these measurement characteristics highlights the potential clinical value of the DHLS. The Chinese version of the DHLS fills a crucial gap in domestic tools for diabetes-specific HL assessment, particularly in assessing numeracy skills. Compared with instruments such as the HL-14,38 which lacks a numeracy component, the DHLS treats numeracy as an independent dimension, enabling precise capture of patients’ actual capabilities in core self-management tasks such as interpreting blood glucose values and calculating carbohydrates. This specialized focus is crucial for comprehensively evaluating the multifaceted nature of health literacy among patients with diabetes. From a practical implementation perspective, this scale could provide Chinese clinicians and public health researchers with a concise and effective screening tool. By evaluating scores across the three dimensions, educators may be able to develop personalized intervention plans: providing clear science communication materials for patients with insufficient informational literacy, offering medication and dietary calculation aids for patients with weak numeracy skills, and conducting role-play training in doctor-patient communication for patients with poor communicative literacy. This multidimensional assessment approach could help optimize the allocation of limited healthcare resources by directing specific interventions to identified literacy gaps, which might in turn support better clinical outcomes and quality of life, as suggested by prior research.11,12 Particularly in China, where the burden of diabetes is heavy and healthcare resources are unevenly distributed,39,40 routine application of this tool could facilitate the systematic identification of patients with insufficient HL in clinical practice and inform the design of targeted interventions.
Therefore, future adoption of this scale may contribute to efforts in tiered diagnosis and treatment, enhance community-based diabetes management, and potentially improve the overall landscape of diabetes care in China.41,42
The main strength of this study lies in its first combined application of the structural equation model and IRT modeling to assess the psychometric properties of the Chinese version of the DHLS. However, our study also presents certain limitations. First, the factor structure was derived and confirmed within the same sample, which may limit the external validity of the structural model. Future research should replicate the CFA in an independent sample. Second, the sample primarily originated from hospitals within the same region, potentially introducing selection bias. Future research should validate the generalizability of the DHLS using broader, more representative samples. Third, this cross-sectional design precludes longitudinal testing of the predictive validity of HL on long-term clinical outcomes (eg, HbA1c changes). Future longitudinal studies are warranted.
Conclusions
The Chinese version of the DHLS exhibits a robust three-dimensional structure, comprehensively assessing the three core competencies of information, calculation, and communication among adults with type 2 diabetes. Notably, the inclusion of calculation as a distinct dimension adds specific value in capturing patients’ abilities to handle quantitative self-management tasks. This study employed both CTT and IRT as complementary validation approaches, confirming the scale’s reliability, structural validity, and item-level functionality. Critically, IRT analysis indicated that the scale offers optimal measurement precision for individuals with moderate HL, making it particularly suitable for screening and tailoring interventions for this large patient subgroup. However, several limitations should be acknowledged. The use of a single-region hospital-based sample and a cross-sectional design may affect generalizability and preclude causal inferences regarding HL and health outcomes. This scale demonstrates potential for application in clinical screening, HL research, and intervention evaluation. Future multicenter longitudinal studies are recommended to establish its predictive validity and effectiveness in real-world diabetes management.
