Using Wearable Activity Trackers to Predict Type 2 Diabetes: Machine Learning–Based Cross-sectional Study of the UK Biobank Accelerometer Cohort

Background Between 2013 and 2015, the UK Biobank collected accelerometer traces from 103,712 volunteers aged between 40 and 69 years using wrist-worn triaxial accelerometers for 1 week. This data set has been used in the past to verify that individuals with chronic diseases exhibit reduced activity levels compared with healthy populations. However, the data set is likely to be noisy, as the devices were allocated to participants without a set of inclusion criteria, and the traces reflect free-living conditions. Objective This study aims to determine the extent to which accelerometer traces can be used to distinguish individuals with type 2 diabetes (T2D) from normoglycemic controls and to quantify their limitations. Methods Machine learning classifiers were trained using different feature sets to segregate individuals with T2D from normoglycemic individuals. Multiple criteria, based on a combination of self-assessment UK Biobank variables and primary care health records linked to UK Biobank participants, were used to identify 3103 individuals with T2D in this population. The remaining nondiabetic 19,852 participants were further scored on their physical activity impairment severity based on other conditions found in their primary care data, and those deemed likely physically impaired at the time were excluded. Physical activity features were first extracted from the raw accelerometer traces data set for each participant using an algorithm that extends the previously developed Biobank Accelerometry Analysis toolkit from Oxford University. These features were complemented by a selected collection of sociodemographic and lifestyle features available from UK Biobank. Results We tested 3 types of classifiers, with an area under the receiver operating characteristic curve (AUC) close to 0.86 (95% CI 0.85-0.87) for all 3 classifiers and F1 scores in the range of 0.80-0.82 for T2D-positive individuals and 0.73-0.74 for T2D-negative controls. 
Results obtained using nonphysically impaired controls were compared with highly physically impaired controls to test the hypothesis that nondiabetic conditions reduce classifier performance. Models built using a training set that included highly impaired controls with other conditions had worse performance (AUC 0.75-0.77; 95% CI 0.74-0.78; F1 scores in the range of 0.76-0.77 for T2D positives and 0.63-0.65 for controls). Conclusions Granular measures of free-living physical activity can be used to successfully train machine learning models that are able to discriminate between individuals with T2D and normoglycemic controls, although with limitations because of the intrinsic noise in the data sets. From a broader clinical perspective, these findings motivate further research into the use of physical activity traces as a means of screening individuals at risk of diabetes and for early detection, in conjunction with routinely used risk scores, provided that appropriate quality control is enforced on the data collection protocol.

Non-diabetic UK Biobank (UKBB) participants may suffer from other conditions that affect their normal physical activity. If these conditions are known, the participants can be excluded from the negative portion of the training set to reduce noise. We used the primary care Electronic Health Records available from the UK Biobank to identify medical conditions recorded within a time interval that starts before the accelerometer wear period and extends for a short period after it. We then developed an activity impact score to express the potential impairment due to these conditions. Controls are selectively excluded from the control portion (negative class) of the training set by setting a threshold on the activity impact score.
To calculate the activity impact score for a person, we assigned individual scores to the Read v2 and Read v3 codes used to encode conditions in Electronic Health Records (EHR). For each participant, we then aggregated the scores of all conditions found in that participant's EHR history within a given time interval around the accelerometer wear time (1 year before and 6 months after were used for these calculations).
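The windowed aggregation described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each participant's EHR history is a list of `(code, date)` pairs, that codes have already been mapped to the scored form, and it approximates "1 year" and "6 months" as 365 and 182 days. The names `codes_in_window` and `activity_impact_score` are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Window from the text: 1 year before wear start, 6 months after wear end.
# Expressed in days here (365 and 182), which is an approximation.
DAYS_BEFORE = 365
DAYS_AFTER = 182

def codes_in_window(events: List[Tuple[str, date]], wear_start: date, wear_end: date,
                    days_before: int = DAYS_BEFORE, days_after: int = DAYS_AFTER) -> List[str]:
    """Return the Read codes whose event date falls inside the window around wear time."""
    lo = wear_start - timedelta(days=days_before)
    hi = wear_end + timedelta(days=days_after)
    return [code for code, when in events if lo <= when <= hi]

def activity_impact_score(events: List[Tuple[str, date]], code_scores: Dict[str, int],
                          wear_start: date, wear_end: date) -> float:
    """Average the scores of scored codes inside the window; 0 when there are none."""
    scores = [code_scores[c] for c in codes_in_window(events, wear_start, wear_end)
              if c in code_scores]
    return sum(scores) / len(scores) if scores else 0.0

# Illustrative events and scores (made up for this example).
events = [("G3...", date(2013, 3, 1)),   # inside the window
          ("M....", date(2014, 6, 1)),   # inside the window
          ("N2...", date(2010, 1, 1))]   # too early, ignored
code_scores = {"G3...": 100, "M....": 1, "N2...": 10}
print(activity_impact_score(events, code_scores, date(2014, 1, 6), date(2014, 1, 13)))  # 50.5
```

The window bounds are parameters, so the narrower window discussed later (6 months before, 1 month after) can be tried without changing the function.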
Read v2 uses over 28,000 codes to catalogue clinical events; however, these are organised hierarchically, starting from macro categories called chapters. Each level of the hierarchy is characterised by a "byte-depth", for example:
1-byte depth: 'C....' Endocrine, nutritional, metabolic and immunity disorders
2-byte depth: 'C2...' Nutritional deficiencies
3-byte depth: 'C24..' Vitamin A deficiency
4-byte depth: 'C247.' Vitamin A deficiency with other ocular manifestation
5-byte depth: 'C2470' Vitamin A deficiency with xerophthalmia
This organisation makes it possible to assign scores to whole categories with the desired precision. Scores were assigned mostly at byte-depth 2 or 3, and administrative (non-clinical) chapters were removed. This pruning resulted in activity impact scores being assigned to 215 chapters. Online knowledge resources published by the NHS were used to assess the level of physical impairment for each of the chapters, encoded as 1 (low/no impact), 10 (medium impact) or 100 (high impact). An individual's total impact score is calculated as the average of the scores for all conditions found in their EHR (default 0).
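One way to exploit the hierarchy is to store scores only at the chosen byte-depth and resolve any full code by walking up to its most specific scored ancestor. The sketch below assumes 5-character Read v2 codes padded with dots; the helper names and the example scores are hypothetical.

```python
from typing import Dict, Optional

def byte_depth(code: str) -> int:
    """Number of significant characters in a 5-character Read v2 code, e.g. 'C24..' -> 3."""
    return len(code.rstrip("."))

def truncate(code: str, depth: int) -> str:
    """Truncate a Read v2 code to `depth` characters and re-pad it with dots."""
    return code.rstrip(".")[:depth].ljust(5, ".")

def lookup_score(code: str, scores: Dict[str, int]) -> Optional[int]:
    """Return the score of the most specific scored ancestor of `code` (or the code itself)."""
    for depth in range(byte_depth(code), 0, -1):
        key = truncate(code, depth)
        if key in scores:
            return scores[key]
    return None  # no score: no ancestor of this code was assigned one

# Made-up scores at byte-depths 1 and 3.
scores = {"C....": 1, "C24..": 10}
print(truncate("C2470", 3))             # C24..
print(lookup_score("C2470", scores))    # 10 (matched at 'C24..')
print(lookup_score("C2A..", scores))    # 1  (falls back to 'C....')
print(lookup_score("G30..", scores))    # None
```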
It should be emphasised that the aim of assigning impact scores to individuals is to ensure minimal noise in the control group used to train our classifiers. Since the pool of potential controls is about 30 times larger than the T2D group, controls can be chosen very selectively. To be conservative, we chose a non-linear scale that assigns a high impact score to individuals with even a small number of severe impairments and requires a uniform absence of impairment to achieve a low score. Table 1 shows the complete list of chapters with the corresponding frequency of occurrence for a random sample of about 20,000 participants from the accelerometer cohort, giving an indication of which chapters are common and which are rare.
Read codes physical impairment impact analysis.

Chapter
The assignment of impact scores to individual Read v2 chapters at a given byte-depth consists of several steps.
Firstly, it should be noted that UK Biobank uses a combination of Read v2 and Read v3 codes. Since Read v3 codes are not organised hierarchically, as a preliminary step we mapped all Read v3 codes to Read v2 using the official cross-maps available at https://nhsdigital.citizenspace.com/uktc/crossmaps/.
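The mapping step can be sketched as a dictionary lookup. This assumes the cross-map has already been loaded into a dict from each Read v3 code to a single Read v2 code; the real cross-map files may contain one-to-many entries and extra columns, and their format is not assumed here. The function name and the cross-map entries are hypothetical. Unmapped codes are returned separately so that their number can be reported.

```python
from typing import Dict, Iterable, List, Tuple

def map_read3_to_read2(codes: Iterable[str],
                       crossmap: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Translate Read v3 codes to Read v2 via a one-to-one lookup table.

    Returns (mapped Read v2 codes, Read v3 codes with no cross-map entry).
    """
    mapped: List[str] = []
    unmapped: List[str] = []
    for code in codes:
        target = crossmap.get(code)
        if target is None:
            unmapped.append(code)
        else:
            mapped.append(target)
    return mapped, unmapped

# Hypothetical cross-map entries, for illustration only.
crossmap = {"XaAA1": "C24..", "XaAA2": "N2..."}
print(map_read3_to_read2(["XaAA1", "XaAA2", "XaZZ9"], crossmap))
# (['C24..', 'N2...'], ['XaZZ9'])
```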
The second step involves deciding which chapters are to be assigned an activity score and which are to be excluded. The decisions are listed below, along with the rationale for the less obvious cases.
Excluded Chapters
0 -Occupation. Not relevant; codes are not easily linked to physical impairment.
1 -History/Symptoms. A broad chapter covering a vast number of topics related to medical history. It is very commonly used (99.16% of individuals at an average occurrence of 31.32) and thus unlikely to be useful for selection.
5 -Radiology/Medical Physics. Holds codes for X-rays, ultrasounds and other medical screenings. Codes indicating a potential risk of impairment are found only at byte-depth 4 or 5 and require specialised clinical knowledge to interpret.
8 -Other therapeutic procedures. A broad chapter covering everything from post-operation monitoring to physiotherapy; however, it is very common (97.30% of individuals at an average occurrence of 28.68) and thus not useful for selection. It is also not immediately relevant, as 30% of the entries in this chapter are concerned with Medication Review (8B3V.).

-Administration
L -Complications of pregnancy, childbirth and the puerperium. Excluded because the UKBB population starts at age 40 and this chapter applies to 14.37% of individuals at an average occurrence of 1.89.
P -Congenital anomalies. Conditions present from birth, such as heart defects or cleft lip (4.36% of individuals at an average occurrence of 1.27). Given the age of the individuals in this study, this chapter was excluded on the assumption that individuals with congenital anomalies will have adapted their lifestyles accordingly by middle age (40+ years).
Q -Perinatal conditions. Conditions arising immediately before or after birth (0.48% of individuals at an average occurrence of 1.06). Again excluded due to the age of the individuals in this study.

R -[D]Symptoms, signs and ill-defined conditions. Information too vague to be used confidently: these codes are used mainly when the medical practitioner is unsure of a diagnosis and wants to record symptoms in the system.
T -Causes of injury. Associated with chapter S (which is included); this chapter only records the reason behind an entry.

U -[X]External causes of morbidity and mortality.
Z -Unspecified conditions. Not specific enough to be used.
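Applying the exclusions above amounts to filtering codes on their first character. A minimal sketch, assuming codes are already in Read v2 form; chapters 2 and 4 are included because they are excluded later for their high prevalence. The Administration chapter is also excluded in the text, but its chapter character is not given in the list, so this sketch leaves it out and it would need to be added.

```python
from typing import Iterable, List

# Chapters excluded in the list above, plus chapters 2 and 4 (excluded later
# for their very high prevalence). The Administration chapter is not included
# here because its chapter character is not stated in the list.
EXCLUDED_CHAPTERS = frozenset({"0", "1", "2", "4", "5", "8",
                               "L", "P", "Q", "R", "T", "U", "Z"})

def chapter_of(code: str) -> str:
    """The chapter is the first character of a Read v2 code."""
    return code[0]

def drop_excluded(codes: Iterable[str]) -> List[str]:
    """Keep only codes whose chapter is not in EXCLUDED_CHAPTERS."""
    return [c for c in codes if chapter_of(c) not in EXCLUDED_CHAPTERS]

print(drop_excluded(["1371.", "C10F.", "R090.", "N245.", "8B3V."]))
# ['C10F.', 'N245.']
```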

-Diagnostic Procedures
This is a broad chapter that covers a vast range of medical procedures and tests; Table 1 shows that it occurs in 83.99% of individuals at an average occurrence of 8.2. However, these codes rarely indicate why a procedure was performed. One subchapter, Disability assessment -Physical (39***), describes tests for disabilities, but the outcomes of these tests are not the main focus, because someone who is being tested is likely to have had their physical ailment recorded using other codes.

-Operations & Procedures
Operations & Procedures can be anything from clearing an ear canal to a full lung transplant. Separating the two extremes cannot be achieved simply by truncating the hierarchy at a set byte level. Instead, specific codes deemed relevant were selected at the 2-byte level, and their potential severity was classified as High {H}, Medium {M} or Low {L} according to the procedure's potential impact on an individual's health or ability to be active.
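Scoring a selected operation code can be sketched as a lookup on its 2-byte parent, with the H/M/L tiers converted to the 100/10/1 values used for impact scores. The codes in `OPERATION_TIERS` are hypothetical placeholders, not the codes selected in the study; codes whose parent was not selected receive no score.

```python
from typing import Dict, Optional

# Tier values follow the High {100} / Medium {10} / Low {1} scheme.
TIER_SCORE: Dict[str, int] = {"H": 100, "M": 10, "L": 1}

# Hypothetical selection of 2-byte operation codes and their tiers;
# the actual codes selected in the study are not listed here.
OPERATION_TIERS: Dict[str, str] = {"7A...": "H", "7B...": "M", "7C...": "L"}

def operation_score(code: str) -> Optional[int]:
    """Score an operation code via its 2-byte parent; None if the parent was not selected."""
    parent = code.rstrip(".")[:2].ljust(5, ".")
    tier = OPERATION_TIERS.get(parent)
    return TIER_SCORE[tier] if tier is not None else None

print(operation_score("7A123"))  # 100
print(operation_score("7Z1.."))  # None: not a selected code
```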

A -Infection and Parasitic Diseases
This is the diagnostic chapter for infectious and parasitic diseases such as syphilis (46.38% prevalence, average occurrence of 2.25). All subchapters were included at a 2-byte depth.

B -Neoplasms (Cancers)
This chapter includes different types of cancer-related diagnoses (36.3% prevalence, average occurrence of 2.07). These codes were considered at 2-byte depth, as it is possible to differentiate between cancerous and benign neoplasms at this level. Cancers, including treatment procedures such as chemotherapy, are known to have a severe effect on an individual's mental and physical wellbeing, which could affect their activity and sleep levels.

C -Endocrine, nutritional, metabolic and immunity disorders
This is a large and diverse chapter holding many codes for diabetes and vitamin deficiencies; it appears in 33.29% of individuals at an average occurrence of 3.41. The chapter is extremely technical and granular: most of the key distinctions, such as between complications of type 1 and type 2 diabetes, are made at 5-byte depth. This chapter was taken at a 3-byte depth, with the understanding that subchapters at this level still group diverse conditions that are distinguished only at greater depth.

D -Diseases of blood and blood-forming organs
This contains conditions such as anaemia (7.44% prevalence). All subchapters were taken at a 2-byte depth.

E -Mental disorders
Conditions such as psychosis or learning disabilities (28.38% prevalence, average occurrence of 2.83). All subchapters were taken at a 2-byte depth.

F -Nervous system and sense organ diseases
Holds codes for meningitis-related conditions and sense organ diseases such as deafness (62.71% prevalence, average occurrence of 3.46). All subchapters were taken at a 2-byte depth.

G -Circulatory system diseases (Cardiac/Heart Diseases)
Holds codes relating to the heart or the circulation of blood (57.15% prevalence, average occurrence of 4.67). All subchapters were taken at a 2-byte depth.

H -Respiratory system diseases
Holds conditions such as pneumonia and lung diseases (58.12% prevalence, average occurrence of 3.81). All subchapters were taken at a 2-byte depth.

J -Digestive system diseases
Holds conditions relating to the mouth, stomach and lower digestive tract (45.81% prevalence, average occurrence of 2.76). All subchapters were taken at a 2-byte depth.

K -Genitourinary system diseases
Holds conditions relating to the genital and urinary organs; Table 1 shows that this chapter occurs in 50.86% of individuals at an average occurrence of 3.14. Most codes found were for UTIs and conditions relating to the female anatomy. All subchapters were taken at a 2-byte depth, as it is possible to differentiate between male, female and common codes at this level.

M -Skin and subcutaneous tissue diseases
Describes conditions such as sore skin or ulcers (64.28% prevalence, average occurrence of 3.49). The codes in this chapter are mostly mild, so the chapter was accepted at a 1-byte depth.

N -Musculoskeletal and connective tissue diseases
Holds conditions such as arthritis (73.67% prevalence, average occurrence of 4.89). All subchapters were taken at a 2-byte depth.

S -Injury and poisoning
Holds codes for injuries such as broken bones and overdoses (52.61% prevalence, average occurrence of 2.51). This chapter has a large range of severity between codes, from fractured skull to wasp sting. All subchapters were taken at a 2-byte depth, as the severity of the subchapters can be differentiated at this level.
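The per-chapter depths chosen above can be collected into one table and applied by truncation before scoring. A minimal sketch with a hypothetical `normalise_code` helper: chapter C is kept at 3 bytes, chapter M at 1 byte and the other included diagnostic chapters at 2 bytes. Operation codes are handled separately by the code selection described earlier, so they are not part of this table.

```python
from typing import Dict, Optional

# Byte-depth at which each included chapter is scored, as described above:
# chapter C at 3 bytes, chapter M at 1 byte, all others at 2 bytes.
CHAPTER_DEPTH: Dict[str, int] = {
    "A": 2, "B": 2, "C": 3, "D": 2, "E": 2, "F": 2, "G": 2,
    "H": 2, "J": 2, "K": 2, "M": 1, "N": 2, "S": 2,
}

def normalise_code(code: str) -> Optional[str]:
    """Truncate a Read v2 code to its chapter's scoring depth; None if the chapter is not scored."""
    depth = CHAPTER_DEPTH.get(code[0])
    if depth is None:
        return None
    return code.rstrip(".")[:depth].ljust(5, ".")

for c in ["C10F.", "M2111", "S3400", "1371."]:
    print(c, "->", normalise_code(c))
# C10F. -> C10..
# M2111 -> M....
# S3400 -> S3...
# 1371. -> None
```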
Two additional chapters, 2 -Examinations / Signs and 4 -Laboratory Procedures, were excluded on the basis of their very high prevalence (98.42% and 97.22%, respectively) and thus low selectivity. Table 2 shows the impact of removing chapters and truncating codes to a set byte-depth as described above. There is a small reduction in the number of individuals, which is to be expected. There is also a drastic reduction in the number of events; this is partly due to the amount of information deemed not useful, but mostly due to the exclusion of chapters 2 and 4, which have the highest numbers of occurrences. Most notably, the number of distinct codes was reduced from 28,109 to 215, a significant reduction in the amount of information that needs to be processed.
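The summary counts reported in Tables 1 and 2 (individuals, events, distinct codes, and per-chapter prevalence with average occurrence) can be computed from a flat list of events. This sketch assumes events are `(participant_id, code)` pairs and interprets "average occurrence" as the mean number of events per individual among those who have at least one event in the chapter; both the interpretation and the function names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str]  # (participant_id, Read v2 code)

def summarise(events: Iterable[Event]) -> Tuple[int, int, int]:
    """Return (number of individuals, number of events, number of distinct codes)."""
    events = list(events)
    return (len({pid for pid, _ in events}), len(events), len({code for _, code in events}))

def chapter_frequency(events: Iterable[Event], n_participants: int) -> Dict[str, Tuple[float, float]]:
    """Per chapter: (% of participants with >=1 event, mean events among those participants)."""
    per_chapter: Dict[str, Counter] = defaultdict(Counter)
    for pid, code in events:
        per_chapter[code[0]][pid] += 1
    return {ch: (100 * len(counts) / n_participants, sum(counts.values()) / len(counts))
            for ch, counts in per_chapter.items()}

events = [("p1", "C10F."), ("p1", "C10F."), ("p1", "N245."), ("p2", "N245.")]
print(summarise(events))               # (2, 4, 2)
print(chapter_frequency(events, 4))    # {'C': (25.0, 2.0), 'N': (50.0, 1.0)}
```

Running `summarise` before and after the exclusion and truncation steps gives the kind of before/after comparison shown in Table 2.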

Assignment of Health Impact Scores
Assigning a code impact score
The scores assigned to the events in the chapters listed above reflect the expectation that minor events occur more frequently than major events. A tiered system was used to classify each event numerically by its impact on health: High {100}, Medium {10} and Low {1}. The average of these scores is then taken for each individual, which normalises for individuals with larger numbers of events, including events from high-impact chapters. The higher the value, the less likely the individual is to be considered physically active.
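The effect of averaging 100/10/1 tiers, and of thresholding the result to select controls, can be illustrated with a short sketch. The threshold of 10 is a hypothetical value chosen for the example, not the cut-off used in the study. It shows the non-linearity: a single high-impact event among nine low-impact ones still lifts the average above 10, whereas only uniformly low-impact histories stay near 1.

```python
from typing import List

# High {100}, Medium {10} and Low {1}, as in the text.
TIER_SCORE = {"High": 100, "Medium": 10, "Low": 1}

def impact_score(tiers: List[str]) -> float:
    """Average tier score over an individual's events; 0 when there are none."""
    if not tiers:
        return 0.0
    return sum(TIER_SCORE[t] for t in tiers) / len(tiers)

def keep_as_control(tiers: List[str], threshold: float) -> bool:
    """Keep a non-diabetic participant as a control only if the score is below the threshold."""
    return impact_score(tiers) < threshold

THRESHOLD = 10  # hypothetical cut-off for illustration only

print(impact_score(["Low"] * 5))               # 1.0
print(impact_score(["High"] + ["Low"] * 9))    # 10.9
print(keep_as_control(["High"] + ["Low"] * 9, THRESHOLD))  # False
print(keep_as_control([], THRESHOLD))          # True
```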
Note that individuals with no events recorded within the specified time frame are assigned a health impact score of 0, and their activity levels are assumed not to be impaired. Table 3 shows the impact of using codes recorded around each individual's activity tracker readings (6 months before and 1 month after). As expected, there is a reduction in the numbers of individuals and events, as well as in unique codes, meaning that impact scores needed to be assigned to only 171 codes instead of 215.