Data analysis
Chao Wang
- Determine the aim of the data analysis: to describe, explain (make causal claims) or predict?
- Tidy up your data. For guidance, see Broman and Woo (2018).
- Determine what type of data is available, such as cross-sectional, time-series, longitudinal or survival (time-to-event) data. Is the data stratified or clustered? Is the outcome variable continuous or discrete? If discrete, is it categorical or a count? If categorical, is it ordered or unordered? Appropriate methods are available for each type of data.
- Look for any issues with the data, such as missing data, outliers, multicollinearity or a small sample size. Data visualisation may help to detect these issues.
- If the aim is to explain using observational data, it is useful to think about the causal chain so appropriate confounders can be adjusted for.
- Relevant covariates should also be adjusted for in randomised controlled trials (Thompson et al., 2015).
- Check modelling assumptions, for example whether the relationship is linear in a linear model, whether the variance of the residuals is constant, and whether there are any unobserved confounders.
- Report the results following the reporting guidelines appropriate to the study design. Guidelines for different applications are available from the EQUATOR Network (https://www.equator-network.org/).
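
Some of the checks in the list above (missing data, outliers, multicollinearity, and the behaviour of residuals in a linear model) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The function names (`count_missing`, `iqr_outliers`, `pearson`, `ols_residuals`), the use of Tukey's 1.5 x IQR rule for outliers, and the toy data are all assumptions made for this example and are not taken from the cited references. A real analysis would normally use a dedicated statistics package together with plots.

```python
import math
import statistics


def count_missing(rows):
    """Count missing values (None or NaN) per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_missing = value is None or (
                isinstance(value, float) and math.isnan(value)
            )
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def pearson(x, y):
    """Pearson correlation; values near +1 or -1 between two covariates
    hint at collinearity."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_residuals(x, y):
    """Fit y = a + b*x by least squares and return (a, b, residuals)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Toy data, invented purely for illustration.
rows = [
    {"age": 23, "sbp": 118.0},
    {"age": 35, "sbp": None},
    {"age": 47, "sbp": 131.0},
    {"age": 52, "sbp": float("nan")},
]
print(count_missing(rows))                           # {'age': 0, 'sbp': 2}
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))   # [100]

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
intercept, slope, residuals = ols_residuals(x, y)
# A trend or a funnel shape in the residuals against x would cast doubt on
# linearity or constant variance; in practice one would plot these values.
for xi, r in zip(x, residuals):
    print(xi, round(r, 3))
```

Note that none of these checks can detect unobserved confounders. That assumption has to be judged from subject-matter knowledge of the causal chain, not from the data alone.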
References
Broman, Karl W., and Kara H. Woo. ‘Data Organization in Spreadsheets’. The American Statistician 72, no. 1 (2018): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Thompson, Douglas D., Hester F. Lingsma, William N. Whiteley, Gordon D. Murray, and Ewout W. Steyerberg. ‘Covariate Adjustment Had Similar Benefits in Small and Large Randomized Controlled Trials’. Journal of Clinical Epidemiology 68, no. 9 (1 September 2015): 1068–75. https://doi.org/10.1016/J.JCLINEPI.2014.11.001.