DeeDive (short for “Data Dive”) is an automatic data exploration tool for generating hypotheses and for discovering patterns you might have missed during earlier exploration. To use it, enter your email address and upload your data file, with variable names in the first row and IDs in the first column. Accepted formats include .csv, .sav (SPSS), .xls and .xlsx (Excel), .dta (Stata), .sas7bdat (SAS), and several others. Data can be any mix of types (strings, numbers, dates, etc.); missing values are handled in different ways depending on the analysis. Longitudinal data must be in wide format (one row per ID, with each time point in its own column). DeeDive will email you a .pdf of results, usually within an hour, although very large data sets can take up to several days. That’s about all you need to know to use DeeDive, but see below for some pointers.
Remember, data can be .csv, .sav (SPSS), .xls (Excel), .xlsx (Excel), .dta (Stata), .sas7bdat (SAS), .dat, or several others – give one a try!
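If your longitudinal data are in long format (one row per ID per time point), they need reshaping before upload. The following is a minimal sketch of one way to do that with the Python standard library. The helper `long_to_wide` and the column names `ID`, `visit`, and `score` are hypothetical and are not part of DeeDive.

```python
from collections import defaultdict


def long_to_wide(rows, id_key, time_key, value_key):
    """Turn long-format rows (one per ID per time point) into a wide table."""
    wide = defaultdict(dict)   # ID -> {wide column name: value}
    time_points = []           # time points in order of first appearance
    for row in rows:
        if row[time_key] not in time_points:
            time_points.append(row[time_key])
        wide[row[id_key]][f"{value_key}_{row[time_key]}"] = row[value_key]

    # Variable names go in the first row, IDs in the first column.
    header = [id_key] + [f"{value_key}_{t}" for t in time_points]
    table = [header]
    for id_value, values in wide.items():
        # A time point with no measurement is left as an empty cell.
        table.append([id_value] + [values.get(col, "") for col in header[1:]])
    return table


# Hypothetical example data: participant p2 has no second visit.
long_rows = [
    {"ID": "p1", "visit": "1", "score": "10"},
    {"ID": "p1", "visit": "2", "score": "12"},
    {"ID": "p2", "visit": "1", "score": "7"},
]
wide_table = long_to_wide(long_rows, "ID", "visit", "score")
```

The resulting `wide_table` can be written out with `csv.writer(...).writerows(wide_table)` to produce a file with one row per ID.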
DeeDive is meant to help you explore your data. If you're using DeeDive for scientific purposes (and you're not a statistician yourself), you can take the output to your statistician, who will recognize from it which types of analyses were run. They can then rerun those analyses rigorously on your data and produce publication-quality visuals (which DeeDive's are not). In addition, just glancing through the visuals a couple of times can help generate ideas, even for other data sets you might have.
- Minimum Data Required: DeeDive needs at least 8 variables (columns), and can handle as many as ~9,000. There is no minimum N (rows), but at least 10 is recommended. Note that DeeDive automatically removes any variables with >50% missing data.
- Dates: DeeDive can handle most date formats out there (mm-dd-YY, mm-dd-YYYY, dd/mm/YYYY, etc.), but if you want to be 100% sure your dates are read properly, put them in "YYYY-mm-dd" or "YYYY/mm/dd" format.
- Variable Naming: For visualization purposes, it's best to use only necessary parts of variable names. For example, if you had variables called "Cognitive_Test_Score_Memory", "Cognitive_Test_Score_Attention", and "Cognitive_Test_Score_Verbal", and there were no other variables with "Cognitive_Test_Score" in the name, it would be best to drop that from the variable names, leaving "Memory", "Attention", and "Verbal".
- Variables Chosen for Analysis: DeeDive does not seek the strongest effects in your data. Instead, it tries to identify a diverse set of variables that appear to come from different measurement targets. For example, if your data include a few questions about depression, a few questions about net worth, and a few biometric measures (e.g., height, weight), DeeDive will likely detect that and use at least one variable (the strongest indicator) from each measurement target (depression, net worth, and biometrics) in its analyses.
- Failures: DeeDive is young and under development, so it fails occasionally. There is currently no notification of failure (coming soon), so keep in mind that a long wait does not necessarily mean a failure: processing time depends mostly on how many of your variables are binary (e.g., yes/no, true/false, male/female, correct/incorrect) or ordinal (e.g., 0/1/2 for "small business"/"mid-size business"/"large business"). With more than ~200 binary/ordinal variables, it could take >3 hours to get your results; with more than ~500, it could take 48+ hours.
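
Several of the pointers above can be checked on your own machine before uploading. The following is a rough sketch for a .csv file, using only the Python standard library. The function `precheck`, the list of missing-value tokens, and the rule "2 to 5 distinct values means binary/ordinal" are all assumptions made for illustration; they are not DeeDive's actual logic. The sketch also assumes the 8-variable minimum excludes the ID column and applies after mostly-missing variables are removed.

```python
import csv
import io

# Assumptions for illustration only; DeeDive's own rules may differ.
MISSING_TOKENS = {"", "NA", "N/A", "NaN", "null"}
MAX_LEVELS = 5  # guess: 2-5 distinct values suggests a binary/ordinal variable


def precheck(csv_text):
    """Summarize a CSV (names in row 1, IDs in column 1) against the pointers above."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    kept, mostly_missing, low_cardinality = [], [], []

    for col in range(1, len(header)):  # column 0 holds the IDs
        cells = [row[col].strip() if col < len(row) else "" for row in body]
        present = [c for c in cells if c not in MISSING_TOKENS]
        # Variables with >50% missing data are removed automatically.
        if cells and (len(cells) - len(present)) / len(cells) > 0.5:
            mostly_missing.append(header[col])
            continue
        kept.append(header[col])
        if 2 <= len(set(present)) <= MAX_LEVELS:
            low_cardinality.append(header[col])

    # Thresholds taken from the "Failures" pointer.
    if len(low_cardinality) > 500:
        wait = "possibly 48+ hours"
    elif len(low_cardinality) > 200:
        wait = "possibly more than 3 hours"
    else:
        wait = "no long wait expected from binary/ordinal variables alone"

    return {
        "rows": len(body),
        "variables_kept": len(kept),
        "enough_variables": len(kept) >= 8,
        "mostly_missing": mostly_missing,
        "binary_or_ordinal": len(low_cardinality),
        "expected_wait": wait,
    }


# Hypothetical example file: "notes" is empty in 5 of 6 rows.
sample = (
    "ID,age,sex,smoker,height,weight,income,score,notes,site\n"
    "1,34,M,yes,170,65,50000,3,,A\n"
    "2,45,F,no,160,70,62000,1,,B\n"
    "3,29,F,no,175,80,55000,2,ok,A\n"
    "4,51,M,yes,168,72,48000,3,,B\n"
    "5,38,F,no,158,59,71000,2,,A\n"
    "6,62,M,no,181,90,39000,1,,B\n"
)
report = precheck(sample)
```

The distinct-value rule is crude: in a very small file, continuous variables can also have few distinct values, so treat the binary/ordinal count as a rough upper bound and not as an exact figure.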