Practical Issues with Cleaning/Combining Project Implicit Data

This page outlines basic practical issues with cleaning and combining data that are (perhaps) unique to Project Implicit data.

For a general overview of Project Implicit data files, see Data Management - General Overview of Project Implicit Data Files.

For detailed instructions on data cleaning/combining, see Data Management - Detailed Steps for Cleaning/Combining Project Implicit Data.

Practical Issues with Management of Project Implicit Data

Potential Errors and Recommended Checks

  1. When importing data, free response data sometimes cause problems if responses are too long.
    1. Option: increase the length of the variable, or remove the free response data prior to import.
    2. This is common with explicit files.
  2. Occasionally the server repeats rows of data for no apparent reason, and these repeated rows cause problems with data management.
    1. Identify repeated rows of data and delete them (sort the variables and check for identical responses in two sequential rows).
    2. If repeated rows make up more than 0.5% of the total data in a single file, this signals a problem.
  3. Study_name should always be your study. If it is not, this suggests a server error. Either:
    1. The data are from your study but the study_name is wrong, or
    2. Your study_name was assigned to another study, so your study contains other data that shouldn’t be there (you will see variable names that you did not assign).
    3. Investigate these cases; they should only occur in large datasets from the Demo site. These cases will likely need to be deleted (report as a computer error).
  4. Data files in tall format (demographics, sessionTasks, iat, explicit) need to be transposed so that each session has one row.
    1. Errors occur in the transpose if two questions have the same name.
  5. Check the log when merging files and confirm the number of rows per file: sessions and sessionTasks should be similar, iat and explicit should be similar to each other but have fewer rows, and demographics should have many more.
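
The repeated-row check and the tall-to-wide transpose could be sketched in Python with pandas roughly as follows. This is only an illustration, not the procedure used by Project Implicit: the column names session_id, question_name and question_response are hypothetical placeholders for whatever the tall file actually contains, and the duplicate check flags identical rows anywhere in the file, which is broader than comparing sequential rows after sorting.

```python
import pandas as pd

REPEAT_THRESHOLD = 0.005  # 0.5% of rows, the warning level mentioned in the list above


def drop_repeated_rows(df):
    """Drop exact repeated rows; also return the share of rows that were repeats."""
    repeated = df.duplicated(keep="first")
    share = float(repeated.mean()) if len(df) else 0.0
    return df.loc[~repeated].reset_index(drop=True), share


def tall_to_wide(df, id_col="session_id", name_col="question_name",
                 value_col="question_response"):
    """Transpose a tall file so that each session has one row.

    Column names are assumptions for this sketch. Raises an error if the same
    question name appears twice within a session, since that breaks the transpose.
    """
    clashes = df.duplicated(subset=[id_col, name_col], keep=False)
    if clashes.any():
        names = sorted(df.loc[clashes, name_col].unique())
        raise ValueError(f"question names repeated within a session: {names}")
    return df.pivot(index=id_col, columns=name_col, values=value_col).reset_index()


# Made-up example data: session 1 has one repeated row.
tall = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2],
    "question_name": ["q1", "q2", "q2", "q1", "q2"],
    "question_response": ["3", "5", "5", "4", "2"],
})
clean, share = drop_repeated_rows(tall)
if share > REPEAT_THRESHOLD:
    print(f"Warning: {share:.1%} of rows were repeats")
wide = tall_to_wide(clean)
```

Removing repeats before the transpose matters here because an exact repeated row would otherwise look like two questions with the same name in one session.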

Recommended Cleaning and Coding

  1. Much Project Implicit data is saved as character data. Recode numeric variables that are saved as character variables into numeric variables.
  2. .jsp files will have ‘-900’ values for dependency questions; set these to missing.
  3. Some people start the session (have a value for startpage) but never reach the consent page. Mark these people in the sessionTasks file and delete them; if they did not reach consent, they are not participants (Carlee's opinion; the researcher can determine).
  4. Age is calculated from birth date information and the session date.
  5. Areas to check for bad data and participant misbehavior:
    1. Participants who mindlessly click through .jsp files: can use RT on individual .jsp items.
    2. Fast latencies on latency tasks: the sessionTasks file has information on time spent on every task.
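
As a rough illustration of the recoding steps above (character to numeric, ‘-900’ to missing, and age from birth date and session date), a pandas sketch might look like the following. The column names att_7, birth_date and session_date are hypothetical, and the sketch assumes a full birth date is available; the actual files may store birth information differently.

```python
import pandas as pd


def clean_sessions(df, numeric_cols, birth_col="birth_date", session_col="session_date"):
    """Recode character columns to numeric, set -900 to missing, and compute age.

    Column names are assumptions for this sketch, not Project Implicit's actual names.
    """
    out = df.copy()
    for col in numeric_cols:
        # Values that cannot be parsed as numbers become missing (NaN).
        values = pd.to_numeric(out[col], errors="coerce")
        # -900 marks an unanswered dependency question; treat it as missing.
        out[col] = values.mask(values == -900)

    birth = pd.to_datetime(out[birth_col], errors="coerce")
    session = pd.to_datetime(out[session_col], errors="coerce")
    # Age in whole years: subtract one if the birthday had not yet occurred
    # in the session's calendar year.
    had_birthday = (session.dt.month > birth.dt.month) | (
        (session.dt.month == birth.dt.month) & (session.dt.day >= birth.dt.day)
    )
    out["age"] = session.dt.year - birth.dt.year - (~had_birthday).astype(int)
    return out


# Made-up example data.
raw = pd.DataFrame({
    "session_id": [1, 2, 3],
    "att_7": ["4", "-900", "6"],
    "birth_date": ["1990-06-15", "1985-01-02", "2000-12-31"],
    "session_date": ["2020-06-14", "2020-01-02", "2020-12-30"],
})
cleaned = clean_sessions(raw, numeric_cols=["att_7"])
```

Using errors="coerce" silently turns unparseable text into missing values, so it is worth comparing missing counts before and after recoding.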