Big Five Example - BiFiX
big-five-example.Rmd
In this vignette, you will get a complete walkthrough of how to use rextor
for your own data analysis. We will use open-source
data on Big Five personality traits, available from the iScience
Konstanz account on GitHub: https://github.com/iscience-kn/BiFiX.
STAY TUNED!
Step 1: Reading in the Data
We will read in the data directly from the iScience GitHub account. Make sure to select the raw data file on GitHub or just copy the code below.
In this dataset, we do not need the validation variable. It facilitates dropout analysis in some cases (especially in one-item-one-screen designs), but here we will focus on other variables to clean the data¹. In the first step, we will read in the WEXTOR-based data and assign it to a variable named “raw”, which will become our data frame.
raw <- read_WEXTOR("https://raw.githubusercontent.com/iscience-kn/BiFiX/refs/heads/main/data/BiFiX_data.csv",
                   keep_validation = FALSE)
Step 2: Data Cleaning
Great! We now have our empirical data in R, ready to be worked with.
But some things are not yet ideal, so we will use some more rextor
functions to make our lives easier.
When you open the data and look at it, you might notice that the
first variables, which are automatically recorded server
variables, have the prefix “.wx.”. In R, we generally want our variable
names to start with a letter for easy access. One option is to use the
namepref function (for variable name prefix) to change that prefix to “v_”.
Note that this function will not add the new prefix to variables that were
deliberately named by someone in WEXTOR. Another option is to remove
server-generated variable prefixes altogether with removepref, which is the
option we will choose here. It will only change those variable names that
actually have the old prefix that you specify, as shown below.
data <- removepref(raw) # namepref(raw, ".wx.", "v_")
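To make the two renaming options concrete, here is a minimal base R sketch of the idea on a toy data frame. It is not how rextor implements namepref or removepref; the data frame and its column names are made up for illustration.

```r
# Toy data frame; the column names are hypothetical
toy <- data.frame(.wx.id = 1:3,
                  .wx.start = c("a", "b", "c"),
                  age = c(21, 34, 28),
                  check.names = FALSE)

old_prefix <- ".wx."

# Only names that actually start with the old prefix are touched
has_prefix <- startsWith(names(toy), old_prefix)
bare_names <- substring(names(toy)[has_prefix], nchar(old_prefix) + 1)

# Option 1: drop the prefix altogether
toy_removed <- toy
names(toy_removed)[has_prefix] <- bare_names

# Option 2: swap the old prefix for a new one
toy_renamed <- toy
names(toy_renamed)[has_prefix] <- paste0("v_", bare_names)

names(toy_removed)
names(toy_renamed)
```

In both variants the column age keeps its name because it never carried the old prefix.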
We will also perform a “seriousness check” and filter out data of anyone who did not confirm they want to participate seriously. This has proven to improve data quality.
CAVE: This step will likely remove a lot of your data, so make sure you always have a safety copy of the original data. I recommend keeping the raw dataset in your R environment while you prepare your data so you can go back and repeat or adjust any step whenever necessary. That way, you can also later look at every row of your raw data that was removed in later steps and manually check whether that filtering was correct or whether you need to add some of the data back into your data frame.
data2 <- serious_check(data)
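The advice about keeping the raw data around can be sketched with base R on a toy example. The id and serious columns and their coding are assumptions made for illustration only; they say nothing about which variable serious_check actually uses in WEXTOR data.

```r
# Toy raw data; column names and coding are hypothetical
toy_raw <- data.frame(
  id      = 1:5,
  serious = c(1, 0, 1, NA, 1),   # assumed coding: 1 = confirmed serious participation
  score   = c(3.2, 4.1, 2.8, 3.9, 3.5)
)

# Keep only rows with a confirmed seriousness answer; toy_raw stays untouched
toy_clean <- toy_raw[!is.na(toy_raw$serious) & toy_raw$serious == 1, ]

# How many rows did the filter drop?
n_removed <- nrow(toy_raw) - nrow(toy_clean)

# Which rows were dropped? Look them up in the untouched raw data
removed <- toy_raw[!toy_raw$id %in% toy_clean$id, ]

n_removed
removed
```

Because the raw data frame is never overwritten, the removed rows can be inspected, and added back if needed, at any later point.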
Pipe it all together
When preparing data for further analyses, it makes sense to keep an
unchanged raw version of the data in R and save any changes in a
new data frame. However, to avoid ending up with a series of objects
like data, data2, etc. as above, you might want to use pipes
to perform several data preparation steps in one go. Since version 4.1.0,
R offers a native pipe that looks like this: |>. It
allows you to pass the output of one function as the first argument
of the next function and thus create a workflow pipeline that is easily
adjustable and usually very efficient and clear to read.
final_data <- raw |>
  removepref() |>
  serious_check()
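If the pipe is new to you, the following small base R example (independent of rextor, requires R 4.1.0 or later) shows that a piped chain is just another way of writing nested or step-by-step calls. The vector and the functions are chosen only for illustration.

```r
values <- c(1, 4, 9)

# Step by step, with intermediate objects
roots  <- sqrt(values)
total1 <- sum(roots)

# Nested calls, read from the inside out
total2 <- sum(sqrt(values))

# Native pipe, read from left to right
total3 <- values |>
  sqrt() |>
  sum()

identical(total1, total2) && identical(total2, total3)
```

All three versions compute the same value; the piped version simply avoids both the intermediate objects and the inside-out reading order.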