Once students use your experiment, you will begin to receive data, either through our automated Saturday e-mail or through the Data Fetcher.

Most of the terms are the same across the files. For detailed information on the different files go to these pages.

Data From Within Your Experiment

- Action Level - One row per action. To understand where the data comes from, look at this file first.
- Problem Level - One row per problem per student. This spreadsheet will have fewer rows than the action level one since there are multiple actions per problem for each student.
- Student Level - This spreadsheet collapses the rows from Problem Level but in doing so adds many more columns, one for each problem the student did. Use the glossary to understand how we present multiple pieces of information on each problem while maintaining one row per student.
- Student Level + Problem Level - A set of rows per student, where each row holds one problem "feature" collected for each problem completed by that student. The problems are shown in opportunity order, that is, ordered by which problem the student saw first. This order sometimes differs from student to student.

Data From Before Your Subject Pool Began Their Experiment

- Data from your subject pool before they began their experiment - This data differs from the experiment files. It includes student level features (such as gender), class level features (such as homework completion), and school level features (such as state, and urban or suburban setting). It also has student-by-class level features (such as the homework completion of this student, z-scored within their class).

Contents

1. General student level data for the user
2. Data on students' performance within ASSISTments prior to starting the problem set of interest
3. General class level data for the user
4. Data on the problem set
5. Data pertaining to the Automatic Reassessment and Relearning System (ARRS) (appears if the teacher has this setting turned on)
6. Generic data on the student's performance within the problem set of interest
7. Problem specific data for a student's performance within the problem set of interest

**General student level data for the user:**

**User ID**- The ID of the student doing the problem.
**Assignment ID**- Each time a teacher assigns a problem set, that assignment gets a separate number. This is that number.
**Imputed Gender**- Details on how we infer gender from students' names are listed here. The result will either be "Male", "Female" or "Unknown".
**Birthyear**- Birth year as entered by student when account was created.
**Student Grade**- Estimated grade level of student, calculated via grade level of classes in which student is enrolled and birth year.
**Role Type**- Describes the role that the user holds in the system (e.g., Student or Teacher).

**Data on students' performance within ASSISTments prior to starting the problem set of interest:**

**Prior Problem Count**- The number of problems the student had completed in ASSISTments prior to this assignment. This allows you to know something about the student before they did this problem set. You could use this to split students into low and high incoming ability to see if your intervention is effective. If this is empty, it means this student has not done any prior ASSISTments work.
**Prior Correct Count**- Goes with prior_problem_count. The number of problems the student had answered correctly in ASSISTments prior to this assignment.
**Prior Percent Correct**- prior_correct/prior_problem_count. The percent of past ASSISTments problems the student got correct.
**Prior Assignment Count**- The number of skill builder assignments previously attempted by the student. This number reflects only previous assignments begun by the student while a member of the class denoted by the "Class ID."
**Prior Completion Count**- The number of prior skill builder assignments successfully completed by the student while a member of the class denoted by "Class ID."
**Prior Percent Completion**- The percent of previously completed skill builder assignments. (Prior Completion Count / Prior Assignment Count)
**Prior Class Percent Completion**- The percent of previously completed skill builder assignments of the student's class as denoted by "Class ID."
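
The ratio defined under Prior Percent Correct, and the low/high split suggested under Prior Problem Count, can be sketched as follows. This is a minimal illustration, not how ASSISTments computes these columns: the dictionary keys and function names are hypothetical, the split is a simple median split on prior percent correct, and an empty count is treated as "no prior work".

```python
from statistics import median


def prior_percent_correct(correct_count, problem_count):
    """Return prior correct / prior problem count, or None with no prior work."""
    try:
        total = float(problem_count)
        correct = float(correct_count)
    except (TypeError, ValueError):
        return None  # empty or missing cell: no prior ASSISTments work
    if total == 0:
        return None
    return correct / total


def median_split(students):
    """Label each student "high", "low", or "unknown" by prior percent correct.

    `students` is a list of dicts with the hypothetical keys
    "prior_correct_count" and "prior_problem_count".
    """
    scores = [
        prior_percent_correct(s["prior_correct_count"], s["prior_problem_count"])
        for s in students
    ]
    known = [score for score in scores if score is not None]
    cutoff = median(known) if known else None
    labels = []
    for score in scores:
        if score is None:
            labels.append("unknown")
        else:
            labels.append("high" if score >= cutoff else "low")
    return labels
```

Students with no prior work are kept in a separate "unknown" group rather than being counted as low ability, since an empty value says nothing about their performance.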

**General class level data for the user:**

**Class Grade**- Grade level listed for teacher's class in which assignment was conducted
**School ID**- ID number for the school
**District ID**- ID for the school's district
**State ID**- The ID of the school's state

**Data on the problem set:**

**Problem Set ID**- The problem set ID that you are familiar with from the builder (PSA...)
**Problem Set Number**- The numeric value of the problem set that is stored in the database; this links to the encoded sequence id that is used in the builder.
**Problem Set Name**- The title of the problem set (problem set and sequence are the same thing)
**Problem Set Updated At**- The date the problem set was last updated or altered.

**Data on problem set at the class level:**

**Class Assignments Position**- The placement of the assignment within the teacher's assignment page (e.g., 5 means the 5th problem set assigned)

**Data at the class level (these columns ignore the problem set you are doing; features like this account for teacher level variables for this assignment):**

**Assignment Started Count**- The number of students who have started this assignment for the given class (assignment_id)
**Assignment Finished Count**- The number of students who have finished this assignment for the given class (assignment_id). Classes with high finish rates are the classrooms to look at if you have an intervention that causes differential dropout per condition.
**Assignment Homework Count**- The number of students who finished the assignment outside the hours of 7-3 server time, which is considered school time. This variable can help you determine whether a given teacher had students do the assignment in class or as homework.
**Homework Percent**- The percent of students who did the assignment as homework (Assignment Homework Count / Assignment Finished Count)
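
The homework columns can be illustrated with a short sketch. It reads "7-3 server time" as 07:00 to 15:00, treats a finish at exactly 15:00 as outside school time, and returns the ratio as a fraction between 0 and 1. All three choices are assumptions, as are the function names; the actual export may compute these values differently.

```python
from datetime import datetime


def is_homework(finished_at, school_start_hour=7, school_end_hour=15):
    """True when the finish time falls outside the assumed school hours."""
    return not (school_start_hour <= finished_at.hour < school_end_hour)


def homework_percent(finish_times):
    """Assignment Homework Count / Assignment Finished Count for one class.

    `finish_times` holds one datetime (server time) per student who finished.
    Returns None when nobody has finished the assignment.
    """
    if not finish_times:
        return None
    homework_count = sum(1 for t in finish_times if is_homework(t))
    return homework_count / len(finish_times)
```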

**Data pertaining to the Automatic Reassessment and Relearning System (ARRS) (appears if the teacher has this setting turned on):**

**ARRS Correctness**- The correctness on the first reassessment test. A '1' represents that the student answered the question correctly and a '0' represents that the student answered the question incorrectly. All other values (nulls or dashes) mean the student was never assigned an ARRS test or has not yet attempted the ARRS question.
**ARRS Delay Days**- The number of days between when the student finished the skill builder and when the ARRS test was assigned.
**ARRS Adaptive Mode**- ASSISTments now has an adaptive version of ARRS. Students who need more items to learn are reassessed earlier. When looking at the data, it is important to know whether the student was assigned in adaptive mode. A published paper on this feature is available here.

**Generic data on the student's performance within the problem set of interest:**

**Problem Count**- Number of problems done by the student in this assignment
**Release Date**- The time when the assignment showed up in student view
**Assigned Date**- Should be the date the teacher assigned the assignment. In some problem logs the day appears to be missing and only the time is shown.
**Due Date**- Due date for the assignment. Can be set by the teacher as a date and time or just a date.
**Assignment Logs ID**- A unique assignment id for that assignment for that student
**Assignment Types ID**- A numeric value for assignment type origin.
**Assignment Types Origin**- Where the assignment originated (Teacher, ARRS or Placements)
**Class Assignments Assignment Type ID**- Numerical value corresponding with assignment type (1 = ClassAssignment, 6 = ARRS Relearning)
**Assignment Type**- Type of assignment, usually "Class", but can also be "Individual"
**Assignment Start Time**- The logged time when the student began the assignment
**Assignment End Time**- The logged time when the student finished the assignment
**Assignment Time**- The amount of time spent between when the student started the assignment and when the student finished the assignment.
**Late Assignment**- Whether the student submitted the assignment after the due date. Assignments without due dates are never considered late.
**Last Worked On**- This is the last date the student worked on something from the assignment that goes with this problem log
**Mastery Status**- Only meaningful for skill builder problem sets
- mastered = Student completed the number of problems required for mastery
- limit exceeded = Student exceeded the daily limit of problems for that skill builder
- not mastered exhausted = Student attempted all the problems in that skill builder
- blank = Student did not fit into one of the previous three categories
**Network State**- ASSISTments allows students to work offline if the teacher turns on the feature.
- CONNECTED = the student did the assignment while online
- DISCONNECTED = the student did the assignment while offline
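
Assignment Time and Late Assignment can both be derived from the logged timestamps, as in the sketch below. The timestamp formats, the function names, and the treatment of an unfinished assignment are all assumptions made for illustration; check them against your own export before relying on the results.

```python
from datetime import datetime

# Assumed export formats: a full timestamp, or a date only (for due dates).
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_time(text):
    """Parse a timestamp cell; return None for an empty or "-" cell."""
    text = (text or "").strip()
    if text in ("", "-"):
        return None
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {text!r}")


def assignment_time_seconds(start_text, end_text):
    """Seconds between Assignment Start Time and Assignment End Time."""
    start, end = parse_time(start_text), parse_time(end_text)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


def is_late(end_text, due_text):
    """True when the assignment was finished after the due date."""
    end, due = parse_time(end_text), parse_time(due_text)
    if due is None:
        return False  # assignments without due dates are never late
    if end is None:
        return None  # assumption: an unfinished assignment is left undecided
    # Note: a date-only due date parses as midnight at the start of that day.
    return end > due
```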

**Problem specific data for a student's performance within the problem set of interest:**

In our student level file, the data is presented using opportunity count. *This means each column represents a problem completed, in the order experienced by the student.* Each field or feature will have as many columns as the maximum number of problems solved (N) by any student. For students who required fewer problems, the extra columns are filled with " - ".

*You will see Scaffold 1 - N if even one problem in that spot had scaffolds. If the student was exposed to a scaffold you will see the data; if not, you will see "-".*
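
If you prefer one row per student per opportunity, the wide layout described above can be unpacked as in the sketch below. It assumes the columns are literally named like "Correct Problem 1" and "Hint Count Problem 2" and that the student identifier column is "User ID"; both are guesses based on this glossary, so adjust the pattern to the header row of your own file. Columns that do not follow the "<feature> Problem <number>" pattern are ignored.

```python
import re

# Assumed column naming: "<feature> Problem <position>", e.g. "Correct Problem 3".
COLUMN_PATTERN = re.compile(r"^(?P<feature>.+?) Problem (?P<position>\d+)$")
MISSING = {"", "-"}  # filler used when a student saw fewer than N problems


def wide_to_long(row, id_column="User ID"):
    """Turn one student-level row (a dict) into one dict per opportunity."""
    by_position = {}
    for column, value in row.items():
        match = COLUMN_PATTERN.match(column)
        if match is None:
            continue  # not an opportunity-count column
        value = (value or "").strip()
        if value in MISSING:
            continue  # the student never reached this opportunity
        position = int(match.group("position"))
        by_position.setdefault(position, {})[match.group("feature")] = value
    return [
        {id_column: row[id_column], "opportunity": position, **features}
        for position, features in sorted(by_position.items())
    ]
```

A row read with `csv.DictReader` can be passed in directly, since each row is a dictionary keyed by column name.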

**Condition Problem 1 - N**- The idea is that we can tell you the path each student is on as they work on the problems.
**Correct Problem 1 - N**- Binary correctness as measured by the student's first action or attempt at solving the problem.
- 1 = Correct on first attempt
- 0 = Incorrect on first attempt, or asked for help
- This column is often the target for prediction. (Neil Heffernan notes that while this is true most of the time, we also have Essay questions that teachers can grade. If this value is .25, it might reflect an essay question on which the teacher scored 1/4 credit. These types of problems are rare.)
**Answer Text Problem 1 - N**- The answer as entered by the student, or the value the student selected in a multiple choice or "choose all that apply" problem.
**Original Problem 1 - N**- Note that if all problems are main problems (no scaffolding problems exist) this will be a static value that will appear in the problem set level data
- 1 = Main problem
- 0 = Scaffolding problem
- If a problem has scaffolding and the student answers incorrectly or asks for the problem to be broken into steps, a new problem will be created called a scaffolding problem. This creates a separate problem log row in the file with the variable original = 0.
**Problem Logs ID Problem 1 - N**- Each problem the student does is recorded as a unique problem log. This is the ID of a logged problem, and is mostly only useful in the database. These fields are not that helpful to researchers.
**Problem ID 1- N**- The ID of a problem as created or observed in the builder. If a problem has multiple main problems and/or scaffolding, everything will be related to this problem ID.
**Problem Number 1 - N**- The raw version of the above field. This is the numerical ID of a problem that is stored in the database. If you see problem logs with the same Problem Number, they represent multiple main problems (or scaffolding problems) that are part of the same overarching problem.
**Template ID**- Some problems were all generated from the same template such as most problems in a skill builder. This is the id of the template that the problem was generated from.
**Template Number**- The raw version of the above field, similar to the problem number.
**Problem Sub Part ID 1 - N**- An ID unique to each part of a problem. If a problem has multiple main problems, each main problem will have a different Problem Sub Part ID within a single Problem id. The same goes for scaffolding problems within a main problem.
**Scaffold IDs**- A semicolon separated list of the IDs of the scaffold problems associated with the given problem. These IDs are equivalent to Problem Sub Part IDs.
**First Action Problem 1 - N**- A numerical value representing the student's first action taken while working on the problem
- 0 = attempt
- 1 = requested a hint
- 2 = requested that the problem be broken down into steps via scaffolding
- empty = student opened the problem but made no action before leaving the tutor
**Hint Count Problem 1 - N**- The number of hints a student requested throughout the duration of the problem (or that portion of the problem)
**Bottom Hint Problem 1 - N**- The bottom out hint is the last hint for a problem that generally provides the student with the correct answer to allow them to move on to the next problem in their assignment. The numerical value in these fields represents whether or not the bottom out hint was observed:
- 1 = The student asked for the bottom out hint
- 0 or Blank = The student did not ask for the bottom out hint.
**Attempt Count Problem 1 - N**- The number of attempts a student made throughout the duration of the problem (or sub-part of the problem)
**Problem Start Time Problem 1 - N**- The time the student started the problem.
**Problem End Time Problem 1 - N**- The time the student finished the problem.
**First Response Time Problem 1 - N**- The time between when the problem was started and when the student made his or her first action (in milliseconds)
**Actions Problem 1 - N**- A detailed string of the specific keystrokes and actions taken by a student throughout the duration of the problem (or that portion of the problem). This data is difficult to interpret and may not be the best spot for researchers to begin.
**Condition Problem 1 - N**- Still in production, this field will trace the experimental condition for each student, logging the random assignment of the tutor.
**Problem Name 1 - N**- The name of the assistment as created in the builder. If you created your problem set and named your problems (assistments) with specific names, you may be able to discern condition quickly using this feature (e.g., problem 1 = 'Exp - Multiplying Fractions, Video Problem')
**Prerequisite Skill ID's**- A semicolon separated list of ids for all the skills that the given problem is tagged/associated with.
**Prerequisite Skill Names**- A semicolon separated list of skill names for all the skills that the given problem is tagged/associated with.
**Post-requisite Skill ID's**- A semicolon separated list of ids for all the skills that are post-requisite skills of the skills that the given problem is tagged/associated with.
**Post-requisite Skill Names**- A semicolon separated list of skill names for all the skills that are post-requisite skills of the skills that the given problem is tagged/associated with.
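
Several of the fields above are encoded values: the semicolon separated lists (Scaffold IDs and the skill columns) and the numeric First Action codes. The sketch below shows one way to decode them. The function names and the label strings are illustrative choices, and treating "-" as "no problem in this slot" follows the filler convention described for the student level file.

```python
# First Action codes as listed in this glossary; the labels are illustrative.
FIRST_ACTION_LABELS = {"0": "attempt", "1": "hint", "2": "scaffold"}


def split_ids(cell):
    """Split a semicolon separated cell into a list of trimmed values."""
    if cell is None:
        return []
    parts = (part.strip() for part in cell.split(";"))
    return [part for part in parts if part not in ("", "-")]


def first_action_label(cell):
    """Decode a First Action cell.

    Returns None for the "-" filler (no problem in this slot) and
    "no action" for an empty cell (problem opened, nothing done).
    """
    cell = (cell or "").strip()
    if cell == "-":
        return None
    if cell == "":
        return "no action"
    return FIRST_ACTION_LABELS.get(cell, "unknown")
```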

Ostrow, K. & Heffernan, N. (2019) Copyright