Name                          Size     Keywords                    Download link  Reference
Tracking Log (201508-201608)  1048 MB  Activity Log                data           AAAI'19
Tracking Log (201608-201708)  583 MB   Activity Log                data           AAAI'19
Dropout Prediction Dataset    299 MB   Dropout Prediction Dataset  data           AAAI'19
User Profile                  112 MB   User Profile                data           AAAI'19
Course Information            527 KB   Course Information          data           AAAI'19
KDDCUP Data                   158 MB   KDDCUP Data                 data           AAAI'19

User Activity

These files are the datasets used in the paper "Understanding Dropouts in MOOCs" (AAAI 2019).
The tracking log files (201508-201608, 201608-201708) include all users' learning activities on the XuetangX platform from 201508 to 201708. These logs are the supporting data for the analyses in the paper.
The dropout prediction dataset includes the training set and test set used in the paper for the dropout prediction task.
The user profile file contains information about XuetangX users, including gender, birth year and education level.
The course information file includes course start date, course end date, course category and course type.

Tracking log:

Each JSON file contains the user tracking log for a specific period. For example, 20150801-20151101-raw_user_activity.json is the tracking log from 20150801 to 20151101.

The format of these JSON files is as follows:

[
    [course_id,
        {user_id:
            {session_id:
                [
                    [activity_event_1, time_1],
                    [activity_event_2, time_2],
                ... ],
            ... },
        ...}
    ],
...]
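
The nested structure above can be flattened into one row per activity event, which is often easier to analyse. The following is a minimal sketch, assuming each top-level entry is a two-element [course_id, users] list as shown and that every event is an [event, time] pair. The function name iter_events and all sample values (course, user, session, event names and timestamps) are hypothetical and only illustrate the shape.

```python
import json
from typing import Any, Iterator, Tuple


def iter_events(records: list) -> Iterator[Tuple[Any, Any, Any, Any, Any]]:
    """Yield (course_id, user_id, session_id, event, time) for every logged event."""
    for course_id, users in records:              # each entry: [course_id, {user_id: ...}]
        for user_id, sessions in users.items():   # {user_id: {session_id: [...]}}
            for session_id, events in sessions.items():
                for event, time in events:        # each event: [activity_event, time]
                    yield course_id, user_id, session_id, event, time


# Made-up sample in the documented shape; real files hold far more data.
sample = json.loads("""
[
  ["course_A",
    {"user_1":
      {"session_x": [["play_video", "2015-08-01T10:00:00"],
                     ["pause_video", "2015-08-01T10:05:00"]]}}]
]
""")

rows = list(iter_events(sample))
for row in rows:
    print(row)
```

For a real file, the same function could be applied to the result of json.load on the opened file. The files are large, so loading one fully into memory may be costly.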

Dropout Prediction Dataset:

train_log.csv, test_log.csv:

1st column: enroll_id, the id of (user, course) pair.

2nd column: username, the id of user.

3rd column: course_id, the id of course.

4th column: session_id, the id of session.

5th column: action, the type of user activity.

6th column: object, the corresponding object of the action.

7th column: time, the occurrence time of the action.

train_truth.csv, test_truth.csv:

1st column: enroll_id, the id of (user, course) pair.

2nd column: truth, the label of user's dropout (1: dropout, 0: non-dropout).
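
The log and truth files share enroll_id, so per-enrollment features can be joined to the dropout label. Below is a minimal sketch that counts actions per enrollment and pairs them with the label, reading columns by position as listed above. It assumes the CSV files start with a header row (pass has_header=False otherwise). The helper names and the in-memory sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def read_rows(fileobj, has_header=True):
    """Read CSV rows as lists, optionally skipping an assumed header row."""
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)
    return [row for row in reader if row]


def action_counts(log_rows):
    """Map enroll_id (1st column) to a Counter of action types (5th column)."""
    counts = {}
    for row in log_rows:
        enroll_id, action = row[0], row[4]
        counts.setdefault(enroll_id, Counter())[action] += 1
    return counts


def load_labels(truth_rows):
    """Map enroll_id to its label (1: dropout, 0: non-dropout)."""
    return {row[0]: int(row[1]) for row in truth_rows}


# Made-up stand-ins for train_log.csv and train_truth.csv.
log_csv = io.StringIO(
    "enroll_id,username,course_id,session_id,action,object,time\n"
    "1,u1,c1,s1,play_video,v1,2015-08-01T10:00:00\n"
    "1,u1,c1,s1,pause_video,v1,2015-08-01T10:05:00\n"
    "2,u2,c1,s2,play_video,v1,2015-08-02T09:00:00\n"
)
truth_csv = io.StringIO("enroll_id,truth\n1,0\n2,1\n")

counts = action_counts(read_rows(log_csv))
labels = load_labels(read_rows(truth_csv))

# One (enroll_id, action counts, label) triple per labelled enrollment.
examples = [(eid, dict(counts.get(eid, {})), label)
            for eid, label in sorted(labels.items())]
print(examples)
```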

User Profile:

1st column: user_id, the id of user.

2nd column: gender, the gender of user.

3rd column: education, user's education level.

4th column: birth, user's birth year.

Course Information:

1st column: id, the id (number) of course, which is used in dropout prediction dataset.

2nd column: course_id, the id (string) of course, which is used in tracking log.

3rd column: start, course start time.

4th column: end, course end time.

5th column: course_type, course mode (0: instructor-paced course, 1: self-paced course).

6th column: category, the category of course.
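
Because the course file carries both the numeric id and the string course_id, it can serve as a lookup between the tracking log and the dropout prediction dataset. The sketch below builds such a lookup by column position and decodes course_type. It assumes a header row is present, and the function name, date format and sample rows are hypothetical.

```python
import csv
import io

# Decoding of the course_type column as documented above.
COURSE_MODE = {"0": "instructor-paced", "1": "self-paced"}


def load_courses(fileobj, has_header=True):
    """Return {string course_id: course info}, reading columns by position."""
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)  # skip the assumed header row
    courses = {}
    for row in reader:
        if not row:
            continue
        numeric_id, course_id, start, end, course_type, category = row[:6]
        courses[course_id] = {
            "id": numeric_id,
            "start": start,
            "end": end,
            "mode": COURSE_MODE.get(course_type, "unknown"),
            "category": category,
        }
    return courses


# Made-up stand-in for the course information file.
course_csv = io.StringIO(
    "id,course_id,start,end,course_type,category\n"
    "101,course_A,2015-09-01,2015-12-31,0,math\n"
    "102,course_B,2016-03-01,2016-06-30,1,history\n"
)

courses = load_courses(course_csv)

# Lookup from the string id (tracking log) to the numeric id (dropout dataset).
numeric_id_of = {cid: info["id"] for cid, info in courses.items()}
print(numeric_id_of)
```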

KDDCUP Data:
The full dataset of KDDCUP 2015. Details can be found in KDDCUP 2015.

Reference

@inproceedings{feng2019dropout,
    title={Understanding Dropouts in MOOCs},
    author={Wenzheng Feng and Jie Tang and Tracy Xiao Liu and Shuhuai Zhang and Jian Guan},
    booktitle={Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
    year={2019}
}