Name | Size | Keywords | Download link | Reference
Course Concept Extraction | 21.6 MB | Concept Extraction, Key-phrase Extraction | data | IJCNLP'17

Course Concept Extraction

This is the complete dataset for the paper "Course Concept Extraction in MOOCs via Embedding-Based Graph Propagation" (IJCNLP 2017).

CSEN, CSZH, EcoEN, and EcoZH are the evaluation datasets described in the paper. All data files are in standard JSON format. Each dataset contains two files: Captions and Candidates.

1. .captions file
Video captions of the MOOC courses in the dataset; each line represents one video. The text has been tokenized and POS-tagged. For CSZH and EcoZH, we use Ansj to perform word segmentation and POS tagging. For CSEN and EcoEN, we use the POS tagger implemented by the Stanford NLP group.
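
As a minimal sketch of reading a .captions file, the snippet below assumes that each line is a self-contained JSON value (the description says only that the files are JSON and that each line represents a video). The function name iter_json_lines and the file name CSEN.captions are hypothetical, and the structure of each record is not assumed.

```python
import json
from typing import Any, Iterator


def iter_json_lines(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of the file at `path`.

    Assumption: every line holds one complete JSON value (one video).
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical file name; adjust to wherever the dataset was unpacked.
    for i, video in enumerate(iter_json_lines("CSEN.captions")):
        print(i, type(video).__name__)
        if i >= 2:  # peek at the first three videos only
            break
```

Printing only the type of the first few records is a cautious way to inspect the layout before relying on particular keys.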

2. .candidates file
Candidate course concepts extracted from the dataset. The "label" field is the human-annotated label for a candidate: "1" marks a course concept and "0" marks a non-concept.
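
The following sketch shows one way to load a .candidates file and count how many candidates carry each label. Only the "label" field is taken from the description above; the helper names load_records and count_labels and the file name CSEN.candidates are hypothetical. Because the exact layout is not specified, the loader accepts either a single JSON document or one JSON object per line, and labels are compared as strings so that both "1" and 1 are handled.

```python
import json
from collections import Counter
from typing import Any, Iterable, List, Mapping


def load_records(path: str) -> List[Any]:
    """Load records from a file that is either one JSON document
    or one JSON value per line (both layouts are assumptions)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to line-by-line parsing.
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def count_labels(records: Iterable[Mapping[str, Any]]) -> Counter:
    """Count candidates per label, normalising labels to strings."""
    counts: Counter = Counter()
    for rec in records:
        counts[str(rec["label"])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file name; adjust to the local copy of the dataset.
    counts = count_labels(load_records("CSEN.candidates"))
    print("course concepts:", counts["1"], "non-concepts:", counts["0"])
```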

You may use the dataset to test your concept or key-phrase extraction model, or for other, more creative work.
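
As a rough illustration of such a test, the sketch below scores a set of predicted concepts against the candidates labeled "1", using set-based precision, recall, and F1. This is a generic scoring scheme and not necessarily the metric used in the paper; the function name precision_recall_f1 and the example phrases are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 of predicted concepts
    against gold concepts (exact string match)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical phrases: gold would be the candidates labeled "1",
    # predicted would be the output of the model under test.
    gold = ["binary tree", "hash table", "quick sort"]
    predicted = ["binary tree", "hash table", "next week", "final exam"]
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact string matching is the simplest choice here; a ranking-based metric may be more appropriate if the model outputs scored candidates.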