|Course Concept Extraction||21.6 MB||Concept Extraction, Key-phrase Extraction||data||IJCNLP'17|
Course Concept Extraction
This is the complete dataset for the paper "Course Concept Extraction in MOOCs via Embedding-Based Graph Propagation" (IJCNLP 2017).
CSEN, CSZH, EcoEN, and EcoZH are the evaluation datasets described in the paper. All data files are in standard JSON format. Each dataset contains two files: Captions and Candidates.
1. .captions file
Video captions of the MOOC courses in the dataset. Each line represents one video. The text has been tokenized and POS-tagged. For CSZH and EcoZH, we use Ansj (https://github.com/NLPchina/ansjseg) for word segmentation and POS tagging. For CSEN and EcoEN, we use the POS tagger implemented by the Stanford NLP group (http://nlp.stanford.edu/software/tagger.shtml).
2. .candidates file
Candidate course concepts extracted from the dataset. The "label" field is the human-annotated label for a candidate: "1" marks a course concept and "0" marks a candidate that is not one.
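As a quick sanity check after downloading, the label distribution of a .candidates file can be tallied. The following is a minimal sketch, not part of the original release. It assumes the file holds either one JSON array or one JSON object per line, since the description only says "standard json format". Only the "label" field is documented above; the "concept" field in the usage example and the path CSEN.candidates are hypothetical. Labels are compared as strings because the description does not say whether they are stored as strings or numbers.

```python
import json
from collections import Counter


def parse_records(text):
    """Parse a JSON array, or fall back to one JSON object per line.

    Both layouts are assumptions; the dataset description does not
    specify which one is used.
    """
    text = text.strip()
    if not text:
        return []
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Not a single JSON document: treat each non-empty line as one record.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def label_counts(records):
    """Count candidates per label; "1" = course concept, "0" = otherwise."""
    # str() normalizes labels stored as either "1"/"0" or 1/0.
    return Counter(str(record["label"]) for record in records)


def load_candidates(path):
    """Read a .candidates file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return parse_records(f.read())
```

For example, label_counts(load_candidates("CSEN.candidates")) would return a Counter keyed by "1" and "0", provided the file sits at that hypothetical path and matches one of the two assumed layouts.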
You may use the dataset to evaluate your concept or key-phrase extraction model, or for other related tasks.