The Cornell Movie-Dialogs Corpus is a large, metadata-rich collection of conversations extracted from movie scripts. The data includes over 220,000 conversational exchanges involving more than 9,000 characters from 617 movies. Prior uses of this corpus include:
* Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg and Lillian Lee,
"You had me at hello: How phrasing affects memorability", ACL 2012.
* Tyler Schnoebelen, Feb 2012: how "like" and "I mean" vary across movie genre,
gender, and cast position.
http://corplinguistics.wordpress.com/2012/02/23/like-lets-go-to-the-movies-i-mean/
* Cristian Danescu-Niculescu-Mizil and Lillian Lee, "Chameleons in imagined
conversations: A new approach to understanding coordination of linguistic style
in dialogs", ACL 2011 workshop on Cognitive Modeling and Computational Linguistics.
The download site is:
http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
Cristian Danescu-Niculescu-Mizil and Lillian Lee