Tuesday, July 24, 2012
I am looking for help with resources for a particular research project, and a colleague suggested that I "crowdsource" my question here. I want to locate a particular word or phrase in a wide range of non-legal sources, such as movies, music, media, and books and other literature. Google is working pretty well for searching general media. I also recently discovered Google's Ngram Viewer, which permits word and phrase searches within a large database of books.
Any suggestions for other good search resources, particularly something for searching words or phrases within movies and music?
Thanks for any help!
Posted by Brooks Holland on July 24, 2012 at 11:22 AM | Permalink
You may want to try the Corpus of Contemporary American English: http://corpus.byu.edu/coca/
Posted by: Alexis | Jul 24, 2012 11:39:49 AM
Searchable music lyrics database: www.mldb.org
Movie quote database (not as good as mldb is for lyrics, but decent): www.moviequotedb.com
Posted by: anon | Jul 24, 2012 11:53:15 AM
On the Daily Show with Jon Stewart, they often find these amazing video clips (like politicians contradicting their current statements in footage from ten or twenty years earlier). I have no idea how they do it, but they do it so frequently and so quickly that I suspect they must have a database of television (or at least news show) transcripts.
Does anyone know how they do it? If there is such a database, Brooks, you might find it very helpful.
Posted by: Adam Kolber | Jul 24, 2012 12:22:47 PM
If you have some financial resources, your best bet is probably MTurk (https://www.mturk.com/mturk/welcome), which allows you to pay people very small sums of money to accomplish tasks like this that require some human intelligence.
Posted by: Erik Girvan | Jul 24, 2012 1:25:47 PM
There are many freely available natural language corpuses that may be of interest to you. I'm a fan of the Python Natural Language Toolkit (NLTK), which comes packaged with many corpuses, including online chats, Reuters reports, public domain books, and many more: www.nltk.org.
Posted by: Ryan Whalen | Jul 25, 2012 12:14:35 AM
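For anyone who does end up with a pile of plain-text material (from a corpus collection like the one mentioned above, or from transcripts or lyrics saved locally), a phrase search can be roughed out in a few lines of Python. This is only a sketch using the standard library, not NLTK's own tools; the function name `count_phrase` and the sample texts are made up for illustration.

```python
import re
from collections import Counter


def count_phrase(texts, phrase):
    """Count case-insensitive, whole-word occurrences of `phrase` in each text.

    `texts` maps a label (e.g. a file name) to that source's full text.
    """
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        raise ValueError("phrase must contain at least one word")
    # Allow any run of whitespace (including line breaks) between the words.
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    counts = Counter()
    for label, text in texts.items():
        counts[label] = len(pattern.findall(text))
    return counts


# Hypothetical sample texts standing in for real corpus files.
samples = {
    "chat": "You have the right to remain silent. Right to remain calm, too.",
    "news": "The suspect chose to remain silent.",
}
hits = count_phrase(samples, "remain silent")
for label, n in hits.items():
    print(label, n)
```

Swapping the `samples` dictionary for text read from files would let the same function run over a larger collection; a dedicated toolkit is likely the better choice once stemming or part-of-speech matching is needed.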