Anonymous committed 1b55c9f

[econ/patterns/imdb][l]: (old - last w/e mostly) first stab at working with imdb database.

Files changed (2)

+# Analysing Movies with IMDB
+
+Data grabbing, analysis etc is currently all in data.py.
+
+## Movies and Actors: Mapping the Internet Movie Database
+
+http://ivl.slis.indiana.edu/km/pub/2007-herr-movieact.pdf
+
+## IMDB Graph Drawing Competition
+
+http://www.ul.ie/gd2005/dataset.html
+
+Evolving-Graph Drawing Competition
+
+The challenge of drawing large evolving graphs can be addressed in various ways. Any visualizations based on the contest data, including animations, static images, subgraphs and derivations of the contest graph, are welcome as submissions. In addition to the visualizations, we encourage contestants to submit supplemental material, such as background relevance of the graph, case studies, concepts, algorithms, experiments, structural results, that address the problem of visualizing this type of data in a meaningful way.
+
+Contest Data
+
+A real-world data set is provided that is based on the Internet Movie Database.
+
+The graph is a bipartite graph where each node either corresponds to an actor or to a movie. There is an edge between a movie and each actor of the movie.
+
+Moreover, the data contain the following attributes at nodes:
+
+    * "movie" indicating if node corresponds to a movie (type boolean)
+    * "name" indicating the name of the movie or actor (type string)
+    * year of the movie (type int); attribute is 0 if node is an actor or year is not known
+    * genre of the movie (type string) 
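The node and edge model described above can be sketched with the standard library alone. This is a minimal Python 3 illustration, not the contest's reference loader: the inline sample graph is made up, and the key ids (`d0`, `d1`, `d2`), the attribute name `year`, and the `"true"`/`"false"` encoding of the boolean are assumptions about how the real file is laid out. The helper name `load_bipartite` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny made-up sample; the real contest file may use different key ids.
SAMPLE = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="movie" attr.type="boolean"/>
  <key id="d1" for="node" attr.name="name" attr.type="string"/>
  <key id="d2" for="node" attr.name="year" attr.type="int"/>
  <graph edgedefault="undirected">
    <node id="n0"><data key="d0">true</data><data key="d1">Casablanca</data><data key="d2">1942</data></node>
    <node id="n1"><data key="d0">false</data><data key="d1">Humphrey Bogart</data><data key="d2">0</data></node>
    <node id="n2"><data key="d0">false</data><data key="d1">Ingrid Bergman</data><data key="d2">0</data></node>
    <node id="n3"><data key="d0">true</data><data key="d1">The Big Sleep</data><data key="d2">1946</data></node>
    <edge source="n0" target="n1"/>
    <edge source="n0" target="n2"/>
    <edge source="n1" target="n3"/>
  </graph>
</graphml>
"""


def load_bipartite(xml_text):
    """Return (nodes, cast): node attributes by id, and actor names per movie."""
    root = ET.fromstring(xml_text)
    # Map key ids such as "d0" to attribute names such as "movie".
    key_names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}
    nodes = {}
    for node in root.findall("g:graph/g:node", NS):
        nodes[node.get("id")] = {
            key_names[d.get("key")]: d.text for d in node.findall("g:data", NS)
        }
    cast = defaultdict(list)
    for edge in root.findall("g:graph/g:edge", NS):
        a, b = edge.get("source"), edge.get("target")
        # Every edge joins one movie and one actor; work out which end is which.
        movie, actor = (a, b) if nodes[a]["movie"] == "true" else (b, a)
        cast[nodes[movie]["name"]].append(nodes[actor]["name"])
    return nodes, dict(cast)


nodes, cast = load_bipartite(SAMPLE)
```

Because the graph is bipartite, grouping edges by their movie endpoint is enough to recover each cast list; grouping by the actor endpoint instead would give each actor's filmography.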
+
+Download
+
+The graph is available in GraphML format (compressed with bzip2):
+
+    * imdb.graphml.bz2 (26MB) 
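Since the download is bzip2-compressed, one option is to read it as a stream rather than unpacking it first. The Python 3 sketch below counts movie and actor nodes that way. It assumes the `<key>` declarations appear before the nodes and that the "movie" boolean is written as `true`/`false`; neither has been checked against the real file. The function name `count_node_kinds` is hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET

# Fully qualified tag prefix for GraphML elements.
GRAPHML = "{http://graphml.graphdrawing.org/xmlns}"


def count_node_kinds(path):
    """Stream a .graphml.bz2 file and count movie nodes versus actor nodes."""
    movie_key = None  # id of the <key> whose attr.name is "movie"
    counts = {"movie": 0, "actor": 0}
    with bz2.open(path, "rb") as fh:
        # iterparse yields each element once its closing tag has been read.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag == GRAPHML + "key" and elem.get("attr.name") == "movie":
                movie_key = elem.get("id")
            elif elem.tag == GRAPHML + "node":
                is_movie = any(
                    d.get("key") == movie_key and (d.text or "").strip() == "true"
                    for d in elem.iter(GRAPHML + "data")
                )
                counts["movie" if is_movie else "actor"] += 1
                elem.clear()  # drop the node's children to limit memory use
    return counts


if __name__ == "__main__":
    # Assumes the file has been downloaded next to this script.
    print(count_node_kinds("imdb.graphml.bz2"))
```

Streaming keeps memory use roughly flat, which matters more once the 26MB archive is expanded into XML.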
+'''Analyse IMDB data.
+
+Requires imdbpy: <http://imdbpy.sourceforge.net/>
+
+On Debian/Ubuntu you can do:
+
+    $ aptitude install python-imdbpy
+
+However, to get the scripts (which we need), it seems you have to install from
+the tarball (I used IMDb-3.6).
+'''
+import os
+import urllib
+
+urlbase = 'ftp://ftp.fu-berlin.de/pub/misc/movies/database/'
+fns = [ 'movies.list.gz', 'actors.list.gz', 'actresses.list.gz' ]
+cache = os.path.abspath('cache')
+
+def retrieve():
+    if not os.path.exists(cache):
+        os.makedirs(cache)
+    for fn in fns:
+        url = urlbase + fn
+        dest = os.path.join(cache, fn)
+        if not os.path.exists(dest):
+            print 'Retrieving %s to %s' % (url, dest)
+            urllib.urlretrieve(url, dest)
+        else:
+            print 'Skipping %s' % url
+
+# follow http://imdbpy.sourceforge.net/docs/README.sqldb.txt
+dburi = 'postgres://rgrp:pass@localhost/imdb'
+def load():
+    cmd = 'imdbpy2sql.py -d %s -u %s' % (cache, dburi)
+    # os.system(cmd)
+    print cmd
+
+def analyse_via_imdb():
+    # does not seem very flexible
+    import imdb
+    i = imdb.IMDb('sql', dburi)
+    movies = i.search_movie('Indiana Jones and the Ark')
+    m = movies[0]
+    print m['title']
+    print m.__dict__
+
+# only MetaData and Table are used by the live code below
+from sqlalchemy import MetaData, Table
+def analyse():
+    metadata = MetaData()
+    metadata.bind = dburi
+    engine = metadata.bind
+    # titles != movies since some stuff is e.g. videogames
+    titles = Table('title', metadata, autoload=True)
+    kinds = Table('kind_type', metadata, autoload=True)
+    result = kinds.select().execute()
+    # movie has kind_id = 1
+
+    def get_year_production(year):
+        query = titles.count()
+        query = query.where(titles.c.kind_id == 1)
+        query = query.where(titles.c.production_year == year)
+        # results = select([func.count(titles.c.id)], and_(titles.c.kind_id == 1,
+        #    titles.c.production_year==1950)).execute()
+        # print results.fetchall()[0][0]
+        count = query.execute().fetchall()[0][0]
+        return count
+    for year in range(1900, 1980):
+        print year, get_year_production(year)
+
+    # persons = Table('person_info', metadata, autoload=True)
+    # castinfo = Table('cast_info', metadata, autoload=True)
+    # titles -> title_id
+    # person_info -> person_id
+    # castinfo -> movie_id (title), person_id
+
+if __name__ == '__main__':
+    retrieve()
+    load()
+    analyse()
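The per-year loop in `analyse()` issues one COUNT query per year. A possible alternative is a single `GROUP BY` query, sketched below in Python 3. To stay self-contained it uses an in-memory `sqlite3` database as a stand-in for the Postgres database in the commit, and the sample rows are invented. The table and column names (`title`, `kind_id`, `production_year`) and the movie kind id of 1 come from the script above. The function name `movies_per_year` is hypothetical.

```python
import sqlite3


def movies_per_year(conn, first_year, last_year, movie_kind_id=1):
    """Return [(year, count), ...] for every year in the inclusive range."""
    rows = conn.execute(
        "SELECT production_year, COUNT(*) FROM title "
        "WHERE kind_id = ? AND production_year BETWEEN ? AND ? "
        "GROUP BY production_year",
        (movie_kind_id, first_year, last_year),
    ).fetchall()
    found = dict(rows)
    # Years with no matching titles are absent from the query result; report 0.
    return [(year, found.get(year, 0)) for year in range(first_year, last_year + 1)]


# In-memory stand-in for the real database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE title ("
    "id INTEGER PRIMARY KEY, title TEXT, kind_id INTEGER, production_year INTEGER)"
)
conn.executemany(
    "INSERT INTO title (title, kind_id, production_year) VALUES (?, ?, ?)",
    [
        ("A", 1, 1950),
        ("B", 1, 1950),
        ("C", 2, 1950),  # not a movie, so it is excluded
        ("D", 1, 1952),
        ("E", 1, None),  # unknown year, so it is excluded
    ],
)

for year, count in movies_per_year(conn, 1950, 1952):
    print(year, count)
```

Note that `range(1900, 1980)` in the script stops at 1979, whereas this sketch treats both bounds as inclusive.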