Getting Started

Scraping video pages

Most use cases need only the auto_scrape function:

>>> from vidscraper import auto_scrape
>>> video = auto_scrape("http://www.youtube.com/watch?v=J_DV9b0x7v4")
>>> video.title
u'CaramellDansen (Full Version + Lyrics)'

That’s it! auto_scrape determines the right scraping suite for the URL you pass in and uses that suite to return a ScrapedVideo instance representing the data associated with the video at that URL. If no suite supports the URL, CantIdentifyUrl is raised.
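The dispatch described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions, not vidscraper's actual implementation: the registry, the URL patterns, and the helpers pick_suite and safe_pick_suite are hypothetical, and CantIdentifyUrl is redefined locally so the example runs without the library.

```python
import re


class CantIdentifyUrl(Exception):
    """Local stand-in for the exception named in the text."""


# Hypothetical registry: suite name -> pattern for the URLs it claims to handle.
SUITE_PATTERNS = {
    "youtube": re.compile(r"^https?://(www\.)?youtube\.com/watch\?"),
    "blip": re.compile(r"^https?://(www\.)?blip\.tv/"),
}


def pick_suite(url):
    """Return the name of the first suite whose pattern matches the URL."""
    for name, pattern in SUITE_PATTERNS.items():
        if pattern.match(url):
            return name
    raise CantIdentifyUrl("no suite supports %r" % url)


def safe_pick_suite(url):
    """Return a suite name, or None when no suite recognizes the URL."""
    try:
        return pick_suite(url)
    except CantIdentifyUrl:
        return None
```

Wrapping the call in try/except, as safe_pick_suite does, is the usual way to handle URLs of unknown origin without aborting a batch job.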

If you only need certain fields (say, “file_url” and “title”), you can pass them in as the second argument, fields:

>>> video = auto_scrape(url, fields=['file_url', 'title'])

Video fields

If a ScrapedVideo is initialized without any fields, then vidscraper will assume you want all of the fields for the video. When the ScrapedVideo is loaded, vidscraper fills as many of the requested fields as it can; occasionally, this takes more than one HTTP request. Limiting the fields to those you actually use can therefore save quite a bit of work.
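Why fewer fields can mean fewer requests is easiest to see in a toy model. The sketch below assumes each data source costs one HTTP request and can fill a fixed set of fields; the source names, the field sets, and the sources_needed helper are all hypothetical and are not taken from vidscraper.

```python
# Hypothetical data sources and the fields each can fill (one request per source).
SOURCES = [
    ("oembed", {"title", "thumbnail_url", "embed_code"}),
    ("api", {"title", "description", "publish_date"}),
    ("scrape", {"file_url", "description"}),
]


def sources_needed(fields):
    """Greedily pick sources until the requested fields are covered or none help."""
    remaining = set(fields)
    chosen = []
    while remaining:
        # Take the source that fills the most still-missing fields.
        name, provided = max(SOURCES, key=lambda s: len(s[1] & remaining))
        if not provided & remaining:
            break  # no source can fill what is left
        chosen.append(name)
        remaining -= provided
    return chosen


all_fields = set().union(*(provided for _, provided in SOURCES))
```

In this model, asking for only the title needs a single source, while asking for every field needs all three.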

Getting videos for a feed

If you want to get every video for a feed, you can use vidscraper.auto_feed():

>>> from vidscraper import auto_feed
>>> results = auto_feed("http://blip.tv/djangocon/rss")

This will read the feed at the given URL and return a generator which yields a ScrapedVideo instance for each entry in the feed. The instances will be preloaded with metadata from the feed. In many cases this will fill out all the fields that you need. If you need more, however, you can tell the video to load more data manually:

>>> video = next(results)
>>> video.load()

(Don’t worry - if vidscraper can’t figure out a way to get more data, it will simply do nothing!)

Note

Because this function returns a generator, the feed will actually be fetched the first time the generator’s next() method is called.
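The deferred fetch is ordinary Python generator behavior and can be seen without any network access. The following sketch uses a hypothetical stand-in generator, fake_feed, which records when its "fetch" happens; it does not call vidscraper.

```python
fetch_log = []


def fake_feed(url):
    """Stand-in for a feed generator; records the moment the 'fetch' happens."""
    fetch_log.append(url)  # runs only once iteration starts
    for title in ("first", "second"):
        yield title


results = fake_feed("http://example.com/rss")
fetched_before = list(fetch_log)  # still empty: creating the generator ran nothing
first = next(results)             # advancing the generator triggers the fetch
fetched_after = list(fetch_log)
```

A practical consequence is that network errors surface where the generator is first consumed, not where it is created.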

Crawling an entire feed

auto_feed() also supports feed crawling for some suites. You use it like this:

>>> from vidscraper import auto_feed
>>> results = auto_feed("http://blip.tv/djangocon/rss", crawl=True)

Now, when the generator runs out of results on the first page, it will automatically fetch the next page, and then the next, and so on. This is not for the faint of heart. Depending on the feed you’re crawling, you could be there for a while.
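Because the result is a generator, one way to keep a crawl bounded is to stop consuming it after a fixed number of items with itertools.islice. The sketch below uses a hypothetical endless generator, fake_crawl, in place of a real crawling feed.

```python
from itertools import islice


def fake_crawl():
    """Stand-in for a crawling feed: yields (page, index) pairs without end."""
    page = 1
    while True:
        for index in range(3):  # pretend each page holds three entries
            yield (page, index)
        page += 1


# Take only the first five entries; later pages are never requested.
first_five = list(islice(fake_crawl(), 5))
```

Since a generator does work only when advanced, pages beyond the cap are never fetched.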

Searching video services

It’s also easy to run a search on a variety of services that support it. Simply do the following:

>>> from vidscraper import auto_search
>>> results = auto_search(['parrot'], exclude_terms=['dead']).values()

The search will be run on all suites that support searching. auto_search returns a dictionary mapping each suite used to the results for that suite; calling .values(), as above, keeps only the result sets and discards the suite keys.
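If you want to know which suite produced each result, iterate over the dictionary's items rather than its values. The sketch below uses a hypothetical stand-in mapping with invented suite names and titles; real keys would be suites and real values would be iterables of videos.

```python
# Hypothetical stand-in for the mapping returned by a search.
results_by_suite = {
    "suite_a": ["Parrot sketch", "Parrot care"],
    "suite_b": ["Talking parrot"],
}

# Pair every result with the suite it came from.
tagged = [
    (suite, title)
    for suite, videos in sorted(results_by_suite.items())
    for title in videos
]
```

The comprehension consumes each result iterable exactly once, so it also works when the values are generators rather than lists.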
