scrape wikipedia

built a quick recursive wikipedia scraper with a depth limit in ruby because wget was an epic fail. it uses nokogiri, which is excellent for picking out elements with CSS selectors.

this script recursively collects every subcategory under a single starting category. as an example, i started off here: http://en.wikipedia.org/wiki/Category:Algorithms. it then writes a text file containing the full list of categories and subcategories. i set the initial depth limit to 3; otherwise it goes really deep and starts pulling in things that are no longer relevant to the original category.
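here's a rough sketch of the depth-limited recursion described above. it's not the actual script: `crawl_categories`, `fetch_subcats` and the `fake_tree` data are made-up names so the example runs offline. in the real thing, `fetch_subcats` would download the category page and pull the subcategory links out with nokogiri.

```ruby
# walk a category tree recursively, stopping at max_depth.
# fetch_subcats is a hypothetical callable: category name in,
# array of subcategory names out.
# seen maps each category to the shallowest depth it was found at,
# which also protects against cycles between categories.
def crawl_categories(root, max_depth, fetch_subcats, seen = {}, depth = 0)
  return seen if depth > max_depth
  return seen if seen.key?(root) && seen[root] <= depth

  seen[root] = depth
  fetch_subcats.call(root).each do |sub|
    crawl_categories(sub, max_depth, fetch_subcats, seen, depth + 1)
  end
  seen
end

# stand-in data (assumption, not real wikipedia structure)
fake_tree = {
  'Category:Algorithms'         => ['Category:Sorting algorithms', 'Category:Graph algorithms'],
  'Category:Sorting algorithms' => ['Category:Comparison sorts'],
  'Category:Graph algorithms'   => ['Category:Algorithms'], # cycle back to the root
  'Category:Comparison sorts'   => ['Category:Too deep']
}
fetch = ->(name) { fake_tree.fetch(name, []) }

found = crawl_categories('Category:Algorithms', 2, fetch)
puts found.keys
```

with a limit of 2 the walk stops before 'Category:Too deep', and the cycle back to the root is ignored. raising the limit trades relevance for coverage, which is the same tradeoff as the limit of 3 mentioned above.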

i'll add more code later to scrape the pages of all the categories.

edit: the code was updated. uncomment the lines at the bottom to make it work.

edit: need to scrape stackoverflow? there's a very simple scraper here with a delay timer. i got blocked from stackoverflow the first time i tried this.
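the delay idea can be sketched roughly like this. it's an illustration, not the linked scraper: `each_with_delay` and its `sleeper` parameter are hypothetical names, and the 2 second pause is an arbitrary guess rather than a number stackoverflow publishes.

```ruby
# call the block once per url, pausing `delay` seconds between requests.
# sleeper is injectable (hypothetical parameter) so the demo below
# can run without actually waiting.
def each_with_delay(urls, delay, sleeper = ->(secs) { sleep(secs) })
  urls.each_with_index.map do |url, i|
    sleeper.call(delay) if i > 0 # no pause before the first request
    yield url
  end
end

# real use would look something like this (not run here):
#   require 'open-uri'
#   pages = each_with_delay(urls, 2) { |u| URI.open(u).read }

# offline demo: placeholder "urls" and a fake sleeper that records pauses
naps = []
urls = ['page1', 'page2', 'page3']
results = each_with_delay(urls, 2, ->(secs) { naps << secs }) { |u| u.upcase }
```

three urls means two pauses, one between each pair of requests.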