posted
My programming skills are limited, to say the least. But I was wondering what would be the simplest and most effective way for a knowledgeable person to write a program that pulled dates from a fixed number of websites.
Specifically, I find myself constantly checking the sites of musicians, comedians, and sports teams to find out when they'll be performing in my area. They never say when they'll be adding dates, and I follow about two dozen sites like this. How would I develop an app that scans these sites for me every day, then notifies me in some way when tour dates are added? If I could get that far, presumably I could just use a lookup table to eliminate the dates that aren't near me.
Anybody done or found something like this? Think it's a good or bad idea? I can't do it, but I have some bored CS friends who might be willing to undertake this for me.
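For what it's worth, the lookup-table idea could be as small as a set of city names. This is only a rough sketch: the city names are placeholders, and it assumes each scraped date already comes paired with a city string, which the sites may or may not make easy.

```python
# Hypothetical lookup table of cities considered "near me" (placeholder names).
NEARBY_CITIES = {"boston", "providence", "worcester"}


def is_nearby(city):
    """Return True if the city appears in the lookup table (case-insensitive)."""
    return city.strip().lower() in NEARBY_CITIES


def filter_nearby(events):
    """Keep only the (date, city) pairs whose city is in the lookup table."""
    return [(date, city) for date, city in events if is_nearby(city)]
```

Anything not in the table is simply dropped, so a misspelled or unusually formatted city name would be missed rather than reported.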
Posts: 5462 | Registered: Apr 2005
| IP: Logged |
posted
Might be a bit of nasty coding, since the websites probably don't have a specific, standardized format they are following. I think this means you'd have to create a parser for each page.
That alone would probably ruin it for me, personally.
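To make the "parser for each page" point concrete, here is a minimal sketch of what a per-site table might look like in Python. The site names and date formats are made up for illustration, and it assumes the default English month names for `%B`.

```python
import re
from datetime import datetime

# Hypothetical sites: each gets its own regex plus a matching strptime format.
SITE_PARSERS = {
    "band-site": (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
    "comedian-site": (re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"), "%B %d, %Y"),
}


def parse_dates(site, text):
    """Pull dates out of page text using the parser registered for this site."""
    pattern, fmt = SITE_PARSERS[site]
    dates = []
    for raw in pattern.findall(text):
        try:
            dates.append(datetime.strptime(raw, fmt).date())
        except ValueError:
            # The regex matched something that is not a real date; skip it.
            pass
    return dates
```

Each new site costs one more entry in the table, which is the per-page work described above, though for regular pages that entry can be quite short.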
Posts: 1069 | Registered: Feb 2005
| IP: Logged |
posted
Yeah, I figured you'd have to actually scan everything on the page, and extract the dates from all the junk.
Posts: 5462 | Registered: Apr 2005
| IP: Logged |
posted
Assuming they post to some sort of "calendar" page, this would be pretty easy given the websites are predetermined.
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
Particularly if they also provided an iCalendar file (the format predates the Apple app of that name, which uses it in part, as does nearly every other calendaring application). You could just give the file's URL to your calendaring app and the dates would show up automatically.
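If a site did publish such a file and you wanted the dates in a script rather than a calendar app, a rough sketch might look like the following. It only reads the date part of `DTSTART` lines and ignores time zones, folded lines, and recurrence rules, so treat it as a simplification rather than a real iCalendar parser.

```python
import re
from datetime import datetime

# Matches lines such as "DTSTART:20250314T200000Z" or "DTSTART;VALUE=DATE:20250314"
# and captures only the YYYYMMDD part.
DTSTART = re.compile(r"^DTSTART[^:\r\n]*:(\d{8})", re.MULTILINE)


def event_dates(ics_text):
    """Return the start date of every event found in the iCalendar text."""
    return [datetime.strptime(raw, "%Y%m%d").date() for raw in DTSTART.findall(ics_text)]
```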
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
posted
What you want to do is often called "scraping," and a lot of programs did exactly that back in the days before RSS caught on.
Posts: 37449 | Registered: May 1999
| IP: Logged |
posted
It doesn't really work like that. Give me an example site or two (link to the pages with the dates of interest) and I'll tell you what would be involved.
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
Here are three sample sites. The specific dates don't matter. If I can pick out any date, I should have a good starting point.
Posts: 5462 | Registered: Apr 2005
| IP: Logged |
posted
Okay, those are all fairly regular. I could pretty easily write some code to pick out the dates and dump all the new ones in a file or some such. I don't know when I'll have time, but it would take under an hour, so I might do it this weekend.
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
Thanks fugu. That's way more help than necessary, but if you get a chance that'd be great. I'll owe you some custom circuits or something.
Posts: 5462 | Registered: Apr 2005
| IP: Logged |
Here's how I'd do it (in Python, a very nice language for basic scripts like this): fetch the entire page for site X. Use a site-specific regular expression (based on the date format for the site; it should be very simple to make one for each of those three) to grab all the dates. Check whether those dates are already in a sequence that has been shelved with the site name as the key (shelve is a Python library for transparently storing Python objects in a key-value database). If you find some that are not, do something like email them to yourself. Add the new ones to the list, then re-shelve it.
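The steps above could be sketched roughly like this. The site name, URL, and date pattern are placeholders rather than any of the real sample sites, and the notification is a plain `print` standing in for the email step.

```python
import re
import shelve
import urllib.request

# Hypothetical site table: the name, URL, and date format are assumptions.
SITES = {
    "example-band": {
        "url": "https://example.com/tour",
        "pattern": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    },
}


def fetch_page(url):
    """Download the whole page and return it as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_dates(html, pattern):
    """Return each date string the regex finds, in page order, without repeats."""
    return list(dict.fromkeys(pattern.findall(html)))


def find_new_dates(site_name, dates, db_path="seen_dates"):
    """Compare against the shelved list for this site, store additions, return them."""
    with shelve.open(db_path) as db:
        seen = db.get(site_name, [])
        new = [d for d in dates if d not in seen]
        if new:
            db[site_name] = seen + new  # re-shelve the extended list
    return new


def check_all_sites():
    """Run the fetch / extract / compare cycle once for every configured site."""
    for name, site in SITES.items():
        html = fetch_page(site["url"])
        new = find_new_dates(name, extract_dates(html, site["pattern"]))
        if new:
            print(f"{name}: new dates {new}")  # stand-in for sending an email


if __name__ == "__main__":
    check_all_sites()
```

Running it once a day from a scheduler (cron or similar) would cover the "scan every day" part. The first run reports every date on the page as new, since nothing has been shelved yet.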
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
posted
I highly suggest trying one or two of the Python tutorials for programmers that you can find all over the place, or even one of the ones for beginners (the one on the Python site proper is pretty good).
And of course, feel free to ask me questions.
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
posted
It's worth noting that if there's a programming language with which you're familiar, you can use it INSTEAD of Python. The core principle is the same.
Posts: 1777 | Registered: Jan 2003
| IP: Logged |
posted
Of course, but Python in particular has some built-in libraries that make this much, much easier, notably the transparent key-value persistence database. To get that in Perl you'd need to browse CPAN, I don't even know of such a library in PHP, and I think it's unlikely he's a Rubyphile. Any other language he's likely to know would be far more painful than learning enough Python and doing it that way.
Not having that will add significant LOC and potential errors.
Plus, little scripts like this are the perfect time to pick up a new language, something I try to do with great regularity.
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
posted
(Oh, and there are nice HTML tidy libraries that mean you can even do it with non-standards-compliant pages. It adds processor overhead, but that doesn't matter in toy apps like this.)
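As a rough illustration of coping with sloppy markup, here is a sketch that uses the standard library's `html.parser` in place of a tidy library (a swapped-in technique, not the tidy approach itself). It does not repair the page; it just pulls out the text between tags so a date regex has less junk to wade through.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text found between tags, tolerating unclosed or stray tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def page_text(html):
    """Return the page's text content as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Note that this also keeps the contents of script and style blocks, which is usually harmless when all you are looking for is dates.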
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
posted
The only languages I have any familiarity with are C++, VB, Java, and machine-code-level assembly.
Posts: 5462 | Registered: Apr 2005
| IP: Logged |
posted
Python looks pretty good, too. I'll probably never get it to do what I want, but it's making sense so far.
Posts: 5462 | Registered: Apr 2005
| IP: Logged |