Hatrack River Forum   

Topic: Hatrack coders
El JT de Spang
Member
Member # 7742

My programming skills are limited, to say the least. But I was wondering what would be the simplest and most effective way for a knowledgeable person to write a program that pulled dates from a fixed number of websites.

Specifically, I find myself constantly checking the sites of musicians, comedians, and sports teams to find out when they'll be performing in my area. They never say when they'll be adding dates, and I follow about two dozen sites like this. How would I develop an app that scans these sites for me every day and then notifies me somehow when tour dates are added? If I could get that far, presumably I could just use a lookup table to eliminate the dates that aren't near me.

Anybody done or found something like this? Think it's a good or bad idea? I can't do it, but I have some bored CS friends who might be willing to undertake this for me.
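The lookup-table filter mentioned above could be as simple as a set of nearby cities checked against each scraped listing. A minimal sketch, assuming each listing has already been reduced to a date and a city; the `Event` type and the city names in `NEARBY_CITIES` are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup table: cities close enough to be worth a trip.
NEARBY_CITIES = {"Austin", "San Antonio", "Houston"}


@dataclass(frozen=True)
class Event:
    """One scraped tour date (hypothetical structure)."""
    when: date
    city: str


def nearby_events(events, nearby=NEARBY_CITIES):
    """Keep only the events whose city appears in the lookup table."""
    # Compare case-insensitively so "houston" still matches "Houston".
    wanted = {city.casefold() for city in nearby}
    return [event for event in events if event.city.casefold() in wanted]
```

Matching on city names keeps the sketch simple; a real version might match on state or distance instead.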

Posts: 5462 | Registered: Apr 2005
Swampjedi
Member
Member # 7374

Might be a bit of nasty coding, since the websites probably don't have a specific, standardized format they are following. I think this means you'd have to create a parser for each page.

That alone would probably ruin it for me, personally.

Posts: 1069 | Registered: Feb 2005
El JT de Spang
Member
Member # 7742

Yeah, I figured you'd have to actually scan everything on the page, and extract the dates from all the junk.
Posts: 5462 | Registered: Apr 2005
fugu13
Member
Member # 2859

Assuming they post to some sort of "calendar" page, this would be pretty easy given the websites are predetermined.
Posts: 15770 | Registered: Dec 2001
El JT de Spang
Member
Member # 7742

The websites would be the same, although being able to update the list of sites would be a nice thing to add.

How would you do this?

Posts: 5462 | Registered: Apr 2005
Bokonon
Member
Member # 480

The solution: write to the webmasters of the bands' sites and ask them to add an RSS feed for the tour date postings.

Anything else is going to get messy.

-Bok
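If a site did publish an RSS 2.0 feed, reading it would need nothing beyond Python's standard library. A minimal sketch, assuming a plain RSS 2.0 feed whose `<item>` elements each carry a `<title>` and a `<pubDate>`; none of the sites in this thread is known to offer such a feed, and `feed_items` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_items(rss_text):
    """Return (title, published datetime or None) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    # RSS 2.0 nests entries as <rss><channel><item>...</item></channel></rss>.
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        pub_date = item.findtext("pubDate")
        # pubDate uses the RFC 822 date format, which email.utils already parses.
        published = parsedate_to_datetime(pub_date.strip()) if pub_date else None
        items.append((title, published))
    return items
```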

Posts: 7021 | Registered: Nov 1999
fugu13
Member
Member # 2859

Yes, that would be nice :-)

Particularly if they also provided an iCalendar file (the format predates Apple's iCal app, though iCal uses it in part, just like every other calendaring application) whose URL you could give to your calendaring app, so that the dates would show up automatically.
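iCalendar (RFC 5545) files are plain text, so a script could read event start dates straight out of the `DTSTART` properties. A rough sketch, assuming dates in the basic `YYYYMMDD` form; it ignores line folding and time zones, which a proper iCalendar library would handle, and `event_dates` is a hypothetical name.

```python
import re
from datetime import date

# Each event sits between BEGIN:VEVENT and END:VEVENT; other components
# (such as VTIMEZONE) can also contain DTSTART lines, so they are skipped.
VEVENT = re.compile(r"BEGIN:VEVENT(.*?)END:VEVENT", re.DOTALL)

# Matches "DTSTART:20080501", "DTSTART;VALUE=DATE:20080501" or
# "DTSTART;TZID=...:20080503T200000" and captures only the date part.
# Assumption: no folded lines and no colons inside parameter values.
DTSTART = re.compile(r"^DTSTART(?:;[^:\r\n]*)?:(\d{4})(\d{2})(\d{2})", re.MULTILINE)


def event_dates(ics_text):
    """Return the start date of each VEVENT, in file order."""
    dates = []
    for body in VEVENT.findall(ics_text):
        match = DTSTART.search(body)
        if match:
            year, month, day = (int(part) for part in match.groups())
            dates.append(date(year, month, day))
    return dates
```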

Posts: 15770 | Registered: Dec 2001
TomDavidson
Member
Member # 124

What you want to do is often called "scraping," and a lot of programs did exactly that back in the days before RSS caught on.
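The first step of any scraper is simply downloading the page. A minimal sketch with Python's standard `urllib.request`; the User-Agent string and the name `fetch_page` are made-up placeholders, and a real script would add error handling and avoid hammering the sites (once a day, as in the original question, is plenty).

```python
import urllib.request


def fetch_page(url, timeout=30):
    """Download a page and return its body as text."""
    # Hypothetical User-Agent identifying the script to the site owner.
    request = urllib.request.Request(
        url, headers={"User-Agent": "tour-date-checker/0.1 (personal use)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```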
Posts: 37449 | Registered: May 1999
El JT de Spang
Member
Member # 7742

Where might I find one of these scrapers?

Are they open sourced somewhere where I can modify one to do what I'm after?

Posts: 5462 | Registered: Apr 2005
fugu13
Member
Member # 2859

It doesn't really work like that. Give me an example site or two (link to the pages with the dates of interest) and I'll tell you what would be involved.
Posts: 15770 | Registered: Dec 2001
El JT de Spang
Member
Member # 7742

Brian Regan
Marc Broussard
Amos Lee

Here are three sample sites. The specific dates don't matter. If I can pick out any date, I should have a good starting point.

Posts: 5462 | Registered: Apr 2005
Minerva
Member
Member # 2991

Coding it is going to be way more trouble than it's worth.
Posts: 289 | Registered: Jan 2002
El JT de Spang
Member
Member # 7742

That's almost always the case with my "Hey, I bet I could write a program to do this!" ideas.

Still, I have spare time and need the practice.

Posts: 5462 | Registered: Apr 2005
fugu13
Member
Member # 2859

Okay, those are all fairly regular. I could write some code to pick out the dates and dump all the new ones into a file or some such pretty easily. I don't know when I'll have time, but it would take under an hour, so I might do it this weekend.
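Because each page formats its dates consistently, one regular expression per site is usually enough, paired with a `strptime` format that turns each match into a real date. A sketch, assuming two hypothetical formats (`04/14/08` and `Apr 14, 2008`); the actual patterns would have to be written by looking at each page's HTML, and the site keys are placeholders.

```python
import re
from datetime import datetime

# Hypothetical per-site patterns: (regex for the date text, strptime format).
SITE_PATTERNS = {
    "site_a": (re.compile(r"\b\d{2}/\d{2}/\d{2}\b"), "%m/%d/%y"),
    "site_b": (re.compile(r"\b[A-Z][a-z]{2} \d{1,2}, \d{4}\b"), "%b %d, %Y"),
}


def extract_dates(site, page_text):
    """Return the sorted, de-duplicated dates found on one site's page."""
    pattern, fmt = SITE_PATTERNS[site]
    found = set()
    for text in pattern.findall(page_text):
        try:
            found.add(datetime.strptime(text, fmt).date())
        except ValueError:
            # The regex matched something that is not a real date; skip it.
            pass
    return sorted(found)
```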
Posts: 15770 | Registered: Dec 2001
fugu13
Member
Member # 2859

Note that Marc Broussard will email you when his tour schedule is updated, btw [Wink]
Posts: 15770 | Registered: Dec 2001
El JT de Spang
Member
Member # 7742

Yeah, but what's the fun in that?

Thanks fugu. That's way more help than necessary, but if you get a chance that'd be great. I'll owe you some custom circuits or something.

Posts: 5462 | Registered: Apr 2005
fugu13
Member
Member # 2859

Or if you want to [Smile]

Here's how I'd do it (in Python, a very nice language for basic scripts like this): fetch the entire page for site X. Use a site-specific regular expression (based on the site's date format; it should be very simple to make one for each of those three) to grab all the dates. Check whether those dates are already in a sequence that has been shelved with the site name as a key (shelve is a Python library for transparently storing Python objects in a key-value database). If you find some that are not, do something like email them to yourself. Then add the new ones to the list and re-shelve it.
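The shelve step described above might look like this. A minimal sketch, assuming the dates have already been pulled out of the page as strings (any hashable, picklable values would do); `record_new_dates` and the database path are hypothetical names, and sending the email is left to the caller.

```python
import shelve


def record_new_dates(db_path, site, found_dates):
    """Return the dates not seen before for `site`, and remember them."""
    with shelve.open(db_path) as db:
        # The list shelved under the site name; empty on the first run.
        known = db.get(site, [])
        seen = set(known)
        new = []
        for d in found_dates:
            if d not in seen:
                seen.add(d)
                new.append(d)
        if new:
            # Assign a new list: shelve writes only on assignment, so
            # appending to `known` in place would not be saved.
            db[site] = known + new
    return new
```

Whatever `record_new_dates` returns is what would go into the notification email; an empty list means nothing changed since the last run.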

Posts: 15770 | Registered: Dec 2001
El JT de Spang
Member
Member # 7742

I'll check out Python. I've never used it, although I hear it's not too hard to pick up.
Posts: 5462 | Registered: Apr 2005
fugu13
Member
Member # 2859

I highly suggest trying one or two of the Python tutorials for programmers you can find all over the place, or even one of the ones for beginners (the one on the Python site proper is pretty good).

And of course, feel free to make queries of myself.

Posts: 15770 | Registered: Dec 2001
Christy
Member
Member # 4397

It's worth noting that if there's a programming language with which you're familiar, you can use it INSTEAD of Python. [Smile] The core principle is the same.
Posts: 1777 | Registered: Jan 2003
fugu13
Member
Member # 2859

Of course, but Python in particular has some libraries built in that make this much, much easier, notably the transparent key-value persistence database. To get that in Perl you'd need to browse CPAN, I don't even know of such a library in PHP, and I think it's unlikely he's a Rubyphile [Wink]. Any other language he's likely to know would be far more painful than learning enough Python and doing it that way.

Not having that will add significant lines of code and potential errors.

Plus, little scripts like this are the perfect time to pick up a new language, something I try to do with great regularity [Smile].

Posts: 15770 | Registered: Dec 2001
Dagonee
Member
Member # 5818

It's also worth noting that this is generally far easier to do with standards-compliant pages.

If these are all XHTML compliant, you could probably do it all in XSLT.
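XSLT is a language of its own, but the idea behind the suggestion (a well-formed XHTML page can be parsed as XML and queried by structure instead of by regular expression) can be sketched in Python with `xml.etree.ElementTree` and its limited XPath subset standing in for XSLT. The `class="date"` markup is an assumption about how a page might label its dates, and pages that use named entities such as `&nbsp;` will not parse this way.

```python
import xml.etree.ElementTree as ET


def dates_from_xhtml(page_text):
    """Return the text of every element whose class attribute is exactly "date"."""
    # Raises ET.ParseError if the page is not well-formed XML.
    root = ET.fromstring(page_text)
    # ".//*" matches descendants in any namespace, so the XHTML
    # namespace declaration needs no special handling here.
    matches = root.iterfind(".//*[@class='date']")
    return [el.text.strip() for el in matches if el.text and el.text.strip()]
```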

Posts: 26071 | Registered: Oct 2003
fugu13
Member
Member # 2859

I wouldn't recommend XSLT unless someone knew it already or had a future need for it [Wink].

XSLT is painfully verbose, though I do like a lot of its capabilities.

Posts: 15770 | Registered: Dec 2001
fugu13
Member
Member # 2859

(Oh, and there are nice HTML Tidy libraries that let you do this even with pages that aren't standards-compliant. It adds processor overhead, but that doesn't matter in toy apps like this.)
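Python's standard library also includes a forgiving parser, `html.parser.HTMLParser`, that reads sloppy markup without complaint. It is not an HTML Tidy binding (Tidy rewrites a page into clean markup, while this simply tolerates the mess as it reads), but for a toy script it can fill the same role. A sketch that collects the text of elements marked with a hypothetical `class="date"`, even when their closing tags are missing; `DateCollector` and `dates_from_html` are made-up names.

```python
from html.parser import HTMLParser


class DateCollector(HTMLParser):
    """Collect the text of elements whose class attribute is exactly "date"."""

    def __init__(self):
        super().__init__()
        self.dates = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        # Any new tag ends the previous capture, so a missing </td> is harmless.
        # (The flip side: markup nested inside a date element cuts it short.)
        self._capturing = dict(attrs).get("class") == "date"
        if self._capturing:
            self.dates.append("")

    def handle_endtag(self, tag):
        self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.dates[-1] += data


def dates_from_html(page_text):
    """Return the stripped, non-empty date strings found in a page."""
    parser = DateCollector()
    parser.feed(page_text)
    parser.close()
    return [text.strip() for text in parser.dates if text.strip()]
```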
Posts: 15770 | Registered: Dec 2001
El JT de Spang
Member
Member # 7742

The only languages I have any familiarity with are C++, VB, Java, and machine-level assembly.
Posts: 5462 | Registered: Apr 2005
El JT de Spang
Member
Member # 7742

Python looks pretty good, too. I'll probably never get it to do what I want, but it's making sense so far.
Posts: 5462 | Registered: Apr 2005
Dagonee
Member
Member # 5818

Yeah, but if XSLT can do it, I can set up something to do it in SQL almost automatically.

Dag likes SQL. [Big Grin]
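The "have I seen this date before?" bookkeeping does map neatly onto SQL: a uniqueness constraint makes the database do the comparison. A minimal sketch using Python's built-in `sqlite3` module as an alternative to the shelve approach, not something posted in this thread; the table, column, and function names are hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the hypothetical tour-date database."""
    conn = sqlite3.connect(path)
    # One row per (site, date); the primary key rejects duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tour_dates ("
        " site TEXT NOT NULL,"
        " event_date TEXT NOT NULL,"
        " PRIMARY KEY (site, event_date))"
    )
    return conn


def insert_new_dates(conn, site, found_dates):
    """Insert unseen dates for `site` and return the ones that were new."""
    new = []
    with conn:  # commits on success, rolls back if an exception escapes
        for d in found_dates:
            before = conn.total_changes
            conn.execute(
                "INSERT OR IGNORE INTO tour_dates (site, event_date) VALUES (?, ?)",
                (site, d),
            )
            # total_changes only grows if the row was actually inserted.
            if conn.total_changes > before:
                new.append(d)
    return new
```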

Posts: 26071 | Registered: Oct 2003
   


Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.

