Hatrack River Forum   

Author Topic: As long as we're asking web questions...
saxon75
Member
Member # 4589

 - posted
*** Warning -- Techno-babble to follow ***

I tried asking this at a PHP-related forum, but I didn't really like the answers I was getting, and I know there are at least a few Hatrack web gurus.

Anyway, some of you may be familiar with the front page at sakeriver. Currently it just has site-related news, so it doesn't change much. I have the site news content stored in a MySQL table, and then I use PHP to dynamically generate the page. I do the same thing for all of the content pages, and I have admin pages that I made for composing and storing the content into the various database tables.

What I'd like to do is sort of like what John does with GreNME: include a snippet of all new content on the front page, instead of just site news. So, whenever I add a review or in the unlikely event that I write a new editorial, it'll put a couple of sentences on the front page, in more or less the same format in which site news is currently displayed.

The only way I can think of to do this is to have the composition/storage scripts store snippets to a new table and have the front page script pull info from that table. The thing is, duplicating data like that doesn't seem very efficient or very robust. Plus, I don't want my front page to have a million entries on it. I know I can just alter my query to pull only the twenty or so most recent entries from this hypothetical new table, but the table will eventually have a lot of entries in it, and since I will only be looking at the most recent ones, most of the entries will be total junk that is never looked at again.

So is there a better way to do this? Or am I stuck with data duplication and manually trimming the table (or trimming it with a cron job)? It seems like there must be a more elegant way of doing it.

Posts: 4534 | Registered: Jan 2003
Bokonon
Member
Member # 480

 - posted
First, store reviews/editorials in the existing DB; essentially, they are now the same as news updates. Second, I would either add a column to the existing table that describes the type of front-page entry, or create a separate table keyed off the unique id of each item in the current table, with the new table having a field that holds the item's type: "Reviews", "News", "Rants", whatever. When you grab your front-page info, you cross-reference to see what type the item is (or, in the first case, you merely read the new column), and then...

Third, you take the type info, and if it is a category you want just a snippet of, you can post only the first 256 characters, or, snazzier, all text up to the first or second paragraph break/line-return.
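That truncation rule can be sketched roughly like this (Python purely for illustration, since saxon75's site is PHP; the function name and defaults are invented):

```python
# Sketch of the snippet rule above: cut at the first paragraph break if
# there is one, otherwise fall back to a fixed character budget.
def make_snippet(text, max_chars=256, stop_at_paragraph=True):
    """Return a front-page snippet of `text`."""
    if stop_at_paragraph:
        # Cut at the first blank-line paragraph break, if it comes early enough.
        break_pos = text.find("\n\n")
        if 0 < break_pos < max_chars:
            return text[:break_pos]
    if len(text) <= max_chars:
        return text
    # Avoid chopping mid-word: back up to the last space inside the budget.
    cut = text.rfind(" ", 0, max_chars)
    return text[:cut if cut > 0 else max_chars] + "..."
```

The same logic ports directly to PHP's `strpos`/`substr`.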

-Bok

Posts: 7021 | Registered: Nov 1999
Dagonee
Member
Member # 5818

 - posted
Make an optional field in the site news table that points to the editorial or full article (either a DB address if they're in the DB, otherwise a relative URL).

If this link is filled in, then the "read more" link is added at the end of the news snippet by the PHP code.

In essence, you've created a new class of news called story intro.

You could then add an upload routine that automatically creates the news entry for you, pulling the first x characters or the first four sentences or something like that. The two tables let you tailor your intro, so you're not stuck using the exact same text as the intro.

As an alternative, you could add a field to the table with the stories that tells how many characters to pull out of the content for the intro. No duplication of data, fairly flexible, a little painful (but not impossible) to do the counting.
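A rough sketch of this scheme, with sqlite3 standing in for MySQL/PHP (all table and column names here are invented for illustration, not saxon75's actual schema):

```python
# "Story intro" idea: the news table carries an optional pointer to the
# full item, and the front page adds a "read more" link only when it's set.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE news (
    id       INTEGER PRIMARY KEY,
    posted   TEXT NOT NULL,
    snippet  TEXT NOT NULL,
    story_id INTEGER REFERENCES stories(id)  -- NULL for plain site news
);
""")
conn.execute("INSERT INTO stories VALUES (1, 'A Review', 'Full text...')")
conn.execute("INSERT INTO news VALUES (1, '2004-06-23', 'New review up.', 1)")
conn.execute("INSERT INTO news VALUES (2, '2004-06-22', 'Site news only.', NULL)")

# Front page: render a "read more" link only when story_id is filled in.
for snippet, story_id in conn.execute(
        "SELECT snippet, story_id FROM news ORDER BY posted DESC LIMIT 20"):
    line = snippet + (f" [read more -> story {story_id}]" if story_id else "")
    print(line)
```

Since the intro lives in its own row, it can be hand-tailored rather than auto-extracted.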

Dagonee

Posts: 26071 | Registered: Oct 2003
fugu13
Member
Member # 2859

 - posted
Add a text field to each table used for back pages (and a text field for entering/editing on your entry forms, if you have such), and a timestamp field if you don't already have one. Call the text field front_page_summary or whatever (or just summary, or what have you), and the timestamp (the kind with both date and time) time_created or some such.

Then do a SELECT on each table (using ORDER BY and LIMIT to restrict each to a limited number of rows), then merge the results programmatically by date. You can add little icons or colors as appropriate to the table selected from at this point. Oh, and links to the appropriate full items.
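This query-then-merge step might look something like the following sketch (sqlite3 standing in for MySQL/PHP; table and column names are invented):

```python
# Per-table SELECT with ORDER BY/LIMIT, then a programmatic merge by date.
import sqlite3
from itertools import chain

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (front_page_summary TEXT, time_created TEXT);
CREATE TABLE editorials (front_page_summary TEXT, time_created TEXT);
INSERT INTO reviews VALUES ('Review A', '2004-06-20 10:00'),
                           ('Review B', '2004-06-23 09:00');
INSERT INTO editorials VALUES ('Editorial A', '2004-06-22 18:30');
""")

def recent(table, limit=20):
    # LIMIT per table keeps each query cheap; tag rows with their source
    # table so the front page can pick an icon/color per category.
    sql = (f"SELECT front_page_summary, time_created FROM {table} "
           f"ORDER BY time_created DESC LIMIT {limit}")
    return [(summary, ts, table) for summary, ts in conn.execute(sql)]

# Merge by date, newest first, then trim to the front-page size.
front_page = sorted(chain(recent('reviews'), recent('editorials')),
                    key=lambda row: row[1], reverse=True)[:20]
```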

It's possible to get a result out of pure SQL as well, but the complexity of the query would be large, and would increase geometrically with each news table added to the list (because you'd have to add a comparison to each existing table's date for each added table). This is one reason I'm using an RDF query library for the next version of PWeb: it doesn't constrain me into coming up with table metaphors for data that isn't well suited to a purely relational model.

edit: that's the simplest solution that doesn't involve cross-updating. I also highly recommend having a bit more metadata (for instance: should the post be sticky on the front page? should it have some sort of "site-related announcement" icon? that sort of thing).

[ June 23, 2004, 06:53 PM: Message edited by: fugu13 ]

Posts: 15770 | Registered: Dec 2001
Dagonee
Member
Member # 5818

 - posted
Everything's well suited to the relational model, if you view the relations in the right way. [Razz] It requires a lot of metadata stored in tables and creative use of the query optimizer.

That was my major area of expertise back in my programming days. We did some fun, wild stuff, including a web site where pretty much everything was in two tables. It was set up so we could translate the site to Spanish on a moment's notice. We tested it by running the content through the Swedish Chef translator overnight with a bot. Created the new site in exactly 15 minutes once we had the content. No limit to the number of languages, either.

Dagonee

Posts: 26071 | Registered: Oct 2003
fugu13
Member
Member # 2859

 - posted
I'd actually recommend against using Dag's approach of a URL. Every time you switch your setup you have to either constrain yourself so those URLs are still appropriate (presumably relative), or change all the URLs. Now, a URI may be useful, if appropriately chosen -- something which will uniquely identify the linked story, at least locally, and will allow you to generate the URL. But directly storing the URL is asking for portability issues.
Posts: 15770 | Registered: Dec 2001
Hobbes
Member
Member # 433

 - posted
Are URIs really supported yet? I thought they were still too much in development to be of any real use besides just pretending that URLs are really basic, non-interesting URIs. [Dont Know]

Hobbes [Smile]

Posts: 10602 | Registered: Oct 1999
Dagonee
Member
Member # 5818

 - posted
I agree - I wasn't sure if his content was in the table or on the site or how much work he wanted to do.

Edit: Which is why my first recommendation was for a primary key into a DB table.

[ June 23, 2004, 06:54 PM: Message edited by: Dagonee ]

Posts: 26071 | Registered: Oct 2003
fugu13
Member
Member # 2859

 - posted
Lots of things suck in a relational model. For instance, storage and searching of extensive and evolving metadata. Luckily RDF storage models can be given a fast SQL backend through query optimization and storage of data in several semi-parsed manners.

Let's see how I would do this as an RDF query (expand as needed) . . .

SELECT ?item, ?postedon, ?summary WHERE
  ?item site:itemtype site:newsitem
  ?item dc:date ?postedon
  ?item site:newsummary ?summary
ORDER BY ?postedon DESCENDING LIMIT 15

You see, the RDF model is a superset of the relational model.
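A toy illustration of the triple model behind this query (plain Python lists, not a real RDF store; all of the data below is invented):

```python
# Each fact is a (subject, predicate, object) triple; the query above
# binds ?item via its itemtype, then joins the other patterns on it.
triples = [
    ("story1", "site:itemtype", "site:newsitem"),
    ("story1", "dc:date", "2004-06-23"),
    ("story1", "site:newsummary", "A new review is up."),
    ("story2", "site:itemtype", "site:newsitem"),
    ("story2", "dc:date", "2004-06-21"),
    ("story2", "site:newsummary", "Site redesign notes."),
]

def objects(subject, predicate):
    return [o for s, p, o in triples if s == subject and p == predicate]

# ?item site:itemtype site:newsitem  ->  bind ?item, then fetch the rest.
items = [s for s, p, o in triples
         if p == "site:itemtype" and o == "site:newsitem"]
results = sorted(
    ((item, objects(item, "dc:date")[0], objects(item, "site:newsummary")[0])
     for item in items),
    key=lambda r: r[1], reverse=True)[:15]
```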

Posts: 15770 | Registered: Dec 2001
Bokonon
Member
Member # 480

 - posted
So do my methods suck?

I personally don't like adding new columns to a production table. Too many possible side effects for this QA tester [Smile] New tables with new code would be better, IMO.

-Bok

Posts: 7021 | Registered: Nov 1999
fugu13
Member
Member # 2859

 - posted
A URI is just something unique-ish, particularly when we're talking locally. There's lots of theoretical work being done on developing effective URI sets, but it's easy enough to make up a URI scheme for local use, such as site:movies:45 being the 45th movie article created at the site. SQL databases support them fine; they're called strings [Wink] .
Posts: 15770 | Registered: Dec 2001
fugu13
Member
Member # 2859

 - posted
Adding columns should have zero side effects if you've written good code.

edit: of course, stuff added to take advantage of those columns may have side effects. However, this should be very simple on a standard add/update/view site. Database access should be mediated by a library of classes/functions that hides database interaction properly, so that all he has to do is update that library, then update all the points where the updated functions/methods are used (very easy to do in any modern IDE), and he's done; that should be nearly error-free.
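The mediating-library idea might be sketched like this (Python and sqlite3 rather than the site's PHP/MySQL; class and column names are invented):

```python
# All database access goes through one class, so adding a column later
# means touching only this class and its call sites.
import sqlite3

class NewsStore:
    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS news (id INTEGER PRIMARY KEY, "
            "snippet TEXT, posted TEXT)")

    def add(self, snippet, posted):
        self.conn.execute("INSERT INTO news (snippet, posted) VALUES (?, ?)",
                          (snippet, posted))

    def front_page(self, limit=20):
        # Columns referred to by name, never SELECT * -- so adding new
        # columns later cannot change what this query returns.
        return self.conn.execute(
            "SELECT snippet, posted FROM news ORDER BY posted DESC LIMIT ?",
            (limit,)).fetchall()

store = NewsStore(sqlite3.connect(":memory:"))
store.add("New review up.", "2004-06-23")
store.add("Older news.", "2004-06-20")
```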

[ June 23, 2004, 07:11 PM: Message edited by: fugu13 ]

Posts: 15770 | Registered: Dec 2001
Bokonon
Member
Member # 480

 - posted
fugu, I wouldn't say "good" code; I would say "extensible" code. And home websites are exactly the types of projects that aren't usually designed to be extensible.

Of course, I'm a big fan of conscious, occasional refactoring of code/software, but many people just want what works, so I was addressing the latter.

-Bok

Posts: 7021 | Registered: Nov 1999
Dagonee
Member
Member # 5818

 - posted
I don't get why your example doesn't work directly as SQL (translating the syntax, of course).

Dagonee

Posts: 26071 | Registered: Oct 2003
fugu13
Member
Member # 2859

 - posted
I'd say good. Columns should always be referred to in queries by name, and so long as that is being done, adding extra columns won't affect existing queries one whit (if all that is being done is adding columns for new code to use, for instance).

[ June 23, 2004, 07:15 PM: Message edited by: fugu13 ]

Posts: 15770 | Registered: Dec 2001
fugu13
Member
Member # 2859

 - posted
Multiple tables. Try doing the same thing when your news summaries are in twenty-five different tables, each reflecting a different category of item.
Posts: 15770 | Registered: Dec 2001
Dagonee
Member
Member # 5818

 - posted
Yes. I had a hard and fast rule against * in any query (except Count(*) in dialects that optimized it).

That and

If boolVar = true then

I hated that.

Posts: 26071 | Registered: Oct 2003
Dagonee
Member
Member # 5818

 - posted
If you put the news items in separate tables by category you deserve what you get. [Smile]
Posts: 26071 | Registered: Oct 2003
fugu13
Member
Member # 2859

 - posted
But then you've got duplicate data floating around, and rows in tables that are only related by your understanding (even considering foreign keys, since they don't make the relationship explicit) -- you're forcing your data into your model. Similarly if you stored all your different categories of stuff (say some are news stories, some are polls, et cetera) in one table so you could do an efficient query and only get the top so many. There are lots of ways to put such data into SQL that are adequate, but hardly optimal.
Posts: 15770 | Registered: Dec 2001
saxon75
Member
Member # 4589

 - posted
quote:
If you put the news items in separate tables by category you deserve what you get.
Guess what? [Smile]
Posts: 4534 | Registered: Jan 2003
Dagonee
Member
Member # 5818

 - posted
The rest of the issues exist, but I'm confused as to why there'd be duplicate data floating around.

And to a good SQL person, the other problems are minor compared to the benefits received. The understanding based on foreign keys isn't hard, and one layer of abstraction would get rid of it (though it's usually not worth it).

Plus, with the efficiency of good query engines, I'd bet the tradeoffs in getting the top N different items from SQL are less than the one-step indirection caused by the RDF layer.

Anyone competent with indexing and joins should have little problem optimizing such queries in the unlikely event the costing system doesn't do it right for you.

Granted, for most of the applications I've done, the content features are a much smaller percentage of the system than the pure transactional stuff, so SQL was really the top choice for other reasons. If I were doing a pure content system that was big enough to justify investing in a new technology, I'd look into RDF more closely.

The other possibility is a "compiled" web site, where a batch process creates straight HTML pages from the data set. I used it for a couple of sites where the customer had no database capability on their web host (don't ask). Change data using the application, push the publish button. Not real time, but much faster on the site and very easy to add a QA check to it. For sites that change nightly, it worked great.
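The batch-publish approach might be sketched like this (Python, with an invented page list; a minimal illustration, not the actual system described):

```python
# "Compiled" site: a batch step renders every page to static HTML, and
# publishing is just writing (or copying) the output directory.
import pathlib, tempfile

pages = {
    "index.html": ("Home", "Welcome to the site."),
    "news.html": ("News", "New review posted."),
}

def publish(out_dir):
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, (title, body) in pages.items():
        html = (f"<html><head><title>{title}</title></head>"
                f"<body>{body}</body></html>")
        (out / name).write_text(html)  # plain files; no DB on the web host
    return sorted(p.name for p in out.iterdir())

published = publish(tempfile.mkdtemp())
```

Serving pre-rendered files is fast and easy to QA, at the cost of real-time updates.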

Except for that one time they had to make a change in the middle of the day...

Dagonee

Posts: 26071 | Registered: Oct 2003
fugu13
Member
Member # 2859

 - posted
Oh, RDF databases aren't quite as fast for the simplest SQL-like queries, but for manipulation of rich semantic data they're already pretty much the best, simply because rich semantic data is best kept, well, semantically. The most important bit, in fact, is that RDF data is highly extensible. I can add assorted metadata ad hoc as I wish, without creating worrying problems with my table setup and complexity.

For instance, say I've been maintaining a list of contributors to an open source project (not an uncommon situation), but just for bug-tracking purposes. Suddenly everyone wants a biography. In an SQL database this isn't hard, but it adds complexity. In an RDF database it's trivial and doesn't introduce any further levels of complexity: each person (the things which are members of the project_contributors group, say) just gets a new property, a biography predicate whose object contains the biography. Each person wants to add a list of recommended sites? Instead of storing that in some opaque format (as would be typical in an SQL database, either that or creating an entirely new table with foreign keys), each person just has assorted objects of a recommends_site predicate.
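A toy contrast of the two approaches (invented data; the RDF side is modeled with bare tuples rather than a real triple store, and sqlite3 stands in on the SQL side):

```python
# RDF side: adding a biography is just appending another triple.
triples = [("bob", "ismember", "project_contributors")]
triples.append(("bob", "biography", "Bob has hacked on the project since 2001."))
triples.append(("bob", "recommends_site", "http://example.org/"))

bios = [o for s, p, o in triples if s == "bob" and p == "biography"]

# SQL side: the same change needs a schema migration first.
import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributors (name TEXT)")
conn.execute("INSERT INTO contributors VALUES ('bob')")
conn.execute("ALTER TABLE contributors ADD COLUMN biography TEXT")
conn.execute("UPDATE contributors SET biography = ? WHERE name = 'bob'",
             (bios[0],))
```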

It's not that these things can't be done relatively easily in SQL; it's that they can be done very, very easily in RDF, and furthermore anyone who pokes around in your data structures can immediately understand what they mean, because the data is stored semantically, the semantics are low-level and easily human-parsed, and if one doesn't understand a semantic, the web pages to find out what it means are built into the data itself.

Posts: 15770 | Registered: Dec 2001
fugu13
Member
Member # 2859

 - posted
Oh, and RDF databases inherit much of their transactional support from other databases by virtue of using them as backends. There isn't yet an accepted equivalent of ACID in RDF databases, but it will come. They're new.
Posts: 15770 | Registered: Dec 2001
Dagonee
Member
Member # 5818

 - posted
quote:
anyone who pokes around in your datastructures can immediately understand what they mean because its semantically stored
This is a good thing? What about job security? [Razz]

I'm just kidding, although I've met people who would mean that seriously.

Actually, one of the things I'm forever grateful for is no longer having to keep up with information technology advances. I figure by the time I graduate law school, my skills will look like a Cobol programmer's did to me 10 years ago.

But it's still interesting to read about.

One concern: I've noticed that when something is that easy to extend, people fail to carefully plan the extensions, and you end up with inconsistent semantics. For example, someone adds a "RelatedSites" attribute to one table while someone else adds "Links" to another. It's not that this doesn't happen with SQL, but there you HAVE to pay attention to where things fit in the data model to do it at all, so some types of problems are more easily avoided.

Does that kind of thing happen a lot w/ RDF installations?

Dagonee

Posts: 26071 | Registered: Oct 2003
fugu13
Member
Member # 2859

 - posted
RDF tries to rely on publicly available vocabularies, such as the Dublin Core metadata set ( http://dublincore.org ), the Friend Of A Friend relationship description set ( http://xmlns.com/foaf/0.1/ ), and many others. This semantic flexibility and the wide range of established sets keep the number of site-specific things and predicates to a minimum.

Furthermore, RDF works closely with a language called OWL, which is itself expressed in RDF. OWL is the Web Ontology Language, and it allows one to describe RDF terms ontologically. Thus if you know that two properties are identical, or are inverses, or have other complex ontological properties, you can express that in OWL, add it to your OWL-compliant database (unfortunately, very few of these exist yet), and your database will transparently handle the relationship. That's actually another benefit of RDF databases: they can "know" far more data than is put in, through ontologies, which are really bits of declarative programming. For instance, the hasmember and ismember predicates might be inverses. I might store in my database the equivalent of bob ismember samsclub, and also have the OWL ontology loaded up which expresses that inverse relationship. Then I could ask my database samsclub hasmember ?who, and it would tell me bob, because it worked out the relationship.
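That inverse-property inference can be mimicked in a few lines (a toy sketch, not a real OWL reasoner; all names here are invented):

```python
# One stored fact plus a declared inverse lets the query answer a
# question in the opposite direction.
triples = {("bob", "ismember", "samsclub")}
inverses = {"ismember": "hasmember", "hasmember": "ismember"}

def query(subject, predicate):
    """Return objects for (subject, predicate), applying declared inverses."""
    direct = {o for s, p, o in triples if s == subject and p == predicate}
    inv = inverses.get(predicate)
    if inv:
        # samsclub hasmember ?who  <=  ?who ismember samsclub
        direct |= {s for s, p, o in triples if o == subject and p == inv}
    return direct

# The database "knows" more than was put in:
members = query("samsclub", "hasmember")
```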

Posts: 15770 | Registered: Dec 2001
Dagonee
Member
Member # 5818

 - posted
Cool. I was working on ontologies in cancer clinical trials when I quit. Fascinating stuff. Too fascinating actually, because I could get bogged down for days mapping it all out.

The same type of thing attracts me about the study of law, actually.

Dagonee

Posts: 26071 | Registered: Oct 2003
   

Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.

