Hatrack River Forum: SQL question (another computer advice thread)

This is topic SQL question (another computer advice thread) in forum Books, Films, Food and Culture at Hatrack River Forum.

To visit this topic, use this URL:
http://www.hatrack.com/ubb/main/ultimatebb.php?ubb=get_topic;f=2;t=026528

Posted by MrSquicky (Member # 1802) on :

I've already searched the tech forums and I can't find an answer to this question. I figured I'd give it a shot here.

I'm having problems getting something to work with a MySQL database that I'm working with. Pretty much, I've got a many to many relationship and I'm tying with one SQL statement to check if an id from the one table is linked to all the ids from the other table.

The setup is I have a Sources(like a book or an article) table and an Authors table, which a linked by a Author_To_Source that maps an author_id to a source_id. So I'm trying to check a source_id to see if it is linked to all the author_ids in a list I provide.

Now, MySQL doesn't have cross-tab queries, so right now, I'm using a query like this:

SELECT Author_To_Source.Source_ID, Sum(IIf(Author_ID IN ([Author List.Author_IDs]),1,0)) AS Count
FROM Author_To_Source
GROUP BY Author_To_Source.Source_ID
HAVING (((Sum(IIf(Author_ID IN ([Author List.Author_IDs]),1,0))=[Author List.Count]));

Now, this works, but it requires a bit of complex nerfing to get the [Author List.Count] number. So, I'm wondering, is there someone out there with better SQL experience than I who can tell me the obvious way of checking a complete many to many relationship in a single SQL statement, or a way of, in a SQL statement, taking a list of IDs like (1,2,5,6) and getting the number of elements in it? Thanks.

Posted by jeniwren (Member # 2002) on :

Forgive me if I'm not understanding...but are you trying to get a source count by author? Or (rereading) are you trying to get a source count for authors on a list you're providing?

Posted by MrSquicky (Member # 1802) on :

jeniwren,
I'm trying to check if a given source is associated with all the authors in a list I proivde. The statement I have counts up all the authors that are in the list that are associated with each individual source and selects those sources where the total number of matching authors equals the number of authors in the list. Since an author can only be associated with a source once, this works to check to see if the source is associated with all the authors in the list.

Posted by jeniwren (Member # 2002) on :

This is probably not helpful (I'm in the midst of learning SQL syntax, but have done query writing in other languages for many years, especially for reports and the like), but you need to simplify what you're doing then go more complex as needed. Queries with known many-to-many relationship links are a recipe for frustration...and suspicious data returns. One way might be to summarize your provided list to a count, then summarize the author to source table by source to a count. You should then be able to link the two summaries together and determine if your counts match. This means you'd have to do temp tables, but really, if you end up with results you're not sure of, you can debug much more easily.

Sorry if this doesn't help or if I've misunderstood what you're trying to do. I'm not sure I understand what a "source" is in practical terms. I was initially picturing them as a book with many authors, but then you said that no author could be on more than one source, so that shot a hole in what I was visualizing.

Posted by fugu13 (Member # 2859) on :

Lets see . . .

Off the top of my head, I'd suggest (may need tweaking to be valid MySQL, but you get the idea). . .

SELECT DISTINCT authors.authorid FROM authors LEFT JOIN author_to_source AS a_t_s ON a_t_s.authorid = authors.authorid LEFT JOIN sources ON sources.sourceid = a_t_s.sourceid WHERE sources.sourceid IN (. . .list of source ids. . .)

No, that doesn't work, that just gets all the unique authors for the group . . . silly me, wrong direction. Lets see . . . Oh yes, I see.

I think your solution is pretty much the easiest way, but it can be refined.

SELECT count(a_t_s.authorid) FROM sources LEFT JOIN author_to_source AS a_t_s ON a_t_s.sourceid = sources.sourceid WHERE sourceid IN (. . .list of souce ids. . .) AND a_t_s.authorid = test_authorid

(you could of course also use a USING clause instead of ON, it doesn't really matter, I tend to prefer ON)

Then just compare the result to the number of sources. In fact . . .

SELECT count(. . .list of source ids. . .) = count(a_t_s.authorid) FROM sources LEFT JOIN author_to_source AS a_t_s ON a_t_s.sourceid = sources.sourceid WHERE sourceid IN (. . .list of souce ids. . .) AND a_t_s.authorid = test_authorid

It'll return 1 if the counts match up, 0 if they don't. This has not been tested by me, but I'm pretty certain it'll work.

Posted by fugu13 (Member # 2859) on :

jeniwren -- you don't need temp tables, you can just JOIN existing tables (I was writing my above reply while you were writing yours).

Also, its not that no author can be on more than one source, its that each author can only appear once for a source (that is, nobody is twice-or-more-over the author of a single work).

Posted by jeniwren (Member # 2002) on :

I stand corrected, fugu. [Smile]

That makes *much* more sense.

Posted by fugu13 (Member # 2859) on :

Yeah, I had a bit of a time parsing that myself, but many-to-many relationships can make explanatory semantics difficult.

I hope my answers were of some help.

Posted by Dagonee (Member # 5818) on :

fugu's idea is right on (I haven't verified syntax either, but the idea is sound). One caveat is that the join table must have a primary key that is the combination of the primary key from the author table and the primary key from the source table, and nothing else.

It seems like your schema is set up that way, but it's the most common mistake I've seen in this kind of query, so I thought I'd mention it.

For example, if the source were actually related to article, and author was related to article, you couldn't just compare the numbers without doing some aggregation first, unless your variation supports COUNTDISTINCT (few do). You'd have to do something similar in principle but quite different in the details.

Dagonee
Edit: Now that I think about it, you should be able to do this with just the join table, since it should have the author_id in it. No need to JOIN in the author table unless you need some attributes of the author, and then it's probably faster to use another query.

[ August 12, 2004, 08:53 AM: Message edited by: Dagonee ]

Posted by fugu13 (Member # 2859) on :

Oh, and I hope you're using InnoDB tables so you can have two foreign keys in a_t_s, one linking sourceid with sourceid and one linking authorid with authorid. That'll speed my solution up considerably, for instance.

Dags is right, if you need to grab author information then a two column primary key on authorid/sourceid is your friend. Basically the two foreign keys and one joint key would work together to make the entire three-table join "free". You don't need author information in this instance, but I'm betting it'll be useful for you to be able to pull author information based on source information in a single (very fast) query.

[ August 12, 2004, 10:21 AM: Message edited by: fugu13 ]

Posted by Dagonee (Member # 5818) on :

I had some very complex aggregation/cross-tab queries for one of my applications, and it was amazing what the order of tables and adding (or even deleting) foreign keys could do.

Just changing those two items resulted in query times of less than half a second to more than 3 minutes.

I only mention the extra tables because we got three orders of magnitude improvements in speed by removing some intervening tables in these queries. I was constantly harping on my programmers to not slavishly follow the ERD in making their joins.

Of course, this required taking extra effort to ensure that a unique key was underlying at least one side of each join, but the time savings were phenomenal.

Dagonee

Posted by fugu13 (Member # 2859) on :

I love SQL [Smile]

Its so peaceful to let one's mind slip towards the simplicity of an elegant query . . . I unfortunately seem to be in the minority on this opinion . . .

btw, Dags, do you have an AIM name?

Posted by Dagonee (Member # 5818) on :

That's how I feel about it, too. An elegant query is a thing of beauty, especially once you can make Venn diagrams in your head that represent the query.

Nope, no AIM name yet. I browse Hatrack enough during class - AIM would be death. [Smile]

You can reach me at the email in my profile, though.

Dagonee