Hatrack River Forum: SQL question (another computer advice thread)

my profile login | search | faq | forum home

»	Hatrack River Forum » Active Forums » Books, Films, Food and Culture » SQL question (another computer advice thread)

Author

Topic: SQL question (another computer advice thread)

MrSquicky
Member
Member # 1802

posted

I've already searched the tech forums and I can't find an answer to this question. I figured I'd give it a shot here.

I'm having problems getting something to work with a MySQL database that I'm working with. Pretty much, I've got a many to many relationship and I'm tying with one SQL statement to check if an id from the one table is linked to all the ids from the other table.

The setup is I have a Sources(like a book or an article) table and an Authors table, which a linked by a Author_To_Source that maps an author_id to a source_id. So I'm trying to check a source_id to see if it is linked to all the author_ids in a list I provide.

Now, MySQL doesn't have cross-tab queries, so right now, I'm using a query like this:

SELECT Author_To_Source.Source_ID, Sum(IIf(Author_ID IN ([Author List.Author_IDs]),1,0)) AS Count
FROM Author_To_Source
GROUP BY Author_To_Source.Source_ID
HAVING (((Sum(IIf(Author_ID IN ([Author List.Author_IDs]),1,0))=[Author List.Count]));

Now, this works, but it requires a bit of complex nerfing to get the [Author List.Count] number. So, I'm wondering, is there someone out there with better SQL experience than I who can tell me the obvious way of checking a complete many to many relationship in a single SQL statement, or a way of, in a SQL statement, taking a list of IDs like (1,2,5,6) and getting the number of elements in it? Thanks.

Posts: 10177 | Registered: Apr 2001 | IP: Logged |

jeniwren
Member
Member # 2002

posted

Forgive me if I'm not understanding...but are you trying to get a source count by author? Or (rereading) are you trying to get a source count for authors on a list you're providing?

Posts: 5948 | Registered: Jun 2001 | IP: Logged |

MrSquicky
Member
Member # 1802

posted

jeniwren,
I'm trying to check if a given source is associated with all the authors in a list I proivde. The statement I have counts up all the authors that are in the list that are associated with each individual source and selects those sources where the total number of matching authors equals the number of authors in the list. Since an author can only be associated with a source once, this works to check to see if the source is associated with all the authors in the list.

Posts: 10177 | Registered: Apr 2001 | IP: Logged |

jeniwren
Member
Member # 2002

posted

This is probably not helpful (I'm in the midst of learning SQL syntax, but have done query writing in other languages for many years, especially for reports and the like), but you need to simplify what you're doing then go more complex as needed. Queries with known many-to-many relationship links are a recipe for frustration...and suspicious data returns. One way might be to summarize your provided list to a count, then summarize the author to source table by source to a count. You should then be able to link the two summaries together and determine if your counts match. This means you'd have to do temp tables, but really, if you end up with results you're not sure of, you can debug much more easily.

Sorry if this doesn't help or if I've misunderstood what you're trying to do. I'm not sure I understand what a "source" is in practical terms. I was initially picturing them as a book with many authors, but then you said that no author could be on more than one source, so that shot a hole in what I was visualizing.

Posts: 5948 | Registered: Jun 2001 | IP: Logged |

fugu13
Member
Member # 2859

posted

Lets see . . .

Off the top of my head, I'd suggest (may need tweaking to be valid MySQL, but you get the idea). . .

SELECT DISTINCT authors.authorid FROM authors LEFT JOIN author_to_source AS a_t_s ON a_t_s.authorid = authors.authorid LEFT JOIN sources ON sources.sourceid = a_t_s.sourceid WHERE sources.sourceid IN (. . .list of source ids. . .)

No, that doesn't work, that just gets all the unique authors for the group . . . silly me, wrong direction. Lets see . . . Oh yes, I see.

I think your solution is pretty much the easiest way, but it can be refined.

SELECT count(a_t_s.authorid) FROM sources LEFT JOIN author_to_source AS a_t_s ON a_t_s.sourceid = sources.sourceid WHERE sourceid IN (. . .list of souce ids. . .) AND a_t_s.authorid = test_authorid

(you could of course also use a USING clause instead of ON, it doesn't really matter, I tend to prefer ON)

Then just compare the result to the number of sources. In fact . . .

SELECT count(. . .list of source ids. . .) = count(a_t_s.authorid) FROM sources LEFT JOIN author_to_source AS a_t_s ON a_t_s.sourceid = sources.sourceid WHERE sourceid IN (. . .list of souce ids. . .) AND a_t_s.authorid = test_authorid

It'll return 1 if the counts match up, 0 if they don't. This has not been tested by me, but I'm pretty certain it'll work.

Posts: 15770 | Registered: Dec 2001 | IP: Logged |

fugu13
Member
Member # 2859

posted

jeniwren -- you don't need temp tables, you can just JOIN existing tables (I was writing my above reply while you were writing yours).

Also, its not that no author can be on more than one source, its that each author can only appear once for a source (that is, nobody is twice-or-more-over the author of a single work).

Posts: 15770 | Registered: Dec 2001 | IP: Logged |

jeniwren
Member
Member # 2002

posted

I stand corrected, fugu. [Smile]

That makes *much* more sense.

Posts: 5948 | Registered: Jun 2001 | IP: Logged |

fugu13
Member
Member # 2859

posted

Yeah, I had a bit of a time parsing that myself, but many-to-many relationships can make explanatory semantics difficult.

I hope my answers were of some help.

Posts: 15770 | Registered: Dec 2001 | IP: Logged |

Dagonee
Member
Member # 5818

posted

fugu's idea is right on (I haven't verified syntax either, but the idea is sound). One caveat is that the join table must have a primary key that is the combination of the primary key from the author table and the primary key from the source table, and nothing else.

It seems like your schema is set up that way, but it's the most common mistake I've seen in this kind of query, so I thought I'd mention it.

For example, if the source were actually related to article, and author was related to article, you couldn't just compare the numbers without doing some aggregation first, unless your variation supports COUNTDISTINCT (few do). You'd have to do something similar in principle but quite different in the details.

Dagonee
Edit: Now that I think about it, you should be able to do this with just the join table, since it should have the author_id in it. No need to JOIN in the author table unless you need some attributes of the author, and then it's probably faster to use another query.

[ August 12, 2004, 08:53 AM: Message edited by: Dagonee ]

Posts: 26071 | Registered: Oct 2003 | IP: Logged |

fugu13
Member
Member # 2859

posted

Oh, and I hope you're using InnoDB tables so you can have two foreign keys in a_t_s, one linking sourceid with sourceid and one linking authorid with authorid. That'll speed my solution up considerably, for instance.

Dags is right, if you need to grab author information then a two column primary key on authorid/sourceid is your friend. Basically the two foreign keys and one joint key would work together to make the entire three-table join "free". You don't need author information in this instance, but I'm betting it'll be useful for you to be able to pull author information based on source information in a single (very fast) query.

[ August 12, 2004, 10:21 AM: Message edited by: fugu13 ]

Posts: 15770 | Registered: Dec 2001 | IP: Logged |

Dagonee
Member
Member # 5818

posted

I had some very complex aggregation/cross-tab queries for one of my applications, and it was amazing what the order of tables and adding (or even deleting) foreign keys could do.

Just changing those two items resulted in query times of less than half a second to more than 3 minutes.

I only mention the extra tables because we got three orders of magnitude improvements in speed by removing some intervening tables in these queries. I was constantly harping on my programmers to not slavishly follow the ERD in making their joins.

Of course, this required taking extra effort to ensure that a unique key was underlying at least one side of each join, but the time savings were phenomenal.

Dagonee

Posts: 26071 | Registered: Oct 2003 | IP: Logged |

fugu13
Member
Member # 2859

posted

I love SQL [Smile]

Its so peaceful to let one's mind slip towards the simplicity of an elegant query . . . I unfortunately seem to be in the minority on this opinion . . .

btw, Dags, do you have an AIM name?

Posts: 15770 | Registered: Dec 2001 | IP: Logged |

Dagonee
Member
Member # 5818

posted

That's how I feel about it, too. An elegant query is a thing of beauty, especially once you can make Venn diagrams in your head that represent the query.

Nope, no AIM name yet. I browse Hatrack enough during class - AIM would be death. [Smile]

You can reach me at the email in my profile, though.

Dagonee

Posts: 26071 | Registered: Oct 2003 | IP: Logged |

Printer-friendly view of this topic

Contact Us | Hatrack River Home Page

Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.

Powered by Infopop Corporation
UBB.classic™ 6.7.2