posted
I've already searched the tech forums and I can't find an answer to this question. I figured I'd give it a shot here.
I'm having problems getting something to work with a MySQL database that I'm working with. Pretty much, I've got a many to many relationship and I'm tying with one SQL statement to check if an id from the one table is linked to all the ids from the other table.
The setup is I have a Sources(like a book or an article) table and an Authors table, which a linked by a Author_To_Source that maps an author_id to a source_id. So I'm trying to check a source_id to see if it is linked to all the author_ids in a list I provide.
Now, MySQL doesn't have cross-tab queries, so right now, I'm using a query like this:
SELECT Author_To_Source.Source_ID, Sum(IIf(Author_ID IN ([Author List.Author_IDs]),1,0)) AS Count FROM Author_To_Source GROUP BY Author_To_Source.Source_ID HAVING (((Sum(IIf(Author_ID IN ([Author List.Author_IDs]),1,0))=[Author List.Count]));
Now, this works, but it requires a bit of complex nerfing to get the [Author List.Count] number. So, I'm wondering, is there someone out there with better SQL experience than I who can tell me the obvious way of checking a complete many to many relationship in a single SQL statement, or a way of, in a SQL statement, taking a list of IDs like (1,2,5,6) and getting the number of elements in it? Thanks.
Posts: 10177 | Registered: Apr 2001
| IP: Logged |
posted
Forgive me if I'm not understanding...but are you trying to get a source count by author? Or (rereading) are you trying to get a source count for authors on a list you're providing?
Posts: 5948 | Registered: Jun 2001
| IP: Logged |
posted
jeniwren, I'm trying to check if a given source is associated with all the authors in a list I proivde. The statement I have counts up all the authors that are in the list that are associated with each individual source and selects those sources where the total number of matching authors equals the number of authors in the list. Since an author can only be associated with a source once, this works to check to see if the source is associated with all the authors in the list.
Posts: 10177 | Registered: Apr 2001
| IP: Logged |
posted
This is probably not helpful (I'm in the midst of learning SQL syntax, but have done query writing in other languages for many years, especially for reports and the like), but you need to simplify what you're doing then go more complex as needed. Queries with known many-to-many relationship links are a recipe for frustration...and suspicious data returns. One way might be to summarize your provided list to a count, then summarize the author to source table by source to a count. You should then be able to link the two summaries together and determine if your counts match. This means you'd have to do temp tables, but really, if you end up with results you're not sure of, you can debug much more easily.
Sorry if this doesn't help or if I've misunderstood what you're trying to do. I'm not sure I understand what a "source" is in practical terms. I was initially picturing them as a book with many authors, but then you said that no author could be on more than one source, so that shot a hole in what I was visualizing.
Posts: 5948 | Registered: Jun 2001
| IP: Logged |
Off the top of my head, I'd suggest (may need tweaking to be valid MySQL, but you get the idea). . .
SELECT DISTINCT authors.authorid FROM authors LEFT JOIN author_to_source AS a_t_s ON a_t_s.authorid = authors.authorid LEFT JOIN sources ON sources.sourceid = a_t_s.sourceid WHERE sources.sourceid IN (. . .list of source ids. . .)
No, that doesn't work, that just gets all the unique authors for the group . . . silly me, wrong direction. Lets see . . . Oh yes, I see.
I think your solution is pretty much the easiest way, but it can be refined.
SELECT count(a_t_s.authorid) FROM sources LEFT JOIN author_to_source AS a_t_s ON a_t_s.sourceid = sources.sourceid WHERE sourceid IN (. . .list of souce ids. . .) AND a_t_s.authorid = test_authorid
(you could of course also use a USING clause instead of ON, it doesn't really matter, I tend to prefer ON)
Then just compare the result to the number of sources. In fact . . .
SELECT count(. . .list of source ids. . .) = count(a_t_s.authorid) FROM sources LEFT JOIN author_to_source AS a_t_s ON a_t_s.sourceid = sources.sourceid WHERE sourceid IN (. . .list of souce ids. . .) AND a_t_s.authorid = test_authorid
It'll return 1 if the counts match up, 0 if they don't. This has not been tested by me, but I'm pretty certain it'll work.
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
posted
jeniwren -- you don't need temp tables, you can just JOIN existing tables (I was writing my above reply while you were writing yours).
Also, its not that no author can be on more than one source, its that each author can only appear once for a source (that is, nobody is twice-or-more-over the author of a single work).
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
posted
fugu's idea is right on (I haven't verified syntax either, but the idea is sound). One caveat is that the join table must have a primary key that is the combination of the primary key from the author table and the primary key from the source table, and nothing else.
It seems like your schema is set up that way, but it's the most common mistake I've seen in this kind of query, so I thought I'd mention it.
For example, if the source were actually related to article, and author was related to article, you couldn't just compare the numbers without doing some aggregation first, unless your variation supports COUNTDISTINCT (few do). You'd have to do something similar in principle but quite different in the details.
Dagonee Edit: Now that I think about it, you should be able to do this with just the join table, since it should have the author_id in it. No need to JOIN in the author table unless you need some attributes of the author, and then it's probably faster to use another query.
posted
Oh, and I hope you're using InnoDB tables so you can have two foreign keys in a_t_s, one linking sourceid with sourceid and one linking authorid with authorid. That'll speed my solution up considerably, for instance.
Dags is right, if you need to grab author information then a two column primary key on authorid/sourceid is your friend. Basically the two foreign keys and one joint key would work together to make the entire three-table join "free". You don't need author information in this instance, but I'm betting it'll be useful for you to be able to pull author information based on source information in a single (very fast) query.
posted
I had some very complex aggregation/cross-tab queries for one of my applications, and it was amazing what the order of tables and adding (or even deleting) foreign keys could do.
Just changing those two items resulted in query times of less than half a second to more than 3 minutes.
I only mention the extra tables because we got three orders of magnitude improvements in speed by removing some intervening tables in these queries. I was constantly harping on my programmers to not slavishly follow the ERD in making their joins.
Of course, this required taking extra effort to ensure that a unique key was underlying at least one side of each join, but the time savings were phenomenal.
posted
I love SQL Its so peaceful to let one's mind slip towards the simplicity of an elegant query . . . I unfortunately seem to be in the minority on this opinion . . .
btw, Dags, do you have an AIM name?
Posts: 15770 | Registered: Dec 2001
| IP: Logged |
posted
That's how I feel about it, too. An elegant query is a thing of beauty, especially once you can make Venn diagrams in your head that represent the query.
Nope, no AIM name yet. I browse Hatrack enough during class - AIM would be death. You can reach me at the email in my profile, though.