Hi all,
Thank you for your various replies. I'd like to provide some further
information that I probably should have furnished in the first place,
and hopefully my reasoning will seem a little more thought-out (if not
accurate!).
When I mentioned the Linnaean taxonomy, I didn't want to get too much
into what I thought were unnecessary specifics. But in point of fact,
what I'm trying to do is to build a GIS database that maps linguistic
diversity. So the taxonomy comes into play in that linguistics
utilizes such labels. What I would like to do is create separate
feature classes for each level of the taxonomy - e.g., the extent of
families, subfamilies, and so on. (The issue that language boundaries
are never clean-cut is duly acknowledged - this is just what can be
done based on historical data. Also, as it is historical data, the
boundaries are considerably more discrete than they would be if I were
mapping present-day data.)
Here's an example of what I expect users to want to do: they want to
isolate and look at the distribution of a set of languages from
subfamilyA, and compare it to the distribution of an entire
subfamilyB. If families, subfamilies, branches and so on were all in
one large table, with one column per level running from the top of the
taxonomy down as you read left to right, surely it would violate 2NF -
right? I hope I'm not wrong about that as well. So I have separate
tables: one for languages that lists each language's entire taxonomy,
which can be used to generate the set of languages, and, to keep
things simple for the user, a separate table that has only
subfamilies, from which they can select the one they want. That's
essentially why I don't want only a single large table whose columns
run from the bottom of the taxonomy up as you read left to right.
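
To make that scenario concrete, here is a minimal sketch of the two tables as I picture them, using Python's sqlite3 module rather than an actual GIS. The table names, column names, helper functions and sample values (lang1, subfamilyA, familyX and so on) are all placeholders I made up for illustration, and the geometry column a real feature class would carry is left out.

```python
import sqlite3

# In-memory database; every name and value below is a placeholder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subfamily (
    name   TEXT PRIMARY KEY,
    family TEXT NOT NULL
);
CREATE TABLE language (
    name      TEXT PRIMARY KEY,
    family    TEXT NOT NULL,
    subfamily TEXT NOT NULL REFERENCES subfamily(name)
);
INSERT INTO subfamily VALUES
    ('subfamilyA', 'familyX'),
    ('subfamilyB', 'familyX');
INSERT INTO language VALUES
    ('lang1', 'familyX', 'subfamilyA'),
    ('lang2', 'familyX', 'subfamilyA'),
    ('lang3', 'familyX', 'subfamilyB');
""")


def pick_languages(conn, subfamily, wanted):
    """Return the requested languages that belong to the given subfamily."""
    marks = ",".join("?" for _ in wanted)
    sql = (
        "SELECT name FROM language "
        f"WHERE subfamily = ? AND name IN ({marks}) ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql, [subfamily, *wanted])]


def pick_subfamily(conn, subfamily):
    """Return the single row for a whole subfamily, or None if absent."""
    return conn.execute(
        "SELECT name, family FROM subfamily WHERE name = ?", (subfamily,)
    ).fetchone()


# A hand-picked set of languages from subfamilyA, and all of subfamilyB.
selected = pick_languages(conn, "subfamilyA", ["lang1", "lang3"])
whole = pick_subfamily(conn, "subfamilyB")
```

The set of languages comes out of the language table, while the whole subfamily is a single row in its own table, which is the split I was describing.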
Given my need for separate feature classes that can be loaded
independently in a GIS, I threw together the structure that I
mentioned above. (I should say here that while I'm experienced in GIS
and working with databases, it's the first time that I've
independently tried to build something from the ground up. So I
apologize for what must be an odd mix of jargon and cluelessness.)
Given all that, here is my response to your replies: I concede the
point that, having broken down the tables the way I did, the UIDs are
not necessary. I didn't mean to imply that they were required or
"magic". But if it makes sense to have separate tables for each
level, as in my user scenario, why not assign them UIDs? Also, I will
look into nested sets, but I would like to make sure that I'm straight
on this part first. And, looking at my user scenario, I see that I
could have a language table that has family, subfamily, etc. as
attributes (i.e., from the bottom of the taxonomy to the top, reading
left to right) and still get away with having separate tables for each
level. I *think* this eliminates the concerns raised about my not
meeting 3NF.
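
One way I could probe that question is to check whether the family stored with each language always agrees with the family stored with its subfamily. The sketch below assumes the same made-up layout (a language table carrying both subfamily and family, plus a separate subfamily table); all names and values are placeholders, and one row is deliberately inconsistent so the check has something to find.

```python
import sqlite3

# In-memory database; every name and value below is a placeholder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subfamily (
    name   TEXT PRIMARY KEY,
    family TEXT NOT NULL
);
CREATE TABLE language (
    name      TEXT PRIMARY KEY,
    subfamily TEXT NOT NULL REFERENCES subfamily(name),
    family    TEXT NOT NULL
);
INSERT INTO subfamily VALUES
    ('subfamilyA', 'familyX'),
    ('subfamilyB', 'familyX');
INSERT INTO language VALUES
    ('lang1', 'subfamilyA', 'familyX'),
    ('lang2', 'subfamilyB', 'familyY');  -- deliberately inconsistent row
""")


def mismatched_families(conn):
    """List languages whose stored family differs from their subfamily's family."""
    return conn.execute("""
        SELECT l.name, l.family, s.family
        FROM language AS l
        JOIN subfamily AS s ON s.name = l.subfamily
        WHERE l.family <> s.family
        ORDER BY l.name
    """).fetchall()


mismatches = mismatched_families(conn)
```

If a subfamily always determines its family, then the family column in the language table can be derived through the subfamily table, and a check like this is the only thing keeping the two copies in step. As far as I understand it, that dependency is what the 3NF question hinges on.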
Sorry this is so long - I've tried to be concise. If anybody is still
reading this, I look forward to hearing if I am right about 3NF and
how, given the constraints I've named, I can get to 4NF.