MySQL Normalization

sublimer's picture

They have: 41 posts

Joined: Aug 2006

I am redoing my site's (http://www.sublimestylee) current flat database in exchange for a relational one. I am still trying to get the hang of normalization and wanted to see if I did it correctly.

Here is an image I whipped up of how I think it should be...

Here is an example of an entry for the database...

downloads
1
Song Name
SongName.mp3
564264543
/sddf/sdfsdfasf/sdfsadf/sdfasdf/SongName.mp3
43
11232007
1
1
1

artist
1
Cool Band Name
Rock
Pop
R&B
This is our story
http://blahblah.com/image.gif/
http://blahblah.com/

boot
1
Our Latest Album
This boot contains 15 tracks
http://blahblah.com/image2.gif/

category
1
Album

The above is an example row in each table. The thing that I feel would fail normalization would be the genres. Should I make a table just for genres and then have artist_genre1, artist_genre2, and artist_genre3 become primary keys to the genre table?

I also have a question about redundancy. Lets take the genres for example. If I have 1,000 different artists in my database there will be several artists with genres in common. So lets say I don't include a genre table. If you were to look through all the records of the artist table you would see "rock," "reggae," and "blues" for example probably dozens of times. This would be considered redundant, right? But, how does that differ from replacing all of those genre names with foreign keys. Then instead of having all the genre names, you have a bunch of numbers. So you would see the number "3" for example dozens of times. Wouldn't this also be considered redundant.

I think I missed the boat on understanding how to prevent redundancy.

Thanks for all the help you guys can provide. You have always been great in the past! :wave:

He has: 1,380 posts

Joined: Feb 2002

At a glance, it looks pretty good. You have the general right idea... id's and primary key's are your friend. They let you easily cross reference between tables.

As for your redundancy question... something I would do would be to possibly not have "artist_genre1", "artist_genre2", etc... but rather have a field for EACH genre.

So an artist entry would have the fields (in addition to what you have already):

rb
rock
metal
jazz
classical
rap
hip-hop
funk
spoken
...etc
'

What would you do with that? Those would all be booleans, basically: an int(1) type. 1 = genre is true for that artist, 0 = not.

So... for someone like... Metallica, you might have

metal = 1
rock = 1
neoclassical = 1
heavy = 1
jazz = 0
funk = 0
... etc
'

This allows you to easily query the database...

Looking for metal artists?

"SELECT * FROM artists WHERE metal = 1"
'

Looking for an artist that is funk, jazz, and metal combined?

'SELECT * FROM artists WHERE metal = 1, jazz = 1, funk = 1"
'

Does that help? I think I was pretty clear.

Are you going to be using PHP to run the queries? Or something else?

A suggestion I would have would be to absolutely decide on a db/table structure before implementing anything... I've learned that the hard way.

A final suggestion would be to keep the entries to the bare minimum possible... so instead of having "boot_info" say "this boot contains 15 tracks"... create a "boot_tracks" field and then just insert "15"... and use whatever language you're in to display the text differently. This would be across the board... keep the data either simple numbers, or simple phrases (besides things like names and titles), it will make your life easier.

sublimer's picture

They have: 41 posts

Joined: Aug 2006

So, you lets say I have a good 30 different genres, just make a table with the current artist table info, drop artist_genre1, 2 and 3, and then add 30 different columns with the names of the genres?

I will be using PHP to make my queries. The "this boot contains 15 tracks" was just something I used as a filler, it could be anything. I intend it to be a paragraph or so about the album. It would include things like where it was recorded, special guests on the album, like a bio of the album.

Once I get the table set up, I will create some indexes and all too. I really want to get this database done right.

JeevesBond's picture

He has: 3,956 posts

Joined: Jun 2002

That diagram violates first and second normal forms. artist violates the first and downloads the second. Something like the attached might be more in order.

To be in first normal form all values must be atomic. Meaning no repeating groups like artist_genre1, artist_genre2, artist_genre3. So in the diagram I've split the table to form a many-to-many relationship, think of it like this: one genre has many many artists, whilst one artist has many genres. You could constrain it to one genre by removing the linker artist_genre table.

To be in second normal form all values must be dependent on the primary key. This means a table must describe one thing and one thing only: in the downloads table you're lumping information about available files in with the downloads, this means attributes like filename, filesize etc. are not dependent upon the primary key (download_id). A good clue to this is the attribute: download_count, if there's to be one row for each download how are you to include a download_count? How does this describe the download?

download_count is actually superfluous, since the information can be found by aggregating download and file information. Something like: [incode]SELECT COUNT(*) FROM download INNER JOIN file ON download.file_id=file.file_id WHERE file_name='test.mp3'[/incode].

*** EDIT ***
I had no idea what boot is so have guessed where it should go. Smiling

a Padded Cell our articles site!

sublimer's picture

They have: 41 posts

Joined: Aug 2006

Thank you very much JeevesBond. download_count is a measure of how many times the file has been downloaded, it's not an ID.

He has: 1,380 posts

Joined: Feb 2002

Sublimer: About the genres... yes.

JeevesBond's picture

He has: 3,956 posts

Joined: Jun 2002

sublimer wrote: download_count is a measure of how many times the file has been downloaded, it's not an ID.

Yes I realised that. Smiling

It's superfluous though, the download table is describing two things: downloads and files, which is incorrect.

a Padded Cell our articles site!

sublimer's picture

They have: 41 posts

Joined: Aug 2006

so then how do i track how many times a file has been downloaded? do i make a new table that stores the download_id along with download_count? why would it be so bad just to leave download_count with the download table?

Want to join the discussion? Create an account or log in if you already have one. Joining is fast, free and painless! We’ll even whisk you back here when you’ve finished.