mysql, store array VS delimiter

greg's picture

He has: 1,581 posts

Joined: Nov 2005

On the site are groups (group names chosen by users), and in that group can be XX users, maximum 20 users per group.
More users can join, or users can leave the group. Users within the groups have one of three different authorisation levels. (admin, user and supervisor)

Using multiple fields is not practical for this situation as far as I see, as other wise I would need a row for each user of the group.

So my basic mysql table for that is
id -> group_name -> group_users

id is the index, numeric primary auto inc.
group_name is simply varchar of the groups name

So in group_users I want to store each user name for that group and their auth level.

eg
greg | admin | joe | user | jane | superv

When I gt that info, I would have to take the first string as username, then second string as their auth level, then 3rd string as username next string as their auth level. All very possible, but is this ok?

I could store an array (using serialize() of course) in a mysql field, but is this good practice? Or best for my circumstance?
It seems to me that storing an array with username as key and auth level as value imo that would be better.

But which is considered faster/better. And what mysql field type should I use?

teammatt3's picture

He has: 2,102 posts

Joined: Sep 2003

If I'm understanding your scenario correctly, I think you're going about this the wrong way.

You need to design your database in a way that prevents you from storing a string and then parsing it in your application code. That kind of defeats the purpose of having a database. MySQL is going to scan and retrieve data significantly faster than PHP.

I'd do something like this

tblGroup
- GroupID (int)
- Name (varchar

tblUser2Group
- UserID (int)
- GroupID (int)
- UserLevel (int)

tblUserLevel (three rows containing 1;admin, 2;user and 3;supervisor)
- LevelID (int)
- Name (varchar)

So you have a Group table with the name of the Group and the ID. Then you have another table that links the User to the Group based on the User's ID (I'm assuming you have a User table somewhere) and the Group's ID. But there is another field in that table that allows you to specify the Level of that user within that group. What you put in UserLevel references the LevelIDs you put in tblUserLevel.

pr0gr4mm3r's picture

He has: 1,502 posts

Joined: Sep 2006

Yup, storing multiple sets of data in one field is a violation of the Second Normal Form for database normalization.

I like teammatt's table layout, but I don't think that having a separate table for the user levels may not be necessary if you are just going to have those three key/value pairs. Using the ENUM type would accomplish the same thing. Using a separate table for the levels would only be beneficial if you wanted to add additional levels and store other fields like detailed permission information.

greg's picture

He has: 1,581 posts

Joined: Nov 2005

Thanks for the reply, but....

I don't understand why that way is better.
Your suggestion has two tables, where with my way I would only have one. As I see, it would be more php code with an additional table to query, insert and update.

I don't know your reason for making a reference or link from the Group to the User in an additional table (tblUser2Group). I could just use separate rows for each group with each user and their auth level in one table, as you have done in two tables.
With Group Name in the same table row as the username and their auth level:

tblUserLevel
id -> Group Name -> User Name -> Auth Level

Although, I was trying to avoid using individual rows for each user in a group as I thought it would be better with one row for the group and the users and auth level in one field.

If I have 1,000 groups, with 20 users (max allowed)in each group, with my way of one table with one row for each group with all users and levels in one field, that's a total of 1,000 table rows, in one table.

With your way, each user for each group is a separate row in tblUser2Group and tblUserLevel, that's a total of 40,000 rows - and an additional table.

____________________________

So using the idea of one row for each user and their auth for each group.....

Have a separate column for groupname, username and level, that's still 20,000 rows with the above member base.

I would have thought accessing and searching the db would be faster with such a lot less rows, 1,000 rows VS 20,000 rows (?)
When I query the data, I will only be looking for one row with my way, whereas with one row per user per group, I could be searching for 20 rows, all jumbled up throughout the db.

Please correct me if I have any of this totally wrong, as this is an area I am still learning.

Although I still feel my original way with 1 table and 1,000 rows sounds the best. With 1,000 groups and 20 members per group, even if you say the username and authlevel in one field is bad, and therefore I use individual rows for each user in each group, it's 20 times the rows.

So if I had 30,000 groups with 20 users in each group, my way is 30,000 rows, with separate rows for each user in a group, that's 600,000 rows.
Surely this is not the best way?

----
Although it is a violation of the Second Normal Form for database normalization, I thought that was really a guide for general best practice, rather than a must do for all circumstances.

teammatt3's picture

He has: 2,102 posts

Joined: Sep 2003

There's more than one way to skin a cat. How 'bout you just put everything in one big table. That's what Google does. Smiling

Let me ask you this Greg: Say when someone logs into your site, you want to show a list of what groups their in. How are you going to do that with your setup? You would have to scan every group row, and for every group row you would have to scan every single character for that member's name. That's in slow PHP. That's a ton of string scan and manipulation. That is going to take a long time. If you have 30,000 groups and 20 users in each group (say a username is 10 characters long) you would have to scan more than 6000000 characters. How is that going to scale better than MySQL which can use an index to figure it out in half a second? 600,000 rows is nothing. I have a table with 2 millions rows, and I can lookup a set of records in 0.025 seconds. MySQL is fast man!

The problems with your setup: It doesn't scale, it's inflexible. People use normalized databases for good reasons. Every DBA I've talked to says always start completely normalized. Denormalization should only be used when you really know what you're doing. And some people say you should never denormalize.

Just having fewer columns and tables does not make it better. Actually, Jay Pipes of MySQL says it's better to have two tables with a one to one relationship than one large one with a load of columns that aren't used a lot (that doesn't necessarily apply in your case, but it's good to know).

greg's picture

He has: 1,581 posts

Joined: Nov 2005

The table isn't required to be queried like that, and never will be. If it was I wouldn't have even asked about this, I would have just had separate rows for each user.

Firstly I have a user_main table, which has their timezone, group_name and other stuff. Most of that data is obtained from a query that is included in the header, therefore available in each page.
Timezone lets me show their local time around the site, and the $group_name is the name of the group they are in, so I don't need to query the table that stores all the users and auths for each group.

When a user joins a group, their name and auth level gets stored in the table we have been discussing, but also in the user_main table.

The auth level is only ever used on a settings page, where the admin of the group can define what the two other levels are allowed to do, then the two other levels also depending on what admins set.

If a member of GROUP-A views the profile of GROUP-B, the query will fetch_array where group_name = GROUP-B
the array is unserialize and used in a foreach to list all the players.

The data is also extraced when a user of a group accesses their own groups data, in the same way it gets the array, unserializes it a foreach outputs each user in that group.

So one row is searched for and one field of data is obtained.

I suppose any "best practice" depnds on the circumstance. Perhaps I would still be better of for performance to use a row for each user.

It would allow additional fields and usage of the table in the future, but I can't see any reason I would need to add anything to the table. It is onyl for the purpose of storing each users within a group and their auth level.

teammatt3's picture

He has: 2,102 posts

Joined: Sep 2003

Greg, if you setup your app like that, and if it is used by a lot of people, and if you're forced to maintain it and build new features, you will regret building it that way.

When a user joins a group, their name and auth level gets stored in the table we have been discussing, but also in the user_main table.

So you're putting the same data in two places. That will be a nightmare to maintain.

Earlier you said

As I see, it would be more php code with an additional table to query, insert and update.

Well now every time someone changes a group you are running more queries because you have to manually maintain the integrity of the data in both places. You're setup ain't much better if you're counting queries.

But let's continue...

That also means a user can only be part of one group (unless you're going to delimit the data you put in the user_main table in the group_name column). Which again, is not a good way of doing it.

The table isn't required to be queried like that, and never will be.

Thinking like that gets programmers in a lot of trouble. How do you know that you will never need to find out which user is in which groups? That seems like a pretty standard thing to me (that's why I used that example). What if your users what to find out which groups their friend is in so they can join them?

You're concerned about scalability which is ok, but that also means you have to look at how your program will change overtime. Your setup is not flexible.

If a member of GROUP-A views the profile of GROUP-B, the query will fetch_array where group_name = GROUP-B
the array is unserialize and used in a foreach to list all the players.

With a normalized design, you could run one single query, and do a while loop to loop through the rows returned. That would eliminate the need to use unserialize (or something like it).

I suppose any "best practice" depnds on the circumstance.

Of course is does, and what makes you think your application is so different and unique that best practices don't apply?

Once again Greg, you will regret designing your application like that. I'm not trying to make it slow and bulky, I'm trying to get you to rely on the power of MySQL and apply the principles of normalized database schemes to make sure your application performs in all areas.

greg's picture

He has: 1,581 posts

Joined: Nov 2005

I understand your advising me for what you think is the better, and not to make it slow and bulky, which is why I an grateful of the discussion.

teammatt3 wrote:

When a user joins a group, their name and auth level gets stored in the table we have been discussing, but also in the user_main table.

So you're putting the same data in two places. That will be a nightmare to maintain.

Well, not really. Users can only be in one group at any one time.
So should a user leave a group, the group_users table gets the user removed from the array and the users_main gets changed to "no". Then if they join another group, users_main is changed to their new group name and the new groups users_groups array has the new user added.

I simply change two fields in two tables, it's not a complex issue really.

I suppose arguably removing one row where username = $username is easier and much faster than getting array, removing user where $key = etc and then updating the table again.
And of course the same for adding them.

BUT, it is for xbox live, so people, 90% of the time, already have a group and members they want as it's friends they already know. Groups are formed from people who already know each other really.

SO, people wont be changing groups very much as groups are more like teams. The site doesn't really serve any content of its own, the content is from statistics that users and groups enter, and teams are formed and played against each other, so staying in the same team to build up stats and score is going to be the norm.

In this case, having the usergroup name in two places is much more efficient.
I need a table that stores groups, the group's users and their auth level. So that table is required.
I also, in the nav menu and 80% of pages in the member area, need to know what group they are in to serve them options for their group, so adding their group ALSO in users_man is saving me having to query the large table for what group they are in, even if I were to use one row per user in a group.

I query users_main on all pages (header incl) for certain user data anyway that is used on nearly all pages - timezone, pm unread and checking if banned so the banning system is instant and not next login. (there will be a lot of banning due to the nature of the site)
SO also storing their group in users_main is the most efficient way to do it as that table is used a great deal.

teammatt3 wrote:
What if your users what to find out which groups their friend is in so they can join them?
It is invite only. A registered user can create a group and invite others, other members of the group (with auth) can also invite other registered users.
The registration for the entire site is based around the phpbb3 login system and the forum has a place to recruit new members or people to find a new group.

teammatt3 wrote:
How do you know that you will never need to find out which user is in which groups?
I will, but that is simple as I would have the group name OR the username. With username I query the users_main -> select group_name where username = $user.
Obviously the reverse for group name, or query the groups_main table that has all the groups listed (and their stats etc)
Then should I want to get all users for that group, I query user_groups table, select users_auth where group_name = $group_name
That returns my array with all the users for that group, and their auth level. One table row.

I take in what you say about future developments though, and am thinking I will make them on separate rows, although the way the site works this wont ever really be needed.
It is more easier from a programmers point of view to use one row for each user in a group, I just thought having one row per group with all users and auths in one field would be better when mysql is searching the table.

I still can't get my head around how having mysql search for 20 rows jumbled somewhere within 40,000 rows is better than searching 1,000 rows for one row.
Then simply having php unserialize an array, and use the array as needed with the username as the key. As I only need this on 2 pages, it's really not going to be used very much.

Ahh, complications when trying to carry out precise best practise.
I might just go and work in a supermarket Laugh

pr0gr4mm3r's picture

He has: 1,502 posts

Joined: Sep 2006

BUT, it is for xbox live, so people, 90% of the time, already have a group and members they want as it's friends they already know. Groups are formed from people who already know each other really.

I think you are missing the point. Matt is trying to say that it doesn't matter what it is being used for now. If you have an opportunity to make it run more efficient, why not take advantage of that, even if it's only a tiny improvement? That will translate to a big difference if your website gets a couple thousand members. Think of it this way - you said that people won't be changing groups often, but what if someone creates a croup one evening that everybody wants to be a part of? If a few thousand users change their group settings with that inefficient method of updating, it would bring the server to its knees.

Good discussion though. It really gives other people some perspective on some best practices.

You are going to kick yourself if your site grows, and you find your server CPU usage more than you can handle.

greg's picture

He has: 1,581 posts

Joined: Nov 2005

pr0gr4mm3r wrote:
If you have an opportunity to make it run more efficient, why not take advantage of that, even if it's only a tiny improvement?
That is exactly my thoughts, and why I asked in the first place.
I do think of the future potentials. I would love for the site to have 1 million users, and therefore I always code and setup DB's with that thought in mind, even though I know in reality I would be happy with a few thousand.
Even with a few hundred I want it to run as smoothly and efficiently as possible.
pr0gr4mm3r wrote:
you said that people won't be changing groups often, but what if someone creates a croup one evening that everybody wants to be a part of?
20 members per group maximum, and this will never be changed. "How do I know that wont change", because the nature of the site and way it works requires small groups.
Again, "best practice" depends on the circumstance.

pr0gr4mm3r wrote:

If a few thousand users change their group settings with that inefficient method of updating, it would bring the server to its knees.
The settings is a separate table for groups, and doesn't have a lot of columns, and single rows for each group.
What we are discussing is ONLY for storing all members of a group and each members pre-set auth level.
admin, supervisor and member.
The actual settings for these three are changed and stored in a separate table to the members and the auth they have.

pr0gr4mm3r wrote:
You are going to kick yourself if your site grows, and you find your server CPU usage more than you can handle.
Again, I think like that too, and is why I often ask these questions regarding databases and PHP code.

teammatt3's picture

He has: 2,102 posts

Joined: Sep 2003

Well, not really. Users can only be in one group at any one time. So should a user leave a group, the group_users table gets the user removed from the array and the users_main gets changed to "no". Then if they join another group, users_main is changed to their new group name and the new groups users_groups array has the new user added.

If a user can only be in one group and you know that will never change (I still don't think that is a safe assumption), then I think you could do something like the following. This might not seemed normalized because the Level is in the User table. It is normalized because Level is actually dependent on the user, not the group (because a user can only be part of one group, and it will always be that way).

tblGroup
- GroupID
- Name

tblUser
- UserID
- Username
- GroupID
- Level (this could reference another table [tblLevel], or have the Level value string like pr0g suggested).

I have a feeling you won't like this setup either because you have to query the User table to figure out which users are in which group, as opposed to "just" querying the Group table, parsing the big column with that serialized array, and looping through the data. But at least this way you can't complain about "extra" tables and columns Wink. It's still better (as in more maintainable, flexible and efficient) than your first setup.

Quote:

I still can't get my head around how having mysql search for 20 rows jumbled somewhere within 40,000 rows is better than searching 1,000 rows for one row.

Well no wonder! That doesn't make any sense! But it doesn't apply to your case. You're not just searching for one record, you query for that record, fetch it, unserialize it, and then loop through it.

greg's picture

He has: 1,581 posts

Joined: Nov 2005

Yeah, your suggestion is pretty much what I said in my last post when I started to think one row per users might be better as it loses the retrieving of arrays..

tblUserLevel
id -> Group Name -> User Name -> Auth Level

teammatt3 wrote:
Quote:

I still can't get my head around how having mysql search for 20 rows jumbled somewhere within 40,000 rows is better than searching 1,000 rows for one row.
You're not just searching for one record, you query for that record, fetch it, unserialize it, and then loop through it.

Understood, but the other way is still searching the db for data, fetching it and then using the resulting array in various ways in the php.
The only differences between the two methods are the total number of rows searched for, the number of rows in total to search within and unserialise().
Search one row from 1,000 and unserialise() VS search 20 rows from 40,000 rows

That again is of course only with 1,000 groups
With 50,000 groups it would be finding one row in 50,000 rows VS finding 20 rows from 1 million rows

Want to join the discussion? Create an account or log in if you already have one. Joining is fast, free and painless! We’ll even whisk you back here when you’ve finished.