Grouping individual user data in one table

greg's picture

He has: 1,581 posts

Joined: Nov 2005

Basically, each member on my site has a profile page, and members can leave comments on other members profiles.
So in my table that stores the comments, there can be multiple entries for the same user.

When someone loads a profile page, a DB query goes in the table and displays all comments for that user. (select * where user_to = $user)
But there could be massive amounts of data for numerous members, and all mixed up throughout the table.

Is there any way to somehow group each users data so when doing a select to get this data is speeds it up?

Another question.
I have the need to store in 40 fields of data in a table. I need 20 of these tables.
70% of the fields will be yes or no, the other 30% varchars of about 12 chars average
Is that ok or is it better to make three tables with 13 fields in each?

So splitting them would mean 60 tables with 12 fields.
Or is 20 tables with 40 fields ok/better.

blater's picture

They have: 2 posts

Joined: May 2008

This is what an index will do. Relational databases have very efficient index structures. If you ensure you have an index on user_to and your table has 10,000 rows, then instead of having to check all 10,000 rows individually it may only have to check 4 or 5 rows, as it is (depending on the low-level implementation of the index) navigating down a tree to the right bunch of data instead of stupidly checking every row one by one.
The second question cannot be answered without some context. Are the fields of data logically related? Do they group together naturally into a table?
Do you think that splitting into small tables will help performance? - the answer to that is likely to be no.

greg's picture

He has: 1,581 posts

Joined: Nov 2005

I read that indexing can result in performance drop in certain circumstances as MYSQl has to create, update and delete data to the index list it stores.
So it's six for one method and half a dozen for the other.
I presume that would of course depend on the data stored, how many rows in total and how many rows are actually to be grouped.

As for the second question.. the rows are not related to each other at all within the table as each row is for one user only.
The data between fields within one row is obviously not related as it's different data.

teammatt3's picture

He has: 2,102 posts

Joined: Sep 2003

If "user_to" is part of the primary key, it is already being indexed.

If a table is being written to a lot more than it's being read from, an index isn't really a great thing to have. Like you said, the index has to be updated every time a row is inserted or deleted.

An index works best on columns with a high level of selectivity. That is, the uniqueness of the data in that column. A primary key is always indexed because it is alway unique. To find the selectivity of the column, divide the the distinct values contained in the data set by the total number of records in the data set. The larger the quotient, the better an index will perform on that column.

I'm a little concerned that you're prematurely optimizing your SQL and database structure. It's good to keep in mind what performance problem you may have in the future, but throwing indexes on every column because you can, is not always the best idea, and it could backfire on you.

P.S. I recommend you get a copy of Pro MySQL by Michael Kruckenberg and Jay Pipes. It's a great read, and a great reference to have. It has a chapter on profiling and that's something you'll want to do so you know what you're doing to the structure is increasing the performance.

JeevesBond's picture

He has: 3,956 posts

Joined: Jun 2002

Another question.
I have the need to store in 40 fields of data in a table. I need 20 of these tables.
70% of the fields will be yes or no, the other 30% varchars of about 12 chars average
Is that ok or is it better to make three tables with 13 fields in each?

The only meaningful answer to this is: normalise it! As teammatt3 pointed out, it seems like you're prematurely optimising, moreover you're getting paranoid about what's right and wrong.

Generally, if you're going to sort by a field often, index it! And for the future: ensure you've got the MySQL slow log switched on, that can help you find queries which are running slowly.

As a rough guide, 40 fields of booleans sounds rather a lot. Sounds like you need a 'settings' table with a many-to-many relationship with your users table. Also, throwing around lots of tables is a bad sign. 20 tables? For what? Each table should relate to something, and should never be arbitrarily split to reduce the number of fields in each one.

a Padded Cell our articles site!

greg's picture

He has: 1,581 posts

Joined: Nov 2005

Thanks for the info peeps.

I read that article on normalisation before (that exact one actually), and really don't see I can do anything else.

Ok, I seem to have two topics in this thread (my bad).
So I've continued the discussion of the 20 tables and 40 fields in another thread.
I used the members area as I provided more info that I'd like to keep a little more private...

http://www.webmaster-forums.net/members-only/continuedgrouping-individua...

.
.
.

Continuing this thread about having multiple rows in one table for the same USERNAME.

Say member "John" has had 3 other members comment on his profile page.
So in table "profile_commnets" John has 3 rows.

Take a 1,000 members with an average of 3 comments per member, that's 3,000 rows in table.
$username = John;
When I query - select * where username=$username - I only want 3 rows, but mysql has to tral through 3,000

John's 1st comment might be row id 100, then his second is row id 1,200, then his third is row id 2,500
I was wanting a solution to avoid mysql having to trawl through 3,000 rows to find only three. But not knowing how indexes work I don't know if there will be any benefit, and dont know any other solutions.

As I understand it, an index would take the username "John" (as the index reference) and relate that reference to the ID in the table profile_comments, so "100" "1,200" and "2,500" (Johns comments row ID's)

So it goes into it's index list and gets all for "John". But there are 1,000 other members with comments stored in this index, so surely that wouldn't help any. Yes it's a third less, but with the having to search index, then goto actual table with found ID's and the fact it has to update/delete the index I don't see whre it would help.

Sites such as forums must have tens of thousands of rows with duplicate username entries for each post they make.
Like I have 360 rows in this sites DB, that has to be queried when I select "my topics".

Perhaps I just have to simply store them all and let mysql do its job?

They have: 4 posts

Joined: May 2008

I am not sure if i am understanding you correctly and i am by no means an expert on Databases or profile type site but:
comment table is auto incremented and numbered maybe 3 fields:
Comment_number, COmment, Comment_added_by_user

users table contains additonal fields:
user_comments--> containing comma seperated values of each of the users comments that they made (Ex. 1,1000,3500)
comments_by_other_users--> containing comma seperated values of each of comments that were posted by other users

Now when a users profile is loaded you can fetchrow or whatever for just those specific comments and no need to search the DB. Same way When you want to show the comments_by_other_users. Just seperate the comma seperated value with a script and get those comments directly. ALso you can get Comments_added_by_user if you want to show what user posted what comment.
I hope this helps.

They have: 76 posts

Joined: Jul 2008

To present things in groups, we first rely on SQL to group the data before it.This is done with a GROUP BY or ORDER BY clause.

greg's picture

He has: 1,581 posts

Joined: Nov 2005

Yes, but group by doesn't change the amount of rows mysql will query. I was wanting to minimise the actual searching for.
So instead of searching through 100,000 rows, I could group things within the DB structure, so it only needed to search thorugh say 20,000 rows for a specific group of things.

Thanks anyway

Want to join the discussion? Create an account or log in if you already have one. Joining is fast, free and painless! We’ll even whisk you back here when you’ve finished.