I have covered how to prevent the same account from being used by multiple users at the same time. The natural next question is how to prevent the same user from registering multiple accounts.
There are many reasons why we don’t want one user holding multiple accounts. With several accounts, a user can vote themselves up, create fake discussions and opinions, consume extra resources on the website, and so on. So we want to ensure just one account per user.
This is usually done by charging users for creating accounts, or by verifying users through real-name identifiers (such as a personal identification number or a business registration number).
Charging users for creating accounts reduces duplicated accounts dramatically. When money is involved, people are very careful and thoughtful (or “mean”, if you like), so they prefer to stick with one account. However, there is one exception: if the user’s benefit from holding multiple accounts is greater than the cost of creating the extra accounts, they will happily pay for it.
Verifying real-name identifiers is the most reliable way to prevent duplicated accounts, but it is also the most difficult to put into practice. There is no universal standard for real-name identifiers: each country has its own, so unless your website targets only one or two countries, maintaining the format of every identifier is impractical. Even if you do, you take on a huge workload, because checking that an identifier is in the right format and that it really belongs to a real entity is not easy and usually involves manual processing. Even worse, users can easily defeat the check by borrowing identifiers from others (usually family members or friends) to create multiple accounts.
So far I have not seen any website that does well at preventing users from holding multiple accounts.
Detect the duplicated accounts
Since we cannot completely stop users from holding multiple accounts, we have to work hard to keep the situation from getting worse. We need to find the duplicated accounts (i.e. accounts held by the same user) and, more importantly, do this on a regular basis.
The process of finding duplicated accounts is based on one assumption: two accounts held by the same user always have something in common. For example, they may show the same name (not the user name, but the name of the account holder), the same email address, the same password, the same address, etc. What we need to do is quantify the level of similarity between two accounts, pick out the pairs with high similarity, and then investigate those suspicious accounts further.
The way to quantify similarity varies with the nature of the website and the account information it collects. From my personal experience, I prefer this algorithm:
1. The value of similarity between two accounts is initialized as ZERO.
2. Go through the checks on each item listed below:
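Password: A user who holds multiple accounts usually reuses the same password for all of them, to avoid remembering too many passwords. So if two accounts have the same password (or the same hashed password string), Similarity += 30; note that comparing stored hashes only works if identical passwords produce identical hashes, which is not the case when each password is hashed with its own salt.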
Email address: If the email addresses of two accounts are exactly the same, Similarity += 100; it is almost 100% certain that these accounts belong to the same user. However, the user may not register with the same email address; instead, they may use another address with a different domain name (i.e. the string after @) but the same local-part (i.e. the string before @). So if the two email addresses share only the local-part (e.g. jsmith@example.com vs. jsmith@example.org), Similarity += 40;
Mailing Address, Phone Number, Answer to forgotten-password questions: each one that is the same for both accounts adds 40 to the Similarity.
UserID or User name: Count the matching characters with PHP’s similar_text(), divide by the length of the longer name, multiply by 100 to get a percentage, and add the rounded result to the Similarity. That is: Similarity += round(similar_text(UserID1, UserID2) / max(strlen(UserID1), strlen(UserID2)) * 100)
Date of birth, Sex: if they match, add 10;
Last log-in IP: if the IPs of the last log-in of the two accounts are the same, add 40;
Last log-in User-Agent string: if they are the same, add 10;
3. Finally, if the Similarity is greater than or equal to 50, highlight the two accounts.
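The checklist above can be sketched in Python. This is a minimal sketch under several assumptions, not the author’s actual code: the `Account` field names are hypothetical, empty values never count as a match, emails are compared case-insensitively, an exact email match scores 100 instead of 100 + 40, and date of birth and sex are assumed to add 10 each. PHP’s similar_text() has no Python equivalent, so `similar_text` below re-implements the same idea (take the first longest common substring, then recurse on the pieces to its left and right). It works on Unicode characters rather than bytes, so results may differ from PHP for non-ASCII names.

```python
import math
from dataclasses import dataclass
from typing import Optional


def similar_text(s1: str, s2: str) -> int:
    """Count matching characters like PHP's similar_text(): find the longest
    common substring, then recurse on the parts to its left and right."""
    if not s1 or not s2:
        return 0
    best, pos1, pos2 = 0, 0, 0
    for i in range(len(s1)):
        for j in range(len(s2)):
            k = 0
            while i + k < len(s1) and j + k < len(s2) and s1[i + k] == s2[j + k]:
                k += 1
            if k > best:  # strict '>' keeps the first longest match found
                best, pos1, pos2 = k, i, j
    if best == 0:
        return 0
    return (best
            + similar_text(s1[:pos1], s2[:pos2])
            + similar_text(s1[pos1 + best:], s2[pos2 + best:]))


@dataclass
class Account:
    # Hypothetical field names; map them to your own user table.
    user_id: str
    email: str
    password_hash: str = ""
    mailing_address: Optional[str] = None
    phone: Optional[str] = None
    security_answer: Optional[str] = None
    date_of_birth: Optional[str] = None
    sex: Optional[str] = None
    last_login_ip: Optional[str] = None
    last_login_user_agent: Optional[str] = None


def same(a, b) -> bool:
    # Assumption: empty or missing values never count as a match.
    return bool(a) and a == b


def similarity(a: Account, b: Account) -> int:
    score = 0  # step 1: start at zero
    # Only meaningful if identical passwords give identical hashes (no per-user salt).
    if same(a.password_hash, b.password_hash):
        score += 30
    # Exact email match, otherwise the same local-part (text before the last '@').
    ea, eb = a.email.lower(), b.email.lower()
    if same(ea, eb):
        score += 100
    elif same(ea.rpartition("@")[0], eb.rpartition("@")[0]):
        score += 40
    for field in ("mailing_address", "phone", "security_answer"):
        if same(getattr(a, field), getattr(b, field)):
            score += 40
    # Matching characters as a percentage of the longer user name, rounded
    # half up like PHP's round() (Python's round() rounds half to even).
    longest = max(len(a.user_id), len(b.user_id))
    if longest:
        score += math.floor(similar_text(a.user_id, b.user_id) / longest * 100 + 0.5)
    for field in ("date_of_birth", "sex"):  # assumed: 10 per matching field
        if same(getattr(a, field), getattr(b, field)):
            score += 10
    if same(a.last_login_ip, b.last_login_ip):
        score += 40
    if same(a.last_login_user_agent, b.last_login_user_agent):
        score += 10
    return score


def is_suspicious(a: Account, b: Account, threshold: int = 50) -> bool:
    # Step 3: highlight the pair when the score reaches the threshold.
    return similarity(a, b) >= threshold


a = Account("john_smith", "jsmith@example.com", password_hash="h1",
            last_login_ip="203.0.113.7")
b = Account("john_smith2", "jsmith@example.org", password_hash="h1",
            last_login_ip="203.0.113.7")
print(similarity(a, b), is_suspicious(a, b))  # → 201 True
```

In this example the pair scores 30 (password) + 40 (email local-part) + 91 (user name, 10 of 11 characters) + 40 (last log-in IP). Note that a very similar user name alone can push a pair past 50; that follows directly from the weights in the list.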
Don’t ask me where the score for each item comes from; the scores are based purely on my experience. My only concern is whether they work, and fortunately they work quite well in my case. You should adjust the scores and the checked items to fit your own project. Even so, the detection will always produce some false alarms, so highlighted accounts should go through manual screening and confirmation.
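Because the detection should run regularly and its output goes to a human reviewer, it helps to turn the pairwise scores into a ranked review queue. A minimal sketch, assuming accounts are held in a dict keyed by account ID and that `score` is any pairwise function implementing the checklist above; `toy_score` is a hypothetical stand-in used only so the block runs on its own.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def review_queue(
    accounts: Dict[str, dict],
    score: Callable[[dict, dict], int],
    threshold: int = 50,
) -> List[Tuple[int, str, str]]:
    """Score every pair of accounts and return the pairs at or above the
    threshold, highest score first, for manual screening."""
    flagged = []
    for (id_a, acc_a), (id_b, acc_b) in combinations(accounts.items(), 2):
        s = score(acc_a, acc_b)
        if s >= threshold:
            flagged.append((s, id_a, id_b))
    # Most suspicious pairs first; ties broken by account IDs for stable output.
    flagged.sort(key=lambda t: (-t[0], t[1], t[2]))
    return flagged


def toy_score(a: dict, b: dict) -> int:
    # Hypothetical stand-in scorer: exact email +100, same last IP +40.
    s = 0
    if a["email"] == b["email"]:
        s += 100
    if a["ip"] == b["ip"]:
        s += 40
    return s


accounts = {
    "u1": {"email": "jsmith@example.com", "ip": "203.0.113.7"},
    "u2": {"email": "jsmith@example.com", "ip": "198.51.100.2"},
    "u3": {"email": "alice@example.net", "ip": "203.0.113.7"},
}
print(review_queue(accounts, toy_score))  # → [(100, 'u1', 'u2')]
```

Comparing every pair is O(n²), which may be acceptable for a periodic batch job on a small site. Larger sites would likely need to compare only accounts that share some exact value (for example an email local-part or last log-in IP), at the cost of missing pairs linked only by similar user names.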
Again, we are not experts in web security; we are just discussing what we think is possible. Comments are always welcome.