How should password information be stored?

Plain text is NEVER an option

Last Christmas (2011), news broke (reported: 1, 2, 3) that over 6 million user credentials had been leaked from CSDN, a popular Chinese online community for programmers that stored user credentials in plain text. Because users reuse the same credentials on multiple websites, including social networks and personal email, this put 50 million user accounts at risk.

Since we never know when or how hackers might steal the user records in a database, storing user credentials in plain text is never an option, for any reason.

Encrypting passwords

As a good practice, user credentials should be encrypted before they are written to the database, so that even if the records are exposed, the credentials are less at risk. (Strictly speaking, the techniques below are one-way hashing rather than reversible encryption.) There are many ways to do this, namely using a hash function, adding a static salt, and adding dynamic information.

Hashing

Hashing a password converts it from a human-recognizable string (e.g. “QuoKa88$%”) into an unrecognizable one (e.g. “398db0fdd8a26857080a77fd4996377d”), so that hackers who steal the user credentials cannot read the actual password.

Hashing the password is the very first and most basic step in storing password information. Commonly used hash functions in PHP include sha1(), md5() and hash().

<?php

$password_hashed = md5($password);

?>
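As a minimal sketch (assuming PHP 7 or later, with the hypothetical helper names hashPlain() and checkPlain()), plain hashing is typically used like this: the hash is stored at sign-up, and a login attempt is hashed again and compared with the stored value. It uses hash() with the sha256 algorithm and hash_equals() for the comparison; as the next paragraph points out, hashing alone is not enough.

```php
<?php
// Plain (unsalted) hashing, for illustration only.
// hashPlain() and checkPlain() are hypothetical helper names.
function hashPlain(string $password): string
{
    // sha256 produces a 64-character hexadecimal string
    return hash('sha256', $password);
}

function checkPlain(string $attempt, string $storedHash): bool
{
    // Hash the login attempt again and compare in constant time
    return hash_equals($storedHash, hashPlain($attempt));
}

$stored = hashPlain('QuoKa88$%');            // value written to the database
var_dump(checkPlain('QuoKa88$%', $stored));  // bool(true)
var_dump(checkPlain('quoka88$%', $stored));  // bool(false)
```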

However, the PHP documentation clearly states that using these functions alone to secure passwords is not recommended, because they are fast enough to be brute-forced with the computing power of modern computers.
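For comparison, PHP versions released after this post (5.5 and later) include a dedicated password API, password_hash() and password_verify(), which uses a deliberately slow algorithm (bcrypt for PASSWORD_DEFAULT) and generates a random salt automatically. A minimal sketch, assuming PHP 5.5 or later:

```php
<?php
// Built-in password API (PHP 5.5+). The random salt and the algorithm
// settings are stored inside the returned string itself.
$stored = password_hash('QuoKa88$%', PASSWORD_DEFAULT);

// At login, password_verify() re-hashes the attempt using the embedded salt
var_dump(password_verify('QuoKa88$%', $stored));    // bool(true)
var_dump(password_verify('wrong-guess', $stored));  // bool(false)

// Hashing the same password again gives a different string (new random salt)
var_dump($stored === password_hash('QuoKa88$%', PASSWORD_DEFAULT)); // bool(false)
```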

Adding salt

One attack that breaks hashed passwords is the “dictionary attack”. Users tend to choose meaningful words as passwords so that they can remember them easily. Hackers collect millions of meaningful words together with their hashes to build a “hash dictionary”. When they obtain a hashed password, they simply look it up in the “hash dictionary” to find the original password.
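The lookup can be sketched in a few lines. This toy example assumes a tiny, made-up word list; real hash dictionaries contain millions of entries, but the principle is the same: the hash is used as a key to find the original word.

```php
<?php
// Toy "hash dictionary": precompute md5 hashes of common words,
// then reverse a stolen unsalted hash by a simple lookup.
$commonWords = array('password', 'letmein', 'dragon', 'monkey', 'sunshine');

$hashDictionary = array();
foreach ($commonWords as $word) {
    $hashDictionary[md5($word)] = $word; // hash => original word
}

// A hash taken from a leaked table of unsalted md5 passwords
$stolenHash = md5('sunshine');

echo isset($hashDictionary[$stolenHash])
    ? 'Recovered password: ' . $hashDictionary[$stolenHash]
    : 'Not in dictionary', "\n";
```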

Adding a salt makes the hashed password more resistant to dictionary attacks. A salt is an additional, randomly generated string added to the password so that the result is unlikely to exist in the “hash dictionary”.

There are two ways to add a salt to a password. The first is to add the salt to the original password before hashing, making it longer (increasing its complexity) and less likely to be found in an ordinary dictionary. The second is to add the salt to the hashed password, so that the stored value no longer matches any entry in the “hash dictionary”.

<?php
$salt_before_hash = "eGj1&E2%k@";

$salt_after_hash = "12fd53a3b";

$password_hashed = md5($salt_before_hash.$password)  . $salt_after_hash;
?>
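A minimal sketch of why the salt helps, reusing the static salt from the example above and a made-up word list: once the salt is mixed in before hashing, the salted hash no longer matches any precomputed entry. Note that a hacker who learns the static salt can rebuild the dictionary with it, which is the weakness discussed next.

```php
<?php
// The same kind of toy dictionary fails once a salt is mixed in before hashing.
$salt = 'eGj1&E2%k@'; // the example static salt from the text

$hashDictionary = array();
foreach (array('password', 'letmein', 'sunshine') as $word) {
    $hashDictionary[md5($word)] = $word;
}

$unsalted = md5('sunshine');
$salted   = md5($salt . 'sunshine');

var_dump(isset($hashDictionary[$unsalted])); // bool(true): found by lookup
var_dump(isset($hashDictionary[$salted]));   // bool(false): no precomputed match
```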

Adding dynamic information

Adding a salt, either before or after hashing, has one potential weakness: the salt is a static string (no matter how complicated it is), so hackers can identify it by comparing multiple hashed passwords, and users who share a password still end up with identical stored values. Once the table of user credentials is stolen, hackers may need only a few days (if not a few hours) to break the salt.

To fix this weakness, we can use dynamic information (such as the account creation date, record row ID, user ID, or a checksum of the password itself) that stays the same for a given user but differs from user to user.

<?php

$salt_before_hash = $userID . $date_user_account_created;

$salt_after_hash = dechex(crc32($password));

$password_hashed = md5($salt_before_hash . $password) . $salt_after_hash;

?>
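A minimal sketch of the scheme above wrapped in a function, to show that two users with the same password now get different stored values. The name hashWithUserInfo() and the sample IDs and dates are hypothetical. Note that the crc32 suffix is still identical for both users, since it depends only on the password.

```php
<?php
// "Dynamic information" scheme from the text, wrapped in a hypothetical function.
// $createdAt is assumed to be a date string such as '2011-12-25 10:00:00'.
function hashWithUserInfo($userID, $createdAt, $password)
{
    $saltBeforeHash = $userID . $createdAt;       // differs from user to user
    $saltAfterHash  = dechex(crc32($password));   // checksum of the password itself
    return md5($saltBeforeHash . $password) . $saltAfterHash;
}

// Two users who happen to choose the same password
$alice = hashWithUserInfo(1, '2011-12-25 10:00:00', 'sunshine');
$bob   = hashWithUserInfo(2, '2011-12-26 09:30:00', 'sunshine');

var_dump($alice === $bob); // bool(false): same password, different stored values
```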

Adding interference

Finally, we can add interference to the whole process to further secure the stored password. Common forms of interference include reordering (e.g. reversing) the characters of the string and truncating the hashed password to a sufficiently long length.

<?php

$salt_before_hash = substr($userID . $date_user_account_created, 5,10);

$salt_after_hash = dechex(crc32($password));

// Append the salt, keep only the last 25 characters, then reverse them
$password_hashed = strrev(substr(md5($salt_before_hash . $password) . $salt_after_hash, -25));

?>
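A minimal sketch wrapping the interference variant in a reusable function, plus a login check that recomputes the value and compares it with hash_equals(). The names storedValue() and verifyLogin(), as well as the user ID and creation date, are hypothetical sample values.

```php
<?php
// Interference variant as a hypothetical reusable function.
function storedValue($userID, $createdAt, $password)
{
    $saltBeforeHash = substr($userID . $createdAt, 5, 10);  // only part of the user info
    $saltAfterHash  = dechex(crc32($password));
    // Keep the last 25 characters, then reverse them
    return strrev(substr(md5($saltBeforeHash . $password) . $saltAfterHash, -25));
}

function verifyLogin($userID, $createdAt, $attempt, $stored)
{
    // The scheme is deterministic, so recompute and compare in constant time
    return hash_equals($stored, storedValue($userID, $createdAt, $attempt));
}

$stored = storedValue(42, '2011-12-25 10:00:00', 'QuoKa88$%');
var_dump(strlen($stored));                                               // int(25)
var_dump(verifyLogin(42, '2011-12-25 10:00:00', 'QuoKa88$%', $stored));  // bool(true)
var_dump(verifyLogin(42, '2011-12-25 10:00:00', 'wrong', $stored));      // bool(false)
```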

PHP isset() and multi-dimensional arrays

** The issue discussed in this post has been fixed since PHP 5.4.0, so the discussion and solution below apply to PHP 5.3.x and lower. Thanks to David for clarifying.

A few weeks ago I covered how to check whether an array element exists in PHP. In that post I explained why isset() is dangerous for checking the existence of array elements, and I proposed a better solution (the isset() + array_key_exists() method).

Today I’m going to discuss another strange (and dangerous) behavior that appears when isset() is used with multi-dimensional arrays.

The problem

Let’s consider this simple code:

<?php
$a = array('test' => 'ABC');
var_dump(isset($a['test']));                       //true
var_dump(isset($a['test']['non_exist']));          //true?!!
var_dump(isset($a['test']['non_exist']) || array_key_exists('non_exist', $a['test'])); //true again?!!!
?>

Surprised? isset() returns true for a non-existent element!

What’s even worse is that the previously proposed method (isset() + array_key_exists()) also gives the wrong result! Because isset() returns true for the non_exist element, the whole OR expression evaluates to “true”, and array_key_exists() is never evaluated.

The reason

So why does isset() return true for a non-existent element? I’m not sure of the exact reason, but I have a guess:

PHP first looks at $a['test']. Since $a['test'] does exist, isset($a['test']) returns true. Then PHP checks the second dimension: the 'non_exist' element. Because $a['test'] is a string, it can also be indexed like an array (in PHP, the characters of a string can be accessed by integer offset, e.g. $str[0]). String offsets must be integers, so the index ['non_exist'] is **converted** to an integer, which equals zero. So PHP is actually checking isset($a['test'][0]). Unfortunately, $a['test'][0] really does exist (with value 'A'), so the overall result of the check is “true”.

To verify this guess, let’s run this code:

<?php
$a = array(1 => '', 2 => 'ABC');
var_dump(isset($a[1])); //true
var_dump(isset($a[1]['t'])); //false => $a[1] is empty string, $a[1][0] doesn't exist
var_dump(isset($a[2])); //true
var_dump(isset($a[2]['t'])); //true => $a[2] is 'ABC', so $a[2][0] exists and equals 'A'.
?>

The result shows that my guess is quite reasonable.
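On PHP 5.4.0 and later the final check behaves differently, but the individual pieces of the explanation can still be observed. A small sketch, assuming a current PHP version:

```php
<?php
// The building blocks of the explanation, checked on a modern PHP version.
$a = array('test' => 'ABC');

var_dump($a['test'][0]);                  // string(1) "A": strings can be read by offset
var_dump((int) 'non_exist');              // int(0): a non-numeric string casts to 0
var_dump(isset($a['test'][0]));           // bool(true)
var_dump(isset($a['test']['non_exist'])); // bool(false) on PHP 5.4.0 and later
```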

The solution

You might say: OK, your guess seems right, but how do we fix it?

Usually, when we check for an element in a multi-dimensional array, we use something like

array_key_exists('non_exist', $a['test']); 

That would normally work… but if you do so in our case, you will get this warning:

Warning: array_key_exists() expects parameter 2 to be array, string given 

Unlike isset(), array_key_exists() does not treat a string as an array: its second parameter must be an actual array, so it complains.

So what’s the solution?

Complete array element existence checking function

Combining what I proposed in the previous post and in this one, I have worked out a function that checks whether an element exists in an array, regardless of the array’s dimensions:

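<?php
function elementExists($key, $array) {
    if (is_array($key)) {
        // Walk down every dimension except the last one
        $curArray = $array;
        $lastKey = array_pop($key);
        foreach ($key as $oneKey) {
            if (!elementExists($oneKey, $curArray)) return false;
            $curArray = $curArray[$oneKey];
        }
        // The last key only counts if its parent really is an array
        return is_array($curArray) && elementExists($lastKey, $curArray);
    } else {
        // Guard against strings/scalars: isset() on a string offset is misleading,
        // and array_key_exists() warns when given a non-array
        return is_array($array) && (isset($array[$key]) || array_key_exists($key, $array));
    }
}

$a = array(1, 2, 3, 4, 'dim1' => array('dim2' => array('dim3' => null)));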

//multi-dimension : check if $a['dim1']['dim2']['dim3']['dim4'] exists:
var_dump(elementExists(array('dim1', 'dim2', 'dim3', 'dim4'), $a)); //false

//multi-dimension : check if $a['dim1']['dim2']['dim3'] exists:
var_dump(elementExists(array('dim1', 'dim2', 'dim3'), $a)); //true

//single dimension : check if $a['dim1'] exists:
var_dump(elementExists('dim1', $a)); //true
?>

This code looks quite ugly, and its performance has not been evaluated. I think there are more elegant (and faster) ways to do the same thing, but since I’m in a hurry to complete my project, I’ll leave it as it is for now.
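One possible iterative variant, offered as an unbenchmarked sketch: the hypothetical elementExistsPath() always takes an array of keys (outermost first) and stops as soon as a level is not a real array, while keeping the isset() fast path with array_key_exists() as the fallback for NULL values.

```php
<?php
// Hypothetical iterative alternative; $keys lists the keys from the outermost dimension inward.
function elementExistsPath(array $keys, $array)
{
    foreach ($keys as $key) {
        // Every level must be a real array: strings and NULL stop the walk
        if (!is_array($array)) {
            return false;
        }
        // Fast isset() first, array_key_exists() only when isset() says no
        if (!isset($array[$key]) && !array_key_exists($key, $array)) {
            return false;
        }
        $array = $array[$key];
    }
    return true;
}

$a = array(1, 2, 3, 4, 'dim1' => array('dim2' => array('dim3' => null)));
var_dump(elementExistsPath(array('dim1', 'dim2', 'dim3', 'dim4'), $a)); // bool(false)
var_dump(elementExistsPath(array('dim1', 'dim2', 'dim3'), $a));         // bool(true)
var_dump(elementExistsPath(array('dim1'), $a));                         // bool(true)
```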

Comments are always welcome!

Really simple CSS compression function in PHP

Thanks to the simple syntax of CSS, we can easily compress CSS code with a PHP function of only three lines of code.

Since the compression removes all indentation and comments, it should be applied to production code only.

Here is the CSS compression function that I use in my projects.


function CSS_Compress(&$css){
    //new lines, multiple spaces/tabs/newlines
    $css = preg_replace('/[\r\n\t\s]+/s', ' ', $css);
    //remove comments
    $css = preg_replace('#/\*.*?\*/#', '', $css);
    //remove extra single spaces
    $css = preg_replace('/[\s]*([\{\},;:])[\s]*/', '\1', $css);
}
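A usage sketch for the function above, repeated here so that the snippet runs on its own. Note that CSS_Compress() takes its argument by reference and modifies it in place. One caveat: because spaces around “:” are removed, a descendant selector such as div :first-child would become div:first-child, which means something different, so it may be worth checking the output on your own stylesheets.

```php
<?php
// The function from the post, unchanged; it rewrites $css in place.
function CSS_Compress(&$css){
    //new lines, multiple spaces/tabs/newlines
    $css = preg_replace('/[\r\n\t\s]+/s', ' ', $css);
    //remove comments
    $css = preg_replace('#/\*.*?\*/#', '', $css);
    //remove extra single spaces
    $css = preg_replace('/[\s]*([\{\},;:])[\s]*/', '\1', $css);
}

$css = "body {\n    color: red; /* main text */\n}\n";
CSS_Compress($css);
echo $css, "\n"; // body{color:red;}

$nav = "a:hover, a:focus {\n\tmargin: 0 auto;\n}";
CSS_Compress($nav);
echo $nav, "\n"; // a:hover,a:focus{margin:0 auto;}
```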

Check if an array is associative or sequential (indexed) in PHP

In PHP, depending on the type of the keys in an array, the array can be identified as associative (if the key is a string like $arr[‘mykey’]) or sequential (some may call it indexed, i.e. if the key is an integer like $arr[1], $arr[2], etc.).

Although associative and sequential arrays share the same internal representation in PHP’s core (both are ordered maps), in some situations we still need to tell them apart. There is no built-in function to tell whether an array is associative or sequential, so we have to write our own.

The most popular method to check this is to compare the keys of an array with the result of range(), like this, this and this:


function isAssoc($arr){
	return array_keys($arr) !== range(0, count($arr) - 1);
}

The problem with using range() is that it only works when the keys of the array are consecutive integers starting from 0, i.e. {0, 1, 2, 3, 4, …}. If the array’s smallest key is 1 rather than 0, like {1, 2, 3, …}, or there is a “hole” in between, like {1, 2, 4, 5, 8, …}, the array will be treated as associative.

Whether this method makes sense depends on how you define the terms “sequential array” and “associative array”. Unfortunately, I didn’t find any “official” definition of them in PHP’s manual (as the distinction doesn’t matter to PHP). My personal definition is that if an array contains any non-integer key, it is not sequential. So an array with keys like {1, 2, 100, 1000, 1500} is still sequential. Following this logic, my method to check for associative / sequential arrays is this:


function is_asso($a) {
	foreach(array_keys($a) as $key)
		if (!is_int($key)) return TRUE;
	return FALSE;
}


This function checks each key in the array until it finds one that is not an integer; at that point it exits the loop and reports the array as associative. It handles sequential arrays with any starting index and any holes in between.
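A small sketch comparing the two checks on a few sample arrays. Two details worth knowing: PHP stores a numeric-string key such as '5' as the integer 5, so is_asso() treats it as sequential; and the range() version reports an empty array as associative, because range(0, -1) returns array(0, -1).

```php
<?php
// Both checks from this post, side by side.
function isAssoc($arr){
    return array_keys($arr) !== range(0, count($arr) - 1);
}

function is_asso($a) {
    foreach (array_keys($a) as $key)
        if (!is_int($key)) return TRUE;
    return FALSE;
}

$samples = array(
    'zero-based'         => array('x', 'y', 'z'),                  // keys 0, 1, 2
    'with holes'         => array(1 => 'x', 2 => 'y', 100 => 'z'), // integer keys with gaps
    'named key'          => array('mykey' => 'x', 0 => 'y'),       // one string key
    'numeric string key' => array('5' => 'x'),                     // stored as int 5
    'empty'              => array(),
);

foreach ($samples as $label => $arr) {
    echo $label, ': isAssoc=', var_export(isAssoc($arr), true),
         ', is_asso=', var_export(is_asso($arr), true), "\n";
}
```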

PHP – isset() vs array_key_exists() : a better way to determine whether an array element exists

The story

In the CourseYou project, we needed to check whether an element is set in an array, that is, to determine whether $Arr['MyElement'] exists.

So we started with the following code:

<?php 
if (isset($Arr['MyElement'])) { 
     ... do my stuff ... 
} ?> 

This code works fine, but only in most cases. In other cases (which actually occur quite often), using this code to check whether an array element exists can be very DANGEROUS.

What’s wrong with isset()?

isset() is perhaps one of the most frequently used language constructs, performing a very common task: determining whether a variable has been set. It is simple and, more importantly, it is FAST, very FAST. However, the result returned by isset() can sometimes be misleading.

According to PHP’s manual: isset() - Determine if a variable is set AND is not NULL

So the dangerous case for isset() is when the element does exist in the array but is set to NULL, i.e. $Arr['MyElement'] = NULL;. In this case, isset() always returns FALSE. Professional programmers should be aware of this.
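A minimal sketch of the case described above:

```php
<?php
// The NULL pitfall: the key exists, but isset() still reports FALSE.
$Arr = array('MyElement' => NULL);

var_dump(isset($Arr['MyElement']));            // bool(false)
var_dump(array_key_exists('MyElement', $Arr)); // bool(true)

// For a key that really is missing, both agree
var_dump(isset($Arr['Missing']));              // bool(false)
var_dump(array_key_exists('Missing', $Arr));   // bool(false)
```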

The right solution: array_key_exists()

The right way to check whether an element exists in an array is to use array_key_exists(). array_key_exists() tells whether the given key or index has been “created” in the array, regardless of the element’s value. So to tell whether the element 'MyElement' exists in the array $Arr, we should use this:

<?php if (array_key_exists('MyElement', $Arr)) { ... do my stuff ... } ?> 

Why does array_key_exists() still suck?

However, array_key_exists() still sucks. Yes, it’s more reliable than isset(), but it’s SLOW. We benchmarked the array_key_exists() and isset() methods as shown below and found that array_key_exists() is many times slower than isset().

To take advantage of isset()’s speed while keeping the reliable result of array_key_exists(), we combined the two. An element being set to NULL is usually rare, so most of the time isset() is still reliable. When isset() returns FALSE, we do an additional check with array_key_exists() to confirm that the key really doesn’t exist. It turns out that the code below works best:

<?php 
if (isset($Arr['MyElement']) || array_key_exists('MyElement', $Arr)) { 
      ... do my stuff ... 
} ?>


The beauty of PHP (and many other modern languages) is short-circuit evaluation: a conditional expression is not necessarily evaluated in full. The PHP engine evaluates isset() first; only if isset() returns FALSE does it go on to evaluate array_key_exists(). If isset() returns TRUE, array_key_exists() is never evaluated. That is why the order of the two conditions should not be reversed: the result would be the same, but the slow array_key_exists() would then run every time.
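The behavior described above, where array_key_exists() is skipped whenever isset() returns TRUE, can be observed directly. This sketch wraps array_key_exists() in a hypothetical counting helper, keyExistsCounted():

```php
<?php
// keyExistsCounted() is a hypothetical wrapper that counts how often
// array_key_exists() actually runs.
$calls = 0;
function keyExistsCounted($key, array $arr, &$calls)
{
    $calls++;
    return array_key_exists($key, $arr);
}

$a = array('a' => 1, 'e' => null);

$t = isset($a['a']) || keyExistsCounted('a', $a, $calls); // isset() is TRUE: wrapper skipped
$u = isset($a['e']) || keyExistsCounted('e', $a, $calls); // isset() is FALSE: wrapper runs
var_dump($t, $u, $calls); // bool(true) bool(true) int(1)
```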

Benchmarking

We ran a simple benchmark of isset(), array_key_exists() and the combined method, and the result for the combined method is very promising.

<?php 
$a = array('a'=>1,'b'=>2,'c'=>3,'d'=>4, 'e'=>null); 
$s = microtime(true); 
for($i=0; $i<=100000; $i++) { 
     $t= array_key_exists('a', $a); //true 
     $t= array_key_exists('f', $a); //false
     $t= array_key_exists('e', $a); //true 
} 

$e = microtime(true); 
echo 'array_key_exists : ', ($e-$s), "\n";

$s = microtime(true); 
for($i=0; $i<=100000; $i++) { 
     $t = isset($a['a']); //true 
     $t = isset($a['f']); //false
     $t = isset($a['e']); //false 
} 

$e = microtime(true); 
echo 'is_set : ', ($e-$s), "\n";

$s = microtime(true); 
for($i=0; $i<=100000; $i++) { 
     $t= (isset($a['a']) || array_key_exists('a', $a)); //true 
     $t= (isset($a['f']) || array_key_exists('f', $a)); //false
     $t= (isset($a['e']) || array_key_exists('e', $a)); //true 
} 

$e = microtime(true); 
echo 'isset() + array_key_exists : ', ($e-$s), "\n";
?> 

The benchmarking result (average):

  • array_key_exists() : 308 ms
  • is_set() : 4.7 ms
  • isset() + array_key_exists() : 217 ms

Latest Update: I have packaged this method into a single function and added support for checking element existence in multi-dimensional arrays. Please see my other post: A complete element existence checking function for PHP.

Equal (==), identical (===) and array comparison in PHP

Equal (==)

If you use equal (==), you are allowing type conversion, which means PHP will try to convert the two sides to the same type and then compare them. So even if the two sides are NOT the same thing, they MAY still be treated as the SAME.

Consider this code:

<?php 
$left = "C"; 
$right = 0; 
var_dump($left == $right); 
?> 

Output:

bool(true)

"C" equals 0?? The logic behind this is: $left is the string "C"; since it is compared to $right, which is a number, PHP first converts the string "C" to a number. "C" contains no numeric value, so it is parsed as 0, and this 0 is then compared to $right, which is 0. So, strange as it looks, the comparison result is logically "true". (This is the behavior of PHP 7 and earlier; since PHP 8.0, a non-numeric string compared with a number is compared as a string, so "C" == 0 is false.)

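The result of this comparison depends on the PHP version, because PHP 8.0 changed how non-numeric strings are compared with numbers. A small sketch showing that, together with the string-to-number conversion described above:

```php
<?php
// Loose vs strict comparison of a string and a number.
var_dump("C" == 0);   // bool(true) before PHP 8.0, bool(false) since PHP 8.0
var_dump((int) "C");  // int(0): the numeric conversion of "C"
var_dump("C" === 0);  // bool(false) on every version: the types differ
var_dump("5" == 5);   // bool(true): a numeric string is compared as a number
```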
Identical (===)

On the contrary, when identical (===) is used in the comparison, PHP does not do any type conversion. PHP first checks whether both sides are of the same type; if not, it simply returns false. If they are of the same type, it then compares the values to see if they are the same. So it should be no surprise that the output of the code below is "false":

<?php 
$left = "5"; 
$right = 5; 
var_dump($left === $right); 
?> 

Output:

bool(false)

What if they are Arrays?

Consider this code:

<?php 
$a = array('a'=>1, 'b'=>2, 'c'=>3);     //reference array
$b = array('a'=>1, 'b'=>2, 'c'=>3);     //equal and identical
$c = array('a'=>1, 'b'=>2);             //one element less
$d = array('a'=>1, 'b'=>100, 'c'=>3);   //one element has different value
$e = array('a'=>1, 'c'=>3, 'b'=>2);     //same key-value pairs but different sequence
echo '$a == $b is ', var_dump($a ==$b); 
echo '$a === $b is ', var_dump($a === $b); 
echo '$a == $c is ', var_dump($a ==$c); 
echo '$a === $c is ', var_dump($a === $c); 
echo '$a == $d is ', var_dump($a ==$d); 
echo '$a === $d is ', var_dump($a === $d); 
echo '$a == $e is ', var_dump($a ==$e); 
echo '$a === $e is ', var_dump($a === $e); 
?> 

Output:

$a == $b is bool(true) 
$a === $b is bool(true) 
$a == $c is bool(false) 
$a === $c is bool(false) 
$a == $d is bool(false) 
$a === $d is bool(false) 
$a == $e is bool(true) 
$a === $e is bool(false) 

So we conclude that:

  • When two arrays have the same key/value pairs, the same number of elements, and the elements in the same order, they are equal (==) and identical (===).
  • If one array has fewer elements than the other, they are neither equal (==) nor identical (===).
  • If one of the elements in an array has a different value, the two arrays are neither equal (==) nor identical (===).
  • If two arrays have the same elements in a different order, they are equal (==) but NOT identical (===).
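One more case worth a quick check, as a small sketch: === on arrays also compares the types of the values, so arrays whose values differ only in type (integer vs numeric string) are equal but not identical.

```php
<?php
// Same keys, same order, but the values of $f are strings.
$a = array('a' => 1, 'b' => 2);
$f = array('a' => '1', 'b' => '2');

var_dump($a == $f);  // bool(true): "1" == 1 after type juggling
var_dump($a === $f); // bool(false): int vs string values
```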

References:

  1. Type conversion during comparison in PHP (they call it type juggling): http://php.net/manual/en/language.types.type-juggling.php
  2. Type comparisons in PHP: http://php.net/manual/en/types.comparisons.php