
Why shouldn’t I store SSNs in databases?

Monday, May 4, 2009 posted by Taz Lake

I get this question a lot more than I would expect.  Clients, students and even developers still have many misconceptions about what is and is not acceptable when storing sensitive data in web applications.  This is particularly problematic for small and medium-sized businesses that may lack the resources or expertise to put appropriate security mechanisms in place, and especially for those where capturing SSNs is a necessary part of doing business.

Almost all of us have personally identifiable information stored in a database somewhere on the Internet.  Much of it is even published openly, on social networking sites like Facebook or Twitter.  For consumers, however, the real litmus test of data sensitivity is whether the information can be used to steal their identity.  Regulations and standards such as HIPAA and PCI DSS exist to help protect this kind of data.

Your web host is PCI compliant.  You’re using Zen Cart, osCommerce or another COTS e-commerce solution.  Your database is MySQL and SSL protects data in transit.  By all practical measures, your e-commerce environment is secure.

Still, suppose a compromise does occur.  No one can steal your customers’ identities simply by learning their names and addresses; anyone can find those via the white pages, Google or any number of other sources, or buy them from a list provider.  To steal my identity, an attacker would also have to know something unique about me.  In fact, they may need several unique pieces of information to do it effectively (my billing history, my SSN, my mother’s maiden name, etc.).  Credit reports with billing information can be obtained, and the brokering, and compromise, of SSNs has been around for a while… maybe you’ve heard of the ChoicePoint debacle?

SSNs are the worst case, because once a database is compromised and SSNs are out in the open, they are very difficult to change.  When a piece of data like a credit card number is compromised, the problem can be contained: cancel the number, reverse the charges and open a criminal investigation.  A compromised SSN cannot easily be changed, so it may be exploited until the criminal is caught, and the criminal may also sell it to others.

The risk to any company collecting personal data is enormous, and greater still when collecting SSNs.  How much of that risk you can shift depends on the process used for collection and on whether the data is stored at all.  If you must store SSNs, well-known techniques such as AES encryption can protect them.  These are built into some databases or can be coded fairly easily.
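
A minimal sketch of application-side AES encryption, assuming Python and the third-party `cryptography` package (it is not part of the standard library).  It uses AES-256 in GCM mode; the function names and the in-line key generation are illustrative only, and a real system would load the key from a store kept apart from the database.  Because each call draws a fresh random nonce, encrypting the same SSN twice yields different ciphertexts, so stored values cannot simply be matched against pre-encrypted guesses as long as the key stays secret.

```python
import os

# Assumption: the third-party "cryptography" package is installed (pip install cryptography).
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the usual size for AES-GCM


def encrypt_ssn(key: bytes, ssn: str) -> bytes:
    """Encrypt an SSN with AES-GCM and return nonce || ciphertext || tag."""
    nonce = os.urandom(NONCE_SIZE)  # must never repeat for the same key
    return nonce + AESGCM(key).encrypt(nonce, ssn.encode("ascii"), None)


def decrypt_ssn(key: bytes, blob: bytes) -> str:
    """Split off the nonce, verify the authentication tag and return the SSN."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("ascii")


# Hypothetical key handling: generated here only for the demo.
key = AESGCM.generate_key(bit_length=256)
blob = encrypt_ssn(key, "123-45-6789")
print(decrypt_ssn(key, blob))  # prints 123-45-6789
```

The nonce is stored in front of the ciphertext because it is not secret but is needed again for decryption; the key is what must be kept away from the data.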

Unfortunately, viewing the clear text after the initial encryption requires the developer to use reversible encryption, which means the key has to be available to the application.  This mechanism can also be attacked, for example by compromising the password for the administrative interface where SSNs are viewed.  If an authorized human can view it, a hacker could view it too.

For data with a well-known pattern, like an SSN, a brute-force or dictionary attack is also possible if the algorithm is known or can be guessed and the attacker can reproduce the transformation (for example, because it is an unkeyed hash or the key is stored alongside the data).

Let’s say a hacker, we’ll call him Bob, compromises your database and gains access.  You have encrypted the data using a built-in algorithm.  Good for you: the clear-text identifying data is not in the open… yet.  However, Bob knows that only a few mechanisms are commonly used for encryption (AES, for example), and he also knows there is a limited number of possible SSNs.  If the transformation is deterministic and he can reproduce it, Bob can write a program that runs through every possibility, encrypts each with the likely algorithms, and compares the results with the strings in your database.  Each match tells Bob an SSN, because he knows the clear text he started from.  And because the SSNs are stored in the same database, Bob also has the matching names and addresses.  Obviously, if the SSNs are stored in clear text, Bob has a much easier time.
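
Bob’s matching attack can be sketched in a few lines.  This is a hedged illustration, not a real attack tool: it uses an unkeyed SHA-256 hash as a stand-in for any deterministic transformation Bob can reproduce, and `unsalted_digest` and `brute_force` are hypothetical names.  An SSN has nine digits, so there are at most one billion candidates; to keep the demo fast, it assumes Bob already knows the first five digits and searches only the last four (10,000 candidates).

```python
import hashlib


def unsalted_digest(ssn: str) -> str:
    """Deterministic, unkeyed SHA-256 of an SSN: the weak scheme Bob can attack."""
    return hashlib.sha256(ssn.encode("ascii")).hexdigest()


def brute_force(target_digest: str, area: str, group: str):
    """Try every serial number 0000-9999 for a known area/group prefix.

    A full attack would loop over all nine-digit strings; fixing the first
    five digits keeps this demo to 10,000 guesses.
    """
    for serial in range(10_000):
        candidate = f"{area}-{group}-{serial:04d}"
        if unsalted_digest(candidate) == target_digest:
            return candidate  # Bob knows the clear text he started from
    return None


stolen = unsalted_digest("123-45-6789")  # value Bob finds in the database
print(brute_force(stolen, "123", "45"))  # prints 123-45-6789
```

Searching all one billion nine-digit strings is the same loop with a bigger range, a modest job for ordinary hardware when the hash is fast, which is why the transformation must depend on something Bob does not have.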

So, how do we defeat Bob the hacker?  Here are some suggestions, and more are welcome:

  • Use an API from a credit reporting agency.  This shifts the risk to the credit reporting agency because the SSNs are never stored.  You can still do a credit check based on information entered by the user.
  • Add “salt” to the original SSN string if you are storing an encrypted version.  This can be done at the time of encryption, and with a strong salt value that the attacker cannot obtain, a dictionary attack will no longer finish in a feasible amount of time.
  • Set up a separate encrypted database for SSNs.  This keeps the data apart from your main e-commerce system and allows additional security measures to be put in place.
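
One way to get the effect of a strong salt is a keyed hash.  This is a hedged sketch, not the author’s prescribed method: it computes HMAC-SHA256 over the SSN’s digits with a secret key (often called a “pepper”) that is assumed to live outside the database, and `PEPPER`, `ssn_fingerprint` and `ssn_matches` are hypothetical names.  Without that key, Bob’s loop has nothing to compare against.  A random per-record salt stored in the same row would help less, since Bob would read it along with the hash and the SSN space is small.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key ("pepper"). Generated here for the demo; a real system
# would load it from a key store or environment kept outside the database.
PEPPER = secrets.token_bytes(32)


def ssn_fingerprint(ssn: str, key: bytes = PEPPER) -> str:
    """Return a keyed HMAC-SHA256 of the SSN's nine digits, as hex."""
    digits = "".join(ch for ch in ssn if ch in "0123456789")  # drop dashes/spaces
    if len(digits) != 9:
        raise ValueError("an SSN must contain exactly nine digits")
    return hmac.new(key, digits.encode("ascii"), hashlib.sha256).hexdigest()


def ssn_matches(ssn: str, stored: str, key: bytes = PEPPER) -> bool:
    """Compare a candidate SSN with a stored fingerprint in constant time."""
    return hmac.compare_digest(ssn_fingerprint(ssn, key), stored)


stored = ssn_fingerprint("123-45-6789")
print(ssn_matches("123456789", stored))  # prints True
```

The fingerprint is one-way, so it suits matching and duplicate checks; if staff must see the SSN itself, reversible encryption with a separately stored key is still required.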

The good news is that hackers don’t often waste their time on small and medium-sized companies, unless they are small to medium-sized hackers.  The prize is simply not big enough.  Unfortunately, automated scripts help hackers find exploitable vulnerabilities anywhere, including on your web site.  Follow some of the suggestions above and you can feel more comfortable conducting business online where SSNs are required as part of the transaction.