Data storage is one of the most important aspects of software engineering.
Data needs to be stored for rapid access, but also securely and durably.
Some of the database options we use include row-based relational databases, column-based relational databases or data warehouses, NoSQL document databases, and blob stores.
The correct storage solution for your project will depend on your specific needs and goals.
Data needs to be available for immediate use.
There are many ways to cache data: configuring proper database indices, using external network caches such as Memcached or Redis, pre-computing data structures for later use, or preparing in-memory stores so that the next time a piece of information is needed, its retrieval time is negligible.
Intelligent caching is what separates applications that scale from those that don't.
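As a minimal sketch of that last idea, here is an in-process read-through cache in Python; the `load_user_profile` function is a hypothetical stand-in for an expensive database query, and the time-to-live value is illustrative:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=300):
    """Cache a function's results in memory for ttl_seconds."""
    def decorator(fn):
        store = {}  # key -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]          # cache hit: retrieval time is negligible
            value = fn(*args)          # cache miss: pay the expensive call once
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=60)
def load_user_profile(user_id):
    # Hypothetical stand-in for a slow database or network query.
    time.sleep(0.5)
    return {"id": user_id, "name": f"user-{user_id}"}

load_user_profile(42)   # slow: hits the "database"
load_user_profile(42)   # fast: served from the in-memory store
```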
Creating programs to process stored information has many use cases: calculating analytics to be delivered in a PDF report, cleaning low-quality events out of a large dataset and saving the result, or pre-computing a complex data structure that reduces the cost of information retrieval in other programs by several orders of magnitude.
These programs can improve performance and reduce infrastructure and server-provisioning expenses by allowing applications to hit their performance targets on less expensive hardware.
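As a minimal sketch of the event-cleaning case, assuming the raw data lives in a hypothetical newline-delimited JSON file called events.jsonl and that a "low quality" event is simply one missing the fields downstream jobs rely on:

```python
import json

def is_high_quality(event):
    """Keep events that have the fields downstream jobs rely on."""
    return bool(event.get("user_id")) and event.get("duration_ms", 0) >= 0

# Stream the raw events, drop the low-quality ones, and save the result
# so later jobs never pay the cost of filtering again.
with open("events.jsonl") as raw, open("events.clean.jsonl", "w") as clean:
    for line in raw:
        event = json.loads(line)
        if is_high_quality(event):
            clean.write(json.dumps(event) + "\n")
```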
Data loss is an unacceptable outcome.
Data needs to be securely backed up so that in the event of a node failure, or even a complete data center outage, your data is still available and ready for use.
When working with large datasets, it quickly becomes obvious that one of the major performance bottlenecks in processing that data is moving it around.
Whether it travels over the network or moves from disk into memory, the smaller the data is, the faster it can be processed.
Standard compression libraries such as gzip and LZ4 are very useful, but when the situation calls for it, additional options exist for even further compaction through binary serialization.
With this technique, we use bitwise operations to extract the most value we can out of each bit. Instead of using hundreds of bits to represent a value, it may be possible to use fewer than ten. This can reduce the size of a dataset by several orders of magnitude and has an incredible performance benefit. Let's discuss whether this is right for your business.
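As a minimal sketch of the idea (the record layout and field ranges here are illustrative assumptions, not a real format), bitwise packing in Python might look like this:

```python
import struct

def pack_reading(status, level, is_final):
    """Pack status (0-3, 2 bits), level (0-1023, 10 bits) and a flag (1 bit)
    into one 16-bit word, versus hundreds of bits for the same record as JSON."""
    assert 0 <= status < 4 and 0 <= level < 1024
    word = (status << 11) | (level << 1) | int(is_final)
    return struct.pack(">H", word)        # 2 bytes on disk or on the wire

def unpack_reading(data):
    """Recover the three fields with shifts and masks."""
    (word,) = struct.unpack(">H", data)
    return (word >> 11) & 0b11, (word >> 1) & 0b1111111111, bool(word & 1)

packed = pack_reading(status=2, level=873, is_final=True)
assert unpack_reading(packed) == (2, 873, True)
```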
Data security is important, and part of that is ensuring that wherever your data is, it's safe from those who shouldn't have it.
That's why, when the project calls for it, ensuring that data is encrypted in transit, at rest, and in backups is critical.
Any company dealing with sensitive information will want this functionality; for example, companies storing personally identifiable information (PII), such as those handling health data subject to HIPAA compliance regulations.
Let's make sure that your data is completely unreadable until an entity with the proper access credentials requests read access.
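As a minimal sketch of encryption at rest, assuming the third-party Python `cryptography` package; in practice the key would come from a secrets manager or KMS rather than being generated inline:

```python
from cryptography.fernet import Fernet

# In production the key lives in a secrets manager or KMS,
# never alongside the data it protects.
key = Fernet.generate_key()
fernet = Fernet(key)

record = b'{"patient_id": 1234, "visit_notes": "example plaintext"}'
ciphertext = fernet.encrypt(record)        # what actually lands on disk and in backups

# Only a caller holding the key can turn the ciphertext back into plaintext.
assert fernet.decrypt(ciphertext) == record
```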