As data volumes and transaction counts continue to grow, every organization is concerned about data management and security. Data is moving from on-premises systems to the cloud at an increasing pace, to reduce operational cost and to increase scalability. The biggest concern is the protection of personal data. Regulations and standards such as GDPR, PCI and HIPAA already govern data protection. Beyond these, personal data must also be protected or masked in non-validated environments such as training, development and testing. Multiple tools and technologies are available in the market for data masking and anonymization, both to prevent misuse and to meet statutory requirements.
Effective data protection process:
A data protection process encrypts the data and removes personally identifiable information (PII) from data sets, so that the people the data describes cannot be identified. Every organization sets its own methods for masking data, based on its data types and data sources. Below are the best practices to follow for data in transit and data at rest.
When data is in transit:
For data that is moving from point to point, whether on-premises or in the cloud, the original values are replaced with dummy values. This is also called dynamic data masking; it ensures that the end user cannot view the original data. Follow the process below for data in transit:
Figure 1: High-level data-masking ETL flow for data in transit
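As a rough illustration of the in-transit flow, the sketch below masks PII fields in each record while it moves between a source and a target. It is a minimal example under stated assumptions, not the method of any particular masking tool: the field names (`name`, `email`, `phone`), the masking rules and the secret key are all hypothetical, and a real pipeline would load the key from a secrets manager.

```python
import hashlib
import hmac
from typing import Any, Callable, Dict, Iterable, Iterator

# Hypothetical secret used to derive stable dummy values; not from the article.
MASKING_KEY = b"replace-with-a-managed-secret"


def mask_name(value: str) -> str:
    # Replace the whole value with a fixed dummy string.
    return "REDACTED"


def mask_email(value: str) -> str:
    # Keyed hash: the same input always maps to the same dummy address,
    # so joins still work, but the original cannot be read back.
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:10]}@example.com"


def mask_phone(value: str) -> str:
    # Keep the formatting and the last two digits; replace other digits with 'X'.
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - 2 else "X")
        else:
            out.append(ch)
    return "".join(out)


# Assumed mapping of PII field name -> masking rule.
MASKING_RULES: Dict[str, Callable[[str], str]] = {
    "name": mask_name,
    "email": mask_email,
    "phone": mask_phone,
}


def mask_record(record: Dict[str, Any]) -> Dict[str, Any]:
    # Build a new record; the source record is left unchanged.
    return {
        key: MASKING_RULES[key](value)
        if key in MASKING_RULES and value is not None
        else value
        for key, value in record.items()
    }


def mask_stream(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    # Transform step of the flow: extract -> mask -> load.
    for record in records:
        yield mask_record(record)
```

Passing the extracted rows through `mask_stream` before the load step means that only masked values reach the target environment. Non-PII fields such as identifiers pass through unchanged.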
When data is at rest:
PII at rest, whether on-premises or in the cloud, is mostly used for analytics or training purposes. It can be protected through a proper data access governance process, or by masking the data when a user sends a request through reports or SQL queries. Follow the process below when data is at rest:
Figure 2: High-level data masking for data at rest
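One possible way to mask data at rest at query time is to route non-privileged users to a masked view instead of the base table. The sketch below uses an in-memory SQLite database purely for illustration. The table, view, column and role names are hypothetical, and because SQLite has no user roles, the access check is done in application code; a production database would normally enforce it with grants or built-in masking policies.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical set of roles allowed to see unmasked PII.
PRIVILEGED_ROLES = {"data_steward"}


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, ssn TEXT)"
    )
    conn.execute(
        "INSERT INTO customers VALUES "
        "(1, 'Jane Doe', 'jane@corp.com', '123-45-6789')"
    )
    # The view exposes the same columns, with PII partially masked.
    conn.execute(
        """
        CREATE VIEW customers_masked AS
        SELECT id,
               substr(name, 1, 1) || '***' AS name,
               substr(email, 1, 1) || '***' || substr(email, instr(email, '@')) AS email,
               'XXX-XX-' || substr(ssn, -4) AS ssn
        FROM customers
        """
    )
    return conn


def fetch_customers(conn: sqlite3.Connection, role: str) -> List[Tuple]:
    # Governance decision: only privileged roles read the base table.
    # The source name comes from a fixed choice, never from user input.
    source = "customers" if role in PRIVILEGED_ROLES else "customers_masked"
    query = f"SELECT id, name, email, ssn FROM {source} ORDER BY id"
    return conn.execute(query).fetchall()
```

With this arrangement, a report run under an assumed `analyst` role would see values such as `J***` and `XXX-XX-6789`, while the stored data itself stays unchanged for roles that are authorized to view it.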
Purushottam Joshi
Senior Architect Data, Analytics & AI, Wipro
Purushottam has more than 20 years of data warehouse and ETL experience. He is currently focused on open-source integration technologies and has successfully executed large engagements for global companies. He is a TOGAF-certified Enterprise Architect. He is also certified in several database and ETL technologies and supports the practice in managing both cloud and on-premises native ETL tools.