Myth: Because of digital technology and cloud computing, businesses no longer generate documentary trash or waste from storing information. Organizations supposedly do away with the piles of discarded multimedia DVDs and Blu-rays, invoices, contracts, reports, proposals, budgets, and business correspondence.
In reality, waste happens even with digital technology. People create several kinds of data waste: unnecessary data that takes up storage space, unsorted data that could be useful but is forgotten (and difficult to locate), duplicate data, and data intended for certain users that goes underused or entirely unused. This data waste is costly, but it can be addressed with the following best practices.
Organizations in the business of data gathering and analytics should ensure efficiency in the way they store, manage, and discard data. AI and machine learning developers, in particular, need an efficient way to classify and manage data as they constantly collect and analyze a variety of information. There has to be a system that makes it easy to locate, retrieve, and subsequently delete data to free up storage space. Without one, organizations end up with storage redundancy, the continued storage of unneeded or unwanted data, and difficulties in locating data.
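As a minimal sketch of the "easy to locate, retrieve, and delete" idea, the snippet below scans a directory tree for files untouched beyond a retention window. The path and the one-year cutoff are placeholder assumptions, not recommendations.

```python
import time
from pathlib import Path

RETENTION_DAYS = 365  # hypothetical policy: flag files untouched for a year


def find_stale_files(root: str, retention_days: int = RETENTION_DAYS):
    """Return files not modified within the retention window."""
    base = Path(root)
    if not base.exists():
        return []
    cutoff = time.time() - retention_days * 86400
    return [p for p in base.rglob("*")
            if p.is_file() and p.stat().st_mtime < cutoff]


# Review candidates before deleting -- stale does not always mean unneeded.
for stale in find_stale_files("/srv/archive"):  # hypothetical path
    print(f"candidate for deletion: {stale}")
```

A real cleanup job would log its decisions and respect legal holds before deleting anything.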
There are different approaches to handling data, such as data warehousing and data lakes, as well as various storage, management, and analytics solutions, including Druid, ClickHouse, Cassandra, Prometheus, and Elasticsearch. These approaches and solutions have different pros and cons, so it is important to evaluate them carefully.
In-depth comparisons or guides, like this article about Apache Druid vs. ClickHouse, can be useful in picking the right tools and strategies to implement. Different organizations have different needs, and different data storage and analytics solutions have varying functions and features. It is important to make sure the chosen solution matches an organization's specific requirements.
ROT refers to data that is redundant, obsolete, and trivial. According to IT management firm ManageEngine, at least 30 percent of the data in organizations can be considered ROT. This presents a major data management challenge: ROT not only adds unnecessary storage costs; it also makes it difficult to find and use specific data when it is needed.
All existing data should be examined to determine whether it should still be kept or permanently erased. The remaining useful or potentially useful data can then be inventoried and classified or cataloged. If it is difficult to tell whether a particular set of data should be deleted, it can be given its own category or storage location to be revisited later.
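The keep/review/delete triage described above can be sketched as a simple rule function. The thresholds and criteria here are illustrative assumptions; real criteria would come from an organization's own retention policy.

```python
from enum import Enum


class Verdict(Enum):
    KEEP = "keep"
    REVIEW = "review"   # unclear -- park it and revisit later
    DELETE = "delete"


def triage(age_days: int, duplicate_copies: int, last_access_days: int) -> Verdict:
    """Hypothetical triage rules for a single data item."""
    if duplicate_copies > 0:
        return Verdict.DELETE   # redundant: another copy exists
    if age_days > 365 * 7:
        return Verdict.DELETE   # obsolete under our assumed 7-year horizon
    if last_access_days > 365:
        return Verdict.REVIEW   # possibly trivial; hard to decide now
    return Verdict.KEEP
```

Items landing in `REVIEW` correspond to the "give them their own category or storage location" step above.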
Having an efficient data management system, however, is not just about hardware and software. One crucial component is the people who create, use, and manage an organization's data. They need to be properly oriented or trained on the roles they play in eliminating and preventing ROT data.
Accenture says that nearly 80 percent of enterprise data is unstructured, meaning it follows no predefined model or logical classification. Different kinds of data for different uses end up stored in various locations arbitrarily. Some employees may apply some form of sorting or organization, but the schemes they use are inconsistent.
The lack of organization or storage structure is one of the biggest reasons data becomes redundant and difficult to locate. Redundancy wastes storage space both on-premises and in the cloud, and combing through collections of files to find specific data wastes computing power as well as unnecessary time and effort.
To avoid inefficiencies and wastage, it is advisable to set up clear data organization and retention policies from the start. It helps to spell out what data to store, where to store it, how to classify it, and how long to keep it. It also helps to make it a policy to add metadata to all stored files to aid data discovery and evaluation. A clear and comprehensive data organization and retention policy has the added benefits of facilitating automation and of easing compliance with data regulations.
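One way such a policy facilitates automation is by expressing the retention schedule as data that tooling can read, for example a mapping from data class to retention period. The classes and periods below are hypothetical placeholders.

```python
from datetime import date, timedelta

# Hypothetical retention schedule; the classes and periods are illustrative only.
RETENTION_POLICY = {
    "invoice": 365 * 7,
    "contract": 365 * 10,
    "log": 90,
    "draft": 30,
}


def expires_on(data_class: str, created: date) -> date:
    """Compute the earliest deletion date under the (assumed) policy."""
    days = RETENTION_POLICY.get(data_class, 365)  # unknown class: review after a year
    return created + timedelta(days=days)
```

A cleanup job can then compare `expires_on(...)` against today's date instead of relying on ad hoc judgment.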
Moreover, it helps to adopt the “single source of truth” concept. This means having a central repository or index of all data in an organization. This ensures that unnecessary duplicate copies are avoided and also makes it easier to find data whenever it is needed and to evaluate the data for retention or deletion.
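A toy illustration of the single-source-of-truth idea: a central repository keyed by content hash stores identical content only once, so duplicates are detected at write time. This is a sketch under simplified assumptions, not a production design.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Identical content always maps to the same key."""
    return hashlib.sha256(data).hexdigest()


class Repository:
    """Toy central repository: one copy per unique content, many names allowed."""

    def __init__(self):
        self._blobs = {}   # content key -> bytes
        self._index = {}   # file name  -> content key

    def put(self, name: str, data: bytes) -> bool:
        key = content_key(data)
        is_new = key not in self._blobs
        self._blobs[key] = data      # duplicate content just maps to the same blob
        self._index[name] = key
        return is_new                # False signals a duplicate was avoided

    def get(self, name: str) -> bytes:
        return self._blobs[self._index[name]]
```

Real systems apply the same hashing principle at much larger scale (content-addressable storage and deduplicating backup tools).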
Some organizations keep data for as long as they can because they are unsure of what laws and regulations require. The requirements to consider include rules set by the IRS and the FTC, ISO standards, laws such as the CCPA, industry standards such as PCI-DSS, and internal company policies such as employee record retention requirements and version control schemes.
In the United States, a number of federal and state laws have data retention mandates. The Federal Information Security Management Act (FISMA), for one, obliges federal agencies and contractors to keep their data in storage for at least three years. The North American Electric Reliability Corporation (NERC) requires energy-related entities to retain data for three to six months. The Health Insurance Portability and Accountability Act (HIPAA) requires health-related entities to archive health information for at least six years.
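When several such mandates apply to the same records, the longest period governs. A simplified sketch of that rule follows; the day counts compress the ranges and record-type nuances in the mandates above.

```python
# Minimum retention periods in days, simplified from the mandates discussed above.
MANDATES = {
    "FISMA": 3 * 365,   # at least three years
    "NERC": 180,        # upper end of the three-to-six-month range
    "HIPAA": 6 * 365,   # at least six years
}


def minimum_retention_days(applicable: list) -> int:
    """When multiple mandates apply, the strictest (longest) one wins."""
    return max(MANDATES[name] for name in applicable)
```

This is the kind of lookup a retention policy engine would consult before scheduling any deletion.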
For organizations operating in different parts of the world, it is necessary to become familiar with the laws and regulations of each country. In Switzerland, for example, business records must be retained for 10 years after the end of the financial year. The International Regulatory Framework for Banks (Basel III) also requires banks to maintain a data history of three to seven years.
Data storage waste is not limited to digital costs; it also has an offline impact. According to a Sound Advice for a Green Earth Q&A, every 100 GB of data stored in the cloud generates 0.2 tons of carbon dioxide per year. Unnecessarily keeping data in the cloud therefore translates to emissions that could have been avoided.
Just like other forms of waste, data storage waste is avoidable, or at least reducible. Ensuring efficient data storage and following best practices can significantly curb unwanted data storage waste, along with its offline effects.