Archiving is dead

Chris “Gonzo” Gondek

There, I said it. Archiving is dead. In fact, I’d argue that it’s been dead for some time. To be clear, though, I’m speaking about data archiving from a space management perspective, such as file system archiving. The concept of data archiving for long-term retention will probably be around forever. (We’ll cover that in another blog.) No, I’m speaking about a particular storage-related topic that will give most people who implemented these solutions back in the day serious PTSD—especially when I use the word stubs.

The ancient history of stubs

A stub is a very small file that effectively points to data that has been moved to a secondary storage location. Stubs played an integral role in file system archiving, but before I can explain why I’m talking about them, I need to explain why they were needed in the first place.
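
Concretely, you can picture a stub as something like this. (The field names are my own illustration, not any vendor's actual on-disk format; real HSM products typically relied on filesystem mechanisms such as reparse points, sparse files, or extended attributes.)

```python
from dataclasses import dataclass

@dataclass
class Stub:
    """Tiny placeholder left on the primary tier after a file is archived.

    Purely illustrative: real products define their own stub formats.
    """
    original_path: str   # where the file used to live on the primary tier
    archive_uri: str     # where the real bytes now live (tape, near-line disk, ...)
    size_bytes: int      # original size, so directory listings still look right
    checksum: str        # lets a recall verify the data it brings back
```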

File archiving is part of a broader category known as hierarchical storage management (HSM). HSM has been around for almost as long as data itself, because primary storage media (typically high-speed disk) have a high price point and we human beings (and some applications) are sloppy with data management. We like to create tons of data, but we rarely, if ever, clean up after ourselves and delete useless data. And in some scenarios, we are encouraged or even required to keep everything and actively not delete any data—for example, for compliance reasons.

So file archiving was created as a way to intelligently move primary data to a cheaper, secondary location and replace it with a stub. If users ever need to recall the data, they click the stub and the data is recalled from the archive storage tier. In the very early days, that storage was predominantly tape devices (although I personally know of some instances still running out there today!). The next step was near-line storage such as disk for faster access to and recall of archived data.
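
Here's a minimal sketch of that move-and-stub workflow, assuming a plain directory stands in for the cheap secondary tier and a small JSON file plays the role of the stub. Real HSM products hooked into the filesystem so the recall happened transparently when the file was opened, rather than through an explicit call like this.

```python
import json
import shutil
from pathlib import Path

ARCHIVE_ROOT = Path("/mnt/cheap_tier")   # hypothetical secondary tier

def archive(path: Path) -> None:
    """Move a file to the secondary tier and leave a stub behind."""
    target = ARCHIVE_ROOT / path.name
    shutil.move(str(path), str(target))            # the data now lives only on the cheap tier
    stub = {"archive_uri": str(target), "size_bytes": target.stat().st_size}
    path.write_text(json.dumps(stub))              # a tiny stub replaces the original file

def recall(path: Path) -> None:
    """Bring archived data back to the primary tier, replacing the stub."""
    stub = json.loads(path.read_text())
    shutil.copy(stub["archive_uri"], str(path))    # rehydrate onto the primary tier
```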

When the solution is worse than the problem

Looking back to the days when a stub-based file archiving solution was used to move cold or stale data out to a cheaper tier and free up space on the expensive primary tier, it’s clear that the “solution” created more problems than it solved. The stubs were necessary because they were the only way for users or applications to know that the data was still there and accessible (via recall), but they caused trouble with all kinds of everyday data activities. Copying or moving stubs would create havoc, either triggering unnecessary recalls or effectively “orphaning” the stubs from their archived data. And routine activities such as backup jobs and virus scans looked, to the archive, no different from a user or application recalling the data, unnecessarily filling the disk and kicking off lots of recall jobs.

On top of all of this, most folks forgot, or didn’t realize, that the archived data was the only copy of the data, because the data remaining on the primary was only stubs. And the archive itself was never protected from accidental deletion or other data loss scenarios, resulting in the worst-case scenario—permanent, irrecoverable data loss. Space management wasn’t really that important for a while, because storage systems offered better performance and higher capacities at lower price points, negating the need for file archiving. That is, until most storage consumption became “pay per use” versus ownership/sunk cost, which is exactly how it is in the cloud.

Performance tiers still remain more expensive than secondary and tertiary retention tiers, so we’re back to a place where we want the best of all worlds: The ability to move cold or stale data out to the most cost-effective tier without the pain of stubs while maintaining a seamless user and application experience, both on premises and in various cloud environments.

Dump the cold data out to where it belongs, without the pain of retrieving it

At NetApp we have been working with and managing data across the different types of storage tiers for many years. Even as the world has evolved into a hybrid multicloud one, we deliver the same user experience through an “omnipresence” of the same capabilities across environments. (For more information, check out my blog on the road to data immortality.)

To have the best of both worlds, we need the ability to intelligently and seamlessly automate the process of identifying stale or cold data, and securely and efficiently moving it to a more cost-effective tier. And it’s important not to impact the user experience or performance when it comes to retrieving that data (not necessarily moving or copying it back to the primary). On top of all that, we need to be able to do it on premises and in the cloud hyperscalers, leveraging the appropriate cold storage tier, which today is typically object storage.
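
Purely as an illustration of the “identify stale or cold data” half of that requirement (Cloud Tiering does this inside ONTAP itself, not with an external scan like this), here’s a simple file-level sweep that flags anything untouched for 30 days as a tiering candidate. The threshold and path are assumptions you’d tune.

```python
import os
import time
from pathlib import Path

COLD_AFTER_DAYS = 30                    # assumed policy; tune to your workload
PRIMARY_TIER = Path("/mnt/primary")     # hypothetical high-performance tier

def find_cold_files(root: Path, cold_after_days: int = COLD_AFTER_DAYS):
    """Yield files whose last access is older than the cold threshold."""
    cutoff = time.time() - cold_after_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            try:
                if path.stat().st_atime < cutoff:
                    yield path
            except OSError:
                continue    # file vanished or is unreadable; skip it

if __name__ == "__main__":
    candidates = list(find_cold_files(PRIMARY_TIER))
    total_bytes = sum(p.stat().st_size for p in candidates)
    print(f"{len(candidates)} cold files, {total_bytes / 1e9:.1f} GB could move to object storage")
```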

This is exactly what NetApp® Cloud Tiering is designed to do.

Cloud Tiering extends high-performance flash tiers located on premises (or NetApp Cloud Volumes ONTAP® in the hyperscalers) to the cloud by seamlessly moving cold data to high-capacity, durable, low-cost object storage tiers. And there’s no impact to the front-end applications and users of that data.

With Cloud Tiering, active (hot and warm) data remains on the high-performance tiers to meet the performance needs of the application. Cold, inactive data is automatically identified and tiered off to an object storage platform, freeing up valuable capacity on the on-premises storage array.

This arrangement gives us the best of both worlds without stubs, but I’m going to go even further, because this is ONTAP and we’re not just going to stop at cold data. We also do it for NetApp Snapshot™ copies! This is huge, because most businesses want to retain their Snapshot data for long periods of time without consuming premium storage. Cloud Tiering can be configured to move cold Snapshot data, cold user data, or both, a capability that is unique in the industry.
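
To make that configuration choice concrete, here’s a toy policy check. The policy names are my own shorthand for the options just described (Snapshot data only, cold user data only, or both), not the product’s actual setting names.

```python
from enum import Enum

class TieringChoice(Enum):
    NONE = "none"                    # keep everything on the performance tier
    SNAPSHOT_ONLY = "snapshot-only"  # tier only cold Snapshot copy data
    USER_DATA_ONLY = "user-data-only"
    BOTH = "both"                    # cold Snapshot data and cold user data

def should_tier(is_snapshot_data: bool, is_cold: bool, choice: TieringChoice) -> bool:
    """Decide whether a given piece of cold data is a candidate for the object tier."""
    if not is_cold or choice is TieringChoice.NONE:
        return False
    if choice is TieringChoice.SNAPSHOT_ONLY:
        return is_snapshot_data
    if choice is TieringChoice.USER_DATA_ONLY:
        return not is_snapshot_data
    return True    # BOTH: any cold data qualifies
```

The only point of the sketch is that Snapshot data and active filesystem data can be targeted independently; how the product actually tracks data temperature and applies the policy is internal to ONTAP.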

More importantly, because the tiered data is transparent to the user or application, retrieval is seamless. The data typically comes directly from the cold storage and never repopulates the primary, keeping consumption down and performance and user experience up. All the while, the cost-saving ONTAP storage efficiencies such as deduplication and compression are maintained within the cloud tier, so the savings are optimized on the object storage tier as well.
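
That retrieval behavior is essentially a read-through: serve the read straight from the object tier rather than copying the data back onto the performance tier first. A minimal sketch of the distinction, with a hypothetical object-store client standing in for the cloud tier:

```python
def read_through(key: str, object_store) -> bytes:
    """Serve tiered data directly from the object tier on access.

    object_store.get(key) is a hypothetical client call; the point is that
    nothing is written back to the expensive primary tier, so a one-off read
    doesn't undo the capacity savings.
    """
    return object_store.get(key)

def recall_style_read(key: str, object_store, primary_cache: dict) -> bytes:
    """Old stub-style recall, for contrast: rehydrate first, then serve."""
    data = object_store.get(key)
    primary_cache[key] = data    # the data now consumes primary capacity again
    return data
```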

This service (like NetApp Cloud Manager) is completely hosted, operated, and maintained by NetApp, so you simply “switch it on” against multiple on-premises and cloud storage instances as needed.

Learn more

To find out how much you can save right now, check out our Cloud Tiering calculator. To get a taste of what I’m talking about, watch the video Introducing NetApp Cloud Tiering in 60 seconds or less.

Chris “Gonzo” Gondek

Data Driven Technology Evangelist, NetApp ANZ

Techie with Table Manners

My mission is to enable data champions everywhere. I have always been very passionate about technology with a career spanning over two decades, specializing in Data and Information Management, Storage, High Availability and Disaster Recovery Solutions including Virtualization and Cloud Computing.

I have a long history with Data solutions, having gained global experience in the United Kingdom and Australia where I was involved in creating Technology and Business solutions for some of Europe and APAC’s largest and most complex IT environments.

An industry thought leader and passionate technology evangelist, I blog frequently about all things data and am active in the technology community, speaking at high-profile events such as Gartner Symposium, IDC events, AWS summits, and Microsoft Ignite, to name a few. I focus on translating business value from technology and demystifying complex concepts into solutions that are easy to consume and procure. A proven, highly skilled, and internationally experienced sales engineer and enterprise architect who has worked for leading technology vendors, I have collected experiences and developed skills across almost all enterprise platforms, operating systems, databases, and applications.
