Introduction
Teradata is an integrated platform that provides functionality to store, access, and analyze organizational data on the Cloud as well as On-Premise infrastructure. Teradata Database provides an information repository system, along with support for various tools and utilities, making it a complete and active relational database management system.
Teradata is based on a parallel Data Warehouse with shared-nothing architecture.
- Data is stored in a row-based format.
- It supports a hybrid storage model in which frequently accessed data is stored in SSD whereas rarely accessed data is stored on HDD.
- The platform supports a Table partitioning feature and enforces Primary and Secondary Indexes.
- The Data Warehouse can scale up to 2048 nodes, thus offering data storage ability up to 94 petabytes.
- The Data Model is designed to be fault-tolerant and be scalable with redundant network connectivity to ensure reliability for critical use case
Teradata Storage Approach and Challenges
The sizes of the Teradata systems managing relational databases range up to 2,048 CPUs and 2,048 GB of memory supporting 128 TB or larger databases. The BYNET interconnect supports up to 512 nodes.
However, the major challenges with Teradata are:
- Huge data warehouse cost
- Not being an agile cloud data warehouse
Teradata is on the higher end of the pricing spectrum, and so capacity management is its biggest challenge. And when no solution is presented to optimize storage, customers decide to move away from Teradata to other alternatives. Managing storage capacity effectively is the only solution to overcome this challenge
Optimization of Teradata Storage
While working with Teradata, the following database sizing considerations must be taken into account:
- Capacity planning of system and data disks
- Data disk space allocation
- Determination of usable data space
Capacity Planning of System and Data Disks
- Warm and hot data typically constitute what is often called the operational data store. One of the Teradata data warehouse’s features is its hybrid storage, where frequently used hot data is stored on very high-performance solid state drives while less frequently used cold data is placed on traditional hard disk drives. The placement and migration of data based on data temperature is fully automatic and ongoing with the Teradata Virtual Storage feature. By paying attention to data temperature, Teradata can deliver higher query throughput and more consistent response times.
- Data compression includes both value-compression and algorithmic compression of data. When describing the compression of hash and join indexes, compression generally refers to row compression. While algorithmic compression can be either lossless or lossy, depending on the algorithm used. Compression has two principal applications: Reducing storage costs and Enhancing system performance.
Data Disk Space Allocation
- Disk or spool space is allocated based on system needs and tends to run out quickly unless proactively planned and reserved. Some guidelines for allocating disk space optimally include:
- Create a special database to act as a spool space reservoir.
- Allocate 2% of the total user space in the system for this database.
- Assign roughly 0.25% of the total space to each user as an upper limit, ensuring that each receives at least as much space as the size of the largest table they access concurrently.
- Apart from the actual allocation of the disk space, there are other operational considerations that can optimize disk allocation:
- Limit query size – the smaller the query, the less spool space required. If a particular user only performs small queries, then allocate less spool to that user. If a user performs many large queries, then allocate more spool to that user.
- Optimize database size – the more AMPs in the configuration, the more thinly spread the data, so the more spool required per AMP.
- Optimize the other factors such as average spool use per user, number of concurrent users, number of concurrent queries permitted for any one user.
- Plan for system and table space. Consider database sizing issues such as allocating permanent space and estimating database size requirements
Determination of Usable Data Space
This looks into system and hardware-oriented considerations undertaken for capacity planning.
Here are some recommended tasks in this regard:
- Use Teradata Viewpoint to monitor and manage the workload.
- Collect user resource usage detail data. Heavy resource consumers over time, skewed work, and application usage trends can be identified.
- Collect ResUsage data. The ResUsage tables report on the aggregate effect of all user requests on various system components over time, and can identify bottlenecks and capacity issues. See Resource Usage Macros and Tables for details on all ResUsage data and macros.
- Use the Locking Logger utility. Locking logger is essential for identifying locking conflicts.
- Use Database query logging. The Database Query Log records details on queries run, including arrival rates, response times, objects accessed, and SQL statements performed
- Use the Priority Scheduler utility. Priority Scheduler monitor output information shows comparative CPU usage across Priority Scheduler groups. This monitor output should be collected daily, minimally at a 5 or 10-minute interval, then summarized or charted.
- Additionally, use the following best practices:
- Reserve 25% to 35% of total space for spool space and spool growth buffer. When you create user SysAdmin, you can leave the SPOOL parameter unspecified, so it defaults to the maximum allocation of the owning user, user DBC.
- Allow an extra 5% of PERM space in user DBC.
- Each time a new user or database is created, specify the maximum amount of spool space that a query submitted by that user can consume
Conclusion
Teradata is a complete and active relational database management system with a data repository as well as analytical system. However, Teradata is priced higher than its competitors and that urges customers to consider alternatives and move away from Teradata. One of the ways to combat this issue is to look at ways to optimize and provide better storage options. Managing storage capacity effectively would give Teradata the advantage it needs to retain and increase customer engagement.