Quote

Engineering

The great liability of the engineer compared to men of other professions is that his works are out in the open where all can see them. His acts, step by step, are in hard substance. He cannot bury his mistakes in the grave like the doctors. He cannot argue them into thin air or blame the judge like the lawyers. He cannot, like the architects, cover his failures with trees and vines. He cannot, like the politicians, screen his shortcomings by blaming his opponents and hope that the people will forget. The engineer simply cannot deny that he did it. If his works do not work, he is damned. That is the phantasmagoria that haunts his nights and dogs his days. He comes from the job at the end of the day resolved to calculate it again. He wakes in the night in a cold sweat and puts something on paper that looks silly in the morning. All day he shivers at the thought of the bugs which will inevitably appear to jolt its smooth consummation.

On the other hand, unlike the doctor his is not a life among the weak. Unlike the soldier, destruction is not his purpose. Unlike the lawyer, quarrels are not his daily bread. To the engineer falls the job of clothing the bare bones of science with life, comfort, and hope. No doubt as years go by people forget which engineer did it, even if they ever knew. Or some politician puts his name on it. Or they credit it to some promoter who used other people’s money with which to finance it. But the engineer himself looks back at the unending stream of goodness which flows from his successes with satisfactions that few professions may know. And the verdict of his fellow professionals is all the accolades he wants.

-Herbert Hoover

Quote

Science

Jessica Flack: I believe that science sits at the intersection of these three things — the data, the discussions and the math. It is that triangulation — that’s what science is. And true understanding, if there is such a thing, comes only when we can do the translation between these three ways of representing the world.

Aside

Microsoft

It’s hard to believe that Microsoft is 41 years old. In that time, its had its ups (think Windows XP with around one billion sales) and its downs (think Windows ME, which lasted for less than 18 months). But one thing that’s clear is Microsoft has cleverly re-invented itself, re-booted and disrupted its own business in a massive way. Some would argue Microsoft is now “cool” again.

Quote

Relational Databases

We were hiring developers for a large, complicated application. I happened to mention to one of the more promising candidates that the application was fairly write-heavy and we might experience some performance concerns later. This developer, with years of experience and skills across multiple technologies that might have been of use to us, replied “you’re going to either need to shard your database or switch to NoSQL.”

That was enough to guarantee they didn’t get the job.

Before relational databases, all databases were NoSQL. That’s why we have relational databases.

Great read on why database design is so important

Quote

25 Things I Wish You Knew About Databases

  1. Oracle will do what you tell it to do. I find that PEBCAK is often the root cause for many issues.
  2. DBAs get paid for performance, but we keep our jobs with recovery.
  3. HA <> DR. If you can’t recover, you can’t keep your job. See previous item.
  4. Memory, CPU, disk, and network are all finite resources. Leave room for growth.
  5. 95% of all workloads will run just fine on modest hardware. Don’t listen to fools that architect crazy solutions for edge cases that won’t happen.
  6. Backups. You need them. Store them someplace safe, and on a different server. See number 2.
  7. Maintenance is mandatory. Find, or make a maintenance window.
  8. But you can always blame the network anyway 🙂
  9. Know the RTO and RPO for your applications. See number 2.
  10. Focus on wait events and logical I/O when performance tuning. They help you find the root cause the fastest.
  11. The only way to know your backup succeeded is to test by doing a restore. See number 2.
  12. Build a recovery strategy BEFORE you build a backup strategy. See number 2.
  13. Baseline for performance. Without baselines and metrics you have no idea if something is truly a problem or not.
  14. Keep your transactions short.
  15. Triggers are awful, awful little creatures.
  16. But NULLs are far worse.
  17. If your DBA can’t work a command line, don’t let them touch your data.
  18. Great database performance starts with great database design.
  19. Deadlocks are often the result of application logic and data access patterns. The engine doesn’t just get “tired” and start deadlocking.
  20. Testing against 10, 100, and 1000 rows is not an accurate test against a production workload.
  21. Application code is responsible for 100% of all performance issues. #hardtruth
  22. Keep as many of your servers configured in the exact same way. This saves time troubleshooting.
  23. Data lasts longer than code. Treat it right.
  24. Don’t install services (SSRS, SSIS, SSAS) onto a server “just in case”. Only install the services that are needed.
  25. If your vendor requires the use of the ‘sa’ account, go find another vendor.
Aside

Genesis of a DBA Universe

In the beginning was the disk array, and all was empty and raw, and Unix moved over the face of the platters. And the DBA said: Let there be Oracle. And there was Oracle. And the environmental variables were set and the disks were striped and mirrored and the OFA was established, and behold spindle was rent asunder from spindle. And the DBA saw that all was in spec.

And it was day and it was evening of the first day.

And the DBA said: Let there be scripts. And sql.bsq brought forth myriad crawling things upon the face of the array. And catalog.sql brought forth all manner of tables and views that swim unseen beneath the waters. And catproc.sql brought forth all the built-in programs and all the hosts of the air, that the users might be given wings and take fight over the data.

And it was day and it was evening of the second day.

And the DBA said: Let there be tablespaces. And there were tablespaces. And the network administrator looked upon the disk array and did see what the tablespaces had wrought upon the disk arrays, and he did gnash his teeth and seek a new work upon the Internet with an engine of search.

And it was day and it was evening of the third day.

And the DBA created users. Male and female he created them. And he said unto the users: Thou mayest create tables and views as thou wilt. Yea, though mayest create even indexes upon the data. Only meddle not with the system tablespace, for it is a holy place, and on the day wherein thou treadest upon it, on that day thy create session shall surely be revoked. And the serpent crept among the users and whispered to them, saying: Thine roles shall not be revoked. Taste ye all of the system tablespace, for ye shall know of b-trees and hints and ye shall be as DBAs. And the users heeded the serpent and filled the system tablespace with crap. And the instance did crash and the client did wax wroth at the DBA. And the DBA did gnash his teeth and partake of the fruit of the vine, for behold the users were permanent employees and the DBA was but a contractor and could not revoke their create session.

And it was day and it was evening of the fourth day.

And the DBA did set default tablespaces and temporary tablespaces and did lock down all that was upon the face of the array with roles and profiles and all manner of quotas, yea even from the rollback segments even unto the archived redo logs.

And it was day and it was evening of the fifth day.

And the DBA created synonyms and links and did tune the server and apply patches upon the face of the database.

And it was day and it was evening of the sixth day.

And on the seventh day the DBA did rest from all the labors of the creation. And his pager did ring and he ceased from resting and did spend his sabbath on the telephone with Oracle support. And by the time the DBA got through to someone who knew whereof they spake behold it was day and it was evening of the eighth day.

And the DBA waxed wroth.

Deferred Segment Creation

Property Description
Parameter type Boolean
Default value true
Modifiable ALTER SESSION, ALTER SYSTEM
Range of values true | false
Basic No

DEFERRED_SEGMENT_CREATION specifies the semantics of deferred segment creation. The design goal here is to prevent hundreds or thousands of segments being created by a 3rd party application that only uses a few of the tables it creates. If set to true, then segments for tables and their dependent objects (LOBs, indexes) will not be created until the first row is inserted into the table. Before creating a set of tables (or installing an application), if it is known that a significant number of them will not be populated, then consider setting this parameter to true. This saves disk space and minimises install time. This feature was first introduced in 11g.

Deferred segment creation allows tables and indexes to be created without physical segments which can be a double-edged sword as it allows objects to be created even if there is no space available for those objects in the specified tablespace. The data dictionary contains the object information so a query of DBA_TABLES, DBA_INDEXES or any of the %TAB% and %IND% tables and views will return the expected information. Querying DBA_SEGMENTS, however, may fail to return data since an object may not have any segments associated with it.

You should note that you can change this default behaviour (i.e. turn off deferred segment creation) by either:
  • at the database level via your init/spfile
  • at the session level via “alter session set deferred_segment_creation = false;”
  • statement by statement via “create table t ( x int ) segment creation immediate;”

Should you use deferred segment creation? That choice is one both the DBA team and development team need to make. Look at all aspects of the argument then decide accordingly. Remember that the ‘right’ answer is the one that’s suited to your environment.

Approximate Query Processing in Oracle Database (12.2)

The growing market for Big Data has created a lot of interest around approximate query processing (AQP) as a means of achieving interactive response times. In my previous post I provided an explanation of one of the more well-known approximate functions – APPROX_COUNT_DISTINCT. Here was an example of how approximate query processing obtains approximate results with negligible deviation from the exact result while dramatically saving processing resources. In this post I will describe how to use some of the other new features related to approximate query processing.

About Approximate Query Processing

Approximate query processing uses SQL functions to provide real-time responses to
explorative queries where approximations are acceptable. A query containing SQL
functions that return approximate results is referred to as an approximate query. Business  intelligence (BI) applications extensively use aggregate functions, including
analytic functions, to provide answers to common business queries. For some types of
queries, when the data set is extremely large, providing exact answers can be resource
intensive.
images (1)
For example, counting the number of unique customer sessions on a website
or establishing the median house price within each post code across a state. In certain
scenarios, these types of queries may not require exact answers because you are more
interested in approximate trends or patterns, which can then be used to drive further
analysis. Approximate query processing is primarily used in data discovery
applications to return quick answers to explorative queries. Users typically want to
locate interesting data points within large amounts of data and then drill down to
uncover further levels of detail. For explorative queries, quick responses are more
important than exact values.

Oracle provides a set of SQL functions that enable you to obtain approximate results with negligible deviation from the exact result. There are additional approximate functions that support materialized view based summary aggregation strategies. The functions that provide approximate results are as follows:

  • APPROX_COUNT_DISTINCT
  • APPROX_COUNT_DISTINCT_DETAIL
  • APPROX_COUNT_DISTINCT_AGG
  • TO_APPROX_COUNT_DISTINCT
  • APPROX_MEDIAN
  • APPROX_PERCENTILE
  • APPROX_PERCENTILE_DETAIL
  • APPROX_PERCENTILE_AGG
  • TO_APPROX_PERCENTILE
Approximate query processing can be used without any changes to your existing
code. When you set the appropriate initialization parameters, Oracle Database
replaces exact functions in queries with the corresponding SQL functions that return
approximate results.

Running Queries Containing Exact Functions Using SQL Functions that Return Approximate Values

The interesting part is that you can utilize the approximate functions without changing code. There are three initialization parameters introduced to control which functions should be treated as an approximate function during runtime:

The initialization parameters are:

  • approx_for_aggregation
  • approx_for_count_distinct
  • approx_for_percentile

To replace only the COUNT(DISTINCT) function in queries with the APPROX_COUNT_DISTINCT function, set the approx_for_count_distinct initialization parameter to TRUE for the current session or for the entire database.

To replace percentile functions with the corresponding functions that return approximate results, set approx_for_percentile to PERCENTILE_CONT, PERCENTILE_DISC, or ALL (replaces all percentile functions) for the current session or for the entire database.

To run all queries using the corresponding SQL functions that return approximate results instead of the specified SQL functions, set the approx_for_aggregation initialization parameter to TRUE for the current session or for the entire database.

When Should Approximate Query Processing (AQP) Be Used?

If you work in a billing department, stop reading right now 🙂

1. Use AQP when you care about interactive response times
Exploratory data analytics is typically an interactive and iterative process: you form an initial hypothesis (e.g., by visualizing and looking at the data), running some queries, modifying your queries based on the previous answers, and so on until you find a satisfactory explanation. You cannot be productive if you have to wait half an hour every time you run a new query to test a hypothesis.

2. Use AQP when you can make perfect decisions with imperfect answers
Exact results have no advantage over approximate ones if they both lead to the exact same conclusion/decision. This obviously depends on the quality of your approximation and the application logic.

3. Use AQP when your data is incomplete or noisy
Believe it not, most of the data that is collected and used in the real world is extremely noisy. So the idea that processing your entire data gives you 100% accurate answers is usually an illusion anyway. In other words, if my data is noisy to begin with, I might as well take advantage of approximation, which uses a controlled degree of noise in my computation in exchange for significant speed ups. This noise in your data can be because of many reasons – human error, missing values, white noise, data extraction errors, data conversion errors.

4. Use AQP when your goal is to predict something
In general, there are many cases where the output of database queries are consumed by predictive analytics or machine learning algorithms, and in those cases, returning smaller samples of the original data can be a great solution for gaining considerable performance benefits.

Additional Reading: Oracle Database New Features Guide 12c Release 2 (12.2)