TPC-DS Query 6
- Pull Request
- Play with the query and results using the Fiddle.
Query 6 looks at how many store purchases occurred during the month of January 2001, where the item cost more than 20% of the average purchase price in that item’s category. Those purchases are then grouped by the home state of the customer making the purchase, and any state where the count is less than 10 is filtered out.
Implementing the query did not require any changes to the Model. I wrote an ad-hoc source, which is used to calculate the average item price of each category, then an in-line join to the all_sales
source in L13-15 below:
It’s worth understanding what L13-15 are doing exactly. all_sales_with_category_avg
is the name of a new ad-hoc source I’m defining in the query. The is
keyword is equivalent to aliasing or assignment. all_sales { ... }
means the new ad-hoc source is based on the all_sales
source, and the curly braces contain whatever extensions I need to make. Finally the -> { ... }
on L15 represents the start of a new query that uses all_sales_with_category_avg
as the source. From there, it’s a simple GROUP BY
with some filters.
Update
Based on some feedback from Lloyd in the Malloy Slack, I refactored Q6 to no longer need to ad-hoc source + join. I can instead use the all
function to create a “level-of-detail” aggregation that can be used to calculate the per-category average item price. This is much more concise. The all
function is really powerful for situations like this, and not something I’m used to reaching for by default, but clearly something that should be in everyone’s toolbelt. Thanks for the tip Lloyd!
Thanks for tuning in! As always, any feedback, comments, or questions are welcome. Drop me a line on the Malloy Slack, username @carl.