Common Analysis Services Design
Mistakes and How to Avoid Them
Chris Webb
www.crossjoin.co.uk
Who Am I?
• Chris Webb
[email protected]
• Independent Analysis Services and MDX
consultant and trainer
• SQL Server MVP
• Blogger: http://cwebbbi.spaces.live.com
Agenda
• Why good cube design is a Good Thing
• Using built-in best practices in BIDS
• ETL in your DSV
• User-unfriendly names
• Unnecessary attributes
• Parent/child pain
• One cube or many?
• Over-reliance on MDX
• Unused and/or unprocessed aggregations
Why Good Design is Important!
• As if you needed reasons…?
• Good design = good performance = faster initial development = easy further development = simple maintenance
• This is not an exhaustive list, but a selection of
design problems and mistakes I’ve seen on
consultancy engagements
Best Practices in BIDS
• Don’t ignore the blue squiggly lines in BIDS!
– They sometimes make useful recommendations
about what you’re doing
• Actively dismissing them, with comments, is a
useful addition to documentation
• As ever, though, official ‘best practices’ aren’t necessarily best practice in every situation
Common Design Mistakes
• Three questions need to be asked:
– What’s the problem?
– What bad things will happen as a result?
– What can I do to fix it (especially after I’ve gone
into production)?
• This is not a name-and-shame session!
Problem: ETL in your DSV
• It’s very likely, when you’re working in SSAS, that you’ll need changes to the underlying relational structures and data
– E.g. you need a new column in a table
• You then have two options:
– Go back to the relational database and/or ETL and
make the change
– Hack something together in the DSV using named
queries and named calculations
• The DSV is the easy option, but…
Consequences: ETL in your DSV
• It could slow down processing performance
– No way to influence the SQL that SSAS generates
– Expensive calculations/joins are better done once
then persisted in the warehouse; you may need
to process more than once
• It makes maintenance much harder
– DSV UI is not great for writing SQL
– Your DBA or warehouse developer certainly won’t
be looking at it
Fix: ETL in your DSV
• Bite the bullet and either:
– Do the necessary work in the underlying tables or
ETL packages
– Create a layer of views instead of using named
queries and calculations
• Use the Replace Table With option to point
the table in the DSV at your new view/table
• No impact on the rest of the cube!
Problem: Unfriendly Names
• Cubes, dimensions and hierarchies need to
have user-friendly names
• However, names are often user-unfriendly because they are:
– Unchanged from what the wizard suggests, or
– Based on some kind of database naming convention
• Designing a cube is like designing a UI
• Who wants a dimension called something like
“Dim Product”…?
Consequences: Unfriendly Names
• Unfriendly names put users off using the cube
– These are the names that users will see in their
reports, so they must be ‘report ready’
– Users need to understand what they’re selecting
• They also encourage users to export data out of the cube to ‘fix’ the names
– And so you end up with stale data, multiple versions of the truth, and so on
Fix: Unfriendly Names
• You can rename objects easily, but:
– This can break calculations on the cube
– It can also break existing queries and reports,
which will need rewriting/rebuilding
– IDs will not change, which makes working with
XMLA confusing
• You should agree the naming of objects with
end users before you build them!
Problem: Unnecessary Attributes
• Wizards often generate attributes on
dimensions that users don’t want or need
• Classic example is an attribute built from a
surrogate key column
– Who wants to show a surrogate key in a report?
Consequences: Unnecessary Attributes
• The more attributes you have:
– The more cluttered and less useable your UI
– The slower your dimension processing
– The harder it is to come up with an effective
aggregation design
Fix: Unnecessary Attributes
• Delete any attributes that your users will
never use
• Merge attributes based on key and name
columns into a single attribute
• Set AttributeHierarchyEnabled to false for
‘property’ attributes like email addresses
• Remember that deleting attributes that are
used in reports or calculations can cause more
problems
Problem: Parent Child Hierarchies
• Parent Child hierarchies are the only way to
model hierarchies where you don’t know the
number of levels in advance
• They are also very flexible, leading some
people to use them more often than they
should
Consequences: Parent Child
• Parent Child hierarchies can lead to slow
query performance
– No aggregations can be built at levels inside the
hierarchy
– Query performance tends to be slow regardless
• They can also be a nightmare for
– Scoping advanced MDX calculations
– Dimension security
Fix: Parent Child
• If you know, or can assume, the maximum depth
of your hierarchy, there’s an alternative
• Normal user hierarchies can be made ‘Ragged’
with the HideMemberIf property
– Hides a member if it has no name, or the same name as its parent
• Still has performance issues, but less than
parent/child
• You can use the BIDS Helper “parent/child
naturaliser” to convert the underlying relational
table to a level-based structure
Problem: One Cube or Many?
• When you have multiple fact tables do you
create:
– One cube with multiple measure groups?
– Multiple cubes, each with one measure group?
• Each has its own pros and cons that need to
be understood
Consequences: One Cube
• Monster cubes containing everything can be
intimidating and confusing for users
• Also tricky to develop, maintain and test
– Often changing one thing breaks another
– Making changes may take the whole cube offline
• Securing individual measure groups is a pain
• If there are few common dimensions between
measure groups and many calculations, query
performance can suffer
Consequences: Multiple Cubes
• If you need to analyse data from many cubes
in one query, options are very limited
• A single cube is the only way to go if you do
need to do this
• Even if you don’t think you need to do it now,
you probably will do in the future!
Fix: One Cube to Multiple
• If you have Enterprise Edition, Perspectives
can help overcome usability issues
• Linked measure groups/dimensions can also
be used to split out more cubes for security
purposes
• If you have one cube, you probably don’t want
to split it up though
Fix: Multiple Cubes to One
• Start again from scratch!
• LookUpCube() is really bad for performance (see the sketch below)
• Linked measure groups and dimensions have
their own problems:
– Duplicate MDX code
– Structural changes require linked dimensions to
be deleted and recreated
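To make the LookUpCube() point concrete, here is a minimal sketch of the pattern to avoid – all object names (a Sales cube pulling a Stock Level measure from a separate Inventory cube) are hypothetical, not from the session:

    -- Hypothetical sketch: a calculated measure that fetches a value from
    -- another cube with LookUpCube(). The inner MDX is passed as a string
    -- and evaluated cell by cell, so every cell in a query fires its own
    -- lookup against the other cube
    CREATE MEMBER CURRENTCUBE.[Measures].[Units In Stock] AS
      LOOKUPCUBE(
        "Inventory",
        "([Measures].[Stock Level], [Product].[Product].[" +
          [Product].[Product].CURRENTMEMBER.NAME + "])"
      );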
Problem: Over-reliance on MDX
• As with the DSV, it can be tempting to use
MDX calculations instead of making structural
changes to cubes and dimensions
• A simple example is creating a ‘grouping’ calculated member (see the sketch below) instead of creating a new attribute
• Other examples include pivoting measures into a dimension, or implementing many-to-many relationships in MDX
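Here is a minimal sketch of the ‘grouping’ calculated member just mentioned – all object names are hypothetical, Adventure Works-style; the better design is a real grouping attribute on the dimension:

    -- Hypothetical sketch of a 'grouping' calculated member in the MDX Script,
    -- written instead of adding a proper grouping attribute. It works, but it
    -- is evaluated at query time and gets no aggregation support
    CREATE MEMBER CURRENTCUBE.[Product].[Category].[All Products].[Clothing and Accessories] AS
      AGGREGATE(
        { [Product].[Category].[Clothing], [Product].[Category].[Accessories] }
      );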
Consequences: Over-reliance on MDX
• MDX should always be your last resort:
• Pure MDX calculations are always going to be
the slowest option for query performance
• They are also the least-easily maintainable
part of a cube
• The more complex calculations you have, the harder it is to get new calculations to work alongside them
Fix: Over-reliance on MDX
• Redesigning your cube is a radical option but
can pay big dividends in terms of performance
• This risks breaking existing reports and queries, but your users may accept that in return for better performance
Problem: Unused Aggregations
• Aggregations are the most important SSAS
feature for performance
• Most people know they need to build some
and run the Aggregation Design Wizard…
• …but don’t know whether they’re being used
or not
Consequences: Unused Aggregations
• Slow queries!
• If you haven’t built the right aggregations,
then your queries won’t get any performance
benefit
• You’ll waste time processing these
aggregations, and waste disk space storing
them
Fix: Unused Aggregations
• Design some aggregations!
• Rerun the Aggregation Design Wizard and set
the Aggregation Usage property appropriately
• Perform Usage-Based Optimisation
• Design aggregations manually for queries that
are still slow and could benefit from
aggregations
Problem: Unprocessed Aggregations
• Even if you’ve designed aggregations that are
useful for your queries, you need to ensure
they’re processed
• Running a Process Update on a dimension will
drop all Flexible aggregations
Consequences: Unprocessed Aggregations
• Slow queries! (Again)
Fix: Unprocessed Aggregations
• Run a Process Default or a Process Index on
your cube after you have run a Process Update
on any dimensions
• Note that this will result in:
– Longer processing times overall
– More disk space used
• But it will at least mean that your queries run
faster
Thanks!
Coming up…
P/X001 – The Developer Side of the Microsoft Business Intelligence stack – Sascha Lorenz
P/L001 – Understanding SARGability (to make your queries run faster) – Rob Farley
P/L002 – Notes from the field: High Performance storage for SQL Server – Justin Langford
P/L005 – Service Broker: Message in a bottle – Klaus Aschenbrenner
P/T007 – Save the Pies for Lunch - Data Visualisation Techniques with SSRS 2008 – Tim Kent
#SQLBITS