Attempt to fix animation perf regression introduced by #4128
Parallel.ForEach(IList<T>) uses static range partitioning, which can cause load imbalance on hybrid P/E-core CPUs. Use Partitioner.Create(list, loadBalance: true) instead to restore the dynamic chunk partitioning that Parallel.ForEach(IEnumerable<T>) was already using.
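A minimal sketch of the partitioner swap, assuming a plain `List<int>` workload. `PartitionerSketch`, `ProcessAll` and `Work` are hypothetical stand-ins for the animation update loop; only `Partitioner.Create(IList<T>, bool)` and `Parallel.ForEach` are real .NET APIs.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PartitionerSketch
{
    // Hypothetical per-item work; stands in for updating one animation instance.
    static int Work(int x) => x * 2;

    // Processes every item in parallel and returns the sum of the results.
    public static long ProcessAll(IList<int> items)
    {
        long total = 0;

        // Previously: Parallel.ForEach(items, ...) on the IList<T> overload,
        // which (per the note above) splits the list into index ranges.
        // Now: a load-balancing partitioner hands out chunks on demand, so a
        // thread that finishes early keeps pulling more work.
        OrderablePartitioner<int> partitioner = Partitioner.Create(items, loadBalance: true);

        Parallel.ForEach(partitioner, item =>
        {
            Interlocked.Add(ref total, Work(item));
        });

        return total;
    }
}
```

The call site changes only in the first argument; the loop body is untouched.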
* Move the AnimDecodeCache lock from managed to native code
This should prevent a crash when the cache is accessed concurrently from native and managed code.
* Also pass the lock through CalcLocalHierarchyAnimation
* Make sure nested bone merge hierarchies update correctly
* AddPoseOperation: skip work when the added transform is the identity
* SubtractPoseOperation: skip work when the subtracted transform is the identity
* Fast path for BlendUpdateNode when a blend weight is 1
Don't emit an expensive pose op; just forward the pose that has weight 1.
* Fast path for Blend2DUpdateNode when a blend weight is 1
Don't emit an expensive pose op; just forward the pose that has weight 1.
* Avoid allocating Action delegates in MergeDescendants
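For the AddPoseOperation/SubtractPoseOperation items above, a minimal sketch of the identity early-out, assuming a bone transform made of a `System.Numerics` translation and rotation. `BoneTransform` and `PoseOpsSketch` are hypothetical; the real pose operations are not shown in this change.

```csharp
using System.Numerics;

// Hypothetical local-space bone transform.
public struct BoneTransform
{
    public Vector3 Translation;
    public Quaternion Rotation;

    public static readonly BoneTransform Identity =
        new BoneTransform { Translation = Vector3.Zero, Rotation = Quaternion.Identity };

    // Exact comparison against the identity transform.
    public bool IsIdentity => Translation == Vector3.Zero && Rotation == Quaternion.Identity;
}

public static class PoseOpsSketch
{
    // Applies delta to every bone in place. Returns false if skipped.
    public static bool Add(BoneTransform[] pose, BoneTransform delta)
    {
        if (delta.IsIdentity)
            return false; // Adding the identity leaves every bone unchanged.

        for (int i = 0; i < pose.Length; i++)
        {
            pose[i].Translation += delta.Translation;
            pose[i].Rotation = Quaternion.Normalize(delta.Rotation * pose[i].Rotation);
        }
        return true;
    }

    // Removes delta from every bone in place (inverse of Add). Returns false if skipped.
    public static bool Subtract(BoneTransform[] pose, BoneTransform delta)
    {
        if (delta.IsIdentity)
            return false; // Subtracting the identity is also a no-op.

        Quaternion inverse = Quaternion.Inverse(delta.Rotation);
        for (int i = 0; i < pose.Length; i++)
        {
            pose[i].Translation -= delta.Translation;
            pose[i].Rotation = Quaternion.Normalize(inverse * pose[i].Rotation);
        }
        return true;
    }
}
```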
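For the BlendUpdateNode fast path above, a minimal sketch of forwarding the weight-1 input instead of emitting a blend op. It assumes poses are referred to by integer buffer ids and that ops go into a list; `BlendPoseOp` and `BlendSketch` are hypothetical, not the graph's real types.

```csharp
using System.Collections.Generic;

// Hypothetical op emitted into the graph's command list.
public sealed class BlendPoseOp
{
    public int InputA, InputB, Output;
    public float WeightB;
}

public sealed class BlendSketch
{
    public readonly List<BlendPoseOp> Ops = new List<BlendPoseOp>();
    int _nextPose = 100; // hypothetical pose-buffer allocator

    // Input A has weight (1 - weightB), input B has weight weightB.
    // Returns the id of the pose buffer holding the result.
    public int Blend(int inputA, int inputB, float weightB)
    {
        // Fast path: one input carries weight 1, so the result is that input.
        // Forward its pose buffer instead of emitting a blend op.
        if (weightB == 0f) return inputA;
        if (weightB == 1f) return inputB;

        int output = _nextPose++;
        Ops.Add(new BlendPoseOp { InputA = inputA, InputB = inputB, Output = output, WeightB = weightB });
        return output;
    }
}
```

The same check would presumably carry over to Blend2DUpdateNode: if any single sample's weight is 1, forward that sample's pose.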
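For the MergeDescendants item above, a sketch of one common way to avoid per-call delegate allocations. A lambda that captures a local forces a closure object and an `Action<T>` on every call; an explicit traversal with a reused stack avoids both. `HierarchyNode` and `MergeSketch` are hypothetical, and this is not necessarily how the actual change was made.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hierarchy node.
public sealed class HierarchyNode
{
    public string Name;
    public List<HierarchyNode> Children = new List<HierarchyNode>();
    public HierarchyNode(string name) { Name = name; }
}

public sealed class MergeSketch
{
    // Reused across calls so steady-state traversal does not allocate.
    readonly Stack<HierarchyNode> _stack = new Stack<HierarchyNode>();

    // Allocating version: the lambda captures 'merged', so each call
    // creates a closure object plus an Action<HierarchyNode> delegate.
    public static void MergeDescendantsWithAction(HierarchyNode root, List<string> merged)
    {
        ForEachDescendant(root, n => merged.Add(n.Name));
    }

    static void ForEachDescendant(HierarchyNode node, Action<HierarchyNode> visit)
    {
        foreach (var child in node.Children)
        {
            visit(child);
            ForEachDescendant(child, visit);
        }
    }

    // Delegate-free version: same pre-order walk using an explicit stack.
    public void MergeDescendants(HierarchyNode root, List<string> merged)
    {
        _stack.Clear();
        for (int i = root.Children.Count - 1; i >= 0; i--) _stack.Push(root.Children[i]);
        while (_stack.Count > 0)
        {
            HierarchyNode n = _stack.Pop();
            merged.Add(n.Name);
            // Push children in reverse so the first child is visited next.
            for (int i = n.Children.Count - 1; i >= 0; i--) _stack.Push(n.Children[i]);
        }
    }
}
```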