Allow batches whose sequences have different token counts by
tracking per-sequence token indices and falling back to
per-sequence recurrent processing when layouts differ.
Add per-slot conv/delta state access with checkpoint capture,
relax attention layout handling, and reuse projections in mixed
batches to reduce overhead.
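
For reviewers, the mixed-batch fallback can be pictured roughly as
follows. This is only an illustrative sketch, not the code in this
change: the names group_tokens_by_seq and plan_batch are hypothetical,
and it assumes a batch is described by one sequence id per token.

```python
from collections import defaultdict


def group_tokens_by_seq(seq_ids):
    """Map each sequence id to the batch positions of its tokens, in order.

    Hypothetical helper: `seq_ids` holds one sequence id per token in the batch.
    """
    groups = defaultdict(list)
    for pos, sid in enumerate(seq_ids):
        groups[sid].append(pos)
    return dict(groups)


def plan_batch(seq_ids):
    """Choose a processing mode for the batch.

    Returns ("batched", groups) when every sequence contributes the same
    number of tokens, otherwise ("per_seq", groups) so the caller can run
    the recurrent update one sequence at a time using the tracked indices.
    """
    groups = group_tokens_by_seq(seq_ids)
    counts = {len(indices) for indices in groups.values()}
    mode = "batched" if len(counts) <= 1 else "per_seq"
    return mode, groups
```

With seq_ids = [0, 0, 0, 1] the sketch selects the per-sequence path
and hands back the token positions each sequence owns, which is the
information the fallback needs to read and write the right state slot.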