Some Weak Idioms
Doug Lea
SUNY Oswego

Intro
- Want good performance for core libraries and runtime systems
  - Internally use some common idioms that do not look sequentially consistent (non-SC)
  - Most can be seen as manual “optimizations” that have no impact on user-level consistency
  - But leaks can show up as API usage rules
    - Example: cannot fork a task more than once
- Example: publication and transfer (most of this talk)
  - Generalities
  - Some java.util.concurrent code
- Challenges
  - Cataloging cases; establishing semantics
  - First-class language support

Publication and Transfers
    class X { int field; X(int f) { field = f; } }
For a shared variable v (all other variables thread-local):
    P:  p.field = e; v = p;
    C:  c = v; f = c.field;
- Use the weakest protocol that ensures C's read f is usable, considering:
  - “Usable” can be algorithm- and API-dependent
  - Is the write to v final? Including:
    - Write Once (null → x)
    - Consume Once (x → null)
  - Is the write to x.field final?
  - Is there a unique uninitialized value for field?
  - Are reads validated?
  - Consistency with reads/writes of other shared variables
- Weaker protocols avoid more cache invalidation

Avoiding Invalidation on Writes
Avoid the most expensive per-access cache invalidation, that of a fully fenced write:
    storeFence; v = x; storeLoadFence
- Static single or final write
  - The single thread issuing the final write is structurally determined
  - Example: storeFence; v = x;
- Dynamic single or final write
  - Ensuring one writer requires a distinguished value
  - Example: [storeFence] CAS(&v, null, x)
- Validated (including “double-checked”)
  - Don't fence the write if reads validate with CAS
  - Example: if (v == null) { … if (CAS(&v, null, x)) … }
- Dependent
  - Don't fence a variable if its accesses are nested under another's
  - Example: lock; v = x; unlock;

ForkJoinTasks
    class SortTask extends RecursiveAction {
        final long[] array; final int lo; final int hi;
        SortTask(long[] array, int lo, int hi) {
            this.array = array; this.lo = lo; this.hi = hi;
        }
        protected void compute() {
            if (hi - lo < THRESHOLD)
                sequentiallySort(array, lo, hi);
            else {
                int m = (lo + hi) >>> 1;
                SortTask r = new SortTask(array, m, hi);
                r.fork();
                new SortTask(array, lo, m).compute();
                r.join();
                merge(array, lo, hi);
            }
        }
        // …
    }
[Figure: a worker's task deque; pushing and popping at the top, stealing at the base]

Transferring Tasks
- Queues perform a form of ownership transfer
  - Push: make the task available for stealing or popping
    - Needs a lightweight store fence
  - Pop, steal: make the task unavailable to others, then run it
    - Needs CAS with at least an acquire-mode fence
- Java doesn't provide a source-level mapping to these efficient forms
  - So the implementation uses JVM intrinsics

    T1: push(w)                  // publish
        w.state = 17;
        slot = w;
    T2: steal()                  // consume
        w = slot;
        if (CAS(slot, w, null))
            s = w.state; ...
    Require: s == 17
[Figure: a queue slot referring to task w, which holds int state]

Task Deque Algorithms
- Deque operations (especially push, pop) must be very fast and simple
  - Competitive with procedure-call stack push/pop
- Current algorithm requires one atomic op per push+{pop/steal}
  - This is minimal unless duplicate executions or arbitrary postponement are allowed (see Maged Michael et al., PPoPP 09)
  - Less than 5X cost for empty fork+join vs. empty method calls
- Uses a (resizable, circular) array with base and sp indices
- Essentially (omitting emptiness, bounds checks, masking etc.):
    Push(t):  s = sp++; storeFence; array[s] = t;
    Pop(t):   if (CAS(array[sp-1], t, null)) --sp;
    Steal(t): if (CAS(array[base], t, null)) ++base;
- NOT strictly non-blocking, but probabilistically so
  - A stalled ++base precludes other steals
  - But if so, stealers try elsewhere (using randomized selection)

Sample code
A variant of the classic array push q[sp++] = t (and not much slower); a non-public method of ForkJoinWorkerThread:
    void pushTask(ForkJoinTask<?> t) {
        ForkJoinTask<?>[] q = queue;  // per-thread array-based queue, power-of-2 length
        int mask = q.length - 1;
        int s = sp++;                 // incrementing before the slot write is OK
        orderedPut(q, s & mask, t);   // publish via a JVM intrinsic ensuring previous writes
                                      // commit before the slot write (inlined in the actual code)
        if ((s -= base) == 0)         // queue was empty
            pool.signalWork();
        else if (s == mask)           // full: resize
            growQueue();
    }
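The Push/Pop/Steal outline from the Task Deque Algorithms slide, together with the publish/consume requirement s == 17 from Transferring Tasks, can be sketched in Java 9+ by using java.lang.invoke.VarHandle array-element access modes in place of JVM intrinsics. This is a minimal sketch under stated assumptions, not the ForkJoinPool implementation: `Task`, `TaskDeque` and `TaskDequeDemo` are hypothetical names, capacity is a fixed power of 2 (push throws instead of growing), there is no worker signalling, and `setRelease` stands in for the slide's `storeFence; array[s] = t`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for a ForkJoinTask: a plain field written by the
// owner before push and read by whichever thread later takes the task.
final class Task {
    int state; // deliberately not volatile: visibility comes from the slot handoff
}

// Sketch of a fixed-capacity work-stealing deque following the slide's
// Push/Pop/Steal outline. Assumption: no resizing and no worker signalling.
final class TaskDeque {
    private static final VarHandle SLOT =
            MethodHandles.arrayElementVarHandle(Task[].class);

    private final Task[] array;
    private final int mask;
    private int sp;            // read and written only by the owner thread
    private volatile int base; // advanced only by the thief whose CAS won slot[base]

    TaskDeque(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of 2");
        array = new Task[capacity];
        mask = capacity - 1;
    }

    // Owner only. The release store publishes t together with the writes
    // made to it beforehand (roughly: storeFence; array[s] = t).
    void push(Task t) {
        int s = sp;
        if (s - base >= array.length)
            throw new IllegalStateException("full (this sketch does not grow)");
        sp = s + 1; // incrementing first is fine here: thieves never read sp
        SLOT.setRelease(array, s & mask, t);
    }

    // Owner only: take the most recently pushed task. The CAS settles a race
    // with a thief when only one task is left.
    Task pop() {
        int s = sp - 1;
        if (s - base < 0) return null; // empty
        int i = s & mask;
        Task t = (Task) SLOT.getAcquire(array, i);
        if (t != null && SLOT.compareAndSet(array, i, t, null)) {
            sp = s;
            return t;
        }
        return null; // a thief took it
    }

    // Any thread: take the oldest task. On any interference, return null
    // so the caller can try another deque instead of waiting.
    Task steal() {
        int b = base;
        int i = b & mask;
        Task t = (Task) SLOT.getAcquire(array, i);
        if (t != null && b == base && SLOT.compareAndSet(array, i, t, null)) {
            base = b + 1; // until this write lands, other steals from this deque fail
            return t;
        }
        return null;
    }
}

class TaskDequeDemo {
    public static void main(String[] args) throws InterruptedException {
        TaskDeque q = new TaskDeque(8);
        Task w = new Task();
        w.state = 17; // T1: ordinary write before publication
        q.push(w);    // T1: publish
        Thread thief = new Thread(() -> {
            Task got = q.steal(); // T2: consume
            System.out.println(got == null ? "nothing stolen" : "s = " + got.state);
        });
        thief.start();
        thief.join(); // prints "s = 17"
    }
}
```

The `b == base` re-check keeps a thief holding a stale `base` from taking a slot that was reused after wrap-around, and the slot CAS is the single atomic operation deciding whether the owner or a thief gets a contended task. In this sketch, pushing the same `Task` object twice would let a stale CAS succeed on the wrong occurrence (an ABA problem), which illustrates why a rule like "cannot fork a task more than once" matters.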
If the queue was empty, pushTask wakes up other workers using a scalable event queue (pool.signalWork). Stealers use compareAndSet to change a slot from non-null to null, privatizing the task.

Improving Language Support
- Poor Java language support for special-mode accesses
  - Requires intrinsics operating on addresses (not values)
  - These intrinsics have no formal specs
  - Alternatively, source-level control over fences
    - Not very usable in Java, but still, I use them a lot
- Ideally, language constructs should express intent
- Programmers already live with non-consistency every day
  - IO, Web, mobile, clusters
  - It is a historical oddity that languages do not incorporate it

Consistency Issues are Inescapable
- They occur in remote multicast and message passing
  - Mapping a memory model onto distributed platforms is expensive
  - Many groups don't need strong consistency
    - But they encounter anomalies
    - Example (“IRIW”, Independent Reads of Independent Writes): x, y multicast
        Node A:  send x;                // set x = 1
        Node B:  send y;                // set y = 1
        Node C:  receive x; receive y;  // see x=1, y=0
        Node D:  receive y; receive x;  // see y=1, x=0
  - Full avoidance is as expensive as a full memory-model mapping
    - Atomic multicast, distributed transactions
    - More so when remote failures must be tolerated
- They occur in local messaging: processes, isolates, ...
  - Usually relying on an implicit OS-level consistency model

Contention in Shared Data Structures
- Mostly-write
  - Most producer-consumer exchanges, especially queues
  - Apply combinations of a small set of ideas:
    - Use non-blocking sync via compareAndSet (CAS)
    - Reduce point-wise contention
    - Arrange that threads help each other make progress
- Mostly-read
  - Most Maps and Sets
    - Empirically, 85% of Java Map calls are read-only
  - Structure to maximize concurrent readability
    - Without locking, readers see legal (ideally, linearizable) values
    - Often, using immutable copy-on-write internals
  - Apply write-contention techniques from there
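For the mostly-read case, the copy-on-write idea can be sketched as follows: readers do one volatile read of an immutable snapshot and never lock, while writers build a modified copy and publish it with compareAndSet, retrying if another writer got there first. `CowIntSet` and `CowIntSetDemo` are hypothetical names, and keeping a sorted `int[]` is an assumption made for brevity. The JDK's `CopyOnWriteArraySet` applies the same copy-on-write idea but serializes writers with a lock rather than a CAS loop.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical mostly-read set. The array inside `snapshot` is never mutated
// after it is published, so readers need no locks.
final class CowIntSet {
    private final AtomicReference<int[]> snapshot = new AtomicReference<>(new int[0]);

    // Read path: one volatile read, then a search over an immutable sorted array.
    boolean contains(int x) {
        return Arrays.binarySearch(snapshot.get(), x) >= 0;
    }

    int size() {
        return snapshot.get().length;
    }

    // Write path: copy, modify the private copy, then publish it with CAS.
    // The plain writes that fill `next` become visible to any reader whose
    // get() returns `next`, because the CAS has volatile-write semantics.
    boolean add(int x) {
        while (true) {
            int[] cur = snapshot.get();
            int pos = Arrays.binarySearch(cur, x);
            if (pos >= 0) return false;          // already present
            int ins = -(pos + 1);                // insertion point that keeps order
            int[] next = new int[cur.length + 1];
            System.arraycopy(cur, 0, next, 0, ins);
            next[ins] = x;
            System.arraycopy(cur, ins, next, ins + 1, cur.length - ins);
            if (snapshot.compareAndSet(cur, next)) return true;
            // CAS failed: another writer published first; retry on its snapshot
        }
    }

    boolean remove(int x) {
        while (true) {
            int[] cur = snapshot.get();
            int pos = Arrays.binarySearch(cur, x);
            if (pos < 0) return false;           // not present
            int[] next = new int[cur.length - 1];
            System.arraycopy(cur, 0, next, 0, pos);
            System.arraycopy(cur, pos + 1, next, pos, cur.length - pos - 1);
            if (snapshot.compareAndSet(cur, next)) return true;
        }
    }
}

class CowIntSetDemo {
    public static void main(String[] args) {
        CowIntSet s = new CowIntSet();
        s.add(3); s.add(1); s.add(2);
        System.out.println(s.contains(2) + " " + s.size()); // prints "true 3"
    }
}
```

Every write copies the whole array, so this pays off only when reads dominate, which is the situation the Mostly-read column describes.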
© Copyright 2024