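    import scala.collection.immutable.SortedSet
    import scala.collection.mutable

    object Ex01 extends App {
      // Maps each character to the sorted set of indexes at which it occurs.
      def indexes(s: String): mutable.HashMap[Char, SortedSet[Int]] = {
        val indexMap = new mutable.HashMap[Char, SortedSet[Int]]()
        var i = 0
        s.toCharArray.foreach { c =>
          indexMap.get(c) match {
            case Some(result) => indexMap(c) = result + i
            case None => indexMap += (c -> SortedSet(i))
          }
          i += 1
        }
        indexMap
      }
    }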
    import scala.collection.mutable
    import scala.collection.mutable.ListBuffer

    object Ex02 extends App {
      // Same as Ex01, but collects the indexes in a mutable ListBuffer.
      def indexes(s: String): mutable.HashMap[Char, ListBuffer[Int]] = {
        val indexMap = new mutable.HashMap[Char, ListBuffer[Int]]()
        var i = 0
        s.toCharArray.foreach { c =>
          indexMap.get(c) match {
            case Some(result) => result += i
            case None => indexMap += (c -> ListBuffer(i))
          }
          i += 1
        }
        indexMap
      }
    }
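    import scala.collection.mutable

    object Ex05 extends App {
      // MkToString is not shown in this listing; it is assumed to be a trait
      // defined elsewhere that provides mkToString.
      val test = new mutable.HashSet[String] with MkToString
      test += "Hello"
      test += "Scala"
      test += "Spark"
      println(test.mkToString(","))
    }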
Harry Hacker wrote a program that accepts a sequence of file names on the command line. For each file name, he starts a new thread that reads the file and updates a letter-frequency map declared as:

    val frequencies = new scala.collection.mutable.HashMap[Char, Int] with scala.collection.mutable.SynchronizedMap[Char, Int]

When he reads a letter c, he calls

    frequencies(c) = frequencies.getOrElse(c, 0) + 1

Why does this not produce the correct answer? What if he implemented it as follows instead?

    import scala.collection.JavaConversions.asScalaConcurrentMap
    val frequencies: scala.collection.mutable.ConcurrentMap[Char, Int] = new java.util.concurrent.ConcurrentHashMap[Char, Int]
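One way to see the problem: in both variants each individual map call is thread-safe, but `frequencies(c) = frequencies.getOrElse(c, 0) + 1` is a read followed by a separate write, so two threads can read the same old count and one of the increments is lost. The sketch below is not part of the original exercise. It assumes Java 8 or later and uses `ConcurrentHashMap.merge`, which performs the read-modify-write as one atomic step. The names `AtomicFrequencies`, `count`, and `add` are hypothetical, and strings stand in for file contents.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.BiFunction

object AtomicFrequencies {
  // Hypothetical helper: counts the characters of several strings,
  // using one thread per string (the strings stand in for file contents).
  def count(inputs: Seq[String]): ConcurrentHashMap[Char, Int] = {
    val frequencies = new ConcurrentHashMap[Char, Int]()

    // Combines the existing count with the increment passed to merge.
    val add = new BiFunction[Int, Int, Int] {
      def apply(a: Int, b: Int): Int = a + b
    }

    val threads = inputs.map { text =>
      new Thread(new Runnable {
        // merge inserts 1 if the key is absent, otherwise atomically
        // replaces the old count with add(old, 1).
        def run(): Unit = text.foreach(c => frequencies.merge(c, 1, add))
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    frequencies
  }
}
```

Because the increment happens inside a single `merge` call, no update can slip in between the read and the write, which is exactly what the `getOrElse` plus assignment version cannot guarantee.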
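Harry Hacker reads a file into a string and wants to use a parallel collection to update the letter-frequency map concurrently from different parts of the string. He uses the following code:

    val frequencies = new scala.collection.mutable.HashMap[Char, Int]
    for (c <- str.par) frequencies(c) = frequencies.getOrElse(c, 0) + 1

Why is this a terrible idea? How can he really parallelize the computation? (Hint: use aggregate.)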
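First, the idea is bad because the parallel loop mutates a shared, unsynchronized HashMap from several threads at once, so updates can be lost and the result is unpredictable.
To use aggregate correctly, first look at its documentation: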
    abstract def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B

Aggregates the results of applying an operator to subsequent elements.

This is a more general form of fold and reduce. It is similar to foldLeft in that it doesn't require the result to be a supertype of the element type. In addition, it allows parallel collections to be processed in chunks, and then combines the intermediate results.

aggregate splits the collection or iterator into partitions and processes each partition by sequentially applying seqop, starting with z (like foldLeft). Those intermediate results are then combined by using combop (like fold). The implementation of this operation may operate on an arbitrary number of collection partitions (even 1), so combop may be invoked an arbitrary number of times (even 0).

As an example, consider summing up the integer values of a list of chars. The initial value for the sum is 0. First, seqop transforms each input character to an Int and adds it to the sum (of the partition). Then, combop just needs to sum up the intermediate results of the partitions:

    List('a', 'b', 'c').aggregate(0)({ (sum, ch) => sum + ch.toInt }, { (p1, p2) => p1 + p2 })

B: the type of accumulated results
z: the initial value for the accumulated result of the partition - this will typically be the neutral element for the seqop operator (e.g. Nil for list concatenation or 0 for summation) and may be evaluated more than once
seqop: an operator used to accumulate results within a partition
combop: an associative operator used to combine results from different partitions
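To make the partition-then-combine behaviour described above concrete, here is a minimal sequential sketch. It is not the library implementation: `AggregateSketch`, `aggregateInChunks`, and `chunkSize` are hypothetical names, and a real parallel collection chooses its own partitions and runs them on several threads. The sketch folds each chunk with `seqop` starting from `z`, then merges the partial results with `combop`.

```scala
object AggregateSketch {
  // Hypothetical helper that imitates aggregate on fixed-size chunks.
  // Each chunk is folded with seqop starting from z (like foldLeft);
  // the per-chunk results are then merged with combop.
  def aggregateInChunks[A, B](xs: Seq[A], chunkSize: Int)(z: => B)(
      seqop: (B, A) => B,
      combop: (B, B) => B): B =
    xs.grouped(chunkSize)
      .map(chunk => chunk.foldLeft(z)(seqop))
      .foldLeft(z)(combop)

  def main(args: Array[String]): Unit = {
    val chars = List('a', 'b', 'c')
    // 'a' = 97, 'b' = 98, 'c' = 99, so every chunk size yields 294.
    for (size <- 1 to 3) {
      val sum = aggregateInChunks(chars, size)(0)(
        (acc, ch) => acc + ch.toInt,
        (p1, p2) => p1 + p2)
      println(s"chunkSize=$size sum=$sum")
    }
  }
}
```

The result is the same for every chunk size only because `0` is neutral for addition and `combop` is associative, which are the conditions the documentation places on `z` and `combop`.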
With the parameters of aggregate understood, the code can be rewritten as follows:

    package excercises.chapter13

    object Ex10 extends App {
      val str = "abcdefghijkwerwseofnwoenroejrwhtowehtokweht"

      // seqop adds one character to a partition's map;
      // combop merges two partition maps by summing their counts.
      val frequencies = str.par.aggregate(new collection.immutable.HashMap[Char, Int]())(
        { (m, c) => m + (c -> (m.getOrElse(c, 0) + 1)) },
        { (m1, m2) => m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) } }
      )

      println(frequencies)
    }
The test output is as follows:

    [info] Running excercises.chapter13.Ex10
    Map(e -> 7, s -> 1, n -> 2, j -> 2, t -> 3, f -> 2, a -> 1, i -> 1, b -> 1, g -> 1, c -> 1, h -> 4, r -> 3, w -> 6, k -> 2, o -> 5, d -> 1)
    [success] Total time: 2 s, completed May 17, 2019 7:28:45 PM
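As a cross-check on the two operators passed to aggregate, the sketch below applies them to fixed-size chunks of the same string sequentially and compares the result with a plain `groupBy` count. It deliberately avoids `.par`, so it does not depend on which Scala version provides parallel collections. The names `FrequencyCheck`, `viaChunks`, and `viaGroupBy` are hypothetical and not part of the original solution.

```scala
object FrequencyCheck {
  type Freq = Map[Char, Int]

  // Same logic as the seqop in the solution: count one more occurrence of c.
  val seqop: (Freq, Char) => Freq =
    (m, c) => m + (c -> (m.getOrElse(c, 0) + 1))

  // Same logic as the combop in the solution: sum the counts of two maps.
  val combop: (Freq, Freq) => Freq =
    (m1, m2) => m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }

  // Folds each chunk separately, then merges the per-chunk maps.
  def viaChunks(s: String, chunkSize: Int): Freq =
    s.grouped(chunkSize)
      .map(_.foldLeft(Map.empty[Char, Int])(seqop))
      .foldLeft(Map.empty[Char, Int])(combop)

  // Reference result computed without any partitioning.
  def viaGroupBy(s: String): Freq =
    s.groupBy(identity).map { case (k, v) => k -> v.length }

  def main(args: Array[String]): Unit = {
    val str = "abcdefghijkwerwseofnwoenroejrwhtowehtokweht"
    println(viaChunks(str, 5) == viaGroupBy(str))
  }
}
```

If the merge step forgot to add the counts from `m1`, the chunked result would differ from the `groupBy` reference for any letter that appears in more than one chunk.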