
CS2013-Strawman-PD-Parallel and Distributed Computing (DEPRECATED)

sahami
CS2013-Strawman-PD-Parallel and Distributed Computing (DEPRECATED)

The CS2013 Strawman Report comment period is now closed. Please see the CS2013 Ironman report for the latest CS2013 draft to comment on. Forum to comment on the "PD-Parallel and Distributed Computing" Knowledge Area in the CS2013 Strawman report. We ask that comments related to specific text in the report specify the page number and line number(s) of the text being commented on. Line numbers are provided on the far left-hand side of each page.

Tom Anderson
Distributed systems and concurrency

This page is very thin and overly focused on the more theoretical topics. Where should a student go to find the key ideas behind distributed file systems? BigTable? Amazon? Distributed authentication? These are no longer graduate-level topics.

Concurrency is NOT primarily a parallelism topic. It’s an SF topic: you can’t build a database or an OS or even a web browser without it, even on a uniprocessor. So if the detailed treatment is kept in this section, then you need a lot of cross pointers.

sahami
Response

Thank you for the comments. While topics such as BigTable and Amazon may no longer be graduate-level topics, we don't believe they belong in the core undergraduate curriculum at this point. We also want to avoid highlighting any particular commercial application (e.g., Amazon), although we believe the principles can be covered in more general ways.

We will make more explicit the difference in meaning between concurrency and parallelism and include more cross-links to Systems Fundamentals as appropriate.

claremont_colle...
Claremont Colleges draft comments

The following comments are from the Programming Languages faculty at the Claremont Colleges (Kim Bruce -- Pomona College and Melissa O’Neill, Chris Stone, & Ben Wiedermann -- Harvey Mudd College):

We were generally impressed with the content of this knowledge area, and in particular the desire to give a comprehensive introduction to the area. That said, we did notice that although the introduction mentions many attributes of parallel programming, especially the difficulties faced within the field, it makes almost no mention of the goals of parallel programming, which are usually one or more of speed, scale, or redundancy. These issues are important for motivating why CS educators should be including parallelism in their curricula.

The content of the knowledge area does an excellent job of distinguishing parallelism and concurrency in some places, but conflates these concepts in others. We think the draft could benefit from a clearer and more consistent distinction throughout. In particular:

Line 74 gives a good, succinct description of the distinction; we think the introduction should make this distinction equally clear.

Line 76 would be more precise if it were "Programming errors in concurrent programming not found in sequential programming"

Many of the topics in the Communication and Coordination subsection (starting on page 123, line 110) are issues of concurrency. Perhaps the subsection could be renamed to incorporate the word "concurrency" (analogous to the previous subsections, which use the words "parallelism" / "parallel")?

sahami
Response

Thank you for the comments.

We will more explicitly distinguish between parallelism and concurrency in the introduction, including providing working definitions and goals, to make it clear what these terms mean throughout the document.

We will update Line 76 as you have suggested.

alan_fekete
PD suggestions

This is an important topic, and it is good to see it getting lots of coverage.

I don't agree that sequential consistency is a Core-tier1 topic (line 115). It is a fairly hard model to deal with, and different language runtimes provide different models. I'd rather see this as part of Core-Tier2 in the wider discussion of consistency in shared memory models (line 118). If anything of this is really Core-Tier1, it is the concept "different platforms offer different consistency models for their memory, and it matters for programming on top of the memory".
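As a concrete (and purely illustrative) way to see why the model in use matters, consider the classic "store buffering" litmus test sketched below. Under sequential consistency the outcome r1 == 0 and r2 == 0 is impossible, but weaker memory models, including those of common hardware, permit it. This is a minimal sketch, not material from the report; it deliberately contains a data race for illustration (a real litmus test would use relaxed atomics).

    #include <pthread.h>
    #include <stdio.h>

    /* Shared variables for the store-buffering litmus test. */
    int x = 0, y = 0, r1 = 0, r2 = 0;

    /* Thread 1: write x, then read y. */
    void *t1(void *arg) { x = 1; r1 = y; return NULL; }

    /* Thread 2: write y, then read x. */
    void *t2(void *arg) { y = 1; r2 = x; return NULL; }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, t1, NULL);
        pthread_create(&b, NULL, t2, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* Under sequential consistency, at least one of r1, r2 must be 1.
           Observing r1 == 0 and r2 == 0 demonstrates a non-SC execution. */
        printf("r1 = %d, r2 = %d\n", r1, r2);
        return 0;
    }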

I think a whole section is needed on PD/Internet-Scale Computing, covering the main ideas of cloud platforms, especially weak consistency, mechanisms for availability in the face of frequent failure, and elasticity and its implications (this topic is also good for introducing consideration of the economics of data centers, environmental impacts, etc.).

sahami
Response

Thank you for the comments.

After some discussion, we felt it was important to keep some coverage of consistency for shared-memory programs as a Core-Tier1 topic.

We will be adding a new section (knowledge unit) entitled PD/Cloud Computing, which includes Internet-Scale computing, in addition to many other related topics such as resource elasticity, virtualization, and software/platform as a service.

sahami
Feedback from ACM Education Council meeting

[Comments below are from break-out session on "Parallel and Distributed Computing" at ACM Education Council meeting]

Potential exemplar courses to consider: UCB 61C, CMU Blelloch, UW Data Structures

Heavy on parallel, light on distributed -- it would be good to get input from the Distributed Computing community (suggested by Dick Brown).

Distributed Computing overlaps with Communication? The client/server model is distributed; where should it be included?

Consider the cost of data sitting vs. moving -- perhaps this is an issue for SF-Systems Fundamentals.

Consider power. This can involve multiple constraints -- latency, etc. -- and may be as detailed as the power cost of RAM refresh.

Consider using the term and concept "cloud". PD seems to be the logical place for that.

Should multicore be here or SF or AR or PBD?

The issue was raised of how to approach multicore architecture/development from the very start in early courses.

[Future consideration] Multicore systems are different enough to require different/better levels of abstraction in programming languages. Maybe F#, something like Chapel, or Erlang; e.g., OpenMP as the right abstraction for fork-join parallelism.
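As one hedged illustration of the fork-join point above (using OpenMP purely as an example; nothing here mandates any particular technology), a minimal C sketch:

    #include <stdio.h>

    int main(void) {
        const long N = 1000000;
        double sum = 0.0;

        /* Fork: iterations of the loop are divided among a team of threads;
           the reduction clause combines each thread's partial sum at the join. */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++) {
            sum += 0.5 * i;
        }

        /* Join: execution is sequential again after the parallel region. */
        printf("sum = %f\n", sum);
        return 0;
    }

Compiled with OpenMP support (e.g., gcc -fopenmp), the loop runs in parallel; compiled without it, the pragma is ignored and the program runs sequentially, which is part of what makes fork-join a gentle first abstraction.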

Traditional abstract data types (e.g. lists) are sequential. Where should these be addressed in CS2013?

Guidance is needed on how to implement this area at the "99%" of schools (e.g., non-UCB). Premise: material has to exist (e.g., textbooks, "nifty assignments") or else it won't happen! Online modules are key to addressing this. The SIGCSE 2012 Lightning Round is an excellent example (videos/content online?).
Comment: Need a full plan: where does it fit in the roadmap (directed graph)?

CS2013 needs dependency graph or ontology for tier1/2 content.

Transforming sequential code to parallel -- where is this addressed? Should we even teach this? Legacy portfolios will be with us for quite some time...

Side effects are potentially important to call out explicitly (cross-reference/overlap with PL).

Careful about Amdahl's law in Tier 2; it might get missed. It is not a big topic (5 minutes), and it needs the rejoinders (weak scaling, the Gustafson observation, work-span analysis, etc.); a short worked illustration is sketched below.
Put line 163 into Tier 1 (this pulls along the learning outcomes at line 174).
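A rough worked illustration of why those rejoinders matter (the numbers are chosen purely for illustration): with serial fraction s and N processors,

    Amdahl (fixed problem size):     S(N) = 1 / (s + (1 - s)/N)
        e.g., s = 0.1, N = 16  ->  S = 1 / (0.1 + 0.9/16) = 6.4

    Gustafson (scaled problem size): S(N) = N - s(N - 1)
        e.g., s = 0.1, N = 16  ->  S = 16 - 0.1 * 15 = 14.5

The same serial fraction looks far less limiting under weak scaling, which is exactly the point students can miss if only Amdahl's law is presented.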

There was some potential confusion about topic headers being equivalent to courses (which they are not).

List of reference books? (e.g., Calvin Lin / Larry Snyder's book) In general, reference material of all kinds (videos, labs, etc.) is needed, not just full course exemplars.

Exemplars for a fully-instantiated program would be useful (many sites will be following these guidelines to the letter).

sahami
Response

Thank you for the extensive comments.

We appreciate the pointers to the potential exemplar courses and will look into them further.

The client/server model appears in Systems Fundamentals. We will provide a cross-reference.

Resource constraints are addressed in the Systems Fundamentals and Parallel and Distributed Computing areas; we will provide more cross-references.

Cloud Computing will be added as a new knowledge unit in the Parallel and Distributed Computing knowledge area.

Multicore processing is addressed in multiple areas (at different levels of depth). In the same way, we'd imagine multicore processing being a topic that shows up (at different levels of facility) in different courses in a curriculum.

Traditional abstract data types are addressed in the Software Development Fundamentals knowledge area.

For guidance on implementation, we will be providing some pointers to fielded "exemplar" classes that cover this material in different ways (and at different levels). These exemplars should help provide more concrete guidance on implementation.

We are considering adding an index to CS2013 to make it more clear how to find related content across different knowledge areas.

Programming without side effects is discussed in the Programming Languages knowledge area. It is alluded to in PD in connection with particular models (e.g., map-reduce).

Amdahl's Law also appears in Systems Fundamentals. Coverage in the PD area goes beyond what is covered in Systems Fundamentals.

Yes, topic headers are not individual courses. We would expect that there would be different models for covering the PD material (e.g., distribute this material throughout more "traditional" topic courses (such as algorithms), have a dedicated course on parallel and distributed computing, include this material in more introductory systems-related courses, etc.)

We believe that exemplar courses (which will be included in CS2013) will have references to materials such as textbooks. Nevertheless, CS2013 does not want to endorse any particular textbook or materials as part of an undergraduate curriculum.
