
CS2013-Strawman-IM-Information Management (DEPRECATED)

sahami

The CS2013 Strawman Report comment period is now closed. Please see the CS2013 Ironman report for the latest CS2013 draft to comment on. This forum is for comments on the "IM-Information Management" Knowledge Area in the CS2013 Strawman report. We ask that comments on specific text in the report give the page number and line number(s) of that text. Line numbers are provided on the far left-hand side of each page.

djg
suggestions, especially parallel databases

[This comment is from Magda Balazinska and Dan Suciu, database
professors at the University of Washington. For convenience, Dan
Grossman is entering it into the public-comment system.]

Overall, the IM KA looks quite reasonable, although very conservative,
which makes sense for a general document like this.

=== Significant omissions ===

The most important shortcoming is the omission of /parallel/ DBMS
topics, which are an essential part of the area. We therefore recommend:

Replace "Distributed Databases" (line 187) with "Parallel and
Distributed Databases", and add appropriate topics, e.g.: (1)
architectures of parallel DBMSs: shared memory, shared disk, shared
nothing; (2) speedup and scaleup; (3) the MapReduce processing model;
(4) data replication and weak consistency models.

The above suggestion may still under-represent the idea of MapReduce,
not as one particular system, but as a general approach to large-scale
data processing that is here to stay and is not currently covered.
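
As a rough illustration of MapReduce as a general processing model rather than a particular system, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases. The function names and the word-count task are assumptions chosen for illustration. A real system would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different worker; here we run serially.
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["a b a", "b c"])
```

The point for a curriculum is that only `map_phase` and `reduce_phase` are problem-specific. The grouping step in between is generic, which is what lets a framework distribute it.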

Also missing, as an important elective topic, is NoSQL, along with
related topics such as key-value stores and eventual consistency.

To summarize the key omissions:
* parallel databases
* MapReduce
* NoSQL

=== Additional learning outcomes and topics ===

1. The KA overall, particularly the Database Systems KU (line 42), is too
lightweight on understanding core database pieces. In addition to the
learning outcomes on lines 58 and 59, we recommend explicitly
indicating somewhere in the KA something like, "Understand the most
common designs for all the core database system components including
the query optimizer, buffer manager, query executor, operators (and
the operator algorithms: one-pass, two-pass, and index-based),
storage manager, access methods, and transaction processor."

2. IM/Query Languages (line 148): expand SQL. This language is
becoming increasingly important, and is not easy to master. It would
be helpful if the document listed more topics for SQL, e.g. (1)
selections, (2) projections, (3) select-project-join, (4) aggregates
and group-by, (5) subqueries, to suggest to an instructor that more
lectures be allocated to SQL.
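
To make the five suggested SQL topics concrete, here is a small sketch using Python's standard `sqlite3` module. The `emp` and `dept` tables and their contents are hypothetical, invented only for this illustration. Each query corresponds to one topic in the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp(id INTEGER PRIMARY KEY, name TEXT, salary INTEGER,
                 dept_id INTEGER REFERENCES dept(id));
INSERT INTO dept VALUES (1, 'Eng'), (2, 'Ops');
INSERT INTO emp VALUES (1, 'Ann', 120, 1), (2, 'Bob', 100, 1), (3, 'Cy', 80, 2);
""")


def q(sql):
    """Run one query and return all result rows."""
    return conn.execute(sql).fetchall()


# (1) Selection: keep only the rows that satisfy a predicate.
selection = q("SELECT * FROM emp WHERE salary > 90 ORDER BY id")

# (2) Projection: keep only some columns.
projection = q("SELECT name FROM emp ORDER BY id")

# (3) Select-project-join: combine two tables, then filter and project.
spj = q("""SELECT e.name, d.name FROM emp e
           JOIN dept d ON e.dept_id = d.id
           WHERE e.salary > 90 ORDER BY e.id""")

# (4) Aggregates and group-by: one summary row per department.
aggregate = q("""SELECT d.name, AVG(e.salary) FROM emp e
                 JOIN dept d ON e.dept_id = d.id
                 GROUP BY d.name ORDER BY d.name""")

# (5) Subquery: compare each row against a value computed by an inner query.
subquery = q("""SELECT name FROM emp
                WHERE salary > (SELECT AVG(salary) FROM emp) ORDER BY id""")
```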

=== Places to trim ===

In IM/Relational Databases, the long list of normal forms is no longer
needed. It should suffice to study only BCNF and, possibly, PJNF or
5NF.

alan_fekete
Comments on IM

Minor formatting:
In electives, outcomes are not classified by {Knowledge, Application, Evaluation}.
In IM/Indexing, lines 95 and 96 have a private comment from/to Robert that should be removed.

General comment: the learning outcomes seem weighted too heavily towards Knowledge, with not enough higher-level skills.

Major missing content: Information Security. I think this warrants a whole section, and it should be cross-referenced to IAS (but doesn't currently seem to be covered well there). Topics include the value of information about people and organizations, common attacks (including general ones based on finding old or lost storage devices, computing aggregates over sets with small differences, and use of machine learning techniques to de-anonymize data, and others that are technology-specific including SQL injection), and the interaction of SQL's access control with its view-definition mechanisms, to obtain carefully-targeted access.
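
As one concrete example of the technology-specific attacks mentioned above, here is a minimal sketch of SQL injection using Python's standard `sqlite3` module. The `users` table, the function names, and the attack string are hypothetical, chosen only for illustration. The sketch contrasts building SQL text by string concatenation with passing the input as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users(name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name):
    # Vulnerable: the input is spliced into the SQL text, so quote characters
    # in the input can change the structure of the query.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def lookup_safe(name):
    # Safer: the input is bound as a parameter and is only ever treated as data.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


attack = "nobody' OR '1'='1"
leaked = lookup_unsafe(attack)   # the WHERE clause becomes always true
blocked = lookup_safe(attack)    # no user has this literal name
```

With the unsafe version, the attack string turns the predicate into `name = 'nobody' OR '1'='1'`, which matches every row. The parameterized version treats the same string as an ordinary, non-matching name.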

Missing content in the core (IM/Information Management Concepts): information management as a socio-technical system, especially the advantages and disadvantages of central organizational control over data; the careers/roles associated with information management (including database administrator, data modeller, application developer, end-user). I usually devote about 30 minutes of the first lecture to this.

In IM/Data Modeling, I would add the spreadsheet (universal single table) data model, as a good one for provoking evaluation and because it is so widespread in practice. I also suggest adding the semantic RDF model of triples (cross-reference CN/Data, Information and Knowledge; p 54, line 216).

In IM/Query Languages, coverage of SQL's view mechanism is really important. Also, (line 154) I'd reword "embedding non-procedural queries in a procedural language" to say "different ways to invoke non-procedural queries from conventional languages (e.g. LINQ, procedure calls with textual queries as parameters, pre-processor embeddings)". I am not convinced that Object Query Language (line 155) has continuing relevance; I'd rather see coverage of more widely used query languages for other data models, such as XPath for the semistructured model or SPARQL for RDF data.

In IM/Transaction Processing, an important topic is "Interaction of transaction management with storage, especially buffering". An important outcome is "Identify appropriate transaction boundaries in application programs". Line 184 is confusing: a transaction *protocol* in my mind determines an isolation level; choosing isolation level is something one does for a specific piece of application code running as a transaction.
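
To illustrate what identifying a transaction boundary in application code can look like, here is a minimal sketch using Python's standard `sqlite3` module. The `account` table and the `transfer` function are hypothetical. The sketch assumes the debit and the credit must succeed or fail together, so both are placed inside a single `with conn:` block, which commits on normal exit and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move `amount` from src to dst as one atomic unit of work."""
    try:
        with conn:  # transaction boundary: commit on success, rollback on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Raising inside the block undoes the debit above.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances():
    """Return the current balances as an {id: balance} dictionary."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

Calling `transfer(1, 2, 30)` moves the money, while `transfer(1, 2, 500)` leaves both balances untouched. Placing the boundary around only one of the two updates would let a failure leave the accounts inconsistent.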

In IM/Relational Databases (lines 120-122), I agree with the comments from UWashington proposing to remove normal forms other than BCNF (and 1NF, which I'd keep, but just as part of the relational model; it doesn't need to be seen as "normalization theory").

I agree with UWashington's comments that more is needed on parallel and distributed databases, though I would see replication as the most crucial missing topic (I think NoSQL and MapReduce are not established enough as long-term fundamental content for a curriculum that might last a decade)
