The first exam will be held in lectures on October 24th; the second will be held in the university final exam period, 1-4pm on Thursday, Dec. 13th.
Each will focus on the most recent course material, but the later exam will probably also draw on earlier material and discussions.
The format will be four essay-style questions, all equally weighted.
The exams are open book and open notes, but no electronics are permitted.
Here are the collected 2011 midterm questions (from the three
'midterms' that were held that spring).
For the 2012 version of the course, questions 1-6 would be suitable
for the October midterm, while 7-13 would be suitable for the
December exam.
Other sample questions for the two midterms are listed below.
CSCI 485 Sample Questions: Midterm 1
====================================
Research project
----------------
(i) Briefly explain your research project topic, emphasizing
(ii) what about it is technically challenging with
respect to the course content, and
(iii) why, at presentation time, the topic will be of
interest to other students in the class.
Lab project i
-------------
For the current lab project, the server needs to
know when clients change their location, task,
or activity.
There are two very different viable approaches:
(1) the client sends an update every few seconds,
giving current location/task/activity,
and doesn't require acknowledgements to these
(2) the client only sends an update when they
actually change their location/task/activity,
but then requires an acknowledgement to be
certain the server got it
Question:
(i) when/why would (1) be a superior approach
(ii) when/why would (2) be a superior approach
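As a point of reference for the two approaches, here is a minimal
client-side sketch. It is not the lab solution: the Status fields, the
class names, and the use of a sequence number in acknowledgements are
all assumptions made for illustration, and the actual network send is
left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical status record; the field names are assumptions.
struct Status {
    std::string location, task, activity;
};

inline bool sameStatus(const Status& a, const Status& b) {
    return a.location == b.location && a.task == b.task &&
           a.activity == b.activity;
}

// Approach (1): send the current status every `period` seconds, no acks.
class PeriodicSender {
public:
    explicit PeriodicSender(double periodSeconds)
        : period_(periodSeconds), lastSent_(-periodSeconds) {
        assert(periodSeconds > 0.0);
    }
    // True when an update is due at time `now` (seconds); records the send.
    bool shouldSend(double now) {
        if (now - lastSent_ < period_) return false;
        lastSent_ = now;
        return true;
    }
private:
    double period_;
    double lastSent_;
};

// Approach (2): send only on change, and keep resending until acknowledged.
class OnChangeSender {
public:
    // Record a status report; returns true if it changed and must be sent.
    bool update(const Status& s) {
        if (hasStatus_ && sameStatus(current_, s)) return false;
        current_ = s;
        hasStatus_ = true;
        acked_ = false;
        ++seq_;
        return true;
    }
    // Assumption: the server echoes the sequence number it received.
    void onAck(unsigned seq) {
        if (seq == seq_) acked_ = true;
    }
    // True while the latest change is still unconfirmed.
    bool needsResend() const { return hasStatus_ && !acked_; }
    unsigned seq() const { return seq_; }
private:
    Status current_;
    bool hasStatus_ = false;
    bool acked_ = false;
    unsigned seq_ = 0;
};
```

Note that approach (1) carries no per-message state beyond a timer,
while approach (2) has to track what is still unacknowledged.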
Lab project ii
--------------
For the current lab project, suppose the clients
needed to be able to send messages to one another,
either directly or routed through the server.
Describe the design issues and alterations this change
would force in general, and with your solution approach
in particular.
Lab project iii
---------------
For the current lab project, suppose the client was meant
to be run on a mobile device mounted in a vehicle, with
the expectation that connectivity would frequently be
lost for short intervals.
Describe how this would change your design and implementation,
with appropriate justification/explanation of your answer.
Threading i
-----------
When threads must communicate, the two primary options are
via message passing or via shared data elements.
(i) Describe circumstances under which message passing
would be the preferable approach and why.
(ii) Describe circumstances under which shared data
(and semaphores) would be preferable and why.
Threading/states ii
-------------------
- you have a read thread and a control thread
- the read thread uses blocking reads, in a loop like
    do {
        char c;
        cin.get(c);
        sharedBuffer.enque(c);
    } while (!quit);
where the buffer is shared with the control thread
1. where should the logic to decide/set 'quit = true' go,
and why
2. given your answer to (1.),
where should any semaphore(s) go, and why
Blocking vs nonblocking
-----------------------
(i) Are there circumstances under which blocking receives
are preferable to nonblocking? Justify your answer.
(ii) Are there circumstances under which blocking keyboard
input is preferable to nonblocking? Justify your answer.
TCP vs UDP
----------
Explain the different circumstances under which a
TCP (connection-based) communication system would
be preferable to a UDP (connectionless) one, and why.
Message composition
-------------------
In lectures we discussed formats that would allow
highly flexible yet maintainable message structures.
Suppose you were implementing a simple chat room
application. Provide a preliminary design for the
message structure you would use and justify your
design choices.
Shared data
-----------
One of the design issues that becomes particularly
important as system size increases is the mechanism
for sharing data critical to establishing communication
and validating who you are communicating with.
One of the typical approaches involves the potential users
somehow finding the appropriate site (e.g. through a search
engine) then following a registration/login process, possibly
with more limited access for 'guests'.
Discuss the potential pitfalls/limitations/drawbacks you can
see with such a scheme.
Synchronizing systems
---------------------
One of the issues when dealing with distributed systems
and replicated data is that of synchronization: how can
two servers decide what order different events should
take place in, especially given variations in system clocks
and the time lags involved in communication?
This is particularly true in 'eventually consistent' systems,
where data transactions percolate through to different servers
at different times and in different orders.
We discussed the use of time counters - internal counters which
each server uses to identify time 'steps', and which are sent
as part of each message transmitted. The receiver always updates
their internal counter to be the greater of their current counter
and the timestamp received as part of a message.
Discuss the potential pitfalls/weaknesses you see in such a
synchronization scheme, and suggest potential workarounds.
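For reference, the counter rule described above can be sketched as
follows. The receive rule (take the greater of the two values) is the
one stated in the question; incrementing by one on each local event or
send is an assumption, as are the class and method names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Minimal sketch of a per-server time counter.
class TimeCounter {
public:
    // Advance for a local event (assumption: one step per event).
    std::uint64_t tick() {
        // Counter overflow is itself one of the scheme's potential pitfalls.
        assert(counter_ < std::numeric_limits<std::uint64_t>::max());
        return ++counter_;
    }
    // Timestamp to attach to an outgoing message (a send counts as an event).
    std::uint64_t stampOutgoing() { return tick(); }
    // On receipt: keep the greater of our counter and the received timestamp.
    std::uint64_t onReceive(std::uint64_t received) {
        counter_ = std::max(counter_, received);
        return counter_;
    }
    std::uint64_t now() const { return counter_; }
private:
    std::uint64_t counter_ = 0;
};
```

With two counters a and b, calling b.onReceive(a.stampOutgoing())
pulls b forward to a's time, while a late message carrying an older
timestamp leaves b unchanged.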
Data handling
-------------
Suppose you were designing/implementing an application
'from scratch'. Describe the criteria you would use
to decide if a traditional RDBMS was the appropriate
data storage mechanism for your application, and justify
your decision.
Data scaling
------------
Suppose within the first few years after release, the
scale/scope of your data storage needs wildly exceeded
your original design expectations.
Assuming the original design centered around an RDBMS
system, describe
(i) the approaches you would use (with justification)
to try to cope with the increased data handling needs
(ii) the criteria you would use to decide when an
RDBMS was simply no longer a practical (i.e.
sufficiently scalable) solution to your problem
Scaling an RDBMS
----------------
One suggestion on large scaling was to grossly denormalize the DB,
actually running several separate databases, meant to model the
same (or, at least, overlapping) set of logical data but with
completely different internal designs.
Each DB would be specially designed to handle specific kinds of queries.
Discuss the design/implementation difficulties associated with
such an approach.
Client-side storage
-------------------
Suppose most of your users always connect from their 'regular' machine,
be it a desktop, laptop, or phone, at home or at work.
However, a significant minority of your users regularly connect from
public or shared machines.
Discuss the implications this has for the design of your client-side
data storage.
Sharding i
----------
First, describe an application that you think would be highly suitable
for sharding, and why it is highly suitable.
Second, describe a data-driven application whose scale is sufficient
to consider sharding, but where the nature of the data or queries
makes it unsuitable for sharding.
Sharding ii
-----------
Describe what you see as the most important criteria to consider
when investigating whether sharding is the right approach for
your high-volume, high-demand data management solution, and
justify your answer.
Erlang
------
We have spent some time in lectures and labs examining Erlang.
Briefly discuss the key strengths and weaknesses of Erlang
for use in highly distributed, highly scalable applications.
Distributed rate limiting
-------------------------
In recent lectures and labs we examined some of the issues associated
with distributed rate limiting.
One of the solutions discussed was allowing nodes in the system
to 'trade' capacity for the resource being distributed.
Inter-node communication can become an issue in such a scheme,
especially if nodes are widely separated geographically
(increasing both the time required to communicate and the
likelihood of failures in links between the nodes).
Consider the following tiered variation of the capacity-trading idea:
Nodes are grouped into clusters which are geographically
very close to one another; each cluster has one lead server.
Nodes within a cluster can trade capacity with one another.
Clusters are grouped into data centres, covering larger
geographical areas; each data centre has one lead server.
The clusters (through their lead servers) can trade their
capacities with one another. (The lead servers would then
tell the other nodes within the cluster to scale up or down
their current capacities as appropriate.)
Data centres (through their lead servers) can trade their total
capacities with one another (again with the lead servers telling
their clusters to scale up/down, with the clusters then telling
their nodes to scale up/down).
Discuss the potential problems and benefits associated with such a scheme.
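To make the lower two tiers of the scheme concrete, here is a minimal
sketch. It assumes capacities are plain numbers (e.g. requests per
second) and that a lead server rescales its nodes proportionally; the
Cluster type and both function names are hypothetical, and all
messaging and failure handling are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One cluster: the current capacity held by each of its nodes.
struct Cluster {
    std::vector<double> nodeCapacity;

    double total() const {
        return std::accumulate(nodeCapacity.begin(), nodeCapacity.end(), 0.0);
    }
    // Lead server tells every node to scale so the cluster total becomes
    // newTotal (assumption: proportional scaling).
    void scaleTo(double newTotal) {
        const double oldTotal = total();
        assert(oldTotal > 0.0 && newTotal > 0.0);
        const double factor = newTotal / oldTotal;
        for (double& c : nodeCapacity) c *= factor;
    }
};

// Tier 1: two nodes in the same cluster trade directly.
bool tradeWithinCluster(Cluster& c, std::size_t from, std::size_t to,
                        double amount) {
    if (from >= c.nodeCapacity.size() || to >= c.nodeCapacity.size())
        return false;
    if (amount <= 0.0 || amount > c.nodeCapacity[from]) return false;
    c.nodeCapacity[from] -= amount;
    c.nodeCapacity[to] += amount;
    return true;
}

// Tier 2: lead servers trade cluster totals, then rescale their own nodes.
bool tradeBetweenClusters(Cluster& donor, Cluster& receiver, double amount) {
    if (amount <= 0.0 || amount >= donor.total() || receiver.total() <= 0.0)
        return false;
    donor.scaleTo(donor.total() - amount);
    receiver.scaleTo(receiver.total() + amount);
    return true;
}
```

The data-centre tier would repeat the same pattern one level up, with
each data centre's lead server rescaling its clusters.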
CSCI 485 Sample Questions: Midterm 2
====================================
Cassandra vs Amazon S3
----------------------
In the lectures we discussed eBay's use of Cassandra for their
write-heavy operations (likes, owns, wants, etc.). Compare and
contrast the db/storage architecture used there with the one
used by Amazon's Simple Storage Service (S3).
Consistent Hashing
------------------
Describe the concept of consistent hashing and how it is used
by Amazon, Google, and others as a key part of their scalable
data storage approaches.
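As a reminder of the basic mechanism only, here is a minimal hash-ring
sketch. It is not any particular company's implementation: the class
name, the use of std::hash, and the number of virtual points per node
are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Each node is placed at several points on a ring of hash values; a key
// belongs to the first node point at or after the key's own hash.
class HashRing {
public:
    explicit HashRing(int pointsPerNode = 16) : points_(pointsPerNode) {
        assert(pointsPerNode > 0);
    }
    void addNode(const std::string& node) {
        for (int i = 0; i < points_; ++i)
            ring_.emplace(pointHash(node, i), node);  // keeps any existing owner
    }
    void removeNode(const std::string& node) {
        for (int i = 0; i < points_; ++i) {
            auto it = ring_.find(pointHash(node, i));
            if (it != ring_.end() && it->second == node) ring_.erase(it);
        }
    }
    std::string nodeFor(const std::string& key) const {
        assert(!ring_.empty());
        auto it = ring_.lower_bound(hash_(key));
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        return it->second;
    }
private:
    std::size_t pointHash(const std::string& node, int i) const {
        return hash_(node + "#" + std::to_string(i));
    }
    int points_;
    std::hash<std::string> hash_;
    std::map<std::size_t, std::string> ring_;
};
```

The property worth noticing is that removing a node only reassigns the
keys that node owned; every other key stays where it was.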
Skype
-----
If you had to re-do Skype using a data storage system patterned
after one of the following: Amazon's S3, Google's BigTable, eBay's
use of Cassandra, or Facebook's use of HBase, discuss which you
would choose and why.
Philosophy
----------
Compare and contrast the technical and business philosophies
of either (i) eBay and Amazon, or (ii) Facebook and Twitter.
In particular, discuss the impact the philosophies have on
the design and operation of their hardware/software systems.
Vector clocks
-------------
Clearly describe and discuss the concept of vector clocks and
give an example of their use in synchronizing actions across
multiple servers.
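For reference, the bookkeeping involved can be sketched as below. This
assumes a fixed, known number of servers, each with its own index, and
counts both sends and receives as events; the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One counter per server; each server only increments its own slot.
class VectorClock {
public:
    VectorClock(std::size_t numServers, std::size_t self)
        : counts_(numServers, 0), self_(self) {
        assert(self < numServers);
    }
    void localEvent() { ++counts_[self_]; }
    // Count the send as an event and return the clock to attach to it.
    std::vector<unsigned> send() {
        localEvent();
        return counts_;
    }
    // Merge: element-wise maximum, then count the receive as an event.
    void receive(const std::vector<unsigned>& other) {
        assert(other.size() == counts_.size());
        for (std::size_t i = 0; i < counts_.size(); ++i)
            counts_[i] = std::max(counts_[i], other[i]);
        localEvent();
    }
    const std::vector<unsigned>& counts() const { return counts_; }
private:
    std::vector<unsigned> counts_;
    std::size_t self_;
};

// a happened before b if a <= b in every slot and a < b in at least one.
bool happenedBefore(const std::vector<unsigned>& a,
                    const std::vector<unsigned>& b) {
    assert(a.size() == b.size());
    bool strictlyLess = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strictlyLess = true;
    }
    return strictlyLess;
}

// Neither clock precedes the other: the events cannot be ordered.
bool concurrent(const std::vector<unsigned>& a,
                const std::vector<unsigned>& b) {
    return a != b && !happenedBefore(a, b) && !happenedBefore(b, a);
}
```

Unlike a single counter, comparing two vectors can report that two
events are concurrent rather than forcing an order on them.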
Optimization
------------
Many of the systems we have examined in recent weeks rely heavily
on partitioning a technical problem into distinct layers or distinct
subproblems and optimizing solutions for each of the layers/subproblems
separately.
Discuss and give examples of this for each of Twitter, Google, and eBay.
Multiplayer gaming
------------------
If you were to re-architect Eve Online's data storage infrastructure,
what are the key approaches you would investigate and why?
Meta-operating systems
----------------------
Many of the systems we have examined in recent weeks include something
like a meta-operating system as a key part of their infrastructure,
for automated monitoring, control, and repair of their networks and
for automated roll-out of code updates.
Compare and contrast the meta-operating system used by
two of the following: Google, eBay, Facebook, or Amazon.
Privacy and Security
--------------------
Describe, compare and contrast the privacy and security issues
associated with Facebook, Skype, and eBay.
Search engines
--------------
Describe and discuss the search engine and supporting
infrastructure for one of the following systems:
Facebook, Google, Twitter.