CSCI 485 Sample Questions: Midterm 1
====================================

The actual midterm questions will be variations
on the questions listed below.

Research project
----------------
 (i)   Briefly explain your research project topic.
 (ii)  Explain what about it is technically challenging with
       respect to the course content.
 (iii) Explain why, at presentation time, the topic will be of
       interest to other students in the class.


Lab project i
-------------
  For the current lab project, the server needs to
    know when clients change their location, task,
    or activity.
  There are two very different viable approaches:
    (1) the client sends an update every few seconds,
        giving its current location/task/activity,
        and does not require acknowledgements for these updates
    (2) the client sends an update only when its
        location/task/activity actually changes,
        but then requires an acknowledgement to be
        certain the server received it
  Question:
    (i)  when/why would (1) be the superior approach?
    (ii) when/why would (2) be the superior approach?
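
To make the two approaches above concrete, the following is a minimal sketch of the client-side bookkeeping each one implies. It is an illustration under assumptions, not the lab's required design: the `Status` and `Update` structs, the class names, the tick-based timing, and the sequence numbers are all hypothetical, and no actual network code is shown.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical status record; the single field is illustrative only.
struct Status {
    std::string activity;
};

// Hypothetical wire message: a sequence number plus the status.
struct Update {
    std::uint32_t seq;
    Status status;
};

// Approach (1): resend the current status every `intervalTicks` ticks,
// never waiting for an acknowledgement.
class PeriodicSender {
public:
    explicit PeriodicSender(int intervalTicks) : interval_(intervalTicks) {}

    void setStatus(const Status& s) { current_ = s; }

    // Called once per clock tick; returns an update when one is due.
    std::optional<Update> onTick() {
        if (++sinceLast_ < interval_) return std::nullopt;
        sinceLast_ = 0;
        return Update{nextSeq_++, current_};
    }

private:
    int interval_;
    int sinceLast_ = 0;
    std::uint32_t nextSeq_ = 0;
    Status current_;
};

// Approach (2): send only when the status changes, and keep
// retransmitting that update until the matching ack arrives.
class OnChangeSender {
public:
    explicit OnChangeSender(int retryTicks) : retry_(retryTicks) {}

    // Returns an update to send immediately if the status really changed.
    std::optional<Update> setStatus(const Status& s) {
        if (s.activity == current_.activity) return std::nullopt;
        current_ = s;
        pending_ = Update{nextSeq_++, current_};
        sinceSend_ = 0;
        return pending_;
    }

    // Called once per clock tick; retransmits the pending update
    // if no ack has arrived within `retryTicks` ticks.
    std::optional<Update> onTick() {
        if (!pending_) return std::nullopt;
        if (++sinceSend_ < retry_) return std::nullopt;
        sinceSend_ = 0;
        return pending_;
    }

    // An ack clears the pending update only if the sequence numbers match.
    void onAck(std::uint32_t seq) {
        if (pending_ && pending_->seq == seq) pending_.reset();
    }

    bool awaitingAck() const { return pending_.has_value(); }

private:
    int retry_;
    int sinceSend_ = 0;
    std::uint32_t nextSeq_ = 0;
    Status current_;
    std::optional<Update> pending_;
};
```

In this sketch the periodic sender keeps no per-message state at all, while the on-change sender has to remember the unacknowledged update and run a retry timer; that difference in bookkeeping is one of the things the question asks you to weigh.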


Lab project ii
--------------
For the current lab project, suppose the clients
   needed to be able to send messages to one another,
   either directly or routed through the server.
Describe the design issues and alterations this change
   would force in general, and to your solution approach
   in particular.
   

Lab project iii
---------------
For the current lab project, suppose the client was meant
   to be run on a mobile device mounted in a vehicle, with
   the expectation that connectivity would frequently be
   lost for short intervals.
Describe how this would change your design and implementation,
   with appropriate justification/explanation of your answer.


Threading/states
----------------
  - you have a read thread and a control thread
  - the read thread uses blocking reads, in a loop like
     do {
        char c;
        cin.get(c);
        sharedBuffer.enque(c);
     } while (!quit);
    where the buffer is shared with the control thread
  1. where should the logic to decide/set 'quit = true' go,
     and why?
  2. given your answer to (1.),
     where should any semaphore(s) go, and why?


Blocking vs nonblocking
-----------------------
  (i)  Are there circumstances under which blocking receives
       are preferable to nonblocking?  Justify your answer.
  (ii) Are there circumstances under which blocking keyboard
       input is preferable to nonblocking?  Justify your answer.


TCP vs UDP
----------
  Explain the different circumstances under which a
  TCP (connection-based) communication system would
  be preferable to a UDP (connectionless) one, and why.


Message composition
-------------------
  In lectures we discussed formats that would allow
  highly flexible yet maintainable message structures.

  Suppose you were implementing a simple chat room
  application.  Provide a preliminary design for the 
  message structure you would use and justify your 
  design choices.


Shared data
-----------
  One of the design issues that becomes particularly
  important as system size increases is the mechanism
  for sharing data critical to establishing communication
  and validating who you are communicating with.

  One of the typical approaches involves the potential users
  somehow finding the appropriate site (e.g. through a search
  engine) then following a registration/login process, possibly
  with more limited access for 'guests'.

  Discuss the potential pitfalls/limitations/drawbacks you can
  see with such a scheme.

  
Data handling
-------------
  Suppose you were designing/implementing an application
  'from scratch'.  Describe the criteria you would use
  to decide if a traditional RDBMS was the appropriate
  data storage mechanism for your application, and justify
  your decision.

Data scaling
------------
  Suppose within the first few years after release, the 
  scale/scope of your data storage needs wildly exceeded 
  your original design expectations.

  Assuming the original design centered around an RDBMS,
  describe
    (i)  the approaches you would use (with justification)
         to try to cope with the increased data handling needs
    (ii) the criteria you would use to decide when an
         RDBMS was simply no longer a practical (i.e.
         sufficiently scalable) solution to your problem


Scaling an RDBMS
----------------
One suggestion for large-scale systems was to grossly denormalize the DB,
   actually running several separate databases, meant to model the
   same (or, at least, overlapping) set of logical data but with
   completely different internal designs.
Each DB would be specially designed to handle specific kinds of queries.

Discuss the design/implementation difficulties associated with
   such an approach.


Client-side storage
-------------------
Suppose most of your users always connect from their 'regular' machine,
   be it a desktop, laptop, or phone, at home or at work.
However, a significant minority of your users regularly connect from
   public or shared machines.
Discuss the implications this has for the design of your client-side
   data storage.

Sharding
--------
First, describe an application that you think would be highly suitable
   for sharding, and why it is highly suitable.
Second, describe a data-driven application whose scale is sufficient
   to consider sharding, but where the nature of the data or queries
   makes it unsuitable for sharding.
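
As a reminder of the mechanics behind the question, here is a minimal in-memory sketch of hash-based sharding. Everything in it is an assumption made for illustration: the `ShardedStore` class, the choice of the FNV-1a hash, the string key/value types, and the use of one `unordered_map` per shard in place of real database servers. It also assumes the shard count is greater than zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a hash, used so that shard placement is stable across
// runs and platforms (std::hash makes no such promise).
std::uint64_t fnv1a(const std::string& key) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical store: each "shard" is just a map in this sketch.
class ShardedStore {
public:
    // Assumes shardCount > 0.
    explicit ShardedStore(std::size_t shardCount) : shards_(shardCount) {}

    // The shard is chosen from the key alone.
    std::size_t shardFor(const std::string& key) const {
        return static_cast<std::size_t>(fnv1a(key) % shards_.size());
    }

    void put(const std::string& key, const std::string& value) {
        shards_[shardFor(key)][key] = value;
    }

    // Single-key lookup: touches exactly one shard.
    bool get(const std::string& key, std::string& out) const {
        const auto& shard = shards_[shardFor(key)];
        auto it = shard.find(key);
        if (it == shard.end()) return false;
        out = it->second;
        return true;
    }

    // Query by value rather than key: has to visit every shard.
    std::size_t countMatching(const std::string& value) const {
        std::size_t n = 0;
        for (const auto& shard : shards_)
            for (const auto& kv : shard)
                if (kv.second == value) ++n;
        return n;
    }

private:
    std::vector<std::unordered_map<std::string, std::string>> shards_;
};
```

The contrast between `get` (one shard) and `countMatching` (all shards) is the kind of access-pattern difference to consider when arguing that an application is or is not suitable for sharding.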

Erlang
------
We have spent some time in lectures and labs examining Erlang.
   Briefly discuss the key strengths and weaknesses of Erlang
   for use in highly distributed, highly scalable applications.

Distributed rate limiting
-------------------------
In recent lectures and labs we examined some of the issues associated
   with distributed rate limiting.
One of the solutions discussed was allowing nodes in the system
   to 'trade' capacity for the resource being distributed.

Inter-node communication can become an issue in such a scheme,
   especially if nodes are widely separated geographically
   (increasing both the time required to communicate and the
    likelihood of failures in links between the nodes).

Consider the following tiered variation of the capacity-trading idea:
   Nodes are grouped into clusters whose members are geographically
       very close to one another; each cluster has one lead server.
       Nodes within a cluster can trade capacity with one another.
   Clusters are grouped into data centres, covering larger
       geographical areas; each data centre has one lead server.
       The clusters (through their lead servers) can trade their
       capacities with one another.  (The lead servers would then
       tell the other nodes within the cluster to scale up or down
       their current capacities as appropriate.)
   Data centres (through their lead servers) can trade their total
       capacities with one another (again with the lead servers telling
       their clusters to scale up/down, with the clusters then telling
       their nodes to scale up/down).
Discuss the potential problems and benefits associated with such a scheme.
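
The capacity-trading idea described above can be sketched as plain bookkeeping, with all messaging, failures, and delays left out. The `Node` and `Cluster` structs, the `trade` and `tradeClusters` functions, and the policy of handing received capacity to the busiest node are assumptions made for this illustration only; they are not the scheme presented in lectures.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical node: `limit` is how many requests it may admit per
// interval, `used` is how many it has admitted so far.
struct Node {
    int limit;
    int used;
    int spare() const { return limit - used; }
};

// Node-to-node trade inside a cluster: move up to `wanted` units of
// unused capacity from donor to receiver. Returns the amount moved.
int trade(Node& donor, Node& receiver, int wanted) {
    int moved = std::max(0, std::min(wanted, donor.spare()));
    donor.limit -= moved;
    receiver.limit += moved;
    return moved;
}

// Hypothetical cluster: the lead server is modelled implicitly as
// whoever calls tradeClusters on behalf of the cluster.
struct Cluster {
    std::vector<Node> nodes;

    int totalLimit() const {
        int sum = 0;
        for (const Node& n : nodes) sum += n.limit;
        return sum;
    }
};

// Cluster-to-cluster trade: the donor's lead server scales its nodes
// down by collecting their spare capacity, and the receiver's lead
// server hands the collected amount to its busiest node.
int tradeClusters(Cluster& donor, Cluster& receiver, int wanted) {
    if (receiver.nodes.empty()) return 0;
    int moved = 0;
    for (Node& n : donor.nodes) {
        if (moved >= wanted) break;
        int take = std::max(0, std::min(wanted - moved, n.spare()));
        n.limit -= take;
        moved += take;
    }
    auto busiest = std::min_element(
        receiver.nodes.begin(), receiver.nodes.end(),
        [](const Node& a, const Node& b) { return a.spare() < b.spare(); });
    busiest->limit += moved;
    return moved;
}
```

Both functions conserve the total limit across the participants. A data-centre tier would repeat the same pattern one level up, which means a single trade at the top can trigger limit updates at every level below it.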