CSCI 485 Sample Questions: Midterm 1
====================================

The actual midterm questions will be variations on the questions listed below.

Research project
----------------
(i) Briefly explain your research project topic, emphasizing
(ii) what about it is technically challenging with respect to the course content
(iii) why, at presentation time, will the topic be of interest to other
      students in the class

Lab project i
-------------
For the current lab project, the server needs to know when clients change
their location, task, or activity.

There are two very different viable approaches:
(1) the client sends an update every few seconds, giving current
    location/task/activity, and doesn't require acknowledgements to these
(2) the client only sends an update when they actually change their
    location/task/activity, but then requires an acknowledgement to be
    certain the server got it

Question:
(i) when/why would (1) be a superior approach
(ii) when/why would (2) be a superior approach

Lab project ii
--------------
For the current lab project, suppose the clients needed to be able to send
messages to one another, either directly or routed through the server.

Describe the design issues and alterations this change would force in general,
and with your solution approach in particular.

Lab project iii
---------------
For the current lab project, suppose the client was meant to be run on a
mobile device mounted in a vehicle, with the expectation that connectivity
would frequently be lost for short intervals.

Describe how this would change your design and implementation, with
appropriate justification/explanation of your answer.

Threading/states
----------------
- you have a read thread and a control thread
- the read thread uses blocking reads, in a loop like

      do {
         char c;
         cin.get(c);
         sharedBuffer.enque(c);
      } while (!quit);

  where the buffer is shared with the control thread

1. where should the logic to decide/set 'quit = true' go, and why
2.
   given your answer to (1.), where should any semaphore(s) go, and why

Blocking vs nonblocking
-----------------------
(i) Are there circumstances under which blocking receives are preferable to
    nonblocking? Justify your answer.
(ii) Are there circumstances under which blocking keyboard input is preferable
     to nonblocking? Justify your answer.

TCP vs UDP
----------
Explain the different circumstances under which a TCP (connection-based)
communication system would be preferable to a UDP (connectionless) one,
and why.

Message composition
-------------------
In lectures we discussed formats that would allow highly flexible yet
maintainable message structures.

Suppose you were implementing a simple chat room application. Provide a
preliminary design for the message structure you would use and justify your
design choices.

Shared data
-----------
One of the design issues that becomes particularly important as system size
increases is the mechanism for sharing data critical to establishing
communication and validating who you are communicating with.

One of the typical approaches involves the potential users somehow finding the
appropriate site (e.g. through a search engine) then following a
registration/login process, possibly with more limited access for 'guests'.

Discuss the potential pitfalls/limitations/drawbacks you can see with such a
scheme.

Data handling
-------------
Suppose you were designing/implementing an application 'from scratch'.

Describe the criteria you would use to decide if a traditional RDBMS was the
appropriate data storage mechanism for your application, and justify your
decision.

Data scaling
------------
Suppose within the first few years after release, the scale/scope of your data
storage needs wildly exceeded your original design expectations.
Assuming the original design centered around an RDBMS, describe
(i) the approaches you would use to try to cope with the increased data
    handling needs, with justification
(ii) the criteria you would use to decide when an RDBMS was simply no longer a
     practical (i.e. sufficiently scalable) solution to your problem

Scaling an RDBMS
----------------
One suggestion on large scaling was to grossly denormalize the DB, actually
running several separate databases, meant to model the same (or, at least,
overlapping) set of logical data but with completely different internal
designs. Each DB would be specially designed to handle specific kinds of
queries.

Discuss the design/implementation difficulties associated with such an
approach.

Client-side storage
-------------------
Suppose most of your users always connect from their 'regular' machine, be it
a desktop, laptop, or phone, at home or at work. However, a significant
minority of your users regularly connect from public or shared machines.

Discuss the implications this has for the design of your client-side data
storage.

Sharding
--------
First, describe an application that you think would be highly suitable for
sharding, and why it is highly suitable.

Second, describe a data-driven application whose scale is sufficient to
consider sharding, but where the nature of the data or queries makes it
unsuitable for sharding.

Erlang
------
We have spent some time in lectures and labs examining Erlang.

Briefly discuss the key strengths and weaknesses of Erlang for use in highly
distributed, highly scalable applications.

Distributed rate limiting
-------------------------
In recent lectures and labs we examined some of the issues associated with
distributed rate limiting.

One of the solutions discussed was allowing nodes in the system to 'trade'
capacity for the resource being distributed.
Inter-node communication can become an issue in such a scheme, especially if
nodes are widely separated geographically (increasing both the time required
to communicate and the likelihood of failures in links between the nodes).

Consider the following tiered variation of the capacity-trading idea:

- Nodes which are geographically very close to one another are grouped into
  clusters; each cluster has one lead server.
- Nodes within a cluster can trade capacity with one another.
- Clusters are grouped into data centres, covering larger geographical areas;
  each data centre has one lead server.
- The clusters (through their lead servers) can trade their capacities with
  one another. (The lead servers would then tell the other nodes within the
  cluster to scale up or down their current capacities as appropriate.)
- Data centres (through their lead servers) can trade their total capacities
  with one another (again with the lead servers telling their clusters to
  scale up/down, with the clusters then telling their nodes to scale up/down).

Discuss the potential problems and benefits associated with such a scheme.
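
As a study aid, the cluster-level trade in the tiered scheme above could be
modelled roughly as follows. This is only a minimal sketch under assumptions
that are not part of the question: capacity is an integer number of units, a
lead server rescales its nodes in proportion to their current shares, and the
names `Cluster`, `rescale`, and `tradeCapacity` are hypothetical. It ignores
messaging, latency, and failures, which are the issues the question asks about.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical model: one cluster, holding the capacity assigned to each node.
struct Cluster {
    std::vector<long> nodeCapacity;

    // Total capacity currently held by the cluster.
    long total() const {
        return std::accumulate(nodeCapacity.begin(), nodeCapacity.end(), 0L);
    }

    // The lead server tells its nodes to scale up/down to a new cluster total.
    // Assumption: shares stay proportional (or are split evenly if the cluster
    // held nothing); any rounding remainder is given to node 0.
    void rescale(long newTotal) {
        if (nodeCapacity.empty()) return;
        const long oldTotal = total();
        const long n = static_cast<long>(nodeCapacity.size());
        long assigned = 0;
        for (long &c : nodeCapacity) {
            c = (oldTotal > 0) ? c * newTotal / oldTotal : newTotal / n;
            assigned += c;
        }
        nodeCapacity[0] += newTotal - assigned;
    }
};

// Two lead servers trade `amount` units of capacity between their clusters.
// Returns false (and changes nothing) if the trade cannot be carried out.
bool tradeCapacity(Cluster &from, Cluster &to, long amount) {
    if (amount < 0 || amount > from.total()) return false;
    if (to.nodeCapacity.empty()) return false;
    const long fromTotal = from.total();
    const long toTotal = to.total();
    from.rescale(fromTotal - amount);
    to.rescale(toTotal + amount);
    return true;
}
```

Under the same assumptions, a data-centre tier could reuse the idea one level
up, with each data centre rescaling its clusters and each cluster then
rescaling its nodes.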