Problem Set 1 Harvard Extension School CSCI E-92: Principles of Operating Systems Spring 2024 Due: February 11, 2024 at Midnight ET (Eastern Time) Total of 100 Points As described in the syllabus, submit the solution to all problems in this Problem Set using "git" with named branch problem-set-1. 1. (5 Points) Multiprogramming. What is multiprogramming? Tanenbaum 5/e 1.2 (Also in Tanenbaum 3/e 1.1, Tanenbaum 2/e 1.2). In additon, give at least two reasons for having multiprogramming. 2. (5 Points) Protected kernel mode operations. Which of the following instructions should be allowed only in kernel mode? (a) Disable all interrupts. (b) Read the time-of-day clock. (c) Set the time-of-day clock. (d) Change the memory map. (Also in Tanenbaum 4/e 1.12, Tanenbaum 3/e 1.7, Tanenbaum 2/e 1.8). In addition, for each subsection a-d, describe what problems would be caused by allowing the instruction to run when not in kernel mode. 3. (5 Points) Priority inversion problem with priority scheduling vs. round-robin scheduling. In Sec. 2.4.4 (Tanenbaum 4/e Sec. 2.3.4), a situation with a high-priority process, H, and a low-priority process, L, was described, which led to H looping forever. Does the same problem occur if round-robin scheduling is used instead of priority scheduling? Discuss. Tanenbaum 5/e 2.25 (Tanenbaum 4/e 2.26, Tanenbaum 3/e 2.20, Tanenbaum 2/e 2.26) 4. (5 Points) Implementing semaphores using a disable and enable interrupts instructions. How could an operating system that can disable interrupts implement semaphores? Your solution should be a pseudo-code implementation of the up and down semaphore calls that achieves atomicity using di (disable interrupts) and ei (enable interrupts). There is no need to deal with managing resources. Your implementation of up and down cannot invoke any user code that is external to the semaphore. Keep in mind that when interrupts are disabled, the scheduler interrupt (that is able to switch context to run a different process) will not be handled but will be remembered as a pending interrupt. Once interrupts are re-enabled, pending interrupts (including the scheduler interrupt) are then handled. Tanenbaum 4/e 2.31 (Tanenbaum 3/e 2.25, Tanenbaum 2/e 2.23). 5. (10 Points) Implementing counting semaphores using binary semaphores. Show how counting semaphores (i.e., semaphores that can hold an arbitrary value) can be implemented using only binary semaphores and ordinary machine instructions. Tanenbaum 5/e 2.31 (Tanenbaum 4/e 2.32, Tanenbaum 3/e 2.26, Tanenbaum 2/e 2.24). You must include pseudo-code for your solution. By "ordinary machine instructions," we mean all facilities that are available through the C Programming Language without using library functions, for example. Your solution should use binary semaphores wherever mutual exclusion is needed. 6. (70 Points) Simple Shell Implementation Using the C Programming Language on the cscie92.dce.harvard.edu instance, implement a simple shell. Your program should interact with the user through stdin, stdout, and stderr. After outputting to stdout a prompt of "$ ", it will accept a line of text input from stdin and parse that line into white space delimited fields. You must be able to accept lines with up to 256 characters (not including any newline or null termination). White space will be composed of one or more spaces or tabs. From the input line, an integer named "argc" and an array named "argv" will created. The integer argc will contain the count of the number of white space separated fields found in the input line. Any white space before the first field must also be ignored. The array argv will contain a list of pointers to copies of each of the fields found in the input line as null-terminated strings. The C Standard also requires that the argv array contains a final NULL pointer (i.e., argv[argc] is always NULL). A declaration of argc and argv follows: int argc; char **argv; or, depending on the context: char *argv[]; Here is an example. If the input were to contain: echo this is some input string then argc would contain 6 and argv would be an array of seven pointers; the first six would point to null terminated strings and the seventh pointer would be NULL. The first pointer would point to the string "echo", the second pointer would point to the string "this", and so forth. Note that the argv array needs to be dynamically allocated and that each of the strings pointed to by each entry in argv needs to be dynamically allocated. The strings pointed to by argv will never contain any spaces or tabs (except if either or both of the double-quoting or backslash escape character notation extra-credit portions below are implemented). The space allocated for argv should be exactly the required size (i.e., it should occupy (argc+1)*sizeof(char *) bytes). Similarly, the space allocated for each string should be exactly the required size (i.e., each string should occupy the number of bytes in the string + 1 for a null terminating byte). Storage should be allocated using the malloc system call. After argc and argv are created, the shell will scan an array named "commands" for a match for the first string in argv and will then call a different function for each entry in "commands". That is, we will assume that the first word on the line in the name of a program to be invoked by the shell. A declaration of commands with the initialization of five commands as follows: int cmd_date(int argc, char *argv[]); int cmd_echo(int argc, char *argv[]); int cmd_exit(int argc, char *argv[]); int cmd_help(int argc, char *argv[]); int cmd_clockdate(int argc, char *argv[]); struct commandEntry { char *name; int (*functionp)(int argc, char *argv[]); } commands[] = {{"date", cmd_date}, {"echo", cmd_echo}, {"exit", cmd_exit}, {"help", cmd_help}, {"clockdate", cmd_clockdate}}; After a command returns to the shell, all storage allocated for argv and the strings that argv points to should be freed. After the storage is freed, the shell should loop back to prompt the user for another input line. You should implement five commands. (1) "exit" will exit from the shell (i.e., cause the shell to terminate) by calling the exit system call. (2) "echo" will output each of the arguments in argv (except for the first) to stdout with a single space character between the arguments. After the last argument is output, a newline should be output. (3) "help" will output to stdout a brief description of the commands accepted by the shell. (4) "date" will output to stdout the current date and time in the format "January 23, 2014 15:57:07.123456". "date" will call the POSIX system call "gettimeofday" to determine the time and date. "gettimeofday" returns the number of seconds and microseconds since midnight (zero hours) on January 1, 1970 -- this time is referred to as the Unix Epoch. To be clear, no dates can be specified before January 1, 1970, i.e., there are no negative numbers of seconds and/or microseconds. Your output should be printed in the same timezone as that returned by the system call "gettimeofday". That is, the timezone information from gettimeofday should be ignored. (5) Finally, "clockdate" will take a single 64-bit positive integral number (after being parsed it will be stored in a variable of type time_t which is declared in system header file sys/time.h -- this is equivalent to a uint64_t) as its required argument: that argument will represent the number of seconds since the Unix Epoch. When calling your function to create a formatted date, you should pass the number of microseconds since the Unix Epoch as zero. The "clockdate" command will output to stdout the date represented by the argument integer in the format described in (4) above. The "clockdate" command is present to facilitate our ability to test your conversion from seconds and microseconds since the Unix Epoch to a printable human-readable string. Each command should return an integer to indicate if it succeeded or failed and a specific error code on failure. A value of 0 will indicate success and a non-zero value will indicate failure. You should create your own enumerated type of error codes for different errors. If the return value is non-zero, the shell should call a function that returns a pointer to a string that contains a meaningful textual message for the error code (the behavior of this function should be similar to how strerror works in Unix). Then, the shell should output this and any other error messages to stderr. Each command should check that the appropriate number of arguments are specified on the command line. If no arguments are appropriate for a command, then command should guarantee that no arguments have been specified. The "echo" command accepts any number of arguments (including none), so no argument count checking is required. Your translation from seconds and microseconds since zero hours on January 1, 1970 into year, month, day, hours, minutes, seconds, and millionths of seconds must be written from scratch using no system functions. Keep in mind that some years are leap years and others are not. Leap years contain 366 days (February 29th) and all other years contain 365 days. Every year that is evenly divisible by four is a leap year, except that every year divisible by 100 is not a leap year, except that every year divisible by 400 is a leap year. Your code may use only the following systems calls or library functions: malloc free exit fgetc feof ferror fputc fputs fprintf sprintf snprintf vsprintf vsnprintf gettimeofday You are also welcome to use the following library functions declared in string.h: memcpy memmove memcmp memset memchr strcpy strncpy strcat strncat strcmp strncmp strchr strcspn strpbrk strrchr strspn strstr strerror strlen strtol strtoul strtoll strtoull You may use only stdin, stdout, and stderr as I/O streams. (5 Points) Extra credit to be used for the programming portion only: Allow fields on the line to be double-quote delimited strings that can contain spaces or tabs. Ensure that any such field has a matching open double-quote and close double-quote. Also, allow a double-quote to appear within a double-quoted field in either or both of two possible ways: (1) allow two adjacent double-quotes within a double-quoted field to denote a single double-quote within the field, (2) implement the following backslash escape notation to allow special characters within a double-quoted string. (5 Points) Extra credit to be used for the programming portion only: Implement special backslash escape characters for shell command arguments (either outside or inside double-quoted string arguments). If this is implemented, then the backslash character would serve as a prefix for the next character. These two characters would then be replaced with a designated single character in the argument. The character after the backslash would indicate what character would be designated to be used in the argument as follows: Escape sequence Designated replacement character \0 null (ASCII 0) \a alarm (bell) (control-g) (ASCII 7) \b backspace (control-h) (ASCII 8) \e escape (ASCII 27 = 0x1b) \f form-feed (control-l) (ASCII 12 = 0xc) \n newline (line-feed) (control-j) (ASCII 10 = 0xa) \r carriage-return (control-m) (ASCII 13 = 0xd) \t horizontal-tab (control-i) (ASCII 9) \v vertical-tab (control-k) (ASCII 11 = 0xb) If the character following the backslash is not shown in the table above, then the character following the backslash should be used literally. Obviously, this would include the double-quote and backslash characters. (Points Vary) Extra credit to be used for the programming portion only: Add additional built-in commands to your shell. None of the commands you add should require use of files. (20 Points) Extra credit to be used for the programming portion only: Add variables to your shell. You should have a "set" command to define a variable and to output all defined environment variables. If the "set" command is invoked with no arguments, it will display the values of all shell variables. If the "set" command is invoked in the format "set =", then the shell variable is defined with the value as a string. Variable names may consist of letters and digits starting with a letter. In variable names, the underscore character is treated as a letter. You should not impose any constraints on the length of either variable names or of their values. However, if the substitution of the for the results in a command line that exceeds the maximum length that you accept, you should take a reasonable action that ensures the correctness of your program. For example, you might emit an error message and truncate the command line. Even though this assignment states that, "you must be able to accept lines with up to 256 characters..." a better solution might accept arbitrary length lines by expanding the line buffer as necessary. A common way to accomplish this is to double the size of the line buffer whenever additional space is needed, copy the existing line buffer into the new one, and then free the memory allocated for the old line buffer. The value field may optionally be enclosed in double-quotes. If any command line contains a dollar-sign followed by the name of a variable, then the value of the variable is substituted for its name -- this includes a possible substitution at the beginning of the command line where the command name would appear. When this substitution phase is being performed, the name of a variable within a command line is terminated by a non-alphabetic and non-numeric character. It is an error to reference a variable which is not set. You should also implement the shell command "unset" which is invoked in the format "unset " to remove a variable from being defined in the shell. Last revised 6-Feb-24