Problem Set 1

		 Harvard Extension School CSCI E-92:
		   Principles of Operating Systems

			     Spring 2024

	 Due: February 11, 2024 at Midnight ET (Eastern Time)


Total of 100 Points

As described in the syllabus, submit the solution to all problems in
this Problem Set using "git" with named branch problem-set-1.

1. (5 Points) Multiprogramming.  What is multiprogramming?  Tanenbaum
   5/e 1.2 (Also in Tanenbaum 3/e 1.1, Tanenbaum 2/e 1.2).

   In additon, give at least two reasons for having multiprogramming.

2. (5 Points) Protected kernel mode operations.  Which of the
   following instructions should be allowed only in kernel mode?
   (a) Disable all interrupts.
   (b) Read the time-of-day clock.
   (c) Set the time-of-day clock.
   (d) Change the memory map.
   (Also in Tanenbaum 4/e 1.12, Tanenbaum 3/e 1.7, Tanenbaum 2/e 1.8).

   In addition, for each subsection a-d, describe what problems would
   be caused by allowing the instruction to run when not in kernel
   mode.

3. (5 Points) Priority inversion problem with priority scheduling
   vs. round-robin scheduling.  In Sec. 2.4.4 (Tanenbaum 4/e
   Sec. 2.3.4), a situation with a high-priority process, H, and a
   low-priority process, L, was described, which led to H looping
   forever.  Does the same problem occur if round-robin scheduling is
   used instead of priority scheduling?  Discuss.  Tanenbaum 5/e 2.25
   (Tanenbaum 4/e 2.26, Tanenbaum 3/e 2.20, Tanenbaum 2/e 2.26)

4. (5 Points) Implementing semaphores using a disable and enable
   interrupts instructions.  How could an operating system that can
   disable interrupts implement semaphores?

   Your solution should be a pseudo-code implementation of the up and
   down semaphore calls that achieves atomicity using di (disable
   interrupts) and ei (enable interrupts).  There is no need to deal
   with managing resources.  Your implementation of up and down cannot
   invoke any user code that is external to the semaphore.  Keep in
   mind that when interrupts are disabled, the scheduler interrupt
   (that is able to switch context to run a different process) will
   not be handled but will be remembered as a pending interrupt.  Once
   interrupts are re-enabled, pending interrupts (including the
   scheduler interrupt) are then handled.  Tanenbaum 4/e 2.31
   (Tanenbaum 3/e 2.25, Tanenbaum 2/e 2.23).

5. (10 Points) Implementing counting semaphores using binary
   semaphores.  Show how counting semaphores (i.e., semaphores that
   can hold an arbitrary value) can be implemented using only binary
   semaphores and ordinary machine instructions.  Tanenbaum 5/e 2.31
   (Tanenbaum 4/e 2.32, Tanenbaum 3/e 2.26, Tanenbaum 2/e 2.24).

   You must include pseudo-code for your solution.  By "ordinary
   machine instructions," we mean all facilities that are available
   through the C Programming Language without using library functions,
   for example.  Your solution should use binary semaphores wherever
   mutual exclusion is needed.

6. (70 Points) Simple Shell Implementation

   Using the C Programming Language on the cscie92.dce.harvard.edu
   instance, implement a simple shell.  Your program should interact
   with the user through stdin, stdout, and stderr.  After outputting
   to stdout a prompt of "$ ", it will accept a line of text input
   from stdin and parse that line into white space delimited fields.
   You must be able to accept lines with up to 256 characters (not
   including any newline or null termination).  White space will be
   composed of one or more spaces or tabs.  From the input line, an
   integer named "argc" and an array named "argv" will created.  The
   integer argc will contain the count of the number of white space
   separated fields found in the input line.  Any white space before
   the first field must also be ignored.  The array argv will contain
   a list of pointers to copies of each of the fields found in the
   input line as null-terminated strings.  The C Standard also
   requires that the argv array contains a final NULL pointer (i.e.,
   argv[argc] is always NULL).

   A declaration of argc and argv follows:

   int argc;
   char **argv;    or, depending on the context:    char *argv[];

   Here is an example.  If the input were to contain:

   echo     this  is some     input   string

   then argc would contain 6 and argv would be an array of seven
   pointers; the first six would point to null terminated strings and
   the seventh pointer would be NULL.  The first pointer would point
   to the string "echo", the second pointer would point to the string
   "this", and so forth.  Note that the argv array needs to be
   dynamically allocated and that each of the strings pointed to by
   each entry in argv needs to be dynamically allocated.  The strings
   pointed to by argv will never contain any spaces or tabs (except if
   either or both of the double-quoting or backslash escape character
   notation extra-credit portions below are implemented).  The space
   allocated for argv should be exactly the required size (i.e., it
   should occupy (argc+1)*sizeof(char *) bytes).  Similarly, the space
   allocated for each string should be exactly the required size
   (i.e., each string should occupy the number of bytes in the string
   + 1 for a null terminating byte).  Storage should be allocated
   using the malloc system call.

   After argc and argv are created, the shell will scan an array named
   "commands" for a match for the first string in argv and will then
   call a different function for each entry in "commands".  That is,
   we will assume that the first word on the line in the name of a
   program to be invoked by the shell.

   A declaration of commands with the initialization of five commands
   as follows:

   int cmd_date(int argc, char *argv[]);
   int cmd_echo(int argc, char *argv[]);
   int cmd_exit(int argc, char *argv[]);
   int cmd_help(int argc, char *argv[]);
   int cmd_clockdate(int argc, char *argv[]);

   struct commandEntry {
     char *name;
     int (*functionp)(int argc, char *argv[]);
   } commands[] = {{"date", cmd_date},
                   {"echo", cmd_echo},
                   {"exit", cmd_exit},
                   {"help", cmd_help},
                   {"clockdate", cmd_clockdate}};

   After a command returns to the shell, all storage allocated for
   argv and the strings that argv points to should be freed.  After
   the storage is freed, the shell should loop back to prompt the user
   for another input line.

   You should implement five commands.  (1) "exit" will exit from the
   shell (i.e., cause the shell to terminate) by calling the exit
   system call.  (2) "echo" will output each of the arguments in argv
   (except for the first) to stdout with a single space character
   between the arguments.  After the last argument is output, a
   newline should be output.  (3) "help" will output to stdout a brief
   description of the commands accepted by the shell.  (4) "date" will
   output to stdout the current date and time in the format "January
   23, 2014 15:57:07.123456".  "date" will call the POSIX system call
   "gettimeofday" to determine the time and date.  "gettimeofday"
   returns the number of seconds and microseconds since midnight (zero
   hours) on January 1, 1970 -- this time is referred to as the Unix
   Epoch.  To be clear, no dates can be specified before January 1,
   1970, i.e., there are no negative numbers of seconds and/or
   microseconds.  Your output should be printed in the same timezone
   as that returned by the system call "gettimeofday".  That is, the
   timezone information from gettimeofday should be ignored.  (5)
   Finally, "clockdate" will take a single 64-bit positive integral
   number (after being parsed it will be stored in a variable of type
   time_t which is declared in system header file sys/time.h -- this
   is equivalent to a uint64_t) as its required argument: that
   argument will represent the number of seconds since the Unix Epoch.
   When calling your function to create a formatted date, you should
   pass the number of microseconds since the Unix Epoch as zero.  The
   "clockdate" command will output to stdout the date represented by
   the argument integer in the format described in (4) above.  The
   "clockdate" command is present to facilitate our ability to test
   your conversion from seconds and microseconds since the Unix Epoch
   to a printable human-readable string.

   Each command should return an integer to indicate if it succeeded
   or failed and a specific error code on failure.  A value of 0 will
   indicate success and a non-zero value will indicate failure.  You
   should create your own enumerated type of error codes for different
   errors.  If the return value is non-zero, the shell should call a
   function that returns a pointer to a string that contains a
   meaningful textual message for the error code (the behavior of this
   function should be similar to how strerror works in Unix).  Then,
   the shell should output this and any other error messages to
   stderr.

   Each command should check that the appropriate number of arguments
   are specified on the command line.  If no arguments are appropriate
   for a command, then command should guarantee that no arguments have
   been specified.  The "echo" command accepts any number of arguments
   (including none), so no argument count checking is required.

   Your translation from seconds and microseconds since zero hours on
   January 1, 1970 into year, month, day, hours, minutes, seconds, and
   millionths of seconds must be written from scratch using no system
   functions.  Keep in mind that some years are leap years and others
   are not.  Leap years contain 366 days (February 29th) and all other
   years contain 365 days.  Every year that is evenly divisible by
   four is a leap year, except that every year divisible by 100 is not
   a leap year, except that every year divisible by 400 is a leap
   year.

   Your code may use only the following systems calls or library
   functions:
     malloc		free
     exit
     fgetc		feof
     ferror
     fputc		fputs		fprintf
     sprintf		snprintf
     vsprintf		vsnprintf
     gettimeofday
   You are also welcome to use the following library functions
   declared in string.h:
     memcpy		memmove		memcmp		memset
     memchr
     strcpy		strncpy		strcat		strncat
     strcmp		strncmp
     strchr
     strcspn
     strpbrk
     strrchr
     strspn
     strstr
     strerror
     strlen
     strtol		strtoul		strtoll		strtoull     
   You may use only stdin, stdout, and stderr as I/O streams.

   (5 Points) Extra credit to be used for the programming portion
   only: Allow fields on the line to be double-quote delimited strings
   that can contain spaces or tabs.  Ensure that any such field has a
   matching open double-quote and close double-quote.  Also, allow a
   double-quote to appear within a double-quoted field in either or
   both of two possible ways: (1) allow two adjacent double-quotes
   within a double-quoted field to denote a single double-quote within
   the field, (2) implement the following backslash escape notation to
   allow special characters within a double-quoted string.

   (5 Points) Extra credit to be used for the programming portion
   only: Implement special backslash escape characters for shell
   command arguments (either outside or inside double-quoted string
   arguments).  If this is implemented, then the backslash character
   would serve as a prefix for the next character.  These two
   characters would then be replaced with a designated single
   character in the argument.  The character after the backslash would
   indicate what character would be designated to be used in the
   argument as follows:

     Escape sequence	Designated replacement character
     \0			null (ASCII 0)
     \a			alarm (bell) (control-g) (ASCII 7)
     \b			backspace (control-h) (ASCII 8)
     \e			escape (ASCII 27 = 0x1b)
     \f			form-feed (control-l) (ASCII 12 = 0xc)
     \n			newline (line-feed) (control-j) (ASCII 10 = 0xa)
     \r			carriage-return (control-m) (ASCII 13 = 0xd)
     \t			horizontal-tab (control-i) (ASCII 9)
     \v			vertical-tab (control-k) (ASCII 11 = 0xb)

   If the character following the backslash is not shown in the table
   above, then the character following the backslash should be used
   literally.  Obviously, this would include the double-quote and
   backslash characters.

   (Points Vary) Extra credit to be used for the programming portion
   only: Add additional built-in commands to your shell.  None of the
   commands you add should require use of files.

   (20 Points) Extra credit to be used for the programming portion
   only: Add variables to your shell.  You should have a "set" command
   to define a variable and to output all defined environment
   variables.  If the "set" command is invoked with no arguments, it
   will display the values of all shell variables.  If the "set"
   command is invoked in the format "set <variable>=<value>", then the
   shell variable <variable> is defined with the value <value> as a
   string.  Variable names may consist of letters and digits starting
   with a letter.  In variable names, the underscore character is
   treated as a letter.

   You should not impose any constraints on the length of either
   variable names or of their values.  However, if the substitution of
   the <value> for the <variable> results in a command line that
   exceeds the maximum length that you accept, you should take a
   reasonable action that ensures the correctness of your program.
   For example, you might emit an error message and truncate the
   command line.  Even though this assignment states that, "you must
   be able to accept lines with up to 256 characters..." a better
   solution might accept arbitrary length lines by expanding the line
   buffer as necessary.  A common way to accomplish this is to double
   the size of the line buffer whenever additional space is needed,
   copy the existing line buffer into the new one, and then free the
   memory allocated for the old line buffer.

   The value field may optionally be enclosed in double-quotes.  If
   any command line contains a dollar-sign followed by the name of a
   variable, then the value of the variable is substituted for its
   name -- this includes a possible substitution at the beginning of
   the command line where the command name would appear.  When this
   substitution phase is being performed, the name of a variable
   within a command line is terminated by a non-alphabetic and
   non-numeric character.  It is an error to reference a variable
   which is not set.  You should also implement the shell command
   "unset" which is invoked in the format "unset <variable>" to remove
   a variable from being defined in the shell.


			Last revised 6-Feb-24