Experiment Design

Data for this experiment were obtained by direct measurement. The software to be analyzed was available, and only minor modification was needed to provide a setting in which measurements could be taken. Measurement was also expected to give the most accurate results, because it is based on the actual software platform rather than on a model of the platform.

The two main factors of interest in this study are the type of communication used and the number of clients performing the communication. The effects of these two factors were measured with regard to round-trip time and the amount of data sent over the network during the communication. Round-trip time was chosen because it is more representative of the entire process to be analyzed. In addition, both timestamps are recorded on the client, which avoids the issue of differing client and server clocks. The amount of data transmitted is a secondary measure that was recorded using a network packet capture tool, separately from the timed runs. This measure gives insight into bandwidth considerations for each method.

System performance was measured by inserting timing statements into the client code itself. Two simple timing statements were added: one right before the data was prepared to be sent, and one right after the response was received and ready to be used. The average overhead associated with the timing statements was also determined (see Tools Used).
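
As an illustration of where the two timing statements sit, the following is a minimal sketch and not the study's actual client code. The class RoundTripTimer, the Sample type, and the transport parameter are hypothetical names introduced here; the transport stands in for the real network exchange so that the example runs without a server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.UnaryOperator;

class RoundTripTimer {
    /** One timed exchange: elapsed wall-clock time plus the decoded reply. */
    static final class Sample {
        final long elapsedMillis;
        final float[] values;
        final boolean flag;

        Sample(long elapsedMillis, float[] values, boolean flag) {
            this.elapsedMillis = elapsedMillis;
            this.values = values;
            this.flag = flag;
        }
    }

    /**
     * Times one request/reply cycle. The transport argument is a hypothetical
     * stand-in for "send to server, wait for reply"; it is not the study's API.
     */
    static Sample timedRoundTrip(float[] values, boolean flag,
                                 UnaryOperator<byte[]> transport) {
        try {
            // Timing statement 1: right before the data is prepared to be sent.
            long start = System.currentTimeMillis();

            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (float v : values) {
                out.writeFloat(v);
            }
            out.writeBoolean(flag);
            out.flush();

            byte[] replyBytes = transport.apply(buffer.toByteArray());

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(replyBytes));
            float[] reply = new float[values.length];
            for (int i = 0; i < reply.length; i++) {
                reply[i] = in.readFloat();
            }
            boolean replyFlag = in.readBoolean();

            // Timing statement 2: right after the response is ready to be used.
            long end = System.currentTimeMillis();
            return new Sample(end - start, reply, replyFlag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling timedRoundTrip(values, flag, bytes -> bytes) uses an in-memory echo as the transport. Because both timestamps come from the same clock on the client, no clock synchronization with the server is needed.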

Metrics

Two metrics are considered:

  • Round-trip time
  • Size of transmission

The main performance metric is the time, in milliseconds, for one set of data to be sent from the client to the server and back. This metric includes:
  • Time to prepare the data to be sent by the client (serialization or going through each data item)
  • Network travel time to server
  • Time to fully receive the data at the server
  • Time to prepare data to reply to client
  • Network travel time to client
  • Time to fully receive the data at the client

The size of the transmission is a secondary but important metric: when many clients make small, frequent requests, bandwidth usage becomes a concern.

Factors and Parameters

Two factors, each with different levels, are considered:

  • Type of transmission
    • Serialization and ObjectOutputStream
    • Non-serialized with DataOutputStream
  • Number of simultaneous connections
    • 1
    • 5
    • 10

The parameters to be held constant are:
  • Size of the data to be sent
    • 6 floats, 1 boolean
  • Network latency
    • Closed LAN with only client and server connected
  • Data transmission rate
    • Once every 50 ms per client (similar to typical application use of the framework)
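
To make the two transmission types concrete, the sketch below encodes the same payload (6 floats, 1 boolean) both ways and exposes the resulting byte counts. The PayloadEncodings class and the Update type are hypothetical; the study's actual data class is not shown in this section, and the sizes measured on the wire with a packet capture tool would also include protocol headers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class PayloadEncodings {
    /** Hypothetical payload type holding 6 floats and 1 boolean. */
    static final class Update implements Serializable {
        private static final long serialVersionUID = 1L;
        final float[] values;
        final boolean flag;

        Update(float[] values, boolean flag) {
            this.values = values;
            this.flag = flag;
        }
    }

    /** Serialized transmission: the whole object is written with ObjectOutputStream. */
    static byte[] encodeSerialized(Update update) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(update);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Non-serialized transmission: each data item is written with DataOutputStream. */
    static byte[] encodeFieldByField(Update update) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                for (float v : update.values) {
                    out.writeFloat(v);          // 4 bytes each
                }
                out.writeBoolean(update.flag);  // 1 byte
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The field-by-field encoding is exactly 6 × 4 + 1 = 25 bytes. The serialized form is larger because the object stream also carries a stream header and class description.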

Tools Used

  • Java System.currentTimeMillis() (average overhead, measured over 1 million measurements: 0.002657 ms)
    • Measuring time
  • Ethereal (http://www.ethereal.com)
    • Measuring transmission size
  • Microsoft Excel
    • Recording measurements
    • Performing ANOVA
    • Performing calculations on data
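
The section does not describe how the average timer overhead was obtained. One plausible approach, sketched here as an assumption, is to time a loop of many System.currentTimeMillis() calls and divide by the number of calls. The class TimerOverhead and its sink field are hypothetical names.

```java
class TimerOverhead {
    /** Accumulates timer results so the calls are not optimized away. */
    static long sink;

    /** Returns the average cost of one System.currentTimeMillis() call, in ms. */
    static double averageOverheadMillis(int calls) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < calls; i++) {
            sink += System.currentTimeMillis();
        }
        long elapsedMillis = System.currentTimeMillis() - start;
        return (double) elapsedMillis / calls;
    }
}
```

For example, TimerOverhead.averageOverheadMillis(1_000_000) mirrors the 1 million measurements mentioned above. The value obtained depends on the JVM, operating system, and hardware, so it should not be expected to match the reported figure.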

Initial Runs

An initial set of 100 runs was performed to help determine how many runs are necessary to be within a 7% margin of error at 95% confidence. The following table summarizes the results of the calculations (see report for full description).

Number of Measurements Needed for 95% Confidence

Number of Simultaneous Connections    Serial    Non-Serial
1                                     697.7     717.5
5                                     761.08    626.28
10                                    671.76    397.58
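
The calculation itself is deferred to the full report. A commonly used formula for this kind of estimate is n = (100 z s / (r m))^2, where m and s are the mean and standard deviation of the preliminary runs, z is the normal quantile for the confidence level (about 1.96 for 95%), and r is the desired accuracy in percent (7 here). The sketch below assumes that formula; whether the study used exactly this form is not stated, and the class name SampleSize is hypothetical.

```java
class SampleSize {
    /**
     * Estimated number of runs needed so that the mean is within
     * accuracyPercent of the true value at the confidence level implied by z.
     * Assumes n = (100 * z * s / (r * mean))^2.
     */
    static double requiredRuns(double mean, double stdDev, double z, double accuracyPercent) {
        double root = (100.0 * z * stdDev) / (accuracyPercent * mean);
        return root * root;
    }
}
```

As a made-up example (not data from this study), a preliminary sample with a mean of 10 ms and a standard deviation of 5 ms gives requiredRuns(10, 5, 1.96, 7) of about 196 runs. The result is rounded up to the next whole number of runs.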

The largest of these values, 761.08, gives a minimum of 762 runs. This was rounded up to 1000 so that enough data values would remain if outliers had to be handled by removing data points.

Outliers and Replication Handling

The effects of outliers were analyzed based on the sample standard deviations and coefficients of variation (COV). These values were computed for the full sample sets and then, after sorting the data, recomputed with a number of extreme values removed. The extreme values were those lying more than one standard deviation above or below the mean of the fastest times. For example, if the set with the lowest numbers begins having 4 ms response times, we removed all data before that point for each of the sets, and all data after 17 ms for the quickest set.

After removal of these values, the effect on the COV and standard deviations was determined to be almost insignificant. In addition, the removals did not affect the speedup (before/after) in any significant manner. Therefore, these values were left in the data as extremes of server response time, on the assumption that they are part of normal system operation.
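
The comparison described above can be sketched as follows: compute the COV of a sample, drop values outside a chosen range, and compute the COV again. This is an illustrative sketch only. The class OutlierCheck and its method names are hypothetical, and the sample standard deviation is assumed to use the n - 1 denominator.

```java
import java.util.Arrays;

class OutlierCheck {
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /** Sample standard deviation (n - 1 denominator). */
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - m) * (x - m);
        }
        return Math.sqrt(sumSquares / (xs.length - 1));
    }

    /** Coefficient of variation: standard deviation divided by the mean. */
    static double cov(double[] xs) {
        return sampleStdDev(xs) / mean(xs);
    }

    /** Keeps only the values in [low, high], e.g. 4 ms to 17 ms. */
    static double[] keepWithin(double[] xs, double low, double high) {
        return Arrays.stream(xs).filter(x -> x >= low && x <= high).toArray();
    }
}
```

Comparing cov(times) with cov(keepWithin(times, 4, 17)) for each set of response times shows how much the extreme values contribute to the variation. The 4 ms and 17 ms bounds are taken from the example in the text.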