Performance Comparison of USB 2.0 vs. SATA II

Analysis of Non-Corresponding Measurements

 

Christopher L Griffis

Embry-Riddle Aeronautical University

 

Project Summary

 

Experimental Design

 

Analysis of Data

 

The Automation Software

 

The Spreadsheet

 

The Report

 

 

Experimental Design

 

Discussion of the experimental design is broken into four parts. First, the overall experiment strategy is presented, followed by the measurement methodology. Next, the insights discovered during initial runs are explained, followed by an explanation of outliers and replication handling.

 

This experiment was a custom, self-developed approach to testing the hypothesis that SATA II performs better than USB 2.0 when transferring data files across a physical connection. Table 1 gives an overview of the strategy used to evaluate the hypothesis by formal, statistically sound means.

 

Table 1: Overall Experiment Strategy

 

  1. Understand the basics of statistical methods.

  2. Identify a topic of interest.

  3. Conceive an experiment to compare aspects within the topic of interest.

  4. Investigate the feasibility of the experiment, assessing sources of variation and error.

  5. Determine whether those sources of variation and error are manageable enough to perform the experiment in anticipation of "good" results.

  6. Define an experiment hypothesis.

  7. Begin the documentation strategy.

  8. Execute some preliminary trials to rule out "no-go" situations.

  9. Document preliminary efforts.

  10. Set up general guidelines for conducting the formal experiment.

  11. Determine the informal requirements for the automation software.

  12. Develop the automation software.

  13. Test the automation software, ensuring that the file is transferred completely and correctly.

  14. Formalize the experimental setup.

  15. Execute an arbitrary number of preliminary runs to estimate the mean and standard deviation needed to determine n, the official number of replications.

  16. Calculate the number of trials needed to bring the replication mean within a desired error margin.

  17. Begin the official experiment: execute the automation software to collect data measurements.

  18. Copy the raw data into working files.

  19. Process the data in the working files to determine statistical results.

  20. Clean up the analysis in the working files for presentation of results.

  21. Prepare final documentation, encapsulating the salient information for explaining and reproducing the experiment in a single document.

  22. Develop a web page to present the results.

  23. Develop a presentation to deliver the results orally.
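Step 13 (checking that each file is transferred completely and correctly) could be automated along the lines of the sketch below. It is a minimal illustration written in Java, the language of the automation software, but it is not the author's code: the class and method names are hypothetical, and SHA-256 hashing is an assumed choice of integrity check. Matching file sizes indicate a complete transfer; matching digests indicate a correct one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (assumption): verifies that a copied file matches its source.
class CopyCheck {

    // Streams the file through SHA-256 so that large files need not fit in memory.
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[1 << 15];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    // Complete: same size. Correct: same digest.
    static boolean sameContent(Path source, Path target)
            throws IOException, NoSuchAlgorithmException {
        return Files.size(source) == Files.size(target)
                && MessageDigest.isEqual(sha256(source), sha256(target));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java CopyCheck <source-file> <target-file>");
            return;
        }
        boolean ok = sameContent(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(ok ? "copy OK" : "copy MISMATCH");
    }
}
```

Hashing both files costs a full read of each, so a check like this would belong in the software's test phase (step 13) rather than inside the timed transfers.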

 

 

The measurement methodology is described next. First, the working assumptions used in conducting the experiment are presented. Next, the setup of the system under test is described, followed by a description of the automation software. Finally, the experimental procedure itself is explained.

 

Table 2 lists the assumptions made in the experimental design.

 

Table 2: Experimental Design Assumptions

 

·    The host computer has an internal SATA hard drive that is assumed to read and supply data continuously, and fast enough to saturate the bandwidth of at least the slower of the two data buses. Otherwise, both buses would appear to perform identically.

 

·    Manually transferring the files systematically introduces uncertainty and error into the measurements, increasing variation. It is assumed that these systematic errors will be reduced by a software-automated file transfer approach.

 

·    The measured time will always be slightly larger than the actual transfer time, because of unavoidable systematic error from the overhead of the runtime components performing the transfer. It is assumed that this overhead is small enough, relative to the time needed to transfer the file, to be statistically insignificant.

 

 

The two files to be transferred are identified ahead of time and placed in a folder called “source” in the root directory of the host computer. A similar folder called “target” holds the copied file image. For all trials of a particular file size, both bus connections must be tested. This requires identifying the first bus and using it to physically connect the host computer to the target drive. Once this is done, a few preliminary settings must be made, and then the automation software can be run.

 

The system under test runs the Windows XP operating system, and the file transfer automation software runs on the Java Runtime Environment. Before any experimental trials are executed, all other programs are closed and the network controller is turned off. This reduces the likelihood that unrelated operating system activity will introduce delays while a file is being transferred (thereby producing an outlier time measurement). Moreover, to improve the consistency of the measurements, the Windows Task Manager is used to set the java.exe process to “real-time priority” before the transfer automation begins.

 

Table 3 summarizes, step by step, the process used to collect measurement replications for a particular file size and data bus configuration. After a configuration has been tested, the bus connection is swapped for the alternate bus connection and the process is repeated.

 

Table 3: Step-by-Step Experimental Process for a Single Trial Run

 

  1. Set up the physical bus connection for the current test configuration.

  2. Make sure the network controller of the host system is disabled.

  3. Make sure the screensaver and system power-saving features are disabled.

  4. Ensure that the source file has been identified and resides in the appropriate source folder.

  5. Start the automation software program. A dialog box will appear, and the java.exe process will appear in the Windows Task Manager.

  6. In the Windows Task Manager, set the java.exe process to “real-time priority.”

  7. From “My Computer,” click the icon for the external drive to wake it and make sure it is “spinning” (as opposed to idle); otherwise, the spin-up time will cause a significant delay in the first measured time, producing an outlier. Close the window that opens after clicking the external drive's icon.

  8. Enter the current trial conditions into the automation software, choosing the source file and the target file to be overwritten.

  9. Begin the automated file transfer process, and do not “touch” the computer during this time. This reduces the likelihood of outlier-producing interruptions.

  10. Periodically monitor the measured times echoed in the command window (depicted in Figure 1) to make sure they are within a reasonable spread of each other. Wild values may indicate that one of the earlier procedural steps was overlooked.

  11. When the transfer process is complete, review the raw data log file to make sure the values were stored correctly and appear reasonable (otherwise, the trial may need to be repeated).

 

 

Figure 1: Experiment in Progress

 

When the experiment was first being developed, transfers were attempted with much smaller files (e.g., 4 MB). Because both buses transfer data so quickly, each transfer completed almost instantaneously, and successive requests followed one another very closely. This caused tremendous variation in the measured times, attributed to externally introduced delays; that is, the files were transferring too quickly for the system to keep up with itself in a consistent, reproducible manner. It was later found that files taking at least a couple of seconds to transfer were far better subjects. This led to the choice of 80 MB as the smaller file size.

 

The next step was to determine the official number of trials to run for each file transfer configuration. The required number of replications can be estimated as

$$ n = \left( \frac{z \, s}{e \, d} \right)^{2} $$

where z is the value of the standard normal distribution for the chosen confidence level, s is the standard deviation, e is the acceptable relative error (a fraction of the mean), and d is the mean. This indicates the number of trials needed for the sample mean to lie, with the chosen confidence, within ±e of the true mean in relative terms (a confidence interval whose total width is 2e of the mean). However, using this equation requires preliminary estimates of the mean and standard deviation. An initial run of an arbitrary 20 replications yielded an estimate of 186 needed replications. This estimate, chosen to give 95% confidence that the sample mean is within 3% of the true mean, was rounded up to 200 for the purposes of the experiment.
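The replication estimate can be sketched in Java as follows. The pilot mean and standard deviation in `main` are hypothetical placeholders, not the experiment's measurements. z = 1.96 is the standard normal quantile for two-sided 95% confidence, and e = 0.03 corresponds to the 3% margin. The result is rounded up, because a fraction of a replication cannot be run.

```java
// Hypothetical sketch: n = (z * s / (e * d))^2, rounded up to a whole replication.
class ReplicationEstimate {

    static int requiredReplications(double z, double stdDev, double relError, double mean) {
        if (relError <= 0 || mean <= 0) {
            throw new IllegalArgumentException("relError and mean must be positive");
        }
        double ratio = (z * stdDev) / (relError * mean);
        return (int) Math.ceil(ratio * ratio);
    }

    public static void main(String[] args) {
        double z = 1.96;          // two-sided 95% confidence, standard normal quantile
        double e = 0.03;          // sample mean within 3% of the true mean
        double pilotMean = 10000; // hypothetical pilot mean transfer time, ms
        double pilotStd = 1000;   // hypothetical pilot standard deviation, ms
        System.out.println(requiredReplications(z, pilotStd, e, pilotMean));
    }
}
```

With these placeholder values (a coefficient of variation of 10%), the sketch estimates 43 replications. In practice, the statistics from the 20 pilot replications would be substituted. With so few pilot samples, using a t quantile in place of z would give a somewhat more conservative estimate.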

 

Another interesting observation, made early in the experimental design, concerns the size of the buffer used in the file transfer automation software. The file transfer portion of the automation code was borrowed from an example found on the internet. This example, available at

http://www.java2s.com/Code/Java/File-Input-Output/CopyfilesusingJavaIOAPI.htm

uses a value of 4096 bytes for its data buffer. This seemingly arbitrary value raised the question of how the buffer size might affect transfer efficiency, so other buffer sizes were explored. Because 4096 is an integer power of two ($2^{12}$), the other sizes considered were also integer powers of two.
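A copy loop in the style of that example, with the buffer size made a parameter and the transfer timed, might look like the sketch below. It is an illustration under stated assumptions, not the borrowed java2s code or the author's automation software. The class name, method name, and file paths (including the drive letters) are hypothetical. The call to `getFD().sync()` is an added assumption, included to force the data out of the operating system's write cache before the clock stops so that the measured time reflects the bus rather than memory.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch of a timed copy with a configurable buffer size.
class TimedCopy {

    // Copies src to dst (overwriting dst) and returns the elapsed time in ms.
    static long copyMillis(String src, String dst, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long start = System.nanoTime();
        try (FileInputStream in = new FileInputStream(src);
             FileOutputStream out = new FileOutputStream(dst)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // write only the bytes actually read
            }
            out.getFD().sync(); // assumption: push cached writes to the device
        }
        // Opening and closing the streams is timed too: part of the overhead
        // assumed to be statistically insignificant.
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths; 4096 bytes matches the example's buffer size.
        long ms = copyMillis("C:\\source\\file.bin", "E:\\target\\file.bin", 4096);
        System.out.println("Transfer time: " + ms + " ms");
    }
}
```

Writing `buffer, 0, n` instead of the whole buffer matters on the final read, which usually returns fewer bytes than the buffer holds.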

 

For a quick observation, 100 runs were performed on a fixed file using byte array sizes ranging from $2^{12}$ through $2^{19}$ bytes. Based on this “seat-of-the-pants” check, the preferred byte array size for both USB and SATA was found to be $2^{15}$ (32,768) bytes, and this was used as the official buffer size for the experiment. Figure 2 and Figure 3 summarize the results of this buffer size comparison. The values are transfer times in milliseconds. The size of the file transferred is unknown, but the same file was used in every trial of these runs. These findings are somewhat informal and are presented merely for completeness. An additional formal experiment could explore this quirk further.
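The informal sweep can be sketched as a loop over exponents 12 through 19. In this hypothetical Java sketch, the timing of a single copy is passed in as a function of the buffer size (for instance, a wrapper around a timed stream copy), which keeps the sweep logic self-contained. The stand-in timing model in `main` is invented for demonstration and is not measured data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntToLongFunction;

// Hypothetical sketch of the informal power-of-two buffer-size sweep.
class BufferSweep {

    // For each buffer size 2^minExp .. 2^maxExp, averages `runs` timings (ms).
    static Map<Integer, Double> sweep(IntToLongFunction timeCopy,
                                      int minExp, int maxExp, int runs) {
        Map<Integer, Double> meanBySize = new LinkedHashMap<>();
        for (int exp = minExp; exp <= maxExp; exp++) {
            int bufferSize = 1 << exp; // integer power of two
            long total = 0;
            for (int r = 0; r < runs; r++) {
                total += timeCopy.applyAsLong(bufferSize);
            }
            meanBySize.put(bufferSize, (double) total / runs);
        }
        return meanBySize;
    }

    // Buffer size with the lowest mean transfer time.
    static int fastest(Map<Integer, Double> meanBySize) {
        return meanBySize.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .get()
                .getKey();
    }

    public static void main(String[] args) {
        // Invented stand-in timing model (NOT measured data), minimal at 2^15.
        IntToLongFunction fakeTiming =
                size -> 1000L + 10L * Math.abs(Integer.numberOfTrailingZeros(size) - 15);
        Map<Integer, Double> means = sweep(fakeTiming, 12, 19, 100);
        System.out.println(means);
        System.out.println("Fastest buffer: " + fastest(means) + " bytes");
    }
}
```

Comparing means alone mirrors the informal check. A formal follow-up experiment would also compare the variation at each buffer size.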

 

Figure 2: USB 2.0 Performance as Related to Transfer Buffer Size

 

Figure 3: SATA II Performance as Related to Transfer Buffer Size

 

Each run (per file size and bus connection configuration) of the official experimental trials produced 200 replicated measurements of the file transfer time. Most of the measured values were identical or very close to their neighboring values. Occasionally, for unknown reasons, a transfer time would be almost triple that of its neighbors. Such outliers have considerable impact, because each value contributes to the sample variance in proportion to the square of its deviation from the mean. Viewing the 200 replications sorted in descending order showed a clear and obvious disparity between the outliers and the “good” values: over 150 values lay very near each other, topped by three or four values nearly twice the magnitude of the “good” values. This obvious discontinuity made it easy to identify the cutoff point for removing outliers. The outliers tended to be greater than the “good” values; occasionally, however, some transfer times were significantly (and curiously) lower than the neighboring values.

 

For all trials, there were no more than four outliers above or below the good values, so a uniform outlier removal policy was applied to all raw data. All data were sorted in descending order and the top and bottom five values were deleted, leaving 190 replications per run for analysis. (Because the measurements are non-corresponding, with no pairing between individual measurements, sorting the data does not corrupt the experimental results.) This outlier removal policy was highly effective in reducing the variation of the results. To quantify the improvement, consider the coefficient of variation (COV), the ratio of the standard deviation to the mean. The COV is a dimensionless measure of variation relative to the mean; the lower the COV, the better. The outlier removal policy reduced the COVs of the measurements from about 12% to about 3%.
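The trimming policy and the COV comparison can be sketched as follows. Sorting in ascending order and dropping k values from each end is equivalent to the descending sort described above. With k = 5, 200 replications become 190. The sample values in `main` are illustrative only, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of symmetric outlier trimming and the COV comparison.
class OutlierTrim {

    // Returns a sorted copy with the k smallest and k largest values removed.
    static double[] trim(double[] values, int k) {
        if (k < 0 || 2 * k >= values.length) {
            throw new IllegalArgumentException("k out of range for sample size");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted); // ascending; same result as trimming a descending sort
        return Arrays.copyOfRange(sorted, k, sorted.length - k);
    }

    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Coefficient of variation: sample (n - 1) standard deviation divided by the mean.
    static double cov(double[] x) {
        double m = mean(x);
        double sumSq = 0;
        for (double v : x) {
            sumSq += (v - m) * (v - m); // squared deviations, so outliers dominate
        }
        return Math.sqrt(sumSq / (x.length - 1)) / m;
    }

    public static void main(String[] args) {
        // Illustrative transfer times in ms, NOT the experiment's data.
        double[] raw = {1000, 1002, 998, 1001, 999, 2900, 1000, 400, 1003, 997, 1000, 1001};
        double[] kept = trim(raw, 1);
        System.out.printf("COV before: %.3f, after: %.3f%n", cov(raw), cov(kept));
    }
}
```

Because the trimming is symmetric and fixed in size, it is applied identically to every run regardless of how many outliers each run actually contains.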
