A new method for comparing telerobots for operator preference

This chapter begins with an examination of the methods available for comparing telerobots and discusses their applicability to web telerobots. This is followed by a proposal for a new method for comparing telerobots which seeks to measure operator satisfaction.

One method for characterising operator satisfaction is to measure the number of requests made to a telerobot in a session. Sessions, however, vary widely: a session may involve gaining control of the telerobot and making few or even no further requests, or it may last five hours or more and contain many requests. By comparing the number of requests per session, preferences for different web telerobots can be measured. This takes advantage of the large data sets that are acquired by monitoring the everyday use of a web telerobot. Alternative methods for comparing telerobots, for example task completion times, require the cooperation of operators in controlled experiments.

Distributions provide a means of summarising data. For many distributions a model that will produce that distribution is known, so that if the data is found to match a known distribution, a useful model is available for interpreting the data. The number of requests per session for all of the telerobots was found to fit a Weibull distribution.

The properties of a Weibull distribution and a model that will produce a Weibull distribution are explained in section 6.2. Section 6.3 describes several other social phenomena that can be fitted by a Weibull distribution. A Weibull distribution is a special case of the generalised gamma distribution, and in section 6.4 it is established that the Weibull distribution fits the number of requests per session better than any of the other special cases. In section 6.5 a Weibull distribution is fitted to three different telerobot/interface combinations. The chi-square goodness of fit test is used to conclude that, to a satisfactory level of significance, the data can be fitted by a Weibull distribution.

Once the distribution type is known, requests per session can be summarised by the distribution type and its parameters, of which a Weibull distribution has two. Where one of the parameters is similar across telerobots, the comparison between telerobots can be reduced to a single parameter that measures operator preference. In section 6.6 the telerobots are compared and the correlation coefficient is used to test the quality of the comparison.


Existing methods for comparing telerobots

This section lists the methods that researchers have used to compare telerobots and measure performance. This is followed by a brief discussion of the applicability of each method to web telerobots. The methods available for comparison are:-

  1. Time to complete a task. This is the most common method, used by Drascic et al (1989), Hannaford and Wood (1991), Draper et al (1986), Stark (1987), Mclean et al (1994) and Uebel et al (1994) to measure system performance and provide comparisons between systems. It requires a controlled experiment in which a number of operators perform the same task. The experimenter then varies either an aspect of the task or an aspect of the interface and asks operators to repeat the task. This method works well in a controlled laboratory environment but cannot be used to monitor ongoing system performance, as operators will be performing varying tasks. It also suffers from the disadvantage that it is difficult to define a task that provides a valid comparison. Frequently, varying the task used to compare two systems can alter the conclusion as to which is superior. For example, suppose the task of manoeuvring an object between obstacles without colliding with them is used to compare a supervisory control approach, where an operator specifies target locations and the local controller generates the path, with manual control with video feedback. The results can be changed by altering the distance between the obstacles. If the space between the obstacles is large, small inaccuracies in the path will not matter and manual control will allow the object to be manoeuvred between the obstacles quickly. If, however, the clearance between the object and obstacles is sufficiently small, it may be very difficult to manoeuvre between the obstacles under manual control, and the high path precision achieved by the local controller will allow the task to be completed much more quickly with supervisory control.
  2. Fitts’ Law. A method adopted in several studies, including Drascic et al (1986) and Cannon (1994), is Fitts’ Law, which relates movement time to the distance travelled in reaching the target and the size of the target. It can be given as MT = a + b log2(2D/W), where D is the distance to the target, W is the width of the target, and the values of a and b are found experimentally. Cannon (1994) applies control theory to predict from these values the machine dynamics of the system, human reaction time and neuromuscular lag.
  3. Operator Subjective Assessment. Mclean et al (1994) found this to be closely correlated with the more generally used time to complete a task. Operators are asked to complete a task as quickly as possible and then rate it in comparison to a separate baseline task which does not involve teleoperation. The baseline task is to manipulate graphic objects with a mouse, which minimises variability between operators.
  4. Average joint velocity. The period over which the velocity is averaged is an operator’s entire session. The rationale is that good operation will consist of purposeful motions; therefore, the more movement, the better. It has the advantage that no defined task is required, as the data can be acquired while the telerobot is in general use. However, it offers no predictive value, in that the effect of a change cannot be predicted prior to its implementation. Average joint velocity was proposed by Mclean et al (1994), who found it closely correlated with the time to complete a task.
  5. Average tip velocity. This has been proposed by Mclean et al (1994) and is based on the same rationale as (iv) but the measure is in Cartesian space rather than joint space. Mclean et al found average tip velocity to give similar results to (iv).
  6. Dexterity. This is another measure proposed and tested by Mclean et al (1994) and is based on the assumption that good performance is associated with the manipulator being maintained in a state from which a variety of motions can be easily achieved. That is, the manipulator is best operated away from singularities, near which high joint rate to task velocity ratios occur. The dexterity measure is based on the determinant of the Jacobian. This measure of performance was found by Mclean et al (1994) to be less well correlated with time to complete a task than the methods discussed in (iv) and (v).
  7. Sum of squares of the contact forces (SSOF). This measure has been used by Uebel et al (1994) when assessing the effect of bandwidth on a peg insertion task, the contact forces being those between the peg and the environment. It has also been used by Hannaford and Wood (1991). SSOF is suitable when interacting with the environment and is given by SSOF = T Σi Σj fij², summed over all samples i = 1…n and force axes j, where fij represents the ith sample of the force component along the jth axis of the end effector coordinate system, n is the total number of samples and T is the sampling period. Uebel et al (1994) were able to achieve similar results using this measure of performance as were achieved with the “time to complete a task” method.
  8. Sum of the squares of the moments. This measure is similar to that described in (vii), but the moments applied to the peg during a peg insertion task are measured rather than the axial forces. It has been used by Uebel et al (1994) with results similar to those described for method (vii). It has also been used by Hannaford and Wood (1991).

Having discussed the methods available to compare telerobots and measure performance, the applicability of each to web telerobotics is considered below:-

  1. Time to complete a task. This is suitable for structured experiments, but it is difficult to coerce the internet population to act in any particular way.
  2. Fitts’ Law. This model is restricted to the manipulative tasks on which the theory is based and to systems where a human is performing rate and position control with constant and short communication delays. With supervisory control an operator specifies a goal rather than controlling rates of movement, and the local control system controls the speed of the robot as it attempts to achieve the goal. Therefore the model does not apply to supervisory control systems.
  3. Operator Subjective Assessment. This is suitable for web telerobotics. Operator subjective assessment has been used to guide much of the development of the interfaces of the telerobots built, but the comparison has been limited to different forms of the interface rather than to a baseline task.
  4. Average joint velocity. This may be useful for web telerobots. It has the advantage that it can be used with many operators who all have different goals and whose goals are unknown. It is unclear whether it would be superior to measuring the time between operator requests as discussed in section 5.10.
  5. Average tip velocity. The applicability is similar to (iv).
  6. Dexterity. This was found by Mclean et al (1994) to be less well correlated to “time to complete a task” than other methods. It is likely to be less useful in the case of web telerobots due to a lower operator awareness of joint positions than in the Mclean, et al (1994) installation.
  7. Sum of squares of the contact forces (SSOF). The technique is limited to tasks of the peg-in-hole type. It is not suitable for the supervisory control approach employed by web telerobots, as the operator specifies goals and direct interaction with the environment is entirely under machine control.
  8. Sum of the squares of the moments. As for method (vii) it is not suitable for web telerobots.

Properties of a Weibull distribution

A Weibull distribution is a two-parameter statistical distribution that is commonly used in reliability engineering (Walpole and Myers 1972:133). Indow (1995) describes a model that produces a Weibull distribution. This model can be applied to reliability engineering or adapted to the requests per session with a telerobot as described below.

Suppose that there is an infinite set of conditions, that each of the conditions can lead to an event, that the event takes place when any one condition is fulfilled, and that each of the conditions is identically distributed. These conditions are said to be infinite and identically distributed (i.i.d.), and the effect of violating these assumptions is considered later. In reliability engineering, the event is the failure of a machine, which occurs because of the failure of one of many possible components. In our case, the event is ending the current session operating the telerobot for one of many possible reasons. Then, as explained by Indow (1995), it turns out that the distribution of the event does not depend on the distribution of the conditions, and in terms of the latent variable x it takes the form:-
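In a standard two-parameter form, with scale parameter η and shape parameter β (the symbols adopted here for this parameterisation), this is:

\[ F_W(x; \eta, \beta) = 1 - \exp\left[ -\left( \frac{x}{\eta} \right)^{\beta} \right], \qquad x \ge 0 \]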

This is a Weibull distribution, which represents the probability of the event occurring after the stress x is applied. For reliability engineering the event is the failure of a machine and the stress is the time the machine has been in service. For telerobots the event is ending the current session and the stress is the number of requests made to the telerobot.

The expectation, density function, and variance of F_W(x) are:-
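In the same parameterisation these take the standard forms:

\[ f_W(x) = \frac{\beta}{\eta}\left( \frac{x}{\eta} \right)^{\beta - 1} \exp\left[ -\left( \frac{x}{\eta} \right)^{\beta} \right], \qquad E[X] = \eta\,\Gamma\!\left(1 + \frac{1}{\beta}\right), \qquad \mathrm{Var}[X] = \eta^{2}\left[ \Gamma\!\left(1 + \frac{2}{\beta}\right) - \Gamma^{2}\!\left(1 + \frac{1}{\beta}\right) \right] \]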

The form of the density function changes drastically according to whether β < 1, β = 1 or β > 1, as shown in Figure 77.

Three plots with scale parameter η = 1 showing how the form of the density function changes according to whether β < 1, β = 1 or β > 1.
Figure 77

The shape of f_W(x) is defined by the shape parameter β, and the other parameter η defines the scale of f_W(x), so that for any rescaling factor q, F_W(x; qη, β) = F_W(x/q; η, β). For β = 1, f_W(x) takes the form of exponential decay. In reliability engineering this corresponds to a machine that fails entirely at random, with no dependence on the time the machine has been in service. A machine with components subject to wear is more likely to fail as its time in service increases; this corresponds to the case of β > 1. Electronic components are most likely to fail during burn-in, with the instantaneous failure rate decreasing as their time in service increases, so a machine with these components will have a value of β < 1.

In reliability engineering x is equated to time (t), and in the case of a telerobot x is equated to the number of requests made by an operator to the telerobot in a session (s). Setting m and β as the representations in the s dimension of η and β in the x dimension gives the Weibull relationship:-
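In the notation above this is:

\[ F_W(s; m, \beta) = 1 - \exp\left[ -\left( \frac{s}{m} \right)^{\beta} \right] \]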

For the telerobot, for reliability engineering and in many other cases it is not realistic to assume the (i.i.d.) condition. A machine is not made of an infinite number of components, and different components will have differing failure distributions. For a telerobot there is also not an infinite number of reasons to stop, and the reasons will not be identically distributed. Indow (1995:19) has explored what occurs when the (i.i.d.) condition is breached, referring to the number of conditions as N, each of the conditions as Y and the event as Z. The first question asked by Indow is how large N should be in order that events from a homogeneous parent population of non-Weibull form are approximated by F_W(x; η, β). The worst case tested was that of a parent population with a rectangular (uniform) distribution of Y. It was found that at N = 2 or 3 a fit of F_W(x) becomes acceptable, and at N = 4 almost perfect. In the case of a telerobot, this indicates that where there are four or more conditions under which a person might cease making requests to the telerobot, and the probability of these conditions occurring is identically distributed, the requirements for a Weibull distribution of the requests per session statistic are not violated. Even with as few as two conditions a distribution close to Weibull is possible.
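This behaviour can be illustrated with a small simulation. The sketch below (an illustration only; the uniform parent distribution, sample size and use of a Kolmogorov-Smirnov check are arbitrary choices rather than Indow's procedure) draws the event Z as the minimum of N identically distributed conditions and fits a two-parameter Weibull, so the improvement in fit from N = 1 to N = 4 can be inspected.

    # Illustrative sketch of the minimum-of-conditions model: the event Z is the
    # minimum of N i.i.d. conditions drawn from a rectangular (uniform) parent
    # distribution.  Distribution choice and sample size are arbitrary.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    for n_conditions in (1, 2, 4):
        # Each row is one trial; the event occurs at the smallest condition value.
        z = rng.uniform(0.0, 1.0, size=(20000, n_conditions)).min(axis=1)
        # Fit a two-parameter Weibull (location fixed at zero) and check the fit.
        shape, _, scale = stats.weibull_min.fit(z, floc=0.0)
        ks = stats.kstest(z, "weibull_min", args=(shape, 0.0, scale))
        print(f"N={n_conditions}: shape={shape:.2f}, scale={scale:.3f}, "
              f"KS statistic={ks.statistic:.3f}")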

The situation of non-identical, or heterogeneous, distributions of the parent population, that is where the conditions Y are not all identically distributed, is more complex, as explained by Indow (1995:23). In general, where the distributions are far apart, the situation becomes equivalent to a homogeneous parent population drawn from the left-most distribution. This is because F_W(x; η, β) depends on the extreme statistic, that is min(Y1, Y2, ... Yn). This is illustrated in Figure 78.

Weibull plots of Z’s from a heterogeneous parent population consisting of two groups far apart (Indow 1995:23).
Figure 78

The top graph shows a heterogeneous parent population consisting of two Weibull-distributed groups that have very little overlap. Because the parent distributions are also Weibull, the result is completely defined by a single distribution of each type, i.e. Na = Nb = 1 (Indow 1995:22). The parent population is heterogeneous, but the event population is homogeneous. The bottom graph in Figure 78 shows the plot of the expected value of the event population, i.e. E(Z(1)n). The scales on the graph are chosen to produce a linear relationship for a Weibull distribution. Fitting F_W(x; η, β) to E(Z(1)n) gives estimated parameters indicating that the series is well approximated by the left-most distribution. Where the two parent populations are closer, the resulting distribution can vary from Weibull, as discussed in Indow (1995:26).

Another possibility is a heterogeneous event population. A heterogeneous event population consists of a mixture of two different densities, fa(x) and fb(x); that is, the density at each x is defined as f(x) = p fa(x) + (1 − p) fb(x) (Equation 16). This occurs when the results of two different situations are mixed. The example given by Indow is that of remembering words, where one group of words is much easier to remember than another and the results from both groups are mixed. Figure 79 shows the heterogeneous event population for the same two distributions as Figure 78, but here the event population is mixed according to Equation 16 (where p = 0.5). Unlike the case of the mixed parent population illustrated in Figure 78, the heterogeneous event population cannot be approximated by one of the groups and varies from the straight line that would be seen for a Weibull distribution. Indow (1995:28) shows this shape to be a common form for heterogeneous event populations.

Weibull plots of Zs from a heterogeneous event population consisting of two groups far apart. (Indow 1995:22)
Figure 79

Social phenomena and the Weibull distribution

The Weibull distribution has been discussed in relation to reliability engineering as that is the context in which it is most familiar to engineers. However it has also found application in explaining social phenomena and this is relevant to an examination of telerobot operator behaviour. Indow (1993) and others have been able to apply Weibull distributions to a remarkably large set of social phenomena.

One example where the Weibull distribution applies is the duration of conflicts: Horvath (1992) has shown that the duration of wars and strikes can be fitted by a simple Weibull distribution. He has fitted a Weibull distribution to the duration of 315 wars that took place from 1820 to 1949 and to the duration of the 3317 strikes settled in 1961 in the USA, as shown in Figure 80.

Horvath showed the duration of wars in the world from 1820 to 1949 and the duration of strikes in the USA settled in 1961 were well fitted by a simple Weibull distribution (Indow 1993).
Figure 80

In both cases the instantaneous conflict-settling rate drops sharply in the beginning and remains almost constant afterward. Once a conflict is not resolved within its initial phase (a week for a strike and two years for war), it is likely to continue for some time before resolution. Similarly with the telerobot, most operators give up after a few requests but those that make more than a few requests are likely to make more requests.

Indow (1993) presents some other social phenomena where a Weibull distribution was found to apply. These include short-term and long-term memory. Long-term memory is measured by asking subjects to recall words belonging to a specific category, which was presidents of the United States in the case examined, and timing how quickly these words are recalled. The first few words are recalled quickly but the rate of recall decreases as the number recalled increases. Short-term memory is measured by asking subjects to memorise a list of 20 unrelated nouns and counting the number recalled after various time intervals. Indow (1993) also shows that in advertising the likelihood of remembering a brand name or making a purchase after the commencement of an advertising campaign can, in some circumstances, be fitted with a Weibull distribution, as can the length of courtship before engagement and the length of marriage before divorce.


Fitting a distribution

For a sample of 14,111 sessions on the ABB1400 telerobot from 31 January 1997 to 22 August 1997, with session lengths ranging between 0 and 378 requests to the telerobot, it was found that short sessions predominate, with 43% of operators making no requests to the telerobot after having gained control. A generalised gamma distribution was fitted to the data using a method described by Gran (1992b). A generalised gamma distribution is a three-parameter distribution which has the probability density function shown in Equation 17.
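A common form of this density, consistent with the special cases listed in Table 16 (the parameter symbols a, h and A follow that table), is:

\[ f(x; a, h, A) = \frac{|h|}{A\,\Gamma(a)} \left( \frac{x}{A} \right)^{ah - 1} \exp\left[ -\left( \frac{x}{A} \right)^{h} \right], \qquad x > 0 \]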

When the parameters (a, h, A) of the generalised gamma distribution in Equation 17 are given particular values, a number of well known probability functions appear as special cases. Some of them, including the Weibull distribution, are listed in Table 16.

a              h              A                   Distribution function
1/2            2              √2 σ                One-sided normal
1/2            2              σ                   Error function
a              1              1                   Elementary gamma
n/2            1              2                   Chi-square, n degrees of freedom
1/2 < a < 1    1              √2 σ                Truncated Rice with spectral width ε ≈ (1 − (1 − 2a)²)^½
1              2              √2 σ                Rayleigh
3/2            2              √2 σ                Maxwell
1              1              A                   Exponential
1              h > 0          A                   Two-parameter Weibull
1              h < 0          A                   Fréchet
→ ∞            1/(σ√a)        a^(−1/h) e^x        Log-normal with parameters (x, σ)
→ 0            ±1/(aσ)        e^(y ± σ)           Algebraic x^(±1/σ − 1)
1              x/σ ≫ 1        x                   Approximate normal (x, σ)
1              → ∞            A                   Delta distribution, x = A, a constant

Special cases of the generalised gamma distribution
Table 16

Gran (1992a) also describes a method of estimating the parameters developed by Stacy and Mihram (1965) based on logarithmic moment estimators.

The logarithmic moment estimators are:-
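With x1, ..., xN the observed numbers of requests per session, a standard formulation of these estimators (corresponding to Equation 18, Equation 19 and Equation 20) uses the first three central moments of ln x:

\[ \hat L_1 = \frac{1}{N} \sum_{i=1}^{N} \ln x_i, \qquad \hat L_2 = \frac{1}{N} \sum_{i=1}^{N} \left( \ln x_i - \hat L_1 \right)^{2}, \qquad \hat L_3 = \frac{1}{N} \sum_{i=1}^{N} \left( \ln x_i - \hat L_1 \right)^{3} \]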

where N is the number of observations.

The parameters are then estimated by solving:-
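For the generalised gamma in the form above, ln X has mean ln A + ψ(a)/h, variance ψ′(a)/h² and third central moment ψ″(a)/h³, so the estimators lead to relations of the form (corresponding to Equation 21 to Equation 23):

\[ \hat L_1 = \ln \hat A + \frac{\psi(\hat a)}{\hat h}, \qquad \hat L_2 = \frac{\psi'(\hat a)}{\hat h^{2}}, \qquad \hat L_3 = \frac{\psi''(\hat a)}{\hat h^{3}} \]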

where ψ(a), ψ′(a) and ψ″(a) are the di-, tri- and tetra-gamma functions, defined as the successive derivatives of ln Γ(a).

Applying Equation 18, Equation 19 and Equation 20 to the sample gives the logarithmic moment estimates. Combining Equation 22 and Equation 23 and using the "FindRoot" function in the mathematics software package Mathematica gives the estimate of a; substituting this value back into Equation 21, and then into Equation 20, gives the estimates of the remaining parameters. As the estimated value of a is close to one and h is positive, then, according to Table 16, the data sample is better fitted by the two-parameter Weibull function than by any other special case of the generalised gamma function. This provides evidence that it is the appropriate distribution to fit to the data.
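The same elimination can be sketched in Python rather than Mathematica; the relations assume the parameterisation given above, and the session array shown is a placeholder rather than the thesis sample:

    # Sketch of the logarithmic-moment estimation of the generalised gamma
    # parameters (a, h, A), assuming the relations E[ln X] = ln A + psi(a)/h,
    # Var[ln X] = psi'(a)/h^2 and mu3[ln X] = psi''(a)/h^3.  "sessions" is a
    # placeholder array, not the telerobot data.
    import numpy as np
    from scipy.special import polygamma
    from scipy.optimize import brentq

    def log_moments(x):
        y = np.log(x)
        l1 = y.mean()
        l2 = ((y - l1) ** 2).mean()
        l3 = ((y - l1) ** 3).mean()
        return l1, l2, l3

    def estimate_generalised_gamma(x):
        l1, l2, l3 = log_moments(x)
        # Eliminate h: psi'(a)^3 / psi''(a)^2 = L2^3 / L3^2, solved numerically for a.
        target = l2 ** 3 / l3 ** 2
        a = brentq(lambda a: polygamma(1, a) ** 3 / polygamma(2, a) ** 2 - target,
                   1e-3, 1e3)
        # Recover h (its sign follows the skewness of ln x) and then A.
        h = np.sign(l3 * polygamma(2, a)) * np.sqrt(polygamma(1, a) / l2)
        A = np.exp(l1 - polygamma(0, a) / h)
        return a, h, A

    sessions = np.array([1, 1, 2, 3, 5, 8, 13, 21, 34], dtype=float)  # placeholder
    print(estimate_generalised_gamma(sessions))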

Fitting a Weibull distribution

The ABB1400 telerobot

A Weibull distribution was fitted to the data described in section 6.4. The data was collected while the operator interface was of the form shown in Figure 81.

The form of the operator interface during data collection on requests per session.
Figure 81

A Weibull distribution has the form of Equation 15. The chi-square goodness of fit test, applied as described by Walpole and Myers (1972:266), is used. The test between observed and expected frequencies is based on the quantity:-
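For k cells this is the standard statistic:

\[ \chi^{2} = \sum_{i=1}^{k} \frac{(o_i - e_i)^{2}}{e_i} \]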

where χ² is a value of a random variable whose sampling distribution is approximated very closely by the chi-square distribution. The symbols o_i and e_i represent the observed and expected frequencies, respectively, for the ith cell. The significance of the test (α) is the probability of rejecting the hypothesis that the difference between the observed data and the proposed distribution is due to random sampling when it is in fact due to random sampling; it corresponds to the area in the right-hand tail of the chi-square distribution. A value of α = 0.05 is often considered to be a reasonable level of significance to justify the conclusion that the proposed distribution provides a good fit to the observed data. The number of degrees of freedom in a chi-square goodness of fit test is equal to the number of cells minus the number of quantities obtained from the observed data that are used in the calculation of the expected frequencies. In our case, three values, namely the total frequency and the two parameters m and β, are derived from the observed data, so that the degrees of freedom (ν) equal the number of cells minus 3, ie. ν = k − 3. According to Walpole and Myers (1972:266), the decision criteria should not be used unless the expected frequency of each cell is at least 5. This has been achieved by combining adjacent cells where expected frequencies are less than five so that:-

Combining this with Equation 24 gives:-

The estimates of the parameters m and β are obtained by minimising the value of χ². Before an operator can make a request to the telerobot, they must gain control; they can then make the first request to the telerobot. In 43% of sessions in the sample, the first request to the telerobot was not made. Fitting a simple Weibull to the data set gives:-
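The fitting procedure can be sketched as follows. The fragment bins a placeholder array of session lengths, computes expected frequencies from the Weibull form of Equation 15, pools cells whose expected frequency falls below five, and minimises χ² over m and β; the data and starting values are illustrative assumptions, not the thesis sample or results.

    # Sketch of fitting a Weibull to requests-per-session counts by minimising
    # the chi-square statistic (Equation 24).  "sessions" is placeholder data,
    # not the thesis sample; starting values for (m, beta) are arbitrary.
    import numpy as np
    from scipy.optimize import minimize

    def weibull_cdf(s, m, beta):
        return 1.0 - np.exp(-(s / m) ** beta)

    def chi_square(params, sessions, s_max):
        m, beta = params
        if m <= 0 or beta <= 0:
            return np.inf
        edges = np.arange(1, s_max + 2)                   # unit-width cells 1, 2, ...
        observed = np.histogram(sessions, bins=edges)[0].astype(float)
        prob = np.diff(weibull_cdf(edges, m, beta))
        expected = observed.sum() * prob / prob.sum()     # renormalised over the range
        # Pool adjacent cells from the right until every expected frequency >= 5.
        while len(expected) > 1 and expected[-1] < 5:
            expected[-2] += expected[-1]
            observed[-2] += observed[-1]
            expected, observed = expected[:-1], observed[:-1]
        return np.sum((observed - expected) ** 2 / expected)

    rng = np.random.default_rng(0)
    sessions = np.ceil(8.0 * rng.weibull(0.6, 14000))     # placeholder sample
    result = minimize(chi_square, x0=[5.0, 0.8], args=(sessions, 141),
                      method="Nelder-Mead")
    m_hat, beta_hat = result.x
    print(m_hat, beta_hat, chi_square(result.x, sessions, 141))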

A plot of the experimental data and the fitted Weibull curve for requests to the telerobot in the range 1 to 141 is shown in Figure 82.

The number of requests to the telerobot in a session. A comparison of the fitted Weibull curve with the experimental data from a sample of 14111 sessions showing the range 1 to 141 requests to the telerobot
Figure 82

To the eye the fit is good and would probably be an acceptable approximation in a practical situation, but the significance, α, is very much less than the level of 0.05 chosen as acceptable. This indicates that the differences are unlikely to be produced by random sampling. Therefore, it is concluded that a simple Weibull does not fit the experimental data. There is a possibility that we have a mixture of distributions in the event population, as discussed in section 6.2 and similar to that which Indow found to occur with the length of marriage before divorce mentioned in section 6.3; for that example, the number of children was a factor that caused a mixture of distributions. This is distinct from a mixture of non-identical distributions in the parent population, which provides another possibility and occurs if there is a difference in the circumstances experienced by groups of telerobot operators. For the telerobot, this is a plausible scenario. Sometimes the telerobot was out of action and not responding to requests, and sometimes there were no blocks to manipulate as they had all been knocked to the floor. A proportion of operators would have commenced their sessions when the telerobot was faulty or when there were no blocks, and a proportion would have commenced their sessions when the telerobot was working well. This will produce a heterogeneous event population of sessions with a faulty telerobot and sessions with a good telerobot. A plot of the contributions to the χ² statistic from each of the bins is shown in Figure 83 for the full range of experimental data, ie. s ≥ 1.

Contribution to the χ² statistic from each of the bins for a simple Weibull fit over the full range of experimental data, ie. s ≥ 1.
Figure 83

This shows the greatest contribution to be in the bins at either end and in particular at the left end where the bins represent a low number of requests to the telerobot. Fitting a Weibull distribution to the data set for sessions with 19 or greater requests to the telerobot gives the following results:-

The level of significance is now well above the minimum value of 0.05 suggested above. Therefore, there is no reason to reject the null hypothesis and it can be concluded that the observed data can be approximated by a Weibull distribution. A plot of the contributions to the χ² statistic from each of the bins, for s ≥ 19, is shown in Figure 84, which no longer shows the obvious pattern apparent in Figure 83 for the s ≥ 1 case.

Contribution to the χ² statistic from each of the bins for a simple Weibull fit as per Equation 26 for s ≥ 19.
Figure 84

The shape parameter β is much less than one, indicating that the instantaneous drop-off rate, analogous to the instantaneous failure rate in reliability engineering, decreases quickly as the number of requests to the telerobot increases. That is, the probability of making at least one more request to the telerobot increases rapidly as more requests are made. This is illustrated in Figure 85, which shows the probability of making at least one more request to the telerobot as the number of requests already made increases. It is created from the same experimental data as Figure 82.

The probability of making another request versus session length. As request count increases users become increasingly interested in the telerobot.
Figure 85

Figure 85 shows that only 57% of those that gain control of the telerobot will make at least one request, but 93% of those that have made seven requests will make at least one more.
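Under the fitted Weibull form of Equation 15, this conditional probability can be written in closed form; with β < 1 the exponent shrinks in magnitude as s grows, so the probability rises towards one, which is the behaviour seen in Figure 85:

\[ P(S \ge s + 1 \mid S \ge s) = \frac{1 - F_W(s + 1)}{1 - F_W(s)} = \exp\left[ \left( \frac{s}{m} \right)^{\beta} - \left( \frac{s + 1}{m} \right)^{\beta} \right] \]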

The IRb6/L2-6 telerobot in Perth

The experimental data analysed above was acquired on the ABB1400 telerobot. A separate data set was acquired on the IRb6/L2-6 telerobot when it had an interface of the type shown in Figure 86.

Interface for the IRb6/L2-6 telerobot. The robot trips out if brought down directly on a block and is recovered by selecting the ‘reset robot’ radio button.
Figure 86

This data set represents 34,800 sessions from 16 March 1995 to 9 October 1995 where one request or more was made to the telerobot after having gained control. The data covers the period before the functionality of the IRb6/L2-6 was changed, in particular to detect and automatically recover from an overload occurring when the robot is brought down on a block. 67% of sessions had only a single request. The most requests in a single session were 203 and there were 7 sessions with more than 50 requests. The data includes 862 sessions with more than 10 requests to the telerobot.

As for the ABB1400 telerobot data, it was found necessary to exclude the first few requests to the telerobot to fit a Weibull curve with a sufficient level of significance to conclude that the observed data can be approximated by a Weibull distribution. A satisfactory level of significance was achieved for s > 4. The results were:-

A plot of the contributions to the χ² statistic from each of the bins, for s > 4, is shown in Figure 87, which shows no obvious pattern.

Contribution to the χ² statistic from each of the bins for a simple Weibull fit as per Equation 26 for s > 4.
Figure 87

A plot of the experimental data and the fitted Weibull curve for requests to the telerobot in the range 1 to 52 is shown in Figure 88.

The number of requests to the IRb6/L2-6 telerobot in a session. A comparison of the fitted Weibull curve with the experimental data from a sample of 34800 sessions showing the range 1 to 52 requests to the telerobot.
Figure 88

The level of significance is well above the value of 0.05 suggested above as a minimum. Therefore, we have no reason to reject the null hypothesis and can conclude that the observed data can be approximated by a Weibull distribution. For the Perth IRb6/L2-6 telerobot, the Weibull was found to approximate the experimental data where the number of requests to the telerobot was greater than four, whereas, for the ABB1400 telerobot, the Weibull curve was found to approximate the experimental data only after the first 18 requests. It was suggested that there was likely to be a heterogeneous event population due to a proportion of operators commencing their sessions when the telerobot was faulty or when there were no blocks, and a proportion commencing their sessions when the telerobot was working well. The data for the ABB1400 telerobot was acquired in a period during which the system had a higher level of reliability than existed during the period the IRb6/L2-6 data was acquired, but the table was cleared of blocks more frequently. A possible explanation for the difference in the number of requests that had to be ignored to fit the Weibull curve is that there were more occasions with the ABB1400 telerobot when there were few blocks to manipulate, so that operators on these occasions would make a few moves before getting bored. The information is not available to quantify the proportion of operators experiencing a faulty telerobot or no blocks.


The Carnegie telerobot

Data was recorded in the period 1/4/97-30/6/97, representing a total of 7,067 sessions with session lengths ranging between 0 and 274 requests to the telerobot. The Carnegie telerobot is also an IRb6/L2 robot but is a five-axis rather than a six-axis robot. The operator interface is shown in Figure 89.

The form of operator interface during data collection on requests per session for the Carnegie telerobot.
Figure 89

As for the IRb6/L2-6 telerobot in Perth, it was found necessary to exclude the first four requests to the telerobot (ie s > 4) to fit a Weibull curve with a sufficient level of significance to conclude that the observed data can be approximated by a Weibull distribution, with the difference being due to random sampling error only. The results were:-

A plot of the experimental data and the fitted Weibull curve for requests to the telerobot in the range 1 to 52 is shown in Figure 90.

The number of requests to the Carnegie Telerobot in a session. A comparison of the fitted Weibull curve with the experimental data from a sample of 7,067 sessions showing the range 1 to 52 requests to the telerobot.
Figure 90

The level of significance is well above the value of 0.05 suggested as a minimum. Therefore we have no reason to reject the null hypothesis and can conclude that the observed data can be approximated by a Weibull distribution. As for the IRb6/L2-6 telerobot in Perth, the Weibull was found to approximate the experimental data where the number of requests to the telerobot was greater than four, whereas for the ABB1400 telerobot the Weibull curve was found to approximate the experimental data only after the first 18 requests.

The method for assessing operator preference

A Weibull distribution has now been fitted to the requests per session for the three telerobots. Even with large data sets, it could be shown that the difference between the measured and calculated distribution is accounted for by random sampling error to a satisfactory level of significance. This enables the data sets to be characterised by two parameters. When comparing telerobots it is necessary only to compare these two parameters rather than the whole data set. It would be even better if this could be reduced to a single parameter.

A comparison of the number of requests in a session, shown in Figure 90, reveals a considerably larger proportion of longer sessions with the later ABB1400 telerobot than with the earlier IRb6/L2-6 telerobot. This would suggest a higher level of operator satisfaction with the ABB1400.

Despite the substantial difference between data sets, it can be seen from the estimated parameters shown in Table 17 that the shape parameter β was similar for all data sets.

Setting the scale parameters to a common value and overlaying the curves gives the result in Figure 92, which shows the similarity in shape of the fitted distributions despite the substantial scaling differences between data sets.

This implies that each data set can be approximated by the other data sets in the region where the Weibull curve was fitted, when rescaled according to Equation 13. To rescale the IRb6/L2-6 (Perth) data set to match the ABB1400 data set, Equation 13 becomes:-

Solving for q gives the results in Table 18. Similarly, to rescale the IRb6/L2-6 (Perth) data set to match the IRb6/L2 (Carnegie) data set, Equation 27 becomes:-

and the scale parameter m must be recalculated. Then solving for q gives the results in Table 18.

Table 18 shows the result of setting the shape parameter to a common value for each data set, then estimating the scale parameter and solving for q as per Equation 27.

Rescaling the IRb6/L2-6 data set according to Equation 27, using the value for q given in Table 18, and overlaying it on the ABB1400 data set gives the graph shown in Figure 93, which shows a reasonable match.

The method used for choosing the scaling factor was somewhat arbitrary, as it requires choosing a shape parameter for each data set and then estimating a scale parameter from that. If the two data sets are compared directly, there is no need to estimate a shape parameter. This can be done by eye, rescaling one data set, overlaying it on the other and adjusting the scaling factor until the best fit is observed. The region to be optimised is that where a Weibull distribution has been fitted to both data sets with a satisfactory level of significance. The optimum scaling factor was estimated by eye to be 6.1.

An alternative method of choosing the scaling factor is to minimise the least squares proportional difference between the data sets according to:-

Experimental data exists only at integer values of s, therefore interpolation is needed to determine values at the non-integer arguments produced by rescaling. Linear interpolation (Equation 29) can be used to provide this data.

A value of q was found numerically to minimise Equation 28, using Equation 29 to determine values at non-integer arguments of s. The result was q = 6.23. As the Weibull curve was found to fit the IRb6/L2-6 (Perth) data set only at s >= 5, the lowest request number in the ABB1400 data set that can be compared is approximately 5q ≈ 31, which means that only 52% of the ABB1400 data set that was found to fit the Weibull distribution is used when a direct comparison is made. This must affect the accuracy of the comparison, making direct comparison less accurate for large values of q.
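A sketch of this direct comparison is given below. Equation 28 itself is not reproduced; as an illustration, the fragment compares the empirical survival fractions P(S >= s), which for two Weibull distributions with a common shape satisfy P_A(S >= s) = P_B(S >= s/q), and it uses placeholder session data rather than the recorded samples.

    # Sketch of estimating the single scaling factor q between two telerobots'
    # requests-per-session data sets.  The least squares proportional difference
    # of Equation 28 is approximated here by comparing empirical survival
    # fractions; the session arrays are placeholders, not the thesis data.
    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import pearsonr

    def survival(sessions, s_values):
        """Empirical fraction of sessions with at least s requests."""
        sessions = np.sort(sessions)
        return 1.0 - np.searchsorted(sessions, s_values, side="left") / len(sessions)

    def proportional_sse(q, s_values, sessions_a, sessions_b):
        a = survival(sessions_a, s_values)
        b = survival(sessions_b, s_values / q)        # data set B rescaled by q
        return np.sum(((a - b) / a) ** 2)

    # Placeholder data drawn from Weibull distributions with a common shape (0.5)
    # and scales differing by a factor of about six, mimicking the fitted sets.
    rng = np.random.default_rng(1)
    sessions_a = np.ceil(12.0 * rng.weibull(0.5, 14000))   # ABB1400-like
    sessions_b = np.ceil(2.0 * rng.weibull(0.5, 34000))    # IRb6/L2-6-like

    s_values = np.arange(31, 141, dtype=float)             # region comparable after rescaling
    result = minimize_scalar(proportional_sse, bounds=(1.0, 20.0), method="bounded",
                             args=(s_values, sessions_a, sessions_b))
    q = result.x
    # Quality of the estimate: correlation between the two rescaled data sets.
    r, _ = pearsonr(survival(sessions_a, s_values), survival(sessions_b, s_values / q))
    print(q, r)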

The same analysis was performed to compare the IRb6/L2-6 (Perth) data set with the IRb6/L2 (Carnegie) data set by substituting the Carnegie data for the ABB1400 data in Equation 28. The result for this case was q = 7.34, and for this case only 18% of the IRb6/L2 (Carnegie) data set that was found to fit a Weibull distribution was used.

The same analysis was used to compare the ABB1400 data set with the IRb6/L2 (Carnegie) data set by substituting these two data sets into Equation 28. The scaling factor between these data sets is quite small, so the comparison uses nearly all of the experimental data that was found to fit the Weibull distribution and is therefore likely to be more accurate. The result for this case was q = 1.03. This compares with the value of q estimated by the less accurate method of comparing each of the data sets with the IRb6/L2-6 (Perth) data set, which gives q = 6.23/7.34 = 0.85. As an alternative to linear interpolation, a Weibull interpolation can be applied, using the assumed distribution of the experimental data. The Weibull interpolation requires a separate value of β and m for each value of s, applying over the interval between adjacent data points, which is determined by the two non-linear independent equations:-

Equation 30 and Equation 31 can only be solved numerically and must be solved for each number of requests to the telerobot (s) in the experimental data set, which is quite inconvenient. It is also possible to have values in the experimental data set for which there is no solution for β and m. This becomes more likely as the sample size becomes smaller and the random sampling error at each number of requests to the telerobot becomes larger. The method could only be applied to the IRb6/L2-6 (Perth) data set by ignoring some long sessions. In practice, ignoring the last few values at the upper end of s makes no difference to the result, as their contribution to the sum of least squares is negligible. To simplify calculation, the maximum value of s evaluated was 42, which excluded nine sessions from a data set of 34,796 sessions.

Weibull interpolation could not be applied to the ABB1400 data set or the IRb6/L2 (Carnegie) data set, as there were a large number of intervals for which there was no solution to Equation 30 and Equation 31; therefore, it could not be used for a direct comparison of these two data sets. The result for a comparison of the IRb6/L2-6 (Perth) data set with the ABB1400 data set was q = 6.44, which is not very different from the result achieved by linear interpolation of q = 6.23. For a comparison of the IRb6/L2-6 (Perth) data set with the IRb6/L2 (Carnegie) data set the result was q = 7.00 for Weibull interpolation, which compares with the result achieved by linear interpolation of q = 7.34. As for the case of linear interpolation, much of the experimental data cannot be used. Weibull interpolation is difficult to implement and impossible to apply in cases where there is too much random sampling error; moreover, it is uncertain whether it produces superior results.

Rescaling the IRb6/L2-6 data set according to Equation 27 with a value of q = 6.23 and overlaying it on the ABB1400 data set in Figure 94 shows the similarity between data sets.

It can be seen that rescaling by this method produces a better overlay of the Weibull curves than was achieved by the method used to estimate the scaling factor for the overlay shown in Figure 93.

Having estimated a scaling factor between data sets, it is desirable to determine the quality of the estimate. The quality of the estimate is affected by the size of the data sets, the degree to which they match the Weibull distribution, and the extent to which they satisfy the assumption that both have the same shape factor β.

A method of measuring the quality is the correlation coefficient between the two data sets after rescaling one of them. The correlation coefficient is a measure of the linear relationship between two variables, as described by Walpole and Myers (1972:303). Where two data sets X and Y are regarded as random variables and the measurements are observations from a joint density function f(x,y), it is assumed that the conditional distribution f(y|x) of Y, for fixed values of X, is normal with mean μ(Y|x) and variance σ²(Y|x), and that X is likewise normally distributed with mean μ(X) and variance σ²(X). The random variable Y can then be written in the form Y = α + βX + E (Equation 32),

where E is the random error. The correlation coefficient is then a measure of the linear relationship between the two variables X and Y, but it is unaffected by the values of α and β. For our case, Equation 32 becomes:-

The correlation coefficient is a measure of the linear relationship between the two data sets, but from Equation 33 it can be seen that this is the same as the linear relationship between the data sets after rescaling; therefore the correlation coefficient is fairly insensitive to the value of q. At the value of q = 6.28 with linear interpolation, the correlation coefficient (r) is r = 0.939, which means that 0.939² × 100% = 88% of the variation in one data set can be accounted for by the linear relationship with the other. The degree of insensitivity of r to q can be seen in Figure 95, which shows the correlation coefficient for a range of values of the scaling factor q around the optimum value of q = 6.28.

The correlation coefficient provides a measure of the quality of the estimate of q which is largely independent of the value estimated. It measures the random error in the data sets, the degree to which they match the Weibull distribution and the extent to which they satisfy the assumption that both have the same shape factor β. A value close to unity indicates a high quality estimate of the scaling factor between the data sets. A value close to 0.5 indicates that the estimated scaling factor has little meaning, because there is either insufficient data, the data does not fit a Weibull distribution, or the two data sets do not have a similar shape parameter.

Advantage has been taken of the similarity in the shape parameter β across all data sets to reduce the comparison between data sets to a single parameter, that parameter being the scaling factor q between the fitted Weibull distributions. If an assumption is made that humans will make more requests to a telerobot that they prefer, this number can be used to quantify operators’ preference for one telerobot over another, providing a means of measuring operator satisfaction. A value of q > 1 indicates a preference by the human population for the circumstances that apply in case 2 over those that apply in case 1.

Comments