**Chapter 6 Analysis of Factors Affecting Innovation**

**6.1 Introduction**


The survey achieved a 98% response rate, with 180 enterprises being identified as users of at least one biotechnology process (see section 4.2.2 and Marsh (2001b) for further details).

A total of 146 questionnaires were sent out in the 2002 survey. The study
population was defined as ‘all enterprises using modern^{1} biotech processes and/or
all enterprises conducting R&D using biotech processes (modern or traditional)’.

Sixty-one usable responses were received, indicating a ‘usable’ response rate of
44%^{2} (see section 4.2.3 and Appendix 3 for further details).

1 List-based definition; see Appendix 2.

2 The sample frame was adjusted to 138 enterprises by subtracting enterprises that reported that they were not involved in biotech.

PhD Draft 2004 Rev3 Aug 27 Final.doc 30-Aug-04 1:04 PM

**Table 6.1 Hypotheses Tested in the Empirical Analysis**

| No. | Hypothesis | Section |
|---|---|---|
| 1 | *Stock of ideas*: Rate of new ideas production increases with the stock of ideas | NT |
| 2 | *Demand pull vs. Technology push*: Investment in innovation is determined jointly by technological opportunity and market demand | 6.8 |
| 3 | *Technological certainty*: Increased technological certainty encourages innovative activity | NT |
| 4 | *Market structure*: Barriers to entry, higher profit levels, and economies of scale in R&D generally increase the level of R&D undertaken leading to persistence of innovation | NT |
| 5 | *Firm Size* | 0 |
| 5a | Innovation output increases with enterprise size and the number of ideas workers | |
| 5b | Innovation rate increases with enterprise size and the number of ideas workers | |
| 6 | *Firm type*: Innovation output (IO) and innovation rate (IR) vary with firm or organisational type | 6.5 |
| 6a | IO and IR vary with industry group | |
| 6b | Enterprises that conduct R&D have a higher IO and IR than those that do not | |
| 6c | Enterprises that use modern biotech processes have a higher IO and IR than those that do not | |
| 6d | Enterprises that specialise in biotech have a higher IO and IR than those that do not | |
| 7 | *Appropriability*: R&D spending (and hence innovation) increases as the degree of appropriability of R&D outputs increases | 6.8 |
| 8 | *Spillovers, clustering and alliances*: Innovation increases with the quantity and quality of the interaction between the organisations making up the innovation system | 6.6 & 6.7 |
| 8a | International linkages have a stronger positive effect than domestic linkages | 6.7 |
| 8b | Links with purchasers have a positive effect on IO | 6.8 |
| 8c | Links with firms have a positive effect on IO | 6.6 |
| 9 | *Institutions*: Institutional factors are important determinants of innovative output | NT |

Notes: 1. NT – not tested.

2. See Chapter 3 for a detailed introduction to these hypotheses.


6.1.1 Analysis Scope

The scope of the analysis is shaped by the variables covered by the 1998/99 and 2002 biotechnology surveys. In order to maximise the response rate, the 2002 questionnaire was kept short and was limited to questions that could be answered easily by most respondents. For these reasons hypotheses 1, 3, 4 and 9, on the stock of ideas, technological certainty, market structure (e.g. barriers to entry, profit levels, economies of scale in R&D) and the role of institutions, are not tested (see Table 6.1). Patent stocks and the number and qualifications of R&D staff can provide an indication of the stock of ideas (hypothesis 1). However, patents are also used as an indicator of innovative output, while the number of R&D staff is also a size variable, so no separate testing of the effect of the stock of ideas was possible.

Hypotheses 2 and 7 are not tested directly but are investigated using data on respondent perceptions of the importance of demand-pull vs. technology-push and appropriability.

6.1.2 Innovation Indicators and Dummy Variables

The analysis is based on several different indicators of innovation output
including the number of new products and/or processes introduced to the market,
the number of products/processes undergoing trials or R&D and the number of
patent applications. Combining product and/or process introductions and patent
applications provides a better overall measure since different organisations exhibit
innovative output in different ways. Patents are a better indicator for enterprises
that concentrate on the creation (and protection) of intellectual property, while the
number of new products and/or processes introduced is a better indicator for more
production-oriented enterprises (many of which have applied for few if any
patents). A possible refinement would be to include data on the number of articles
published in top-ranking refereed journals - possibly one of the best indicators of
innovative output for universities and organisations conducting basic research^{3}.

3 Data for this indicator were not included in the 1998/9 or 2002 biotechnology surveys.


In the 2002 survey (Marsh, 2002), data were collected on the percentage of
revenue or sales attributable to biotech products or processes introduced in the last
three years. This innovation metric has been widely used particularly in the
Community Innovation Surveys (CIS)^{4}. It has the advantage that it provides a
measure of the impact of innovation on innovators, whereas new product, process
and patent counts do not distinguish between trivial innovations and those that
have greater significance. Unfortunately this indicator has only limited application
in the New Zealand biotech sector where much innovative activity is concentrated
in R&D and has not reached the market (see section 6.5 for analysis results).

The effect of firm size (innovative effort) was unclear when size was specified as a continuous variable. Creating a series of dummy variables for total expenditure, biotech expenditure and alternative indicators of the ‘number of ideas workers’ allowed significant size effects to be identified and the effect of each size category to be distinguished. In each case the sample was divided into quartiles. Three dummy variables were used to define membership of quartiles two to four, with the first quartile being the constant. Food manufacturers were selected as the constant for industry group, being a reasonably large group expected to have significantly different characteristics to groups such as research organisations and universities.
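The quartile coding described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the thesis (which was carried out in Stata): the function name `quartile_dummies`, the dummy labels `q2` to `q4` and the expenditure figures are all hypothetical, and the choice of the "inclusive" quantile method is an assumption.

```python
from statistics import quantiles


def quartile_dummies(values):
    """Return one dict of 0/1 dummies (q2, q3, q4) per observation.

    The first quartile is the omitted base category, so an observation
    in the lowest quartile has all three dummies equal to 0.
    """
    # Three cut points that split the sample into four groups.
    cut1, cut2, cut3 = quantiles(values, n=4, method="inclusive")
    rows = []
    for v in values:
        rows.append({
            "q2": int(cut1 < v <= cut2),
            "q3": int(cut2 < v <= cut3),
            "q4": int(v > cut3),
        })
    return rows


# Hypothetical expenditure figures, for illustration only.
expenditure = [10, 20, 30, 40, 50, 60, 70, 80]
dummies = quartile_dummies(expenditure)
```

Because the first quartile is left out, the coefficient on each dummy in a regression is read relative to the smallest enterprises. Values that fall exactly on a cut point are assigned to the lower quartile in this sketch.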

### 6.2 Regression Models

The innovative output indicators outlined above are count variables^{5} that do not
have normal distributions so use of the linear regression model could result in
inefficient, inconsistent and biased estimates (Long & Freese, 2001, p. 223).

Fortunately methodology for analysing count variables is well developed; see Cameron and Trivedi (1986) for a comprehensive review. The Poisson Regression Model (PRM) is the count data model that was developed first. It is generally used to model the number of occurrences of an event as a function of some independent variable. Poisson distributions have three distinguishing characteristics. “First the

4 A series of innovation surveys carried out in the European Community, see Muzart (1998) and Tether (2001).

5 With the exception of percentage of revenue or sales attributable to biotech products or processes introduced in the last three years.


Poisson distribution is skewed; traditional regression assumes a symmetric distribution of errors. Second, the Poisson distribution is non-negative; traditional regression might sometimes produce predicted values that are negative. Finally the variance of a Poisson distribution increases as the mean increases; traditional regression assumes a constant variance.” (Simon, n.d.).

The Poisson Regression Model accounts for observed differences among sample members by specifying the count variable as a function of observed independent variables.
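In its standard textbook form this specification can be written as follows. The notation is an assumption introduced here for illustration (x_i is the vector of independent variables for enterprise i, β the coefficient vector and μ_i the expected count) and is not taken from the survey analysis itself:

$$\mu_i = E(y_i \mid x_i) = \exp(x_i\beta), \qquad \Pr(y_i \mid x_i) = \frac{\exp(-\mu_i)\,\mu_i^{y_i}}{y_i!}$$

The exponential form keeps the expected count positive, and the Poisson assumption implies that the conditional variance equals the conditional mean, Var(y_i | x_i) = μ_i.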

In practice the PRM rarely fits due to overdispersion. That is the model underestimates the amount of dispersion in the outcome. The Negative Binomial Regression model (NBRM) addresses the failure of the PRM by adding a parameter α that reflects unobserved heterogeneity among observations…

…the PRM and the NBRM have the same mean structure. That is if the assumptions of the NBRM are correct, the expected rate for a given level of the independent variable will be the same in both models. However, the standard errors in the PRM will be biased downward, resulting in spuriously large z-values and spuriously small p-values (Long & Freese, 2001, p. 243).
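A small numerical sketch may help to show what "same mean structure, more dispersion" means. It assumes the common NB2 parameterisation, in which the variance is μ + αμ²; the values of `mu` and `alpha` are hypothetical and are not estimates from the survey data.

```python
from math import exp, lgamma, log


def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return exp(y * log(mu) - mu - lgamma(y + 1))


def negbin_pmf(y, mu, alpha):
    """NB2 probability of count y: mean mu, variance mu + alpha * mu**2."""
    inv = 1.0 / alpha
    log_p = (lgamma(y + inv) - lgamma(y + 1) - lgamma(inv)
             + inv * log(inv / (inv + mu))
             + y * log(mu / (inv + mu)))
    return exp(log_p)


def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list indexed by count."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


mu, alpha = 2.0, 1.5  # hypothetical values, for illustration only
pois = [poisson_pmf(y, mu) for y in range(200)]
nb = [negbin_pmf(y, mu, alpha) for y in range(200)]

pois_mean, pois_var = mean_and_variance(pois)
nb_mean, nb_var = mean_and_variance(nb)
```

Both distributions have the same mean, but the negative binomial has a larger variance and places more probability on a count of zero, which is the pattern a Poisson model cannot reproduce when the data are overdispersed.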

Lambert (1992) introduced zero-inflated count models as an alternative means of dealing with overdispersion. In these models the mean structure is adjusted to allow zeros to be generated by two separate processes. Zero-inflated models assume that there are two unobserved groups: an ‘always-zero group’ where individuals have an outcome of zero with a probability of one, and a ‘not always-zero group’ where individuals may have a zero count, but there is a nonzero probability that they have a positive count (Long & Freese, 2001, p. 251).

Use of the Zero-Inflated Negative Binomial Regression Model (ZINB) for the innovation data reported in this chapter implies that there is a group of enterprises that innovate and a group that do not. ZINB enables separate estimation of:

i. the probability that an individual will be in the ‘no innovation group’ given certain characteristics; and

ii. the effect of certain characteristics on innovative output, among the ‘innovators group’.


This form of analysis is well suited to analysis of the 1998/99 dataset where around 50% of the sample did not introduce new products or processes or make patent applications.
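The two-process structure can be sketched as a simple mixture. In the sketch below `psi` stands for the probability of belonging to the ‘no innovation group’; in an estimated ZINB model this probability would depend on enterprise characteristics, but here it is a fixed, hypothetical number, as are `mu` and `alpha`. The NB2 parameterisation is an assumption.

```python
from math import exp, lgamma, log


def negbin_pmf(y, mu, alpha):
    """NB2 probability of count y: mean mu, variance mu + alpha * mu**2."""
    inv = 1.0 / alpha
    log_p = (lgamma(y + inv) - lgamma(y + 1) - lgamma(inv)
             + inv * log(inv / (inv + mu))
             + y * log(mu / (inv + mu)))
    return exp(log_p)


def zinb_pmf(y, mu, alpha, psi):
    """Zero-inflated NB probability of count y.

    psi is the probability of being in the always-zero group. A zero
    can come from that group or from the count process itself.
    """
    from_count_process = (1.0 - psi) * negbin_pmf(y, mu, alpha)
    return psi + from_count_process if y == 0 else from_count_process


psi, mu, alpha = 0.3, 2.0, 0.8  # hypothetical values, for illustration only
probs = [zinb_pmf(y, mu, alpha, psi) for y in range(300)]
expected_count = sum(y * p for y, p in enumerate(probs))
```

The probability of a zero is always at least `psi`, and the overall expected count is (1 − psi) × mu, because members of the always-zero group contribute nothing to the mean.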

Stata^{6} estimates the deviance (D) goodness of fit statistic to enable assessment of whether use of the PRM is optimal. It is given by:

$$\chi^{2}_{D} = -2\{L(\beta) - L_{max}\}$$

In this equation the likelihood (L) for a particular β is compared with *L*_{max}, where

$$L_{max} = \sum_{i=1}^{n} w_{i}\left[\, y_{i}\{\ln(y_{i}) - 1\} - \ln(y_{i}!) \,\right]$$

Testing the general model for 1998/99 data gives the following result:

χ^{2}_{D} = 583.361
Prob > χ^{2} = 0.0000

Based on this test the PRM does not fit the data at all well; the goodness of fit χ^{2} tells us that, given the model, we can reject the hypothesis that these data are Poisson distributed at the 0.01% significance level.
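The deviance calculation can be illustrated with a minimal Python sketch. It compares the Poisson log likelihood at the fitted means with the log likelihood of a saturated model in which each mean equals its own observed count. Unit weights are assumed, the counts are hypothetical, and the "fitted" means come from an intercept-only model (the sample mean) rather than from any model estimated in the thesis.

```python
from math import lgamma, log


def poisson_loglik(y, mu):
    """Log likelihood of counts y under Poisson means mu (unit weights)."""
    return sum(yi * log(mi) - mi - lgamma(yi + 1) for yi, mi in zip(y, mu))


def saturated_loglik(y):
    """Largest attainable log likelihood: each mean set to its own count."""
    total = 0.0
    for yi in y:
        # y * ln(y) is taken as 0 when y = 0.
        y_log_y = yi * log(yi) if yi > 0 else 0.0
        total += y_log_y - yi - lgamma(yi + 1)
    return total


def deviance(y, mu):
    """Deviance: minus twice the gap between fitted and saturated models."""
    return -2.0 * (poisson_loglik(y, mu) - saturated_loglik(y))


counts = [0, 0, 1, 2, 7]            # hypothetical counts
fitted = [2.0] * len(counts)        # intercept-only fit: the sample mean
d = deviance(counts, fitted)
```

A deviance of zero would mean the fitted means reproduce the data exactly; larger values indicate a poorer fit. Converting the statistic into a p-value requires the chi-squared distribution with the residual degrees of freedom, which is left out of this sketch.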

Since the NBRM reduces to the PRM when α = 0, the presence of overdispersion can also be investigated by testing *H*_{0}: α = 0. Long and Freese (2001) describe this test, pointing out that the Negative Binomial Regression:

…estimates ln(α) rather than α … A test of *H*_{0}: ln α = 0 corresponds to testing *H*_{0}: α = 1…Since α must be greater than or equal to 0, the asymptotic distribution of α̂ when α = 0 is only half a normal distribution. That is all values less than 0 have a probability of 0. This requires an adjustment to the usual significance level of the test. (Long & Freese, 2001, p. 246)

To test *H*_{0}: ln α = 0, Stata (Stata Corporation, 2003) provides a likelihood ratio test. The test is computed by comparing the log likelihoods:

$$LR = 2\ln L(M_{full}) - 2\ln L(M_{intercept})$$

6 Stata Corporation (2003). Stata is a statistical package for analysing data. Details available at http://www.stata.com


This statistic is sometimes designated as G^{2} (Long & Freese, 2001, p. 83). Test
results for the model are:

Likelihood-ratio test of ln alpha=0: χ^{2} = 249.45
Prob> χ^{2} = 0.000

Since there is significant evidence of overdispersion (G^{2}=249.45, p< 0.01), the
Negative Binomial Regression Model is preferred to the Poisson Regression
Model.
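The arithmetic behind a boundary-adjusted likelihood ratio test of this kind can be sketched as follows. The sketch assumes the usual adjustment, in which the p-value from a chi-squared distribution with one degree of freedom is halved because the dispersion parameter cannot be negative. The function names are hypothetical, the log likelihoods passed to `lr_statistic` are toy numbers, and only the value 249.45 is taken from the results reported above.

```python
from math import erfc, sqrt


def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic from two maximised log likelihoods."""
    return 2.0 * (loglik_full - loglik_restricted)


def boundary_pvalue(g2):
    """P-value for a one-parameter test on the boundary of its range.

    The upper tail of a chi-squared(1) variable at x is erfc(sqrt(x / 2)).
    Half of that tail is used because the parameter cannot be negative.
    """
    if g2 <= 0:
        return 1.0
    return 0.5 * erfc(sqrt(g2 / 2.0))


toy_g2 = lr_statistic(-120.0, -115.0)   # toy log likelihoods
p_reported = boundary_pvalue(249.45)    # statistic reported in the text
```

With a statistic as large as 249.45 the adjusted p-value is far below 0.01, so the halving makes no practical difference here; it matters only for borderline results.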

The superior performance of the NBRM can be illustrated by plotting the observed frequencies of different counts of innovative output (IO) against levels predicted by the Poisson and Negative Binomial Regression Models (see Figure 6.1). It can be seen that the PRM underpredicts the frequency of IO counts of zero and overpredicts IO counts of 1-6. By contrast the NBRM predicts the observed proportion of counts very accurately, particularly for count levels of three or more.

**Figure 6.1 Observed vs. Predicted Level of Innovative Output**

[Figure: observed proportion, negative binomial probability and Poisson probability (y-axis: Probability of Count, 0 to 0.6) plotted against IO Count (x-axis, 0 to 10).]

Source: Original Analysis by the author.


Figure 6.2 illustrates model deviations from observed probability levels using four different regression models (Poisson, Negative Binomial, Zero-Inflated Negative Binomial and Zero-Inflated Poisson). It can be seen that NBRM, ZINB and ZIP all produce similar results and are all superior to the PRM. The NBRM is used for most of the analysis in this chapter, since it is simpler than ZINB and ZIP while providing similar goodness-of-fit.

The Zero-Inflated Model provides a useful supplement for analysis of the 1998/99
data and can be theoretically justified since there are some enterprises within the
New Zealand ‘biotech sector^{7}’ that for structural reasons^{8} do not innovate. For
other enterprises the failure to innovate in a given time period is a matter of
chance; this is the basis for zero-inflated models. The Negative Binomial version
of the model (ZINB) is preferable to the Poisson version (ZIP) since there seem to
be unobserved sources of heterogeneity that differentiate enterprises.

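**Figure 6.2 Comparison of Model Deviations from Observed Levels of IO**

[Figure: deviations of predicted from observed probabilities for the PRM, NBRM, ZIP and ZINB models (series dprm, dnbrm, dzip, dzinb; y-axis: Probability Deviation, -0.15 to 0.2) plotted against IO counts of 0 to 8.]

Source: Original Analysis by the author.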

7 As defined by Statistics New Zealand for the purposes of the 1998/99 survey, see Marsh (2001b) for a detailed discussion.

8 For example research by university departments often focuses on peer reviewed publications rather than bringing innovations to the market.
