### Measurement of technical efficiency

Stochastic frontier analysis (SFA) has been widely used to calculate the TE of grain production at the farm level (Latruffe et al., 2004; Cullinane et al., 2006). We choose the trans-log production function rather than the Cobb-Douglas function in analyzing input factor substitution. This choice is supported by the likelihood ratio test (the value is 26.02 and significant at the 1% level, as shown in Table A1 in the appendix). The function has the following form:

$$\begin{array}{l}\ln \left( Y_i \right) = \beta_0 + \sum\limits_{k = 1}^5 \beta_k \ln \left( X_{ik} \right) + \frac{1}{2}\sum\limits_{k = 1}^5 \beta_{kk} \left( \ln X_{ik} \right)^2 \\ \qquad\qquad + \sum\limits_{j = 1}^5 \sum\limits_{k = 1}^5 \beta_{jk} \ln \left( X_{ij} \right) \ln \left( X_{ik} \right) + v_i - u_i \end{array}$$

(1)

where the subscript *i* denotes the *i*th farm household, the subscripts *k* and *j* are both factor indices, *Y*_{i} denotes the output of household *i*, *X*_{ik} or *X*_{ij} denotes the input of factor *k* or *j* of household *i*, *v*_{i} denotes the random error term, and *u*_{i} denotes the non-negative random variable capturing technical inefficiency. The TE of grain production is calculated as the aggregate TE of the production of rice, wheat, and maize, the three major staple crops in the studied regions (Zhang et al., 2019). We use aggregate grain production efficiency primarily because farmers typically consider the inputs and outputs of all crops as a whole rather than crop by crop. For example, some inputs cannot be decomposed by individual crop, especially under rotation (e.g., wheat and maize rotation). In addition, when using the Internet for farming purposes, farmers generally do not distinguish between crop types.
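As a minimal sketch of how the regressors in Eq. (1) could be built for one household, the snippet below computes the first-order, squared, and cross terms from five input quantities. The input names, the example quantities, and the choice to generate each cross term only once (*j* < *k*) are assumptions made for illustration; they are not taken from the paper's estimation code.

```python
import math
from itertools import combinations

# Hypothetical labels for the five inputs described in the text.
INPUTS = ["labor", "fertilizer", "seed", "pesticide", "other"]


def translog_terms(x):
    """Return named translog regressors for one household.

    x maps each input name to a strictly positive quantity.
    """
    logs = {k: math.log(x[k]) for k in INPUTS}
    terms = {}
    for k in INPUTS:
        terms[f"ln_{k}"] = logs[k]                     # first-order term
        terms[f"half_ln_{k}_sq"] = 0.5 * logs[k] ** 2  # own second-order term
    # Cross terms, generated once per pair (assumption: j < k).
    for j, k in combinations(INPUTS, 2):
        terms[f"ln_{j}_x_ln_{k}"] = logs[j] * logs[k]
    return terms


# Illustrative quantities only; not survey data.
household = {"labor": 20.0, "fertilizer": 150.0, "seed": 30.0,
             "pesticide": 12.0, "other": 400.0}
features = translog_terms(household)
print(len(features))  # 5 first-order + 5 squared + 10 cross terms = 20
```

With five inputs this yields 20 regressors, which is the number of slope parameters a translog frontier adds to the intercept under this convention.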

The subscripts *k* = 1, 2, 3, 4, 5 and *j* = 1, 2, 3, 4, 5 denote the five inputs: labor, fertilizer, seeds, pesticide, and other inputs (mainly machinery and irrigation), as presented in Table 1. The value of *u*_{i} is obtained by estimating the above function and is then used to calculate the TE with the following formula:

$$TE_i = \exp \left( { - u_i} \right)$$

(2)

### Econometric strategy

In this study, we use endogenous switching regression (ESR) models to investigate the impacts of Internet access and Internet use on TE. This type of model is employed to address endogeneity issues in our analysis. Although the two-stage least squares method and the propensity score matching (PSM) method are also widely used to address endogeneity, neither is suitable for this study. The two-stage least squares method deals with continuous endogenous explanatory variables, not the discrete ones in our study. The PSM method deals with endogeneity caused by observable factors, whereas the endogeneity in our paper is mainly caused by unobservable factors.

Endogeneity due to reverse causality may arise because farmers with higher TE are more likely to have access to the Internet or to use it for farming purposes (Yan and Zheng, 2021). Farm households’ decisions on Internet access and use are based on their own cost-benefit analysis (Ma et al., 2020). Self-selection bias can therefore occur in our analyses if the household characteristics relevant to these decisions are not taken into account (Hou et al., 2018). The ESR model can address endogeneity caused by unobservable variables when estimating the impact of a binary endogenous variable on an outcome of interest, which gives it an advantage over methods that can only address endogeneity caused by observable variables, such as propensity score matching and inverse probability weighted regression.

The estimation of ESR models is implemented in two stages. In the first stage, the Internet access or use decision variable *IU* is estimated using a model as follows:

$$\begin{array}{l}IU_i^ \ast = \gamma _iZ_i + \mu _i\\ IU_i = 1\left( {IU_i^ \ast\, > \,0} \right)\end{array}$$

(3)

where \(IU_i^ \ast\) denotes the latent utility of the Internet access or use decision, which households make based on expected income. If \(IU_i^ \ast\) is greater than 0, then *IU*_{i} = 1; otherwise *IU*_{i} = 0. *Z* denotes the vector of observable variables, including household characteristics and crop planting characteristics.

In the second stage, the determination equation of TE is established to estimate the efficiency difference caused by accessing or using the Internet and not accessing or not using the Internet:

$$TE_{1i} = \alpha _{1i}X_{1i} + \sigma _{1u}\lambda _{1i} + \varepsilon _{1i}\quad \text{if}\ IU_i = 1$$

(4a)

$$TE_{0i} = \alpha _{0i}X_{0i} + \sigma _{0u}\lambda _{0i} + \varepsilon _{0i}\quad \text{if}\ IU_i = 0$$

(4b)

where *TE*_{1} and *TE*_{0} denote the TE of grain production of farmers who access or use the Internet and of those who do not, respectively. Vector *X* contains the control variables and differs from vector *Z* in Eq. (3): at least one variable in *Z* is not in *X*. Such variables affect farmers’ Internet access or use decisions but do not directly affect the TE of grain production. *λ* is the inverse Mills ratio calculated from Eq. (3); *σ*_{1u} = *cov* (*ε*_{1}, *u*) and *σ*_{0u} = *cov* (*ε*_{0}, *u*). If *σ*_{1u} and *σ*_{0u} are statistically significant, selection into Internet access or use is correlated with TE, and a correction for selection bias is necessary.
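The selection-correction term *λ* can be illustrated with a short sketch. It assumes that the error term in Eq. (3) is standard normal (a probit first stage), which the text does not state explicitly, and it takes the fitted index *γZ* as given. The function name and the sign convention for non-users are illustrative choices, not the authors' code.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills(index, treated):
    """Inverse Mills ratio for a fitted first-stage index gamma * Z.

    Assumes a probit first stage (standard normal error).
    treated=True  -> lambda_1 = pdf(index) / cdf(index)
    treated=False -> lambda_0 = -pdf(index) / (1 - cdf(index))
    """
    pdf = _STD_NORMAL.pdf(index)
    cdf = _STD_NORMAL.cdf(index)
    if treated:
        return pdf / cdf
    return -pdf / (1.0 - cdf)


# Hypothetical fitted indices for one Internet user and one non-user.
lambda_user = inverse_mills(0.4, treated=True)
lambda_nonuser = inverse_mills(-0.2, treated=False)
print(lambda_user, lambda_nonuser)
```

These values would then enter Eqs. (4a) and (4b) as the additional regressors *λ*_{1i} and *λ*_{0i}.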

Based on the ESR models (4a) and (4b), the average TE of farmers who access or use the Internet and of those who do not can be expressed as Eqs. (5) and (6), respectively. The counterfactual cases, namely the average TE that farmers who access or use the Internet would have had without it and the average TE that farmers who do not access or use the Internet would have had with it, can be expressed as Eqs. (7) and (8), respectively.

$$E\left( {TE_{1i}\left| {IU_i} \right. = 1} \right) = \alpha _{1i}X_{1i} + \sigma _{1u}\lambda _{1i}$$

(5)

$$E\left( {TE_{0i}\left| {IU_i} \right. = 0} \right) = \alpha _{0i}X_{0i} + \sigma _{0u}\lambda _{0i}$$

(6)

$$E\left( {TE_{0i}\left| {IU_i} \right. = 1} \right) = \alpha _{0i}X_{1i} + \sigma _{0u}\lambda _{1i}$$

(7)

$$E\left( {TE_{1i}\left| {IU_i} \right. = 0} \right) = \alpha _{1i}X_{0i} + \sigma _{1u}\lambda _{0i}$$

(8)

The average treatment effect on the treated (ATT) can be expressed by the following formula:

$$ATT = E\left( {TE_{1i}\left| {IU_i} \right. = 1} \right) - E\left( {TE_{0i}\left| {IU_i} \right. = 1} \right)$$

(9)

The average treatment effect on the untreated (ATU) can be expressed by the following formula:

$$ATU = E\left( {TE_{1i}\left| {IU_i} \right. = 0} \right) - E\left( {TE_{0i}\left| {IU_i} \right. = 0} \right)$$

(10)
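Eqs. (5) to (10) can be sketched in a few lines of code. The example below evaluates the actual and counterfactual expectations for each household in a group and averages the difference. All coefficient values, covariates, and *λ* values are hypothetical placeholders rather than estimates from this study, and the helper names are illustrative.

```python
def dot(coefs, xs):
    """Inner product of a coefficient vector and a covariate vector."""
    return sum(c * x for c, x in zip(coefs, xs))


def expected_te(alpha, sigma, x, lam):
    """alpha * X + sigma * lambda, the common form of Eqs. (5)-(8)."""
    return dot(alpha, x) + sigma * lam


def treatment_effect(alpha1, sigma1, alpha0, sigma0, group):
    """Average of E(TE_1) - E(TE_0) over (X, lambda) pairs in one group.

    With the treated group this is the ATT of Eq. (9);
    with the untreated group it is the ATU of Eq. (10).
    """
    diffs = [
        expected_te(alpha1, sigma1, x, lam) - expected_te(alpha0, sigma0, x, lam)
        for x, lam in group
    ]
    return sum(diffs) / len(diffs)


# Hypothetical regime coefficients (intercept, one covariate) and covariances.
alpha1, sigma1 = [0.6, 0.02], -0.05
alpha0, sigma0 = [0.5, 0.01], 0.03

# Hypothetical (X, lambda) pairs for Internet users and non-users.
treated = [([1.0, 5.0], 0.8), ([1.0, 3.0], 0.4)]
untreated = [([1.0, 2.0], -0.6)]

att = treatment_effect(alpha1, sigma1, alpha0, sigma0, treated)
atu = treatment_effect(alpha1, sigma1, alpha0, sigma0, untreated)
print(att, atu)
```

The same function serves both effects because Eqs. (9) and (10) differ only in which group's covariates and *λ* values are plugged into the two regime equations.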

### Data and variables

#### Data collection

The data used for the analysis were collected from farmers in central China, specifically in Hubei, Hunan, and Henan provinces, in July 2019. These three provinces are important grain-producing regions of China, growing rice, wheat, and maize, and their grain output accounted for 18.77% of national grain production in 2020. We employed a strategy combining stratified sampling and random sampling. A total of 108 villages and 1080 households were selected in the three provinces. First, counties in each province were divided into six groups based on population and farmland, and one sample county was drawn from each group. Then, three towns were chosen at random from each sample county, two villages were chosen at random from each sample town, and ten farm households were chosen at random from each sample village. Data from 855 grain farmers were used in the analyses.
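The sample sizes reported above can be checked with simple arithmetic. The sketch below assumes that one county was drawn from each of the six county groups in a province, which is the reading consistent with the reported totals; the variable names are illustrative.

```python
provinces = 3
counties_per_province = 6      # assumption: one county per county group
towns_per_county = 3
villages_per_town = 2
households_per_village = 10

villages = (provinces * counties_per_province
            * towns_per_county * villages_per_town)
households = villages * households_per_village
share_used = 855 / households  # grain farmers retained for the analysis

print(villages, households, round(share_used, 3))
```

Under this assumption the design reproduces the 108 villages and 1080 households stated in the text, of which the 855 grain farmers are roughly 79%.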

#### Variables and descriptive analysis

The key explanatory variables in our empirical analyses are *Internet access* and *Internet use*. Farm households gain access to the Internet through broadband, WiFi, or mobile data. The variable “Internet access” is a dummy variable, whose value is 1 if farmers have access to the Internet via one of the three methods and 0 otherwise.

Internet use is indicated by the variable *whether to use the Internet to obtain farming-related information* in our study. This variable is made up of three indicators: “whether to learn agricultural environmental information via the Internet,” “whether to search agricultural product purchase and sales information via the Internet,” and “whether to buy agricultural materials via the Internet.” The variable “Internet use” equals 1 if a farmer conducts any one of the three activities listed above and 0 otherwise.
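The construction of the two dummy variables can be sketched as follows. The function and argument names are hypothetical labels for the survey items described above, not the names used in the authors' data set.

```python
def internet_access(broadband, wifi, mobile_data):
    """1 if the household is online via any of the three channels, else 0."""
    return int(broadband or wifi or mobile_data)


def internet_use(learns_env_info, searches_market_info, buys_inputs_online):
    """1 if the farmer does any of the three farming-related activities, else 0."""
    return int(learns_env_info or searches_market_info or buys_inputs_online)


# A household with mobile data only, whose head searches sales information online.
access = internet_access(broadband=False, wifi=False, mobile_data=True)
use = internet_use(learns_env_info=False, searches_market_info=True,
                   buys_inputs_online=False)
print(access, use)
```

Both variables are coded with an "any of" rule, so a household counts as a user even if it engages in only one of the listed activities.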

The key to estimating the ESR model with the two-stage method is to choose appropriate exclusion variables. In other words, at least one variable in the vector *Z* of the Internet access or use decision equation must not be included in the TE equation. These variables are also known as instrumental variables (IVs) (Song et al., 2018). In our case, the IVs should directly affect farmers’ decisions on Internet access or use but not directly affect the TE of grain production (Shiferaw et al., 2014). Following Chen (2013), we adopt farmers’ “preference for ICT products” as the IV for *Internet access* and “years of using the Internet” as the IV for *Internet use*. Preference for ICT products has a direct impact on Internet use and therefore satisfies the relevance requirement that an instrument be correlated with the endogenous explanatory variable. The acceptance, purchasing intent, and use frequency of new ICT products are all influenced by this preference (Donat et al., 2009; Verdegem and Verhoest, 2009). People with stronger preferences for ICT products are more likely to purchase products such as smartphones and computers early, and thus tend to use the Internet more in various ways, e.g., sending and receiving emails, chatting, and playing games. Meanwhile, the preference for ICT products is not necessarily related to agricultural production behavior: a person’s historic preference for ICT products has nothing to do with their current agricultural activities (Chen, 2013). The preference for ICT products can therefore be a satisfactory instrumental variable for Internet access. Similarly, people who have used the Internet for a longer period are more adept at using it and thus more likely to obtain agriculture-related information online. Years of using the Internet can be regarded as a valid IV for Internet use because it reflects a consumption decision made years ago and thus has no direct impact on current agricultural production behavior.

In this paper, the preference for ICT products is measured by “whether households owned smartphones or computers in 2013 or before”. The year 2013 is chosen primarily because a series of measures implemented by Internet service providers to tap the rural Internet consumption market resulted in a much higher growth rate of Internet users in rural areas than in urban areas of China in that year. Ownership of smartphones or computers in 2013 or before therefore reflects a preference for these products rather than a purchase made in order to improve the TE of grain production.

In addition, the demographic characteristics of the household head and the characteristics of the household, crop farming, and villages are controlled for in our models (Zhu et al., 2021; Zheng et al., 2021). Province dummy variables are introduced to control for regional fixed effects. Table 1 presents the definitions and descriptive statistics of all variables used in our models.
