An In-Depth Evaluation of Max Pooling in Deep Learning
SCIRP Open Access
January 20, 2026

1. Introduction
The Rectified Linear Unit activation function (ReLU) [1]-[11], along with modifications such as leaky ReLU [12]-[15], is widely used in deep learning (DL), especially in two-dimensional (2D) and three-dimensional (3D) image analysis. We focus on input layers in 2D or 3D image analysis, but our arguments extend readily to intermediate layers and higher-dimensional problems. ReLU is given by h(a) = max(0, a) and can be expressed as y = h(x′w + u) when random noise is present. For image analysis, x and w represent vectors of pixel values and their corresponding weights, respectively. Analyzing data with ReLU is similar to censored regression, or Tobit, models [16] (pp. 973-977), which are widely studied in statistics and econometrics [17]-[19]. The Tobit model was introduced by Tobin [20] in his analysis of household expenditures on durable goods. However, knowledge from these studies has not been fully incorporated into DL research. The results derived from econometrics and statistics resemble those of DL, and incorporating them may enhance the performance of DL methods.
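The correspondence can be made concrete with a short simulation: applying ReLU to a noisy linear signal produces exactly Tobit-censored data. The weights, noise scale, and sample size below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# ReLU of a noisy linear signal = Tobit censoring at zero: y = max(0, x'w + u).
rng = np.random.default_rng(0)

n = 1000
x = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=n)])  # constant + one coordinate
w = np.array([-0.25, 1.0])                                        # hypothetical weights
u = rng.normal(0.0, 0.1, size=n)                                  # random noise

y_star = x @ w + u            # latent (uncensored) response
y = np.maximum(0.0, y_star)   # ReLU output = Tobit-censored observation

censored_share = float(np.mean(y == 0.0))  # fraction of observations censored at zero
```

The censored observations carry only the information that the latent signal was non-positive, which is precisely the structure the Tobit likelihood exploits.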
Pooling techniques [21]-[26], especially max pooling [27]-[32], have also been widely used. However, max pooling suffers from a serious drawback: when the noise is large and the signal is small, it often produces incorrect results. Song et al. [33] consider median-pooled gradients and note that “median pooling is suitable to reduce the effect of noise” [33] (p. 140). Nevertheless, their study lacks a solid theoretical framework, and median pooling itself is rarely employed.
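A minimal simulation of this drawback, assuming pure heavy-tailed noise and no signal: max pooling reports large spurious activations, while median pooling stays near zero. The window count and noise law are illustrative choices.

```python
import numpy as np

# 200 pooling windows of 9 cells each, containing only ReLU-transformed
# Cauchy noise (no signal at all).
rng = np.random.default_rng(1)
windows = np.maximum(0.0, rng.standard_cauchy(size=(200, 9)))

max_pooled = windows.max(axis=1)        # picks up large noise spikes
median_pooled = np.median(windows, axis=1)  # robust to the spikes
```

On average the max-pooled values are far larger than the median-pooled ones even though no signal is present, which is the failure mode discussed above.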
Nawata [34] proposes the Grouping Adjusted Median Estimator (GAME), which is robust to non-normality and heteroscedasticity in the Tobit model. Grouping is conceptually similar to pooling, and the asymptotic distribution of GAME has been derived. In this paper, we examine the application of GAME to the Tobit model and present results from Monte Carlo experiments. The concept of the “median” can be generalized to arbitrary percentiles; when the signal occupies only a small percentage of the area, an appropriate percentile can be used instead.
2. Models and GAME
2.1. Models
In image analysis, it is important to estimate the boundary between two regions, such as target and background. For example, Figure 9.2 in Bishop and Bishop [35] (p. 260) shows contours of cat images in which the cat’s position differs across images. When the positions are corrected, the results become identical.
We consider cases in which the boundary is given by a linear function x_i′w = 0, where x_i is a vector of explanatory variables including a constant term and w is the corresponding vector of coefficients (weights). Since multidimensional splines may be used [36] [37], the boundary can be approximated by a linear function, at least locally.
When noise or an error term exists, the observed dependent variable y_i is given by
y_i* = x_i′w + u_i,  y_i = max(0, y_i*),  i = 1, 2, ⋯, n, (1)
where u_i is an error term and n is the total number of observations (pixels). For mono-colored grayscale images drawn on white paper with a brush, y_i represents the darkness at each point (pixel) and x_i represents the location (including its functional transformations) of the point. If the censoring threshold is not zero, use min{y₁, y₂, ⋯, y_n} [17].
In statistics and econometrics, the unknown parameters in Equation (1) are usually estimated by the conventional Tobit maximum likelihood estimator (CTMLE), which assumes normal error terms and maximizes the likelihood function:
L(w, σ²) = ∏_{y_i>0} (1/σ) φ[(y_i − x_i′w)/σ] ∏_{y_i=0} Φ[−x_i′w/σ], (2)
where ϕ and Φ are the density and distribution functions of the standard normal distribution. However, CTMLE is not consistent and often exhibits large biases when the error terms are non-normal or heteroscedastic [38] (pp. 378-381).
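The likelihood in Equation (2) can be maximized numerically. Below is a minimal sketch of the CTMLE, assuming SciPy is available; the simulated data, starting values, and log-σ parameterization are illustrative choices rather than the paper's settings.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated Tobit data (illustrative weights and noise scale).
rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
w_true = np.array([-0.25, 1.0])
y = np.maximum(0.0, X @ w_true + rng.normal(0.0, 0.2, n))

def tobit_negloglik(theta):
    """Negative Tobit log-likelihood of Equation (2); theta = (w, log sigma)."""
    w, log_s = theta[:2], theta[2]
    s = np.exp(log_s)                      # log parameterization keeps sigma > 0
    xb = X @ w
    pos = y > 0
    ll = norm.logpdf((y[pos] - xb[pos]) / s).sum() - pos.sum() * log_s  # uncensored
    ll += norm.logcdf(-xb[~pos] / s).sum()                              # censored at 0
    return -ll

res = minimize(tobit_negloglik, x0=np.array([0.0, 0.5, np.log(0.5)]), method="BFGS")
w_hat, sigma_hat = res.x[:2], float(np.exp(res.x[2]))
```

Under the correctly specified normal errors used here, the estimates land close to the true values, consistent with CTMLE's efficiency in that case.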
2.2. GAME
Nawata [34] proposes GAME for the Tobit model. The estimation method consists of the following three stages, and Figure 1 is a flowchart of the estimation procedure.
i) Grouping Stage
We assume that the sample space of x_i, denoted by S, is bounded (if not, consider a proper bounded subset of S). We divide S into m non-overlapping groups or windows, {S_j} = {S₁, S₂, ⋯, S_m}, satisfying the following conditions. Pooling typically uses square or rectangular windows, whereas grouping can take more freeform shapes.
Condition 1. The distance between any two points in each group converges to zero as n goes to infinity.
Condition 2. Let n_j and n_j⁺ be the numbers of observations in I_j and I_j⁺, respectively, where I_j⁺, I_j⁰, and I_j are the index sets given by I_j⁺ = {i: x_i ∈ S_j and y_i > 0}, I_j⁰ = {i: x_i ∈ S_j and y_i = 0}, and I_j = {i: x_i ∈ S_j} = I_j⁺ ∪ I_j⁰. The group sizes {n_j} = {n₁, n₂, ⋯, n_m} increase on the order of n^δ, and n_j⁺/n_j > a for some 0 < δ ≤ 1 and a > 1/2. There must be enough groups for the model to be estimable (otherwise, change the grouping or replace the median with an appropriate percentile).
In image analysis, pixels are divided into grid windows to form groups. Note that if x_i is discrete, the data are grouped by the values taken by x_i, and the group diameters are zero. Hereafter, we denote (x_i, y_i) for i ∈ I_j as (x_ij, y_ij). Let x̄_j denote the average of x_ij (= Σ_i x_ij / n_j), let ȳ_j denote the median of {y_ij} = {y_i: i ∈ I_j}, and let Λ⁺ denote the index set defined by Λ⁺ = {j: n_j⁺/n_j > a}. To avoid unnecessary complications, the median is defined as the (n_j + 1)/2-th largest value when n_j is odd and the (n_j/2 + 1)-th largest value when n_j is even. Since ȳ_j is the median, it takes positive values for j ∈ Λ⁺. The model and its ordinary least squares (OLS) estimator for {(x̄_j, ȳ_j): j ∈ Λ⁺} are given by:
Figure 1. Flowchart of the Grouping Adjusted Median Estimator (GAME).
ȳ_j = x̄_j′w + e_j,  j ∈ Λ⁺, (3)
ŵ₁ = (Σ_{j∈Λ⁺} x̄_j x̄_j′)⁻¹ Σ_{j∈Λ⁺} x̄_j ȳ_j.
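The grouping stage and the OLS step of Equation (3) can be sketched in a one-dimensional setting; the window count, data-generating weights, and the threshold a = 1/2 below are illustrative choices, not the paper's settings.

```python
import numpy as np

# One-dimensional illustration of the grouping stage: divide [0, 1) into m
# windows, keep windows where more than half the observations are positive
# (j in Lambda+), and regress the group medians on the group-mean regressors.
rng = np.random.default_rng(3)

n, m = 9000, 30
x1 = rng.uniform(0.0, 1.0, n)
y = np.maximum(0.0, -0.25 + 1.0 * x1 + rng.normal(0.0, 0.1, n))  # Tobit data

group = np.minimum((x1 * m).astype(int), m - 1)  # window index of each point

Xbar, ybar = [], []
for j in range(m):
    in_j = group == j
    if np.mean(y[in_j] > 0) > 0.5:           # membership in Lambda+ (a = 1/2)
        Xbar.append([1.0, x1[in_j].mean()])  # group means of the regressors
        ybar.append(np.median(y[in_j]))      # group median of y
Xbar, ybar = np.array(Xbar), np.array(ybar)

w1_hat, *_ = np.linalg.lstsq(Xbar, ybar, rcond=None)  # OLS of Equation (3)
```

Because more than half of each retained group is uncensored, the group median equals the median of the latent variable, so the OLS step recovers the weights without the censoring bias.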
ii) Adjustment Stage
If x_i is continuous, the effects of group sizes cannot be ignored, especially in high-dimensional spaces. Therefore, the adjustment stage is necessary. We define the adjusted dependent variable y_ij(ŵ₁) as y_ij(ŵ₁) = y_ij − (x_ij − x̄_j)′ŵ₁ for i ∈ I_j. We update the values of ȳ_j to ȳ_j* = median of {y_ij(ŵ₁)} for j ∈ Λ⁺. We consider the regression model given by:
ȳ_j* = x̄_j′w + e_j*,  j ∈ Λ⁺. (4)
Let ŵ₂ be the OLS estimator of Equation (4). Using ŵ₂, adjust the data again and continue the procedure until the process converges. Since the process is a contraction mapping when the sizes of {S_j} are sufficiently small, it converges to the fixed point of the mapping, ŵ⁺. If the process does not converge, reduce the sizes of {S_j}. For details, see Nawata [39].
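The adjustment stage can be sketched as a fixed-point iteration, reading the adjusted dependent variable as the group median of y_ij minus the within-group term (x_ij − x̄_j)′ŵ; the data, window count, and tolerance below are illustrative choices.

```python
import numpy as np

# Fixed-point iteration of the adjustment stage in one dimension.
rng = np.random.default_rng(4)

n, m = 9000, 30
x1 = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x1])
y = np.maximum(0.0, X @ np.array([-0.25, 1.0]) + rng.normal(0.0, 0.1, n))
group = np.minimum((x1 * m).astype(int), m - 1)

keep = [j for j in range(m) if np.mean(y[group == j] > 0) > 0.5]  # Lambda+
Xbar = np.array([X[group == j].mean(axis=0) for j in keep])       # group means

w_hat = np.zeros(2)  # initial estimate; adjustment term vanishes at w = 0
for _ in range(50):
    # Group medians of the adjusted dependent variable y_ij - (x_ij - xbar_j)' w_hat.
    ybar = np.array([
        np.median(y[group == j] - (X[group == j] - Xbar[k]) @ w_hat)
        for k, j in enumerate(keep)
    ])
    w_new, *_ = np.linalg.lstsq(Xbar, ybar, rcond=None)  # OLS update
    converged = np.max(np.abs(w_new - w_hat)) < 1e-8
    w_hat = w_new
    if converged:
        break
```

With small windows the update is a contraction, so the loop settles quickly on the fixed point ŵ⁺.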
iii) Final Stage
If necessary, redivide S into m* non-overlapping and fixed groups {S_j*} = {S₁*, S₂*, ⋯, S_{m*}*}. Define I_j*, I_j*⁺, n_j*⁺, and n_j* as before, based on {S_j*}. We update the adjusted dependent variable as
y_ij* = y_ij(ŵ⁺) = y_ij − (x_ij − x̄_j)′ŵ⁺ for i ∈ I_j*. (5)
In this case, we adjust the values in all groups. Consider the index set I_j⁰* = {i ∈ I_j⁰: y_ij(ŵ⁺) ≥ M_j*}, where M_j* = median{y_ij*: i ∈ I_j*}. Define
x̄_j* = Σ_i x_ij / n_j* and ȳ_j* = M_j*, if M_j* > 0 and I_j⁰* = ∅;
x̄_j* = the value of x_ij minimizing x_ij′ŵ⁺ over i ∈ I_j*, and ȳ_j* = 0, otherwise. (6)
This means that we use the median of {y_ij*} as ȳ_j* if max{y_ij*: i ∈ I_j⁰} < M_j*, and use 0 otherwise.
Since {ȳ₁*, ȳ₂*, ⋯, ȳ_{m*}*} take zero or positive values like the original data {y₁, y₂, ⋯, y_n}, and the asymptotic distribution of the median is normal, w is estimated by the GAME ŵ, which maximizes
L*(w, σ²) = ∏_{ȳ_j*>0} (√n_j/σ) φ[√n_j(ȳ_j* − x̄_j*′w)/σ] ∏_{ȳ_j*=0} Φ[−√n_j x̄_j*′w/σ]. (7)
Unlike in binary cases [40] [41], we can obtain the asymptotic distribution of the estimator, given by
√n(ŵ − w₀) → N(0, (1/(4f(0)²)) A⁻¹), (8)
A = plim_{n→∞} Σ_{ȳ_j*>0} (n_j/n) x̄_j* x̄_j*′ = plim_{n→∞} [−(1/n) ∂²log L*/∂w∂w′ |_{θ₀}].
Here, f(·) is the density function of u_i, and θ₀′ = (w₀′, σ₀²) denotes the true parameter values. Variance estimation can be carried out using standard statistical software packages.
3. Monte Carlo Experiments
We consider 2D images where S is the rectangle defined by 0 < x₁ ≤ 1 and 0 < x₂ ≤ 1. Each of x₁ and x₂ is divided into 300 equidistant grid lines, yielding n = 90,000 points per image. Let x₁κ denote the κ-th grid line in x₁ and x₂ℓ the ℓ-th grid line in x₂; the intersection of x₁κ and x₂ℓ is denoted x_κℓ. We consider the model
y_κℓ* = w₀ + w₁x₁κ + w₂x₂ℓ + u_κℓ and y_κℓ = max(0, y_κℓ*). (9)
The boundary of the region is given by
x₂ = w₀* + w₁*x₁, where w₀* = −w₀/w₂ and w₁* = −w₁/w₂. (10)
The true parameter values of w₀, w₁, and w₂ are 0.0, 1.0, and −1.0, respectively.
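For reading the results below, here is a tiny hypothetical helper (not from the paper) that converts a weight vector (w₀, w₁, w₂) into the intercept and slope of the boundary obtained by solving w₀ + w₁x₁ + w₂x₂ = 0 for x₂:

```python
def boundary_line(w0, w1, w2):
    """Solve w0 + w1*x1 + w2*x2 = 0 for x2 = intercept + slope * x1."""
    return -w0 / w2, -w1 / w2

# With the true parameter values (0.0, 1.0, -1.0) the boundary is x2 = x1.
intercept, slope = boundary_line(0.0, 1.0, -1.0)
```

The same helper applied to the averaged estimates of each method reproduces the boundary comparisons plotted in the figures.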
We compare three estimators:
1) CTMLE (assuming normal error terms),
2) GAME considered in this paper, and
3) MPE (max pooling estimator), which uses the maximum values within groups.
In contrast to MPE, GAME is based on the asymptotic normality of the median.
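This contrast can be checked numerically: across repeated heavy-tailed samples, the sample median is stable (consistent with its asymptotic normality), while the sample maximum is erratic. The sample sizes below are illustrative.

```python
import numpy as np

# 500 independent samples of 101 standard Cauchy draws each.
rng = np.random.default_rng(7)
samples = rng.standard_cauchy(size=(500, 101))

medians = np.median(samples, axis=1)  # concentrates near 0, roughly normal
maxima = samples.max(axis=1)          # dominated by extreme noise spikes
```

The spread of the maxima across samples is orders of magnitude larger than that of the medians, which is why a max-based estimator inherits the noise while a median-based one does not.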
For the i.i.d. case, we consider two distributions:
Case 1: Standard normal distribution, and
Case 2: Quasi-Cauchy distribution.
Since the estimation procedure failed to converge in many trials under the standard Cauchy distribution, we use a quasi-Cauchy distribution: the error terms are obtained by applying the Cauchy quantile function tan(π(v − 1/2)) to v drawn from U(0.002, 0.998), which yields a fat-tailed distribution with the extreme tails truncated. The variance of this distribution is 101.7.
For the heteroscedastic distributions, we consider the following cases:
Case 3: Heteroscedastic distribution I, u_κℓ = [1(x₁κ < 0.5) + 2·1(x₁κ ≥ 0.5)]ε, and
Case 4: Heteroscedastic distribution II, u_κℓ = [2·1(x₁κ < 0.5) + 1(x₁κ ≥ 0.5)]ε.
1(·) is an indicator function that takes the value 1 if the argument is true and 0 otherwise, and ε is a standard normal random variable. For GAME and MPE, the sample space of (x₁, x₂), denoted by S, is divided such that each group contains nine intersection points determined by three neighboring grid lines of x₁ and x₂. The number of groups is 100 × 100 = 10,000. We perform 1000 trials using EViews 13 for each case. CTMLE and GAME can be estimated efficiently using standard programs without incurring significant computational cost.
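The data-generating process for the four cases can be sketched as follows. The quasi-Cauchy errors are generated here by applying the Cauchy quantile function tan(π(v − 1/2)) to v ~ U(0.002, 0.998), a reading consistent with the stated variance of 101.7; the seed is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(5)

g = np.linspace(1 / 300, 1.0, 300)          # 300 equidistant grid lines in (0, 1]
x1, x2 = np.meshgrid(g, g, indexing="ij")   # the 90,000 intersection points

def draw_errors(case):
    """Error terms u for the four Monte Carlo cases."""
    if case == 2:  # quasi-Cauchy: Cauchy quantile of U(0.002, 0.998)
        v = rng.uniform(0.002, 0.998, size=x1.shape)
        return np.tan(np.pi * (v - 0.5))
    eps = rng.normal(size=x1.shape)
    if case == 1:  # i.i.d. standard normal
        return eps
    if case == 3:  # heteroscedastic I: sd 1 for x1 < 0.5, sd 2 otherwise
        return np.where(x1 < 0.5, 1.0, 2.0) * eps
    if case == 4:  # heteroscedastic II: sd 2 for x1 < 0.5, sd 1 otherwise
        return np.where(x1 < 0.5, 2.0, 1.0) * eps
    raise ValueError(case)

w0, w1, w2 = 0.0, 1.0, -1.0                 # true parameter values
y = np.maximum(0.0, w0 + w1 * x1 + w2 * x2 + draw_errors(1))
```

Each Monte Carlo trial draws a fresh error field, applies the ReLU censoring, and then runs the three estimators on the resulting image.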
Tables 1-4 present the Monte Carlo results. When the error terms follow the i.i.d. normal distribution (Table 1, Case 1), the biases, standard deviations (SDs), and mean squared errors (MSEs) of CTMLE are small. This outcome is quite reasonable, since CTMLE is not only consistent but also efficient in this case. The biases of GAME are very small, but its SDs and MSEs are larger than those of CTMLE. For MPE, the biases, SDs, and MSEs are small for w₁ and w₂, but the bias for w₀ is large (1.485). Figure 2 illustrates the boundaries of the two regions calculated from the true parameter values and the averages of the estimated w₀* and w₁* values. The boundaries obtained from the true parameters, CTMLE, and GAME nearly coincide. In contrast, the boundary given by MPE is almost parallel to, but far from, the other lines.
Table 1. Standard normal distribution: Case 1.

                    w₀        w₁        w₂        w₀*       w₁*
True              0.0       1.0      −1.0       0.0       1.0
CTMLE   mean   −0.0006    0.9998   −0.9987    0.0005    1.0012
        SD      0.0154    0.0189    0.0188    0.0121    0.0193
        MSE     0.0001    0.0002    0.0002    0.0001    0.0003
GAME    mean   −0.0005    1.0013   −1.0002    0.0003    1.0013
        SD      0.0180    0.0216    0.0235    0.0160    0.0275
        MSE     0.0002    0.0003    0.0003    0.0002    0.0005
MPE     mean    1.4849    1.0028   −1.0038    1.4797    0.9995
        SD      0.0161    0.0211    0.0209    0.0238    0.0214
        MSE     2.2051    0.0005    0.0005    2.1903    3.9988

SD: Standard deviation, MSE: Mean squared error.
Figure 2. Boundaries obtained from the true parameter values (True), CTMLE, GAME, and MPE for the standard normal distribution: Case 1 in Table 1.
When the error terms follow the quasi-Cauchy distribution (Table 2, Case 2), the biases of CTMLE are large, with values of 2.128, −0.818, and 0.096 for w₀, w₁, and w₂, respectively. In contrast, the biases, SDs, and MSEs of GAME are much smaller than those of CTMLE, with biases of −0.032, 0.073, and −0.075, respectively. MPE performs quite poorly, exhibiting not only very large biases but also very large SDs. Figure 3 illustrates the boundaries obtained from the true parameter values and the averages of the estimated parameters. Both CTMLE and MPE produce incorrect boundaries, whereas GAME yields a nearly correct boundary, much more accurate than those from CTMLE and MPE.
Table 2. Quasi-Cauchy distribution: Case 2.

                      w₀         w₁         w₂        w₀*       w₁*
True                 0.0        1.0       −1.0       0.0       1.0
CTMLE   mean       2.1275     0.1816    −0.9039    2.5734    0.3329
        SD         0.4941     0.6007     0.6575    3.4289    1.8383
        MSE        4.7704     0.6699     0.4415   18.3794    3.8243
GAME    mean      −0.0319     1.0731    −1.0745   −0.0299    0.9993
        SD         0.1429     0.1707     0.1719    0.1390    0.1821
        MSE        0.0214     0.0053     0.0351    0.0202    0.0332
MPE     mean      84.3342   −22.4730   −46.4981   −0.8000   −0.6911
        SD        999.29     730.77     970.44     33.34     17.29
        MSE    1005692.4   534582.4   943826.7    1112.2     301.8
Figure 3. Boundaries obtained from the true parameter values (True), CTMLE, GAME, and MPE for the quasi-Cauchy distribution: Case 2 in Table 2.
Table 3 and Table 4 present the heteroscedastic distribution results. For Heteroscedastic distribution I (Table 3, Case 3), GAME clearly reduces the biases for all parameters. The biases of MPE for w₀, w₁, and w₂ are 1.097, 2.248, and −0.019, respectively, which are large for w₀ and w₁ but relatively small for w₂. Since the error-term variances depend only on x₁, this may be reflected in the Monte Carlo results. Figure 4 illustrates the boundaries calculated from the true parameter values and the estimates. The GAME boundary is closer to the true line than that of CTMLE, whereas MPE yields a poor result.
For Heteroscedastic distribution II (Table 4, Case 4), the biases and MSEs of GAME are smaller than those of CTMLE for all parameters. For MPE, although the bias for w₂ is small (−0.0017), the biases for w₀ and w₁ are large, at 3.339 and −2.229, respectively. Figure 5 shows the boundary lines of the two regions. GAME clearly improves upon CTMLE. As before, MPE performs poorly. In particular, under the MPE estimates, P[y_i > 0] decreases as x₁ increases, and the estimated target and background regions become the opposite of the true ones in Case 4.
Table 3. Heteroscedastic distribution I: Case 3.

                    w₀        w₁        w₂        w₀*       w₁*
True              0.0       1.0      −1.0       0.0       1.0
CTMLE   mean   −0.5100    1.8759   −1.1269   −0.4528    1.6652
        SD      0.0159    0.0228    0.0215    0.0202    0.0360
        MSE     0.2603    0.7678    0.0166    0.2055    0.4438
GAME    mean   −0.2574    1.4398   −1.1270   −0.2288    1.2783
        SD      0.0208    0.0298    0.0306    0.0219    0.0389
        MSE     0.0667    0.1943    0.0171    0.0528    0.0790
MPE     mean    1.0967    3.2477   −1.0187    1.0772    3.1915
        SD      0.0222    0.0338    0.0326    0.0234    0.1080
        MSE     1.2033    5.0533    0.0014    1.1609    4.8144
Figure 4. Boundaries obtained from the true parameter values (True), CTMLE, GAME, and MPE for Heteroscedastic distribution I: Case 3 in Table 3.
Table 4. Heteroscedastic distribution II: Case 4.

                    w₀        w₁        w₂        w₀*       w₁*
True              0.0       1.0      −1.0       0.0       1.0
CTMLE   mean    0.4107    0.3129   −1.0375    0.3958    0.3017
        SD      0.0154    0.0189    0.0188    0.0121    0.0193
        MSE     0.1689    0.4724    0.0018    0.1568    0.4880
GAME    mean    0.2010    0.6953   −0.9884    0.2032    0.7039
        SD      0.0180    0.0216    0.0235    0.0160    0.0275
        MSE     0.0407    0.0933    0.0007    0.0416    0.0884
MPE     mean    3.3389   −1.2292   −1.0017    3.3362   −1.2284
        SD      0.0008    0.0011    0.0011    0.0094    0.0027
        MSE    11.1490    4.9702    0.0011   11.1399    4.9684
Figure 5. Boundaries obtained from the true parameter values (True), CTMLE, GAME, and MPE for Heteroscedastic distribution II: Case 4 in Table 4.
4. Discussion
ReLU and max pooling are widely used in DL. However, reliable outcomes cannot be obtained if the distribution of error terms is misspecified or if inappropriate methods are applied. The Monte Carlo results support this claim. Both CTMLE and MPE exhibit substantial biases and large MSEs under non-normal (fat-tailed) or heteroscedastic distributions.
GAME is a semiparametric estimator and is consistent under very general assumptions. Although the SDs of GAME are slightly larger than those of CTMLE, GAME clearly outperforms CTMLE under non-normal (fat-tailed) and heteroscedastic distributions. When ε_i follows an i.i.d. standard normal distribution, the expected value of ζ = max{0, ε₁, ε₂, ⋯, ε₉} is 1.485, which coincides with the bias of MPE for w₀. The biases for w₁ and w₂ are very small, at 0.003 and −0.004, respectively, so the estimated boundary is nearly parallel to the true one. Many studies using ReLU exclude the constant term, which may be related to the fact that max pooling yields useful results in standard cases. In other situations, however, MPE is a poor estimator not only of the constant but also of the slope coefficients, and it produces highly inaccurate boundaries. In particular, when the variance is a decreasing function of x₁ (Case 4), the estimated boundary slope becomes negative, and the estimated region where P[y_i > 0] is high is the opposite of the true one.
Figure 6 illustrates an example of max, average, and median pooling, where the maximum, average, and median are calculated from nine cells after applying ReLU(η) = max{0, η}, with η following the standard Cauchy distribution. In this figure, no signal (S) exists; all non-zero values represent random noise (N), and the signal-to-noise ratio (S/N) is zero. Max pooling yields a very high value of 79, as if important information were contained in those cells. Average pooling [42]-[45] also produces a relatively large value (9.33). In contrast, median pooling returns zero. Max pooling emphasizes large values in the observations. While this may be useful for detecting very weak signals in the sample space, it can lead to incorrect results when noise is substantial and S/N is low. As high- or ultra-high-resolution 2D or 3D images with considerable noise (small S/N) become increasingly common, careful treatment to eliminate or reduce noise is essential. In DL application areas such as medical or satellite imaging, where high noise is common and robustness is critical, the approach considered in this paper may be quite useful.
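The pooling comparison in Figure 6 can be reproduced in a few lines; the draws below are fresh illustrative noise, so the pooled values differ from the figure's 79 and 9.33.

```python
import numpy as np

# One 3x3 window containing only ReLU-transformed standard Cauchy noise (S/N = 0).
rng = np.random.default_rng(6)
cells = np.maximum(0.0, rng.standard_cauchy(9))

pooled = {
    "max": float(cells.max()),          # emphasizes the largest (spurious) value
    "average": float(cells.mean()),     # still pulled up by large noise values
    "median": float(np.median(cells)),  # robust to the noise spikes
}
```

Running this repeatedly shows the pattern in Figure 6: max pooling regularly reports large activations from pure noise, while median pooling stays at or near zero.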
Figure 6. Example of max, average, and median pooling under the Cauchy distribution.
5. Conclusions
The ReLU and max pooling methods are widely used in DL. The Tobit model, commonly applied in econometrics and statistics, is closely related to ReLU. However, the knowledge obtained from the Tobit model has not been fully incorporated into DL. In this paper, we considered ReLU estimation using the Tobit framework combined with a grouping approach. The CTMLE, which assumes i.i.d. normal errors, exhibits large biases under fat-tailed (quasi-Cauchy) or heteroscedastic error distributions. We considered GAME, which combines grouping, adjustment, the median, and a weighted Tobit method. GAME is robust, remaining consistent under both non-normal and heteroscedastic error distributions. Moreover, unlike in binary cases, its asymptotic distribution is obtainable.
Monte Carlo results show that GAME outperforms CTMLE under non-normal or heteroscedastic errors. In the i.i.d. normal case, where CTMLE is efficient, GAME’s loss of efficiency is small; its biases and SDs remain close to those of CTMLE. In contrast, the MPE performs poorly, exhibiting large biases, SDs, and MSEs. In some cases, the region MPE suggests for P[ y i >0 ] is even the opposite of the true region. Max pooling selects the maximum value in each group, which can help detect weak signals but also amplifies noise when the signal-to-noise ratio is low. Thus, max pooling requires caution, and noise-reducing methods such as GAME are essential.
Acknowledgements
The author would like to thank an anonymous reviewer for helpful comments and suggestions.
