Hyperparameters

Online, batch, and mini-batch training

  • Finnoff, W. (1994): "Diffusion approximations for the constant learning rate backpropagation algorithm and resistance to local minima." Neural Computation, 6(2):285–295. DOI https://doi.org/10.1162/neco.1994.6.2.285
  • Wilson, D. R. & Martinez, T. R. (2003): "The general inefficiency of batch training for gradient descent learning." Neural Networks, 16, 1429–1451. DOI https://doi.org/10.1016/S0893-6080(03)00138-2
  • Nakama, T. (2009): "Theoretical analysis of batch and on-line training for gradient descent learning in neural networks." Neurocomputing, 73(1–3):151–159. DOI https://doi.org/10.1016/j.neucom.2009.05.017
  • Xu, Z.-B., Zhang, R. & Jing, W.-F. (2009): "When does online BP training converge?" IEEE Transactions on Neural Networks, 20(10):1529–1539. DOI https://doi.org/10.1109/TNN.2009.2025946
  • Zhang, R., Xu, Z.-B., Huang, G.-B. & Wang, D. (2012): "Global convergence of online BP training with dynamic learning rate." IEEE Transactions on Neural Networks and Learning Systems, 23(2):330–341. DOI https://doi.org/10.1109/TNNLS.2011.2178315
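
The contrast analyzed in the papers above fits in a few lines of code. A minimal sketch on a toy linear model (the data, loss, and learning rate are illustrative assumptions, not taken from any of the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                # toy inputs
y = X @ np.array([1.0, -2.0, 0.5, 3.0])      # toy linear targets
w = np.zeros(4)
lr = 0.01

def grad(Xb, yb, w):
    # gradient of the mean squared error of a linear model
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

# Batch: one update per epoch, computed on the full training set.
w -= lr * grad(X, y, w)

# Online (stochastic): one update per pattern, in random order.
for i in rng.permutation(len(y)):
    w -= lr * grad(X[i:i+1], y[i:i+1], w)

# Mini-batch: one update per small block of patterns.
for start in range(0, len(y), 32):
    sl = slice(start, start + 32)
    w -= lr * grad(X[sl], y[sl], w)
```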

Cyclic training

  • Heskes, T. & Wiegerinck, W. (1996): "A theoretical comparison of batch-mode, on-line, cyclic, and almost-cyclic learning." IEEE Transactions on Neural Networks, 7, 919–925. DOI https://doi.org/10.1109/72.508935
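
Heskes & Wiegerinck's distinction is easy to pin down in code: cyclic training presents one fixed permutation of the training set every epoch, while almost-cyclic training reshuffles each epoch. A sketch of the two loops (the update itself is left as a placeholder):

```python
import numpy as np

rng = np.random.default_rng(1)
n_patterns, n_epochs = 100, 5
fixed_order = rng.permutation(n_patterns)

for epoch in range(n_epochs):
    # Cyclic: identical presentation order every epoch.
    for i in fixed_order:
        pass  # one online update on pattern i

for epoch in range(n_epochs):
    # Almost-cyclic: every pattern once per epoch, freshly shuffled.
    for i in rng.permutation(n_patterns):
        pass  # one online update on pattern i
```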


Training algorithms for multilayer neural networks

Momentum
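
No references are listed under this heading, but the classical rule is standard: the weight change blends the current gradient with the previous change. A minimal sketch (learning rate and momentum constants are illustrative):

```python
def momentum_step(w, delta_w_prev, grad, lr=0.01, mu=0.9):
    # Classical momentum: delta_w(t) = -lr * grad + mu * delta_w(t-1),
    # which smooths zig-zagging along narrow ravines of the error surface.
    delta_w = -lr * grad + mu * delta_w_prev
    return w + delta_w, delta_w
```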

A third term, as in PID control
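
The heading suggests reading gradient descent with momentum as a PI controller and adding a derivative-like third term. A hedged sketch of that reading (the gains and the exact form are assumptions for illustration, not a specific published rule):

```python
def pid_step(w, grad, state, kp=0.01, ki=0.001, kd=0.005):
    """state = (integral, prev_grad). P ~ current gradient,
    I ~ accumulated gradient, D ~ change in the gradient."""
    integral, prev_grad = state
    integral = integral + grad
    w = w - (kp * grad + ki * integral + kd * (grad - prev_grad))
    return w, (integral, grad)
```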

Backpropagation "emocional"

  • Khashman, A. (2008): "A modified backpropagation learning algorithm with added emotional coefficients." IEEE Transactions on Neural Networks, 19(11):1896–1909. DOI https://doi.org/10.1109/TNN.2008.2002913
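
Without reproducing Khashman's exact formulas, the general flavor can be sketched as a step size modulated by an "anxiety" coefficient that decays toward a "confidence" floor as training error falls (the names and decay rule below are assumptions for illustration only):

```python
def emotional_lr(base_lr, error, error0, confidence=0.1):
    # Hypothetical illustration: large steps while error is high ("anxious"),
    # settling to a fraction of base_lr as the network grows "confident".
    anxiety = max(confidence, error / error0)
    return base_lr * anxiety
```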

Weight extrapolation
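
No references are listed under this heading; the usual idea is to periodically jump further along the direction the weights have recently been moving. A minimal sketch (the extrapolation factor is an assumption):

```python
def extrapolate(w, w_old, gamma=0.5):
    # Occasional extrapolation step along the recent weight trajectory.
    return w + gamma * (w - w_old)
```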

Lyapunov theory

  • Yu, X., Efe, M. O. & Kaynak, O. (2002): "A general backpropagation algorithm for feedforward neural networks learning." IEEE Transactions on Neural Networks, 13(1):251–254.
  • Behera, L., Kumar, S. & Patnaik, A. (2006): "On adaptive learning rate that guarantees convergence in feedforward networks." IEEE Transactions on Neural Networks, 17(5):1116–1125.
  • Man, Z., Wu, H. R., Liu, S. & Yu, X. (2006): "A new adaptive backpropagation algorithm based on Lyapunov stability theory for neural networks." IEEE Transactions on Neural Networks, 17(6):1580–1591.
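
These papers choose the learning rate so that a Lyapunov function of the error provably decreases at every step. A toy sketch of the reasoning on a single linear neuron, using a standard normalized-gradient step (not the exact rule of any one cited paper):

```python
import numpy as np

def lyapunov_step(w, x, target, mu=0.5):
    # Candidate Lyapunov function: V = 0.5 * e**2 with e = target - w.x
    e = target - w @ x
    g = -e * x                        # gradient of V with respect to w
    # Normalized step: for 0 < mu < 2 the new error is about (1 - mu) * e,
    # so V strictly decreases (the linear-case convergence guarantee).
    eta = mu / (x @ x + 1e-12)
    return w - eta * g
```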

Sliding mode control-based adaptive learning

  • Sira-Ramirez, H. & Colina-Morles, E. (1995): "A sliding mode strategy for adaptive learning in Adalines." IEEE Transactions on Circuits and Systems I, 42(12):1001–1012.
  • Parma, G. G., Menezes, B. R. & Braga, A. P. (1998): "Sliding mode algorithm for training multilayer artificial neural networks." Electronics Letters, 34(1):97–98.
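
Sliding-mode learning drives the output error onto a "sliding surface" (e = 0) using the sign of the error rather than its magnitude, inheriting the robustness of sliding mode control. A toy Adaline-flavored sketch in the spirit of Sira-Ramirez & Colina-Morles (gain and normalization are simplified assumptions):

```python
import numpy as np

def sliding_mode_step(w, x, target, k=0.1):
    e = target - w @ x                 # distance from the sliding surface
    # Discontinuous update: fixed gain times the SIGN of the error.
    return w + k * np.sign(e) * x / (np.abs(x).sum() + 1e-12)
```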

Successive approximations

  • Liang, Y. C., Feng, D. P., Lee, H. P., Lim, S. P. & Lee, K. H. (2002): "Successive approximation training algorithm for feedforward neural networks." Neurocomputing, 42, 311–322.

Two-phase learning with gradient ascent

  • Tang, Z., Wang, X., Tamura, H. & Ishii, M. (2003): "An algorithm of supervised learning for multilayer neural networks." Neural Computation, 15, 1125–1142.

Global descent: TRUST [terminal repeller unconstrained subenergy tunneling]

  • Barhen, J., Protopopescu, V. & Reister, D. (1997): "TRUST: A deterministic algorithm for global optimization." Science, 276, 1094–1097.
  • Cetin, B. C., Burdick, J. W. & Barhen, J. (1993): "Global descent replaces gradient descent to avoid local minima problem in learning with artificial neural networks." In Proceedings of IEEE International Conference on Neural Networks (pp. 836–842). San Francisco.
  • Chowdhury, P., Singh, Y. P. & Chansarkar, R. A. (1999): "Dynamic tunneling technique for efficient training of multilayer perceptrons." IEEE Transactions on Neural Networks, 10(1):48–55.
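
TRUST alternates ordinary descent with a "tunneling" phase: once descent stalls at a local minimum x*, the objective is replaced by a subenergy transform plus a terminal repeller centered at x*, whose non-Lipschitz (x − x*)^(1/3) term ejects the state from the basin in finite time. A 1-D toy sketch of the two phases (constants and the exact transform are simplified assumptions, not the published functional):

```python
import numpy as np

def trust_1d(f, df, x0, lr=0.01, rho=1.0, outer=5, inner=2000, eps=1e-3):
    """Toy TRUST-style global descent in 1-D: descend, tunnel, repeat."""
    x_star = x0
    for _ in range(outer):
        x = x_star + eps
        for _ in range(inner):                 # phase 1: plain gradient descent
            x -= lr * df(x)
        if f(x) < f(x_star):
            x_star = x
        x = x_star + eps
        for _ in range(inner):                 # phase 2: tunneling
            sub = df(x) / (1.0 + np.exp(f(x) - f(x_star)))   # damped gradient
            repel = rho * np.cbrt(x - x_star) if f(x) >= f(x_star) else 0.0
            x += lr * (repel - sub)            # repeller pushes away from x_star
            if f(x) < f(x_star):               # tunneled into a lower basin
                x_star = x
                break
    return x_star

# e.g. trust_1d(lambda x: np.sin(3 * x) + 0.1 * x**2,
#               lambda x: 3 * np.cos(3 * x) + 0.2 * x, x0=2.0)
```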

Terminal attractors (orders of magnitude faster)

  • Zak, M. (1989): "Terminal attractors in neural networks." Neural Networks, 2, 259–274.
  • Wang, S. D. & Hsu, C. H. (1991): "Terminal attractor learning algorithms for back propagation neural networks." In Proceedings of the International Joint Conference on Neural Networks (pp. 183–189). Seattle, WA.
  • Jiang, M. & Yu, X. (2001): "Terminal attractor based back propagation learning for feedforward neural networks." In Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS) (Vol. 2, pp. 711–714). Sydney, Australia.
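
Zak's terminal attractors replace exponentially convergent dynamics (dx/dt = -x, which never exactly reaches 0) with non-Lipschitz dynamics such as dx/dt = -x^(1/3), which reach equilibrium in finite time; this is where the claimed speedups come from. A numerical sketch of the contrast:

```python
dt = 0.01
x_exp, x_term = 1.0, 1.0
for step in range(1, 1001):
    x_exp -= dt * x_exp                      # ordinary attractor: exponential decay
    # Terminal attractor dx/dt = -x**(1/3); clamp the Euler step so it can
    # actually land on 0 instead of oscillating around it.
    x_term -= min(dt * x_term ** (1 / 3), x_term)
    if x_term == 0.0:
        print(f"terminal attractor hit 0 exactly at t = {step * dt:.2f}")
        break
print(f"ordinary attractor after the same time: x = {x_exp:.4f}")
```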

Robust algorithms (in the statistical sense)

  • White, H. (1989): "Learning in artificial neural networks: A statistical perspective." Neural Computation, 1(4):425–469.
  • Chen, D. S. & Jain, R. C. (1994): "A robust backpropagation learning algorithm for function approximation." IEEE Transactions on Neural Networks, 5(3):467–479.
  • Chuang, C. C., Su, S. F. & Hsiao, C. C. (2000): "The annealing robust backpropagation (ARBP) learning algorithm." IEEE Transactions on Neural Networks, 11(5):1067–1077.
  • Pernia-Espinoza, A. V., Ordieres-Mere, J. B., Martinez-de-Pison, F. J. & Gonzalez-Marcos, A. (2005): "TAO-robust backpropagation learning algorithm." Neural Networks, 18, 191–204.
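
The robust variants replace the squared-error loss with an M-estimator that caps the influence of outliers on the gradient; the "annealing" version (ARBP) shrinks that cap as training proceeds. A sketch with a Huber-type loss (the annealing schedule shown is an illustrative assumption):

```python
import numpy as np

def huber_grad(residual, c):
    # Derivative of the Huber loss: linear inside |r| <= c, clipped outside,
    # so a single outlier cannot dominate the weight update.
    return np.clip(residual, -c, c)

def robust_epoch(w, X, y, lr, c):
    r = X @ w - y
    return w - lr * X.T @ huber_grad(r, c) / len(y)

# Annealing: start permissive, tighten the cutoff as the epochs pass, e.g.
#   for epoch in range(100):
#       w = robust_epoch(w, X, y, lr=0.01, c=10.0 / (1 + epoch))
```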

Without backward error propagation

With linear output neurons

  • Manry, M. T., Apollo, S. J., Allen, L. S., Lyle, W. D., Gong, W., Dawson, M. S., et al. (1994): "Fast training of neural networks for remote sensing." Remote Sensing Reviews, 9, 77–96. DOI https://doi.org/10.1080/02757259409532216
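
The idea exploited here: when the output neurons are linear, the output-layer weights are the solution of an ordinary least-squares problem given the hidden activations, so they can be computed in one solve while iterative training is reserved for the hidden layer. A sketch of that step:

```python
import numpy as np

def solve_output_weights(H, T):
    """H: (n_samples, n_hidden) hidden activations; T: (n_samples, n_outputs)
    targets. Linear outputs => optimal output weights via least squares."""
    H1 = np.hstack([H, np.ones((len(H), 1))])   # append a bias column
    W, *_ = np.linalg.lstsq(H1, T, rcond=None)
    return W                                    # (n_hidden + 1, n_outputs)
```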


Network weight initialization

  • Kolen, J. F. & Pollack, J. B. (1990): "Backpropagation is sensitive to initial conditions." Complex Systems, 4(3):269–280.
  • Lee, Y., Oh, S. H. & Kim, M. W. (1991): "The effect of initial weights on premature saturation in back-propagation training." In Proceedings of IEEE International Joint Conference on Neural Networks (Vol. 1, pp. 765–770). Seattle, WA.
  • Nguyen, D. & Widrow, B. (1990): "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights." In Proceedings of International Joint Conference on Neural Networks (Vol. 3, pp. 21–26). San Diego, CA.
  • Wessels, L. F. A. & Barnard, E. (1992): "Avoiding false local minima by proper initialization of connections." IEEE Transactions on Neural Networks, 3(6):899–905.
  • Drago, G. & Ridella, S. (1992): "Statistically controlled activation weight initialization (SCAWI)." IEEE Transactions on Neural Networks, 3(4):627–631.
  • Thimm, G. & Fiesler, E. (1997): "High-order and multilayer perceptron initialization." IEEE Transactions on Neural Networks, 8(2):349–359.
  • McLoone, S., Brown, M. D., Irwin, G. & Lightbody, G. (1998): "A hybrid linear/nonlinear training algorithm for feedforward neural networks." IEEE Transactions on Neural Networks, 9(4):669–684.
  • Yam, J. Y. F. & Chow, T. W. S. (2001): "Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients." IEEE Transactions on Neural Networks, 12(2):430–434.
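
Among these, the Nguyen & Widrow (1990) rule is simple to state: draw random weights, rescale each hidden unit's weight vector to magnitude beta = 0.7 * H^(1/n) (H hidden units, n inputs), and draw its bias uniformly in [-beta, beta], so that the units' active regions tile the input space. A sketch:

```python
import numpy as np

def nguyen_widrow_init(n_inputs, n_hidden, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    W = rng.uniform(-0.5, 0.5, size=(n_hidden, n_inputs))
    W *= beta / np.linalg.norm(W, axis=1, keepdims=True)  # rows to magnitude beta
    b = rng.uniform(-beta, beta, size=n_hidden)
    return W, b
```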

Parametric estimation & clustering techniques

  • Denoeux, T. & Lengelle, R. (1993): "Initializing backpropagation networks with prototypes." Neural Networks, 6(3):351–363.
  • Lehtokangas, M., Saarinen, J., Huuhtanen, P. & Kaski, K. (1995): "Initializing weights of a multilayer perceptron network by using the orthogonal least squares algorithm." Neural Computation, 7, 982–999.
  • Smyth, S. G. (1992): "Designing multilayer perceptrons from nearest neighbor systems." IEEE Transactions on Neural Networks, 3(2):329–333.
  • Weymaere, N. & Martens, J. P. (1994): "On the initializing and optimization of multilayer perceptrons." IEEE Transactions on Neural Networks, 5, 738–751.
  • Yam, J. Y. F., & Chow, T.W. S. (2000): "A weight initialization method for improving training speed in feedforward neural network." Neurocomputing, 30, 219–232.
  • Yam, Y. F., Chow, T. W. S. & Leung, C. T. (1997): "A new method in determining the initial weights of feedforward neural networks." Neurocomputing, 16, 23–32.
  • Yam, J. Y. F. & Chow, T. W. S. (2001): "Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients." IEEE Transactions on Neural Networks, 12(2):430–434.
  • Yam, Y. F., Leung, C. T., Tam, P. K. S. & Siu, W. C. (2002): "An independent component analysis based weight initialization method for multilayer perceptrons." Neurocomputing, 48, 807–818.
  • Costa, P. & Larzabal, P. (1999): "Initialization of supervised training for parametric estimation." Neural Processing Letters, 9, 53–61.
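
The common thread in this group is fitting the first layer to the structure of the data before backpropagation starts, e.g. by placing hidden units on cluster prototypes, in the spirit of Denoeux & Lengelle. A sketch using k-means centroids as the initial first-layer weights:

```python
import numpy as np

def kmeans_init_weights(X, n_hidden, iters=20, seed=0):
    # Plain k-means; the centroids become the initial hidden-layer weights,
    # one prototype per hidden unit.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), n_hidden, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        for k in range(n_hidden):
            if np.any(labels == k):
                C[k] = X[labels == k].mean(axis=0)
    return C          # shape: (n_hidden, n_inputs)
```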


Adjusting the network topology

Neural network pruning (multiple heuristics)

  • Mozer, M. C. & Smolensky, P. (1989): "Using relevance to reduce network size automatically." Connection Science, 1(1):3–16.
  • Le Cun, Y., Denker, J. S. & Solla, S. A. (1990): "Optimal brain damage." In D. S. Touretzky (Ed.), Advances in Neural Information Processing Systems 2 (pp. 598–605). San Mateo, CA: Morgan Kaufmann.
  • Karnin, E. D. (1990): "A simple procedure for pruning back-propagation trained neural networks." IEEE Transactions on Neural Networks, 1(2):239–242.
  • Sietsma, J. & Dow, R. J. F. (1991): "Creating artificial neural networks that generalize." Neural Networks, 4, 67–79.
  • Hassibi, B., Stork, D. G. & Wolff, G. J. (1992): "Optimal brain surgeon and general network pruning." In Proceedings of IEEE International Conference on Neural Networks (pp. 293–299). San Francisco.
  • Goh, Y. S. & Tan, E. C. (1994): "Pruning neural networks during training by backpropagation." In Proceedings of IEEE Region 10’s Ninth Annual International Conference (TENCON’94) (pp. 805–808). Singapore.
  • Ponnapalli, P. V. S., Ho, K. C. & Thomson, M. (1999): "A formal selection and pruning algorithm for feedforward artificial neural network optimization." IEEE Transactions on Neural Networks, 10(4):964–968.
  • Chandrasekaran, H., Chen, H. H. & Manry, M. T. (2000): "Pruning of basis functions in nonlinear approximators." Neurocomputing, 34, 29–53.
  • Castellano, G., Fanelli, A. M. & Pelillo, M. (1997): "An iterative pruning algorithm for feedforward neural networks." IEEE Transactions on Neural Networks, 8(3):519–531.
  • Kanjilal, P. P. & Banerjee, D. N. (1995): "On the application of orthogonal transformation for the design and analysis of feedforward networks." IEEE Transactions on Neural Networks, 6(5):1061–1070.
  • Teoh, E. J., Tan, K. C. & Xiang, C. (2006): "Estimating the number of hidden neurons in a feedforward network using the singular value decomposition." IEEE Transactions on Neural Networks, 17(6):1623–1629.
  • Zurada, J. M., Malinowski, A. & Usui, S. (1997): "Perturbation method for deleting redundant inputs of perceptron networks." Neurocomputing, 14, 177–193.
  • Xing, H.-J. & Hu, B.-G. (2009): "Two-phase construction of multilayer perceptrons using information theory." IEEE Transactions on Neural Networks, 20(4):715–721.
  • Cibas, T., Soulie, F. F., Gallinari, P. & Raudys, S. (1996): "Variable selection with neural networks." Neurocomputing, 12, 223–248.
  • Stahlberger, A. & Riedmiller, M. (1997): "Fast network pruning and feature extraction using the unit-OBS algorithm." In M. C. Mozer, M. I. Jordan & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9 (pp. 655–661). Cambridge, MA: MIT Press.
  • Levin, A. U., Leen, T. K. & Moody, J. E. (1994): "Fast pruning using principal components." In J. D. Cowan, G. Tesauro & J. Alspector (Eds.), Advances in Neural Information Processing Systems 6 (pp. 35–42). San Francisco, CA: Morgan Kaufmann.
  • Tresp, V., Neuneier, R. & Zimmermann, H. G. (1997): "Early brain damage." In M. C. Mozer, M. I. Jordan & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9 (pp. 669–675). Cambridge, MA: MIT Press.
  • Leung, C. S., Wong, K. W., Sum, P. F. & Chan, L. W. (2001): "A pruning method for the recursive least squared algorithm." Neural Networks, 14, 147–174.
  • Sum, J., Leung, C. S., Young, G. H. & Kan, W. K. (1999): "On the Kalman filtering method in neural network training and pruning." IEEE Transactions on Neural Networks, 10:161–166.
  • Engelbrecht, A. P. (2001): "A new pruning heuristic based on variance analysis of sensitivity information." IEEE Transactions on Neural Networks, 12(6):1386–1399.
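
Most of these heuristics attach a saliency to each weight or unit and delete the least salient ones. Optimal Brain Damage, for instance, scores each weight as s_i ≈ ½ · h_ii · w_i², with h_ii a diagonal-Hessian term. A sketch that substitutes a squared-gradient estimate for h_ii (that substitution is an assumption, not LeCun et al.'s exact recipe):

```python
import numpy as np

def obd_prune_mask(w, grads, frac=0.1):
    """w: flat weight vector; grads: (n_steps, n_weights) recent per-step
    gradients. Returns a 0/1 mask zeroing the lowest-saliency fraction."""
    h_diag = (grads ** 2).mean(axis=0)           # crude diagonal-Hessian proxy
    saliency = 0.5 * h_diag * w ** 2             # OBD saliency per weight
    cutoff = np.quantile(saliency, frac)
    return (saliency > cutoff).astype(w.dtype)   # multiply into w to prune
```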

Growing neural networks (the opposite of pruning): Constructive networks

  • Mezard, M. & Nadal, J. P. (1989): "Learning in feedforward layered networks: The tiling algorithm." Journal of Physics A, 22, 2191–2203.
  • Frean, M. (1990): "The upstart algorithm: A method for constructing and training feedforward neural networks." Neural Computation, 2(2):198–209.
  • Gallant, S. I. (1990): "Perceptron-based learning algorithms." IEEE Transactions on Neural Networks, 1(2):179–191.
  • Fahlman, S. E. & Lebiere, C. (1990): "The cascade-correlation learning architecture." In D. S. Touretzky (Ed.), Advances in Neural Information Processing Systems 2 (pp. 524–532). San Mateo, CA: Morgan Kaufmann.
  • Kwok, T. Y. & Yeung, D. Y. (1997): "Objective functions for training new hidden units in constructive neural networks." IEEE Transactions on Neural Networks, 8(5):1131–1148.
  • Lehtokangas, M. (1999): "Modelling with constructive backpropagation." Neural Networks, 12, 707–716.
  • Moody, J. O. & Antsaklis, P. J. (1996): "The dependence identification neural network construction algorithm." IEEE Transactions on Neural Networks, 7(1):3–13.
  • Liu, D., Chang, T. S. & Zhang, Y. (2002): "A constructive algorithm for feedforward neural networks with incremental training." IEEE Transactions on Circuits and Systems I, 49(12):1876–1879.
  • Rathbun, T. F., Rogers, S. K., DeSimio, M. P. & Oxley, M. E. (1997): "MLP iterative construction algorithm." Neurocomputing, 17, 195–216.
  • Setiono, R. & Hui, L. C. K. (1995): "Use of quasi-Newton method in a feed-forward neural network construction algorithm." IEEE Transactions on Neural Networks, 6(1):273–277.
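
These methods build the network up instead of cutting it down. Cascade-correlation, for example, trains a pool of candidate hidden units to maximize the correlation between their activations and the current residual error, freezes the best one, and refits the output layer. A compressed sketch of one growth step (the candidate training is deliberately simplified):

```python
import numpy as np

def grow_one_unit(X, residual, n_candidates=8, lr=0.1, steps=200, seed=0):
    """Train candidates to covary with the residual; return the best unit."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, -np.inf
    r = residual - residual.mean()
    for _ in range(n_candidates):
        w = rng.normal(scale=0.1, size=X.shape[1])
        for _ in range(steps):
            a = np.tanh(X @ w)
            # gradient ascent on the covariance between activation and residual
            w += lr * X.T @ (r * (1 - a ** 2)) / len(a)
        score = abs(np.cov(np.tanh(X @ w), residual)[0, 1])
        if score > best_score:
            best_w, best_score = w, score
    return best_w   # new frozen hidden unit; the output layer is then refit
```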