Hyperparameters

Online, batch, and mini-batch training

  • Finnoff, W. (1994): "Diffusion approximations for the constant learning rate backpropagation algorithm and resistance to local minima." Neural Computation, 6(2):285–295. DOI https://doi.org/10.1162/neco.1994.6.2.285
  • Wilson, D. R. & Martinez, T. R. (2003): "The general inefficiency of batch training for gradient descent learning." Neural Networks, 16, 1429–1451. DOI https://doi.org/10.1016/S0893-6080(03)00138-2
  • Nakama, T. (2009): "Theoretical analysis of batch and on-line training for gradient descent learning in neural networks." Neurocomputing, 73(1–3):151–159. DOI https://doi.org/10.1016/j.neucom.2009.05.017
  • Xu, Z.-B., Zhang, R. & Jing, W.-F. (2009): "When does online BP training converge?" IEEE Transactions on Neural Networks, 20(10):1529–1539. DOI https://doi.org/10.1109/TNN.2009.2025946
  • Zhang, R., Xu, Z.-B., Huang, G.-B. & Wang, D. (2012): "Global convergence of online BP training with dynamic learning rate." IEEE Transactions on Neural Networks and Learning Systems, 23(2):330–341. DOI https://doi.org/10.1109/TNNLS.2011.2178315
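
The contrast analyzed in the papers above fits in a few lines of code. A minimal sketch on a toy linear model (the data, loss, and learning rate are illustrative assumptions, not taken from any of the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                # toy inputs
y = X @ np.array([1.0, -2.0, 0.5, 3.0])      # toy linear targets
w = np.zeros(4)
lr = 0.01

def grad(Xb, yb, w):
    # gradient of the mean squared error of a linear model
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

# Batch: one update per epoch, computed on the full training set.
w -= lr * grad(X, y, w)

# Online (stochastic): one update per pattern, in random order.
for i in rng.permutation(len(y)):
    w -= lr * grad(X[i:i+1], y[i:i+1], w)

# Mini-batch: one update per small block of patterns.
for start in range(0, len(y), 32):
    sl = slice(start, start + 32)
    w -= lr * grad(X[sl], y[sl], w)
```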

Cyclic training

  • Heskes, T. & Wiegerinck, W. (1996): "A theoretical comparison of batch-mode, on-line, cyclic, and almost-cyclic learning." IEEE Transactions on Neural Networks, 7, 919–925. DOI https://doi.org/10.1109/72.508935
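
Heskes & Wiegerinck's distinction is easy to pin down in code: cyclic training presents one fixed permutation of the training set every epoch, while almost-cyclic training reshuffles each epoch. A sketch of the two loops (the update itself is left as a placeholder):

```python
import numpy as np

rng = np.random.default_rng(1)
n_patterns, n_epochs = 100, 5
fixed_order = rng.permutation(n_patterns)

for epoch in range(n_epochs):
    # Cyclic: identical presentation order every epoch.
    for i in fixed_order:
        pass  # one online update on pattern i

for epoch in range(n_epochs):
    # Almost-cyclic: every pattern once per epoch, freshly shuffled.
    for i in rng.permutation(n_patterns):
        pass  # one online update on pattern i
```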


Training algorithms for multilayer neural networks

Momentum
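
No references are listed under this heading, but the classical rule is standard: the weight change blends the current gradient with the previous change. A minimal sketch (learning rate and momentum constants are illustrative):

```python
def momentum_step(w, delta_w_prev, grad, lr=0.01, mu=0.9):
    # Classical momentum: delta_w(t) = -lr * grad + mu * delta_w(t-1),
    # which smooths zig-zagging along narrow ravines of the error surface.
    delta_w = -lr * grad + mu * delta_w_prev
    return w + delta_w, delta_w
```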

A third term, as in PID control
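
The heading suggests reading gradient descent with momentum as a PI controller and adding a derivative-like third term. A hedged sketch of that reading (the gains and the exact form are assumptions for illustration, not a specific published rule):

```python
def pid_step(w, grad, state, kp=0.01, ki=0.001, kd=0.005):
    """state = (integral, prev_grad). P ~ current gradient,
    I ~ accumulated gradient, D ~ change in the gradient."""
    integral, prev_grad = state
    integral = integral + grad
    w = w - (kp * grad + ki * integral + kd * (grad - prev_grad))
    return w, (integral, grad)
```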

Backpropagation "emocional"

  • Khashman, A. (2008): "A modified backpropagation learning algorithm with added emotional coefficients." IEEE Transactions on Neural Networks, 19(11):1896–1909. DOI https://doi.org/10.1109/TNN.2008.2002913
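
Without reproducing Khashman's exact formulas, the general flavor can be sketched as a step size modulated by an "anxiety" coefficient that decays toward a "confidence" floor as training error falls (the names and decay rule below are assumptions for illustration only):

```python
def emotional_lr(base_lr, error, error0, confidence=0.1):
    # Hypothetical illustration: large steps while error is high ("anxious"),
    # settling to a fraction of base_lr as the network grows "confident".
    anxiety = max(confidence, error / error0)
    return base_lr * anxiety
```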

Weight extrapolation
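
No references are listed under this heading; the usual idea is to periodically jump further along the direction the weights have recently been moving. A minimal sketch (the extrapolation factor is an assumption):

```python
def extrapolate(w, w_old, gamma=0.5):
    # Occasional extrapolation step along the recent weight trajectory.
    return w + gamma * (w - w_old)
```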

Lyapunov theory

  • Yu, X., Efe, M. O. & Kaynak, O. (2002): "A general backpropagation algorithm for feedforward neural networks learning." IEEE Transactions on Neural Networks, 13(1):251–254.
  • Behera, L., Kumar, S. & Patnaik, A. (2006): "On adaptive learning rate that guarantees convergence in feedforward networks." IEEE Transactions on Neural Networks, 17(5):1116–1125.
  • Man, Z., Wu, H. R., Liu, S. & Yu, X. (2006): "A new adaptive backpropagation algorithm based on Lyapunov stability theory for neural networks." IEEE Transactions on Neural Networks, 17(6):1580–1591.
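
These papers choose the learning rate so that a Lyapunov function of the error provably decreases at every step. A toy sketch of the reasoning on a single linear neuron, using a standard normalized-gradient step (not the exact rule of any one cited paper):

```python
import numpy as np

def lyapunov_step(w, x, target, mu=0.5):
    # Candidate Lyapunov function: V = 0.5 * e**2 with e = target - w.x
    e = target - w @ x
    g = -e * x                        # gradient of V with respect to w
    # Normalized step: for 0 < mu < 2 the new error is about (1 - mu) * e,
    # so V strictly decreases (the linear-case convergence guarantee).
    eta = mu / (x @ x + 1e-12)
    return w - eta * g
```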

Sliding mode control-based adaptive learning

  • Sira-Ramirez, H. & Colina-Morles, E. (1995): "A sliding mode strategy for adaptive learning in Adalines." IEEE Transactions on Circuits and Systems I, 42(12):1001–1012.
  • Parma, G. G., Menezes, B. R. & Braga, A. P. (1998): "Sliding mode algorithm for training multilayer artificial neural networks." Electronics Letters, 34(1):97–98.
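
Sliding-mode learning drives the output error onto a "sliding surface" (e = 0) using the sign of the error rather than its magnitude, inheriting the robustness of sliding mode control. A toy Adaline-flavored sketch in the spirit of Sira-Ramirez & Colina-Morles (gain and normalization are simplified assumptions):

```python
import numpy as np

def sliding_mode_step(w, x, target, k=0.1):
    e = target - w @ x                 # distance from the sliding surface
    # Discontinuous update: fixed gain times the SIGN of the error.
    return w + k * np.sign(e) * x / (np.abs(x).sum() + 1e-12)
```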

Successive approximations

  • Liang, Y. C., Feng, D. P., Lee, H. P., Lim, S. P. & Lee, K. H. (2002): "Successive approximation training algorithm for feedforward neural networks." Neurocomputing, 42, 311–322.

Two-phase learning with gradient ascent

  • Tang, Z., Wang, X., Tamura, H. & Ishii, M. (2003): "An algorithm of supervised learning for multilayer neural networks." Neural Computation, 15, 1125–1142.

Global descent: TRUST [terminal repeller unconstrained subenergy tunneling]

  • Barhen, J., Protopopescu, V. & Reister, D. (1997): "TRUST: A deterministic algorithm for global optimization." Science, 276, 1094–1097.
  • Cetin, B. C., Burdick, J. W. & Barhen, J. (1993): "Global descent replaces gradient descent to avoid local minima problem in learning with artificial neural networks." In Proceedings of IEEE International Conference on Neural Networks (pp. 836–842). San Francisco.
  • Chowdhury, P., Singh, Y. P. & Chansarkar, R. A. (1999): "Dynamic tunneling technique for efficient training of multilayer perceptrons." IEEE Transactions on Neural Networks, 10(1):48–55.
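
TRUST alternates ordinary descent with a "tunneling" phase: once descent stalls at a local minimum x*, the objective is replaced by a subenergy transform plus a terminal repeller centered at x*, whose non-Lipschitz (x − x*)^(1/3) term ejects the state from the basin in finite time. A 1-D toy sketch of the two phases (constants and the exact transform are simplified assumptions, not the published functional):

```python
import numpy as np

def trust_1d(f, df, x0, lr=0.01, rho=1.0, outer=5, inner=2000, eps=1e-3):
    """Toy TRUST-style global descent in 1-D: descend, tunnel, repeat."""
    x_star = x0
    for _ in range(outer):
        x = x_star + eps
        for _ in range(inner):                 # phase 1: plain gradient descent
            x -= lr * df(x)
        if f(x) < f(x_star):
            x_star = x
        x = x_star + eps
        for _ in range(inner):                 # phase 2: tunneling
            sub = df(x) / (1.0 + np.exp(f(x) - f(x_star)))   # damped gradient
            repel = rho * np.cbrt(x - x_star) if f(x) >= f(x_star) else 0.0
            x += lr * (repel - sub)            # repeller pushes away from x_star
            if f(x) < f(x_star):               # tunneled into a lower basin
                x_star = x
                break
    return x_star

# e.g. trust_1d(lambda x: np.sin(3 * x) + 0.1 * x**2,
#               lambda x: 3 * np.cos(3 * x) + 0.2 * x, x0=2.0)
```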

Terminal attractors (orders of magnitude faster)

  • Zak, M. (1989): "Terminal attractors in neural networks." Neural Networks, 2, 259–274.
  • Wang, S. D. & Hsu, C. H. (1991): "Terminal attractor learning algorithms for back propagation neural networks." In Proceedings of the International Joint Conference on Neural Networks (pp. 183–189). Seattle, WA.
  • Jiang, M. & Yu, X. (2001): "Terminal attractor based back propagation learning for feedforward neural networks." In Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS) (Vol. 2, pp. 711–714). Sydney, Australia.
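
Zak's terminal attractors replace exponentially convergent dynamics (dx/dt = -x, which never exactly reaches 0) with non-Lipschitz dynamics such as dx/dt = -x^(1/3), which reach equilibrium in finite time; this is where the claimed speedups come from. A numerical sketch of the contrast:

```python
dt = 0.01
x_exp, x_term = 1.0, 1.0
for step in range(1, 1001):
    x_exp -= dt * x_exp                      # ordinary attractor: exponential decay
    # Terminal attractor dx/dt = -x**(1/3); clamp the Euler step so it can
    # actually land on 0 instead of oscillating around it.
    x_term -= min(dt * x_term ** (1 / 3), x_term)
    if x_term == 0.0:
        print(f"terminal attractor hit 0 exactly at t = {step * dt:.2f}")
        break
print(f"ordinary attractor after the same time: x = {x_exp:.4f}")
```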

Robust algorithms (in the statistical sense)

  • White, H. (1989): "Learning in artificial neural networks: A statistical perspective." Neural Computation, 1(4):425–469.
  • Chen, D. S. & Jain, R. C. (1994): "A robust backpropagation learning algorithm for function approximation." IEEE Transactions on Neural Networks, 5(3):467–479.
  • Chuang, C. C., Su, S. F. & Hsiao, C. C. (2000): "The annealing robust backpropagation (ARBP) learning algorithm." IEEE Transactions on Neural Networks, 11(5):1067–1077.
  • Pernia-Espinoza, A. V., Ordieres-Mere, J. B., Martinez-de-Pison, F. J. & Gonzalez-Marcos, A. (2005): "TAO-robust backpropagation learning algorithm." Neural Networks, 18, 191–204.
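
The robust variants replace the squared-error loss with an M-estimator that caps the influence of outliers on the gradient; the "annealing" version (ARBP) shrinks that cap as training proceeds. A sketch with a Huber-type loss (the annealing schedule shown is an illustrative assumption):

```python
import numpy as np

def huber_grad(residual, c):
    # Derivative of the Huber loss: linear inside |r| <= c, clipped outside,
    # so a single outlier cannot dominate the weight update.
    return np.clip(residual, -c, c)

def robust_epoch(w, X, y, lr, c):
    r = X @ w - y
    return w - lr * X.T @ huber_grad(r, c) / len(y)

# Annealing: start permissive, tighten the cutoff as the epochs pass, e.g.
#   for epoch in range(100):
#       w = robust_epoch(w, X, y, lr=0.01, c=10.0 / (1 + epoch))
```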

Without backward error propagation

With linear output neurons

  • Manry, M. T., Apollo, S. J., Allen, L. S., Lyle, W. D., Gong, W., Dawson, M. S., et al. (1994): "Fast training of neural networks for remote sensing." Remote Sensing Reviews, 9, 77–96. DOI https://doi.org/10.1080/02757259409532216
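
The idea exploited here: when the output neurons are linear, the output-layer weights are the solution of an ordinary least-squares problem given the hidden activations, so they can be computed in one solve while iterative training is reserved for the hidden layer. A sketch of that step:

```python
import numpy as np

def solve_output_weights(H, T):
    """H: (n_samples, n_hidden) hidden activations; T: (n_samples, n_outputs)
    targets. Linear outputs => optimal output weights via least squares."""
    H1 = np.hstack([H, np.ones((len(H), 1))])   # append a bias column
    W, *_ = np.linalg.lstsq(H1, T, rcond=None)
    return W                                    # (n_hidden + 1, n_outputs)
```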


Network weight initialization

  • Kolen, J. F. & Pollack, J. B. (1990): "Backpropagation is sensitive to initial conditions." Complex Systems, 4(3):269–280.
  • Lee, Y., Oh, S. H. & Kim, M. W. (1991): "The effect of initial weights on premature saturation in back-propagation training." In Proceedings of IEEE International Joint Conference on Neural Networks (Vol. 1, pp. 765–770). Seattle, WA.
  • Nguyen, D. & Widrow, B. (1990): "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights." In Proceedings of International Joint Conference on Neural Networks (Vol. 3, pp. 21–26). San Diego, CA.
  • Wessels, L. F. A. & Barnard, E. (1992): "Avoiding false local minima by proper initialization of connections." IEEE Transactions on Neural Networks, 3(6):899–905.
  • Drago, G. & Ridella, S. (1992): "Statistically controlled activation weight initialization (SCAWI)." IEEE Transactions on Neural Networks, 3(4):627–631.
  • Thimm, G. & Fiesler, E. (1997): "High-order and multilayer perceptron initialization." IEEE Transactions on Neural Networks, 8(2):349–359.
  • McLoone, S., Brown, M. D., Irwin, G. & Lightbody, G. (1998): "A hybrid linear/nonlinear training algorithm for feedforward neural networks." IEEE Transactions on Neural Networks, 9(4):669–684.
  • Yam, J. Y. F. & Chow, T. W. S. (2001): "Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients." IEEE Transactions on Neural Networks, 12(2):430–434.
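
Among these, the Nguyen & Widrow (1990) rule is simple to state: draw random weights, rescale each hidden unit's weight vector to magnitude beta = 0.7 * H^(1/n) (H hidden units, n inputs), and draw its bias uniformly in [-beta, beta], so that the units' active regions tile the input space. A sketch:

```python
import numpy as np

def nguyen_widrow_init(n_inputs, n_hidden, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    W = rng.uniform(-0.5, 0.5, size=(n_hidden, n_inputs))
    W *= beta / np.linalg.norm(W, axis=1, keepdims=True)  # rows to magnitude beta
    b = rng.uniform(-beta, beta, size=n_hidden)
    return W, b
```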

Parametric estimation & clustering techniques

  • Denoeux, T. & Lengelle, R. (1993): "Initializing backpropagation networks with prototypes." Neural Networks, 6(3):351–363.
  • Lehtokangas, M., Saarinen, J., Huuhtanen, P. & Kaski, K. (1995): "Initializing weights of a multilayer perceptron network by using the orthogonal least squares algorithm." Neural Computation, 7, 982–999.
  • Smyth, S. G. (1992): "Designing multilayer perceptrons from nearest neighbor systems." IEEE Transactions on Neural Networks, 3(2):329–333.
  • Weymaere, N. & Martens, J. P. (1994): "On the initializing and optimization of multilayer perceptrons." IEEE Transactions on Neural Networks, 5, 738–751.
  • Yam, J. Y. F., & Chow, T.W. S. (2000): "A weight initialization method for improving training speed in feedforward neural network." Neurocomputing, 30, 219–232.
  • Yam, Y. F., Chow, T. W. S. & Leung, C. T. (1997): "A new method in determining the initial weights of feedforward neural networks." Neurocomputing, 16, 23–32.
  • Yam, J. Y. F. & Chow, T. W. S. (2001): "Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients." IEEE Transactions on Neural Networks, 12(2):430–434.
  • Yam, Y. F., Leung, C. T., Tam, P. K. S. & Siu, W. C. (2002): "An independent component analysis based weight initialization method for multilayer perceptrons." Neurocomputing, 48, 807–818.
  • Costa, P. & Larzabal, P. (1999): "Initialization of supervised training for parametric estimation." Neural Processing Letters, 9, 53–61.
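
The common thread in this group is fitting the first layer to the structure of the data before backpropagation starts, e.g. by placing hidden units on cluster prototypes, in the spirit of Denoeux & Lengelle. A sketch using k-means centroids as the initial first-layer weights:

```python
import numpy as np

def kmeans_init_weights(X, n_hidden, iters=20, seed=0):
    # Plain k-means; the centroids become the initial hidden-layer weights,
    # one prototype per hidden unit.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), n_hidden, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        for k in range(n_hidden):
            if np.any(labels == k):
                C[k] = X[labels == k].mean(axis=0)
    return C          # shape: (n_hidden, n_inputs)
```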


Adjusting the network topology

Neural network pruning (multiple heuristics)

  • Mozer, M. C. & Smolensky, P. (1989): "Using relevance to reduce network size automatically." Connection Science, 1(1):3–16.
  • Le Cun, Y., Denker, J. S. & Solla, S. A. (1990): "Optimal brain damage." In D. S. Touretzky (Ed.), Advances in Neural Information Processing Systems 2 (pp. 598–605). San Mateo, CA: Morgan Kaufmann.
  • Karnin, E. D. (1990): "A simple procedure for pruning back-propagation trained neural networks." IEEE Transactions on Neural Networks, 1(2):239–242.
  • Sietsma, J. & Dow, R. J. F. (1991): "Creating artificial neural networks that generalize." Neural Networks, 4, 67–79.
  • Hassibi, B., Stork, D. G. & Wolff, G. J. (1992): "Optimal brain surgeon and general network pruning." In Proceedings of IEEE International Conference on Neural Networks (pp. 293–299). San Francisco.
  • Goh, Y. S. & Tan, E. C. (1994): "Pruning neural networks during training by backpropagation." In Proceedings of IEEE Region 10’s Ninth Annual International Conference (TENCON’94) (pp. 805–808). Singapore.
  • Ponnapalli, P. V. S., Ho, K. C. & Thomson, M. (1999): "A formal selection and pruning algorithm for feedforward artificial neural network optimization." IEEE Transactions on Neural Networks, 10(4):964–968.
  • Chandrasekaran, H., Chen, H. H. & Manry, M. T. (2000): "Pruning of basis functions in nonlinear approximators." Neurocomputing, 34, 29–53.
  • Castellano, G., Fanelli, A. M. & Pelillo, M. (1997): "An iterative pruning algorithm for feedforward neural networks." IEEE Transactions on Neural Networks, 8(3):519–531.
  • Kanjilal, P. P. & Banerjee, D. N. (1995): "On the application of orthogonal transformation for the design and analysis of feedforward networks." IEEE Transactions on Neural Networks, 6(5):1061–1070.
  • Teoh, E. J., Tan, K. C. & Xiang, C. (2006): "Estimating the number of hidden neurons in a feedforward network using the singular value decomposition." IEEE Transactions on Neural Networks, 17(6):1623–1629.
  • Zurada, J. M., Malinowski, A. & Usui, S. (1997): "Perturbation method for deleting redundant inputs of perceptron networks." Neurocomputing, 14, 177–193.
  • Xing, H.-J. & Hu, B.-G. (2009): "Two-phase construction of multilayer perceptrons using information theory." IEEE Transactions on Neural Networks, 20(4):715–721.
  • Cibas, T., Soulie, F. F., Gallinari, P. & Raudys, S. (1996): "Variable selection with neural networks." Neurocomputing, 12, 223–248.
  • Stahlberger, A. & Riedmiller, M. (1997): "Fast network pruning and feature extraction using the unit-OBS algorithm." In M. C. Mozer, M. I. Jordan & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9 (pp. 655–661). Cambridge, MA: MIT Press.
  • Levin, A. U., Leen, T. K. & Moody, J. E. (1994): "Fast pruning using principal components." In J. D. Cowan, G. Tesauro & J. Alspector (Eds.), Advances in Neural Information Processing Systems 6 (pp. 35–42). San Francisco, CA: Morgan Kaufmann.
  • Tresp, V., Neuneier, R. & Zimmermann, H. G. (1997): "Early brain damage." In M. C. Mozer, M. I. Jordan & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9 (pp. 669–675). Cambridge, MA: MIT Press.
  • Leung, C. S., Wong, K. W., Sum, P. F. & Chan, L. W. (2001): "A pruning method for the recursive least squared algorithm." Neural Networks, 14, 147–174.
  • Sum, J., Leung, C. S., Young, G. H. & Kan, W. K. (1999): "On the Kalman filtering method in neural network training and pruning." IEEE Transactions on Neural Networks, 10:161–166.
  • Engelbrecht, A. P. (2001): "A new pruning heuristic based on variance analysis of sensitivity information." IEEE Transactions on Neural Networks, 12(6):1386–1399.
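
Most of these heuristics attach a saliency to each weight or unit and delete the least salient ones. Optimal Brain Damage, for instance, scores each weight as s_i ≈ ½ · h_ii · w_i², with h_ii a diagonal-Hessian term. A sketch that substitutes a squared-gradient estimate for h_ii (that substitution is an assumption, not LeCun et al.'s exact recipe):

```python
import numpy as np

def obd_prune_mask(w, grads, frac=0.1):
    """w: flat weight vector; grads: (n_steps, n_weights) recent per-step
    gradients. Returns a 0/1 mask zeroing the lowest-saliency fraction."""
    h_diag = (grads ** 2).mean(axis=0)           # crude diagonal-Hessian proxy
    saliency = 0.5 * h_diag * w ** 2             # OBD saliency per weight
    cutoff = np.quantile(saliency, frac)
    return (saliency > cutoff).astype(w.dtype)   # multiply into w to prune
```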

Growing neural networks (the opposite of pruning): Constructive networks

  • Mezard, M. & Nadal, J. P. (1989): "Learning in feedforward layered networks: The tiling algorithm." Journal of Physics A, 22, 2191–2203.
  • Frean, M. (1990): "The upstart algorithm: A method for constructing and training feedforward neural networks." Neural Computation, 2(2):198–209.
  • Gallant, S. I. (1990): "Perceptron-based learning algorithms." IEEE Transactions on Neural Networks, 1(2):179–191.
  • Fahlman, S. E. & Lebiere, C. (1990): "The cascade-correlation learning architecture." In D. S. Touretzky (Ed.), Advances in Neural Information Processing Systems 2 (pp. 524–532). San Mateo, CA: Morgan Kaufmann.
  • Kwok, T. Y. & Yeung, D. Y. (1997): "Objective functions for training new hidden units in constructive neural networks." IEEE Transactions on Neural Networks, 8(5):1131–1148.
  • Lehtokangas, M. (1999): "Modelling with constructive backpropagation." Neural Networks, 12, 707–716.
  • Moody, J. O. & Antsaklis, P. J. (1996): "The dependence identification neural network construction algorithm." IEEE Transactions on Neural Networks, 7(1):3–13.
  • Liu, D., Chang, T. S. & Zhang, Y. (2002): "A constructive algorithm for feedforward neural networks with incremental training." IEEE Transactions on Circuits and Systems I, 49(12):1876–1879.
  • Rathbun, T. F., Rogers, S. K., DeSimio, M. P. & Oxley, M. E. (1997): "MLP iterative construction algorithm." Neurocomputing, 17, 195–216.
  • Setiono, R. & Hui, L. C. K. (1995): "Use of quasi-Newton method in a feed-forward neural network construction algorithm." IEEE Transactions on Neural Networks, 6(1):273–277.
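
These methods build the network up instead of cutting it down. Cascade-correlation, for example, trains a pool of candidate hidden units to maximize the correlation between their activations and the current residual error, freezes the best one, and refits the output layer. A compressed sketch of one growth step (the candidate training is deliberately simplified):

```python
import numpy as np

def grow_one_unit(X, residual, n_candidates=8, lr=0.1, steps=200, seed=0):
    """Train candidates to covary with the residual; return the best unit."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, -np.inf
    r = residual - residual.mean()
    for _ in range(n_candidates):
        w = rng.normal(scale=0.1, size=X.shape[1])
        for _ in range(steps):
            a = np.tanh(X @ w)
            # gradient ascent on the covariance between activation and residual
            w += lr * X.T @ (r * (1 - a ** 2)) / len(a)
        score = abs(np.cov(np.tanh(X @ w), residual)[0, 1])
        if score > best_score:
            best_w, best_score = w, score
    return best_w   # new frozen hidden unit; the output layer is then refit
```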