4.3. Using LARS / LAMB to Optimize Distributed Training with Very Large Batches

Editor: Anonymous | Date: 2024-06-10 07:19

-----------  Configuration Arguments -----------
gpus: 0,1
heter_worker_num: None
heter_workers:
http_port: None
ips: 127.0.0.1
log_dir: log
...
------------------------------------------------
...
+=======================================================================================+
|                        Distributed Envs                      Value                    |
+---------------------------------------------------------------------------------------+
|                       PADDLE_TRAINER_ID                        0                      |
|                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:12464               |
|                     PADDLE_TRAINERS_NUM                        2                      |
|                PADDLE_TRAINER_ENDPOINTS         127.0.0.1:12464,127.0.0.1:43227       |
|                     FLAGS_selected_gpus                        0                      |
+=======================================================================================+
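The Distributed Envs table lists the variables the launcher sets for each worker process: here worker 0 of 2 sees its own rank, the world size, its own endpoint, and the endpoints of every worker. The snippet below is a minimal sketch of how a training script could read these variables using only the Python standard library. The helper `read_trainer_env` and the `TrainerEnv` class are hypothetical names for illustration, not part of Paddle's API. The consistency check (one endpoint per trainer, listed in trainer-id order) is an assumption inferred from the values shown above.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class TrainerEnv:
    trainer_id: int          # rank of this process (PADDLE_TRAINER_ID)
    trainers_num: int        # world size (PADDLE_TRAINERS_NUM)
    current_endpoint: str    # "ip:port" of this process (PADDLE_CURRENT_ENDPOINT)
    endpoints: List[str]     # all workers' "ip:port" (PADDLE_TRAINER_ENDPOINTS)
    selected_gpus: str       # GPU id(s) assigned to this process (FLAGS_selected_gpus)


def read_trainer_env(env: Optional[Mapping[str, str]] = None) -> TrainerEnv:
    """Hypothetical helper: parse the launcher-provided variables into a TrainerEnv."""
    env = os.environ if env is None else env
    info = TrainerEnv(
        trainer_id=int(env["PADDLE_TRAINER_ID"]),
        trainers_num=int(env["PADDLE_TRAINERS_NUM"]),
        current_endpoint=env["PADDLE_CURRENT_ENDPOINT"],
        endpoints=env["PADDLE_TRAINER_ENDPOINTS"].split(","),
        selected_gpus=env.get("FLAGS_selected_gpus", ""),
    )
    # Assumed invariant: one endpoint per trainer, ordered by trainer id.
    if len(info.endpoints) != info.trainers_num:
        raise ValueError("endpoint count does not match PADDLE_TRAINERS_NUM")
    if info.endpoints[info.trainer_id] != info.current_endpoint:
        raise ValueError("current endpoint is not at index PADDLE_TRAINER_ID")
    return info


if __name__ == "__main__":
    # Values copied from the log above (worker 0 of 2).
    sample = {
        "PADDLE_TRAINER_ID": "0",
        "PADDLE_CURRENT_ENDPOINT": "127.0.0.1:12464",
        "PADDLE_TRAINERS_NUM": "2",
        "PADDLE_TRAINER_ENDPOINTS": "127.0.0.1:12464,127.0.0.1:43227",
        "FLAGS_selected_gpus": "0",
    }
    print(read_trainer_env(sample))
```

Passing an explicit mapping, instead of always reading `os.environ`, keeps the parser easy to test without starting the launcher.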
...
+==============================================================================+
|                                                                              |
|                         DistributedStrategy Overview                         |
|                                                                              |
+==============================================================================+
|                          lars=True <-> lars_configs                          |
+------------------------------------------------------------------------------+
|                            lars_coeff          0.0010000000474974513         |
|                     lars_weight_decay          0.0005000000237487257         |
|                               epsilon                   0.0                  |
|             exclude_from_weight_decay                batch_norm              |
|                                                         .b_0                 |
+==============================================================================+
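The overview confirms that `lars=True` is active with `lars_coeff` 0.001, `lars_weight_decay` 0.0005 and `epsilon` 0.0. Parameters whose names match `batch_norm` or `.b_0` are excluded from weight decay. The long decimals in the table are simply the float32 printouts of 0.001 and 0.0005. To show how these settings interact, here is a plain-Python sketch of one LARS momentum step in its standard textbook form. Each layer's learning rate is scaled by the trust ratio lars_coeff * ||w|| / (||g|| + weight_decay * ||w|| + epsilon). This is not Paddle's kernel. The substring rule for `exclude_from_weight_decay`, the fallback to the global learning rate when a norm is zero, and the names `lars_step`, `weight_decay_for` and `LARS_CONFIGS` are assumptions for illustration. Paddle's handling of these edge cases may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Values copied from the "DistributedStrategy Overview" table above.
LARS_CONFIGS = {
    "lars_coeff": 0.001,
    "lars_weight_decay": 0.0005,
    "epsilon": 0.0,
    "exclude_from_weight_decay": ["batch_norm", ".b_0"],
}


def l2_norm(xs: Sequence[float]) -> float:
    """Euclidean norm of a flat parameter or gradient vector."""
    return math.sqrt(sum(x * x for x in xs))


def weight_decay_for(name: str, cfg: Dict) -> float:
    """Assumed rule: no weight decay if any pattern is a substring of the name."""
    if any(pat in name for pat in cfg["exclude_from_weight_decay"]):
        return 0.0
    return cfg["lars_weight_decay"]


def lars_step(
    name: str,
    param: List[float],
    grad: List[float],
    velocity: List[float],
    lr: float,
    momentum: float,
    cfg: Dict,
) -> Tuple[List[float], List[float]]:
    """One LARS momentum update on one layer; returns (new_param, new_velocity)."""
    wd = weight_decay_for(name, cfg)
    p_norm, g_norm = l2_norm(param), l2_norm(grad)
    if p_norm > 0 and g_norm > 0:
        # Layer-wise trust ratio rescales the global learning rate.
        local_lr = lr * cfg["lars_coeff"] * p_norm / (
            g_norm + wd * p_norm + cfg["epsilon"]
        )
    else:
        # Assumed fallback when a norm is zero: use the global learning rate.
        local_lr = lr
    new_v = [momentum * v + local_lr * (g + wd * p)
             for v, g, p in zip(velocity, grad, param)]
    new_p = [p - v for p, v in zip(param, new_v)]
    return new_p, new_v


if __name__ == "__main__":
    # ||w|| = 5 and ||g|| = 1, so the trust ratio is easy to check by hand.
    w, g, v0 = [3.0, 4.0], [0.6, 0.8], [0.0, 0.0]
    for name in ("fc_0.w_0", "batch_norm_0.w_0", "fc_0.b_0"):
        print(name, lars_step(name, w, g, v0, lr=1.0, momentum=0.9, cfg=LARS_CONFIGS))
```

With these patterns, a weight named `fc_0.w_0` receives the 0.0005 decay. `batch_norm_0.w_0` and `fc_0.b_0` are updated without the decay term, which matches the common practice of not decaying normalization parameters and biases.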
...
W0114 18:07:51.588716 16234 device_context.cc:346] Please NOTE: device: 4, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.0
W0114 18:07:51.593963 16234 device_context.cc:356] device: 4, cuDNN Version: 7.6.
[Epoch 0, batch 0] loss: 0.14651, acc1: 0.00000, acc5: 0.00000
[Epoch 0, batch 5] loss: 1.82926, acc1: 0.00000, acc5: 0.00000
[Epoch 0, batch 10] loss: 0.00000, acc1: 0.00000, acc5: 0.00000
[Epoch 0, batch 15] loss: 0.13787, acc1: 0.03125, acc5: 0.03125
[Epoch 0, batch 20] loss: 0.12400, acc1: 0.03125, acc5: 0.06250
[Epoch 0, batch 25] loss: 0.17749, acc1: 0.00000, acc5: 0.00000
...
