Bert not working properly #68

Open
Rohitkr1997 opened this issue Apr 19, 2022 · 9 comments

Comments

@Rohitkr1997

Can anyone upload their environment.yml file, or list the versions of keras, tensorflow, nilmtk and nilmtk-contrib they are using? BERT requires keras.layers.multi_head_attention, which does not work with the Keras version that gets installed when nilmtk and nilmtk-contrib are installed through conda. Upgrading keras and tensorflow causes dependency conflicts, after which nilmtk can no longer be used.

@Rohitkr1997
Author

Alternatively, could anyone with a working BERT setup please share the output of conda list?

@paulfrank1997

You need TensorFlow 2.5.0 or higher. Since Keras is already bundled inside TensorFlow 2.5.0, you don't need to install Keras separately.
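A minimal sketch for checking an environment against that advice, assuming TensorFlow is installed under the distribution name `tensorflow` (a GPU build may be registered as `tensorflow-gpu` instead). It reads the installed version with the standard-library `importlib.metadata` and compares it numerically with the 2.5.0 minimum suggested above. The helper names are hypothetical, and the comparison only handles simple numeric versions, not full PEP 440.

```python
"""Check whether the installed TensorFlow meets the >= 2.5.0 minimum
suggested in this thread (hypothetical helper, not part of nilmtk)."""
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '2.5.0' or '2.6.0rc1' into a tuple of ints, e.g. (2, 5, 0).

    Only the leading digits of each dot-separated field are kept, which is
    enough for a simple minimum-version check (not full PEP 440).
    """
    parts = []
    for field in text.split("."):
        digits = ""
        for ch in field:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum` (numeric comparison)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '2.5' compares equal to '2.5.0'.
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def installed_version(dist_name: str):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; use "tensorflow-gpu" if that is what you installed.
    tf_version = installed_version("tensorflow")
    if tf_version is None:
        print("tensorflow is not installed under that name")
    elif meets_minimum(tf_version, "2.5.0"):
        print(f"tensorflow {tf_version}: meets the 2.5.0 minimum; use tf.keras")
    else:
        print(f"tensorflow {tf_version}: older than 2.5.0")
```

Using the bundled `tf.keras` rather than a separately installed `keras` package is what the advice above implies; the script only reports versions and does not change anything.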

@Rohitkr1997
Author

I have tried TensorFlow 2.6.0, but the environment has conflicts that cause problems. Could you please upload your environment.yml file or share the output of conda list, so that I can set up an environment where everything works?

@Rohitkr1997
Author

If you share all the packages you're using, I can replicate your Anaconda environment and avoid the conflicts in mine.
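For sharing the relevant versions in a comment, a small sketch that prints pinned `name==version` lines for the packages discussed in this thread. The package list is an assumption (the exact distribution names, e.g. for nilmtk-contrib, may differ locally; check `conda list`), and `pinned_requirements` is a hypothetical helper built on the standard-library `importlib.metadata`.

```python
"""Print pinned versions of the packages discussed in this thread in a
requirements-style format that can be pasted into an issue comment."""
from importlib import metadata

# Assumed distribution names; adjust to what `conda list` shows locally.
PACKAGES = ["nilmtk", "nilmtk-contrib", "tensorflow", "tensorflow-gpu",
            "keras", "h5py", "numpy", "pandas"]


def pinned_requirements(names):
    """Return 'name==version' lines; missing packages become comment lines."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(pinned_requirements(PACKAGES)))
```

This only covers Python distributions visible to `importlib.metadata`; `conda env export` produces a fuller environment.yml that also records conda-managed non-Python dependencies such as CUDA toolkits.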

@paulfrank1997

paulfrank1997 commented Apr 26, 2022 via email

@xuuurq

xuuurq commented Aug 9, 2022

@paulfrank1997 Hello, I installed TensorFlow 2.5.0 and updated h5py to the latest version (3.7.0), but running the script fails with ImportError: save_model requires h5py. Which version of h5py did you install? Thank you. Below is the output of the run.
`D:\anaconda3\envs\nilmxu\python.exe D:/mywork/nilmtkcontribxu/nilmtk_contrib/disaggregate/fuhefenjie.py
2022-08-09 15:23:30.591863: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-08-09 15:23:30.591978: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-08-09 15:23:32.978920: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
Started training for BERT
Joint training for BERT
............... Loading Data for training ...................
Loading data for redd dataset
2022-08-09 15:23:32.999158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3050 Laptop GPU computeCapability: 8.6
coreClock: 1.5GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2022-08-09 15:23:33.000126: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-08-09 15:23:33.000876: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2022-08-09 15:23:33.001617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2022-08-09 15:23:33.002352: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2022-08-09 15:23:33.003067: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2022-08-09 15:23:33.003796: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2022-08-09 15:23:33.004519: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2022-08-09 15:23:33.005261: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2022-08-09 15:23:33.005365: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Loading building ... 2
Loading data for meter ElecMeterID(instance=2, building=2, dataset='REDD')
Done loading data all meters for this chunk.
Dropping missing values
...............BERT partial_fit running...............
First model training for fridge
2022-08-09 15:23:35.090127: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-09 15:23:35.090573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-09 15:23:35.090662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
Model: "sequential"
Layer (type)                 Output Shape              Param #
conv1d (Conv1D)              (None, 99, 16)            80
l_ppool (LPpool)             (None, 50, 16)            0
token_and_position_embedding (None, 50, 16, 32)        643168
transformer_block (Transform (None, 50, 16, 32)        10656
flatten (Flatten)            (None, 25600)             0
dropout_2 (Dropout)          (None, 25600)             0
dense_2 (Dense)              (None, 99)                2534499
dropout_3 (Dropout)          (None, 99)                0
Total params: 3,188,403
Trainable params: 3,188,403
Non-trainable params: 0

Epoch 1/50
2022-08-09 15:23:46.743583: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
WARNING:tensorflow:Gradients do not exist for variables ['conv1d/kernel:0', 'conv1d/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['conv1d/kernel:0', 'conv1d/bias:0'] when minimizing the loss.
526/526 [==============================] - 259s 472ms/step - loss: 12.1465 - mse: 12.1465 - val_loss: 0.6670 - val_mse: 0.6670

Epoch 00001: val_loss improved from inf to 0.66698, saving model to BERT-temp-weights-74894.h5
Traceback (most recent call last):
  File "D:/mywork/nilmtkcontribxu/nilmtk_contrib/disaggregate/fuhefenjie.py", line 54, in <module>
    api_res = API(experiment)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\nilmtk\api.py", line 46, in __init__
    self.experiment()
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\nilmtk\api.py", line 91, in experiment
    self.train_jointly(clf,d)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\nilmtk\api.py", line 240, in train_jointly
    clf.partial_fit(self.train_mains,self.train_submeters)
  File "D:\mywork\nilmtkcontribxu\nilmtk_contrib\disaggregate\bert.py", line 161, in partial_fit
    model.fit(train_x,train_y,validation_data=(v_x,v_y),epochs=self.n_epochs,callbacks=[checkpoint],batch_size=self.batch_size)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\engine\training.py", line 1204, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\callbacks.py", line 410, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\callbacks.py", line 1376, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\callbacks.py", line 1428, in _save_model
    self.model.save(filepath, overwrite=True, options=self._options)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\engine\training.py", line 2087, in save
    signatures, options, save_traces)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\saving\save.py", line 147, in save_model
    model, filepath, overwrite, include_optimizer)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\saving\hdf5_format.py", line 79, in save_model_to_hdf5
    raise ImportError('save_model requires h5py.')
ImportError: save_model requires h5py.
Closing remaining open files:C:\Users\xrq\AppData\Local\Temp\nilmtk-meg927ux.h5...doneD:/works/nilmtkcontrib/nilmtk_contrib/redd_low.hdf5...done
`
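One possible reason the message persists even with h5py installed: as far as we can tell from the Keras 2.x sources, `keras/saving/hdf5_format.py` imports h5py inside a `try`/`except ImportError` that sets it to `None`, and `save_model_to_hdf5` then raises only the generic `save_model requires h5py.` So a broken h5py install that fails on import can look the same as a missing one. A minimal sketch, with a hypothetical helper `diagnose_import`, that imports the module directly to surface the original error:

```python
"""Surface the real reason h5py cannot be imported, since Keras reports a
failed h5py import only as 'save_model requires h5py.'"""
import importlib
import traceback


def diagnose_import(module_name: str = "h5py"):
    """Try to import a module and return (ok, detail).

    On success, detail is the module's __version__ (or 'unknown').
    On failure, detail is the full traceback of the original exception.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:  # catch broadly so any import-time failure is reported
        return False, traceback.format_exc()
    return True, getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    ok, detail = diagnose_import("h5py")
    if ok:
        print(f"h5py {detail} imports fine in this environment")
    else:
        print("h5py failed to import; the underlying error was:")
        print(detail)
```

Run it with the same interpreter that runs the training script (here `D:\anaconda3\envs\nilmxu\python.exe`), since another environment may have a working h5py.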

@paulfrank1997

@xuuurq I ran into the same problem: "ImportError: save_model requires h5py". After I upgraded h5py to the latest version with "pip install --upgrade h5py", the problem was solved. The h5py version I used is 3.6.0, and now everything works fine.

@xuuurq

xuuurq commented Aug 11, 2022

@paulfrank1997 Sorry to bother you again, but I have a few more questions:

  1. Are you using tensorflow-gpu version 2.5.0?
  2. The BERT model in nilmtk-contrib differs from the code accompanying the BERT4NILM paper, notably in the loss function and mask processing. Does the BERT model in nilmtk-contrib omit mask processing?
  3. In addition, what do you think of the results of the BERT model in nilmtk-contrib?

Thank you very much for your answers.

@paulfrank1997

paulfrank1997 commented Oct 11, 2022 via email
