You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello, I'm trying to do task for system calls classification. The code bellow is inspired from a text classification project. My system calls are represented as sequences of integers between 1 and 340. The error I got is: valueError: input arrays should have the same number of samples as target arrays. Find 1 input samples and 0 target samples.
I don't know how to resolve it !!
`
df = pd.read_csv("data.txt")
df_test = pd.read_csv("validation.txt")
#split arrays into train and test data (cross validation)
train_text, test_text, train_y, test_y = train_test_split(df,df,test_size = 0.2)
MAX_NB_WORDS = 5700
# get the raw text data
texts_train = train_text.astype(str)
texts_test = test_text.astype(str)
# finally, vectorize the text samples into a 2D integer tensor
tokenizer = Tokenizer(nb_words=MAX_NB_WORDS, char_level=False)
tokenizer.fit_on_texts(texts_train)
sequences = tokenizer.texts_to_sequences(texts_train)
sequences_test = tokenizer.texts_to_sequences(texts_test)
word_index = tokenizer.word_index
type(tokenizer.word_index), len(tokenizer.word_index)
index_to_word = dict((i, w) for w, i in tokenizer.word_index.items())
" ".join([index_to_word[i] for i in sequences[0]])
seq_lens = [len(s) for s in sequences]
MAX_SEQUENCE_LENGTH = 100
# pad sequences with 0s
x_train = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
x_test = pad_sequences(sequences_test, maxlen=MAX_SEQUENCE_LENGTH)
#print('Shape of data train:', x_train.shape) #cela a donnée (1,100)
#print('Shape of data test tensor:', x_test.shape)
y_train = train_y
y_test = test_y
print('Shape of label tensor:', y_train.shape)
EMBEDDING_DIM = 32
N_CLASSES = 2
y_train = keras.utils.to_categorical( y_train , N_CLASSES )
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='float32')
embedding_layer = Embedding(MAX_NB_WORDS, EMBEDDING_DIM,
input_length=MAX_SEQUENCE_LENGTH,
trainable=True)
embedded_sequences = embedding_layer(sequence_input)
average = GlobalAveragePooling1D()(embedded_sequences)
predictions = Dense(N_CLASSES, activation='softmax')(average)
model = Model(sequence_input, predictions)
model.compile(loss='categorical_crossentropy',
optimizer='adam', metrics=['acc'])
model.fit(x_train, y_train, validation_split=0.1,
nb_epoch=10, batch_size=1)
output_test = model.predict(x_test)
print("test auc:", roc_auc_score(y_test,output_test[:,1]))
`
The text was updated successfully, but these errors were encountered:
Hello, I'm trying to do task for system calls classification. The code bellow is inspired from a text classification project. My system calls are represented as sequences of integers between 1 and 340. The error I got is: valueError: input arrays should have the same number of samples as target arrays. Find 1 input samples and 0 target samples.
I don't know how to resolve it !!
`
`
The text was updated successfully, but these errors were encountered: