Suicide Ideation Detection with Deep Learning

This project uses machine learning to detect potential suicide ideation in text data. Using a dataset of text entries classified as either reflecting suicide ideation or not, we build and train a neural network model to predict the likelihood that a given text indicates suicidal thoughts. The model uses pre-trained GloVe embeddings for text representation and an LSTM neural network for classification.

Caution: download the glove.840B.300d.pkl file before running the project.

Data Preparation

The expected format is a CSV file with at least two columns: one containing the text data and another indicating the class (suicide ideation or not).
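
As a minimal sketch of reading such a file, the snippet below parses a small in-memory CSV with Python's csv module and maps the labels to integers. The column names text and class, the 1/0 label encoding, and the helper name load_rows are assumptions for illustration; adjust them to match the actual file.

```python
import csv
import io

# Tiny in-memory stand-in for the real CSV; the header names are assumed
SAMPLE_CSV = '''text,class
"first line of a post
second line of the same post",suicide
a short post without quotes,non-suicide
'''

def load_rows(handle, text_col="text", label_col="class"):
    """Return (texts, labels) with labels encoded as 1 for 'suicide', else 0."""
    reader = csv.DictReader(handle)
    missing = {text_col, label_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    texts, labels = [], []
    for row in reader:
        texts.append(row[text_col])
        labels.append(1 if row[label_col].strip() == "suicide" else 0)
    return texts, labels

texts, labels = load_rows(io.StringIO(SAMPLE_CSV))
print(len(texts), labels)
```

The csv module keeps quoted fields intact even when they span several lines, which matters here because many posts contain line breaks.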
				
					"Ex Wife Threatening SuicideRecently I left my wife for good because she has cheated on me twice and lied to me so much that I have decided to refuse to go back to her. As of a few days ago, she began threatening suicide. I have tirelessly spent these paat few days talking her out of it and she keeps hesitating because she wants to believe I'll come back. I know a lot of people will threaten this in order to get their way, but what happens if she really does? What do I do and how am I supposed to handle her death on my hands? I still love my wife but I cannot deal with getting cheated on again and constantly feeling insecure. I'm worried today may be the day she does it and I hope so much it doesn't happen.",suicide
Am I weird I don't get affected by compliments if it's coming from someone I know irl but I feel really good when internet strangers do it,non-suicide
"Finally 2020 is almost over... So I can never hear ""2020 has been a bad year"" ever again. I swear to fucking God it's so annoying",non-suicide
i need helpjust help me im crying so hard,suicide
"I’m so lostHello, my name is Adam (16) and I’ve been struggling for years and I’m afraid. Through these past years thoughts of suicide, fear, anxiety I’m so close to my limit . I’ve been quiet for so long and I’m too scared to come out to my family about these feelings. About 3 years ago  losing my aunt triggered it all. Everyday feeling hopeless , lost, guilty, and remorseful over her and all the things I’ve done in my life,but thoughts like these with the little I’ve experienced in life? Only time I’ve revealed these feelings to my family is when I broke down where they saw my cuts. Watching them get so worried over something I portrayed as an average day made me feel absolutely dreadful. They later found out I was an attempt survivor from attempt OD(overdose from pills) and attempt hanging. All that happened was a blackout from the pills and I never went through with the noose because I’m still so afraid. During my first therapy I was diagnosed with severe depression, social anxiety, and a eating disorder.
I was later transferred to a fucken group therapy for some reason which made me feel more anxious. Eventually before my last session with a 1 on 1 therapy she showed me my results from a daily check up on my feelings(which was a 2 - step survey for me and my mom/dad )
Come to find out as I’ve been putting feeling horrible and afraid/anxious everyday , my mom has been doing I’ve been doing absolutely amazing with me described as “happiest she’s ever seen me, therapy has helped him” 
I eventually was put on Sertaline (anti anxiety or anti depression I’m sorry I forgot) but I never finished my first prescription nor ever found the right type of anti depressant because my mom thought I only wanted the drugs so she took me off my recommended pill schedule after ~3 week and stopped me from taking them. All this time I’ve been feeling worse afraid of the damage/ worry I’ve caused them even more. 
Now here with everything going on, I’m as afraid as I’ve ever been . I’ve relapsed on cutting and have developed severe insomnia . Day after day feeling more hopeless, worthless questioning why am I still here? What’s my motivation to move out of bed and keep going? I ask these to myself nearly every night almost having a break down everytime. 
Please Please Please someone.. anyone help me.
I’m so scared I might do something drastic, I’ve been shaped by fear and anxiety. Idk what to do anymore",suicide
"Honetly idkI dont know what im even doing here. I just feel like there is nothing and nowhere for me. All i can feel is either nothing or unbearably sad. Im ignoring friends every opitunity i can. I feel like im loosing my girlfriend. I only hurt everyone i talk too and i dont cause anything good. Im behind on my education, i feel alone but for the first time its not a feeling ive enjoyed. I have no hopes or dreams. I care about nothing, not family, not friends, not even my girlfriend (i still love her, its complicated and i dont have the words to describe it). 

				
			

Import Statements

 
  • dataprepare: user-defined function to load and preprocess the data.
  • create_embedding_matrix: user-defined function to create an embedding matrix from GloVe embeddings.
  • load_model: Keras function to load a saved model.
  • train_model: user-defined function to train the model.
  • evaluate_model: user-defined function to evaluate the model.
  • predict_text: user-defined function to make predictions with the trained model.
  • pickle: Python module to serialize objects.
  • mlflow: for experiment tracking and model management.
  • mlflow.tensorflow: MLflow integration for TensorFlow.
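
The internals of dataprepare are not shown in this article. As a rough illustration of the kind of cleaning it is described as performing (lowercasing, removing special characters, dropping stopwords), here is a minimal sketch. The function name clean_text and the tiny stopword list are assumptions, not the project's actual code.

```python
import re

# Small illustrative stopword list (an assumption); a real pipeline would
# typically use a fuller list such as the one shipped with NLTK
STOPWORDS = {"a", "an", "the", "i", "is", "am", "to", "and", "of", "so"}

def clean_text(text):
    """Lowercase, strip non-letter characters, and drop stopwords."""
    text = text.lower()
    # Replace digits, punctuation, and other symbols with spaces
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean_text("Finally 2020 is almost over... So I can rest!"))
```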


main.py

				
from processing.datapreprocessing import dataprepare
from processing.get_embedding import create_embedding_matrix
from tensorflow.keras.models import load_model
from training.train import train_model
from evaluation.evaluate import evaluate_model
from model_serve import predict_text

import pickle 
import mlflow
import mlflow.tensorflow 

def main():
    # Step 1: Load and preprocess the data
    filepath = 'artifacts/datasets/Suicide_Detection.csv' 
    tokenizer, train_pad, test_pad , train_output , test_output = dataprepare(filepath)
    
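    # Persist the fitted tokenizer so the same vocabulary can be reused at prediction time
    with open('artifacts/tokenizer/tokenizer.pkl', 'wb') as f:
        pickle.dump(tokenizer, f)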
    embedding_matrix = create_embedding_matrix(tokenizer, 300)
  

    train_model(train_pad, train_output, test_pad, test_output, len(tokenizer.word_index) + 1, embedding_matrix)
    
    accuracy, f1, auc=evaluate_model(test_pad, test_output)

    print(accuracy, f1, auc)
    
    mlflow.set_experiment("Logging11")
    model = load_model("artifacts/trained_model/model.h5")

    runname="runlogging003"

    with mlflow.start_run(run_name=runname):
    
        mlflow.set_tag("version","1.0.0")

    
        mlflow.log_metric("Accuracy", accuracy)
        mlflow.log_metric("f1-score", f1)
        mlflow.log_metric("AUC", auc)

        mlflow.tensorflow.log_model(model, "TextClassifier")

        # No explicit mlflow.end_run() is needed: the with block ends the run on exit

    # sample_text = ['Through these past years thoughts of suicide, fear, anxiety I’m so close to my limit']
    # result = predict_text(sample_text)
    # print(result)

if __name__ ==  "__main__":
    mlflow.set_tracking_uri("http://127.0.0.1:5001")
    main()

#mlflow server --host 0.0.0.0 --port 5001 --backend-store-uri mysql+mysqlconnector://root:@localhost/mysql1 --default-artifact-root $PWD/mlruns
				
			
The main steps of the script are:
  • dataprepare: takes the file path of the dataset and processes the data. Each text entry is cleaned: special characters and stopwords are removed, and the text is converted to lowercase. The function also computes the length of each text entry. It returns the tokenizer, the padded training and test sequences, and their corresponding outputs (labels). The tokenizer is then saved with pickle for future use.
  • create_embedding_matrix: creates an embedding matrix that maps each word in the tokenizer’s vocabulary to its corresponding GloVe vector.
  • train_model: defines the model architecture and trains a neural network for binary classification, using pre-trained GloVe embeddings and an LSTM layer for sequence processing. Training uses early stopping and learning rate reduction to improve efficiency and performance.
  • evaluate_model: evaluates the model on the test data and returns accuracy, F1-score, and AUC (Area Under the Curve).

The MLflow experiment is named “Logging11”, and the trained model is loaded from disk. The tracking URI for the MLflow server is set to http://127.0.0.1:5001. A new run named “runlogging003” is started, in which the metrics (accuracy, F1-score, AUC) and the trained model are logged and a “version” tag is set. The commented-out command at the end of the script starts an MLflow server:
  • --host 0.0.0.0 binds the server to all network interfaces, so it is reachable from other machines.
  • --port 5001 specifies the port for the server.
  • --backend-store-uri sets the backend store URI for the server, using a MySQL database.
  • --default-artifact-root specifies the root directory for storing artifacts.
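
evaluate_model itself is not shown. As a sketch of what the three logged metrics measure, the following pure-Python functions compute accuracy, F1-score, and AUC from true labels and predicted probabilities. The function names, the toy data, and the 0.5 threshold are assumptions for illustration; a real implementation would more likely call a library such as scikit-learn.

```python
def accuracy_and_f1(y_true, y_prob, threshold=0.5):
    """Accuracy and F1 for the positive class after thresholding probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

def auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and probabilities, for illustration only
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.4, 0.35, 0.2]
acc, f1 = accuracy_and_f1(y_true, y_prob)
print(acc, f1, auc(y_true, y_prob))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC only depends on how the probabilities rank positive examples against negative ones.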

Model Building

				
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Embedding, LSTM, GlobalMaxPooling1D, Dense

def create_model(vocabulary_size, embedding_dim, embedding_matrix, input_length):
    # Define the input layer
    inputs = Input(shape=(input_length,))
    
    # Create the embedding layer with pretrained weights and set it to be non-trainable
    x = Embedding(vocabulary_size, embedding_dim, embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix), trainable=False)(inputs)
    
    # Add LSTM layer
    x = LSTM(20, return_sequences=True)(x)
    
    # Add Global Max Pooling layer
    x = GlobalMaxPooling1D()(x)
    
    # Add Dense layers
    x = Dense(256, activation='relu')(x)
    outputs = Dense(1, activation='sigmoid')(x)
    
    # Create the model
    model = Model(inputs=inputs, outputs=outputs)
    return model
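
The embedding_matrix argument needs one row per vocabulary index, each of length embedding_dim. create_embedding_matrix is not shown here; the sketch below is one plausible way to build such a matrix, assuming the GloVe vectors are available as a dictionary from word to vector and that word_index follows the Keras Tokenizer convention of starting at 1. The name build_embedding_matrix and the toy vectors are hypothetical.

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, embedding_dim):
    """Row i holds the vector for the word with index i; unknown words stay all-zero."""
    # Index 0 is reserved for padding, hence the +1
    matrix = np.zeros((len(word_index) + 1, embedding_dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix

# Toy 3-dimensional vectors standing in for 300-dimensional GloVe vectors
toy_vectors = {"help": [0.1, 0.2, 0.3], "lost": [0.4, 0.5, 0.6]}
toy_word_index = {"help": 1, "lost": 2, "idk": 3}

matrix = build_embedding_matrix(toy_word_index, toy_vectors, 3)
print(matrix.shape)
```

The row count matches the len(tokenizer.word_index) + 1 value that main.py passes as the vocabulary size.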
     


				
			

Training

				
import tensorflow
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from training.model import create_model

def train_model(train_pad, train_output, test_pad, test_output, vocabulary_size, embedding_matrix):
    # 300 is the GloVe embedding dimension and 50 is the padded input length
    model = create_model(vocabulary_size, 300, embedding_matrix, 50)
    model.compile(optimizer=tensorflow.keras.optimizers.SGD(0.1, momentum=0.09), loss='binary_crossentropy', metrics=['accuracy'])
    # Both callbacks monitor val_loss by default. With epochs=5, a patience of 5
    # can never trigger early stopping; raise epochs for it to take effect.
    early_stop = EarlyStopping(patience=5)
    reducelr = ReduceLROnPlateau(patience=3)
    model.fit(train_pad, train_output, validation_data=(test_pad, test_output),
              epochs=5, batch_size=256, callbacks=[early_stop, reducelr])
    # Save the trained model to the path that main.py loads it from
    model.save("artifacts/trained_model/model.h5")
    print("Training finished; model saved")
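
To see how the patience setting interacts with the number of epochs, the following pure-Python sketch mimics the patience rule of an early-stopping callback on a list of validation losses. It is a simplified re-implementation for illustration, not Keras code, and the loss values are made up.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it runs to the end."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

losses = [0.60, 0.55, 0.56, 0.57, 0.58, 0.59, 0.61]
print(early_stop_epoch(losses[:5], patience=5))  # only 5 epochs available
print(early_stop_epoch(losses, patience=3))
```

With only five epochs, a patience of 5 can never be reached, because the first epoch always counts as an improvement. A smaller patience or a larger epoch budget is needed for the rule to have any effect.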




				
			

App

				
import streamlit as st
import pickle
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model
import plotly.express as px
import pandas as pd

# Load the tokenizer and the trained model once at startup
with open('tokenizer/tokenizer.pkl', 'rb') as f:
    token_form = pickle.load(f)
model = load_model("model/model.h5")

if __name__ == '__main__':
    st.title('Suicidal Post Detection App ')
    st.subheader("Input the Post content below")
    sentence = st.text_input("Enter your post content here")
    predict_btt = st.button("Predict")
    
    if predict_btt:
        # Echo the post back to the user
        st.write("Post: " + sentence)
        # Convert the text to a padded sequence of word indices (length 50, as in training)
        twt = token_form.texts_to_sequences([sentence])
        twt = pad_sequences(twt, maxlen=50)

        # The sigmoid output is the predicted probability of the suicide class
        prediction = float(model.predict(twt)[0][0])
        suicide_pct = prediction * 100
        non_suicide_pct = 100 - suicide_pct
        st.write("Predicted probability of the suicide class: {:.3f}".format(prediction))

        # Report the predicted label using a 0.5 decision threshold
        if prediction < 0.5:
            st.warning("Non Suicide Post")
        else:
            st.warning("Potential Suicide Post")

        # Plot both class probabilities as a bar chart
        class_label = ["Potential Suicide Post", "Non Suicide Post"]
        prob_dict = {"Potential Suicide Post/Non Suicide Post": class_label, "Probability": [suicide_pct, non_suicide_pct]}
        df_prob = pd.DataFrame(prob_dict)
        fig = px.bar(df_prob, x='Potential Suicide Post/Non Suicide Post', y='Probability')
        model_option = "SuicideDetection"
        fig.update_layout(title_text="{} model - prediction probability comparison between Potential Suicide Post and Non Suicide Post".format(model_option))
        if prediction < 0.5:
            st.info("The {} model assigns a {:.1f}% probability to Non Suicide Post versus {:.1f}% to Potential Suicide Post".format(model_option, non_suicide_pct, suicide_pct))
        else:
            st.info("The {} model assigns a {:.1f}% probability to Potential Suicide Post versus {:.1f}% to Non Suicide Post".format(model_option, suicide_pct, non_suicide_pct))
        st.plotly_chart(fig, use_container_width=True)
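
The prediction path in the app boils down to two small steps around the model call: padding the token sequence to a fixed length, and thresholding the sigmoid output. A minimal pure-Python sketch of both follows. It assumes the Keras pad_sequences defaults (zeros added at the front, excess tokens dropped from the front), and the helper names are hypothetical.

```python
def pad_pre(sequence, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items if too long."""
    if len(sequence) >= maxlen:
        return sequence[-maxlen:]
    return [value] * (maxlen - len(sequence)) + sequence

def label_from_probability(prob, threshold=0.5):
    """Map a sigmoid output to the label text used by the app."""
    return "Potential Suicide Post" if prob >= threshold else "Non Suicide Post"

print(pad_pre([7, 3, 9], maxlen=5))
print(pad_pre([1, 2, 3, 4, 5, 6], maxlen=5))
print(label_from_probability(0.82))
```

Because truncation drops tokens from the front, only the last 50 tokens of a long post reach the model under these defaults.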

    

				
			

MLflow & App Output