Introductory discussion and questions about language models.
Hi, I’ve got a question about a use case. My client wants to know if we can create an NLP tool to read corporate documents (e.g. policies) from a Diversity, Equity and Inclusion perspective. For example, could it read a suite of documents, identify language that could be problematic, and suggest better alternatives? I’m thinking of this as analogous to tools that read legal documents for pre-defined clauses, or other tools that scan for Environment, Social and Governance considerations. Is this something you could help with? Would appreciate someone connecting with me to learn more. Cheers, John
Hi John. You can definitely build such a system, but it would be composed of multiple models.
One way to build the component that flags “problematic” content is to treat it like a toxicity classifier. You’ll need a labeled dataset with examples of sentences that are “problematic” and examples that are not. You can then get the embeddings of these sentences and train a “problematic sentence classifier” on them. You can find a skeleton for building this classifier in the Text Classification notebook at GitHub - cohere-ai/notebooks: Code examples and jupyter notebooks for the Cohere Platform.
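To make the classifier step concrete, here is a minimal, self-contained sketch. In a real pipeline the `embed()` function would call an embedding model (e.g. the embed endpoint); here it is a toy bag-of-words stand-in, and the vocabulary, labels, and example sentences are all hypothetical, so only the shape of the workflow carries over:

```python
# Sketch of the "problematic sentence classifier" step using a
# nearest-centroid classifier over toy embeddings. Everything here is
# illustrative; swap embed() for a real embedding model in practice.
from collections import Counter
import math

VOCAB = ["chairman", "manpower", "guys", "team", "staff", "chairperson"]

def embed(sentence: str) -> list:
    """Toy embedding: L2-normalized bag-of-words counts over a fixed vocabulary."""
    counts = Counter(word.strip(".,").lower() for word in sentence.split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]

# Tiny labeled dataset: 1 = problematic, 0 = fine (illustrative only).
train = [
    ("The chairman will allocate manpower to the project.", 1),
    ("Thanks guys, the chairman approved it.", 1),
    ("The chairperson will allocate staff to the project.", 0),
    ("Thanks team, the chairperson approved it.", 0),
]

pos_centroid = centroid([embed(s) for s, y in train if y == 1])
neg_centroid = centroid([embed(s) for s, y in train if y == 0])

def problematic_score(sentence: str) -> float:
    """Score in [0, 1]: higher means closer to the 'problematic' centroid."""
    v = embed(sentence)
    def dist(c):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, c)))
    d_pos, d_neg = dist(pos_centroid), dist(neg_centroid)
    return d_neg / (d_pos + d_neg) if (d_pos + d_neg) else 0.5

print(problematic_score("The chairman needs more manpower."))
```

With real embeddings you’d train a proper classifier (e.g. logistic regression) instead of nearest centroids, but the dataset-embed-train loop is the same.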
Then you can split your documents into sentences, pass the embedding of each sentence through the classifier, and take note of sentences with high “problematic” scores.
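The scanning step can be sketched like this. `score_sentence` is a hypothetical stand-in for the trained classifier, and the regex splitter is deliberately naive; a production system would use a proper sentence tokenizer (e.g. from spaCy or NLTK):

```python
# Sketch of the document-scanning step: split a document into sentences,
# score each one, and keep those above a threshold.
import re

def score_sentence(sentence: str) -> float:
    """Placeholder for the trained classifier; flags a couple of terms (illustrative only)."""
    flagged_terms = {"chairman", "manpower"}
    words = {w.strip(".,").lower() for w in sentence.split()}
    return 1.0 if words & flagged_terms else 0.0

def flag_sentences(document: str, threshold: float = 0.5):
    """Return (sentence, score) pairs whose score meets the threshold."""
    # Naive splitter: break after sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(s, score_sentence(s)) for s in sentences if score_sentence(s) >= threshold]

doc = ("Our policy values every employee. "
       "The chairman will review manpower needs quarterly. "
       "Benefits apply to all staff.")
for sentence, score in flag_sentences(doc):
    print(f"{score:.2f}  {sentence}")
```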
Suggesting alternatives is a paraphrasing task and can be done with generative models. See the Text Summarization notebook in the same GitHub repo above for an example. For your task, though, you’ll need a little more prompt engineering. Happy to advise on potential prompts once you reach that stage.
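As a starting point, a few-shot prompt for the rewriting step might look like the sketch below. The example pairs and wording are hypothetical and would need iteration against whichever generative model you use:

```python
# Hypothetical few-shot prompt for suggesting inclusive alternatives.
# The example rewrites are illustrative only.
PROMPT_TEMPLATE = """Rewrite the sentence using inclusive language.

Sentence: The chairman will brief the guys on Monday.
Rewrite: The chairperson will brief the team on Monday.

Sentence: We need more manpower for this project.
Rewrite: We need more staff for this project.

Sentence: {sentence}
Rewrite:"""

def build_prompt(sentence: str) -> str:
    """Fill the flagged sentence into the few-shot template."""
    return PROMPT_TEMPLATE.format(sentence=sentence)

prompt = build_prompt("Each salesman should update his pipeline weekly.")
# The prompt would then go to a generative model; parameters and call
# details depend on the SDK version you use.
print(prompt)
```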