ProceZeus
Documentation
You can learn more about ProceZeus by visiting the official documentation page. These docs were generated with mdBook.
Getting Started
Prerequisites
All of the project's services are split into separate Docker images, and all application dependencies are contained within those images. The only dependencies required to run this project locally are:
docker
docker-compose
Installing
To install Docker and Docker Compose, you can follow the instructions here.
Running the Entire Application Stack
We've developed a script, cjl, to help run the entire application with all its components. It's a thin wrapper around docker-compose, so all docker-compose commands will work with it. All you need is:
./cjl build && ./cjl up
To suppress output, run the containers in the background:
./cjl up -d
Docker doesn't always correctly detect differences between images. If a rebuild doesn't pick up your changes, you can destroy all Docker images on your host with:
./cjl clean [-y]
To run all tests and lints for all services:
./cjl test
To try to fix all linting errors for all services:
./cjl lint-fix
In order to shut down all containers:
./cjl down
Finally, to reset the database (this helps with inconsistent database states), run:
./cjl reset-db
The cjl script also accepts any other command that docker-compose accepts.
Running or Testing Specific Services
The following services can be run or tested individually:
- Web Client
- Backend Service
- Machine Learning Service
- Natural Language Processing Service
- PostgreSQL Database
Deployment
Deployment is done via Travis CI and Ansible. The most current version of the master branch is reflected on the Demo Page.
Architecture
The following architecture diagram represents the various services and the relationships they have with one another.
Contributing
See CONTRIBUTING.md for details.
Versioning
See the releases tab.
Authors
The following is a list of team members that are contributing to the project:
License
This project is licensed under the MIT License. See the LICENSE file for details.
Architecture and Infrastructure
Services
The project is split into several modules, one per microservice. Each microservice is described below:
Backend Service
This module responds to the web client's API queries. It is also the primary point of contact for the other microservices.
ML Service
This module is responsible for predicting outcomes and classifying cases based on precedent data.
Web Client
This module contains the web UI that users interact with.
PostgreSQL
This module contains the data persistence layer of our system.
NLP Service
This module is responsible for all natural language processing of the user's messages.
Infrastructure and Continuous Integration/Deployment
The ProceZeus application has many components and requires many binaries for its various machine learning models. We use docker and docker-compose to manage all of that complexity. Although you'll need to have these tools installed to build ProceZeus, we've hidden the gory details from you by providing a handy cjl script.
cjl is a thin wrapper around the docker-compose command, so any command that works with docker-compose should also work with cjl. In addition, cjl contains utility functions to automatically lint your code, run tests, reset the database, remove all Docker images, and more! You can read more about the functions available in cjl in the "Getting Started" section.
docker-compose configuration files are currently split by environment. Running ./cjl build && ./cjl up defaults to the dev environment, which uses the docker-compose.dev.yml configuration. You can select a different environment with the COMPOSE_FILE environment variable. For example, the CI environment uses this technique to run tests and upload code coverage reports instead of running the application.
We use Travis CI as our continuous integration service to run our tests on the latest changes pushed to Git. The details are in .travis.yml, but in short, Travis uses a build matrix to test each service in a separate job. We've also added a constraint that every commit message must begin with a reference to an issue (e.g. [#123]) to improve the traceability of our work. If a build fails, GitHub will not let you merge your changes.
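As an illustration of the commit-message rule, here is a minimal Python sketch of a local check. It assumes the required prefix is exactly a bracketed issue number such as [#123] at the very start of the message; the function name and regular expression are illustrative and are not taken from .travis.yml.

```python
import re

# Assumed rule: the message must start with "[#<issue number>]", e.g. "[#123] Fix login".
ISSUE_PREFIX = re.compile(r"^\[#\d+\]")


def has_issue_reference(commit_message: str) -> bool:
    """Return True if the commit message begins with an issue reference."""
    return ISSUE_PREFIX.match(commit_message) is not None


if __name__ == "__main__":
    for message in ["[#123] Add report endpoint", "Add report endpoint"]:
        verdict = "ok" if has_issue_reference(message) else "missing issue reference"
        print(f"{message!r}: {verdict}")
```

Git passes a commit-msg hook the path of a file containing the message as its first argument, so a hook could read that file and exit with a non-zero status whenever has_issue_reference returns False.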
Once a service's tests have finished running, its line coverage report is uploaded to CodeCov.io. This helps us maintain a reasonable level of test coverage for the application over time. The coverage check is informational only and is not required to pass for a build to be merged.
The server at capstone.cyberjustice.ca is fronted by nginx, which serves static files, as well as the machine learning model binaries required at build time, at https://capstone.cyberjustice.ca/data. The client is currently served at https://capstone.cyberjustice.ca by proxying the user's request to the server running in the web_client Docker service. All requests to https://capstone.cyberjustice.ca/api are also proxied through to the backend_service, which provides the client's REST API.
We use the default PostgreSQL Docker image as a persistent data store. At the moment, we do not keep track of database migrations, and the database is wiped and rebuilt on every deployment. You can do the same manually on your local machine with ./cjl reset-db. The services that require database access run an init.py script at build time that constructs the database models.
Web Client
Overview
The web client is a Vue.js application that integrates all microservices and provides a user interface to end users. The main goals of the web client are to 1) provide an on-screen chatbot experience to users and 2) deliver the system's features as a whole.
The major technologies we use in the web client are:
File Structure
Below is a list of the most important files and directories:
-/build // includes all webpack build configuration
-/config // includes all webpack environment configuration
-/src // source code of the vue.js application
-/test // unit tests
index.html // main index file
Dockerfile // docker configuration
package.json // application dependencies
When developing UI and features, you should mostly work in the src folder without touching the other directories and files.
Installation Instructions
The web client does not work unless the other microservices are running concurrently. In production and continuous integration, all services, including the web client, are built in Docker. However, for local development, Docker does not build the web client: the container is not rebuilt when the web client's code changes, so working inside the Docker environment would be inefficient.
To start working on the web client, make sure you have Node.js 8 installed (do not install v9.0+), then follow these steps:
- If you have not built the Docker images for the other microservices yet, run ./cjl up in the root directory of the repository. For more information, check the main README.
- Once the microservices are up, run npm install in the web client directory.
- When the installation is finished, run npm run start to start the application.
- While the application is running, you can edit the source code; the latest changes will be shown in the browser.
Developing Components
Under the src directory, you should see the application source code in the following folders:
-/assets // static assets such as images
-/components // reusable components
-/router // url router
-/theme // styling
Vue.js is a component-based JavaScript framework, so each .vue file defines a reusable component. Each component can run independently.
So far, our application has:
- Landing.vue: the landing page component, used to handle first-time users.
- Dashboard.vue: the main component, which contains the Sidebar.vue and Chat.vue components. The Sidebar.vue component handles the data display on the UI, while the Chat.vue component handles all the logic related to the chatbot.
- Legal.vue: the legal page component, used to fetch and show the latest Privacy Policy and End User License Agreement.
- Eventbus.js: a bus for communication between components.
A .vue file usually contains all the code a component needs (JavaScript, HTML, and CSS). To make our lives easier, all styling is configured and written in SASS and stored in the theme folder. To change the styling of the UI, you only need to edit the corresponding .scss file without touching the functional code.
Because the application is simple, we did not implement a state management architecture. As mentioned above, we use Eventbus.js to handle component communication. If you plan a major refactoring in the future, consider Vuex.
We use ElementUI as the UI library; we consider it the best UI library available for Vue.js. For best practice and code consistency, always check whether a feature can be implemented with an existing Element component.
Testing
The web client's unit tests use the default Vue.js unit test library, which is built on Mocha. To test the application locally, run npm run test.
All unit test files are stored in the test/unit directory. Each .spec.js file contains the unit tests for the corresponding component. Always make sure your new changes are well tested. Once you run the tests, a coverage report is generated in test/unit/coverage. You can open test/unit/coverage/lcov-report/index.html to see the visual report.
Due to the scope of the project, we did not implement automated end-to-end (E2E) tests. To add them, check out Nightwatch.js.
Reference
Backend Service
Run Tests and Lints
export COMPOSE_FILE=ci
./cjl up -d && ./cjl run backend_service
Backend API
Initialize a new conversation
Initializes a new conversation
URL : /new
Method : POST
Data constraints
Provide the user's name and person type.
{
"name": "[unicode 40 chars max]",
"person_type": "(TENANT|LANDLORD)"
}
Success Response
Code : 200 OK
Content examples
{
"conversation_id": 1
}
Error Response
Code : 400 Bad Request
- Invalid person_type provided
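A minimal sketch of calling this endpoint with Python's standard library. It assumes the backend is reached through the /api nginx proxy described in the infrastructure section (so the URL is https://capstone.cyberjustice.ca/api/new) and that the body is sent as JSON; the helper names are hypothetical. Checking the documented constraints on the client avoids a round trip that would end in 400 Bad Request.

```python
import json
import urllib.request

# Assumption: the backend is reached through the nginx /api proxy.
BASE_URL = "https://capstone.cyberjustice.ca/api"
PERSON_TYPES = {"TENANT", "LANDLORD"}


def build_new_conversation_request(name: str, person_type: str) -> urllib.request.Request:
    """Check the documented constraints, then build the POST /new request."""
    if len(name) > 40:
        raise ValueError("name must be at most 40 characters")
    if person_type not in PERSON_TYPES:
        # The backend would answer 400 Bad Request; fail before sending instead.
        raise ValueError(f"invalid person_type: {person_type!r}")
    body = json.dumps({"name": name, "person_type": person_type}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/new",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_conversation(name: str, person_type: str) -> int:
    """Send the request (network call) and return the new conversation_id."""
    request = build_new_conversation_request(name, person_type)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["conversation_id"]
```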
Store User Confirmation
Stores the user's confirmation, or the text they supplied, to confirm whether an NLP prediction was accurate. This request is sent when the user either accepts or rejects an intent classification via interface buttons.
URL : /store-user-confirmation
Method : POST
Data constraints
Provide the conversation id and confirmation text of the user.
{
"conversation_id": 1,
"confirmation": true | false | "$5000"
}
Success Response
Code : 200 OK
Content examples
{
"message": "User confirmation stored successfully"
}
Error Response
Code : 400 Bad Request
Code : 404 Not Found
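Because confirmation can be either a boolean (from the accept/reject buttons) or free text such as "$5000", a client has to serialize it without coercing it to a single type. A small sketch with a hypothetical helper name; Python's json module writes True/False as the JSON literals true/false.

```python
import json
from typing import Union

# The documented body accepts a boolean (accept/reject buttons) or free text such as "$5000".
Confirmation = Union[bool, str]


def confirmation_payload(conversation_id: int, confirmation: Confirmation) -> bytes:
    """Serialize a /store-user-confirmation request body (hypothetical helper)."""
    # Reject other types (e.g. a bare number) instead of silently sending them.
    if not isinstance(confirmation, (bool, str)):
        raise TypeError("confirmation must be a bool or a string")
    body = {"conversation_id": conversation_id, "confirmation": confirmation}
    return json.dumps(body).encode("utf-8")
```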
Send a message
Sends a message to the bot. The response contains the bot's reply, which is displayed to the user.
URL : /conversation
Method : POST
Data constraints
Provide the conversation_id and a message. The message should be an empty string for the first call.
{
"conversation_id": "[integer]",
"message": "[unicode]"
}
Success Response
Code : 200 OK
Content examples
Simple response containing a message and conversation progress. Messages may contain pipe characters (|), which indicate that the text should be split into separate conversation windows. The first sentence never begins with a pipe character; each subsequent sentence that is meant to be split begins with one.
Conversation progress is null at the beginning of the conversation and when the user asks an FAQ question (FAQs have no progress). However, if the user asks a question that requires the bot to resolve facts, progress indicates, as a percentage, how far along the conversation is toward getting a prediction.
Example 1:
{
"conversation_id": 1,
"message": "Hello Tim Timmens!|What kind of problem are you having?",
"progress": null
}
This is formatted in the web interface as:
Message 1: Hello Tim Timmens!
Message 2: What kind of problem are you having?
Example 2:
{
"conversation_id": 5,
"message": "Oh I see you're having problems with lease termination!|I have a few questions for you.|Do you have a lease?",
"progress": 0
}
This is formatted in the web interface as:
Message 1: Oh I see you're having problems with lease termination!
Message 2: I have a few questions for you.
Message 3: Do you have a lease?
Response containing a request for a file.
{
"conversation_id": 1,
"file_request": {
"document_type": "LEASE"
},
"message": "Could you please upload your lease if you have it, Tim Timmens?"
}
Document Types
LEASE: A lease for a dwelling
Error Response
Code : 404 Not Found
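The pipe-splitting, progress and file request rules above can be combined into one rendering step. This is a sketch of how a client might turn a /conversation response into display lines; the function names and output strings are hypothetical, and this is not the web client's actual (Vue.js) implementation.

```python
from typing import Any, Dict, List, Optional


def split_bot_message(message: str) -> List[str]:
    """Split a bot message on '|' into the separate chat windows it represents."""
    # Drop empty pieces, e.g. from a stray leading or trailing pipe.
    return [part.strip() for part in message.split("|") if part.strip()]


def describe_progress(progress: Optional[int]) -> str:
    """progress is null at the start and for FAQ answers, otherwise a percentage."""
    if progress is None:
        return "no progress"
    return f"{progress}% toward a prediction"


def render_response(response: Dict[str, Any]) -> List[str]:
    """Turn a /conversation response into lines a chat UI might display (hypothetical)."""
    parts = split_bot_message(response["message"])
    lines = [f"Message {i}: {text}" for i, text in enumerate(parts, start=1)]
    file_request = response.get("file_request")
    if file_request is not None:
        lines.append(f"[upload requested: {file_request['document_type']}]")
    # The file request example omits "progress", so only report it when present.
    if "progress" in response:
        lines.append(f"[{describe_progress(response['progress'])}]")
    return lines
```

For Example 2 above, render_response returns the three "Message n" lines shown in the formatted output, followed by "[0% toward a prediction]".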
Get a conversation history
Gets the message history for a conversation
URL : /conversation/:conversation_id
Method : GET
Success Response
Code : 200 OK
Content examples
{
"claim_category": "NONPAYMENT",
"bot_state": "RESOLVING_FACTS",
"current_fact": {
"name": "landlord_retakes_apartment",
"summary": "Landlord intends to retake dwelling",
"type": "BOOLEAN"
},
"fact_entities": [
{
"fact": {
"name": "apartment_impropre",
"summary": "Dwelling unfit for habitation",
"type": "BOOLEAN"
},
"id": 1,
"value": "false"
},
{
"fact": {
"name": "landlord_relocation_indemnity_fees",
"summary": "Relocation reimbursed following inhabitability",
"type": "BOOLEAN"
},
"id": 2,
"value": "true"
}
],
"files": [],
"id": 1,
"messages": [
{
"enforce_possible_answer": true,
"file_request": null,
"id": 1,
"possible_answers": "[\"Yes\"]",
"relevant_fact": null,
"sender_type": "BOT",
"text": "Hello Bobby! Before we start, I want to make it clear that I am not a replacement for a lawyer and any information I provide you with is not meant to be construed as legal advice. Always check in with your legal professional. You can read more about our terms of use <a href='/legal' target='_blank'>here</a>. Do you accept these conditions?",
"timestamp": "2017-12-20T01:27:35.993932+00:00"
},
{
"enforce_possible_answer": null,
"file_request": null,
"id": 2,
"possible_answers": null,
"relevant_fact": null,
"sender_type": "USER",
"text": "Yes",
"timestamp": "2017-12-20T01:27:39.023317+00:00"
},
{
"enforce_possible_answer": false,
"file_request": {
"document_type": "LEASE"
},
"id": 3,
"possible_answers": null,
"relevant_fact": null,
"sender_type": "BOT",
"text": "I see you're a tenant, Bobby. If you have it on hand, it would be very helpful if you could upload your lease. What issue can I help you with today?",
"timestamp": "2017-12-20T01:27:39.040375+00:00"
},
{
"enforce_possible_answer": null,
"file_request": null,
"id": 4,
"possible_answers": null,
"relevant_fact": null,
"sender_type": "USER",
"text": "I am being kicked out",
"timestamp": "2017-12-20T01:27:40.694884+00:00"
},
{
"enforce_possible_answer": false,
"file_request": null,
"id": 5,
"possible_answers": null,
"relevant_fact": {
"name": "apartment_impropre",
"summary": "Dwelling unfit for habitation",
"type": "BOOLEAN"
},
"sender_type": "BOT",
"text": "Oh yes, I know all about problems with nonpayment. Would you deem the apartment unfit for habitation?",
"timestamp": "2017-12-20T01:27:40.794129+00:00"
},
{
"enforce_possible_answer": null,
"file_request": null,
"id": 6,
"possible_answers": null,
"relevant_fact": {
"name": "apartment_impropre",
"summary": "Dwelling unfit for habitation",
"type": "BOOLEAN"
},
"sender_type": "USER",
"text": "No",
"timestamp": "2017-12-20T01:28:46.591573+00:00"
},
{
"enforce_possible_answer": false,
"file_request": null,
"id": 7,
"possible_answers": null,
"relevant_fact": {
"name": "landlord_relocation_indemnity_fees",
"summary": "Relocation reimbursed following inhabitability",
"type": "BOOLEAN"
},
"sender_type": "BOT",
"text": "Have moving expenses been compensated when the apartment was deemed inhabitable?",
"timestamp": "2017-12-20T01:28:46.652110+00:00"
},
{
"enforce_possible_answer": null,
"file_request": null,
"id": 8,
"possible_answers": null,
"relevant_fact": {
"name": "landlord_relocation_indemnity_fees",
"summary": "Relocation reimbursed following inhabitability",
"type": "BOOLEAN"
},
"sender_type": "USER",
"text": "Yes",
"timestamp": "2017-12-20T01:28:51.529825+00:00"
}
],
"name": "Bobby",
"person_type": "TENANT"
}
Error Response
Code : 400 Bad Request
Code : 404 Not Found
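Note that possible_answers in the history is itself a JSON-encoded string (for example "[\"Yes\"]" in the response above), so it has to be decoded a second time after the response is parsed. A sketch with hypothetical helper names; last_bot_choices also assumes that enforce_possible_answer being true means the user should pick one of the listed answers, which this page does not state explicitly.

```python
import json
from typing import Any, Dict, List


def possible_answers(message: Dict[str, Any]) -> List[str]:
    """Decode possible_answers, which the history returns as a JSON-encoded string."""
    raw = message.get("possible_answers")
    return [] if raw is None else json.loads(raw)


def last_bot_choices(history: Dict[str, Any]) -> List[str]:
    """Return the answer choices of the most recent bot message.

    Assumption: enforce_possible_answer == true means the user should pick
    one of possible_answers; otherwise free text is allowed and [] is returned.
    """
    bot_messages = [m for m in history["messages"] if m["sender_type"] == "BOT"]
    if not bot_messages or not bot_messages[-1].get("enforce_possible_answer"):
        return []
    return possible_answers(bot_messages[-1])
```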
Get a report for a conversation
Retrieves a report for a conversation once at least one prediction has been returned. Returns 404 if a report has not been generated yet.
URL : /conversation/:conversation_id/report
Method : GET
Success Response
Code : 200 OK
Content examples
{
"report": {
"accuracy": 0.8114285714285714,
"curves": {
"additional_indemnity_money": {
"mean": 1477.7728467101024,
"outcome_value": 6038,
"std": 1927.8147997893939,
"variance": 3716469.9022870203
}
},
"data_set": 8,
"outcomes": {
"additional_indemnity_money": 6038,
"landlord_prejudice_justified": true,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_resiliation": true,
"tenant_ordered_to_pay_landlord": 3092,
"tenant_ordered_to_pay_landlord_legal_fees": 80
},
"similar_case": 5,
"similar_precedents": [
{
"distance": 2.6080129205467784,
"facts": {
"landlord_relocation_indemnity_fees": 0,
"tenant_dead": false,
"tenant_is_bothered": false,
"tenant_left_without_paying": false,
"tenant_owes_rent": 0,
"tenant_rent_not_paid_more_3_weeks": true
},
"outcomes": {
"additional_indemnity_money": 5850,
"landlord_prejudice_justified": true,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_resiliation": true,
"tenant_ordered_to_pay_landlord": 7150,
"tenant_ordered_to_pay_landlord_legal_fees": 74
},
"precedent": "AZ-51412066"
},
{
"distance": 2.6543730465072035,
"facts": {
"landlord_relocation_indemnity_fees": 0,
"tenant_dead": false,
"tenant_is_bothered": false,
"tenant_left_without_paying": false,
"tenant_owes_rent": 2460,
"tenant_rent_not_paid_more_3_weeks": true
},
"outcomes": {
"additional_indemnity_money": 3620,
"landlord_prejudice_justified": true,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_resiliation": true,
"tenant_ordered_to_pay_landlord": 2460,
"tenant_ordered_to_pay_landlord_legal_fees": 81
},
"precedent": "AZ-51163532"
},
{
"distance": 2.6969256661279988,
"facts": {
"landlord_relocation_indemnity_fees": 0,
"tenant_dead": false,
"tenant_is_bothered": false,
"tenant_left_without_paying": false,
"tenant_owes_rent": 0,
"tenant_rent_not_paid_more_3_weeks": true
},
"outcomes": {
"additional_indemnity_money": 2463,
"landlord_prejudice_justified": true,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_resiliation": true,
"tenant_ordered_to_pay_landlord": 2886,
"tenant_ordered_to_pay_landlord_legal_fees": 83
},
"precedent": "AZ-51395624"
},
{
"distance": 2.719885995641093,
"facts": {
"landlord_relocation_indemnity_fees": 0,
"tenant_dead": false,
"tenant_is_bothered": false,
"tenant_left_without_paying": false,
"tenant_owes_rent": 0,
"tenant_rent_not_paid_more_3_weeks": true
},
"outcomes": {
"additional_indemnity_money": 3180,
"landlord_prejudice_justified": true,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_resiliation": true,
"tenant_ordered_to_pay_landlord": 3830,
"tenant_ordered_to_pay_landlord_legal_fees": 74
},
"precedent": "AZ-51395655"
},
{
"distance": 2.7806288394504484,
"facts": {
"landlord_relocation_indemnity_fees": 0,
"tenant_dead": false,
"tenant_is_bothered": false,
"tenant_left_without_paying": false,
"tenant_owes_rent": 3600,
"tenant_rent_not_paid_more_3_weeks": true
},
"outcomes": {
"additional_indemnity_money": 3750,
"landlord_prejudice_justified": true,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_resiliation": true,
"tenant_ordered_to_pay_landlord": 3600,
"tenant_ordered_to_pay_landlord_legal_fees": 78
},
"precedent": "AZ-51187376"
}
]
}
}
Error Response
Code : 400 Bad Request
Code : 404 Not Found
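The similar_precedents list carries a distance for each precedent, where a smaller distance presumably means a closer match. A sketch that sorts precedents by distance and pulls out a few headline fields; the summary keys (accuracy_pct, closest_precedent, predicted_outcomes) are hypothetical.

```python
from typing import Any, Dict, List


def precedents_by_distance(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Similar precedents sorted from smallest to largest distance."""
    return sorted(report["report"]["similar_precedents"], key=lambda p: p["distance"])


def summarize_report(report: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few headline fields out of a /report response (hypothetical keys)."""
    body = report["report"]
    precedents = precedents_by_distance(report)
    return {
        "accuracy_pct": round(body["accuracy"] * 100, 1),
        "closest_precedent": precedents[0]["precedent"] if precedents else None,
        "predicted_outcomes": body["outcomes"],
    }
```

With the example response above, summarize_report reports AZ-51412066 (distance 2.608) as the closest precedent.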
Get facts resolved during conversation
Gets only the list of resolved facts for the conversation
URL : /conversation/:conversation_id/resolved
Method : GET
Success Response
Code : 200 OK
Content examples
{
"fact_entities": [
{
"fact": {
"name": "apartment_impropre",
"summary": "Dwelling unfit for habitation",
"type": "BOOLEAN"
},
"id": 1,
"value": "false"
},
{
"fact": {
"name": "landlord_relocation_indemnity_fees",
"summary": "Relocation reimbursed following inhabitability",
"type": "BOOLEAN"
},
"id": 2,
"value": "true"
}
]
}
Error Response
Code : 400 Bad Request
Code : 404 Not Found
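Fact values come back as strings ("true"/"false") even for BOOLEAN facts. A sketch that converts them into Python values keyed by fact name; only the BOOLEAN type appears in these examples, so any other type is assumed here to be left as a string, and the helper name is hypothetical.

```python
from typing import Any, Dict, List, Union

FactValue = Union[bool, str]


def resolved_fact_values(fact_entities: List[Dict[str, Any]]) -> Dict[str, FactValue]:
    """Map each resolved fact's name to its value, typed where the type is known."""
    values: Dict[str, FactValue] = {}
    for entity in fact_entities:
        fact, raw = entity["fact"], entity["value"]
        if fact["type"] == "BOOLEAN":
            values[fact["name"]] = (raw == "true")
        else:
            # Assumption: other fact types are kept as the raw string.
            values[fact["name"]] = raw
    return values
```

For the response above this gives {"apartment_impropre": False, "landlord_relocation_indemnity_fees": True}.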
Remove a resolved fact
Removes a resolved fact from the conversation
URL : /conversation/:conversation_id/resolved/:fact_id
Method : DELETE
Success Response
Code : 200 OK
Content examples
{
"success": true
}
Error Response
Code : 400 Bad Request
Code : 404 Not Found
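A sketch of issuing this DELETE with the standard library. It assumes the /api base URL from the infrastructure section and that :fact_id is the id of an entry in fact_entities (as returned by the resolved facts endpoint), since the fact objects themselves carry no id; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the backend is reached through the nginx /api proxy.
BASE_URL = "https://capstone.cyberjustice.ca/api"


def delete_resolved_fact_request(conversation_id: int, fact_id: int) -> urllib.request.Request:
    """Build DELETE /conversation/:conversation_id/resolved/:fact_id."""
    url = f"{BASE_URL}/conversation/{conversation_id}/resolved/{fact_id}"
    return urllib.request.Request(url, method="DELETE")


def delete_resolved_fact(conversation_id: int, fact_id: int) -> bool:
    """Send the request (network call) and return the "success" flag.

    urlopen raises urllib.error.HTTPError for the 400/404 error responses.
    """
    request = delete_resolved_fact_request(conversation_id, fact_id)
    with urllib.request.urlopen(request) as response:
        return bool(json.load(response).get("success", False))
```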
Upload a file
Upload a file that serves as evidence for a particular conversation.
URL : /conversation/:conversation_id/files
Method : POST
Headers
Content-Type: multipart/form-data
Data constraints
Provide the file data under the 'file' form key.
Success Response
Code : 200 OK
Content examples
{
"name": "leaky_pipes.png",
"type": "image/png",
"timestamp": "2017-10-24T00:01:27.806730+00:00"
}
Error Response
Code : 400 Bad Request
Code : 404 Not Found
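Python's standard library has no multipart encoder, so this sketch builds the multipart/form-data body by hand, with the file under the 'file' form key. It assumes the /api base URL from the infrastructure section; the helper names are hypothetical, and filenames containing quotes or non-ASCII characters are not escaped.

```python
import mimetypes
import urllib.request
import uuid
from typing import Tuple

# Assumption: the backend is reached through the nginx /api proxy.
BASE_URL = "https://capstone.cyberjustice.ca/api"


def encode_file_field(filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file under the 'file' form key as multipart/form-data.

    Returns the body and the Content-Type header value (including the boundary).
    Sketch only: quotes or non-ASCII characters in filename are not escaped.
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(conversation_id: int, filename: str, data: bytes) -> urllib.request.Request:
    """Build POST /conversation/:conversation_id/files with the file attached."""
    body, content_type = encode_file_field(filename, data)
    return urllib.request.Request(
        f"{BASE_URL}/conversation/{conversation_id}/files",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

A third-party HTTP client such as requests can produce an equivalent body with requests.post(url, files={"file": f}).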
Get conversation file metadata
Gets a list of file metadata for a conversation
URL : /conversation/:conversation_id/files
Method : GET
Success Response
Code : 200 OK
Content examples
{
"files": [
{
"name": "leaky_pipes.png",
"type": "image/png",
"timestamp": "2017-10-24T00:01:27.000000+00:00"
},
{
"name": "my_least.pdf",
"type": "application/pdf",
"timestamp": "2017-10-24T00:01:30.000000+00:00"
}
]
}
Error Response
Code : 400 Bad Request
Code : 404 Not Found
Get Latest Legal Documents
Obtains information and contents of the latest legal documents
URL : /legal
Method : GET
Success Response
Code : 200 OK
Content examples
[
{
"abbreviation": "EULA",
"html": {
"content": [
{
"subtitle": "TL;DR",
"summary": "no purse as fully me or point. Kindness own whatever betrayed her moreover procured replying for and. Proposal indulged no do do sociable he throwing settling. Covered ten nor comfort offices carried. Age she way earnestly the fulfilled extremely.",
"text": "Prevailed sincerity behaviour to so do principle mr. As departure at no propriety zealously my. On dear rent if girl view. First on smart there he sense. Earnestly enjoyment her you resources. Brother chamber ten old against. Mr be cottage so related minuter is. Delicate say and blessing ladyship exertion few margaret. Delight herself welcome against smiling its for. Suspected discovery by he affection household of principle perfectly he.",
"title": "DESCRIPTION OF SERVICE"
},
{
"subtitle": "TL;DR",
"summary": "Scarcely on striking packages by so property in delicate. Up or well must less rent read walk so be. Easy sold at do hour sing spot. Any meant has cease too the decay. Since party burst am it match. By or blushes between besides offices noisier as.",
"text": "It prepare is ye nothing blushes up brought. Or as gravity pasture limited evening on. Wicket around beauty say she. Frankness resembled say not new smallness you discovery. Noisier ferrars yet shyness weather ten colonel. Too him himself engaged husband pursuit musical. Man age but him determine consisted therefore. Dinner to beyond regret wished an branch he. Remain bed but expect suffer little repair.",
"title": "ACCEPTANCE OF TERMS"
},
{
"subtitle": "TL;DR",
"summary": "Luckily friends do ashamed to do suppose. Tried meant mr smile so. Exquisite behaviour as to middleton perfectly.",
"text": "He my polite be object oh change. Consider no mr am overcame yourself throwing sociable children. Hastily her totally conduct may. My solid by stuff first smile fanny. Humoured how advanced mrs elegance sir who. Home sons when them dine do want to. Estimating themselves unsatiable imprudence an he at an. Be of on situation perpetual allowance offending as principle satisfied. Improved carriage securing are desirous too.",
"title": "MODIFICATION OF TERMS"
},
{
"subtitle": "TL;DR",
"summary": "Improved own provided blessing may peculiar domestic. Sight house has sex never. No visited raising gravity outward subject my cottage mr be. Hold do at tore in park feet near my case.",
"text": "Extremely we promotion remainder eagerness enjoyment an. Ham her demands removal brought minuter raising invited gay. Contented consisted continual curiosity contained get sex. Forth child dried in in aware do. You had met they song how feel lain evil near. Small she avoid six yet table china. And bed make say been then dine mrs. To household rapturous fulfilled attempted on so. ",
"title": "REGISTRATION"
}
],
"header": "End User License Agreement",
"subheader": "Savings her pleased are several started females met. Short her not among being any. Thing of judge fruit charm views do. Miles mr an forty along as he. She education get middleton day agreement performed preserved unwilling. Do however as pleased offence outward beloved by present. By outward neither he so covered amiable greater. Juvenile proposal betrayed he an informed weddings followed. Precaution day see imprudence sympathize principles. At full leaf give quit to in they up."
},
"time_created": "2017-10-26T20:52:41-04:00",
"type": "End User License Agreement",
"version": 1
}
]
Natural Language Processing Service
Run Tests and Lints
./cjl test nlp_service
Warning: Have at least 8GB of RAM available to run the nlp_service tests.
If you want to run the tests locally, we recommend an Intel i5 Broadwell CPU or better.
While working on Natural Language Processing, the team used atomic commits and pushes so that the tests ran on its continuous integration tool (Travis in this case).
Installing requirements
pip3 install -r requirements_test.txt
NLP API
Classify claim category
Extract a claim category from a user's message. Returns a question based on the claim category found, or a clarification question.
URL : /claim_category
Method : POST
Data constraints
Provide the conversation id and the message.
{
"conversation_id": 1,
"message": "I am being evicted"
}
Success Response
Code : 200 OK
Content examples
{
"message": "I see you're having problems with lease termination. Have you kept up with your rent payments?",
"progress": 0
}
Error Response
Code : 400 Bad Request
- Inputs not provided
Code : 404 Not Found
- Conversation doesn't exist
Submit message
Submits a user input to the NLP service. Returns the next question to ask, or a clarification question.
URL : /submit_message
Method : POST
Data constraints
Provide the conversation id and the message.
{
"conversation_id": 1,
"message": "My rent is $900 per month."
}
Success Response
Code : 200 OK
Content examples
{
"message": "Have you kept up with your rent payments?",
"progress": 10
}
Error Response
Code : 400 Bad Request
- Inputs not provided
Code : 404 Not Found
- Conversation doesn't exist
RASA JSON Tool
The util.parse_dataset module and its associated CreateJson class can be used to create JSON training data for RASA NLU.
Format
[meta]
() = entity_name1, entity_extractor(optional)
{} = entity_name2, entity_extractor(optional)
[regex_features]
name:regex
[entity_synonyms]
entity:synonym1, synonym2
[common_examples: intent_name1]
sentence1
sentence2
[common_examples: intent_name2]
sentence1
sentence2
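- [] are reserved characters used to identify sections.
- The meta section defines the delimiter characters that mark entities in example sentences, followed by the entity name and an optional entity extractor.
- regex_features lists one name:regex pair per line.
- entity_synonyms lists one entity per line, followed by a colon and a comma-separated list of its synonyms.
- common_examples: intent_name lists example sentences for the intent intent_name, one per line.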
Example
[meta]
() = money, ner_duckling
[regex_features]
money:$\d(.)?+|\d(.)?+$
[common_examples: true]
my landlord increased my rent by ($500)
i owe my landlord (40 dollars)
[common_examples: false]
i don't owe my landlord any money
i dont have any debts
no
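To make the format concrete, here is a minimal parser sketch that turns it into the rasa_nlu_data JSON structure (common_examples, regex_features, entity_synonyms) used by RASA NLU training files. It is not the project's CreateJson class: the function names are hypothetical, and the optional entity_extractor from the meta section is parsed but not written to the output, because this page does not say how CreateJson uses it.

```python
import re
from typing import Any, Dict, List, Optional

# "[section]" or "[common_examples: intent_name]"
SECTION = re.compile(r"^\[(?P<name>[^\]:]+)(?::\s*(?P<intent>[^\]]+))?\]$")


def parse_example(line: str, intent: str, delimiters: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Remove entity delimiters from one sentence, recording each entity's span."""
    text, entities, i = "", [], 0
    while i < len(line):
        info = delimiters.get(line[i])
        if info is None:
            text += line[i]
            i += 1
            continue
        end = line.index(info["close"], i + 1)  # ValueError if the entity is not closed
        value = line[i + 1:end]
        entities.append({"start": len(text), "end": len(text) + len(value),
                         "value": value, "entity": info["entity"]})
        text += value
        i = end + 1
    return {"text": text, "intent": intent, "entities": entities}


def parse_dataset(source: str) -> Dict[str, Any]:
    """Convert the sectioned text format into a rasa_nlu_data dictionary."""
    delimiters: Dict[str, Dict[str, Any]] = {}  # opening character -> entity info
    data: Dict[str, List[Any]] = {"common_examples": [], "regex_features": [], "entity_synonyms": []}
    section: Optional[str] = None
    intent = ""
    for line in (raw.strip() for raw in source.splitlines()):
        if not line:
            continue
        header = SECTION.match(line)
        if header:
            section = header.group("name").strip()
            intent = (header.group("intent") or "").strip()
        elif section == "meta":
            # e.g. "() = money, ner_duckling"; the extractor is kept but unused here.
            pair, _, rhs = line.partition("=")
            pair = pair.strip()
            name, *extractor = [part.strip() for part in rhs.split(",")]
            delimiters[pair[0]] = {"close": pair[1], "entity": name,
                                   "extractor": extractor[0] if extractor else None}
        elif section == "regex_features":
            name, _, pattern = line.partition(":")
            data["regex_features"].append({"name": name.strip(), "pattern": pattern.strip()})
        elif section == "entity_synonyms":
            value, _, synonyms = line.partition(":")
            data["entity_synonyms"].append(
                {"value": value.strip(), "synonyms": [s.strip() for s in synonyms.split(",")]})
        elif section == "common_examples":
            data["common_examples"].append(parse_example(line, intent, delimiters))
    return {"rasa_nlu_data": data}
```

json.dumps(parse_dataset(text), indent=2) then gives the training file contents. For the example above, the first sentence becomes "my landlord increased my rent by $500", with a money entity spanning characters 33 to 37.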
Command Line Use
python3 -m util.parse_dataset <read_dir> <write_dir>
Example
python3 -m util.parse_dataset ~/Documents/ ~/Documents/Json/
DO NOT FORGET THE '/' AT THE END OF YOUR DIRECTORY
Outlier detection
As of April 10th, 2018, outlier detection is not used by the NLP service, due to a lack of data on what counts as an "outlier answer".
Adding a new claim category to the product
There are two kinds of claim categories:
- Developed claim categories have:
  - a series of questions that the user answers to resolve facts
  - multiple outcomes dynamically calculated by the ml_service
  - a conclusive view with a dashboard containing the resolved facts and the legal cases most similar to the user's
- FAQs have:
  - one long, developed answer summarized from websites such as Regie du logement, Educaloi or LikeHome.
To add a new claim category:
- Add the claim category to "conversation.claim_category" inside the "classify_claim_category" function in nlp_service/controllers/nlp_controller.py.
- Define the new claim category in the "ClaimCategory" class in postgresql_db/models.py.
- Define the new category in the matching *.txt file in nlp_service/rasa/text/category: category_tenant.txt for a tenant category or category_landlord.txt for a landlord category. We recommend distinguishing FAQ categories from developed ones by naming them "faq_AbbreviationOfSource_factname".
- If the new claim category is an FAQ, write its response in nlp_service/services/response_strings.py.
- At this stage you should have either a complete FAQ or an empty developed claim category, to which you'll have to add facts (see the following section).
Adding a new fact (includes adding new questions)
- Add the new fact to postgresql_db/models.py, along with the type of answer you expect for it and its summary (the definition displayed on the front-end).
- Add the question that resolves your new fact to "fact_questions" in nlp_service/services/response_strings.py.
- If the fact cannot be answered with a generic "yes or no", add a {name_of_fact}.txt file for it in nlp_service/rasa/text/fact/individual.
- If the fact can be answered with a generic "yes or no", add its name to "fact_names" in nlp_service/init_rasa.py.
Adding a new outcome or a response (this section is only useful for developed claim categories)
- Add the outcome(s) you want to be checked by the ml_service to the desired developed claim categories in nlp_service/services/fact_service.py in "outcome_mapping"
- Tell the system what to say if the ml_service returns the outcome as "True" (it will happen) or "False" (it won't happen) in nlp_service/services/response_strings.py in "prediction"
Retrain models
The models are retrained every time the project is (re)built.
Training is triggered from init.py and runs whenever the force_train parameter of the train function in nlp_service/rasa/rasa_classifier.py is set to true. The models are loaded in nlp_service/controllers/nlp_controller.py, where force_train is set to false and initialize_interpreter is set to true.
Working with RASA
The team made RASA NLU a core part of its Natural Language Processing component. Its documentation is available here, and an active Gitter channel is available here.
Configuration:
The team experimented with multiple pipelines and found Spacy 2.0 to be far superior to MITIE. Our config file can be found at ~/nlp_service/rasa/config/rasa_config.json.
Components:
- nlp_spacy: initializes spacy structures
- tokenizer_spacy: creation of tokens using Spacy
- intent_entity_featurizer_regex: uses regular expressions to aid in intent and entity classification (ONLY SUPPORTED BY NER_CRF)
- ner_crf: entity extractor using conditional random fields
- ner_synonyms: maps two or more extracted entity values to the same value
- intent_classifier_sklearn: classifies intents of the text being parsed
- duckling: extraction of pre-trained entities such as money, time, dates, etc.
We do not recommend "ner_spacy" as a replacement for "ner_crf" because it does not provide confidence scores for entity extraction. We also strongly advise against using more than 1 thread or more than 1 process, due to stability issues with duckling.
Achieving results:
Things to know that are not mentioned in the RASA documentation:
- Proper use of the intent_entity_featurizer_regex will often drastically improve the intent confidence percentage (up to 40%)
- Apply regexes to sections of common examples that are unique to a specific intent (e.g. a regex on the word "tax", which is very likely to appear only when the user wants information about their RL-31 slip)
- Regexes only help with intent confidence, not entity confidence. (We learned this in a conversation with RASA contributors on Gitter.)
- Working with common examples
- "I'm", "Im" and "I am" count as different words for Spacy. Avoid using these forms in common examples.
- Capitals matter. Lowercasing our data sets, and consistently lowercasing the user's input before passing it to the NLP, drastically improved the confidence percentage.
- Avoid fluff (stop words) in the common examples so that a proper word vector can be calculated. (e.g. a trailing "can you help me with this?" alters the vector calculated for the intent's common example, so it should be deleted.)
- Working with entities
- We strongly suggest using entity_synonyms not only for different variations of the entity you are attempting to extract but also for common spelling mistakes of the entities
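The tips above can be combined in the training data. The sketch below assumes the pre-1.0 RASA NLU JSON training format (`rasa_nlu_data` with `common_examples`, `regex_features` and `entity_synonyms`); the intent name, example sentences and synonyms are hypothetical, and the same `normalize` function is meant to be applied to the user's input before it reaches the interpreter.

```python
import json


def normalize(text):
    """Trim and lowercase text; apply the same step to training data and user input."""
    return text.strip().lower()


def common_example(text, intent, entities=()):
    """Build one common example with lowercased text."""
    return {"text": normalize(text), "intent": intent, "entities": list(entities)}


training_data = {
    "rasa_nlu_data": {
        # Short examples, no fluff such as "can you help me with this?"
        "common_examples": [
            common_example("I need my RL-31 slip for my taxes", "rl31_slip"),
            common_example("Where is my tax receipt", "rl31_slip"),
        ],
        # A word that almost only appears for one intent boosts that intent's confidence.
        "regex_features": [{"name": "tax", "pattern": "tax"}],
        # Variations and common misspellings mapped to a single value.
        "entity_synonyms": [
            {"value": "landlord", "synonyms": ["landlady", "lanlord", "landord"]},
        ],
    }
}

if __name__ == "__main__":
    print(json.dumps(training_data, indent=2))
```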
Machine Learning Service
1. Overview
The machine learning service is responsible for predicting the outcomes of a user's case.
Outcomes can be categorized either as True/False or as a numerical value. Whether a given outcome is boolean or numerical is decided by a human and specified to the system beforehand (see section 1.6). This sub-system therefore uses both classifiers and regressors to make predictions. The inputs for both the classifier and the regressor are the facts extracted from the user's input. An array of outcomes is then returned.
1.1 Data Representation
The input and output data are all represented numerically, even when they are boolean values. The values are interpreted as follows:
0 --> False / Null
1 --> True
(n > 1) --> True AND Numerical
Numerical Values consist of:
- Dates / Time (in months)
- Money (in $)
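The encoding above can be read back with a small helper like the one below. It is a sketch for illustration (the name `decode` is hypothetical) that simply applies the three rules listed in this section and rejects anything else.

```python
def decode(value):
    """Interpret one encoded fact/outcome value using the rules of section 1.1.

    Returns an (is_true, amount) pair; amount is only set for numerical
    values (n > 1), e.g. a number of months or an amount in dollars.
    """
    if value == 0:
        return False, None  # False / Null
    if value == 1:
        return True, None  # True
    if value > 1:
        return True, value  # True AND numerical
    raise ValueError(f"not a valid encoded value: {value}")
```

For example, an outcome value of 221 for additional_indemnity_money decodes to `(True, 221)`.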
1.2 Facts / Input
The inputs are stored in a numpy array containing only integers, with the possible values listed in section 1.1. Each index of the array represents a different fact (input data point) used by the machine learning. The indexes of the facts are determined when the precedents are tagged, and their order may change when the data is re-tagged. An input array looks like this:
[fact_1, fact_2, ..., fact_n]
Here is an example that retrieves the labels for each column:
from feature_extraction.post_processing.regex.regex_tagger import TagPrecedents
indexes = TagPrecedents().get_intent_index()
# print a sample of the content
for i in indexes['outcomes_vector'][:3]:
    print(i)
- output:
(0, 'additional_indemnity_money', 'bool')
(1, 'declares_resiliation_is_correct', 'bool')
(2, 'landlord_serious_prejudice', 'bool')
- structure for 'indexes' variable:
{
'outcomes_vector': [
(array_index, column_label, column_type),
(array_index, column_label, column_type)
],
'facts_vector': [
(array_index, column_label, column_type),
(array_index, column_label, column_type)
]
}
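Given the `indexes` structure above, a dictionary of fact values keyed by label can be placed into an input array as sketched below. The helper `to_vector` and the sample entries (including the 'money' type string) are hypothetical; the sketch assumes the array indexes run from 0 to n - 1 with no gaps.

```python
import numpy as np


def to_vector(values_by_label, index_entries):
    """Place each labelled value at the array index assigned by the tagger.

    index_entries is a list of (array_index, column_label, column_type) tuples,
    such as indexes['facts_vector']. Labels absent from values_by_label stay 0
    (False / Null, see section 1.1).
    """
    vector = np.zeros(len(index_entries), dtype=int)
    for array_index, column_label, _column_type in index_entries:
        vector[array_index] = values_by_label.get(column_label, 0)
    return vector


# Hypothetical entries in the same (index, label, type) shape as above
facts_index = [
    (0, "asker_is_landlord", "bool"),
    (1, "tenant_owes_rent", "money"),
    (2, "violent", "bool"),
]
vector = to_vector({"tenant_owes_rent": 970, "asker_is_landlord": 1}, facts_index)
```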
1.3 Outcomes / Output
Similarly to section 1.2, the output is an array of integers whose length equals the number of outcomes supported by the system. See section 1.1 for how the values are interpreted.
1.4 Classification
A multi-output classifier is used to predict all outcomes. Behind the scenes, SkLearn uses a separate estimator per outcome to perform this task. When a prediction is requested, ALL outcomes are first classified as True or False, including the numerical ones. If an outcome is expected to be a numerical value AND it is classified as True, the input is then passed to the appropriate regressor to predict the outcome's numerical value. Otherwise, no further processing is needed and the classifier's prediction is returned for that column.
Adding a new classifier
New classifiers are trained automatically when regexes are added. See section 1.6.
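The classify-then-regress flow described above can be sketched with scikit-learn as follows. This is not the project's code: a decision tree and a linear regression stand in for the SVM classifier and the custom regressors, and the toy data and the `NUMERICAL` set are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.preprocessing import binarize
from sklearn.tree import DecisionTreeClassifier

# Toy data: 3 facts, 2 outcomes. Outcome 0 is boolean, outcome 1 is a money amount.
X = np.array([[1, 0, 900], [0, 1, 0], [1, 1, 800], [0, 0, 0], [1, 0, 700], [0, 1, 0]])
y = np.array([[1, 900], [0, 0], [1, 800], [0, 0], [1, 700], [0, 0]])
NUMERICAL = {1}  # indexes of outcomes that need a regressor

# One estimator per outcome; every outcome (numerical ones too) is fitted as True/False.
classifier = MultiOutputClassifier(DecisionTreeClassifier(random_state=0))
classifier.fit(binarize(X, threshold=0), binarize(y, threshold=0).astype(int))

# Each regressor only sees precedents where its outcome was True.
regressors = {}
for j in NUMERICAL:
    true_rows = y[:, j] > 0
    regressors[j] = LinearRegression().fit(X[true_rows], y[true_rows, j])


def predict(facts):
    """Classify every outcome, then regress the numerical ones classified as True."""
    facts = np.asarray(facts).reshape(1, -1)
    outcomes = classifier.predict(binarize(facts, threshold=0))[0].astype(float)
    for j, regressor in regressors.items():
        if outcomes[j] == 1:
            outcomes[j] = regressor.predict(facts)[0]
    return outcomes
```

An outcome classified as False never reaches its regressor, which is why the regressor can be trained on True precedents only.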
1.5 Regression
The regressors are only used if the classifier predicted an outcome as True. This is because the regressors are trained on biased data: only precedents where the outcome is known to be True. The input data must therefore be biased towards the same end goal.
During training (for regression only), the average value of every fact in the data set is computed. The vector looks like this:
[average_column_1, average_column_2, ..., average_column_n]
This vector is kept in binary format and can be retrieved this way:
from util.file import Load
mean_facts_vector = Load.load_binary('model_metrics.bin')['regressor'][<name of the regressor>]['mean_facts_vector']
Regression fine tuning
When making a regression prediction, the user's input is entered as an array of numerical values, as in section 1.2.
- Wherever a 0 is encountered in the user's input, it is replaced with the average value of its column. Missing input is thus filled with a typical value, which gives a better fit on the curve and more accurate results when the regressor is used.
- During training, outliers in the dataset are removed. An outcome is considered an outlier when:
abs(outcome - average_of_outcomes) > (2 * std_of_outcomes)
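The two fine-tuning steps above can be sketched with numpy as below. The function names are hypothetical, and the sketch assumes the population standard deviation (numpy's default `ddof=0`), which the formula does not specify.

```python
import numpy as np


def mean_facts_vector(X):
    """Average of every fact column over the training set."""
    return np.asarray(X, dtype=float).mean(axis=0)


def fill_missing(facts, means):
    """Replace each 0 in the user's input with the average of its column."""
    facts = np.asarray(facts, dtype=float)
    return np.where(facts == 0, means, facts)


def remove_outliers(X, y):
    """Keep only rows whose outcome is within 2 standard deviations of the mean."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    keep = np.abs(y - y.mean()) <= 2 * y.std()
    return X[keep], y[keep]
```

For example, with outcomes of nine 10s and one 1000, the mean is 109 and the standard deviation 297, so only the 1000 (891 away from the mean) is removed.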
Adding a new regressor
The regressors' estimators are crafted manually, rather than through the SkLearn wrapper used in section 1.4, because each regressor requires much more individual attention. A custom wrapper is written instead, and every new regressor should inherit from the AbstractRegressor class.
- Code the new regressor (inherit from abstract_regressor.py)
- Update multi_output_regression.py to accommodate the new class
1.6 Adding new columns (input/output)
Adding new columns is fairly simple. In the feature_extraction/post_processing/regex/regex_lib.py file, simply append your regex to the regex_facts or regex_outcomes list. The syntax is the following:
regex_facts = [
    (
        <column_label>, [
            re.compile(<regex_1>, re.IGNORECASE),
            re.compile(<regex_2>, re.IGNORECASE),
            re.compile(<regex_n>, re.IGNORECASE)
        ],
        <data_type>
    ),
    (
        <column_label>, [
            re.compile(<regex_1>, re.IGNORECASE),
            re.compile(<regex_2>, re.IGNORECASE),
            re.compile(<regex_n>, re.IGNORECASE)
        ],
        <data_type>
    ),
]
Add as many regular expressions as needed to cover the whole dataset. When the data is tagged, the percentage of tagged lines is displayed.
Note: <data_type> are the following strings:
- "BOOLEAN"
- "MONEY"
- "DATE"
The newly added columns in the regex_lib.py file will then automatically be used the next time the machine learning performs its training, provided that the data has been re-post-processed. Be sure to create a regressor if you want to predict "DATE" or "MONEY" though (see section 1.5).
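As an illustration of the syntax, the sketch below defines two hypothetical columns with French patterns (the precedents appear to be in French) and shows how lines might be tagged and the percentage of tagged lines computed. The labels' patterns are placeholders, and `tag_line` / `percent_tagged` are not the functions of regex_tagger.py.

```python
import re

# Hypothetical entries following the (label, [regexes], data_type) syntax above.
regex_facts = [
    (
        "tenant_owes_rent", [
            re.compile(r"loyers? impayés?", re.IGNORECASE),
            re.compile(r"arriérés? de loyer", re.IGNORECASE),
        ],
        "MONEY"
    ),
    (
        "violent", [
            re.compile(r"\bviolen(?:t|te|ce)\b", re.IGNORECASE),
        ],
        "BOOLEAN"
    ),
]


def tag_line(line, regexes):
    """Return the labels of every column whose regexes match the line."""
    return [label for label, patterns, _data_type in regexes
            if any(pattern.search(line) for pattern in patterns)]


def percent_tagged(lines, regexes):
    """Percentage of lines matched by at least one regex."""
    if not lines:
        return 0.0
    tagged = sum(1 for line in lines if tag_line(line, regexes))
    return 100.0 * tagged / len(lines)
```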
2. DATA
All persistent machine learning data are stored as binaries. To centralize this information, it is advised to upload the models to a server. The models can then be fetched by the init.py script in the source directory (not to be confused with __init__.py). Simply append your download link to the binary_urls list in this file.
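The download step might look like the sketch below. It is not the project's init.py: the function name, the skip-if-present behaviour and the default destination are assumptions, and authentication (if the server needs it) is not handled.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Hypothetical list; the real one is binary_urls in init.py.
binary_urls = []


def download_binaries(urls, destination="binary/data"):
    """Fetch each URL into destination, skipping files that already exist."""
    os.makedirs(destination, exist_ok=True)
    saved = []
    for url in urls:
        file_name = os.path.basename(urlparse(url).path)  # e.g. model_metrics.bin
        target = os.path.join(destination, file_name)
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
        saved.append(target)
    return saved
```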
2.1 Accessing binary data
To load a binary file, first make sure it is stored in the binary/data/ folder. This should be done automatically by init.py. Then simply use the following:
from util.file import Load
Load.load_binary(<binary_file_name>)
2.2 Saving binary data
To save a binary file use the following:
from util.file import Save
Save().save_binary(<desired_binary_file_name>, model)
The output directory will be binary/data/ by default.
2.3 Global Variables
Some global variables are listed in util/constant.py
2.4 Binary file content
classifier_labels.bin
{
outcome_index_0 <int>: (
column_label <str>,
column_type <str>
),
outcome_index_n <int>: (
column_label <str>,
column_type <str>
),
}
model_metrics.bin
{
'data_set':{
'size': <int>
},
'classifier':{
classifier_name_0 <str>: {
'prediction_accuracy': <float>
},
classifier_name_n <str>: {
'prediction_accuracy': <float>
}
},
'regressor':{
regressor_name_0 <str> :{
'std': <float>,
'variance': <float>,
'mean_facts_vector': <numpy.array>
},
regressor_name_n <str> :{
'std': <float>,
'variance': <float>,
'mean_facts_vector': <numpy.array>
}
}
}
multi_class_svm_model.bin
Used to predict classifier results
from util.file import Load
from sklearn.preprocessing import binarize
model = Load.load_binary("multi_class_svm_model.bin")
# maps each outcome index to its (column_label, column_type)
classifier_labels = Load.load_binary('classifier_labels.bin')
input_vector = [fact_1, fact_2, ..., fact_n]
# the classifier only sees True/False: every value > 0 becomes 1
data = binarize([input_vector], threshold=0)
prediction = model.predict(data)
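`prediction` holds one row of values in outcome-index order. A hypothetical helper like the one below pairs those values with the labels stored in classifier_labels.bin, assuming the {index: (label, type)} layout documented above; the sample labels and type strings are made up.

```python
def label_prediction(prediction_row, classifier_labels):
    """Map each predicted value to its outcome label.

    classifier_labels follows the classifier_labels.bin layout:
    {outcome_index: (column_label, column_type)}
    """
    return {
        column_label: prediction_row[outcome_index]
        for outcome_index, (column_label, _column_type) in classifier_labels.items()
    }


# Hypothetical labels and prediction row
labels = {0: ("additional_indemnity_money", "int"), 1: ("orders_expulsion", "bool")}
print(label_prediction([1, 0], labels))
```

With the loaded binaries, the call would be `label_prediction(prediction[0], classifier_labels)`.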
precedent_vectors.bin
{
<precedent_id> <str>:{
'outcomes_vector': numpy.array,
'facts_vector': numpy.array,
'file_number': <str>,
'name': AZ-********.txt <str>
}
}
similarity_case_numbers.bin
An array of all case numbers. This is used to map the indices (returned by the similarity model) to case numbers
[
'AZ-XXXXXX',
'AZ-XXXXXX',
'AZ-XXXXXX',
'AZ-XXXXXX'
...
]
similarity_model.bin
Case similarity comparator. It uses a nearest neighbours algorithm and is set to return the 5 nearest neighbours.
Input: a vector which is the concatenation of the facts vector and the outcomes vector
Output: the indices of the 5 most similar cases (which map directly to case numbers using similarity_case_numbers, see above)
from util.file import Load
model = Load.load_binary("similarity_model.bin")
facts_vector = [fact_1, fact_2, ..., fact_n]
outcomes_vector = [outcome_1, outcome_2, ..., outcome_n]
input_vector = facts_vector + outcomes_vector  # list concatenation
# kneighbors expects a 2D input and returns (distances, indices)
distances, indices = model.kneighbors([input_vector])
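The returned indices still need to be mapped to case numbers through similarity_case_numbers.bin. The sketch below shows that mapping with scikit-learn's NearestNeighbors fitted on toy data; it assumes the saved model behaves like NearestNeighbors (as the `kneighbors` call suggests), uses 2 neighbours instead of 5 to fit the toy data, and the case numbers are placeholders.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def similar_cases(model, case_numbers, facts_vector, outcomes_vector):
    """Return (case_number, distance) pairs, most similar first."""
    query = np.concatenate([facts_vector, outcomes_vector]).reshape(1, -1)
    distances, indices = model.kneighbors(query)
    return [(case_numbers[i], float(d)) for i, d in zip(indices[0], distances[0])]


# Toy stand-ins for similarity_model.bin and similarity_case_numbers.bin
cases = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
case_numbers = ["AZ-0000001", "AZ-0000002", "AZ-0000003"]
model = NearestNeighbors(n_neighbors=2).fit(cases)
result = similar_cases(model, case_numbers, [1, 0], [1])
```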
*_scaler.bin
Every machine learning model requires a scaler to transform the data into scaled values, which greatly speeds up training.
*_regressor.bin
Models used to predict regression results
from util.file import Load
from keras.models import load_model
import os
# Path (see util/constant.py) and AbstractRegressor (see abstract_regressor.py) must also be imported
file_path = os.path.join(Path.binary_directory, '<regressor_name>')
regressor = load_model(file_path)
scaler = Load.load_binary('<your_scaler>')
model = AbstractRegressor._create_pipeline(scaler, regressor)
input_data = [fact_1, fact_2, ..., fact_n]
prediction = model.predict([input_data])
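What chaining a scaler and a regressor achieves can be sketched with scikit-learn's Pipeline. This assumes `_create_pipeline` behaves similarly (scaler first, then regressor); StandardScaler and LinearRegression are stand-ins for the saved scaler and the Keras regressor, and the data is made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy facts: [tenant_individual_responsability, tenant_owes_rent]; target: amount ordered
X = np.array([[1, 700.0], [1, 800.0], [1, 900.0], [0, 1000.0]])
y = np.array([700.0, 800.0, 900.0, 1000.0])

# The scaler standardizes each column before the regressor sees it;
# predict() applies the same transformation to new input automatically.
model = Pipeline([("scaler", StandardScaler()), ("regressor", LinearRegression())])
model.fit(X, y)

input_data = [1, 850.0]
prediction = model.predict([input_data])
```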
3. Installation Instructions
- Add your Cyberjustice Lab username as an environment variable, either in your .bashrc or by running it as a command:
export CJL_USER={USERNAME}
- Add your Cyberjustice Lab password as an environment variable, the same way:
export CJL_PASS={PASSWORD}
- Run
pip3 install -r requirements.txt
- Run
pip3 install -r requirements_test.txt
4. File Structure
----| data <all data input and output>
--------| raw
------------| text_bk <extract precedents here>
--------| binary <all saved binarized model/data>
--------| cache <temp files>
--------| test <used for unit testing>
----| feature_extraction <all data manipulation before supervised training>
--------| feature_extraction.py <driver for feature extraction (using 3 drivers above)>
--------| pre_processing
------------| pre_processing_driver
----------------| filter_precedent
--------------------| precedent_directory_cleaner.py
--------| post_processing
------------| post_processing_driver.py <driver for post_processing>
----------------| regex
--------------------| regex_entity_extraction.py
--------------------| regex_lib.py
--------------------| regex_tagger.py
----| model_learning <supervised training>
------------| classifier
----------------| classifier_driver.py
----------------| multi_output
--------------------| multi_class_svm.py
------------| regression
----------------| regression_driver.py
----------------| single_output_regression
--------------------| abstract_regressor.py
--------------------| tenant_pays_landlord.py
--------------------| additional_indemnity.py
----------------| multi_output
--------------------| multi_output_regression.py
------------| similar_finder
----------------| similar_finder.py
----| util <common tool>
------------| log.py <logging tool>
------------| file.py <file save and load>
------------| constant.py <global variables>
----| web
--------| ml_controller.py
init.py
main.py <driver for the pipeline (feature extraction + model training)>
5. ML API
5.1 Predict Outcome
Predict the outcome based on given facts and demands. Returns an array of predicted outcomes as well as similar precedents. The precedents have distances assigned to them. The lower the distance, the more similar it is.
URL : /predict
Method : POST
Data constraints
Provide the facts (and demands) under the "facts" key, with one key/value pair per fact/demand label.
{
"facts" : {
"absent" : 1,
"apartment_impropre" : 0,
"apartment_infestation" : 1,
"asker_is_landlord" : 1,
"asker_is_tenant" : 1,
"bothers_others" : 1,
"disrespect_previous_judgement" : 1,
"incorrect_facts" : 1,
"landlord_inspector_fees" : 1,
"landlord_notifies_tenant_retake_apartment" : 1,
"landlord_pays_indemnity" : 1,
"landlord_prejudice_justified" : 1,
"landlord_relocation_indemnity_fees" : 1,
"landlord_rent_change" : 1,
"landlord_rent_change_doc_renseignements" : 1,
"landlord_rent_change_piece_justification" : 1,
"landlord_rent_change_receipts" : 0,
"landlord_retakes_apartment" : 1,
"landlord_retakes_apartment_indemnity" : 1,
"landlord_sends_demand_regie_logement" : 0,
"landlord_serious_prejudice" : 1,
"lease" : 1,
"proof_of_late" : 1,
"proof_of_revenu" : 0,
"rent_increased" : 1,
"tenant_bad_payment_habits" : 1,
"tenant_continuous_late_payment" : 1,
"tenant_damaged_rental" : 1,
"tenant_dead" : 1,
"tenant_declare_insalubre" : 1,
"tenant_financial_problem" : 0,
"tenant_group_responsability" : 1,
"tenant_individual_responsability" : 1,
"tenant_is_bothered" : 1,
"lack_of_proof" : 1,
"tenant_landlord_agreement" : 0,
"tenant_lease_fixed" : 1,
"tenant_lease_indeterminate" : 1,
"tenant_left_without_paying" : 0,
"tenant_monthly_payment" : 1,
"tenant_negligence" : 1,
"tenant_not_request_cancel_lease" : 1,
"tenant_owes_rent" : 1,
"tenant_refuses_retake_apartment" : 1,
"tenant_rent_not_paid_less_3_weeks" : 1,
"tenant_rent_not_paid_more_3_weeks" : 0,
"tenant_rent_paid_before_hearing" : 1,
"tenant_violence" : 1,
"tenant_withold_rent_without_permission" : 1,
"violent" : 1
}
}
Success Response
Code : 200 OK
Content examples
{
"outcomes_vector": {
"additional_indemnity_money": "221",
"authorize_landlord_retake_apartment": "0",
"declares_housing_inhabitable": "0",
"declares_resiliation_is_correct": "0",
"landlord_prejudice_justified": "1",
"landlord_retakes_apartment_indemnity": "0",
"landlord_serious_prejudice": "0",
"orders_expulsion": "1",
"orders_immediate_execution": "1",
"orders_landlord_notify_tenant_when_habitable": "0",
"orders_resiliation": "1",
"orders_tenant_pay_first_of_month": "0",
"tenant_ordered_to_pay_landlord": "643",
"tenant_ordered_to_pay_landlord_legal_fees": "80"
},
"probabilities_vector": {
"additional_indemnity_money": "0.93",
"authorize_landlord_retake_apartment": "1.0",
"declares_housing_inhabitable": "1.0",
"declares_resiliation_is_correct": "0.94",
"landlord_prejudice_justified": "0.74",
"landlord_retakes_apartment_indemnity": "1.0",
"landlord_serious_prejudice": "1.0",
"orders_expulsion": "0.88",
"orders_immediate_execution": "0.72",
"orders_landlord_notify_tenant_when_habitable": "1.0",
"orders_resiliation": "0.91",
"orders_tenant_pay_first_of_month": "0.99",
"tenant_ordered_to_pay_landlord": "0.99",
"tenant_ordered_to_pay_landlord_legal_fees": "0.91"
},
"similar_precedents": [
{
"distance": 0.3423500835013649,
"facts": {
"apartment_dirty": false,
"asker_is_landlord": true,
"asker_is_tenant": false,
"bothers_others": false,
"disrespect_previous_judgement": false,
"landlord_inspector_fees": "0.0",
"landlord_notifies_tenant_retake_apartment": false,
"landlord_pays_indemnity": false,
"landlord_relocation_indemnity_fees": "0.0",
"landlord_rent_change": false,
"landlord_rent_change_doc_renseignements": false,
"landlord_retakes_apartment": false,
"landlord_sends_demand_regie_logement": false,
"rent_increased": false,
"signed_proof_of_rent_debt": false,
"tenant_continuous_late_payment": false,
"tenant_damaged_rental": false,
"tenant_dead": false,
"tenant_financial_problem": false,
"tenant_group_responsability": false,
"tenant_individual_responsability": true,
"tenant_is_bothered": false,
"tenant_lease_indeterminate": false,
"tenant_left_without_paying": false,
"tenant_monthly_payment": "900.0",
"tenant_not_paid_lease_timespan": "0.0",
"tenant_owes_rent": "970.0",
"tenant_refuses_retake_apartment": false,
"tenant_rent_not_paid_more_3_weeks": true,
"tenant_sends_demand_regie_logement": false,
"tenant_withold_rent_without_permission": false,
"violent": false
},
"outcomes": {
"additional_indemnity_money": "70.0",
"authorize_landlord_retake_apartment": false,
"declares_housing_inhabitable": false,
"declares_resiliation_is_correct": false,
"landlord_prejudice_justified": true,
"landlord_retakes_apartment_indemnity": false,
"landlord_serious_prejudice": false,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_landlord_notify_tenant_when_habitable": false,
"orders_resiliation": true,
"orders_tenant_pay_first_of_month": false,
"tenant_ordered_to_pay_landlord": "970.0",
"tenant_ordered_to_pay_landlord_legal_fees": "88.0"
},
"precedent": "AZ-51211608"
},
{
"distance": 0.3429019324281239,
"facts": {
"apartment_dirty": false,
"asker_is_landlord": true,
"asker_is_tenant": false,
"bothers_others": false,
"disrespect_previous_judgement": false,
"landlord_inspector_fees": "0.0",
"landlord_notifies_tenant_retake_apartment": false,
"landlord_pays_indemnity": false,
"landlord_relocation_indemnity_fees": "0.0",
"landlord_rent_change": false,
"landlord_rent_change_doc_renseignements": false,
"landlord_retakes_apartment": false,
"landlord_sends_demand_regie_logement": false,
"rent_increased": false,
"signed_proof_of_rent_debt": false,
"tenant_continuous_late_payment": false,
"tenant_damaged_rental": false,
"tenant_dead": false,
"tenant_financial_problem": false,
"tenant_group_responsability": false,
"tenant_individual_responsability": true,
"tenant_is_bothered": false,
"tenant_lease_indeterminate": false,
"tenant_left_without_paying": false,
"tenant_monthly_payment": "735.0",
"tenant_not_paid_lease_timespan": "0.0",
"tenant_owes_rent": "873.0",
"tenant_refuses_retake_apartment": false,
"tenant_rent_not_paid_more_3_weeks": true,
"tenant_sends_demand_regie_logement": false,
"tenant_withold_rent_without_permission": false,
"violent": false
},
"outcomes": {
"additional_indemnity_money": "0.0",
"authorize_landlord_retake_apartment": false,
"declares_housing_inhabitable": false,
"declares_resiliation_is_correct": false,
"landlord_prejudice_justified": true,
"landlord_retakes_apartment_indemnity": false,
"landlord_serious_prejudice": false,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_landlord_notify_tenant_when_habitable": false,
"orders_resiliation": true,
"orders_tenant_pay_first_of_month": false,
"tenant_ordered_to_pay_landlord": "873.0",
"tenant_ordered_to_pay_landlord_legal_fees": "80.0"
},
"precedent": "AZ-51176404"
},
{
"distance": 0.49114649102172725,
"facts": {
"apartment_dirty": false,
"asker_is_landlord": true,
"asker_is_tenant": false,
"bothers_others": false,
"disrespect_previous_judgement": false,
"landlord_inspector_fees": "0.0",
"landlord_notifies_tenant_retake_apartment": false,
"landlord_pays_indemnity": false,
"landlord_relocation_indemnity_fees": "0.0",
"landlord_rent_change": false,
"landlord_rent_change_doc_renseignements": false,
"landlord_retakes_apartment": false,
"landlord_sends_demand_regie_logement": false,
"rent_increased": false,
"signed_proof_of_rent_debt": false,
"tenant_continuous_late_payment": false,
"tenant_damaged_rental": false,
"tenant_dead": false,
"tenant_financial_problem": false,
"tenant_group_responsability": false,
"tenant_individual_responsability": true,
"tenant_is_bothered": false,
"tenant_lease_indeterminate": false,
"tenant_left_without_paying": false,
"tenant_monthly_payment": "770.0",
"tenant_not_paid_lease_timespan": "0.0",
"tenant_owes_rent": "1360.0",
"tenant_refuses_retake_apartment": false,
"tenant_rent_not_paid_more_3_weeks": true,
"tenant_sends_demand_regie_logement": false,
"tenant_withold_rent_without_permission": false,
"violent": false
},
"outcomes": {
"additional_indemnity_money": "590.0",
"authorize_landlord_retake_apartment": false,
"declares_housing_inhabitable": false,
"declares_resiliation_is_correct": false,
"landlord_prejudice_justified": true,
"landlord_retakes_apartment_indemnity": false,
"landlord_serious_prejudice": false,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_landlord_notify_tenant_when_habitable": false,
"orders_resiliation": true,
"orders_tenant_pay_first_of_month": false,
"tenant_ordered_to_pay_landlord": "1360.0",
"tenant_ordered_to_pay_landlord_legal_fees": "81.0"
},
"precedent": "AZ-51212451"
},
{
"distance": 0.49200755901067444,
"facts": {
"apartment_dirty": false,
"asker_is_landlord": true,
"asker_is_tenant": false,
"bothers_others": false,
"disrespect_previous_judgement": false,
"landlord_inspector_fees": "0.0",
"landlord_notifies_tenant_retake_apartment": false,
"landlord_pays_indemnity": false,
"landlord_relocation_indemnity_fees": "0.0",
"landlord_rent_change": false,
"landlord_rent_change_doc_renseignements": false,
"landlord_retakes_apartment": false,
"landlord_sends_demand_regie_logement": false,
"rent_increased": false,
"signed_proof_of_rent_debt": false,
"tenant_continuous_late_payment": false,
"tenant_damaged_rental": false,
"tenant_dead": false,
"tenant_financial_problem": false,
"tenant_group_responsability": false,
"tenant_individual_responsability": true,
"tenant_is_bothered": false,
"tenant_lease_indeterminate": false,
"tenant_left_without_paying": false,
"tenant_monthly_payment": "945.0",
"tenant_not_paid_lease_timespan": "0.0",
"tenant_owes_rent": "1290.0",
"tenant_refuses_retake_apartment": false,
"tenant_rent_not_paid_more_3_weeks": true,
"tenant_sends_demand_regie_logement": false,
"tenant_withold_rent_without_permission": false,
"violent": false
},
"outcomes": {
"additional_indemnity_money": "345.0",
"authorize_landlord_retake_apartment": false,
"declares_housing_inhabitable": false,
"declares_resiliation_is_correct": false,
"landlord_prejudice_justified": true,
"landlord_retakes_apartment_indemnity": false,
"landlord_serious_prejudice": false,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_landlord_notify_tenant_when_habitable": false,
"orders_resiliation": true,
"orders_tenant_pay_first_of_month": false,
"tenant_ordered_to_pay_landlord": "1290.0",
"tenant_ordered_to_pay_landlord_legal_fees": "72.0"
},
"precedent": "AZ-51201834"
},
{
"distance": 0.4933548500076463,
"facts": {
"apartment_dirty": false,
"asker_is_landlord": true,
"asker_is_tenant": false,
"bothers_others": false,
"disrespect_previous_judgement": false,
"landlord_inspector_fees": "0.0",
"landlord_notifies_tenant_retake_apartment": false,
"landlord_pays_indemnity": false,
"landlord_relocation_indemnity_fees": "0.0",
"landlord_rent_change": false,
"landlord_rent_change_doc_renseignements": false,
"landlord_retakes_apartment": false,
"landlord_sends_demand_regie_logement": false,
"rent_increased": false,
"signed_proof_of_rent_debt": false,
"tenant_continuous_late_payment": false,
"tenant_damaged_rental": false,
"tenant_dead": false,
"tenant_financial_problem": false,
"tenant_group_responsability": false,
"tenant_individual_responsability": true,
"tenant_is_bothered": false,
"tenant_lease_indeterminate": false,
"tenant_left_without_paying": false,
"tenant_monthly_payment": "800.0",
"tenant_not_paid_lease_timespan": "0.0",
"tenant_owes_rent": "1400.0",
"tenant_refuses_retake_apartment": false,
"tenant_rent_not_paid_more_3_weeks": true,
"tenant_sends_demand_regie_logement": false,
"tenant_withold_rent_without_permission": false,
"violent": false
},
"outcomes": {
"additional_indemnity_money": "0.0",
"authorize_landlord_retake_apartment": false,
"declares_housing_inhabitable": false,
"declares_resiliation_is_correct": false,
"landlord_prejudice_justified": true,
"landlord_retakes_apartment_indemnity": false,
"landlord_serious_prejudice": false,
"orders_expulsion": true,
"orders_immediate_execution": true,
"orders_landlord_notify_tenant_when_habitable": false,
"orders_resiliation": true,
"orders_tenant_pay_first_of_month": false,
"tenant_ordered_to_pay_landlord": "0.0",
"tenant_ordered_to_pay_landlord_legal_fees": "92.0"
},
"precedent": "AZ-51391660"
}
]
}
Error Response
Code : 400 Bad Request
- Inputs not provided
Code : 404 Not Found
- Conversation doesn't exist
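A minimal Python client for this endpoint could look like the sketch below, using only the standard library. The base URL is a hypothetical placeholder (the ML service's host and port are not documented here). Note that the outcome values come back as strings, and that the most similar precedent is the one with the lowest distance.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3001"  # hypothetical; use the ML service's real address


def build_predict_request(facts, base_url=BASE_URL):
    """Build a POST /predict request with the facts under the "facts" key."""
    return urllib.request.Request(
        base_url + "/predict",
        data=json.dumps({"facts": facts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(facts, base_url=BASE_URL):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_predict_request(facts, base_url)) as response:
        return json.load(response)


def outcomes_as_numbers(result):
    """Outcome values are returned as strings ("221", "0"); convert them to floats."""
    return {label: float(value) for label, value in result["outcomes_vector"].items()}


def closest_precedent(result):
    """Return the case number of the precedent with the lowest distance."""
    best = min(result["similar_precedents"], key=lambda p: p["distance"])
    return best["precedent"]
```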
5.2 Get Fact Weights
Get the fact weights for every outcome, with the facts sorted in descending order of importance
URL: /weights
Method: GET
Data constraints
None
Success Response
Code : 200 OK
Content examples
{
"additional_indemnity_money": {
"important_facts": [
"asker_is_landlord",
"tenant_withold_rent_without_permission",
"tenant_refuses_retake_apartment",
"tenant_monthly_payment",
"tenant_not_paid_lease_timespan"
],
"additional_facts": [
"tenant_financial_problem",
"tenant_owes_rent",
"asker_is_tenant",
"tenant_damaged_rental",
"tenant_individual_responsability",
"signed_proof_of_rent_debt",
"tenant_lease_indeterminate",
"tenant_dead",
"tenant_is_bothered",
"bothers_others"
]
}
}
5.3 Get Anti Facts
Get the anti-facts: pairs of opposite facts.
The left-hand side fact is always initialized to 1 and the right-hand side fact to 0.
URL: /antifacts
Method: GET
Data constraints
None
Success Response
Code : 200 OK
Content examples
{
"tenant_rent_not_paid_less_3_weeks": "tenant_rent_not_paid_more_3_weeks",
"tenant_lease_fixed": "tenant_lease_indeterminate",
"not_violent": "violent",
"tenant_individual_responsability": "tenant_group_responsability"
}
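A client could use this response to set default fact values, as in the sketch below. `initial_facts` applies the rule stated above; `set_fact`, which gives a fact's counterpart the opposite value, is a hypothetical helper based on the pairs being opposites, not documented service behaviour.

```python
def initial_facts(antifacts):
    """Initialize every anti-fact pair: left-hand side to 1, right-hand side to 0."""
    facts = {}
    for left, right in antifacts.items():
        facts[left] = 1
        facts[right] = 0
    return facts


def set_fact(facts, antifacts, name, value):
    """Set a fact and, if it belongs to a pair, give its counterpart the opposite value.

    The counterpart update is an assumption made for illustration.
    """
    counterparts = dict(antifacts)
    counterparts.update({right: left for left, right in antifacts.items()})
    facts[name] = value
    if name in counterparts:
        facts[counterparts[name]] = 0 if value else 1
    return facts


antifacts = {
    "tenant_lease_fixed": "tenant_lease_indeterminate",
    "not_violent": "violent",
}
facts = initial_facts(antifacts)
set_fact(facts, antifacts, "violent", 1)
```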
5.4 Get Machine Learning Statistics
Get the ml stats
Used to obtain:
- Size of data set
- Variance of regression outcomes
- Standard deviation of regression outcomes
- Mean of regression outcomes
- Prediction accuracy of each classifier
URL: /statistics
Method: GET
Data constraints
None
Success Response
Code : 200 OK
Content examples
{
"classifier": {
"additional_indemnity_money": {
"prediction_accuracy": 79.8400199975003
},
"authorize_landlord_retake_apartment": {
"prediction_accuracy": 99.48756405449319
},
"declares_housing_inhabitable": {
"prediction_accuracy": 99.95000624921884
},
"declares_resiliation_is_correct": {
"prediction_accuracy": 91.83852018497687
},
"landlord_prejudice_justified": {
"prediction_accuracy": 81.07736532933383
},
"landlord_retakes_apartment_indemnity": {
"prediction_accuracy": 99.72503437070367
},
"landlord_serious_prejudice": {
"prediction_accuracy": 96.35045619297588
},
"orders_expulsion": {
"prediction_accuracy": 91.55105611798525
},
"orders_immediate_execution": {
"prediction_accuracy": 84.32695913010873
},
"orders_landlord_notify_tenant_when_habitable": {
"prediction_accuracy": 100
},
"orders_resiliation": {
"prediction_accuracy": 93.48831396075491
},
"orders_tenant_pay_first_of_month": {
"prediction_accuracy": 98.05024371953506
},
"tenant_ordered_to_pay_landlord": {
"prediction_accuracy": 83.82702162229721
},
"tenant_ordered_to_pay_landlord_legal_fees": {
"prediction_accuracy": 90.32620922384702
}
},
"data_set": {
"size": 40003
},
"regressor": {
"additional_indemnity_money": {
"mean": 1477.7728467101024,
"std": 1927.8147997893939,
"variance": 3716469.9022870203
},
"tenant_pays_landlord": {
"mean": 2148.867088064977,
"std": 2129.510243010276,
"variance": 4534813.8750856845
}
}
}
6. Using the Command Line
* denotes optional arguments
From the source directory JusticeAi/src/ml_service/ you may run:
- Pre-processing:
python main.py -pre [number of files | empty for all]
- Post-processing:
python3 main.py -post [number of files | empty for all]
i. Each fact and outcome is listed with its number of occurrences
ii. The percentage of tagged lines is displayed
- Training:
python3 main.py -train [data size | empty for all] --svm* --sf* --svr* --all*
Note: always train the svm before the sf and the svr.
Arguments:
i. --svm: classifier
ii. --svr: regressor
iii. --sf: similarity finder
iv. --all: classifier, regressor, similarity finder
Testing results are displayed:
i. classifier: accuracy, F1, precision, recall
ii. regression: absolute error, r2
PostgreSQL Database
Connect via command line
Bring up all services:
./cjl up -d
Connect via psql:
./cjl run --rm postgresql_db "psql -h postgresql_db -U postgres"
The above command will prompt you to enter the database password.
SQL script backup via pg_dump:
export PGPASSWORD=$(printf '%s' "$POSTGRES_PASSWORD")
./cjl run --rm -e PGPASSWORD="$PGPASSWORD" postgresql_db "pg_dump -h postgresql_db -U postgres -p 5432"
Optical Character Recognition Service
Run Tests and Lints
export COMPOSE_FILE=ci
./cjl up -d && ./cjl run ocr_service
OCR API
Extract Text
Returns the text extracted from the provided image data, as a string.
URL : /ocr/extract_text
Method : POST
Headers : Content-Type: multipart/form-data
Data constraints
Provide the 'file' key with the image data as the value.
Success Response
Code : 200 OK
Error Response
Code : 400 Bad Request
- No file key or no image data provided
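Because the endpoint expects multipart/form-data, a standard-library client has to build the body itself, as sketched below. The URL is a hypothetical placeholder, and the sketch assumes the extracted text is returned as the raw response body.

```python
import urllib.request
import uuid

OCR_URL = "http://localhost:3002/ocr/extract_text"  # hypothetical host and port


def build_ocr_request(image_bytes, filename="image.png", url=OCR_URL):
    """Wrap the image in a multipart/form-data body under the 'file' key."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + image_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def extract_text(image_bytes, filename="image.png", url=OCR_URL):
    """Send the image and return the response body as text."""
    with urllib.request.urlopen(build_ocr_request(image_bytes, filename, url)) as response:
        return response.read().decode("utf-8")
```

A request without the 'file' part (or with empty image data) would get the 400 response listed above.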
Web Client Service
Run Tests and Lints
export COMPOSE_FILE=ci
./cjl up -d && ./cjl run web_client
Technologies
The following technologies are in use in this service:
Bootstrap
Bootstrap is an open source front end framework developed by Twitter. It contains styling for various common web components, such as forms and inputs, as well as providing a convenient grid system that greatly facilitates web page styling and layout.
- Alternatives: Foundation Framework, pure.css, skeleton
- Reason Chosen:
- Team members' past experience
- Industry standard
Vue.js
Vue.js (https://vuejs.org/) is an open source front end framework for building single page applications. It uses a component-based architecture that allows for the creation of an interactive website. Its primary purpose is to power the visible portion of the chatbot: displaying messages, sending messages to the server, and prompting the user for various interactions such as answering questions or providing files to use as evidence.
- Alternatives: AngularJS, Angular 4, ReactJs
- Reason Chosen:
- Low learning curve
- High performance
- Small footprint and minimal API
Beta Server
Requires the following installed on the host system:
- SQLite
- Python3
SETUP
pip install -r requirements.txt
FLASK_APP=app.py flask run
REST API DOCUMENTATION
POST /question
Inserts a new user-generated question. Returns that user's ID.
Example request payload
{
"question": "Is it okay for a landlord to ask for a security deposit?"
}
Success response payload
{
"id": "5cd8a900-8a18-41b3-abb8-bf0307918afc"
}
Error response status codes
- 415 if the request does not contain valid JSON
- 422 if the question key is not present
- 422 if the question value is too long
PUT /email
Updates a user's email address based on their ID.
If an ID is provided, that ID's record is updated. If no ID is provided, a new record is created.
Example request payload
{
"id": "5c17bfd0-87d0-4493-a312-f3f32323fff2",
"email": "test@test.com"
}
Success response payload
{
"id": "5c17bfd0-87d0-4493-a312-f3f32323fff2"
}
Error response status codes
- 415 if the request does not contain valid JSON
- 422 if the email key is not present
- 422 if the email value is too long
PUT /subscription
Updates a user's subscription status based on their ID. 1 is subscribed, 0 is not subscribed.
If an ID is provided, that ID's record is updated. If no ID is provided, a new record is created.
Example request payload
{
"id": "5c17bfd0-87d0-4493-a312-f3f32323fff2",
"is_subscribed": 1
}
Success response payload
{
"id": "5c17bfd0-87d0-4493-a312-f3f32323fff2"
}
Error response status codes
- 415 if the request does not contain valid JSON
- 422 if the is_subscribed key is not present
- 422 if the is_subscribed key is not an integer
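A client calls PUT /subscription with a JSON body. The sketch below builds such a request with the standard library's urllib.request. It assumes the beta server was started with flask run, which listens on http://127.0.0.1:5000 by default. The helper build_subscription_request and its 0/1 guard are hypothetical additions, not part of the documented API.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the beta server was started with `flask run`, which listens on 127.0.0.1:5000 by default.
BASE_URL = "http://127.0.0.1:5000"


def build_subscription_request(is_subscribed: int, user_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT /subscription request with a JSON body."""
    # Client-side guard (an assumption, not part of the documented API): only send 0 or 1.
    if type(is_subscribed) is not int or is_subscribed not in (0, 1):
        raise ValueError("is_subscribed must be the integer 0 or 1")
    payload = {"is_subscribed": is_subscribed}
    if user_id is not None:
        payload["id"] = user_id  # omit the ID to have the server create a new record
    return urllib.request.Request(
        BASE_URL + "/subscription",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_subscription_request(1, "5c17bfd0-87d0-4493-a312-f3f32323fff2")
    with urllib.request.urlopen(req) as resp:  # requires the server to be running
        print(json.load(resp)["id"])
```

Setting the Content-Type header explicitly matters because the server rejects bodies it cannot read as JSON with a 415.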
PUT /legal
Updates a user's status on whether they are a legal professional based on their ID: 1 means the user is a legal professional, 0 means they are not.
If an ID is provided, that ID's record is updated. If no ID is provided, a new record is created.
Example request payload
{
"id": "5c17bfd0-87d0-4493-a312-f3f32323fff2",
"is_legal_professional": 1
}
Success response payload
{
"id": "5c17bfd0-87d0-4493-a312-f3f32323fff2"
}
Error response status codes
- 415 if the request does not contain valid JSON
- 422 if the is_legal_professional key is not present
- 422 if the is_legal_professional key is not an integer
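Both flag endpoints (is_subscribed and is_legal_professional) return 422 when the key is missing or is not an integer. The following is a minimal sketch of that check, with a hypothetical helper validate_flag. One Python subtlety: json.loads turns JSON true into True, and bool is a subclass of int, so a plain isinstance(value, int) check would accept it. This sketch rejects booleans explicitly, which is an assumption about the intended behaviour.

```python
from typing import Any, Optional, Tuple


def validate_flag(data: Any, key: str) -> Optional[Tuple[int, str]]:
    """Return None if data[key] is an integer, else (422, reason)."""
    if not isinstance(data, dict) or key not in data:
        return 422, f"{key} key is not present"
    value = data[key]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return 422, f"{key} is not an integer"
    return None


if __name__ == "__main__":
    print(validate_flag({"is_legal_professional": 1}, "is_legal_professional"))
    print(validate_flag({"is_legal_professional": "1"}, "is_legal_professional"))
```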
Troubleshoot
This page presents users and devs alike with an amalgam of quick fixes for the various issues we ran into.
Docker
Reset
Docker remove all images:
docker rmi $(docker images -a -q)
Docker remove all Containers:
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
Docker remove all volumes:
docker volume rm $(docker volume ls -q)
Reset #2
Docker remove all unused data except volumes:
sudo docker system prune
Docker remove all unused volumes:
sudo docker volume prune
Docker won't build without super user permissions
- Add the docker group if it doesn't already exist:
sudo groupadd docker
- Add the $USER you'd like to use to the docker group
sudo gpasswd -a $USER docker
- a. Log yourself into the new docker group:
newgrp docker
- b. Log out and log in to the user you just added to the docker group.
- Test whether you can run docker without sudo by typing:
docker run hello-world
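Before running hello-world, you can check both conditions the steps above rely on: that your user is listed in the docker group, and that the current shell has actually picked up that group (which is why step a or b is needed). This is a Unix-only sketch using the standard grp, pwd, and os modules; the helper names in_group and active_in_session are hypothetical.

```python
import getpass
import grp
import os
import pwd


def in_group(user: str, group: str) -> bool:
    """True if `user` is a listed member of `group` or has it as primary group."""
    try:
        group_entry = grp.getgrnam(group)
    except KeyError:
        return False  # the group does not exist (e.g. groupadd was never run)
    if user in group_entry.gr_mem:
        return True
    try:
        return pwd.getpwnam(user).pw_gid == group_entry.gr_gid
    except KeyError:
        return False  # the user does not exist


def active_in_session(group: str) -> bool:
    """True if the current process already has `group` (e.g. after newgrp or a re-login)."""
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        return False
    return gid == os.getegid() or gid in os.getgroups()


if __name__ == "__main__":
    user = getpass.getuser()
    print("listed in docker group:", in_group(user, "docker"))
    print("active in this session:", active_in_session("docker"))
```

If the first check passes but the second fails, the gpasswd step worked and you only need to run newgrp docker or log out and back in.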
PostgreSQL
DB reset
From the project's root directory:
./cjl reset-db
DB reset alternative
- Get into postgres container:
docker exec -it <CONTAINER_ID> bash
- Enter postgres command line:
psql postgres postgres
(If asked for a password, enter DEV_PASS_NOT_SECRET.)
- Type, in order:
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO public;
- To exit psql, type:
\q
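The manual steps above can be scripted as a single non-interactive docker exec call. This sketch only builds the command list and runs it with subprocess when invoked directly. The helper build_reset_command is hypothetical; it passes the development password from this page through PGPASSWORD (docker exec -e) so psql does not prompt, and relies on psql running each -c statement in order (supported since PostgreSQL 9.6), stopping at the first error because of ON_ERROR_STOP.

```python
import subprocess
import sys
from typing import List

# The same statements as the manual steps above.
RESET_STATEMENTS = [
    "DROP SCHEMA public CASCADE;",
    "CREATE SCHEMA public;",
    "GRANT ALL ON SCHEMA public TO postgres;",
    "GRANT ALL ON SCHEMA public TO public;",
]


def build_reset_command(container_id: str, password: str = "DEV_PASS_NOT_SECRET") -> List[str]:
    """Build a docker exec + psql command that runs every reset statement in order."""
    cmd = [
        "docker", "exec", "-i", "-e", f"PGPASSWORD={password}", container_id,
        "psql", "-U", "postgres", "-d", "postgres", "-v", "ON_ERROR_STOP=1",
    ]
    for statement in RESET_STATEMENTS:
        cmd += ["-c", statement]
    return cmd


if __name__ == "__main__":
    # Usage: python reset_db.py <CONTAINER_ID>
    subprocess.run(build_reset_command(sys.argv[1]), check=True)
```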
Environment variable errors
- Make sure you did not run the build as root
- Ensure the environment variables are set up in ~/.bashrc
- If you built as root, run ./cjl clean as root, then run ./cjl build --no-cache as your regular (non-root) user
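The checklist above can be turned into a quick pre-build check. Variables exported in ~/.bashrc belong to your own shell, and sudo usually resets the environment, which is one reason a root build can miss them. This is a sketch only: the variable names in REQUIRED_VARS are hypothetical placeholders, because this page does not list which variables the services need.

```python
import os
import sys
from typing import Iterable, List, Mapping, Optional

# Hypothetical names: replace with the variables your services actually read.
REQUIRED_VARS = ["EXAMPLE_DB_PASSWORD", "EXAMPLE_SECRET_KEY"]


def missing_env_vars(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def running_as_root() -> bool:
    """True on Unix when the effective user is root; False elsewhere."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


if __name__ == "__main__":
    if running_as_root():
        sys.exit("Do not build as root; rerun as your regular user.")
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        sys.exit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks OK; run ./cjl build")
```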
We've noticed that the database sometimes takes ~30 seconds to create its models at runtime and become ready to accept connections, while the application services fail right away because they need a database connection at startup. Running ./cjl down && ./cjl up makes the problem go away, but it may be worth having the application servers wait and perform a health check on the database before attempting to create a database connection.
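One way to implement the suggested wait is to poll the database port until it accepts TCP connections, with an overall timeout. The sketch below uses only the standard socket and time modules; the helper wait_for_port and the hostname db are hypothetical (5432 is PostgreSQL's default port). An open port is a weaker signal than a working query: a stricter health check could run PostgreSQL's pg_isready utility or a trivial SELECT 1 before the service creates its real connection.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening on host:port
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical service name from a docker-compose network; 5432 is the PostgreSQL default.
    if not wait_for_port("db", 5432, timeout=60):
        raise SystemExit("database did not become reachable in time")
```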