- Feb 05, 2025
-
-
Robert Bossy authored
-
Robert Bossy authored
-
Robert Bossy authored
-
- Jul 05, 2024
-
-
Marine Courtin authored
Fix the bug encountered by Stephan and rewrite the whole construction of the bioaggressors table (with consequences for the alvisNLP plans used for annotation).
-
Marine Courtin authored
If the date still contains YYYY or MM, we try to find the missing information in the metadata of the bulletin. The original, purely text-based normalisation is still available under nav:features:ISO{0} (it could be used in the RDF export to attach a prediction quality).
-
Marine Courtin authored
Preferable to do this early so I don't have to keep propagating this change to the various exports.
-
- Jun 12, 2024
-
-
BERNARD Stephan authored
-
- Jun 06, 2024
-
-
BERNARD Stephan authored
Add a version of `src/alvisNLP/utils/query_endpoints.py` compatible with the hosting of FCU, PPDO and the BSVs on the CATI CODEX OpenStack server.
-
- May 27, 2024
-
-
BERNARD Stephan authored
-
BERNARD Stephan authored
Adapt the dataset definitions in preparation for a migration to opendata.inrae.fr/bsv-def and opendata.inrae.fr/bsv-res.
-
- May 22, 2024
-
-
BERNARD Stephan authored
-
- May 15, 2024
-
-
BERNARD Stephan authored
-
BERNARD Stephan authored
-
- Jan 23, 2024
-
-
Marine Courtin authored
-
Marine Courtin authored
The manual list of bioaggressors is integrated, with all that it entails (lemmatizing vernacular names, looking up entity types...). It's not pretty but it works.
-
- Jan 08, 2024
-
-
Marine Courtin authored
-
Marine Courtin authored
Document `./2014/ProgrammeLaboVert_cle058f1c.html` has been removed as it's not part of the vitials corpus. I don't know why it was there in the first place; perhaps it was added in an older version of the triplestore?
-
Marine Courtin authored
Sorted the entries; added some entries that introduce noise in the annotation.
-
Marine Courtin authored
Remove mentions nested inside other mentions, e.g. "phytoplasma solani" is removed when "candidatus phytoplasma solani" is detected.
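
A minimal sketch of this nested-mention filter, assuming mentions are available as character spans. The `Mention` type and `drop_nested` function are hypothetical names for illustration, not the alvisNLP plan's actual implementation:

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def drop_nested(mentions: List[Mention]) -> List[Mention]:
    """Keep only mentions that are not strictly contained in a longer one."""
    kept = []
    for m in mentions:
        contained = any(
            o.start <= m.start
            and m.end <= o.end
            and (o.end - o.start) > (m.end - m.start)
            for o in mentions
        )
        if not contained:
            kept.append(m)
    return kept


full = Mention(0, 29, "candidatus phytoplasma solani")
inner = Mention(11, 29, "phytoplasma solani")
print(drop_nested([inner, full]))
```

Requiring the outer span to be strictly longer means two mentions with identical offsets are both kept rather than removing each other.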
-
Marine Courtin authored
Case folding is acceptable on every token's first character, e.g. Candidatus Phytoplasma Solani -> Candidatus phytoplasma solani is detected.
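
One way to picture this rule, as a sketch only: fold the first character of each token before comparing a candidate against a lexicon entry. The helper names `fold_first_chars` and `same_entry` are hypothetical and not taken from the project:

```python
def fold_first_chars(mention: str) -> str:
    """Lowercase only the first character of every whitespace-separated token."""
    return " ".join(tok[:1].lower() + tok[1:] for tok in mention.split())


def same_entry(candidate: str, lexicon_entry: str) -> bool:
    """Compare two strings while ignoring case on each token's first character."""
    return fold_first_chars(candidate) == fold_first_chars(lexicon_entry)


print(same_entry("Candidatus Phytoplasma Solani", "Candidatus phytoplasma solani"))
print(same_entry("CANDIDATUS PHYTOPLASMA SOLANI", "Candidatus phytoplasma solani"))
```

The second call does not match because only the first character of each token is folded, so fully upper-cased text is still treated as different.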
-
Marine Courtin authored
For timespans of the form "du 12 au 15 mai": new normalisation in @predicted-ISO where the month missing from the first date is taken from the second date.
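
A sketch of how such a timespan could be normalised, assuming a regex-based approach. The output format (two dates joined by `/`, with `YYYY` left as a placeholder for the unknown year) and the names `SPAN`, `MONTHS` and `normalise_span` are assumptions made for illustration, not the format actually stored in @predicted-ISO:

```python
import re
from typing import Optional

# French month names mapped to month numbers.
MONTHS = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4, "mai": 5, "juin": 6,
    "juillet": 7, "août": 8, "septembre": 9, "octobre": 10,
    "novembre": 11, "décembre": 12,
}

# "du <day> au <day> <month>": only the second date carries a month.
SPAN = re.compile(r"du\s+(\d{1,2})\s+au\s+(\d{1,2})\s+(\w+)", re.IGNORECASE)


def normalise_span(text: str) -> Optional[str]:
    """Return 'YYYY-MM-DD/YYYY-MM-DD', reusing the second date's month for the first."""
    m = SPAN.search(text)
    if m is None:
        return None
    day1, day2, month_name = m.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    return f"YYYY-{month:02d}-{int(day1):02d}/YYYY-{month:02d}-{int(day2):02d}"


print(normalise_span("du 12 au 15 mai"))
```

The year stays as a `YYYY` placeholder here because the text alone does not provide it.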
-
Marine Courtin authored
-
Marine Courtin authored
-
Marine Courtin authored
Manual removal of a label; code is exported.
-
- Jan 05, 2024
-
-
Marine Courtin authored
Compound dates of the form \d\d/\d\d can be interpreted as either month-year or day-month; day-month is preferred. Dates containing \n or \t are removed (e.g. JUIN\n25). Dates containing both a / and a whitespace are removed (e.g. 04/08 19).
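
The three rules above can be sketched as a single function. This is an illustration under assumptions: the function name `interpret_compound`, the tuple it returns, and the fallback to month-year when day-month is impossible are hypothetical choices, and the third rule is read as "a slash plus whitespace", following the example 04/08 19:

```python
import re
from typing import Optional, Tuple

COMPOUND = re.compile(r"(\d\d)/(\d\d)")


def interpret_compound(date: str) -> Optional[Tuple[str, int, int]]:
    """Classify a two-part date, preferring day-month over month-year."""
    # Rule 2: dates containing a newline or a tab are discarded.
    if "\n" in date or "\t" in date:
        return None
    # Rule 3: dates containing both a slash and whitespace are discarded.
    if "/" in date and any(c.isspace() for c in date):
        return None
    m = COMPOUND.fullmatch(date)
    if m is None:
        return None
    first, second = int(m.group(1)), int(m.group(2))
    # Rule 1: prefer day-month whenever that reading is valid.
    if 1 <= first <= 31 and 1 <= second <= 12:
        return ("day-month", first, second)
    # Assumed fallback: month-year when day-month is impossible.
    if 1 <= first <= 12:
        return ("month-year", first, second)
    return None


print(interpret_compound("04/08"))
print(interpret_compound("06/19"))
```

Here "04/08" is read as 4 August even though April 2008 would also be a valid reading, while "06/19" can only be month-year.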
-
Marine Courtin authored
Entities are represented as their own nodes. Links between the actual tokens and the entity nodes are encoded as semantic relations with the label `entity`.
-
- Jan 04, 2024
-
-
Marine Courtin authored
Also remove the duplicates between NCBI and taxref, prioritising the taxref one. We could check that the norms are equivalent, but we don't do it currently.
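
A sketch of this prioritisation, assuming that two entries count as duplicates when they share the same label (case-insensitively). The dictionary keys `label`, `source` and `norm`, the identifiers in the example, and the function name `dedupe_entries` are all hypothetical:

```python
from typing import Dict, List

Entry = Dict[str, str]


def dedupe_entries(entries: List[Entry]) -> List[Entry]:
    """Keep one entry per label, preferring the taxref entry over the NCBI one."""
    best: Dict[str, Entry] = {}
    for entry in entries:
        key = entry["label"].lower()
        current = best.get(key)
        # Replace only when nothing is stored yet, or when taxref beats non-taxref.
        if current is None or (
            current["source"] != "taxref" and entry["source"] == "taxref"
        ):
            best[key] = entry
    return list(best.values())


entries = [
    {"label": "Vitis vinifera", "source": "ncbi", "norm": "ncbi:example-1"},
    {"label": "Vitis vinifera", "source": "taxref", "norm": "taxref:example-2"},
    {"label": "Plasmopara viticola", "source": "ncbi", "norm": "ncbi:example-3"},
]
print(dedupe_entries(entries))
```

As in the commit, no check is made that the two norms are equivalent; the NCBI entry is simply dropped when a taxref entry with the same label exists.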
-
Marine Courtin authored
-
Marine Courtin authored
The noisy "maturity" label is dealt with; noisy Baggiolini codes are only kept when they unambiguously denote a stage.
-
Marine Courtin authored
Check the normalisation; add to the entities layer with the appropriate features for RDF export.
-
Marine Courtin authored
-
- Dec 18, 2023
-
-
Marine Courtin authored
-
Marine Courtin authored
-
Marine Courtin authored
-
Marine Courtin authored
-
Marine Courtin authored
Resources are named according to the patterns we defined together, with one exception: we don't put the labels of the normalisation in the URI.
-
Marine Courtin authored
-
Marine Courtin authored
-
- Dec 14, 2023
-
-
Marine Courtin authored
-
Marine Courtin authored
-