Methodology

How We Score Ingredient Safety

Our methodology for the 43,000+ ingredient safety assessments powering theformulator.ai

Every formulation generated on theformulator.ai includes per-ingredient safety scores across six hazard axes. This page explains how those scores are calculated, what data sources feed them, and why we built a proprietary system instead of licensing an existing one.


Why existing safety databases fall short for formulators

The two most widely referenced ingredient safety databases — EWG Skin Deep and SkinSafe — were designed for consumers, not formulators. This creates fundamental problems when R&D teams try to use them as data inputs for formulation decisions.

EWG Skin Deep scores approximately 11,500 ingredients. Its methodology applies a data gap penalty: ingredients with limited safety studies receive elevated hazard scores. For a formulator working with novel or niche raw materials, this means an absence of negative evidence gets treated as evidence of harm. The scoring algorithm is opaque — weights and methodology are not published. Exposure context (whether an ingredient appears in a leave-on serum at 2% or a rinse-off shampoo at 0.5%) is not factored into the score. And the database is consumer-facing, with framing that tends toward alarmism rather than risk-based assessment.

SkinSafe takes a different approach, focusing on allergen and irritant avoidance based on Mayo Clinic data. While clinically grounded for contact dermatitis, it does not cover the full hazard spectrum that regulatory affairs teams need — carcinogenicity, reproductive toxicity, and endocrine disruption are outside its scope.

Neither system provides per-market regulatory status. A formulator developing for both EU and China needs to know that an ingredient permitted in Europe may be prohibited or restricted under NMPA — this cross-market view does not exist in consumer-facing databases. Neither system updates automatically from primary regulatory sources. Neither system adjusts scores based on product type or exposure duration.

We needed a scoring system that:

  • Draws from the same primary sources regulatory authorities use
  • Treats absence of data honestly — as a data gap, not a hazard signal
  • Accounts for exposure context (leave-on vs rinse-off)
  • Covers the full hazard spectrum across six independent axes
  • Provides per-market regulatory intelligence alongside safety scores
  • Updates from primary sources automatically, not via periodic manual review

Six axes of hazard assessment

Each ingredient is scored 0–10 on six independent hazard axes. A composite score is calculated, but all six axis scores are always visible to the formulator — because a single number hides critical information.

01

Carcinogenicity

What it measures

Evidence of cancer-causing potential from chronic or repeated exposure.

Primary sources

IARC Monographs (Group 1, 2A, 2B classifications), ECHA CLP Annex VI (H350, H351 hazard statements), NTP Report on Carcinogens.

Scoring approach

IARC Group 1 (known carcinogen) maps to the highest severity range. Group 2A (probable) and 2B (possible) map progressively lower. When multiple authoritative sources provide different classifications for the same ingredient, the highest score is retained — we never downgrade a carcinogenicity assessment. Any ingredient classified as a known carcinogen receives a composite score floor regardless of how clean its other five axes are.

02

Developmental & Reproductive Toxicity (DART)

What it measures

Risk of harm to fertility, fetal development, or lactation.

Primary sources

ECHA CLP Annex VI (H360, H361, H362 hazard statements), SCCS opinions on specific cosmetic ingredients.

Scoring approach

H360 (may damage fertility or the unborn child) maps to the highest severity. H361 (suspected) maps to moderate severity. H362 (may cause harm to breastfed children) is scored based on exposure route relevance to cosmetic use.

03

Sensitization

What it measures

Potential to cause allergic contact dermatitis on repeated exposure.

Primary sources

ECHA CLP (H317 — skin sensitizer classification), NACDG (North American Contact Dermatitis Group) patch test prevalence data, CIR safety assessments.

Scoring approach

The ECHA H317 classification provides a regulatory baseline. NACDG prevalence rates add clinical weight — a sensitizer that causes positive patch test reactions in 5% of patients is scored higher than one affecting 0.3%. Strong sensitizers receive a composite score floor to prevent other clean axes from masking the risk.

04

Systemic Toxicity

What it measures

Potential for organ damage from single or repeated exposure via dermal, oral, or inhalation routes.

Primary sources

ECHA CLP (H370, H371 for single exposure; H372, H373 for repeated exposure), SCCS safety opinions.

Scoring approach

"Causes organ damage" classifications map to high severity. "May cause organ damage" maps lower. Repeated-exposure classifications are scored based on exposure route relevance — an inhalation-only hazard is less relevant for a topical cream than for a spray product.

05

Irritation

What it measures

Potential to cause non-allergic skin or eye irritation on contact.

Primary sources

CIR safety assessments (clinical patch test data), Zein Number dissolution data (for surfactants specifically), ECHA CLP (H315 skin irritation, H319 eye irritation).

Scoring approach

For surfactants, the Zein Number provides a quantitative protein denaturation measure that correlates directly with skin irritation potential — this is more predictive than binary classification. CIR clinical data provides human-relevant irritation thresholds. High irritation scores receive a modest composite floor to ensure highly irritating ingredients are flagged even when all other axes are clean.

06

Endocrine Disruption

What it measures

Potential to interfere with hormonal systems (estrogen, androgen, thyroid, steroidogenesis pathways).

Primary sources

ECHA Endocrine Disruptor assessment list, REACH SVHC (Substances of Very High Concern) identifications with ED concern, EU Community Rolling Action Plan (CoRAP) evaluations.

Scoring approach

Confirmed endocrine disruptors on the ECHA list score highest. Ingredients under active assessment score at moderate severity. SVHC identification with endocrine disruption concern adds weight to the score.
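
As a rough illustration of how the H-statements named in the axis descriptions above could be routed to their hazard axes, here is a minimal Python sketch. The axis keys and the function name `route_h_statements` are illustrative assumptions, and the table contains only the statements listed on this page. It does not assign severity scores, since those mappings are not specified here.

```python
from collections import defaultdict

# Routing table built only from the H-statements named in the axis
# descriptions above. Axis keys are illustrative names (assumption).
H_STATEMENT_AXIS = {
    "H350": "carcinogenicity",
    "H351": "carcinogenicity",
    "H360": "dart",
    "H361": "dart",
    "H362": "dart",
    "H317": "sensitization",
    "H370": "systemic_toxicity",
    "H371": "systemic_toxicity",
    "H372": "systemic_toxicity",
    "H373": "systemic_toxicity",
    "H315": "irritation",
    "H319": "irritation",
}


def route_h_statements(h_statements):
    """Group an ingredient's H-statements by the hazard axis they feed."""
    by_axis = defaultdict(list)
    for code in h_statements:
        # Sub-codes such as "H360FD" share the base statement "H360".
        axis = H_STATEMENT_AXIS.get(code[:4])
        if axis is not None:
            by_axis[axis].append(code)
    return dict(by_axis)


# Hypothetical classification list; H412 (an aquatic hazard statement)
# is not routed to any of the six human-safety axes.
routed = route_h_statements(["H317", "H319", "H361", "H412"])
print(routed)
```

Statements outside the table are simply ignored by this sketch, which mirrors the idea that a missing classification contributes nothing to an axis.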

EXTENDED SAFETY AXES

Beyond the composite — product-relevant and environmental safety data backed by REACH IUCLID study values

Supplementary, Product-Category-Specific Axes

These axes are not part of the human safety composite. They surface conditionally — only when relevant to the product being formulated. Ecotoxicity measures environmental impact and is always reported separately.

07

Aquatic / Ecotoxicity

What it measures

Impact on aquatic organisms — algae, daphnia, fish. Bioaccumulation potential and biodegradation rate.

Primary sources

REACH IUCLID registered substance dossiers. EC50 (algae growth inhibition), LC50 (daphnia and fish acute toxicity), BCF (bioconcentration factor), biodegradation percentage. 26,800+ dossiers, 14.4M study endpoints.

Scoring approach

EC50 and LC50 values mapped to GHS aquatic toxicity categories. BCF above 500 flags bioaccumulation concern. Biodegradation below 60% in 28 days flags environmental persistence. Shown for rinse-off products and sunscreens. Not part of human safety composite — environmental impact is a separate dimension.

Shown for

Shampoos, cleansers, rinse-off products, sunscreens (reef-safe)
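
The ecotoxicity flags described above can be sketched in a few lines of Python. The BCF and biodegradation thresholds come from the scoring approach on this page. The acute category cut-offs (1, 10, and 100 mg/L) are the standard GHS acute aquatic hazard bands and are assumed here rather than confirmed as the platform's exact bands. Function names and the sample values are hypothetical.

```python
def acute_aquatic_category(lowest_value_mg_l):
    """Return the GHS acute aquatic category (1-3) or None if unclassified.

    Uses the standard GHS cut-offs on the lowest acute EC50/LC50 in mg/L.
    """
    if lowest_value_mg_l <= 1:
        return 1
    if lowest_value_mg_l <= 10:
        return 2
    if lowest_value_mg_l <= 100:
        return 3
    return None


def ecotox_flags(ec50_algae, lc50_daphnia, lc50_fish, bcf, biodegradation_pct_28d):
    """Combine the three environmental signals described in the text."""
    # The most sensitive species (lowest value) drives the classification.
    lowest = min(ec50_algae, lc50_daphnia, lc50_fish)
    return {
        "acute_category": acute_aquatic_category(lowest),
        "bioaccumulation_concern": bcf > 500,
        "persistence_concern": biodegradation_pct_28d < 60,
    }


# Hypothetical study values, for illustration only.
flags = ecotox_flags(
    ec50_algae=12.0,
    lc50_daphnia=8.5,
    lc50_fish=40.0,
    bcf=30.0,
    biodegradation_pct_28d=75.0,
)
print(flags)
```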

08

Eye Irritation

What it measures

Potential to cause eye irritation or serious eye damage on direct or incidental contact.

Primary sources

REACH IUCLID Draize eye irritation studies (OECD TG 405), registered substance dossier data.

Scoring approach

Draize eye scores mapped to GHS eye damage categories. Scores reflect severity and reversibility of observed effects. Only shown when the product is applied near the eye area, where incidental contact is expected.

Shown for

Eye creams, eye serums, mascaras, eyeliners

09

Skin Irritation (REACH)

What it measures

Potential to cause skin irritation from prolonged contact. Distinct from the core Irritation axis, which incorporates CIR assessments and Zein Number data.

Primary sources

REACH IUCLID Draize skin irritation studies — Primary Irritation Index (PII) from OECD TG 404 studies on registered substances.

Scoring approach

Draize PII mapped to severity bands. PII below 0.5 is negligible, 0.5 to 2.3 is mild, above 2.3 is classified irritant. Provides a second, independent irritation signal from in vivo regulatory studies rather than panel assessments.

Shown for

Leave-on products, baby and children formulations
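
The PII banding above translates directly into code. This is a minimal sketch using the three bands stated in the scoring approach. The function name `pii_band` and the 0 to 8 range check (the range of the Draize Primary Irritation Index) are additions for illustration.

```python
def pii_band(pii):
    """Map a Draize Primary Irritation Index to the bands described above."""
    if not 0 <= pii <= 8:
        raise ValueError("Draize PII is expected to fall between 0 and 8")
    if pii < 0.5:
        return "negligible"
    if pii <= 2.3:
        return "mild"
    return "irritant"


for value in (0.3, 0.5, 2.3, 2.4):
    print(value, pii_band(value))
```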

10

Oral Toxicity

What it measures

Acute toxicity risk from ingestion — relevant for products applied to or near the mouth where incidental ingestion occurs.

Primary sources

REACH IUCLID LD50 oral values from OECD TG 401 and TG 423 acute oral toxicity studies.

Scoring approach

LD50 values mapped to GHS acute toxicity categories. Category 1 (LD50 at or below 5 mg/kg) is highest concern. Category 5 (LD50 between 2,000 and 5,000 mg/kg) is low concern. Above 5,000 mg/kg is not classified.

Shown for

Lip products (lipstick, lip balm, lip gloss), oral care (toothpaste, mouthwash)
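
The LD50 mapping above can be expressed as a lookup against category upper bounds. Categories 1 and 5 and the "not classified" rule are stated on this page. The bounds for Categories 2 to 4 (50, 300, and 2,000 mg/kg) are the standard GHS acute oral toxicity cut-offs and are assumed here. The function name is hypothetical.

```python
import bisect

# Upper LD50 bounds (mg/kg body weight, inclusive) for GHS acute oral
# toxicity Categories 1 through 5.
_CATEGORY_UPPER_BOUNDS = [5, 50, 300, 2000, 5000]


def acute_oral_category(ld50_mg_kg):
    """Return the GHS acute oral category (1-5) or None if not classified."""
    if ld50_mg_kg <= 0:
        raise ValueError("LD50 must be positive")
    # bisect_left keeps a value equal to a bound inside the lower category.
    index = bisect.bisect_left(_CATEGORY_UPPER_BOUNDS, ld50_mg_kg)
    return index + 1 if index < len(_CATEGORY_UPPER_BOUNDS) else None


for ld50 in (5, 300, 2500, 6000):
    print(ld50, acute_oral_category(ld50))
```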

11

Genotoxicity

What it measures

Potential to cause DNA damage — gene mutations, chromosomal aberrations, or DNA strand breaks.

Primary sources

REACH IUCLID in vitro studies (Ames test, chromosomal aberration assay) and in vivo studies (micronucleus test, comet assay) from registered substance dossiers. Only studies with a Klimisch reliability score of 1 or 2 are used.

Scoring approach

Scored based on the proportion of positive results across available studies. In vivo positive results carry greater weight than in vitro positives, so a positive in vivo result paired with a negative in vitro result is more concerning than the reverse. An ingredient must show a positive rate above 15% to receive a score, which prevents isolated false positives from triggering flags.

Shown for

Baby products, children's formulations
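
One way to picture the genotoxicity logic above is a weighted positive rate with a 15% gate. This is a sketch under stated assumptions: the 2:1 in vivo to in vitro weighting is a placeholder (the page says only that in vivo positives weigh more), and applying the threshold to the weighted rate rather than the raw rate is also an assumption. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotoxStudy:
    in_vivo: bool   # True for micronucleus/comet, False for Ames etc.
    positive: bool  # Study outcome
    klimisch: int   # Klimisch reliability score (1 = most reliable)


IN_VIVO_WEIGHT = 2.0      # Placeholder weight (assumption)
IN_VITRO_WEIGHT = 1.0     # Placeholder weight (assumption)
MIN_POSITIVE_RATE = 0.15  # Threshold stated in the scoring approach


def _weight(study):
    return IN_VIVO_WEIGHT if study.in_vivo else IN_VITRO_WEIGHT


def weighted_positive_rate(studies):
    """Weighted share of positive results among Klimisch 1-2 studies."""
    reliable = [s for s in studies if s.klimisch in (1, 2)]
    total = sum(_weight(s) for s in reliable)
    if total == 0:
        return None  # No usable studies: a data gap, not a signal
    return sum(_weight(s) for s in reliable if s.positive) / total


def has_genotox_signal(studies):
    rate = weighted_positive_rate(studies)
    return rate is not None and rate > MIN_POSITIVE_RATE


# Hypothetical study sets, for illustration only.
isolated_positive = [GenotoxStudy(False, True, 1)] + [GenotoxStudy(False, False, 1)] * 9
in_vivo_positive = [GenotoxStudy(True, True, 2)] + [GenotoxStudy(False, False, 1)] * 4
print(has_genotox_signal(isolated_positive), has_genotox_signal(in_vivo_positive))
```

In this sketch a single in vitro positive among ten studies stays below the gate, while a single in vivo positive among five studies clears it.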

12

Phototoxicity

What it measures

Potential to cause adverse skin reactions when the ingredient is exposed to UV light — relevant for UV-absorbing compounds.

Primary sources

REACH IUCLID phototoxicity studies, SCCS scientific opinions on UV filters.

Scoring approach

Based on phototoxicity study results from registered substance dossiers and SCCS evaluations. Scored when data is available for UV-absorbing ingredients.

Shown for

Sunscreens, SPF products, UV-protective formulations


Why multi-axis scoring matters

A single composite score can mask critical safety signals. Two ingredients can land at the same composite value with completely different underlying hazard profiles. Our system always shows both the composite and the individual axis breakdown — so the formulator decides what matters for their product.

Same score, different story

Benzophenone-3 (Oxybenzone)

Composite 1.50 · GREEN · HIGH confidence

[Radar chart, scale 0–10. Axes: Carcinogenicity, Reproductive / DART, Sensitization, Systemic Toxicity, Irritation, Endocrine Disruption]

Phenoxyethanol

Composite 1.50 · GREEN · HIGH confidence

[Radar chart, scale 0–10. Axes: Carcinogenicity, Reproductive / DART, Sensitization, Systemic Toxicity, Irritation, Endocrine Disruption]

Benzophenone-3 and Phenoxyethanol both carry a composite score of 1.50 — both in the GREEN band. But the underlying hazard profiles are entirely different.

Benzophenone-3 shows signals across carcinogenicity (2), developmental toxicity (3), and endocrine disruption (4) — a broad, low-level concern pattern driven primarily by its endocrine activity.

Phenoxyethanol concentrates its risk in irritation (7) and endocrine disruption (4), with a minor DART signal (2). For a product applied near the eyes or on compromised skin, this irritation spike matters.

A single composite score treats these as identical. Our 6-axis system shows they are not.
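
The comparison above can be made concrete with a small Python sketch that lines up the two profiles axis by axis. The axis scores are the ones quoted in the text. Treating every axis not mentioned there as 0 is an assumption made for illustration, as are the helper names. The composite formula itself is not reproduced here.

```python
AXES = (
    "carcinogenicity",
    "dart",
    "sensitization",
    "systemic_toxicity",
    "irritation",
    "endocrine_disruption",
)


def profile(**scores):
    """Build a six-axis profile; unmentioned axes default to 0 (assumption)."""
    return {axis: scores.get(axis, 0) for axis in AXES}


def axis_differences(a, b):
    """Return only the axes on which two profiles disagree."""
    return {axis: (a[axis], b[axis]) for axis in AXES if a[axis] != b[axis]}


# Axis scores as quoted in the text; both composites are 1.50.
benzophenone_3 = profile(carcinogenicity=2, dart=3, endocrine_disruption=4)
phenoxyethanol = profile(dart=2, irritation=7, endocrine_disruption=4)

differences = axis_differences(benzophenone_3, phenoxyethanol)
print(differences)
```

Three of the six axes differ between the two ingredients even though their composite values match, which is the information a single number cannot carry.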

Green doesn't mean zero concern

Salicylic Acid

Composite 1.90 · GREEN · HIGH confidence

[Radar chart, scale 0–10. Axes: Carcinogenicity, Reproductive / DART, Sensitization, Systemic Toxicity, Irritation, Endocrine Disruption]

Salicylic Acid scores 1.90 — comfortably in the GREEN band. But the radar chart reveals three elevated axes: irritation at 7 (classified H318 — causes serious eye damage), DART at 4 (developmental/reproductive toxicity signal), and endocrine disruption at 4.

For a general-purpose exfoliant in a rinse-off cleanser, this profile is acceptable. For a leave-on product targeted at pregnant consumers, the DART signal at 4 is information a formulator needs to see — and a composite score of 1.90 would never surface it.

We don't make the decision. We surface the data. The formulator decides.

Data-driven ingredient substitution

Propylene Glycol

Composite 0.80 · GREEN · HIGH confidence

[Radar chart, scale 0–10. Axes: Carcinogenicity, Reproductive / DART, Sensitization, Systemic Toxicity, Irritation, Endocrine Disruption]

Propanediol

Composite 0.40 · GREEN · HIGH confidence

[Radar chart, scale 0–10. Axes: Carcinogenicity, Reproductive / DART, Sensitization, Systemic Toxicity, Irritation, Endocrine Disruption]

Propylene Glycol and Propanediol are functionally interchangeable humectants and solvent carriers. Both are GREEN. But Propylene Glycol carries an irritation score of 4 — a moderate signal from ECHA CLP classification — while Propanediol shows no irritation concern.

For sensitive skin formulations, this difference matters. Our system makes it visible so the substitution decision is informed by data, not marketing.

Same ingredient, different context

Static safety databases show you the same score regardless of what you're making. A preservative scores identically whether it goes into a rinse-off shampoo or a leave-on eye cream — but the risks are not identical. Our system adapts which safety axes it surfaces based on the product being formulated.

Rinse-off Shampoo

Phenoxyethanol

1.50 · GREEN · HIGH confidence

[Radar chart, scale 0–10. Core axes: Carcinogenicity, DART, Sensitization, Systemic Toxicity, Irritation, Endocrine. Context axis: Ecotox]

In a rinse-off product, the ecotoxicity axis activates — Phenoxyethanol's aquatic impact data from REACH studies is surfaced so the formulator can assess environmental profile alongside human safety.

Eye Cream

Phenoxyethanol

1.50 · GREEN · HIGH confidence

[Radar chart, scale 0–10. Core axes: Carcinogenicity, DART, Sensitization, Systemic Toxicity, Irritation, Endocrine. Context axis: Eye Irritation]

For eye-area application, the eye irritation axis activates. Phenoxyethanol's Draize eye irritation score from REACH OECD TG 405 studies shows moderate irritation potential — critical information for products applied near the eye where incidental contact is expected.

Baby Body Lotion

Phenoxyethanol

1.50 · GREEN · HIGH confidence

[Radar chart, scale 0–10. Core axes: Carcinogenicity, DART, Sensitization, Systemic Toxicity, Irritation, Endocrine. Context axes: Skin Irritation (REACH), Genotox, Ecotox]

Baby products trigger the most comprehensive safety view. Skin irritation, genotoxicity, and ecotoxicity all surface. For Phenoxyethanol, the genotoxicity score of 0 provides reassurance, while the irritation score of 7 on the core axis remains the primary concern — exactly the kind of nuance a single composite score would hide.

The composite score for Phenoxyethanol is 1.50 in all three contexts — that doesn't change. What changes is the supplementary intelligence the formulator sees. This is what context-aware safety scoring means: not a different number, but a more complete picture when the product demands it.
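
The three Phenoxyethanol examples above amount to a lookup from product type to the supplementary axes that get surfaced. A minimal sketch, assuming hypothetical product-type keys and axis names, and covering only the three contexts shown on this page:

```python
CORE_AXES = (
    "carcinogenicity",
    "dart",
    "sensitization",
    "systemic_toxicity",
    "irritation",
    "endocrine_disruption",
)

# Context axes taken from the three worked examples above.
# Keys are illustrative product-type identifiers (assumption).
CONTEXT_AXES = {
    "rinse_off_shampoo": ("ecotoxicity",),
    "eye_cream": ("eye_irritation",),
    "baby_body_lotion": ("skin_irritation_reach", "genotoxicity", "ecotoxicity"),
}


def axes_to_display(product_type):
    """Core axes are always shown; context axes depend on the product."""
    return CORE_AXES + CONTEXT_AXES.get(product_type, ())


for product in CONTEXT_AXES:
    extra = axes_to_display(product)[len(CORE_AXES):]
    print(product, "->", ", ".join(extra))
```

Only the set of displayed axes varies with the product type in this sketch. The composite is not an input or an output, which matches the point that the number stays at 1.50 in every context.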

Scores are derived from 13 authoritative sources, including ECHA CLP Annex VI, IARC/NTP carcinogen classifications, CIR safety assessments, ESSCA clinical patch test data, EPA CompTox predictions, and REACH IUCLID registered substance dossiers (26,800+ dossiers, 14.4M study endpoints). The full methodology and source hierarchy are published on this page.


Five principles that guide our scoring

  1. No data gap penalty

    If an ingredient has no studies addressing a particular hazard axis, that axis scores 0 — not an elevated score. Absence of evidence is not evidence of harm. This is the single largest methodological difference between our system and EWG Skin Deep, which inflates scores for under-studied ingredients. A novel botanical extract with limited toxicology literature should not receive a higher hazard score than a well-studied petrochemical with confirmed clean safety data.

  2. Exposure context matters

    The same ingredient at the same concentration poses different risks in a leave-on facial serum (hours of skin contact) versus a rinse-off shampoo (30 seconds of contact). Our scoring applies an exposure modifier based on product type. This is standard practice in regulatory toxicology — SCCS and CIR both evaluate safety in the context of exposure — but is absent from consumer-facing databases.

  3. Source hierarchy

    Not all safety data is equal. Peer-reviewed regulatory assessments (ECHA CLP, SCCS opinions, CIR final reports) carry more weight than preliminary findings or supplier-provided data. Our system enforces a clear hierarchy: regulatory authority classifications first, then peer-reviewed clinical data, then curated literature. Scores from higher-tier sources are never downgraded by lower-tier data.

  4. Confidence tiering

    Every safety score carries a confidence level: high (direct ECHA, CIR, or IARC classification), medium (extracted from peer-reviewed literature or CIR group assessments), or low (baseline from COSING registration with no hazard evidence found). Only medium and high confidence scores are displayed to users. Low confidence ingredients show "No hazard data available in our sources" rather than a green score that implies safety has been confirmed.

  5. Conservative on carcinogenicity

    Carcinogenicity is treated asymmetrically. A known carcinogen classified by IARC or NTP cannot have its composite score pulled below a floor by clean scores on other axes. When multiple sources provide different carcinogenicity classifications, the highest (most conservative) score is retained. This reflects the irreversible nature of carcinogenic harm compared to reversible irritation.
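
Principles 1, 4, and 5 above lend themselves to a short sketch. The floor value, the per-source scores, and all function names below are hypothetical placeholders: this page states that a floor exists and that the highest classification wins, but not the numbers involved.

```python
NO_DATA_MESSAGE = "No hazard data available in our sources"
KNOWN_CARCINOGEN_FLOOR = 7.0  # Placeholder value (assumption)


def resolve_carcinogenicity(source_scores):
    """Keep the highest score across sources; no sources means 0."""
    return max(source_scores.values(), default=0)


def apply_carcinogen_floor(composite, known_carcinogen):
    """Clean scores on other axes cannot pull a known carcinogen below the floor."""
    if known_carcinogen:
        return max(composite, KNOWN_CARCINOGEN_FLOOR)
    return composite


def display_value(score, confidence):
    """Low-confidence scores are replaced by a message instead of a number."""
    if confidence == "low":
        return NO_DATA_MESSAGE
    return f"{score:.2f}"


# Hypothetical per-source axis scores, for illustration only.
axis_score = resolve_carcinogenicity({"IARC": 8, "ECHA CLP": 6, "NTP": 7})
print(axis_score)
print(apply_carcinogen_floor(1.5, known_carcinogen=True))
print(display_value(1.5, "high"))
print(display_value(0.0, "low"))
```

Returning 0 when no source reports a score reflects the no-data-gap-penalty principle, while the display function keeps that 0 from being read as confirmed safety when confidence is low.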


Where our data comes from

Source | What we extract | Coverage
ECHA CLP Annex VI | H-statement hazard classifications mapped to 6 axes | 3,831 cosmetic-relevant ingredients
CIR (Cosmetic Ingredient Review) | Safety assessment conclusions, clinical patch test data | 2,654 assessment reports
IARC Monographs | Carcinogen group classifications (1, 2A, 2B, 3) | 811 substances classified
NTP Report on Carcinogens | Known and reasonably anticipated carcinogens | 256 substances
NACDG Patch Test Data | Sensitization prevalence rates (% positive reactions) | Top allergens with prevalence data
SCCS Scientific Opinions | EU-specific safety evaluations for cosmetic ingredients | Referenced per ingredient
PubMed / PMC | Hazard signals extracted from peer-reviewed literature | 19,000+ papers mined
REACH SVHC List | Substances of very high concern (ED, CMR, PBT flags) | 213 SVHC-flagged ingredients
REACH IUCLID Dossiers | LD50, EC50, LC50, Draize PII, LLNA SI, BCF, biodegradation, genotoxicity results from registered substance dossiers | 26,800+ dossiers, 14.4M study endpoints, 3,242 matched INCIs
ESSCA Clinical Patch Test Data | Contact allergen prevalence rates from the European Surveillance System on Contact Allergies | Top 30 allergens with clinical prevalence data
EPA CompTox / ToxCast | In vitro high-throughput screening predictions for reproductive toxicity, carcinogenicity, and systemic effects | 2,099 cosmetic-relevant ingredients
PubChem GHS Classifications | GHS hazard classifications (H-statements) mapped to 6 axes via PubChem API | ~520 ingredients enriched
COSING (EU) | INCI identity, Annex II–VI regulatory status | Baseline identity for all EU-listed ingredients

All extraction pipelines run automatically. Regulatory source pages across 16 markets are monitored daily for changes. When a regulatory authority updates an ingredient classification or restriction, the change is detected via content hashing, logged as a structured event, and propagated to affected safety scores without manual intervention.
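
Change detection via content hashing, as described above, can be sketched with the Python standard library. This is an illustration rather than the production pipeline: the whitespace normalisation step, the in-memory hash store, the event fields, and the source identifier are all assumptions.

```python
import hashlib


def content_hash(page_text):
    """SHA-256 of the page text after collapsing whitespace."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(source_id, page_text, known_hashes):
    """Return a structured event if the page changed since the last check."""
    new_hash = content_hash(page_text)
    old_hash = known_hashes.get(source_id)
    known_hashes[source_id] = new_hash
    if old_hash is None or old_hash == new_hash:
        return None  # First sighting or no change
    return {"source": source_id, "old_hash": old_hash, "new_hash": new_hash}


# Hypothetical source identifier and page contents.
store = {}
first = detect_change("example-annex", "Ingredient X: max 1.0%", store)
same = detect_change("example-annex", "Ingredient X:   max 1.0%\n", store)
changed = detect_change("example-annex", "Ingredient X: max 0.5%", store)
print(first, same, changed is not None)
```

Normalising whitespace before hashing keeps cosmetic layout changes from registering as regulatory updates, at the cost of ignoring any change that is purely whitespace.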


Safety scores are only half the picture

An ingredient can score well on all six hazard axes and still be prohibited in your target market. Formaldehyde releasers are a clear example — permitted with concentration limits in some markets, banned outright in others. Safety scoring without market-specific regulatory status is incomplete.

Every ingredient in our system carries per-market regulatory status across 16 markets: EU, US (FDA), China (NMPA IECIC), Japan (MHLW), South Korea (MFDS), India (BIS IS 4707), Canada, Australia (TGA), Brazil (ANVISA), Thailand, Malaysia (NPRA), Singapore (HSA), Indonesia, Vietnam, Philippines, and ASEAN harmonised standards. Status categories include: permitted, restricted (with maximum concentrations, required labelling, and product-type limitations), prohibited, and not listed.
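
The per-market status record described above could be modelled along these lines. The four status categories and the restriction fields come from the paragraph above. The class names, market codes, and the sample ingredient data are hypothetical and do not represent real regulatory entries.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    PERMITTED = "permitted"
    RESTRICTED = "restricted"
    PROHIBITED = "prohibited"
    NOT_LISTED = "not listed"


@dataclass(frozen=True)
class MarketStatus:
    status: Status
    max_concentration_pct: Optional[float] = None  # Only for RESTRICTED
    required_labelling: Optional[str] = None
    product_type_limits: Tuple[str, ...] = ()


def blocked_markets(statuses, target_markets):
    """List the target markets where the ingredient is prohibited."""
    not_listed = MarketStatus(Status.NOT_LISTED)
    return [
        market
        for market in target_markets
        if statuses.get(market, not_listed).status is Status.PROHIBITED
    ]


# Placeholder data for an unnamed example ingredient (not real entries).
example_statuses = {
    "EU": MarketStatus(Status.RESTRICTED, max_concentration_pct=1.0,
                       product_type_limits=("rinse-off only",)),
    "US": MarketStatus(Status.PERMITTED),
    "JP": MarketStatus(Status.PROHIBITED),
}
print(blocked_markets(example_statuses, ["EU", "US", "JP", "KR"]))
```

A market with no entry falls back to "not listed" rather than "permitted" in this sketch, so a missing record is never read as clearance.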


What we deliberately exclude

No certification claims. We do not recommend or imply COSMOS, Ecocert, Vegan, Halal, or any other certification status. Certifications are granted to finished products by certifying bodies based on the full supply chain — they cannot be determined from an ingredient's INCI name alone. Any platform that tells you "this ingredient is COSMOS-approved" based solely on identity data is misleading you.

No trade names. Every ingredient is identified by its INCI name only. We do not surface supplier trade names, branded ingredient names, or proprietary blend names in any output. This ensures supplier neutrality — our recommendations are based on chemistry, not commercial relationships.

No EWG or SkinSafe data. We do not use, reference, or derive scores from EWG Skin Deep or SkinSafe databases. Our scoring is built entirely from primary regulatory and scientific sources. This is a deliberate architectural decision, not an oversight.

No "clean beauty" judgments. We provide hazard data and regulatory status. We do not label ingredients as "clean," "toxic," "natural," or "synthetic" — these are marketing categories, not scientific ones. Formulators make their own informed decisions based on the data we provide.


This methodology is not static. As regulatory authorities update classifications, as new clinical data enters the peer-reviewed literature, and as our automated pipelines expand coverage, scores are refined. We publish this methodology because transparency is a prerequisite for trust — and trust is what separates a tool formulators rely on from one they dismiss.

Last updated: May 2026