AI Transparency

Our AI Principles

Four commitments that govern how AI works in IT Folder.

No Model Training

Your data is never used to train, fine-tune, or improve any AI model — ours or any provider's. Ever.

PII Redaction

Sensitive data (SSNs, credit card numbers, API keys, credentials) is automatically detected and redacted before it reaches any external AI provider.

Admin Control

AI features are opt-in and can be disabled at any time at the organization level. No hidden processing, no surprises.

Human in the Loop

AI assists; it doesn't decide. A person reviews every output before it is used, and no automated decisions are made about people.

What Our AI Features Do

Each feature processes only the data it needs, and only when you use it.

Documentation Assistance

Generates summaries, templates, and content suggestions from your inputs and existing docs. Sends the prompt text to your configured AI provider.

Semantic Search

Finds relevant results across your documentation using vector embeddings. Search queries may be sent to OpenAI for embedding (or embedded locally by the SentenceTransformers fallback); results come from your tenant database only.
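As a rough illustration of the retrieval step, the sketch below ranks stored chunks by cosine similarity to a query vector. It assumes precomputed embeddings held in a plain Python list standing in for your tenant database; the `cosine` and `top_k` helpers and the 3-dimensional toy vectors are hypothetical (text-embedding-3-small actually returns 1,536-dimensional vectors, and all-MiniLM-L6-v2 returns 384-dimensional ones).

```python
import math

# Illustrative sketch only: the list below stands in for embeddings that
# would be read from the tenant's own database. Helper names are hypothetical.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (doc_id, score) pairs most similar to query_vec (linear scan)."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy 3-dimensional vectors; real embeddings are much larger.
tenant_chunks = [
    ("vpn-setup", [0.9, 0.1, 0.0]),
    ("printer-faq", [0.0, 1.0, 0.2]),
    ("firewall-rules", [0.7, 0.0, 0.7]),
]
results = top_k([1.0, 0.0, 0.1], tenant_chunks)
```

A linear scan keeps the example readable; at scale, a vector index would normally take its place.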

Text-to-SQL Agent

Translates natural-language questions into database queries using Google Gemini. Only your question and schema metadata are sent — never row-level data. Sensitive columns are masked in results.
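To make "schema metadata only" concrete, here is a sketch that builds a prompt from table and column definitions without ever selecting rows. It assumes an in-memory SQLite database in place of a tenant's PostgreSQL database; `schema_metadata`, `build_sql_prompt`, and the prompt layout are hypothetical, and no call to Gemini is made.

```python
import sqlite3

# Illustrative sketch: SQLite stands in for the tenant's PostgreSQL database,
# and both helpers are hypothetical. No model is called here.


def schema_metadata(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each table to 'column TYPE' strings without reading any rows."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [f"{col[1]} {col[2]}" for col in columns]
    return schema


def build_sql_prompt(question: str, schema: dict[str, list[str]]) -> str:
    """Combine the user's question with schema text only."""
    tables = [f"{name}({', '.join(cols)})" for name, cols in schema.items()]
    return "Schema:\n" + "\n".join(tables) + f"\n\nQuestion: {question}\nSQL:"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (id INTEGER, hostname TEXT, owner TEXT)")
conn.execute("INSERT INTO assets VALUES (1, 'srv-db-01', 'alice')")
prompt = build_sql_prompt("How many assets does each owner have?", schema_metadata(conn))
```

The inserted row ('srv-db-01', 'alice') never appears in the prompt, because only the catalog is read.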

Embedding Generation

Creates vector representations of your documents for search. Content is cleaned, chunked, and PII-scanned before being sent to OpenAI or processed locally.
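The clean-then-chunk step can be sketched as follows. This is a minimal illustration under assumptions: the regexes for scripts, inline base64 data URIs, and HTML tags, the 200-character window, and the 40-character overlap are hypothetical choices rather than IT Folder's settings, and PII scanning would run on each chunk as a separate step.

```python
import re

# Illustrative sketch: patterns, chunk size, and overlap are hypothetical,
# not IT Folder's settings. PII scanning would run on each chunk afterwards.
SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)
DATA_URI_RE = re.compile(r"data:[\w/+.-]+;base64,[A-Za-z0-9+/=]+")
TAG_RE = re.compile(r"<[^>]+>")  # remaining HTML tags, including <img>


def clean(text: str) -> str:
    """Remove scripts, inline base64 payloads, and tags; collapse whitespace."""
    text = SCRIPT_RE.sub(" ", text)
    text = DATA_URI_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    return " ".join(text.split())


def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


raw = (
    "<p>Reset the VPN client.</p><script>alert(1)</script>"
    '<img src="data:image/png;base64,iVBORw0KGgo=">'
)
cleaned = clean(raw)
```

Overlapping windows keep a sentence that straddles a chunk boundary searchable from either side; production systems often chunk by tokens rather than characters.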

Image & Photo Analysis

Generates descriptions and searchable metadata for uploaded images using CLIP (runs locally, no external calls) and optional AI captioning.

Document Summarization

Creates concise summaries of documentation using DistilBART (runs locally), or your configured AI provider for longer content.

AI Providers

IT Folder integrates with the following AI providers. Your organization controls which are active.

Provider	Used For	Key Type	Data Sent
OpenAI (GPT-5.2, text-embedding-3-small)	Documentation assistance, embeddings, search, Q&A	Your org's API key	Prompts, document chunks (PII-redacted)
Anthropic (Claude)	Documentation assistance, content generation	Your org's API key	Prompts (PII-redacted)
Google (Gemini 2.5 Flash)	Text-to-SQL agent (natural-language queries)	IT Folder-managed	Natural-language questions + schema metadata only
SentenceTransformers (all-MiniLM-L6-v2)	Local text embeddings (fallback)	Runs locally	Nothing sent externally
CLIP (clip-ViT-B-32)	Image search and visual embeddings	Runs locally	Nothing sent externally
DistilBART (distilbart-cnn-12-6)	Document summarization	Runs locally	Nothing sent externally

For the full list of sub-processors, see our Sub-Processors page.

Privacy Safeguards

Technical controls that protect your data at every step of the AI pipeline.

PII Scanner

Presidio-based detection runs on all content before it reaches any external AI provider. Detects and redacts:

Social Security numbers
Credit card numbers
Email addresses & phone numbers
API keys & secrets (OpenAI, AWS, GitHub, Google)
Bank account & IBAN numbers
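For a feel of what redaction does, here is a deliberately simplified, regex-only sketch using Python's standard library. It is not Presidio and not IT Folder's scanner: Presidio combines pattern recognizers with context words and checksum validation and covers many more entity types. The patterns, key formats, and `<ENTITY_TYPE>` placeholders below are illustrative assumptions covering only a subset of the listed categories; the Luhn check shows one way to avoid flagging arbitrary long numbers as credit cards.

```python
import re

# Simplified, regex-only illustration. This is NOT Presidio and NOT IT Folder's
# scanner; patterns and placeholder format are assumptions for the example.
PATTERNS = [
    ("US_SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("EMAIL_ADDRESS", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("GITHUB_TOKEN", re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
    ("OPENAI_API_KEY", re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")),
]
# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum; filters out digit runs that cannot be valid card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Replace detected values with <ENTITY_TYPE> placeholders."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"<{label}>", text)

    def card_sub(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "<CREDIT_CARD>" if luhn_ok(digits) else match.group()

    return CARD_RE.sub(card_sub, text)


sample = redact("SSN 123-45-6789, mail bob@example.com, card 4111 1111 1111 1111 ok")
# -> "SSN <US_SSN>, mail <EMAIL_ADDRESS>, card <CREDIT_CARD> ok"
```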

Tenant Isolation

Every organization gets its own PostgreSQL database. AI processing is scoped to your tenant — no data mixing, no cross-tenant access.

Separate database per tenant
Organization-scoped API keys
Embeddings stored in your tenant DB only
Connection pooling with per-tenant limits
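Per-tenant connection limits can be illustrated with one bounded semaphore per tenant, so a busy organization cannot use up capacity meant for others. `TenantLimiter`, the limit of two, and the tenant IDs are hypothetical, and the sketch only models the limit; a real deployment would pair it with an actual connection pool per tenant database.

```python
import threading
from contextlib import contextmanager

# Hypothetical illustration of per-tenant limits; not IT Folder's pooling code.


class TenantLimiter:
    """Caps how many connections each tenant may hold at once."""

    def __init__(self, max_per_tenant: int):
        self._max = max_per_tenant
        self._lock = threading.Lock()
        self._sems: dict[str, threading.BoundedSemaphore] = {}

    def _semaphore(self, tenant_id: str) -> threading.BoundedSemaphore:
        with self._lock:  # create each tenant's semaphore exactly once
            if tenant_id not in self._sems:
                self._sems[tenant_id] = threading.BoundedSemaphore(self._max)
            return self._sems[tenant_id]

    @contextmanager
    def slot(self, tenant_id: str, timeout: float = 1.0):
        """Reserve one of the tenant's slots, or raise TimeoutError."""
        sem = self._semaphore(tenant_id)
        if not sem.acquire(timeout=timeout):
            raise TimeoutError(f"tenant {tenant_id!r} is at its connection limit")
        try:
            yield  # a real system would check out a connection to this tenant's DB here
        finally:
            sem.release()


limiter = TenantLimiter(max_per_tenant=2)
with limiter.slot("acme"), limiter.slot("acme"):
    try:
        with limiter.slot("acme", timeout=0.05):
            pass
        third_blocked = False
    except TimeoutError:
        third_blocked = True
    with limiter.slot("globex", timeout=0.05):
        other_tenant_ok = True
```

The third "acme" request is refused while "globex" still gets a slot, which is the isolation property a per-tenant limit is meant to provide.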

Encryption

Your API keys and data are protected at every layer:

API keys encrypted with AES-256-GCM via AWS KMS
All AI traffic over HTTPS/TLS
PostgreSQL connections encrypted
Log fields PII-redacted before being written to the database

Query Safeguards

The text-to-SQL agent has defense-in-depth protections:

SQLGlot AST validation (SELECT-only whitelist)
Read-only transactions with statement timeout
SQL injection pattern rejection
Sensitive column masking in results
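The layered approach can be shown in miniature. This sketch is a simplified stand-in, not IT Folder's implementation: a keyword check approximates SQLGlot AST validation (a regex is far weaker than parsing, which is why it is paired with a read-only session), SQLite's `PRAGMA query_only` stands in for a PostgreSQL read-only transaction, statement timeouts are omitted, and the `SENSITIVE_COLUMNS` set is hypothetical.

```python
import re
import sqlite3

# Simplified stand-in for the safeguards listed above, not IT Folder's code:
# - a keyword check approximates SQLGlot AST validation (much weaker than parsing)
# - SQLite's PRAGMA query_only approximates a PostgreSQL read-only transaction
# - SENSITIVE_COLUMNS is a hypothetical masking list
SENSITIVE_COLUMNS = {"password_hash", "ssn"}
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|pragma)\b", re.IGNORECASE
)


def validate(sql: str) -> str:
    """Accept exactly one SELECT statement; raise ValueError otherwise."""
    body = sql.strip().rstrip(";").strip()
    if ";" in body:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"select\b", body, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(body):
        raise ValueError("forbidden keyword in query")
    return body


def run_readonly(conn: sqlite3.Connection, sql: str) -> list[dict]:
    """Run a validated query with writes disabled and sensitive columns masked."""
    conn.execute("PRAGMA query_only = ON")  # later writes on this connection fail
    cur = conn.execute(validate(sql))
    names = [d[0] for d in cur.description]
    return [
        {n: ("***" if n.lower() in SENSITIVE_COLUMNS else v) for n, v in zip(names, row)}
        for row in cur.fetchall()
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, ssn TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', '123-45-6789')")
conn.commit()
rows = run_readonly(conn, "SELECT name, ssn FROM users")
```

Each layer covers for the others: even if a crafted query slipped past validation, the read-only session would still refuse to write.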

Audit Trail

All AI activity is logged for compliance and troubleshooting:

Token usage tracked per request
Agent queries logged with full audit trail
Log content PII-redacted automatically
Cached responses expire after 30 days
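Redacting log content before it is persisted can be done with a standard `logging.Filter`. In this sketch, which is an illustration under assumptions rather than IT Folder's logging code, the filter masks only email addresses and `sk-`-style keys, output goes to an in-memory stream instead of a database, and the `tokens` field for per-request token usage is a hypothetical name.

```python
import io
import logging
import re

# Illustrative only: masks emails and sk-style keys; "tokens" is a hypothetical field.
SECRET_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+|\bsk-[A-Za-z0-9_-]{16,}")


class RedactingFilter(logging.Filter):
    """Rewrite each record's message so secrets never reach the handler's output."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args first
        record.msg = SECRET_RE.sub("[REDACTED]", message)
        record.args = None  # args are already folded into msg
        return True  # keep the (now redacted) record


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s tokens=%(tokens)s"))
handler.addFilter(RedactingFilter())

log = logging.getLogger("ai.audit")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

log.info("prompt from %s using key %s", "bob@example.com",
         "sk-abc123abc123abc123", extra={"tokens": 42})
```

Attaching the filter to the handler means redaction happens on the write path itself, so callers cannot forget to apply it.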

Admin Controls

Organization administrators have full control:

Enable or disable AI features entirely
Manage provider API keys
Choose which AI provider to use
AI consent is set at the organization level only; there is no per-user override

What Happens to Your Data

What we send to AI providers

User prompts and questions (PII-redacted)
Document chunks for embedding (PII-redacted, cleaned of images/scripts/base64)
Natural-language questions plus schema metadata for query generation (no row-level data)

What we never send

Raw passwords, SSNs, credit card numbers, or API keys (auto-redacted)
Data from other tenants
Data from organizations that have disabled AI features
Your data for model training or fine-tuning

What stays local

CLIP image embeddings (processed on our infrastructure)
SentenceTransformers text embeddings (fallback, runs locally)
DistilBART summarization (runs locally)
All vector data stored in your tenant database

Retention

Cached AI responses expire after 30 days
Embeddings persist until the source document is deleted
AI usage logs are retained per your organization's data retention policy
Third-party providers process data for inference only — refer to their API terms for transient processing details
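The 30-day cache expiry can be illustrated with a small time-to-live cache. This is a hypothetical sketch, not IT Folder's cache: entries live in memory, keys are SHA-256 hashes of the prompt (assumed to be already PII-redacted), and the clock is injectable so expiry can be demonstrated without waiting 30 days.

```python
import hashlib
import time
from typing import Callable, Optional

# Hypothetical illustration of a 30-day response cache; not IT Folder's code.
TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days


class ResponseCache:
    """In-memory cache whose entries expire TTL_SECONDS after being stored."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def put(self, prompt: str, response: str) -> None:
        self._entries[self._key(prompt)] = (self._clock() + TTL_SECONDS, response)

    def get(self, prompt: str) -> Optional[str]:
        key = self._key(prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it so it is never served again
            return None
        return response


now = [0.0]
cache = ResponseCache(clock=lambda: now[0])
cache.put("summarize onboarding doc", "Short summary.")
fresh = cache.get("summarize onboarding doc")
now[0] = TTL_SECONDS  # jump exactly 30 days ahead
expired = cache.get("summarize onboarding doc")
```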