
File Processing

Learn how Glossa processes different file types to extract requirements, create citations, and generate acceptance criteria.

Written by Ali
Updated over a month ago

Overview

When you upload a file to Glossa, it goes through an automated processing pipeline that analyzes the content, identifies requirements, and creates detailed citations linking requirements back to specific sections or timestamps in the source file.

Understanding how file processing works helps you upload the right files, troubleshoot processing issues, and get the best results from Glossa's AI-powered requirements generation.

Processing Pipeline

Step 1: Upload and Validation

When you upload a file:

  1. File is validated for type and size

  2. File is encrypted and stored securely

  3. File status changes to Pending

Step 2: Queue and Processing

Once uploaded:

  1. File enters the processing queue

  2. Status changes to Processing

  3. Glossa's AI analyzes the file content

  4. Requirements are extracted and generated

  5. Citations are created linking to source content

Step 3: Completion

After processing:

  1. Status changes to Ready

  2. Generated requirements appear in the Requirements tab

  3. File is available for preview with citation highlights

  4. "Files currently processing" counter decreases

Processing Time

Processing time varies based on several factors:

File Size

  • Small files (< 5 MB): 2-4 minutes

  • Medium files (5-50 MB): 4-8 minutes

  • Large files (50-500 MB): 8-15 minutes

  • Very large files (500 MB - 1 TB): 15-30+ minutes
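
If you want a rough planning number before a large upload, the size bands above can be written as a small lookup. This is only a sketch of the published estimates, not how Glossa schedules work. The function name and the choice to place a boundary value (for example exactly 5 MB) in the larger band are assumptions, and file type, content complexity, and system load can still shift the real time.

```python
# Size bands and minute ranges copied from the list above.
# Each entry: (upper bound in MB, exclusive), (min minutes, max minutes).
SIZE_BANDS = [
    (5, (2, 4)),      # small files
    (50, (4, 8)),     # medium files
    (500, (8, 15)),   # large files
]
# "15-30+ minutes": 30 is a typical upper figure here, not a hard cap.
VERY_LARGE = (15, 30)


def estimate_minutes(size_mb: float) -> tuple[int, int]:
    """Return the (min, max) minute estimate for a file of the given size."""
    if size_mb < 0:
        raise ValueError("size_mb must be non-negative")
    for upper_mb, minutes in SIZE_BANDS:
        if size_mb < upper_mb:
            return minutes
    return VERY_LARGE


if __name__ == "__main__":
    low, high = estimate_minutes(120)
    print(f"A 120 MB file: roughly {low}-{high} minutes")
```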

File Type

  • Text documents (PDF, Word): Fastest (2-6 minutes)

  • Spreadsheets (Excel): Fast (3-7 minutes)

  • Images with OCR: Moderate (5-10 minutes)

  • Audio files: Slower (transcription required, 5-15 minutes)

  • Video files: Slowest (video + audio processing, 10-30 minutes)

Content Complexity

  • Simple, structured documents: Process faster

  • Complex formatting: May take longer

System Load

  • Off-peak times: Faster processing

  • Peak usage times: May be slower

  • Multiple large files uploaded together: Processed sequentially

How Different File Types Are Processed

Documents (PDF, Word, PowerPoint)

Processing method:

  • Text extraction from document structure

  • Maintains formatting and section organization

  • Identifies headings, lists, tables

Requirements extraction:

  • Analyzes paragraphs for requirement-like content

  • Identifies feature requests, business rules, user needs

  • Captures process descriptions and workflows

Citations created:

  • Highlight specific paragraphs or sections

  • Show relevant portions in yellow highlighting

  • Link to exact page and section in document

Best results:

  • Well-structured documents with clear headings

Spreadsheets (Excel, CSV)

Processing method:

  • Reads cell data, rows, and columns

  • Understands table structure and headers

  • Processes formulas and data relationships

Requirements extraction:

  • Identifies requirement lists in rows

  • Extracts business rules from data patterns

  • Captures field definitions and validation rules

Citations created:

  • Reference specific cells or ranges

  • Show relevant rows and columns

  • Maintain table context

Best results:

  • Data dictionaries and field specifications

  • Business rules documented in spreadsheets

Audio Files (MP3, WAV, M4A)

Processing method:

  • Transcribes audio to text using AI

  • Identifies speaker changes (when possible)

  • Timestamps all content

Requirements extraction:

  • Analyzes transcription for requirements

  • Identifies feature discussions

  • Captures decisions and action items

Citations created:

  • Include start and end timestamps

  • Link to specific moments in audio

  • Play from cited timestamp when clicked

Best results:

  • Clear audio quality (minimal background noise)

  • Single speaker or distinct speakers

  • Structured discussions (meetings, interviews)

Video Files (MP4, MOV, AVI)

Processing method:

  • Transcribes audio track

  • Timestamps all content

Requirements extraction:

  • Analyzes audio transcription

Citations created:

  • Include start and end timestamps

  • Link to specific moments in video

  • Play from cited timestamp when clicked

Best results:

  • Clear audio (use a dedicated microphone rather than the camera's built-in audio)

  • Structured meetings or demos

Images (JPG, PNG, TIFF)

Processing method:

  • OCR (Optical Character Recognition) extracts text

  • Analyzes image structure and layout

  • Identifies text regions

Requirements extraction:

  • Analyzes OCR text for requirements

  • Processes whiteboard photos

  • Reads handwritten notes (if legible)

Citations created:

  • Show full image with extracted text highlighted

  • Reference regions where text was found

Best results:

  • High-resolution images

  • Clear, readable text (typed or printed)

  • Well-lit photos of whiteboards

  • Scanned documents with good contrast

Requirements Generation

How AI Identifies Requirements

Glossa's AI looks for:

  • Feature requests: "The system should...", "Users need to..."

  • Business rules: "When X happens, do Y"

  • User needs: "As a user, I want..."

  • Process steps: "First..., then..., finally..."

  • Constraints: "Must support...", "Cannot exceed..."

  • Decisions: "We decided to...", "The client wants..."
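
To check whether your own notes use this kind of wording before you upload them, a simple phrase scan can help. The sketch below is only an illustration built from the example phrases above. Glossa's AI does not work by keyword matching, and the pattern table and the find_cues name are hypothetical. Business rules and process steps are left out because they depend on sentence structure more than on fixed phrases.

```python
import re

# Example phrases taken from the list above, grouped by category.
CUE_PATTERNS = {
    "feature request": re.compile(r"\b(the system should|users need to)\b", re.I),
    "user need": re.compile(r"\bas an? \w+, i want\b", re.I),
    "constraint": re.compile(r"\b(must support|cannot exceed)\b", re.I),
    "decision": re.compile(r"\b(we decided to|the client wants)\b", re.I),
}


def find_cues(sentence: str) -> list[str]:
    """Return the categories whose example phrases appear in the sentence."""
    return [
        label
        for label, pattern in CUE_PATTERNS.items()
        if pattern.search(sentence)
    ]


if __name__ == "__main__":
    for line in [
        "The system should export reports as PDF.",
        "Thanks, see you Monday.",
    ]:
        print(line, "->", find_cues(line))
```

Sentences that match nothing are not necessarily ignored by Glossa. The scan is just a quick way to spot notes that never state a requirement explicitly.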

Requirements Granularity

The level of detail in generated requirements is controlled by the Requirement Detail & Visibility Level in Project Settings:

Broad Strokes:

  • Fewer, high-level requirements

  • Each requirement covers broader scope

  • Good for sales discovery, initial scoping

Balanced:

  • Moderate number of requirements

  • Good level of detail for most projects

  • Default setting, recommended for discovery

Granular:

  • Many detailed requirements

  • Each requirement is very specific

  • Good for technical implementation planning

Important: A file's detail level is fixed at upload time, based on the project setting in effect at that moment. A file processed at "Balanced" cannot be switched to "Granular" afterward. To change the level, update the setting, delete the file, and upload it again.

Quality Scoring

Each generated requirement receives an AI quality score:

  • Assesses clarity, completeness, testability

  • Identifies vague or ambiguous requirements

  • Suggests improvements

See the AI Review article for details.

Citation Creation

Document Citations

For text-based files:

  • Paragraph-level precision: Highlights specific paragraphs

  • Yellow highlighting: Shows exact content that supports requirement

  • Multiple paragraphs: Citations can span multiple paragraphs if needed

  • Preview opens to citation: Clicking citation jumps to highlighted section

Audio/Video Citations

For recordings:

  • Timestamp ranges: Shows start and end time (e.g., 15:30 - 17:45)

  • Playback from timestamp: Clicking citation jumps to that moment

  • Transcript excerpts: Shows what was said at that timestamp

  • Speaker identification: May show who said it (when detectable)
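
For readers who think in data structures, a recording citation can be pictured as a small record holding the fields listed above. This is a hypothetical model for illustration only. The class name, field names, and the H:MM:SS format for recordings longer than an hour are assumptions, not Glossa's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


def format_timestamp(seconds: int) -> str:
    """Render seconds as MM:SS, or H:MM:SS once past the one-hour mark."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


@dataclass(frozen=True)
class RecordingCitation:
    file_name: str
    start_seconds: int
    end_seconds: int
    transcript_excerpt: str
    speaker: Optional[str] = None  # filled in only when detectable

    def __post_init__(self) -> None:
        if not 0 <= self.start_seconds <= self.end_seconds:
            raise ValueError("expected 0 <= start_seconds <= end_seconds")

    def label(self) -> str:
        """Timestamp range in the style shown above, e.g. 15:30 - 17:45."""
        start = format_timestamp(self.start_seconds)
        end = format_timestamp(self.end_seconds)
        return f"{start} - {end}"


if __name__ == "__main__":
    citation = RecordingCitation("kickoff.mp4", 930, 1065, "We need SSO.")
    print(citation.label())
```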

Multiple Citations

One requirement can have citations from multiple sources:

  • Mentioned in a meeting AND an email

  • Discussed in multiple meetings

  • Referenced in multiple documents

  • All citations appear in the Reference Data tab

Processing Status and Indicators

Status Values

Pending:

  • File uploaded successfully

  • Waiting in queue to begin processing

  • Usually brief (< 1 minute)

Processing:

  • AI is actively analyzing the file

  • Requirements being generated

  • Citations being created

  • Duration varies by file (see Processing Time section)

Ready:

  • Processing complete

  • Requirements have been generated

  • File is fully searchable and citable

Error:

  • Processing failed

  • Detailed error message explains why

  • File needs attention (see Troubleshooting section)
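
The four statuses behave like a small state machine. The sketch below shows one way to model them, assuming a file moves from Pending to Processing and then ends in either Ready or Error. The transition table is inferred from this article and is an assumption, not Glossa's implementation.

```python
from enum import Enum


class FileStatus(Enum):
    PENDING = "Pending"
    PROCESSING = "Processing"
    READY = "Ready"
    ERROR = "Error"


# Allowed moves between statuses, as read from this article (assumed).
ALLOWED_TRANSITIONS = {
    FileStatus.PENDING: {FileStatus.PROCESSING},
    FileStatus.PROCESSING: {FileStatus.READY, FileStatus.ERROR},
    FileStatus.READY: set(),
    FileStatus.ERROR: set(),
}


def can_transition(current: FileStatus, new: FileStatus) -> bool:
    """True if a file in `current` may move to `new`."""
    return new in ALLOWED_TRANSITIONS[current]


def is_finished(status: FileStatus) -> bool:
    """Ready and Error are final: the file will not change status again."""
    return not ALLOWED_TRANSITIONS[status]


if __name__ == "__main__":
    print(can_transition(FileStatus.PENDING, FileStatus.PROCESSING))
    print(is_finished(FileStatus.ERROR))
```

In this model a file in Error never recovers on its own, which matches the advice later in this article to delete the file and re-upload a corrected version.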

Processing Counter

At the bottom of the Files tab:

Files currently processing: [number]

This shows:

  • How many files are actively being processed

  • Updates in real time as files complete

  • When the indicator is not displayed, all processing is complete

What's NOT Processed

Glossa does NOT generate requirements from:

Administrative content:

  • Meeting scheduling emails

  • Calendar invites

  • Out-of-office messages

  • Pleasantries and greetings

Irrelevant content:

  • Invoices and receipts

  • Blank templates

  • Marketing materials (unless about requirements)

  • Legal boilerplate

Low-quality input:

  • Severely corrupted files

  • Completely illegible handwriting

  • Unintelligible audio

  • Files with no extractable content

If a file contains no requirements, processing will complete successfully but no requirements will be generated.

Troubleshooting

File Stuck in Pending

If a file stays "Pending" for more than 5 minutes:

Possible causes:

  • High system load

  • Processing queue backed up

  • File validation issue

Solutions:

  1. Wait 10-15 minutes and refresh

  2. Check "Files currently processing" counter

  3. If still pending after 30 minutes, contact support

File Stuck in Processing

If a file stays "Processing" longer than expected:

Possible causes:

  • Very large file (1+ hour video)

  • Complex file structure

  • Processing in progress (may just need more time)

Solutions:

  1. Check the file size and type. Large videos can take 30 minutes or more

  2. Wait for processing to complete (don't interrupt)

  3. Refresh page after 20-30 minutes to check status

  4. If processing for multiple hours without progress, contact support

Error Status with Detailed Message

When a file shows Error status, Glossa displays a specific error message explaining what went wrong.

Common error types:

1. "File is corrupted or unreadable"

  • File is damaged

  • File format is invalid

  • File has internal errors

Solutions:

  • Try opening file on your computer

  • Re-export from source application

  • Use a different file format

  • Try a different copy of the file

2. "File is password-protected"

  • Document requires password to open

  • Cannot extract content from encrypted file

Solutions:

  • Remove password protection

  • Export unprotected version

  • Provide decrypted copy

3. "File type not supported"

  • File extension doesn't match supported types

  • File is actually a different format than extension suggests

Solutions:

  • Convert to supported format (PDF, Word, MP4, MP3, etc.)

  • Check file extension matches actual file type

  • See Uploading Files for supported types
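
To check locally whether a file's extension matches its actual format, you can compare the first few bytes of the file against well-known file signatures. The sketch below covers only PDF, PNG, JPEG, and the ZIP-based Office formats. It is not how Glossa validates uploads, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Leading bytes ("magic numbers") for a few common formats.
# .docx, .xlsx and .pptx are ZIP containers, so they share the ZIP signature.
SIGNATURES = {
    b"%PDF-": {".pdf"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"PK\x03\x04": {".docx", ".xlsx", ".pptx"},
}


def extensions_for(header: bytes) -> set[str]:
    """Return the extensions consistent with the header, or an empty set."""
    for magic, extensions in SIGNATURES.items():
        if header.startswith(magic):
            return extensions
    return set()


def check_extension(file_name: str, header: bytes) -> Optional[bool]:
    """True or False when the format is recognised, None when it is not."""
    expected = extensions_for(header)
    if not expected:
        return None
    return Path(file_name).suffix.lower() in expected


def check_file(path: str) -> Optional[bool]:
    """Read the first bytes of a file on disk and run the same check."""
    with open(path, "rb") as handle:
        return check_extension(path, handle.read(8))
```

A result of False means the extension and the content disagree, for example a Word document that was renamed to .pdf. A result of None only means this short table does not know the format.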

4. "Processing timeout"

  • File is too large or complex

  • Processing took longer than maximum allowed time

Solutions:

  • Split large file into smaller chunks

  • Reduce file size (compress video, reduce resolution)

  • Simplify file structure

  • Contact support for assistance with very large files

No Requirements Generated

If processing completes but no requirements appear:

This is normal when:

  • File contains no actionable requirements

  • Content is too vague or general

  • File is a template or example

  • Content is purely administrative

What to check:

  1. Open the file preview in Glossa

  2. Review content—does it actually describe requirements?

  3. Check if content is too high-level

  4. Verify the Requirement Detail & Visibility Level setting

If requirements should exist:

  1. Try uploading a more detailed version

  2. Consider manually creating requirements and adding file as citation

  3. Check granularity setting and adjust if needed

  4. Contact support with file details

Low-Quality Results

If generated requirements are poor quality:

Possible causes:

  • Source content is vague or incomplete

  • Audio/video quality is poor

  • Document is poorly structured

  • Wrong granularity setting

Solutions:

  1. Upload more detailed source material

  2. For meetings, ensure clear audio and structured discussions

  3. For documents, use clear headings and structure

  4. Adjust the Requirement Detail & Visibility Level

  5. Review and edit requirements manually

  6. Use AI Review to identify improvements

Citations Not Working

If citations don't link correctly:

Possible causes:

  • File was deleted after requirement creation

  • Citation link is broken

  • Preview not loading

Solutions:

  1. Verify file still exists in Files tab

  2. Try clicking citation again

  3. Refresh the page

  4. If file was deleted, re-upload it

  5. Contact support if citations consistently fail

Best Practices

Optimize File Quality

For documents:

  • Use clear headings and structure

  • Write in complete sentences

  • Use bullet points for lists

  • Include requirement keywords ("must", "should", "will")

For audio/video:

  • Use good quality microphone

  • Minimize background noise

  • Speak clearly and at moderate pace

  • Record in quiet environment

For images:

  • Use high resolution (300+ DPI for scans)

  • Ensure good lighting and contrast

  • Keep text horizontal and readable

  • Avoid shadows and glare

Upload Structured Content

Files with clear structure process better:

  • Meeting agendas with discussion topics

  • Documents with numbered requirements

  • Spreadsheets with requirement tables

  • Organized notes with headings

Set Granularity Before Upload

  1. Decide what level of detail you need

  2. Set the Requirement Detail & Visibility Level in Project Settings

  3. Upload files

Setting the level first avoids having to delete and re-upload files later.

Monitor Processing

After uploading:

  1. Check "Files currently processing" counter

  2. Refresh periodically for large files

  3. Review generated requirements promptly

  4. Address any errors immediately

Handle Errors Quickly

When files show Error status:

  1. Read the detailed error message

  2. Fix the underlying issue

  3. Delete the error file

  4. Re-upload corrected version

Don't leave error files in your project. They clutter the Files tab and can be confusing.

Performance Tips

Batch Similar Files

Upload similar files together:

  • All meeting recordings from one week

  • All discovery documents for one feature

  • Related files process more efficiently

Avoid Very Large Batches

Instead of uploading 50 files at once:

  • Upload 10-15 at a time

  • Let batch complete before uploading more

  • Easier to monitor and troubleshoot
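
If you prepare uploads with a script, splitting the file list into groups of 10-15 is straightforward. The helper below is a minimal sketch. The function name and the default of 10 are assumptions, and the upload itself is still done through Glossa as usual.

```python
from typing import Iterator, Sequence


def batches(files: Sequence[str], batch_size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `batch_size` file names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(files), batch_size):
        yield list(files[start:start + batch_size])


if __name__ == "__main__":
    all_files = [f"recording_{i:02d}.mp4" for i in range(1, 51)]
    for number, group in enumerate(batches(all_files, batch_size=15), start=1):
        print(f"Batch {number}: {len(group)} files")
```

Upload one group, wait for the "Files currently processing" counter to clear, then move on to the next group.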

Use Appropriate File Formats

Choose the best format for your content:

  • PDFs for finalized documents

  • Word for editable documents

  • MP3 for audio-only (smaller than video)

  • MP4 for video with screen shares

  • Images for whiteboards and handwritten notes

Compress When Possible

For large files:

  • Compress videos before uploading

  • Reduce image resolution if very high

  • Remove unnecessary content from documents

  • Balance quality vs. file size

Security & Privacy

Processing Location

  • Files are processed using AI services

  • Processing follows Glossa's Data Processing Agreement (see glossapro.ai/dpa)

  • Content is encrypted in transit and at rest

Data Retention

  • Processed files stored permanently in project

  • AI processing data not retained after requirements generation

  • Files deleted within 60 days of account termination

Compliance

  • See glossapro.ai/dpa for Data Processing Agreement

  • See glossapro.ai/security-policy for security details

  • GDPR and CCPA compliant
