Oct 9, 2025

    Structured vs. Unstructured Data in Commercial Real Estate Underwriting

Every commercial real estate transaction generates data. Rent rolls arrive as Excel files. Leases come as scanned PDFs. Operating statements appear in inconsistent formats across sellers. Photos, site plans, environmental reports, and offering memorandums pile into data rooms alongside executed amendments and tenant correspondence. The underwriting challenge is not a shortage of information. It is transforming that information into something you can actually analyze.

This is where the distinction between structured and unstructured data becomes critical. Understanding what each type is, how they differ, and what it takes to move from one to the other determines how efficiently and accurately you can underwrite a deal. It also explains why AI-powered document processing has become essential for CRE teams handling high deal volume.

What Is Structured Data?

Structured data is information organized into a predefined format with consistent fields, labels, and relationships. It lives in rows and columns. Each data point has a designated place, a known type (text, number, date), and a predictable relationship to other data points.

In commercial real estate underwriting, structured data includes:

  • Rent roll exports with standardized columns for tenant name, suite number, square footage, lease start, lease end, base rent, and expense reimbursements

  • Property management system outputs with consistent field definitions

  • Database records from CMBS servicers or loan tracking systems

  • Standardized financial templates where line items map to defined categories

The defining characteristic of structured data is machine readability. A computer can parse it without interpretation. If you need the total base rent for a property, you sum a column. If you need all leases expiring within 24 months, you filter by date. Queries are deterministic, and validation is straightforward (does the sum of rows equal the stated total?).
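As a minimal sketch of what "machine readable" means in practice, the snippet below runs the three operations just described (sum a column, filter by date, check a stated total) over a tiny rent roll. The tenants, field names, figures, and as-of date are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical rent roll rows; tenants, fields, and figures are illustrative only.
rent_roll = [
    {"tenant": "Tenant A", "lease_end": date(2026, 6, 30), "annual_base_rent": 150_000.0},
    {"tenant": "Tenant B", "lease_end": date(2029, 12, 31), "annual_base_rent": 96_000.0},
    {"tenant": "Tenant C", "lease_end": date(2027, 3, 31), "annual_base_rent": 54_000.0},
]
stated_total = 300_000.0  # total printed at the bottom of the hypothetical rent roll

# "Sum a column": total base rent for the property.
total_base_rent = sum(row["annual_base_rent"] for row in rent_roll)

# "Filter by date": leases expiring within 24 months of a fixed as-of date.
as_of = date(2025, 10, 9)
cutoff = as_of.replace(year=as_of.year + 2)  # a Feb 29 as-of date would need special handling
expiring = [row for row in rent_roll if as_of <= row["lease_end"] <= cutoff]

# Validation: does the sum of rows equal the stated total (within a cent)?
totals_match = abs(total_base_rent - stated_total) < 0.01

print(total_base_rent, [row["tenant"] for row in expiring], totals_match)
```

None of these steps requires interpretation: each is a deterministic operation over named fields.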

What Is Unstructured Data?

Unstructured data lacks a predefined schema. The information exists, but its location, format, and labeling vary from document to document. Extracting meaning requires interpretation, not just parsing.

In commercial real estate, unstructured data dominates deal documents:

  • Lease agreements where the base rent might appear on page 3 in one lease and page 7 in another, labeled "Base Rent," "Minimum Rent," or "Annual Fixed Rent" depending on the drafter

  • Offering memorandums with narrative descriptions, embedded tables, and footnoted assumptions

  • Operating statements in PDF format where column headers shift, line items are inconsistently named, and subtotals appear in unpredictable locations

  • Scanned documents with handwritten amendments, stamps, and signatures

  • Property condition reports, environmental assessments, and appraisal narratives

Unstructured data requires context to interpret. A human reader understands that "NNN" in one lease means the same thing as "Triple Net" in another. A machine does not, unless trained or instructed.
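One simple way to "instruct" a machine is an explicit synonym table. The sketch below is an illustration only, not a description of any particular product: the mapping entries and the function name `normalize_lease_type` are assumptions, and a real lease set would need a far larger vocabulary.

```python
from typing import Optional

# Hypothetical synonym table: every spelling maps to one canonical label.
LEASE_TYPE_SYNONYMS = {
    "nnn": "NNN",
    "triple net": "NNN",
    "triple-net": "NNN",
}

def normalize_lease_type(raw: str) -> Optional[str]:
    """Return the canonical lease type, or None if the term is not recognized."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return LEASE_TYPE_SYNONYMS.get(key)

print(normalize_lease_type("Triple Net"))      # same canonical label as "NNN"
print(normalize_lease_type("Modified Gross"))  # not in the table, so None
```

Returning `None` for an unknown term, rather than guessing, leaves room for a human reviewer to decide how the term should be treated.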

Why the Distinction Matters for Underwriting

The gap between structured and unstructured data creates three practical challenges for CRE underwriting teams.

1. Processing speed. Structured data can be ingested, validated, and analyzed in seconds. Unstructured data requires extraction, which historically meant manual re-keying. A single lease might take 20 to 45 minutes to abstract by hand. At that pace, 50 leases alone represent roughly 17 to 38 hours of abstraction, so a 50-tenant property with amendments and renewals can consume days of analyst time before underwriting even begins.

2. Error rates. Manual extraction from unstructured sources introduces transcription errors, inconsistent formatting, and missed fields. These errors propagate into underwriting models, distorting NOI projections, valuation outputs, and risk assessments. Structured data, by contrast, arrives in a format that resists (though does not eliminate) human error.

3. Cross-document validation. Underwriting requires reconciling data across multiple sources. Does the rent roll match the executed leases? Does the T-12 align with the operating budget? When data is structured, these comparisons can be automated. When data is unstructured, each validation requires a human to locate, interpret, and compare values manually.
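To make the third point concrete, here is a minimal sketch of an automated rent roll versus lease comparison once both sources are structured. The suite numbers, rents, tolerance, and the function name `reconcile` are hypothetical.

```python
def reconcile(rent_roll, leases, tolerance=0.01):
    """Compare annual base rent by suite and return a list of (suite, issue) pairs."""
    issues = []
    for suite in sorted(set(rent_roll) | set(leases)):
        if suite not in leases:
            issues.append((suite, "missing lease"))
        elif suite not in rent_roll:
            issues.append((suite, "missing from rent roll"))
        elif abs(rent_roll[suite] - leases[suite]) > tolerance:
            issues.append((suite, "rent mismatch"))
    return issues

# Hypothetical annual base rent keyed by suite number.
rent_roll = {"100": 150_000.0, "200": 96_000.0, "300": 54_000.0}
leases = {"100": 150_000.0, "200": 99_000.0, "400": 60_000.0}

issues = reconcile(rent_roll, leases)
print(issues)
```

The same comparison against unstructured documents would require someone to find each rent figure in each lease before any check could run.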

The Conversion Challenge

Effective underwriting depends on converting unstructured data into structured formats. This conversion process, often called document abstraction or data extraction, is where AI has made the most significant impact on CRE workflows.

The conversion workflow involves several steps:

  1. Document classification. Identify what type of document you are processing (lease, amendment, rent roll, T-12) so the system knows what fields to extract.

  2. Field extraction. Locate and capture specific data points: tenant names, rent amounts, dates, square footage, expense categories. This is where unstructured formats create difficulty, since the same information appears in different locations and formats across documents.

  3. Normalization. Standardize extracted values so they are comparable. "ABC Corp," "ABC Corporation," and "A.B.C. Corp." should resolve to the same entity. Dates should follow a consistent format. Rent figures should carry explicit units (monthly vs. annual, per SF vs. absolute).

  4. Validation. Check extracted values against internal logic (do the rows sum to the total?) and cross-document consistency (does the lease rent match the rent roll?).

  5. Output to structured format. Deliver the validated data in a schema that underwriting models can consume: standardized tables, database records, or API-ready JSON.
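Steps 3 and 5 can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the suffix table, the accepted date formats, and the function names are hypothetical, and real entity resolution typically needs more than string cleanup.

```python
import json
import re
from datetime import datetime

# Hypothetical table folding legal-suffix spellings into one form.
_SUFFIXES = {"corp": "corp", "corporation": "corp", "inc": "inc", "incorporated": "inc"}
# Date formats this sketch accepts; anything else raises an error.
_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")

def normalize_tenant(name):
    """Lowercase, strip punctuation, and fold suffix variants together."""
    cleaned = name.replace(".", "")  # "A.B.C." becomes "ABC"
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", cleaned).lower()
    return " ".join(_SUFFIXES.get(word, word) for word in cleaned.split())

def normalize_date(text):
    """Return an ISO 8601 date string (YYYY-MM-DD)."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def to_annual(amount, period):
    """Convert a rent figure to an annual amount; the period must be explicit."""
    if period == "monthly":
        return amount * 12
    if period == "annual":
        return amount
    raise ValueError(f"unknown rent period: {period!r}")

# Step 5: emit the normalized values as JSON.
record = {
    "tenant": normalize_tenant("A.B.C. Corp."),
    "lease_start": normalize_date("January 1, 2026"),
    "annual_base_rent": to_annual(8000.0, "monthly"),
}
print(json.dumps(record, sort_keys=True))
```

Raising an error on an unknown date format or rent period is a deliberate choice here: a silent guess about units is exactly the kind of mistake that distorts an underwriting model.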

AI accelerates steps 2 through 4 by recognizing patterns across document types, extracting values with confidence scores, and flagging discrepancies for human review. The result is structured data generated in minutes rather than hours, with error rates that tend to decrease as models learn from corrections.
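The idea of routing low-confidence extractions to a reviewer can be sketched as follows. The `ExtractedField` structure, the sample values, and the 0.90 threshold are assumptions made for illustration; actual extraction tools report confidence in their own formats.

```python
from dataclasses import dataclass

@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical extraction model

def triage(fields, threshold=0.90):
    """Split fields into auto-accepted values and values needing human review."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review

# Hypothetical extraction output for one lease.
fields = [
    ExtractedField("tenant_name", "ABC Corp", 0.99),
    ExtractedField("annual_base_rent", "150000", 0.97),
    ExtractedField("lease_end", "2026-06-30", 0.62),  # e.g. a date read from a poor scan
]

accepted, needs_review = triage(fields)
print([f.name for f in accepted], [f.name for f in needs_review])
```

Where the threshold sits is a judgment call: a higher value sends more fields to review, and a lower one accepts more risk.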

Common Data Quality Issues by Type

Structured and unstructured data each carry distinct quality risks.

Structured data risks:

| Issue | Example | Impact |
| --- | --- | --- |
| Stale data | Rent roll exported three months before closing | Underwriting against outdated tenant roster |
| Formula errors | Broken Excel formula in subtotal row | Incorrect occupancy or rent totals |
| Mislabeled fields | "Base Rent" column actually contains gross rent | Overstated revenue if expense reimbursements are counted twice |
| Truncated exports | System export cuts off tenants beyond row 500 | Missing tenants in large portfolios |

Unstructured data risks:

| Issue | Example | Impact |
| --- | --- | --- |
| Ambiguous terminology | Lease says "rent" without specifying base vs. gross | Incorrect expense assumptions |
| Superseded terms | Amendment modifies rent, but original lease is abstracted | Underwriting against outdated terms |
| Missed footnotes | Rent abatement noted in footnote, not body text | Overstated Year 1 revenue |
| Poor scan quality | Handwritten amendment illegible in PDF | Missing material lease terms |

Understanding these failure modes helps underwriting teams design validation checkpoints appropriate to each data source.
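For structured sources, several of these checkpoints are easy to automate. The sketch below covers three of the risks listed above (stale data, formula errors, truncated exports); the function name `check_export`, the 30-day age limit, and the sample figures are hypothetical.

```python
from datetime import date

def check_export(rows, stated_total, export_date, closing_date, expected_rows, max_age_days=30):
    """Return a list of warnings for a structured rent roll export."""
    warnings = []
    # Stale data: how old is the export relative to closing?
    age_days = (closing_date - export_date).days
    if age_days > max_age_days:
        warnings.append(f"stale data: export is {age_days} days old at closing")
    # Formula errors: recompute the total instead of trusting the subtotal row.
    computed = sum(row["annual_base_rent"] for row in rows)
    if abs(computed - stated_total) > 0.01:
        warnings.append(f"total mismatch: rows sum to {computed:,.2f}, stated {stated_total:,.2f}")
    # Truncated exports: compare the row count with an independently known tenant count.
    if len(rows) < expected_rows:
        warnings.append(f"possible truncation: {len(rows)} of {expected_rows} expected rows")
    return warnings

# Hypothetical export with all three problems.
rows = [{"annual_base_rent": 150_000.0}, {"annual_base_rent": 96_000.0}]
warnings = check_export(rows, 300_000.0, date(2025, 7, 1), date(2025, 10, 9), expected_rows=3)
for warning in warnings:
    print(warning)
```

A mislabeled field is harder to catch this way, since the numbers are internally consistent; that check generally depends on comparing the export against the leases themselves.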

Practical Implications for CRE Teams

The structured/unstructured distinction has direct implications for how underwriting teams should organize their workflows.

Prioritize structured sources when available. If the seller provides both a PDF rent roll and an Excel export, start with the Excel. It reduces extraction effort and error risk. Use the PDF as a validation reference, not the primary source.

Invest in conversion infrastructure. The bottleneck in most underwriting workflows is the conversion from unstructured to structured data. AI-powered extraction tools, standardized abstraction templates, and quality control processes for validating conversions yield compounding returns across deals.

Define your target schema. Before extracting data, define what fields you need, in what format, and at what level of granularity. A clear target schema keeps extraction from capturing information that is irrelevant to underwriting while missing fields that are essential.
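A target schema can be as simple as a typed record definition. The sketch below is one possible shape, assuming the rent roll fields named earlier in this article; the class name `LeaseAbstract` and the helper functions are hypothetical.

```python
from dataclasses import MISSING, dataclass, fields
from datetime import date
from typing import Optional

@dataclass
class LeaseAbstract:
    tenant_name: str
    suite: str
    square_feet: float
    lease_start: date
    lease_end: date
    annual_base_rent: float           # absolute dollars per year, not per SF
    lease_type: Optional[str] = None  # optional: not every deal needs it

SCHEMA_FIELDS = [f.name for f in fields(LeaseAbstract)]
# Required fields are those declared without a default value.
REQUIRED_FIELDS = [
    f.name for f in fields(LeaseAbstract)
    if f.default is MISSING and f.default_factory is MISSING
]

def missing_fields(raw):
    """Required fields that are absent or empty in an extracted record."""
    return [name for name in REQUIRED_FIELDS if raw.get(name) is None]

def extra_fields(raw):
    """Extracted fields the schema does not ask for."""
    return sorted(set(raw) - set(SCHEMA_FIELDS))

# Hypothetical extraction output: one essential field missing, one irrelevant field captured.
raw = {
    "tenant_name": "Tenant A", "suite": "100", "square_feet": 5000.0,
    "lease_end": date(2026, 6, 30), "annual_base_rent": 150_000.0,
    "broker_notes": "n/a",
}
print(missing_fields(raw), extra_fields(raw))
```

Writing the unit into the schema (annual, absolute dollars) settles the monthly vs. annual question once, before any document is processed.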

Build validation into the workflow. Structured data is only as reliable as its source. Cross-reference extracted values against multiple documents, flag conflicts for resolution, and maintain audit trails showing where each figure originated.
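An audit trail can be kept by storing each value together with where it came from. The following is a minimal sketch under assumed names: `SourcedValue`, `resolve`, the document labels, and the page numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcedValue:
    value: float
    document: str  # hypothetical document label
    page: int

def resolve(field_name, candidates, tolerance=0.01):
    """Accept a value only if all sources agree; always keep the source trail."""
    if not candidates:
        raise ValueError("at least one sourced value is required")
    values = [c.value for c in candidates]
    conflict = max(values) - min(values) > tolerance
    return {
        "field": field_name,
        "value": None if conflict else values[0],  # unresolved until a reviewer decides
        "conflict": conflict,
        "sources": [f"{c.document}, p. {c.page}" for c in candidates],
    }

# Hypothetical: the rent roll and the first amendment disagree on base rent.
result = resolve("annual_base_rent", [
    SourcedValue(96_000.0, "Rent roll", 1),
    SourcedValue(99_000.0, "First amendment", 2),
])
print(result)
```

Because the sources travel with the value, a reviewer resolving the conflict (or an investor asking where a figure came from) can go straight to the page in question.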

Conclusion

The distinction between structured and unstructured data is foundational to understanding why CRE underwriting has historically been slow and error-prone, and why AI is changing that equation. Unstructured documents contain the information underwriters need, but in formats that resist efficient analysis. Structured data enables the speed and accuracy that modern deal velocity demands. The conversion process between the two is where technology, workflow design, and human expertise intersect. Teams that master this conversion gain a durable advantage: faster underwriting cycles, fewer errors, and data they can defend when investors or lenders ask hard questions.

Request a Free Trial

See how Eagle Eye brings clarity, accuracy, and trust to deal documents.
