An offering memorandum is a marketing document with a financial document buried inside it. The first half is narrative: location summary, demographic charts, tenant profiles, value-add story. The second half is the rent roll, the T-12, and the proforma. A broker spends a month producing the OM. A firm spends ninety seconds deciding whether to read it.
The compression problem at the firm is a structural mismatch between how OMs are written and how OMs are consumed. The OM is dense because the broker has to convince a buyer. The buyer wants the structured fields that drive a screening decision. Most of the dense content gets discarded by the buyer not because it is unimportant but because it is in the wrong format.
A complete OM extract recovers that discarded information without forcing a human to read the document. It produces a structured record with every field a screening decision could touch, each field cited back to the page and language in the source OM. The extract is not a summary. It is a translation.
Why Summary Extracts Fail
The default extraction approach is to read the OM and capture the headline fields: address, asset type, units, asking price, cap rate, broker. This produces a record that fits in a row of a spreadsheet. It also discards the information that determines whether the deal is actually a fit.
| What Summary Extracts Capture | What They Discard |
|---|---|
| Asking price | Underlying assumptions in the proforma |
| Headline cap rate | Rent growth assumptions producing it |
| Unit count | Unit mix and floor plan distribution |
| Year built | Renovation history and capital plan |
| In-place NOI | Trailing twelve months of expense detail |
| Tenant count | Concentration, credit, and lease maturity ladder |
| Business plan label | The actual narrative and operator thesis |
A firm screening on summary extracts is screening on the broker's headline numbers. Those numbers are designed to attract attention, not to support a decision. The deal that fails on the headline cap rate may pass on the actual yield-on-cost after capex. The deal that passes on the headline cap rate may fail when the proforma rent assumptions are pulled apart.
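The gap between a headline cap rate and yield-on-cost can be made concrete with a small calculation. The sketch below uses invented numbers for a hypothetical deal; the function names and the inputs (price, capex, stabilized NOI) are assumptions for illustration, not figures from any OM.

```python
def cap_rate(noi: float, price: float) -> float:
    """Headline cap rate: net operating income divided by purchase price."""
    return noi / price


def yield_on_cost(stabilized_noi: float, price: float, capex: float) -> float:
    """Stabilized NOI divided by total basis (price plus capital spend)."""
    return stabilized_noi / (price + capex)


# Hypothetical deal, invented for illustration only.
price = 10_000_000
in_place_noi = 450_000
capex = 1_000_000
stabilized_noi = 770_000  # would come from the proforma, so it needs scrutiny

headline = cap_rate(in_place_noi, price)
yoc = yield_on_cost(stabilized_noi, price, capex)

print(f"headline cap rate: {headline:.2%}")
print(f"yield on cost:     {yoc:.2%}")
```

A screen that only sees the first number would rank this deal very differently from one that sees both, and the second number is only as good as the proforma assumptions behind the stabilized NOI.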
The Field Categories a Complete Extract Covers
A complete extract has to capture every field a buy box could query and every field an underwriter would touch in a first model. The categories below are the minimum for multifamily and commercial deals.
| Category | Sub-fields |
|---|---|
| Identity | Property name, addresses, parcel IDs, MSA, submarket |
| Asset | Type, subtype, year built, year renovated, condition, building count |
| Size | Units, NRA, GBA, parking, land area |
| Tenancy | Tenant list, leased SF, lease maturities, credit, concentration |
| Rent | Current, market, growth assumed, in-place vs. proforma |
| Operating | Trailing-12 expenses by line, OpEx ratio, recovery method |
| Capex | Renovation history, planned capital, deferred maintenance |
| Debt | Existing loan, balance, rate, maturity, assumability, prepay |
| Pricing | Asking, whisper, cap rate, price per unit or SF |
| Returns | Untrended yield, levered IRR, equity multiple, hold |
| Sponsor | Seller, broker, marketing process, timeline |
| Narrative | Business plan, value-add story, market thesis |
The exact fields vary by asset class. A retail OM has tenant sales, occupancy cost, and co-tenancy provisions that a multifamily OM does not. An industrial OM has clear height, dock count, and trailer parking. The field standard has to be configurable by asset type without losing the cross-asset comparability that makes the database useful.
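One way to keep the standard configurable without losing comparability is to hold a shared core schema and layer asset-specific fields on top of it. The sketch below is a minimal illustration of that idea; the dictionary names, the field identifiers, and the particular split between core and extension are all assumptions, not a prescribed standard.

```python
# Fields every asset class shares. Cross-asset queries run against these.
CORE_FIELDS = {
    "identity": ["property_name", "address", "parcel_id", "msa", "submarket"],
    "asset": ["type", "subtype", "year_built", "year_renovated"],
    "size": ["land_area", "parking"],
    "pricing": ["asking_price", "cap_rate"],
}

# Fields that only make sense for one asset class (illustrative subset).
ASSET_EXTENSIONS = {
    "multifamily": {"size": ["units"], "rent": ["unit_mix"]},
    "retail": {
        "size": ["nra"],
        "tenancy": ["tenant_sales", "occupancy_cost", "co_tenancy"],
    },
    "industrial": {
        "size": ["nra"],
        "asset": ["clear_height", "dock_count", "trailer_parking"],
    },
}


def schema_for(asset_type: str) -> dict[str, list[str]]:
    """Return the core schema plus any extension fields for the asset type."""
    # Copy the lists so the shared core is never mutated.
    schema = {cat: list(fields) for cat, fields in CORE_FIELDS.items()}
    for cat, extra in ASSET_EXTENSIONS.get(asset_type, {}).items():
        schema.setdefault(cat, []).extend(extra)
    return schema


print(schema_for("industrial")["asset"])
```

Because every schema is a superset of the core, a buy-box query written against the core fields works across asset classes, while asset-specific screens can reach into the extensions.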
Extraction From Narrative, Not Just Tables
The fields that matter most for screening often live in narrative sections, not in tables. A broker explains in prose why the rents are below market, why the property is mismanaged, why the tenant base has more upside than the trailing data suggests. This narrative is the deal thesis. It is also the part most extraction systems ignore.
A complete extract captures the narrative as structured fields, not just as text. The "value-add thesis" field is not the entire two-page section. It is the extracted operator hypothesis: rents 12% below market, units in original condition, planned $8k per unit renovation, target rent premium of $180. The numbers come out of the narrative even when the narrative does not put them in a table.
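As a rough sketch, the thesis in that example could be stored as a typed record instead of a text blob, which also makes derived metrics available to screening. The class name, field names, and the derived metric are hypothetical, and the code assumes the $180 premium is per unit per month, which the example does not state.

```python
from dataclasses import dataclass


@dataclass
class ValueAddThesis:
    rent_gap_to_market: float        # 0.12 means in-place rents are 12% below market
    unit_condition: str
    renovation_cost_per_unit: float  # dollars
    target_rent_premium: float       # ASSUMED: dollars per unit per month

    def annual_return_on_renovation(self) -> float:
        """Annualized rent premium divided by per-unit renovation spend."""
        return self.target_rent_premium * 12 / self.renovation_cost_per_unit


thesis = ValueAddThesis(
    rent_gap_to_market=0.12,
    unit_condition="original",
    renovation_cost_per_unit=8_000,
    target_rent_premium=180,
)

print(f"{thesis.annual_return_on_renovation():.1%}")
```

Once the numbers are fields, the operator hypothesis can be compared across deals and checked against the rent roll, which a two-page prose section cannot support.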
The same applies to risk factors. A complete extract captures the risks the broker discloses (deferred maintenance, tenant concentration, environmental issue, zoning ambiguity) as discrete fields with the source language attached. The risks the broker does not disclose are a separate problem and require diligence, not extraction. But the disclosed risks should never be lost because they live in prose.
Source Citation Requirements
Every extracted field needs a citation back to the OM. The citation has to identify the page, the section, and the source language that produced the value. This is the same standard that applies to lease abstraction, and it applies here for the same reason: the extract is a record that has to be defensible when questioned.
| Field | Citation Required |
|---|---|
| Asking price | Page, paragraph, exact phrasing |
| In-place rent | Source table, row, column |
| Proforma assumption | Section, paragraph, narrative excerpt |
| Risk factor | Section, paragraph, source language |
| Tenant credit | Source table or narrative claim |
Without citations, the extract is a set of values that look authoritative but are not auditable. With citations, an analyst reviewing the extract can verify any field in seconds. A principal questioning a number in screening can see the source without opening the PDF.
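A minimal sketch of how a value and its citation might travel together is shown below. The class names, attribute names, and the sample values (page 3, the "Offering Summary" section, the price) are hypothetical and chosen only to illustrate the shape of the record.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Citation:
    page: int
    section: str
    source_text: str                 # exact language that produced the value
    table_ref: Optional[str] = None  # e.g. "row 4, col 2" for tabular sources


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: Any
    citation: Citation

    def audit_line(self) -> str:
        """One-line trail a reviewer can check without opening the PDF."""
        c = self.citation
        return (
            f"{self.name} = {self.value!r} "
            f'(p. {c.page}, {c.section}: "{c.source_text}")'
        )


# Hypothetical example values, not taken from a real OM.
field = ExtractedField(
    name="asking_price",
    value=24_500_000,
    citation=Citation(
        page=3,
        section="Offering Summary",
        source_text="Offered at $24,500,000",
    ),
)

print(field.audit_line())
```

Making the citation a required attribute, not an optional note, means a field cannot enter the record without its source attached.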
Confidence Scores and Exception Routing
Extraction is not perfect. The system has to know when it is uncertain and route those fields to human review. The confidence score is the mechanism.
A field extracted from a clearly labeled table at 98% confidence does not need review. A field extracted from a narrative section at 72% confidence does. A firm that treats every extraction as authoritative will accumulate errors. A firm that reviews only the low-confidence extractions will catch most errors with a fraction of the review time.
| Confidence Tier | Treatment |
|---|---|
| High (95% and above) | Auto-accept, no review |
| Medium (80% to under 95%) | Review on request or for material fields |
| Low (below 80%) | Mandatory review before screening decision |
The tiering allows the firm to scale review effort with deal volume without losing accuracy. The principal sees a queue of deals where the high-confidence extractions are ready to score and the low-confidence ones are flagged for the analyst.
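The tiering rule is simple enough to express directly. The sketch below is one possible reading of the table: it assumes confidence is a fraction between 0 and 1, assigns exactly 95% to the high tier, and models "material fields" with a hypothetical `material` flag. Review "on request" is a human action and is not modeled.

```python
def review_tier(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a review tier."""
    if confidence >= 0.95:
        return "high"
    if confidence >= 0.80:
        return "medium"
    return "low"


def needs_review(confidence: float, material: bool = False) -> bool:
    """Decide whether a field goes to the analyst queue.

    `material` is an assumed flag marking fields that drive the
    screening decision (for example, asking price or in-place NOI).
    """
    tier = review_tier(confidence)
    if tier == "low":
        return True       # mandatory review
    if tier == "medium":
        return material   # review only when the field is material
    return False          # high confidence: auto-accept


# Confidence values from the examples in the text.
print(review_tier(0.98), needs_review(0.98))
print(review_tier(0.72), needs_review(0.72))
```

The thresholds are policy, not physics. A firm would tune them against observed error rates in each tier.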
What "Done" Looks Like
A complete OM extract meets the following criteria:
Every screening-relevant field is populated, not just the headline fields.
Narrative sections are parsed for embedded structured data, not just stored as text.
Every field carries a citation back to the source page and language.
Every field carries a confidence score.
Asset-class-specific fields are configurable without losing comparability.
Low-confidence fields route to human review automatically.
If the principal still has to open the PDF to make a screening decision, the extract is incomplete.
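Several of these criteria can be checked mechanically before an extract reaches the principal. The sketch below covers three of them (populated value, citation, confidence score) under assumed names; the `Field` class, the `completeness_gaps` function, and the sample record are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Field:
    value: Any = None
    citation: Optional[str] = None      # simplified to a string for this sketch
    confidence: Optional[float] = None


def completeness_gaps(record: dict[str, Field], required: list[str]) -> list[str]:
    """List every way the record falls short of the 'done' criteria."""
    gaps = []
    for name in required:
        field = record.get(name)
        if field is None or field.value is None:
            gaps.append(f"{name}: missing value")
            continue  # nothing further to check without a value
        if not field.citation:
            gaps.append(f"{name}: missing citation")
        if field.confidence is None:
            gaps.append(f"{name}: missing confidence")
    return gaps


# Hypothetical record with one complete field and two deficient ones.
record = {
    "asking_price": Field(24_500_000, "p. 3, Offering Summary", 0.99),
    "unit_mix": Field({"1BR": 40, "2BR": 60}, None, 0.91),
}
required = ["asking_price", "unit_mix", "renovation_history"]

for gap in completeness_gaps(record, required):
    print(gap)
```

An empty list from a check like this is necessary but not sufficient: it confirms the record is populated and traceable, not that the values are right.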
Conclusion
The OM contains far more than firms typically extract from it, and most of what gets discarded is exactly what the screening decision actually needs. A summary extract is faster than reading the OM and worse than not extracting at all, because it produces a false sense of structure around a small subset of fields. A complete extract turns the OM into a record that can be queried, scored, and audited. The work shifts from reading every OM to verifying the extractions that need verification, which is the only version of this workflow that scales with the volume of deal flow a serious firm sees.