%iKnow.Queries.SentenceAPI
Class %iKnow.Queries.SentenceAPI Extends %iKnow.Queries.AbstractAPI [ Deprecated, System = 4 ]
The InterSystems IRIS NLP iKnow technology is now deprecated. Please see the product documentation for more detail.
Main Query API class to retrieve sentence information.
Parameters
GetPartsRT
Parameter GetPartsRT = "entOccId:%Integer,entUniId:%Integer,literal:%String,role:%Integer,stemUniId:%Integer";
GetBySourceRT
Parameter GetBySourceRT = "sentId:%Integer,sentenceValue:%String,sentenceIsTruncated:%Boolean";
GetByEntitiesRT
Parameter GetByEntitiesRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
GetByEntityIdsRT
Parameter GetByEntityIdsRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
GetByCrcsRT
Parameter GetByCrcsRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
GetByCrcIdsRT
Parameter GetByCrcIdsRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
GetByCrcMaskRT
Parameter GetByCrcMaskRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
GetByPathIdsRT
Parameter GetByPathIdsRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
GetNewBySourceRT
Parameter GetNewBySourceRT = "sentId:%Integer,sentenceValue:%String,score:%Numeric";
GetHighlightedEXP
Parameter GetHighlightedEXP [ Internal ] = 0;
GetAttributesRT
Parameter GetAttributesRT = "attTypeId:%Integer,attType:%String,start:%Integer,span:%Integer,wordPositions:%String,properties:%String,level:%Integer";
Methods
GetValue
ClassMethod GetValue(domainid As %Integer, sentenceid As %Integer, Output fullSentence As %Boolean = 1, vSrcId As %Integer = 0) As %String(MAXLEN=32767)
This method rebuilds a sentence based on the literals and entities it is composed of.
The returned string is the first part of the sentence, up to the maximum string length. The output parameter fullSentence is an array containing all the parts in the right order. Its top-level node holds a %Boolean value indicating whether the returned string is the full sentence (1) or whether (0) you need to read this array to obtain the full sentence.
If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
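As a rough sketch of how the return value and the fullSentence array might be combined, the hypothetical class below writes a whole sentence. The class and method names are illustrative only, and the layout of the parts as first-level subscripts of the array is an assumption that the description above does not spell out.

```objectscript
/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentenceValue
{

/// Writes the text of one sentence to the current device.
ClassMethod WriteSentence(pDomainId As %Integer, pSentenceId As %Integer)
{
    // GetValue returns the first chunk; tFull receives the completeness flag
    set tValue = ##class(%iKnow.Queries.SentenceAPI).GetValue(pDomainId, pSentenceId, .tFull)
    if $get(tFull, 1) {
        // the returned string already is the complete sentence
        write tValue, !
        return
    }
    // ASSUMPTION: the parts are stored in first-level subscripts of tFull, in order
    set tKey = ""
    for {
        set tKey = $order(tFull(tKey), 1, tPart)
        quit:tKey=""
        write tPart
    }
    write !
}

}
```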
GetSourceId
ClassMethod GetSourceId(domainId As %Integer, sentenceId As %Integer) As %Integer
Returns the ID of the source in which the supplied sentence occurs.
GetPosition
ClassMethod GetPosition(domainId As %Integer, sentenceId As %Integer, vSrcId As %Integer = 0) As %Integer
Returns the position within the source this sentence occurs at (1-based).
GetPartLiteral
ClassMethod GetPartLiteral(domainId As %Integer, sentenceId As %Integer, position As %Integer, vSrcId As %Integer = 0) As %String
Returns the literal of the entity or nonrelevant at the specified position.
GetLanguage
ClassMethod GetLanguage(domainid As %Integer, sentenceid As %Integer, Output confidence As %Numeric = "", vSrcId As %Integer = 0) As %String
Retrieves the language of the given sentence, as derived by the Automatic Language Identification algorithm or, if ALI was disabled, the language specified when indexing this sentence.
The confidence level is returned as well through an output parameter. If the confidence level is 0, this means ALI was not used and the language was defined by the user loading the source.
If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
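A minimal sketch of reading the language together with its confidence level follows. The class and method names are hypothetical; the interpretation of a confidence of 0 is taken from the description above.

```objectscript
/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentenceLanguage
{

/// Reports the language of a sentence and how it was determined.
ClassMethod DescribeLanguage(pDomainId As %Integer, pSentenceId As %Integer)
{
    set tLang = ##class(%iKnow.Queries.SentenceAPI).GetLanguage(pDomainId, pSentenceId, .tConfidence)
    // the unary + coerces an empty confidence value to 0
    if +$get(tConfidence) = 0 {
        write "Language ", tLang, " was supplied when the source was loaded", !
    } else {
        write "Language ", tLang, " was identified by ALI, confidence ", tConfidence, !
    }
}

}
```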
GetParts
ClassMethod GetParts(ByRef result, domainid As %Integer, sentenceid As %Integer, includeCRCMarkers As %Boolean = 0, includePathMarkers As %Boolean = 0, vSrcId As %Integer = 0) As %Status
Returns the elements (concepts, relations and nonrelevants) that make up the sentence, optionally including markers for the beginning and end of any CRCs or Paths in the sentence. This information can be used to display the sentence value (see also GetValue) and/or highlight specific elements of interest.
Output structure:
result(pos) = $lb(entOccId, entUniId, entity, role)
when includeCRCMarkers = 1, adds
result(pos, [CRCHEAD | CRCRELATION | CRCTAIL]) = $lb(crcOccId, crcUniId)
when includePathMarkers = 1, adds
result(pos, [PATHBEGIN | PATHEND]) = $lb(pathId)
Note: the subscript levels for CRC and Path markers are not available in the QAPI and WSAPI versions of this query.
If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
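The output structure above can be walked with $order. The sketch below assumes the $lb layout exactly as listed under "Output structure" (entity text in the third position, role in the fourth); the class and method names are hypothetical.

```objectscript
/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentenceParts
{

/// Lists every part of a sentence with its position and role.
ClassMethod PrintParts(pDomainId As %Integer, pSentenceId As %Integer) As %Status
{
    set tSC = ##class(%iKnow.Queries.SentenceAPI).GetParts(.tResult, pDomainId, pSentenceId)
    if $$$ISERR(tSC) { return tSC }

    set tPos = ""
    for {
        // tData = $lb(entOccId, entUniId, entity, role), per the structure above
        set tPos = $order(tResult(tPos), 1, tData)
        quit:tPos=""
        write !, tPos, ": ", $listget(tData, 3), " (role ", $listget(tData, 4), ")"
    }
    return $$$OK
}

}
```

Because CRC and Path markers are left at their default of 0 here, only the result(pos) level is populated.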
GetBySource
ClassMethod GetBySource(ByRef result, domainid As %Integer, sourceid As %Integer, page As %Integer = 1, pagesize As %Integer = 10) As %Status
Returns the sentences for the given source. A negative source ID is interpreted as a Virtual Source.
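For illustration, the hypothetical method below fetches one page of sentences for a source. It assumes each row is stored as result(rowNumber) = $lb(...) with columns in the order given by the GetBySourceRT parameter (sentId, sentenceValue, sentenceIsTruncated); the class and method names are not part of the API.

```objectscript
/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentencesBySource
{

/// Prints one page of sentences for the given source.
ClassMethod PrintPage(pDomainId As %Integer, pSourceId As %Integer, pPage As %Integer = 1, pPageSize As %Integer = 10) As %Status
{
    set tSC = ##class(%iKnow.Queries.SentenceAPI).GetBySource(.tResult, pDomainId, pSourceId, pPage, pPageSize)
    if $$$ISERR(tSC) { return tSC }

    set tRow = ""
    for {
        // ASSUMPTION: tData = $lb(sentId, sentenceValue, sentenceIsTruncated)
        set tRow = $order(tResult(tRow), 1, tData)
        quit:tRow=""
        write !, $listget(tData, 1), ": ", $listget(tData, 2)
        if $listget(tData, 3) { write " [truncated]" }
    }
    return $$$OK
}

}
```

When a row is flagged as truncated, GetValue can be used to retrieve the complete sentence.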
GetCountByDomain
ClassMethod GetCountByDomain(domainid As %Integer, filter As %iKnow.Filters.Filter = "", Output sc As %Status = {$$$OK}) As %Integer
Returns the total number of sentences for a given domain, optionally filtered to those sources satisfying a %iKnow.Filters.Filter object passed in through filter.
GetCountBySource
ClassMethod GetCountBySource(domainid As %Integer, sourceidlist As %List, Output sc As %Status = {$$$OK}) As %Integer
Returns the total number of sentences for the given sources. Negative Source IDs are interpreted as referring to Virtual Sources.
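The two count methods can be combined to compare a subset of sources with the whole domain. This is only a sketch: the class and method names are hypothetical, and the list of source IDs is supplied by the caller.

```objectscript
/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentenceCounts
{

/// Compares the sentence count of selected sources with the domain total.
ClassMethod Compare(pDomainId As %Integer, pSourceIds As %List) As %Status
{
    // an empty filter counts all sentences in the domain
    set tTotal = ##class(%iKnow.Queries.SentenceAPI).GetCountByDomain(pDomainId, "", .tSC)
    if $$$ISERR(tSC) { return tSC }

    set tSubset = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(pDomainId, pSourceIds, .tSC)
    if $$$ISERR(tSC) { return tSC }

    write tSubset, " of ", tTotal, " sentences", !
    return $$$OK
}

}
```

A call might look like `do ##class(Demo.SentenceCounts).Compare(1, $listbuild(1, 2, 3))`, where the domain and source IDs are placeholders.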
GetByEntities
ClassMethod GetByEntities(ByRef result, domainid As %Integer, entitylist As %List, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = {$$$UNION}, pActualFormOnly As %Boolean = 0) As %Status
This method will retrieve all sentences containing any (if setop = $$$UNION) or all (if setop = $$$INTERSECT) of the entities supplied through entitylist, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
If stemming is enabled for this domain through $$$IKPSTEMMING, sentences containing any actual form of the entities in entitylist will be returned. Use pActualFormOnly=1 to retrieve only those sentences containing the actual forms in entitylist. This argument is ignored if stemming is not enabled.
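The sketch below contrasts the two setop values. It assumes the $$$UNION and $$$INTERSECT macros come from the %IKPublic include file and that rows follow the column order of GetByEntitiesRT (srcId, externalId, sentId, sentenceValue). The class name, method name and example entities are hypothetical.

```objectscript
Include %IKPublic

/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentencesByEntities
{

/// Lists sentences containing any of two entities and counts those containing both.
ClassMethod Run(pDomainId As %Integer) As %Status
{
    set tEntities = $listbuild("newspaper", "article")

    // sentences containing ANY of the entities (first page of 10)
    set tSC = ##class(%iKnow.Queries.SentenceAPI).GetByEntities(.tAny, pDomainId, tEntities, "", 1, 10, $$$UNION)
    if $$$ISERR(tSC) { return tSC }

    set tRow = ""
    for {
        // ASSUMPTION: tData = $lb(srcId, externalId, sentId, sentenceValue)
        set tRow = $order(tAny(tRow), 1, tData)
        quit:tRow=""
        write !, $listget(tData, 3), ": ", $listget(tData, 4)
    }

    // number of sentences containing ALL of the entities
    set tBoth = ##class(%iKnow.Queries.SentenceAPI).GetCountByEntities(pDomainId, tEntities, "", $$$INTERSECT, .tSC)
    if $$$ISERR(tSC) { return tSC }
    write !, tBoth, " sentence(s) contain both entities", !
    return $$$OK
}

}
```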
GetByEntityIds
ClassMethod GetByEntityIds(ByRef result, domainid As %Integer, entityidlist As %List, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = {$$$UNION}, pActualFormOnly As %Boolean = 0) As %Status
Retrieves all sentences containing the given entity IDs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, entityidlist is expected to contain virtual Entity IDs.
See also GetByEntities for a description of the parameters.
GetByEntitiesInternal
ClassMethod GetByEntitiesInternal(ByRef result, domainid As %Integer, ByRef entitylist, filter As %iKnow.Filters.Filter, page As %Integer, pagesize As %Integer, setop As %Integer) As %Status [ Internal ]
GetCountByEntities
ClassMethod GetCountByEntities(domainid As %Integer, entitylist As %List, filter As %iKnow.Filters.Filter = "", setop As %Integer = {$$$UNION}, Output sc As %Status = {$$$OK}, pActualFormOnly As %Boolean = 0) As %Integer
Retrieves the number of sentences containing the given entities, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities for a description of the parameters.
GetCountByEntityIds
ClassMethod GetCountByEntityIds(domainid As %Integer, entityidlist As %List, filter As %iKnow.Filters.Filter = "", setop As %Integer = {$$$UNION}, Output sc As %Status = {$$$OK}, pActualFormOnly As %Boolean = 0) As %Integer
Retrieves the number of sentences containing the given entity IDs. For querying Virtual Sources, set filter to a single, negative integer. In this case, entityidlist is expected to contain virtual Entity IDs.
See also GetByEntities for a description of the parameters.
If stemming is enabled for this domain through $$$IKPSTEMMING, sentences containing any actual form of the entities in entityidlist will be counted. Use pActualFormOnly=1 to count only those sentences containing the actual forms in entityidlist. This argument is ignored if stemming is not enabled.
GetCountByEntitiesInternal
ClassMethod GetCountByEntitiesInternal(domainid As %Integer, ByRef entitylist, filter As %iKnow.Filters.Filter, setop As %Integer, Output sc As %Status = {$$$OK}) As %Integer [ Internal ]
GetByCrcs
ClassMethod GetByCrcs(ByRef result, domainid As %Integer, crclist As %List, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = {$$$UNION}) As %Status
Retrieves all sentences containing the given CRCs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities for a description of the parameters.
GetByCrcIds
ClassMethod GetByCrcIds(ByRef result, domainid As %Integer, crcidlist As %List, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = {$$$UNION}) As %Status
Retrieves all sentences containing the given CRC IDs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, crcidlist is expected to contain virtual CRC IDs.
See also GetByEntities for a description of the parameters.
GetByCrcMask
ClassMethod GetByCrcMask(ByRef result, domainid As %Integer, head As %String = {$$$WILDCARD}, relation As %String = {$$$WILDCARD}, tail As %String = {$$$WILDCARD}, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = {$$$UNION}, pActualFormOnly As %Boolean = 0) As %Status
Retrieves all sentences containing a CRC satisfying the given CRC Mask, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities for a description of the parameters.
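As a minimal sketch of a CRC mask query, the hypothetical method below fixes only the head and leaves relation and tail at their wildcard defaults by omitting the trailing arguments. The row layout is assumed to follow GetByCrcMaskRT (srcId, externalId, sentId, sentenceValue); the class and method names are illustrative.

```objectscript
/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentencesByCrcMask
{

/// Prints sentences containing a CRC whose head matches pHead.
ClassMethod PrintByHead(pDomainId As %Integer, pHead As %String) As %Status
{
    // relation, tail and the remaining arguments keep their default values
    set tSC = ##class(%iKnow.Queries.SentenceAPI).GetByCrcMask(.tResult, pDomainId, pHead)
    if $$$ISERR(tSC) { return tSC }

    set tRow = ""
    for {
        // ASSUMPTION: tData = $lb(srcId, externalId, sentId, sentenceValue)
        set tRow = $order(tResult(tRow), 1, tData)
        quit:tRow=""
        write !, $listget(tData, 3), ": ", $listget(tData, 4)
    }
    return $$$OK
}

}
```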
GetByCrcsInternal
ClassMethod GetByCrcsInternal(ByRef result, domainid As %Integer, crcIdsGlob As %String, filter As %iKnow.Filters.Filter = "", page As %Integer, pagesize As %Integer, setop As %Integer) As %Status [ Internal ]
GetCountByCrcs
ClassMethod GetCountByCrcs(domainid As %Integer, crclist As %List, filter As %iKnow.Filters.Filter = "", setop As %Integer = {$$$UNION}, Output sc As %Status = {$$$OK}) As %Integer
Retrieves the number of sentences containing the given CRCs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities for a description of the parameters.
GetCountByCrcIds
ClassMethod GetCountByCrcIds(domainid As %Integer, crcidlist As %List, filter As %iKnow.Filters.Filter = "", setop As %Integer = {$$$UNION}, Output sc As %Status = {$$$OK}) As %Integer
Retrieves the number of sentences containing the given CRC IDs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, crcidlist is expected to contain virtual CRC IDs.
See also GetByEntities for a description of the parameters.
GetCountByCrcMask
ClassMethod GetCountByCrcMask(domainid As %Integer, head As %String = {$$$WILDCARD}, relation As %String = {$$$WILDCARD}, tail As %String = {$$$WILDCARD}, filter As %iKnow.Filters.Filter = "", setop As %Integer = {$$$UNION}, Output sc As %Status = {$$$OK}, pActualFormOnly As %Boolean = 0) As %Integer
Retrieves the number of sentences containing a CRC satisfying the given CRC Mask, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities for a description of the parameters.
GetCountByCrcsInternal
ClassMethod GetCountByCrcsInternal(domainid As %Integer, crcIdsGlob As %String, filter As %iKnow.Filters.Filter = "", setop As %Integer, Output sc As %Status = {$$$OK}) As %Integer [ Internal ]
GetByPathIds
ClassMethod GetByPathIds(ByRef result, domainid As %Integer, pathidlist As %List, sourceidlist As %List, page As %Integer = 1, pagesize As %Integer = 10) As %Status
Retrieves all sentences containing the given path IDs.
See also GetByEntities for a description of the parameters.
GetByPathsInternal
ClassMethod GetByPathsInternal(ByRef result, domainid As %Integer, ByRef pathlist, sourceidlist As %List, page As %Integer, pagesize As %Integer) As %Status [ Internal ]
GetCountByPathIds
ClassMethod GetCountByPathIds(domainid As %Integer, pathidlist As %List, sourceidlist As %List, Output sc As %Status = {$$$OK}) As %Integer
Retrieves the number of sentences containing the given path IDs.
See also GetByEntities for a description of the parameters.
GetCountByPathsInternal
ClassMethod GetCountByPathsInternal(domainid As %Integer, ByRef pathlist, sourceidlist As %List, Output sc As %Status = {$$$OK}) As %Integer [ Internal ]
GetNewBySource
ClassMethod GetNewBySource(ByRef result, domainid As %Integer, sourceid As %Integer, length As %Integer = 5, filter As %iKnow.Filters.Filter = "", algorithm As %String = {$$$NEWENTSIMPLE}, algorithmParams As %List = "", newEntitiesWindow As %Integer = 100, skipListIds As %List = "") As %Status
Retrieves the sentences with the most significant concepts compared to the rest of the domain (or optionally a subset thereof as filtered through filter). This array of sentences is based on results of the GetNewBySource query in %iKnow.Queries.EntityAPI, using the supplied algorithm and parameter values. The scores of the first [newEntitiesWindow] concepts are aggregated across sentences to produce the result of this query.
Please refer to the documentation of the GetNewBySource query in %iKnow.Queries.EntityAPI for more details on the parameters and available algorithms.
GetHighlighted
ClassMethod GetHighlighted(pDomainId As %Integer, pSentenceId As %Integer, ByRef pHighlight = "", vSrcId As %Integer = 0, Output pFullSentence = "", Output pSC As %Status = {$$$OK}, pEscapeHTML As %Boolean = 1) As %String
Highlighting
This is a flexible method to highlight specific elements within a sentence using user-supplied markup passed in through the pHighlight argument (by reference) in a multidimensional form:
 set pHighlight("FLAG") = "markup"
 set pHighlight("FLAG", id) = "markup"
The first option highlights any element of the type identified by "FLAG". The second option refines this to a particular instance, identified by id, and overrides any definition at the generic "FLAG" level.
Note: unless explicitly stated otherwise, all highlighting is based on the entity level.
Markup options
Any single (opening) HTML tag can be specified on the value side of pHighlight and will automatically be wrapped around every entity. The closing tag is derived automatically from the opening tag supplied through pHighlight.
HTML markup supplied this way supports a basic means of annotating with metadata about the particular thing being highlighted. Any occurrences of "$$$ID" in the HTML tag will be substituted with the relevant identifier of what's being highlighted, such as entity IDs for entity markup, CRC IDs for CRC markup or match IDs for dictionary matching markup. Most entity-level markup also supports the $$$LITERAL tag to replace with the original text string for that entity.
For example, the following highlight spec would add links to an info page that takes entity IDs as a URL parameter:
Note that in some cases, such as dictionary matches, there may be multiple IDs associated with the same highlighted entity. These will be provided as a comma-separated list replacing the $$$ID placeholder.
As an alternative to HTML markup, you can also supply two-character strings that will be used to wrap entities that need highlighting. For example, this array will put square brackets around all concepts and curly braces around relationships:
```
 set tHighlight("ROLE", "concept") = "[]"
 set tHighlight("ROLE", "relation") = "{}"
```
### Highlighting specific entities, CRCs and paths
To highlight all occurrences of a particular entity, stem, CRC, CC or path, use the corresponding flag. For entities, you can also supply the string value (except when the string value is itself an integer).
```
 set tHighlight("ENTITY", 123) = "<b>"
 set tHighlight("ENTITY", "snow storm") = ""
 set tHighlight("STEM", 234) = ""
 set tHighlight("CRC", 345) = ""
 set tHighlight("PATH", 456) = ""
```
### Highlighting based on role
The "ROLE" flag can be used to mark concepts, relations and non-relevants, either by using the corresponding integer code (e.g. $$$ENTTYPECONCEPT) or a simple string value. Note that in some cases, some words inside a relationship entity may be marked as non-relevant. These will be highlighted at the word level (only if there is a specific highlighting spec for non-relevants) and are an exception to the general rule that all highlighting happens at the entity level.
```
 set tHighlight("ROLE", "concept") = "<c>"
 set tHighlight("ROLE", "relation") = "<r>"
 set tHighlight("ROLE", "non-relevant") = "()"
 set tString = "The newspaper published the article and it sold very well."
 write $system.iKnow.Highlight(tString, .tHighlight)
```
The above example would print:
> (The) <c>newspaper</c> <r>published</r> (the) <c>article</c> <r>and (it) sold very well</r>.
### Highlighting based on attributes
Attributes can be highlighted at two levels. Using the regular "ATTRIBUTE" flag will highlight all entities affected by the attribute specified by attribute ID (such as $$$IKATTNEGATION). However, some attributes support more fine-grained annotation at the word level, marking those words that actually caused the attribute to apply to an entity or part of a path. These can be highlighted individually through the "ATTRIBUTEWORDS" flag and are an exception to the general rule that highlighting happens per entity.
```
 set tHighlight("ATTRIBUTE", $$$IKATTNEGATION) = ""
 set tHighlight("ATTRIBUTEWORDS", $$$IKATTNEGATION) = ""
 set tString = "The landlord doesn't accept late payments, but makes exceptions for students."
 write $system.iKnow.Highlight(tString, .tHighlight)
```
The above example would display as:
> The landlord doesn't accept late payments, but makes exceptions for students.
For some attributes, such as certain expressions of measurements, the engine is able to extract additional data elements which are exposed as "measurement properties". You can include these properties in your highlighted text by including the $$$PROPERTIES placeholder. Note that these are typically identified at the word level.
### Highlighting based on matching results
Dictionary matches can be highlighted using the "MATCH" flag, optionally restricted to a particular dictionary ID. To refine to a particular dictionary item, use the "MATCHITEM" flag. Highlighting can further be refined to distinguish based on full or partial matches using the "FULL" and "PARTIAL" flags as an additional subscript. Please note this is a refinement and the parent node (ID-specific or generic) should contain a value:
```
 set tHighlight("MATCH") = ""
 set tHighlight("MATCH", "FULL") = ""
```
Additional information about the matches themselves is available through the metadata rewrite mechanism: $$$TERM, $$$TERMID, $$$ITEM, $$$ITEMID, $$$ITEMURI, $$$DICT, $$$DICTID. Note that the regular $$$ID markers will be replaced with dictionary match IDs, not the IDs of the Dictionary or Dictionary Items.
### Highlighting based on character position
If external tooling provided annotations based on character positions, use the "CHARS" flag to highlight those annotations by providing the start and end positions as second and third subscripts of the highlight spec array. This will highlight the entities "covering" these start and end positions, starting with the entity which includes the character at the designated start position and ending with the entity including the character at the designated end position.
```
 set tHighlight("CHARS", 13, 21) = ""
 set tHighlight("CHARS", 71, 75) = ""
 set tString = "The instant Project X party was not well-received by the community of Haren in the Netherlands."
 write $system.iKnow.Highlight(tString, .tHighlight)
```
The above example will annotate the entire entities "instant Project X party" and "Haren".
Note that in certain cases the iKnow indexing engine may modify input text while processing it, so character positions supplied by external sources that worked from the original text may no longer point to the expected positions. The two most important cases are when User Dictionaries are used to rewrite the input explicitly and when duplicate whitespace is normalized by the engine. To work around this issue, present the output of the iKnow engine (as retrieved through GetValue) to these external tools so that the same normalizations are applied.
In cases where the externally provided character positions span more than a single sentence, you can pass an offset as the data element of the main "CHARS" node to mark the character position that corresponds to the start of this sentence. This should be easier than recalculating all character positions and allows you to reuse the entire array for successive calls to GetHighlighted.
### Style precedence
For the purpose of HTML styling precedence, this is the order in which tags are wrapped around entities, from innermost to outermost:
1. ATTRIBUTEWORDS (wrapped around individual words)
2. ATTRIBUTE - ID-specific (attribute type ID)
3. ATTRIBUTE - generic
4. ENTITY - ID-specific
5. STEM - ID-specific
6. CRC - ID-specific
7. CC - ID-specific
8. MATCHITEM - ID-specific (dictionary item ID)
9. MATCH - ID-specific (dictionary ID)
10. MATCHITEM - generic
11. MATCH - generic
12. PATH - ID-specific
13. ROLE - ID-specific (role)
14. CHARS
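To tie the highlight spec to an indexed sentence, the sketch below calls GetHighlighted with the two-character wrappers from the "Markup options" section. The class and method names are hypothetical, and the domain and sentence IDs are supplied by the caller.

```objectscript
/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentenceHighlight
{

/// Writes a sentence with concepts in [] and relations in {}.
ClassMethod WriteHighlighted(pDomainId As %Integer, pSentenceId As %Integer) As %Status
{
    // two-character wrappers, as described under "Markup options"
    set tHighlight("ROLE", "concept") = "[]"
    set tHighlight("ROLE", "relation") = "{}"

    // vSrcId = 0: the sentence ID refers to a regular (non-virtual) source
    set tText = ##class(%iKnow.Queries.SentenceAPI).GetHighlighted(pDomainId, pSentenceId, .tHighlight, 0, .tFull, .tSC)
    if $$$ISERR(tSC) { return tSC }

    write tText, !
    return $$$OK
}

}
```

The highlight array is passed by reference and can be reused for successive sentences.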
GetAttributes
ClassMethod GetAttributes(ByRef pResult, pDomainId As %Integer, pSentId As %Integer, vSrcId As %Integer = 0, pIncludePathAttributes As %Boolean = 0) As %Status
Returns all attributes for a given sentence. By default, only entity-level attributes are returned, with the wordPositions result column indicating which words within the affected entities are actually attributed. Setting pIncludePathAttributes to 1 also returns path-level attributes (such as implied negation), but these will have no values for the wordPositions column. Also note that the start and span columns for path-level results refer to positions within those paths and not entity positions within the sentence. See also GetAttributes in %iKnow.Queries.PathAPI and GetOccurrenceAttributes in %iKnow.Queries.EntityAPI.
Any named attribute properties are also included through sub-nodes (not available through SQL or SOAP):
pResult(rowNumber, propertyName) = propertyValue
The returned wordPositions apply to the entities starting from start up to offset and only extend to the last attributed word position (there might be more words within the entity).
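A sketch of reading both the result rows and their property sub-nodes follows. It assumes each row is a $lb in the column order of GetAttributesRT (attTypeId, attType, start, span, wordPositions, properties, level); the class and method names are hypothetical.

```objectscript
/// Hypothetical demo class; the name is not part of the iKnow API.
Class Demo.SentenceAttributes
{

/// Prints entity-level attributes of a sentence with any named properties.
ClassMethod PrintAttributes(pDomainId As %Integer, pSentenceId As %Integer) As %Status
{
    set tSC = ##class(%iKnow.Queries.SentenceAPI).GetAttributes(.tAtts, pDomainId, pSentenceId)
    if $$$ISERR(tSC) { return tSC }

    set tRow = ""
    for {
        // ASSUMPTION: tData follows the column order of GetAttributesRT
        set tRow = $order(tAtts(tRow), 1, tData)
        quit:tRow=""
        write !, $listget(tData, 2), ": start ", $listget(tData, 3), ", span ", $listget(tData, 4)

        // named properties live in sub-nodes: pResult(rowNumber, propertyName)
        set tProp = ""
        for {
            set tProp = $order(tAtts(tRow, tProp), 1, tValue)
            quit:tProp=""
            write !, "   ", tProp, " = ", tValue
        }
    }
    return $$$OK
}

}
```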