SQA Higher Haggis Context Free Syntax

Richard Connor∗
January 7, 2014
1 Introduction
Please note that at the time of writing (January 7, 2014) this document is still
a draft and should not be taken as authoritative.
This formal specification goes together with a syntax checker available
here [1]. The syntax checker there is implemented directly from, and version-stepped with, this definition, and should therefore be completely consistent.
This document gives lexical rules and a context-free syntax defined in
BNF. A context-sensitive syntax and operational semantics are defined separately [2].
As the current specification of Haggis (both in SQA documentation and
as defined by Michaelson and Cutts [3]) is strictly defined as a pseudocode
language, some minor restrictions have had to be made to the language in
order to render it suitable for formal specification. These should not impact
negatively on any pseudocode use, whilst also allowing mechanised syntax
checking.
2 Lexical Rules and Microsyntax
Reserved words (keywords) are those defined as terminal symbols in any version of the Haggis language. No reserved word may be used as an identifier,
even those which are not defined in this version. All reserved words are in
uppercase; it is therefore safest, and good practice, not to use uppercase for
user-defined identifiers.

∗ mailto:[email protected]
[1] http://haggis4sqa.appspot.com/haggisParser.html?version=higher
[2] work in progress...
[3] http://haggis4sqa.appspot.com/haggisDocs/Haggis_2_2.pdf
The boolean literal values true and false, and the operator mod, may
not be used as user-defined identifiers.
⟨Identifier⟩s must start with an uppercase or lowercase Roman letter, and
may be followed by any contiguous sequence of letters, digits, underscores,
hyphens and full stops.
⟨IntegerLiteral⟩s comprise any sequence of digits; leading zeros are allowed.
⟨FloatLiteral⟩s comprise two ⟨IntegerLiteral⟩s separated by a full stop.
⟨BooleanLiteral⟩s comprise the character sequences true and false.
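The lexical rules above can be sketched as regular expressions. The following Python fragment is only an illustration, not the checker's implementation: the names `classify` and `NON_IDENTIFIERS` are hypothetical, digits and letters are assumed to be ASCII, a literal is assumed to need at least one digit, and the reserved-word set is only a small sample of the uppercase terminals used in this document.

```python
import re

# Patterns transcribed from the prose rules (illustrative names).
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_.\-]*")
INTEGER_LITERAL = re.compile(r"[0-9]+")           # leading zeros allowed
FLOAT_LITERAL = re.compile(r"[0-9]+\.[0-9]+")     # two integer literals and a full stop
BOOLEAN_LITERAL = re.compile(r"true|false")

# Words that may not be used as identifiers. ASSUMPTION: only a sample of the
# reserved words is listed; the full set covers every version of Haggis.
NON_IDENTIFIERS = {"mod", "SET", "TO", "IF", "THEN", "ELSE", "END",
                   "WHILE", "DO", "REPEAT", "UNTIL", "FOR", "FROM"}


def classify(lexeme: str) -> str:
    """Return the lexical class of a single lexeme, or 'Unknown'."""
    if BOOLEAN_LITERAL.fullmatch(lexeme):
        return "BooleanLiteral"
    if FLOAT_LITERAL.fullmatch(lexeme):
        return "FloatLiteral"
    if INTEGER_LITERAL.fullmatch(lexeme):
        return "IntegerLiteral"
    if lexeme in NON_IDENTIFIERS:
        return "Reserved"
    if IDENTIFIER.fullmatch(lexeme):
        return "Identifier"
    return "Unknown"
```

Under this reading a lexeme such as `total-1` is a single identifier, because hyphens are permitted after the first letter.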
⟨StringLiteral⟩s comprise any sequence of Unicode characters [4], on a single line, between two inverted commas. In this context, “inverted comma”
means the ASCII character number 22 in hexadecimal (Unicode U+0022), the character which
is typically entered into a browser text area when the inverted comma key
is pressed. Other Unicode characters for inverted commas, notably Unicode
characters U+201C and U+201D, often used by word processors for left and right
inverted commas respectively (“smart quotes”), may not be used as delimiters, but may
be included in string literals.
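As a rough illustration of the string literal rule, the Python pattern below accepts text delimited by U+0022 on a single line. It is a sketch under stated assumptions: the name `is_string_literal` is hypothetical, and the backslash is assumed to be valid only in the two escape pairs that the document mentions (an escaped inverted comma and an escaped backslash).

```python
import re

# Opening quote, then any mix of ordinary characters (no quote, backslash or
# newline) and the escape pairs \" and \\, then the closing quote.
# ASSUMPTION: no other backslash escapes are accepted.
STRING_LITERAL = re.compile(r'"(?:[^"\\\n]|\\["\\])*"')


def is_string_literal(text: str) -> bool:
    """True if the whole of text is one string literal."""
    return STRING_LITERAL.fullmatch(text) is not None
```

For example, `is_string_literal('"hello"')` holds, while text delimited by smart quotes is rejected; smart quotes between two U+0022 delimiters are accepted.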
Comments are allowed within a single line only, starting with #. This
character, and any following it, are disregarded up until the end of the line
which contains it. Comments starting with an asterisk, i.e. #*, are generated by the checking system and, whilst valid comments, will be removed
when the program is checked via the implemented interface.
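The comment rule can be sketched as a small line filter. This Python fragment is illustrative only: the name `strip_comment` is hypothetical, and it assumes that a # appearing inside a string literal does not start a comment, which the text above does not state explicitly.

```python
def strip_comment(line: str) -> str:
    """Return line with any trailing # comment removed."""
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == '"':
                in_string = False   # closing inverted comma
        elif ch == '"':
            in_string = True        # opening inverted comma
        elif ch == "#":
            # ASSUMPTION: only a # outside a string literal starts a comment.
            return line[:i].rstrip()
        i += 1
    return line
```

A line consisting only of a comment becomes the empty string, which corresponds to an empty command.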
An elision, consisting of any characters (other than >) contained within
< ... >, within a single line, may be used in place of any command or top-level expression. Elision may not be used as a sub-expression: for example
SET x TO < elision >
is allowed, but
SET x TO 3 + < elision >
is not, to avoid potential confusion with the < and > operators. Also for this
reason, and to promote good layout practice, there is a further lexical rule
that infix operators must be placed on the same line as their first operand,
although the second operand may appear after a line break.

[4] Inverted commas and backslashes can be included using \ as an “escape” character,
i.e. \" and \\ respectively.
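The shape of an elision can be illustrated with a one-line pattern. This Python fragment is a sketch with a hypothetical helper name, `is_elision`; it tests whether a whole command or top-level expression is an elision, which is the only position in which one is allowed.

```python
import re

# "<", then any characters other than ">" on a single line, then ">".
ELISION = re.compile(r"<[^>\n]*>")


def is_elision(text: str) -> bool:
    """True if text, ignoring surrounding spaces, is exactly one elision."""
    return ELISION.fullmatch(text.strip()) is not None
```

Applied to the right-hand sides of the two examples above, `is_elision("< elision >")` holds, whereas `is_elision("3 + < elision >")` does not.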
Unicode characters, such as ≠, are not yet allowed in this system [5] to ensure maximum compatibility. For this reason the checking system is defined
in terms of ASCII characters, so for example the symbol pair != is used in
place of ≠, >= in place of ≥, etc.
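The substitution described above can be sketched as a simple table. In this Python fragment the names `ASCII_OPERATORS` and `asciify` are hypothetical; the pairs for ≠ and ≥ come from the text, while <= for ≤ is an assumed analogue covered only by the "etc.".

```python
# Unicode operator -> ASCII spelling used by the checking system.
# "!=" and ">=" are given in the text; "<=" is an ASSUMED analogue.
ASCII_OPERATORS = {"≠": "!=", "≥": ">=", "≤": "<="}


def asciify(source: str) -> str:
    """Replace Unicode relational operators with ASCII spellings.

    Naive sketch: characters inside string literals are rewritten too.
    """
    for symbol, spelling in ASCII_OPERATORS.items():
        source = source.replace(symbol, spelling)
    return source
```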
3 Context Free Syntax
A standard pure BNF notation is used here; note that there is no use of
meta-symbols other than ::= within productions, so any character appearing
outside angle brackets is a terminal symbol of the language.
3.1 Command Sequences
A program is a sequence of commands.
⟨Program⟩  ::= ⟨Sequence⟩
⟨Sequence⟩ ::= ⟨Command⟩
⟨Sequence⟩ ::= ⟨Command⟩ ; ⟨Sequence⟩
A lexical rule allows the elision of the semicolon in any context where this
coincides with a new line, and thus semicolons are only actually required
when more than one command is joined on the same line. It is expected that
most programs will not contain semicolons, and this character is used mostly
for convenience in the formal specification.
[5] Although they are a part of the language definition.

3.2 Commands
⟨Command⟩ ::= ⟨Assignment⟩
⟨Command⟩ ::= ⟨Conditional⟩
⟨Command⟩ ::= ⟨Repetition⟩
⟨Command⟩ ::= ⟨Iteration⟩
⟨Command⟩ ::= ⟨InputOutputCommand⟩
⟨Command⟩ ::= ⟨SubprogramDeclaration⟩
⟨Command⟩ ::= ⟨ProcedureCall⟩
⟨Command⟩ ::=
An empty command is allowed, inductively allowing an empty sequence.
3.2.1 Assignment
⟨Assignment⟩ ::= SET ⟨Location⟩ TO ⟨Expression⟩
Assignment may be made to any location, where a ⟨Location⟩ is either a
locally declared variable or an element of an array. This syntax is overloaded
for both the introduction of new variables and the updating of existing
variables. When it is used to introduce a new variable, ⟨Location⟩ should be
an identifier [6].
3.2.2 Conditional Execution
⟨Conditional⟩ ::= IF ⟨Expression⟩ ⟨CondBody⟩ END IF
⟨CondBody⟩    ::= THEN ⟨Sequence⟩
⟨CondBody⟩    ::= THEN ⟨Sequence⟩ ELSE ⟨Sequence⟩

3.2.3 Repetition
⟨Repetition⟩ ::= WHILE ⟨Expression⟩ DO ⟨Sequence⟩ END WHILE
⟨Repetition⟩ ::= REPEAT ⟨Sequence⟩ UNTIL ⟨Expression⟩ END REPEAT

[6] This is not formally defined in the context-free syntax.
3.2.4 Iteration
⟨Iteration⟩ ::= REPEAT ⟨Expression⟩ TIMES ⟨Sequence⟩ END REPEAT
⟨Iteration⟩ ::= FOR ⟨Identifier⟩ ⟨Range⟩ DO ⟨Sequence⟩ END FOR
⟨Range⟩     ::= FROM ⟨Expression⟩ TO ⟨Expression⟩
⟨Iteration⟩ ::= FOR EACH ⟨Identifier⟩ FROM ⟨Expression⟩ DO ⟨Sequence⟩ END FOR EACH
⟨Iteration⟩ ::= FOREACH ⟨Identifier⟩ FROM ⟨Expression⟩ DO ⟨Sequence⟩ END FOREACH

3.2.5 Input and Output
⟨InputOutputCommand⟩ ::= ⟨Input⟩
⟨InputOutputCommand⟩ ::= ⟨Output⟩
⟨InputOutputCommand⟩ ::= ⟨FileCommand⟩
⟨Input⟩              ::= RECEIVE ⟨Location⟩ FROM ( ⟨Type⟩ ) ⟨Expression⟩
⟨Output⟩             ::= SEND ⟨Expression⟩ TO ⟨Expression⟩
⟨FileCommand⟩        ::= OPEN ⟨Expression⟩
⟨FileCommand⟩        ::= CREATE ⟨Expression⟩
⟨FileCommand⟩        ::= CLOSE ⟨Expression⟩

3.2.6 Subprogram Declaration
⟨SubprogramDeclaration⟩ ::= ⟨Procedure⟩
⟨SubprogramDeclaration⟩ ::= ⟨Function⟩
⟨Procedure⟩             ::= PROCEDURE ⟨FormalParameters⟩ ⟨ProcedureBody⟩ END PROCEDURE
⟨Function⟩              ::= ⟨Type⟩ FUNCTION ⟨FormalParameters⟩ ⟨FunctionBody⟩ END FUNCTION

⟨FormalParameters⟩ ::= ( )
⟨FormalParameters⟩ ::= ( ⟨FormalParameterList⟩ )

⟨FormalParameterList⟩ ::= ⟨FormalParameter⟩
⟨FormalParameterList⟩ ::= ⟨FormalParameter⟩ , ⟨FormalParameterList⟩
⟨FormalParameter⟩     ::= ⟨Identifier⟩
⟨FormalParameter⟩     ::= ⟨Type⟩ ⟨Identifier⟩

⟨ProcedureBody⟩ ::= ⟨Sequence⟩
⟨FunctionBody⟩  ::= ⟨Sequence⟩ ; RETURN ⟨Expression⟩
3.2.7 Procedure Call
⟨ProcedureCall⟩       ::= ⟨Identifier⟩ ⟨ActualParameters⟩
⟨ActualParameters⟩    ::= ( )
⟨ActualParameters⟩    ::= ( ⟨ActualParameterList⟩ )
⟨ActualParameterList⟩ ::= ⟨Expression⟩
⟨ActualParameterList⟩ ::= ⟨Expression⟩ , ⟨ActualParameterList⟩

3.3 Types
⟨Type⟩ ::= ⟨Sequence⟩ of ⟨Type⟩
⟨Type⟩ ::= ⟨BaseType⟩

3.3.1 Structured Types
⟨Sequence⟩ ::= ARRAY
⟨Type⟩     ::= STRING
3.3.2 Base Types
⟨BaseType⟩ ::= INTEGER
⟨BaseType⟩ ::= REAL
⟨BaseType⟩ ::= BOOLEAN
⟨BaseType⟩ ::= CHARACTER

3.4 Expressions
Two versions of expression syntax are given here. First, an ambiguous, easy-to-read version is given. Second, for the cognoscenti (and to guarantee a
faithful parser implementation!) a disambiguated version, effectively defining
operator precedence for the first version, is also given. In most cases, the
ambiguous version along with well-established precedence rules is sufficient
to understand the language, but the latter version should be read as the
formal definition of the language.
3.4.1 Ambiguous Expressions
⟨Expression⟩ ::= ⟨Expression⟩ ⟨ExpOp⟩ ⟨Expression⟩
⟨Expression⟩ ::= ⟨Expression⟩ ⟨RelOp⟩ ⟨Expression⟩
⟨Expression⟩ ::= ⟨Expression⟩ ⟨MultOp⟩ ⟨Expression⟩
⟨Expression⟩ ::= ⟨Expression⟩ ⟨AddOp⟩ ⟨Expression⟩
⟨Expression⟩ ::= ⟨BaseExpression⟩
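To see why these productions are ambiguous, it helps to count parse trees. The Python sketch below (the function name `count_parse_trees` is hypothetical) counts the distinct trees for a flat expression made of a given number of base expressions joined by binary operators. Any operator may sit at the root, splitting the operands into a left group and a right group, so the count does not depend on which operators are used.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(operands: int) -> int:
    """Parse trees for `operands` base expressions joined by binary operators."""
    if operands == 1:
        return 1  # a lone base expression has exactly one tree
    # Pick the operator at the root: k operands go left, the rest go right.
    return sum(count_parse_trees(k) * count_parse_trees(operands - k)
               for k in range(1, operands))
```

An expression with three operands, such as 1 + 2 * 3, has `count_parse_trees(3) == 2` trees, and the precedence rules select one of them.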
3.4.2 Disambiguated Expressions
⟨Expression⟩  ::= ⟨Expression1⟩
⟨Expression⟩  ::= ⟨Expression1⟩ ⟨RelOp⟩ ⟨Expression1⟩
⟨Expression1⟩ ::= ⟨Expression2⟩
⟨Expression1⟩ ::= ⟨Expression2⟩ ⟨AddOp⟩ ⟨Expression2⟩
⟨Expression2⟩ ::= ⟨Expression3⟩
⟨Expression2⟩ ::= ⟨Expression3⟩ ⟨MultOp⟩ ⟨Expression3⟩
⟨Expression3⟩ ::= ⟨BaseExpression⟩
⟨Expression3⟩ ::= ⟨BaseExpression⟩ ⟨ExpOp⟩ ⟨BaseExpression⟩
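The layered productions can be transcribed almost mechanically into a recursive-descent parser. The Python sketch below is an illustration, not the checker's implementation. All function names are hypothetical, base expressions are limited to integer literals, simplified identifiers (no hyphens or full stops) and parenthesised expressions, unary operators are omitted, and the ASCII operator spellings are assumed.

```python
import re

# ASSUMPTION: simplified tokens; identifiers here exclude hyphens and full stops.
TOKEN = re.compile(r"\s*([0-9]+|[A-Za-z][A-Za-z0-9_]*|<=|>=|!=|[-+*/&^=<>()])")

REL_OPS = {"=", "!=", "<", "<=", ">", ">="}
ADD_OPS = {"+", "-"}
MULT_OPS = {"*", "/", "mod", "&"}
EXP_OPS = {"^"}


def tokenize(text):
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def level(operand, ops):
    """Parser for the pair of productions: X ::= Y  and  X ::= Y op Y."""
    def parser(tokens):
        left, rest = operand(tokens)
        if rest and rest[0] in ops:
            right, after = operand(rest[1:])
            return (rest[0], left, right), after
        return left, rest
    return parser


def base_expression(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        tree, rest = expression(rest)
        if not rest or rest[0] != ")":
            raise SyntaxError("expected )")
        return tree, rest[1:]
    if head[0].isalnum() and head != "mod":
        return head, rest          # integer literal or identifier
    raise SyntaxError(f"unexpected {head!r}")


# One parser per grammar level, innermost (tightest binding) first.
expression3 = level(base_expression, EXP_OPS)
expression2 = level(expression3, MULT_OPS)
expression1 = level(expression2, ADD_OPS)
expression = level(expression1, REL_OPS)


def parse(text):
    tree, rest = expression(tokenize(text))
    if rest:
        raise SyntaxError(f"unexpected {rest[0]!r}")
    return tree
```

With this transcription `parse("1 + 2 * 3")` yields `("+", "1", ("*", "2", "3"))`, so multiplication binds more tightly than addition. Read literally, each level admits at most one operator, so `parse("1 + 2 + 3")` raises `SyntaxError` while `parse("(1 + 2) + 3")` succeeds.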
3.4.3 Operators
⟨LogicalOp⟩ ::= AND
⟨LogicalOp⟩ ::= OR

⟨RelOp⟩ ::= =
⟨RelOp⟩ ::= ≠
⟨RelOp⟩ ::= <
⟨RelOp⟩ ::= ≤
⟨RelOp⟩ ::= >
⟨RelOp⟩ ::= ≥

⟨AddOp⟩ ::= +
⟨AddOp⟩ ::= -

⟨MultOp⟩ ::= *
⟨MultOp⟩ ::= /
⟨MultOp⟩ ::= mod
⟨MultOp⟩ ::= &

⟨ExpOp⟩ ::= ^
3.4.4 Base Expressions
⟨BaseExpression⟩ ::= ⟨UnaryOpExp⟩
⟨BaseExpression⟩ ::= ⟨Location⟩
⟨BaseExpression⟩ ::= ⟨FunctionCall⟩
⟨BaseExpression⟩ ::= ( ⟨Expression⟩ )
⟨BaseExpression⟩ ::= ⟨SequenceLiteral⟩
⟨BaseExpression⟩ ::= ⟨IntegerLiteral⟩
⟨BaseExpression⟩ ::= ⟨FloatLiteral⟩
⟨BaseExpression⟩ ::= ⟨StringLiteral⟩
⟨BaseExpression⟩ ::= ⟨BooleanLiteral⟩
⟨BaseExpression⟩ ::= ⟨KeywordLiteral⟩

⟨UnaryOpExp⟩ ::= - ⟨BaseExpression⟩
⟨UnaryOpExp⟩ ::= NOT ⟨BaseExpression⟩

⟨Location⟩         ::= ⟨Identifier⟩
⟨Location⟩         ::= ⟨ArrayDereference⟩
⟨ArrayDereference⟩ ::= ⟨Expression⟩ [ ⟨Expression⟩ ]

⟨FunctionCall⟩ ::= ⟨Identifier⟩ ⟨ActualParameters⟩

⟨SequenceLiteral⟩ ::= [ ]
⟨SequenceLiteral⟩ ::= [ ⟨ValueList⟩ ]
⟨ValueList⟩       ::= ⟨Expression⟩
⟨ValueList⟩       ::= ⟨Expression⟩ , ⟨ValueList⟩

⟨KeywordLiteral⟩ ::= KEYBOARD
⟨KeywordLiteral⟩ ::= DISPLAY