Compare commits

...

7 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
7bc2d0ca11
Fix spelling: multibyte -> multi-byte in Unicode test subdescription
Agent-Logs-Url: https://github.com/mikefarah/yq/sessions/6e7b304b-5b52-4e89-8bad-ba22813305c7

Co-authored-by: mikefarah <1151925+mikefarah@users.noreply.github.com>
2026-04-06 08:51:11 +00:00
copilot-swe-agent[bot]
adb150d1fb
S2-S4: tighten lexer condition, fix doc header, add Unicode example
Agent-Logs-Url: https://github.com/mikefarah/yq/sessions/ec06083e-e20a-45d2-bf7e-4e1fa7be1073

Co-authored-by: mikefarah <1151925+mikefarah@users.noreply.github.com>
2026-04-06 08:45:00 +00:00
copilot-swe-agent[bot]
1afa3c73dc Merge master: use clampSliceIndex in sliceStringNode, resolve conflicts
Co-authored-by: mikefarah <1151925+mikefarah@users.noreply.github.com>
2026-04-06 08:44:35 +00:00
Jan Dubois
b0ba9589d7
Fix findInArray misuse on MappingNodes in equality and contains (#2645)
recurseNodeObjectEqual and containsObject both used findInArray to
locate keys in a MappingNode's Content array. findInArray steps by 1,
so it matches against both keys (even indices) and values (odd indices).

In recurseNodeObjectEqual, when a null key in the LHS matched a null
value in the RHS at the last position, rhs.Content[indexInRHS+1]
accessed an out-of-bounds index, causing a panic.

In containsObject, a %2 guard prevented the panic but introduced false
negatives: when a null value appeared before the actual null key,
findInArray returned the value's odd index, the guard rejected it, and
the function reported the key as missing.

Both functions now use findKeyInMap, which steps by 2 and compares only
key positions. The %2 guard in containsObject is removed.

Reproducer for the panic (recurseNodeObjectEqual):

    echo '? [{~: ~}]
    : v1
    ? [{2: ~}]
    : v2' | yq '. += .'

Reproducer for the false negative (containsObject):

    printf '? 1\n: ~\n? ~\n: x\n' | yq 'contains({~: "x"})'

Found by OSS-Fuzz via the lima project's FuzzEvaluateExpression target.
https://issues.oss-fuzz.com/issues/383860504

Signed-off-by: Jan Dubois <jan@jandubois.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:30:44 +10:00
Jan Dubois
80139ae1cc
Fix panic on negative slice indices that underflow after adjustment (#2646)
sliceArrayOperator adjusts negative indices by adding Content length,
but does not clamp the result. When the absolute value of a negative
index exceeds Content length (e.g. .[-99999:3] on a 3-element array),
the adjusted index remains negative and causes an out-of-bounds access
in the Content slice loop.

Extract the adjust-and-clamp logic into clampSliceIndex and use it for
both index positions.

Reproducer (panics before this fix, returns full array after):

    echo '[a, b, c]' | yq '.[-99999:3]'

Found by OSS-Fuzz via the lima project's FuzzEvaluateExpression target.
https://issues.oss-fuzz.com/issues/438776028

Signed-off-by: Jan Dubois <jan@jandubois.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:27:02 +10:00
Jan Dubois
0374ad6b4b
Fix stack overflow from circular alias in traverse (#2647)
go-yaml accepts cross-document alias references, which the YAML spec
forbids (anchors are scoped to a single document). When a nested
assignment targets such an alias, UpdateFrom copies the Alias field
between nodes, creating a self-referencing AliasNode. Both traverse()
and traverseArrayIndices() then follow this cycle indefinitely.

Extract resolveAliasChain(), which follows aliases iteratively with a
visited set and returns an error on cycles. Both traverse() and
traverseArrayIndices() now call it, eliminating the recursive alias
handling in both code paths.

Note: traverseMergeAnchor() also dereferences aliases (lines 358 and
371) but with single-step assignment, not recursion. A self-referencing
alias there falls through the kind switch silently rather than
crashing. Using resolveAliasChain() in that function would produce a
clear error instead of silently dropping the node.

Reproducer (stack overflow before this fix, returns error after):

    echo '&-- a
    ---
    *--' | yq eval-all '. = (.x = 1)'

Found by OSS-Fuzz via the lima project's FuzzEvaluateExpression target.
https://issues.oss-fuzz.com/issues/390467412

Signed-off-by: Jan Dubois <jan@jandubois.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:25:13 +10:00
Jan Dubois
2ef934281e
Fix panic and OOM in repeatString for large repeat counts (#2644)
The existing check (count > 10 million) does not account for string
length. A 68-byte string repeated 35 trillion times passes the count
check but panics in strings.Repeat with "makeslice: len out of range".
Smaller counts (e.g. 10 million * 6-byte string = 60 MB) cause OOM on
memory-constrained environments like OSS-Fuzz (2560 MB limit).

Replace the count-only check with a result size check: the product of
string length and repeat count must not exceed 10 MiB. Use division
(len > limit/count) instead of multiplication (len*count > limit) to
avoid integer overflow — a large count can wrap the product to a
negative value, bypassing the guard entirely.

Fixes at least four OSS-Fuzz bugs found via Lima's FuzzEvaluateExpression:
  https://issues.oss-fuzz.com/issues/418818862 (makeslice overflow)
  https://issues.oss-fuzz.com/issues/422001683 (timeout from huge alloc)
  https://issues.oss-fuzz.com/issues/383195001 (OOM, 3 GB allocation)
  https://issues.oss-fuzz.com/issues/385180606 (OOM, 97 TB allocation)

Signed-off-by: Jan Dubois <jan@jandubois.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:22:46 +10:00
14 changed files with 235 additions and 70 deletions

View File

@ -1,4 +1,4 @@
# Slice/Splice Array or String
# Slice Array or String
The slice operator works on both arrays and strings. Like the `jq` equivalent, `.[10:15]` will return a subarray (or substring) of length 5, starting from index 10 inclusive, up to index 15 exclusive. Negative numbers count backwards from the end of the array or string.

View File

@ -1,4 +1,4 @@
# Slice/Splice Array or String
# Slice Array or String
The slice operator works on both arrays and strings. Like the `jq` equivalent, `.[10:15]` will return a subarray (or substring) of length 5, starting from index 10 inclusive, up to index 15 exclusive. Negative numbers count backwards from the end of the array or string.
@ -165,3 +165,19 @@ will output
ralia
```
## Slicing strings - Unicode
Indices are rune-based, so multi-byte characters are handled correctly
Given a sample.yml file of:
```yaml
greeting: héllo
```
then
```bash
yq '.greeting[1:3]' sample.yml
```
will output
```yaml
él
```

View File

@ -127,10 +127,15 @@ func handleToken(tokens []*token, index int, postProcessedTokens []*token) (toke
if tokenIsOpType(currentToken, createMapOpType) {
log.Debugf("tokenIsOpType: createMapOpType")
// check the previous token is '[', means we are slice, but dont have a first number
if index > 0 && (tokens[index-1].TokenType == traverseArrayCollect || tokens[index-1].TokenType == openCollect) {
if index > 0 && tokens[index-1].TokenType == traverseArrayCollect {
log.Debugf("previous token is : traverseArrayOpType")
// need to put the number 0 before this token, as that is implied
postProcessedTokens = append(postProcessedTokens, &token{TokenType: operationToken, Operation: createValueOperation(0, "0")})
} else if index >= 2 && tokens[index-1].TokenType == openCollect &&
(tokens[index-2].TokenType == operationToken || tokens[index-2].TokenType == closeCollect || tokens[index-2].TokenType == closeCollectObject) {
log.Debugf("previous token is : openCollect following a traversal, implying 0 start")
// need to put the number 0 before this token, as that is implied
postProcessedTokens = append(postProcessedTokens, &token{TokenType: operationToken, Operation: createValueOperation(0, "0")})
}
}

View File

@ -80,7 +80,7 @@ func recurseNodeObjectEqual(lhs *CandidateNode, rhs *CandidateNode) bool {
key := lhs.Content[index]
value := lhs.Content[index+1]
indexInRHS := findInArray(rhs, key)
indexInRHS := findKeyInMap(rhs, key)
if indexInRHS == -1 || !recursiveNodeEqual(value, rhs.Content[indexInRHS+1]) {
return false

View File

@ -300,6 +300,24 @@ func TestRecurseNodeObjectEqual(t *testing.T) {
test.AssertResult(t, true, recurseNodeObjectEqual(obj1, obj2))
test.AssertResult(t, false, recurseNodeObjectEqual(obj1, obj3))
test.AssertResult(t, false, recurseNodeObjectEqual(obj1, obj4))
// A null key must not match a null value in the other map.
// Regression test for https://issues.oss-fuzz.com/issues/383860504
nullKey := &CandidateNode{Kind: ScalarNode, Tag: "!!null"}
nullVal := &CandidateNode{Kind: ScalarNode, Tag: "!!null"}
intKey := createScalarNode(2, "2")
intKey.Tag = "!!int"
intVal := &CandidateNode{Kind: ScalarNode, Tag: "!!null"}
mapWithNullKey := &CandidateNode{
Kind: MappingNode,
Content: []*CandidateNode{nullKey, nullVal},
}
mapWithIntKey := &CandidateNode{
Kind: MappingNode,
Content: []*CandidateNode{intKey, intVal},
}
test.AssertResult(t, false, recurseNodeObjectEqual(mapWithNullKey, mapWithIntKey))
}
func TestParseInt(t *testing.T) {

View File

@ -527,6 +527,18 @@ var addOperatorScenarios = []expressionScenario{
expression: `.a += [2]`,
expectedError: "!!seq () cannot be added to a !!str (a)",
},
{
// Regression test for https://issues.oss-fuzz.com/issues/383860504
// Adding a map to itself must not panic when sequence keys contain
// single-entry mappings with a null key in one and a non-null key
// in the other.
skipDoc: true,
document: "? [{~: ~}]\n: v1\n? [{2: ~}]\n: v2",
expression: `. += .`,
expected: []string{
"D0, P[], (!!map)::? [{~: ~}]\n: v1\n? [{2: ~}]\n: v2\n",
},
},
}
func TestAddOperatorScenarios(t *testing.T) {

View File

@ -46,9 +46,9 @@ func containsObject(lhs *CandidateNode, rhs *CandidateNode) (bool, error) {
rhsKey := rhs.Content[index]
rhsValue := rhs.Content[index+1]
log.Debugf("Looking for %v in the lhs", rhsKey.Value)
lhsKeyIndex := findInArray(lhs, rhsKey)
lhsKeyIndex := findKeyInMap(lhs, rhsKey)
log.Debugf("index is %v", lhsKeyIndex)
if lhsKeyIndex < 0 || lhsKeyIndex%2 != 0 {
if lhsKeyIndex < 0 {
return false, nil
}
lhsValue := lhs.Content[lhsKeyIndex+1]

View File

@ -65,6 +65,16 @@ var containsOperatorScenarios = []expressionScenario{
"D0, P[], (!!bool)::false\n",
},
},
{
// Regression: findInArray could match a null key against a null
// value at an earlier odd index, producing a false negative.
skipDoc: true,
document: "? 1\n: ~\n? ~\n: x",
expression: `contains({~: "x"})`,
expected: []string{
"D0, P[], (!!bool)::true\n",
},
},
{
description: "String contains substring",
document: `"foobar"`,

View File

@ -155,8 +155,10 @@ func repeatString(lhs *CandidateNode, rhs *CandidateNode) (*CandidateNode, error
return nil, err
} else if count < 0 {
return nil, fmt.Errorf("cannot repeat string by a negative number (%v)", count)
} else if count > 10000000 {
return nil, fmt.Errorf("cannot repeat string by more than 100 million (%v)", count)
}
maxResultLen := 10 * 1024 * 1024 // 10 MiB
if count > 0 && len(stringNode.Value) > maxResultLen/count {
return nil, fmt.Errorf("result of repeating string (%v bytes) by %v would exceed %v bytes", len(stringNode.Value), count, maxResultLen)
}
target.Value = strings.Repeat(stringNode.Value, count)

View File

@ -237,12 +237,11 @@ var multiplyOperatorScenarios = []expressionScenario{
expectedError: "cannot repeat string by a negative number (-4)",
},
{
description: "Multiply string X by more than 100 million",
// very large string.repeats causes a panic
description: "Multiply string by count that exceeds result size limit",
skipDoc: true,
document: `n: 100000001`,
expression: `"banana" * .n`,
expectedError: "cannot repeat string by more than 100 million (100000001)",
expectedError: "result of repeating string (6 bytes) by 100000001 would exceed 10485760 bytes",
},
{
description: "Multiply int node X string",
@ -693,6 +692,27 @@ var multiplyOperatorScenarios = []expressionScenario{
"D0, P[], (!!null)::null\n",
},
},
{
// Regression test for https://issues.oss-fuzz.com/issues/418818862
// Large repeat count with a long string must not panic.
skipDoc: true,
expression: `"abc" * 99999999`,
expectedError: "result of repeating string (3 bytes) by 99999999 would exceed 10485760 bytes",
},
{
// Regression test for https://issues.oss-fuzz.com/issues/383195001
// Product of string length * repeat count must be bounded.
skipDoc: true,
expression: `"x" * 99999999`,
expectedError: "result of repeating string (1 bytes) by 99999999 would exceed 10485760 bytes",
},
{
// The size guard must not overflow: len * count can wrap to
// a negative or small value on 64-bit, bypassing the check.
skipDoc: true,
expression: `"ab" * 4611686018427387904`,
expectedError: "result of repeating string (2 bytes) by 4611686018427387904 would exceed 10485760 bytes",
},
}
func TestMultiplyOperatorScenarios(t *testing.T) {

View File

@ -16,34 +16,33 @@ func getSliceNumber(d *dataTreeNavigator, context Context, node *CandidateNode,
return parseInt(result.MatchingNodes.Front().Value.(*CandidateNode).Value)
}
// clampSliceIndex resolves a possibly-negative slice index against
// length and clamps the result to [0, length].
func clampSliceIndex(index, length int) int {
if index < 0 {
index += length
}
if index < 0 {
return 0
}
if index > length {
return length
}
return index
}
func sliceStringNode(lhsNode *CandidateNode, firstNumber int, secondNumber int) *CandidateNode {
runes := []rune(lhsNode.Value)
length := len(runes)
relativeFirstNumber := firstNumber
if relativeFirstNumber < 0 {
relativeFirstNumber = length + firstNumber
}
if relativeFirstNumber < 0 {
relativeFirstNumber = 0
}
relativeSecondNumber := secondNumber
if relativeSecondNumber < 0 {
relativeSecondNumber = length + secondNumber
} else if relativeSecondNumber > length {
relativeSecondNumber = length
}
log.Debugf("sliceStringNode: slice from %v to %v", relativeFirstNumber, relativeSecondNumber)
if relativeFirstNumber > length {
relativeFirstNumber = length
}
relativeFirstNumber := clampSliceIndex(firstNumber, length)
relativeSecondNumber := clampSliceIndex(secondNumber, length)
if relativeSecondNumber < relativeFirstNumber {
relativeSecondNumber = relativeFirstNumber
}
log.Debugf("sliceStringNode: slice from %v to %v", relativeFirstNumber, relativeSecondNumber)
slicedString := string(runes[relativeFirstNumber:relativeSecondNumber])
replacement := lhsNode.CreateReplacement(ScalarNode, lhsNode.Tag, slicedString)
replacement.Style = lhsNode.Style
@ -76,28 +75,8 @@ func sliceArrayOperator(d *dataTreeNavigator, context Context, expressionNode *E
continue
}
relativeFirstNumber := firstNumber
if relativeFirstNumber < 0 {
relativeFirstNumber = len(lhsNode.Content) + firstNumber
}
if relativeFirstNumber < 0 {
relativeFirstNumber = 0
} else if relativeFirstNumber > len(lhsNode.Content) {
relativeFirstNumber = len(lhsNode.Content)
}
relativeSecondNumber := secondNumber
if relativeSecondNumber < 0 {
relativeSecondNumber = len(lhsNode.Content) + secondNumber
}
if relativeSecondNumber < 0 {
relativeSecondNumber = 0
} else if relativeSecondNumber > len(lhsNode.Content) {
relativeSecondNumber = len(lhsNode.Content)
}
if relativeSecondNumber < relativeFirstNumber {
relativeSecondNumber = relativeFirstNumber
}
relativeFirstNumber := clampSliceIndex(firstNumber, len(lhsNode.Content))
relativeSecondNumber := clampSliceIndex(secondNumber, len(lhsNode.Content))
log.Debugf("calculateIndicesToTraverse: slice from %v to %v", relativeFirstNumber, relativeSecondNumber)

View File

@ -99,17 +99,32 @@ var sliceArrayScenarios = []expressionScenario{
},
},
{
// Regression test for https://issues.oss-fuzz.com/issues/438776028
// Negative second index that underflows after adjustment must
// clamp to zero, yielding an empty sequence.
skipDoc: true,
document: `[cat, dog, frog]`,
expression: `.[-100:]`,
document: `[a, b, c]`,
expression: `.[0:-99999]`,
expected: []string{
"D0, P[], (!!seq)::- cat\n- dog\n- frog\n",
"D0, P[], (!!seq)::[]\n",
},
},
{
// First-index underflow: without clamping, the loop starts at a
// negative index and panics on Content access.
skipDoc: true,
document: `[cat, dog, frog]`,
expression: `.[:-100]`,
document: `[a, b, c]`,
expression: `.[-99999:3]`,
expected: []string{
"D0, P[], (!!seq)::- a\n- b\n- c\n",
},
},
{
// Both indices underflow: both clamp to zero, yielding an empty
// sequence.
skipDoc: true,
document: `[a, b, c]`,
expression: `.[-99999:-99998]`,
expected: []string{
"D0, P[], (!!seq)::[]\n",
},
@ -184,10 +199,10 @@ var sliceArrayScenarios = []expressionScenario{
},
},
{
skipDoc: true,
description: "Unicode string slicing",
document: `greeting: héllo`,
expression: `.greeting[1:3]`,
description: "Slicing strings - Unicode",
subdescription: "Indices are rune-based, so multi-byte characters are handled correctly",
document: `greeting: héllo`,
expression: `.greeting[1:3]`,
expected: []string{
"D0, P[greeting], (!!str)::él\n",
},

View File

@ -36,9 +36,33 @@ func traversePathOperator(_ *dataTreeNavigator, context Context, expressionNode
return context.ChildContext(matches), nil
}
// resolveAliasChain follows an alias chain iteratively, returning the
// first non-alias node. Returns an error if a cycle is detected.
func resolveAliasChain(node *CandidateNode) (*CandidateNode, error) {
if node.Kind != AliasNode {
return node, nil
}
visited := map[*CandidateNode]bool{}
for node.Kind == AliasNode {
if visited[node] {
return nil, fmt.Errorf("alias cycle detected")
}
visited[node] = true
log.Debug("its an alias!")
node = node.Alias
}
return node, nil
}
func traverse(context Context, matchingNode *CandidateNode, operation *Operation) (*list.List, error) {
log.Debugf("Traversing %v", NodeToString(matchingNode))
var err error
matchingNode, err = resolveAliasChain(matchingNode)
if err != nil {
return nil, err
}
if matchingNode.Tag == "!!null" && operation.Value != "[]" && !context.DontAutoCreate {
log.Debugf("Guessing kind")
// we must have added this automatically, lets guess what it should be now
@ -62,10 +86,6 @@ func traverse(context Context, matchingNode *CandidateNode, operation *Operation
log.Debugf("its a sequence of %v things!", len(matchingNode.Content))
return traverseArray(matchingNode, operation, operation.Preferences.(traversePreferences))
case AliasNode:
log.Debug("its an alias!")
matchingNode = matchingNode.Alias
return traverse(context, matchingNode, operation)
default:
return list.New(), nil
}
@ -129,7 +149,13 @@ func traverseNodesWithArrayIndices(context Context, indicesToTraverse []*Candida
return context.ChildContext(matchingNodeMap), nil
}
func traverseArrayIndices(context Context, matchingNode *CandidateNode, indicesToTraverse []*CandidateNode, prefs traversePreferences) (*list.List, error) { // call this if doc / alias like the other traverse
func traverseArrayIndices(context Context, matchingNode *CandidateNode, indicesToTraverse []*CandidateNode, prefs traversePreferences) (*list.List, error) {
var err error
matchingNode, err = resolveAliasChain(matchingNode)
if err != nil {
return nil, err
}
if matchingNode.Tag == "!!null" {
log.Debugf("OperatorArrayTraverse got a null - turning it into an empty array")
// auto vivification
@ -142,9 +168,6 @@ func traverseArrayIndices(context Context, matchingNode *CandidateNode, indicesT
}
switch matchingNode.Kind {
case AliasNode:
matchingNode = matchingNode.Alias
return traverseArrayIndices(context, matchingNode, indicesToTraverse, prefs)
case SequenceNode:
return traverseArrayWithIndices(matchingNode, indicesToTraverse, prefs)
case MappingNode:

View File

@ -665,6 +665,16 @@ var traversePathOperatorScenarios = []expressionScenario{
"D0, P[a], (!!null)::null\n",
},
},
{
// Regression test for https://issues.oss-fuzz.com/issues/390467412
// go-yaml accepts cross-document alias references (invalid per
// YAML spec). A nested assignment on such an alias can create a
// circular alias node, which must not cause a stack overflow.
skipDoc: true,
document: "&-- a\n---\n*--",
expression: ". = (.x = 1)",
expectedError: "alias cycle detected",
},
}
func TestTraversePathOperatorScenarios(t *testing.T) {
@ -682,3 +692,58 @@ func TestTraversePathOperatorAlignedToSpecScenarios(t *testing.T) {
appendOperatorDocumentScenario(t, "traverse-read", fixedTraversePathOperatorScenarios)
ConfiguredYamlPreferences.FixMergeAnchorToSpec = false
}
// Regression test for https://issues.oss-fuzz.com/issues/390467412
// A circular alias (alias pointing back to itself) must not cause a
// stack overflow. resolveAliasChain should detect the cycle and return
// an error; both traverse() and traverseArrayIndices() use it.
func TestTraverseAliasCycle(t *testing.T) {
aliasNode := &CandidateNode{
Kind: AliasNode,
}
aliasNode.Alias = aliasNode // A -> A
op := &Operation{
OperationType: traversePathOpType,
Value: "key",
StringValue: "key",
Preferences: traversePreferences{},
}
_, err := traverse(Context{}, aliasNode, op)
if err == nil {
t.Fatal("expected error for alias cycle, got nil")
}
if err.Error() != "alias cycle detected" {
t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
}
// Same cycle must be caught through the array traversal path.
_, err = traverseArrayIndices(Context{}, aliasNode, nil, traversePreferences{})
if err == nil {
t.Fatal("expected error for alias cycle via traverseArrayIndices, got nil")
}
if err.Error() != "alias cycle detected" {
t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
}
}
func TestTraverseAliasCycleChain(t *testing.T) {
nodeA := &CandidateNode{Kind: AliasNode}
nodeB := &CandidateNode{Kind: AliasNode}
nodeA.Alias = nodeB
nodeB.Alias = nodeA // A -> B -> A
op := &Operation{
OperationType: traversePathOpType,
Value: "key",
StringValue: "key",
Preferences: traversePreferences{},
}
_, err := traverse(Context{}, nodeA, op)
if err == nil {
t.Fatal("expected error for alias cycle chain, got nil")
}
if err.Error() != "alias cycle detected" {
t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
}
}