# metaplay-gameserver

The `metaplay-gameserver` chart is used to deploy a Metaplay game backend to
a Kubernetes cluster.

## Requirements

* Helm 3.7 or later
* AWS EKS 1.20 or later
* `metaplay-services` chart (or similar tooling) for the cluster:
  * `external-dns` for management of DNS naming
  * `ingress-nginx` for management of Ingress endpoints

### Compatibility

Details about the compatibility of the chart with different Metaplay SDK releases and infrastructure versions can be found in the [Metaplay Compatibility Matrix](https://docs.metaplay.io/miscellaneous/sdk-updates/compatibility.html).

## Installing the chart

The chart can be installed with Helm:

```sh
helm install \
    --repo "https://charts.metaplay.dev/" \
    --version "0.0.8" \
    --set image.repository="333344445555.dkr.ecr.eu-west-1.amazonaws.com/metaplay-idler-develop-server" \
    --set image.tag="v0.0.1" \
    --namespace "idler-develop" \
    idler-develop metaplay-gameserver
```

See the [Configuration](#configuration) section for details on the supported parameters.
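
Instead of repeating `--set` flags, the same values can be kept in a values file and passed to Helm with `--values` (or `-f`). A minimal sketch mirroring the `--set` flags of the install command above (the file name `values.yaml` is only an example):

```yaml
# values.yaml: equivalent to the --set flags in the install command above
image:
  repository: "333344445555.dkr.ecr.eu-west-1.amazonaws.com/metaplay-idler-develop-server"
  tag: "v0.0.1"
```

The file would then replace the two `--set` flags, e.g. `helm install --values values.yaml ...` with the remaining flags unchanged.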

## Uninstalling the chart

The chart can be uninstalled with Helm:

```sh
helm delete --namespace "idler-develop" idler-develop
```

## Defining game endpoints

The externally available game endpoints are defined through the `.service`
configuration:

```yaml
service:
  enabled: true
  ports:
  - port: 9339
    name: game-default
    tls: true
  - port: 1234
    name: game-without-tls
    tls: false
  tls:
    enabled: true
```
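
When `service.tls.enabled` is `true`, the configuration table requires an AWS ACM certificate ARN in `service.tls.sslCertArn`. A sketch of the example above extended with a DNS name and a certificate; the hostname and ARN are placeholders to replace with your own:

```yaml
service:
  enabled: true
  # DNS name registered via external-dns (example value)
  hostname: idler-develop.d1.metaplay.io
  ports:
  - port: 9339
    name: game-default
    tls: true
  tls:
    enabled: true
    # Placeholder ACM certificate ARN; use a certificate that covers the hostname above
    sslCertArn: "arn:aws:acm:eu-west-1:333344445555:certificate/EXAMPLE"
```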

### IPv6

Providing IPv6 endpoints for the game servers has the following requirements:

* Infrastructure must be using `infra-modules` v0.1.2 or later.
* The deployed infrastructure stack must be running metaplay-operator v0.0.5
  or later.

If you meet these requirements, you can enable IPv6 endpoints by making sure
that NLB load balancers are selected with the `service.loadbalancerType`
parameter (NLB is the default).

On chart versions earlier than v0.3.0, you must additionally turn on the
experimental IPv6 dualstack support with the
`experimental.gameServerIpv6Enabled` parameter. Starting from chart v0.3.0,
IPv6 endpoints are enabled by default and can be toggled with the
`service.ipv6Enabled` parameter.

When enabled, the chart injects additional annotations into the game server
Service resource, which allow the metaplay-operator to identify the load
balancer and switch it to dualstack mode. If you have given the load balancer
a hostname with `service.hostname`, the IPv6 endpoint name is derived by
appending an `-ipv6` suffix to the first part of the hostname (e.g. a hostname
of `idler-develop.d1.metaplay.io` yields an IPv6 endpoint name of
`idler-develop-ipv6.d1.metaplay.io`).
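
A sketch of the relevant values for chart v0.3.0 or later, using only the parameters described above (the hostname is an example):

```yaml
service:
  # IPv6 endpoint becomes idler-develop-ipv6.d1.metaplay.io
  hostname: idler-develop.d1.metaplay.io
  # NLB is required for dualstack (this is the default)
  loadbalancerType: nlb
  # Enabled by default from chart v0.3.0; set to false to opt out
  ipv6Enabled: true
```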

## Shards

A game server consists of one or more shards, each of which can be configured
to handle certain types of entities. The two basic setups are a single-shard
setup and a multi-shard setup. A single-shard setup can be configured as
follows:

```yaml
shards:
- name: all
  singleton: true
  requests:
    cpu: 100m
    memory: 200Mi
```

To split entities across different shards, consult [Specifying a Custom Cluster Topology](https://docs.metaplay.io/game-server-programming/how-to-guides/configuring-cluster-topology.html#specifying-a-custom-cluster-topology) in the Metaplay documentation. For example, the default topology can be achieved with:

```yaml
shards:
- name: service
  nodeCount: 1
  admin: true
  requests:
    cpu: 1000m
    memory: 2000Mi
- name: logic
  nodeCount: 2
  connection: true
  requests:
    cpu: 1000m
    memory: 2000Mi
```

### Shards on dedicated cluster nodes

Shards can be scheduled onto dedicated Kubernetes node pools by setting the
`dedicatedShardNodes` parameter. This configures each shard's pods with a node
selector and a toleration, so that they are placed on the nodes reserved for
that shard type (the nodes themselves carry the matching labels and taints).

For example, consider the multi-shard configuration above with dedicated nodes
enabled:

```yaml
dedicatedShardNodes: true
shards:
- name: service
  # ... (shard settings as in the example above)
- name: logic
  # ... (shard settings as in the example above)
```

With this configuration, `service` pods are configured as follows:

* Node selector: `metaplay.io/pool-type: shard-service`
* Toleration:
  * Key: `metaplay.io/shard-type`
  * Operator: `Equal`
  * Value: `shard-service`
  * Effect: `NoSchedule`
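
Expressed as a Kubernetes Pod spec fragment, the settings listed above for `service` pods would look roughly like this. It illustrates the resulting scheduling constraints and is not the chart's literal template output:

```yaml
# Pod spec fragment for a `service` shard pod with dedicatedShardNodes: true
spec:
  nodeSelector:
    metaplay.io/pool-type: shard-service
  tolerations:
  - key: metaplay.io/shard-type
    operator: Equal
    value: shard-service
    effect: NoSchedule
```

For such pods to be scheduled, the node pool must carry the `metaplay.io/pool-type=shard-service` label; the matching `metaplay.io/shard-type=shard-service:NoSchedule` taint keeps other pods off those nodes.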

For more details on Kubernetes taints and tolerations, please check the
[Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/).

## Admin access

An administrator dashboard can be enabled using the `admin.enabled` parameter.
If enabled, the `hostname` value is used to derive the admin hostname for the
endpoint, typically by adding a `-admin` suffix to the first part of the
hostname (e.g. `idler-develop.p1.metaplay.io` would have an admin endpoint at
`idler-develop-admin.p1.metaplay.io`). `admin.hostname` can also be set to
override the derived hostname.

By default no authentication is configured for the endpoint, so it is fully
open to the public. This is dangerous, and you should configure authentication
for this endpoint as described in the Authentication section below.
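
A minimal sketch enabling the admin endpoint with TLS, using the `admin.*` parameters from the configuration table. The hostnames and certificate ARN are placeholders:

```yaml
hostname: idler-develop.p1.metaplay.io
admin:
  enabled: true
  # Optional override; otherwise derived from `hostname` as described above
  hostname: idler-develop-admin.p1.metaplay.io
  tls:
    enabled: true
    # Placeholder ACM certificate ARN
    sslCertArn: "arn:aws:acm:eu-west-1:333344445555:certificate/EXAMPLE"
```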

### Authentication

Authentication to the game server admin endpoints and observability tools (Grafana)
is handled by the infrastructure, namely StackAPI, the load balancer, and the nginx
ingress. The game server and LiveOps Dashboard perform further permission checks
based on the user's roles.

For Metaplay-managed environments, the only available authentication method is
Metaplay Auth, which is configured automatically. For self-hosted stacks,
authentication is configured via Terraform when provisioning the environments.

## API access

Starting from chart v0.3.0, the `api` configuration block has been removed.
Machine users interacting with the game server API must now use the same game
server admin Ingress as LiveOps Dashboard users.

You can access the game server API under the `/api/` path of the admin dashboard
FQDN (e.g. `https://idler-develop-admin.p1.metaplay.io/api/`).

As API access is typically programmatic, the access control allows JWT bearer
tokens to be used for the API by default. These bearer tokens must be issued by
the same identity source as the admin users.

### Additional public endpoints

If you wish to publish other endpoints from the game servers, e.g. for publicly
accessible webhooks or similar use cases, the chart currently provides an
experimental Helm value for exposing additional HTTP endpoints and targeting
them at specific shards.

For example, to configure a webhook endpoint for `idler-develop`:

```yaml
publicApiEndpoints:
- hostname: idler-develop-webhook.d1.metaplay.io
  target:
    root: "/webhook/"
```

This registers a DNS entry for the given `hostname` and routes all requests
sent to that address to the server's AdminApi entity, rewriting the request
path to be prefixed with `/webhook/`.

If the endpoint is served by a different entity kind, you can target the shard
that runs that entity kind, e.g.:

```yaml
shards:
- name: service
publicApiEndpoints:
- hostname: idler-develop-webhook.d1.metaplay.io
  target:
    shard: service
    root: "/webhook/"
```

Note that endpoints can currently only target shards on port 80/tcp. This means
that if your endpoint is served by an entity other than AdminApi, that entity
must run on a shard that does not run AdminApi. This also complicates
single-shard (singleton) setups.

## Logging and monitoring

By default, game servers log to stdout, and you can use cluster-level logging
solutions to capture those logs.

If you do not have access to such tooling, you can enable the tooling bundled
with this chart: Grafana, Prometheus, Loki, fluentd and fluent-bit. This set of
tooling is fairly heavy, but the minimal configuration needed is:

```yaml
logging:
  file:
    enabled: true
  fluentbit:
    enabled: true
  fluentd:
    enabled: true
grafana:
  enabled: true
prometheus:
  enabled: true
loki:
  enabled: true
```

The above will achieve the following:

* `logging.file.enabled` configures the game server to also write logs to
  files inside the pod.
* `logging.fluentbit.enabled` enables a sidecar container inside the game
  server pod, which monitors the log files and forwards the logs.
* `logging.fluentd.enabled` enables a centralized fluentd setup, which
  aggregates the logs forwarded by fluent-bit from the game server.
* `loki.enabled` enables Loki and allows fluentd to push logs to it.
* `grafana.enabled` enables Grafana as a UI for logs and monitoring data.
* `prometheus.enabled` enables Prometheus for collecting game server metrics.

Grafana is the component that provides the UI in this stack, and it is exposed
via the same hostname as the admin dashboard. Enabling Grafana adds a
`/grafana` path to the admin Ingress, accessible at
`http(s)://{{ .admin.hostname }}/grafana/`
(e.g. <https://idler-develop-admin.d1.metaplay.io/grafana/>).

You should only enable the observability tools in environments that have
authentication configured, so that Grafana is protected from unwanted access.
For self-hosted environments, authentication can be configured in the
environment's Terraform configuration.

### Loki and log persistence

If Loki is enabled alongside the game server, the Loki subchart by default
creates a persistent volume for storing Loki data. If the game server runs for
extended periods of time, this volume may fill up. In that case, you may want
to configure Loki to retain only a certain amount of logs by adding the
`chunk_store_config.max_look_back_period` and `table_manager.retention_period`
settings to the Helm values. These values should be multiples of the index
period. For example, retaining a week of logs can be achieved with the
following additional Helm values:

```yaml
loki:
  config:
    chunk_store_config:
      max_look_back_period: 168h
    table_manager:
      retention_deletes_enabled: true
      retention_period: 168h
```

If the game server is intended to run for long periods of time and long log
retention is desired, consider creating separate, more robust storage targets
(e.g. S3 buckets) and configuring Loki to use them. More details can be found
in the [Loki documentation](https://grafana.com/docs/loki/latest/configuration/)
on [storage configs](https://grafana.com/docs/loki/latest/configuration/#storage_config).

### Structured logs

The chart can instruct the server shards to output JSON logs by setting the
`logging.file.type` parameter to `json`, which is the chart's default value.

The `logging.file.type` setting also changes the behavior of the fluent-bit
sidecar container deployed alongside every game server shard pod. If the output
is set to `text`, fluent-bit attempts to parse the output against the standard
Metaplay text log format and extract fields such as the log level before
forwarding the logs onwards.
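
A values sketch that switches the shards to the plain-text format, so that fluent-bit parses the text logs as described above:

```yaml
logging:
  file:
    enabled: true
    # Chart default is `json`; with `text`, fluent-bit parses the Metaplay text format
    type: text
```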

### Sending log data to other destinations

By default, the chart forwards log data to the Loki instance it provides (if
Loki is enabled). You can configure fluentd to send logs to other destinations
as well using the `logging.fluentd.additionalStores` parameter.

The `logging.fluentd.additionalStores` parameter is a string of fluentd
configuration, specifically the `<store>` sections of the `copy` output plugin.
Detailed documentation is available on the
[fluentd copy documentation page](https://docs.fluentd.org/output/copy).

For example, to also persist your logs to an S3 bucket, you could use the
following value:

```yaml
logging:
  fluentd:
    additionalStores: |
      <store>
        @type s3

        aws_key_id YOUR_AWS_KEY_ID
        aws_sec_key YOUR_AWS_SECRET_KEY
        s3_bucket YOUR_S3_BUCKET
        s3_region YOUR_S3_REGION
        path logs/

        <buffer>
          @type file
          path /var/log/s3
          timekey 3600
          timekey_wait 10m
          chunk_limit_size 20m
        </buffer>
      </store>
```

## Tweaking Kernel parameters

Kernel parameters for the underlying system running the game server can be
tweaked using the `securityContext.sysctls` array. Please refer to the page
[Using sysctls in a Kubernetes Cluster](https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/)
for more details.

The `securityContext.sysctls` parameter is an array of objects with `name` and
`value` keys. Both must be strings.
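
For example, widening the local port range (a sysctl that Kubernetes classifies as safe) could look like this. The value is illustrative and should be tuned for your workload; note the quotes, since both fields must be strings:

```yaml
securityContext:
  sysctls:
  - name: net.ipv4.ip_local_port_range
    value: "1024 65535"
```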

By default, only safe kernel parameters are allowed (see the link above for
details). If you need to set unsafe kernel parameters, ask your infrastructure
administrator to configure the kubelet to allow the parameters you need (via
its `--allowed-unsafe-sysctls` option).

## Debugging flags

Various debug settings are exposed via the `debug.*` parameters. These are
intended as temporary switches that may cause instability, reduced performance
and other undesired side effects. They should be enabled only when needed and
disabled afterwards.

### Running `linux-perf`

Recording samples with `linux-perf` requires elevated privileges and a mapping
file for resolving JIT-generated code back to the original function names.
These can be enabled via `debug.enablePerfTools`. Note that this does not
install `linux-perf` in the image or in the runtime container.

```yaml
debug:
  enablePerfTools: true
```

## RBAC

RBAC support can be enabled in a tenant namespace with the following configuration:

```yaml
rbac:
  serviceAccount:
    enabled: true # whether the game server pods use a ServiceAccount
    create: true # whether this chart creates the ServiceAccount
    name: "gameserver" # ServiceAccount name (default: "gameserver")
    annotations: {} # additional annotations for the ServiceAccount
  role:
    create: true # whether to create a Kubernetes Role/RoleBinding in the tenant namespace
    name: gameserver # name of the Role/RoleBinding (default: "gameserver")
    annotations: {} # additional annotations for the Role/RoleBinding
```

Note that if you set both `rbac.serviceAccount.enabled = true` and `rbac.serviceAccount.create = false`, you must set `rbac.serviceAccount.name` to the name of an existing ServiceAccount. This configuration is useful when the ServiceAccount has been created outside the metaplay-gameserver Helm chart, e.g. manually or with Terraform.
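
A sketch of such a configuration, assuming a pre-existing ServiceAccount named `gameserver-external` (a hypothetical name) in the tenant namespace:

```yaml
rbac:
  serviceAccount:
    enabled: true
    # The ServiceAccount is managed outside this chart
    create: false
    # Hypothetical name; must match the existing ServiceAccount
    name: "gameserver-external"
```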

## Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `environment` | Environment type | `""` |
| `environmentFamily` | Environment family (must be one of `Development`, `Staging`, `Production`), used by the game server to determine which default value sets to use | |
| `hostname` | Hostname to be used for different endpoints | |
| `image.repository` | Image repository of game server image | `""` |
| `image.tag` | Image tag of the game server image | `""` |
| `image.pullSecrets` | Pull secret name, if needed | `""` |
| `image.pullPolicy` | Pull policy for server image | `IfNotPresent` |
| `config.infra.secretName` | Preferred Secret name to use for Metaplay config | `metaplay-config` |
| `config.infra.secretKeyName` | Key name in `secretName` with Metaplay config JSON file | `metaplay-config.json` |
| `config.files` | List of file paths to load into game server | `["./Config/AWSConfig.json"]` |
| `config.extraEnv` | Additional environment variables to pass to game server shards (must follow the Kubernetes container `env` format) | |
| `experimental` | Miscellaneous experimental features | |
| `database.backend` | Type of database backend (`Sqlite` or `MySql`) | `MySql` |
| `rbac.serviceAccount.enabled` | Enable game server service account| `true` |
| `rbac.serviceAccount.create` | Create a service account for game server shards | `false` |
| `rbac.serviceAccount.name` | Name to give to the service account | |
| `rbac.serviceAccount.annotations` | Additional annotations for the service account | |
| `dedicatedShardNodes` | Schedule shards on dedicated nodes (if enabled, shard pods get node selectors and tolerations for their dedicated node pools) | `false` |
| `shards` | Shard configurations (see section above) | |
| `debug` | Debug switches (see section above) | |
| `logging.file.enabled` | Enable game server logging to file | `false` |
| `logging.file.type` | Log format to use (`text` or `json`) | `json` |
| `logging.file.sizeLimit` | Log file size limit before rotation (in bytes) | `104857600` (100 MB) |
| `logging.file.retainCount` | Number of log files to retain | `10` |
| `logging.fluentbit.enabled` | Enable fluent-bit sidecar for game server log sending | `true` |
| `logging.fluentbit.image.repository` | fluent-bit repository name | `fluent/fluent-bit` |
| `logging.fluentbit.image.tag` | fluent-bit tag name | `1.4` |
| `logging.fluentbit.image.pullPolicy` | fluent-bit image pull policy | `IfNotPresent` |
| `logging.fluentbit.resources.limits.cpu` | CPU limits for fluent-bit | |
| `logging.fluentbit.resources.limits.memory` | Memory limits for fluent-bit | |
| `logging.fluentbit.resources.requests.cpu` | CPU requests for fluent-bit | |
| `logging.fluentbit.resources.requests.memory` | Memory requests for fluent-bit | |
| `logging.fluentd.enabled` | Enable fluentd as aggregator for fluent-bit logs | `true` |
| `logging.fluentd.image.repository` | fluentd repository name | `metaplay/metaplay-fluentd` |
| `logging.fluentd.image.tag` | fluentd tag name | `v0.0.3` |
| `logging.fluentd.image.pullPolicy` | fluentd image pull policy | `IfNotPresent` |
| `logging.fluentd.additionalStores` | Additional fluentd copy store definitions (see examples above) | |
| `service.enabled` | Enable game server public endpoint | `true` |
| `service.hostname` | Hostname for the service endpoint (configured via external-dns) | |
| `service.annotations` | Additional annotations to provide to game server service endpoint | |
| `service.loadbalancerType` | AWS load balancer type to use for service endpoint | `nlb` |
| `service.externalTrafficPolicy` | Game server external load balancer traffic policy type | `Cluster` |
| `service.ipv6Enabled` | Enable IPv6 | `true` |
| `service.tls.enabled` | Enable TLS certificate termination for service | `false` |
| `service.tls.sslCertArn` | AWS ACM certificate ARN to use for service TLS termination (must be provided if `service.tls.enabled` is true) | |
| `admin.enabled` | Enable game server admin endpoint | `false` |
| `admin.hostname` | Hostname for the admin endpoint (configured via external-dns) | |
| `admin.tls.enabled` | Enable TLS certificate termination for admin endpoint | `false` |
| `admin.tls.sslCertArn` | AWS ACM certificate ARN to use for admin TLS termination | |
| `publicApiEndpoints` | Public API endpoints for game servers (see section above) | |
| `grafana.enabled` | Enable game server specific Grafana | `false` |
| `grafana.*` | Other Grafana configurations as supported by the [Grafana Helm chart](https://github.com/helm/charts/tree/master/stable/grafana) (note that `values.yaml` contains sensible defaults for game server level monitoring) | |
| `prometheus.enabled` | Enable game server specific Prometheus | `false` |
| `prometheus.*` | Other Prometheus configurations as supported by the [Prometheus Helm chart](https://github.com/helm/charts/tree/master/stable/prometheus) (note that `values.yaml` contains sensible defaults for game server level monitoring) | |
| `loki.enabled` | Enable game server specific Loki | `false` |
| `loki.*` | Other Loki configurations as supported by the [Loki Helm chart](https://github.com/grafana/loki/tree/master/production/helm/loki) (note that `values.yaml` contains sensible defaults for game server level monitoring) | |
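
As an illustration of the `config.extraEnv` format, the sketch below passes one literal value and one value read from a Secret, assuming the list is passed through unchanged to the container's Kubernetes `env` field. The variable names and the Secret `my-game-secrets` are hypothetical:

```yaml
config:
  extraEnv:
  # Hypothetical variable; env values must be strings
  - name: MY_FEATURE_FLAG
    value: "true"
  # Hypothetical variable read from a hypothetical Secret in the same namespace
  - name: MY_API_KEY
    valueFrom:
      secretKeyRef:
        name: my-game-secrets
        key: api-key
```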
