pyspark.sql.DataFrameWriter.csv

DataFrameWriter.csv(path: str, mode: Optional[str] = None, compression: Optional[str] = None, sep: Optional[str] = None, quote: Optional[str] = None, escape: Optional[str] = None, header: Union[bool, str, None] = None, nullValue: Optional[str] = None, escapeQuotes: Union[bool, str, None] = None, quoteAll: Union[bool, str, None] = None, dateFormat: Optional[str] = None, timestampFormat: Optional[str] = None, ignoreLeadingWhiteSpace: Union[bool, str, None] = None, ignoreTrailingWhiteSpace: Union[bool, str, None] = None, charToEscapeQuoteEscaping: Optional[str] = None, encoding: Optional[str] = None, emptyValue: Optional[str] = None, lineSep: Optional[str] = None) → None[source]

Saves the content of the DataFrame in CSV format at the specified path.

New in version 2.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
pathstr

the path in any Hadoop supported file system

modestr, optional

specifies the behavior of the save operation when data already exists.

  • append: Append contents of this DataFrame to existing data.

  • overwrite: Overwrite existing data.

  • ignore: Silently ignore this operation if data already exists.

  • error or errorifexists (default case): Throw an exception if data already

    exists.

Other Parameters
Extra options

For the extra options, refer to Data Source Option for the version you use.

Examples

Write a DataFrame into a CSV file and read it back.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     # Write a DataFrame into a CSV file
...     df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
...     df.write.csv(d, mode="overwrite")
...
...     # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon'.
...     spark.read.schema(df.schema).format("csv").option(
...         "nullValue", "Hyukjin Kwon").load(d).show()
+---+----+
|age|name|
+---+----+
|100|NULL|
+---+----+